## Stat 295, 3/5/09

Here is code for one of the problems in Problem Set #6, which we briefly discussed.

We discussed the hierarchical model from the homework, since we need its full conditionals for a problem to be posed on the next homework assignment. It is given by:

$\begin{array}{rcl}y \mid \lambda, \beta &\sim &\mbox{Poi}( \lambda )\\ \lambda\mid\beta&\sim& \mbox{gamma}(2,\beta)\\ \beta&\sim &\mbox{gamma}(a,b)\end{array}$

Need to find the full conditionals:

$\lambda|\beta,y$
$\beta|\lambda,y$

To do this, we want the joint posterior $\lambda,\beta|y$. We see that

$\begin{array}{rclr}g(\lambda|\beta,y)&\propto &f(y,\lambda,\beta)\\ &=&f_1(y|\lambda,\beta)f_2(\lambda|\beta)f_3(\beta)\\ &\propto&\lambda^ye^{-\lambda}\times \beta^2\lambda^{2-1}e^{-\beta\lambda} \times \beta^{a-1}e^{-b \beta} & (*)\\ &=&\lambda^{y+1}e^{-\lambda(\beta+1)}\beta^{a+2-1}e^{-b \beta}\\ &\propto&\lambda^{y+1}e^{-\lambda(\beta+1)}\end{array}$

So, looking only at terms involving $\lambda$, (see * above), we can read off that $\lambda|\beta,y \sim \mbox{gamma}(y+2,\beta+1)$

Similarly, from (*), looking at the terms involving $\beta$, we see that $\beta|\lambda,y \sim \mbox{gamma}(a+2,\lambda+b)$
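The two full conditionals can be alternated to give a Gibbs sampler for this model. A minimal sketch in Python with NumPy (the function name, starting values, and iteration count are illustrative, and $y$ is the single observed count from the model above; note that NumPy's gamma sampler takes a *scale* parameter, the reciprocal of our rate):

```python
import numpy as np

def gibbs_poisson_gamma(y, a, b, n_iter=5000, seed=0):
    """Gibbs sampler for the hierarchical Poisson model.

    Full conditionals (derived above):
      lambda | beta, y ~ gamma(y + 2, rate = beta + 1)
      beta | lambda, y ~ gamma(a + 2, rate = lambda + b)
    """
    rng = np.random.default_rng(seed)
    beta = 1.0                      # arbitrary starting value
    lam_draws = np.empty(n_iter)
    beta_draws = np.empty(n_iter)
    for t in range(n_iter):
        # NumPy's gamma uses a scale parameter, so scale = 1/rate
        lam = rng.gamma(shape=y + 2, scale=1.0 / (beta + 1.0))
        beta = rng.gamma(shape=a + 2, scale=1.0 / (lam + b))
        lam_draws[t] = lam
        beta_draws[t] = beta
    return lam_draws, beta_draws
```

Each sweep draws $\lambda$ given the current $\beta$, then $\beta$ given the new $\lambda$; the pairs form a Markov chain whose stationary distribution is the joint posterior.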

The secret is that we used conjugate priors…building a hierarchical model using conjugate priors gave us the full conditionals in a form that we could sample from effectively.

So…the general procedure for getting the full conditionals: first write down the joint distribution, ignoring constants. Then pick out the terms that depend on the parameter being sampled, ignoring those that don't. Finally, see if you can recognize the result as the kernel of a standard distribution from which you can sample.

More general MCMC methods (e.g., Metropolis–Hastings) allow us to sample the posterior even if the full conditionals cannot easily be sampled from.

Jeff then provided some useful definitions of various distributions (e.g., the gamma, $\chi^2$, and inverse gamma).

Def: $X\sim \mbox{gamma}(\alpha,\beta)$ means
$f_X(x)=\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\quad x>0$

Def: $y\sim \chi^2_N$ means $y\sim \mbox{gamma}(N/2,1/2)$

Def: $V \sim \mbox{invgamma}(\alpha,\beta)$ means $1/V\sim \mbox{gamma}(\alpha,\beta)$

Fact: If $V \sim \mbox{invgamma}(\alpha,\beta)$ then
$f_V(v)=\frac{\beta^\alpha}{\Gamma(\alpha)}(1/v)^{\alpha+1}e^{-\beta/v},\quad v>0$, from which we get
$E[V]=\beta/(\alpha-1)$ for $\alpha>1$; the mode is $\beta/(\alpha+1)$
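A quick simulation check of the inverse gamma mean formula $E[V]=\beta/(\alpha-1)$ (valid for $\alpha>1$), using the definition $V=1/W$ with $W\sim\mbox{gamma}(\alpha,\beta)$; the particular values of $\alpha$ and $\beta$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta = 5.0, 2.0
# W ~ gamma(alpha, rate beta); NumPy wants scale = 1/rate
w = rng.gamma(shape=alpha, scale=1.0 / beta, size=500_000)
v = 1.0 / w                       # V ~ invgamma(alpha, beta)
print(v.mean(), beta / (alpha - 1))  # the two should be close
```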

Fact: If $w\sim \mbox{gamma}(N/2,\beta/2)$ (rate parameterization, as in the definition above) then $\beta w\sim \chi^2_N$

Fact: If $V\sim \mbox{invgamma}(N/2,\beta/2)$ then $\beta/V\sim \chi^2_N$
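Both scaling facts can be checked numerically: with the rate parameterization, multiplying a $\mbox{gamma}(N/2,\beta/2)$ draw by $\beta$ gives a $\mbox{gamma}(N/2,1/2)=\chi^2_N$ draw, whose mean is $N$. A simulation sketch (values of $N$ and $\beta$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
N, beta = 6, 3.0
# w ~ gamma(N/2, rate beta/2)  =>  beta * w ~ chi^2_N (mean N)
w = rng.gamma(shape=N / 2, scale=2.0 / beta, size=400_000)
print((beta * w).mean())
# V ~ invgamma(N/2, beta/2), i.e. 1/V ~ gamma(N/2, rate beta/2)
# =>  beta / V ~ chi^2_N (mean N)
v = 1.0 / rng.gamma(shape=N / 2, scale=2.0 / beta, size=400_000)
print((beta / v).mean())
```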

Fact: the inverse gamma is the conjugate prior for the variance of a normal model, i.e., if $y_i\sim N(0,\sigma^2)$ $(i=1,\dots,n)$ and the prior on $\sigma^2$ is $\mbox{invgamma}(\alpha,\beta)$, we have
$\sigma^2|\vec{y}\sim \mbox{invgamma}(\alpha+n/2,\beta+\sum_{i=1}^n{y_i^2}/2)$
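The conjugate update above is just arithmetic on the two parameters. A small sketch (function name and example numbers are illustrative):

```python
import numpy as np

def posterior_invgamma(y, alpha, beta):
    """Conjugate update for a normal variance with known mean 0:
    prior invgamma(alpha, beta), data y_i ~ N(0, sigma^2)
    => posterior invgamma(alpha + n/2, beta + sum(y_i^2)/2)."""
    y = np.asarray(y, dtype=float)
    return alpha + y.size / 2, beta + np.sum(y**2) / 2

a_post, b_post = posterior_invgamma([1.0, -2.0, 0.5], alpha=3.0, beta=2.0)
print(a_post, b_post)  # 4.5, 4.625
```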

Another example of Gibbs sampling: Normal model with uninformative priors

Suppose $X_i|\mu,\sigma^2 \sim N(\mu,\sigma^2), i=1,...,N$ and $g(\mu,\sigma^2) \propto 1/\sigma^2$ (looks like $\mbox{invgamma}(0,0)$ but is improper. )

Recall that $\sigma^2|\vec{X} \sim \mbox{invgamma}((N-1)/2, S/2)$, where
$S=\sum_1^N(x_i-\bar{x})^2$

And $\mu|\sigma^2,\vec{x} \sim N(\bar{x}, \sigma^2/N)$, i.e., standard deviation $\sigma/\sqrt{N}$.

We have the full conditional for $\mu$. We need the full conditional for $\sigma^2$: write down likelihood × prior and see if we recognize it.

The joint is $f(\vec{x},\mu,\sigma^2)\propto(1/\sigma^2)^{N/2}\exp\left(-\frac{1}{2\sigma^2}\sum{(x_i-\mu)^2}\right)\cdot\frac{1}{\sigma^2}$. Viewed as a function of $\sigma^2$, this is the kernel of an inverse gamma with parameters $N/2$ and $S/2$, where $S=\sum{(x_i-\mu)^2}$.

So $\sigma^2|\vec{x},\mu \sim \mbox{invgamma}(N/2,S/2)$
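These two full conditionals give the Gibbs sampler discussed in class. A sketch in Python with NumPy (names and defaults are illustrative; the $\sigma^2=1000$ start mimics the "way off" start in the demo):

```python
import numpy as np

def gibbs_normal(x, n_iter=5000, sigma2_init=1000.0, seed=0):
    """Gibbs sampler for (mu, sigma^2) under the improper prior 1/sigma^2.

    Full conditionals:
      mu | sigma^2, x  ~  N(xbar, sigma^2 / N)
      sigma^2 | mu, x  ~  invgamma(N/2, S/2),  S = sum((x_i - mu)^2)
    """
    x = np.asarray(x, dtype=float)
    N, xbar = x.size, x.mean()
    rng = np.random.default_rng(seed)
    sigma2 = sigma2_init            # deliberately bad starting value
    mu_draws = np.empty(n_iter)
    s2_draws = np.empty(n_iter)
    for t in range(n_iter):
        mu = rng.normal(xbar, np.sqrt(sigma2 / N))
        S = np.sum((x - mu) ** 2)
        # invgamma(N/2, S/2): draw gamma(N/2, rate S/2) and invert
        sigma2 = 1.0 / rng.gamma(shape=N / 2, scale=2.0 / S)
        mu_draws[t] = mu
        s2_draws[t] = sigma2
    return mu_draws, s2_draws
```

In practice one would discard an initial portion of the draws as burn-in before summarizing the chain.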

Note that we went from $N-1$ degrees of freedom in the marginal-conditional sampling earlier to $N$ degrees of freedom in Gibbs sampling. This is because $\mu$ is already known when sampling $\sigma^2|\vec{x},\mu$, but it was not known when doing the marginal-conditional sampling. Correspondingly, the sum of squares $S$ is taken about $\bar{x}$ in the earlier case, but about $\mu$ here.

We looked at a program to do this. Looked at some nice plots.

Started with $\sigma^2=1000$ (way off). What happens? The sampler very quickly moved to the region of interest.

But we wouldn't want to include the first values in the posterior sample, since they reflect the starting point rather than the posterior. So often we run for a long time and discard some portion of the first samples. This is the "burn-in".

Showed trace plots for $\mu$ and $\sigma$ for this example (starting way off); the ones he showed illustrated good mixing.

Convergence issues are the "Achilles' heel" of MCMC: there are no universally accepted standards for diagnosing convergence. You have to be very careful, because the sampler may not have explored the whole sample space. One diagnostic is to run chains from several different starting points; WinBUGS has facilities to start chains from multiple places to see if you get the same results. Multimodal posterior distributions are especially problematic; there are methods to handle them, but they are not trivial to implement.