Stat 295, 3/5/09

Here is code for one of the problems in Problem Set #6, which we briefly discussed.

We discussed the hierarchical model from the homework, since we will need its full conditionals for a problem on the next homework assignment. It is given by:

\begin{array}{rcl}y \mid \lambda, \beta &\sim &\mbox{Poi}( \lambda )\\ \lambda\mid\beta&\sim& \mbox{gamma}(2,\beta)\\ \beta&\sim &\mbox{gamma}(a,b)\end{array}

Need to find the full conditionals:

\lambda|\beta,y
\beta|\lambda,y

To do this, we want \lambda,\beta|y. We see that

\begin{array}{rclr}g(\lambda|\beta,y)&\propto &f(y,\lambda,\beta)\\ &=&f_1(y|\lambda,\beta)f_2(\lambda|\beta)f_3(\beta)\\ &\propto&\lambda^ye^{-\lambda}\times \beta^2\lambda^{2-1}e^{-\beta\lambda} \times \beta^{a-1}e^{-b \beta} & (*)\\ &=&\lambda^{y+1}e^{-\lambda(\beta+1)}\beta^{a+2-1}e^{-b \beta}\\ &\propto&\lambda^{y+1}e^{-\lambda(\beta+1)}\end{array}

So, looking only at the terms involving \lambda (see (*) above), we can read off that \lambda|\beta,y \sim \mbox{gamma}(y+2,\beta+1).

Similarly, from (*), looking at the terms involving \beta, we see that \beta|\lambda,y \sim  \mbox{gamma}(a+2,\lambda+b)

The secret is that we used conjugate priors…building the hierarchical model with conjugate priors gave us the full conditionals in a form that we can sample from efficiently.

So…the general procedure for getting the full conditionals is first to write down the joint distribution, ignoring constants. Look at this, and pick out those terms that depend on the parameter being sampled; ignore terms that don’t depend on that parameter. Then see if you can recognize this as the kernel of a standard distribution from which you can sample. A sketch of the resulting sampler is below.
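As an illustration of this recipe, here is a minimal Gibbs-sampler sketch in Python/NumPy (not the course’s actual code) that alternates between the two full conditionals derived above; the data value y and the hyperparameters a and b are made-up placeholders.

```python
import numpy as np

def gibbs_poisson_gamma(y, a, b, n_iter=10000, seed=0):
    """Gibbs sampler for the hierarchical model
        y | lam    ~ Poisson(lam)
        lam | beta ~ gamma(2, beta)   (shape 2, rate beta)
        beta       ~ gamma(a, b)      (shape a, rate b)
    using the full conditionals derived above:
        lam  | beta, y ~ gamma(y + 2, beta + 1)
        beta | lam,  y ~ gamma(a + 2, lam + b)
    """
    rng = np.random.default_rng(seed)
    lam, beta = 1.0, 1.0                 # arbitrary starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # NumPy's gamma is parameterized by shape and SCALE = 1/rate
        lam = rng.gamma(shape=y + 2, scale=1.0 / (beta + 1.0))
        beta = rng.gamma(shape=a + 2, scale=1.0 / (lam + b))
        draws[t] = lam, beta
    return draws

# made-up example: y = 5, hyperparameters a = 2, b = 1
samples = gibbs_poisson_gamma(y=5, a=2.0, b=1.0)
print(samples[1000:].mean(axis=0))       # posterior means after burn-in
```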

More general MCMC methods (e.g., Metropolis–Hastings) allow us to proceed even if the full conditionals cannot easily be sampled from.

Jeff then provided some useful definitions of various distributions (e.g., gamma, \chi^2, inverse gamma).

Def: X\sim \mbox{gamma}(\alpha,\beta) means
f_X(x)=\frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\beta x},\quad x>0
(so \beta here is a rate, not a scale, parameter).

Def: y\sim \chi^2_N means y\sim \mbox{gamma}(N/2,1/2)

Def: V \sim  \mbox{invgamma}(\alpha,\beta) means 1/V\sim \mbox{gamma}(\alpha,\beta)

Fact: If V \sim \mbox{invgamma}(\alpha,\beta) then
f_V(v)=\frac{\beta^\alpha}{\Gamma(\alpha)}(1/v)^{\alpha+1}e^{-\beta/v}, from which we get
E[V]=\beta/(\alpha-1) for \alpha>1; the mode is \beta/(\alpha+1)

Fact: If w\sim \mbox{gamma}(N/2,\beta/2) then \beta w\sim \chi^2_N

Fact: If V\sim \mbox{invgamma}(N/2,\beta/2) then \beta/V\sim \chi^2_N
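A quick Monte Carlo check of the last two facts (a sketch in Python/NumPy, with arbitrary values of N and \beta):

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta = 7, 3.0
n_draws = 200_000

# w ~ gamma(N/2, rate beta/2); NumPy's gamma takes scale = 1/rate
w = rng.gamma(shape=N / 2, scale=2.0 / beta, size=n_draws)
# V ~ invgamma(N/2, beta/2), i.e., 1/V ~ gamma(N/2, beta/2)
V = 1.0 / rng.gamma(shape=N / 2, scale=2.0 / beta, size=n_draws)

# beta*w and beta/V should both behave like chi^2_N:
# mean approximately N, variance approximately 2N
print((beta * w).mean(), (beta / V).mean())   # both ~ 7
print((beta * w).var(), (beta / V).var())     # both ~ 14
```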

Fact: the inverse gamma is the conjugate prior for the variance of a normal model, i.e., if y_i\sim N(0,\sigma^2) (i=1,\ldots,n) and the prior on \sigma^2 is \mbox{invgamma}(\alpha,\beta), then
\sigma^2|\vec{y}\sim \mbox{invgamma}\left(\alpha+n/2,\ \beta+\tfrac{1}{2}\sum_{i=1}^n y_i^2\right)
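A minimal sketch of this conjugate update in Python/NumPy (the data here are simulated placeholders, not from the course):

```python
import numpy as np

rng = np.random.default_rng(2)

# simulated data: n draws from N(0, sigma2_true)
sigma2_true, n = 4.0, 50
y = rng.normal(0.0, np.sqrt(sigma2_true), size=n)

# invgamma(alpha, beta) prior on sigma^2
alpha, beta = 2.0, 2.0

# conjugate update: sigma^2 | y ~ invgamma(alpha + n/2, beta + sum(y^2)/2)
alpha_post = alpha + n / 2
beta_post = beta + np.sum(y**2) / 2

# posterior mean of sigma^2 is beta_post / (alpha_post - 1) for alpha_post > 1
print(beta_post / (alpha_post - 1), sigma2_true)
```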

Another example of Gibbs sampling: Normal model with uninformative priors

Suppose X_i|\mu,\sigma^2 \sim N(\mu,\sigma^2), i=1,\ldots,N, and g(\mu,\sigma^2) \propto 1/\sigma^2 (this looks like \mbox{invgamma}(0,0) but is improper).

Recall that \sigma^2|\vec{x} \sim \mbox{invgamma}((N-1)/2,\ S/2), where
S=\sum_{i=1}^N(x_i-\bar{x})^2

So \mu|\sigma^2,\vec{x} \sim N(\bar{x},\ \sigma^2/N)
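Here is a sketch of this marginal-conditional (composition) sampler in Python/NumPy, using made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=30)        # made-up data
N, xbar = len(x), x.mean()
S = np.sum((x - xbar) ** 2)

n_draws = 10_000
# sigma^2 | x ~ invgamma((N-1)/2, S/2): draw a gamma, then invert
sigma2 = 1.0 / rng.gamma(shape=(N - 1) / 2, scale=2.0 / S, size=n_draws)
# mu | sigma^2, x ~ N(xbar, sigma^2 / N)
mu = rng.normal(xbar, np.sqrt(sigma2 / N))

print(mu.mean(), sigma2.mean())           # independent joint draws of (mu, sigma^2)
```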

We have the full conditional for \mu. We need the full conditional for \sigma^2: write down likelihood × prior and see if we recognize it.

The joint is
f(\vec{x},\mu,\sigma^2)\propto\left(\frac{1}{\sigma^2}\right)^{N/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^N(x_i-\mu)^2\right)\cdot\frac{1}{\sigma^2}
Viewed as a function of \sigma^2, this is the kernel of an inverse gamma with parameters N/2 and S/2, where S=\sum_{i=1}^N(x_i-\mu)^2.

So \sigma^2|\vec{x},\mu \sim  \mbox{invgamma}(N/2,S/2)

Note that we went from N-1 degrees of freedom in the marginal-conditional sampling earlier to N degrees of freedom in Gibbs sampling. This is because we already know \mu when sampling from \sigma^2|\vec{x},\mu, but we did not know it when doing the marginal-conditional sampling. Correspondingly, S is computed with \bar{x} in the marginal-conditional case but with \mu here.

We looked at a program to do this and at some nice plots. A sketch of such a sampler follows.
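This is a minimal sketch in Python/NumPy (not the actual course program), with made-up data and the deliberately bad starting value \sigma^2=1000 discussed below:

```python
import numpy as np

def gibbs_normal(x, n_iter=5000, sigma2_init=1000.0, seed=4):
    """Gibbs sampler for X_i ~ N(mu, sigma^2), prior g(mu, sigma^2) propto 1/sigma^2.
    Full conditionals:
        mu      | sigma^2, x ~ N(xbar, sigma^2 / N)
        sigma^2 | mu,      x ~ invgamma(N/2, S/2), S = sum (x_i - mu)^2
    """
    rng = np.random.default_rng(seed)
    N, xbar = len(x), x.mean()
    sigma2 = sigma2_init                      # deliberately far-off start
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        mu = rng.normal(xbar, np.sqrt(sigma2 / N))
        S = np.sum((x - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(shape=N / 2, scale=2.0 / S)  # invgamma draw
        draws[t] = mu, sigma2
    return draws

x = np.random.default_rng(0).normal(5.0, 3.0, size=50)   # made-up data
draws = gibbs_normal(x)
posterior = draws[500:]               # discard burn-in (see below)
print(posterior.mean(axis=0))         # posterior means of mu and sigma^2
```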

Started with \sigma^2=1000 (way off). What happens? The sampler very quickly moved to the region of interest.

But we wouldn’t want to include the first values in the posterior sample…they reflect the arbitrary starting point rather than the posterior. So often we run for a long time and discard some portion of the first samples. This is the “burn-in”.

Jeff showed trace plots of \mu and \sigma for this example (starting way off); the ones he showed illustrated good mixing.
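Trace plots like those can be produced in a few lines (a sketch using matplotlib, assuming the `draws` array from the sampler sketch above):

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(draws[:, 0])
axes[0].set_ylabel(r"$\mu$")
axes[1].plot(np.sqrt(draws[:, 1]))    # plot sigma, not sigma^2
axes[1].set_ylabel(r"$\sigma$")
axes[1].set_xlabel("iteration")
plt.show()
```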

Convergence issues are the “Achilles’ heel” of MCMC. There are no universally accepted standards for assessing convergence, and you have to be very careful, because your chain may not have explored the whole sample space. One common check is to sample from various starting points: WinBUGS has facilities to start chains from multiple places and see whether you get the same results. What’s really problematic are multimodal posterior distributions. There are methods to handle these, but they are not trivial to implement.
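A simple version of the multiple-starting-points check is to run several chains from dispersed initial values and compare them. Below is a sketch that does this and computes the Gelman–Rubin potential scale reduction factor (a diagnostic not covered in these notes; `gibbs_normal` and `x` are from the sketch above):

```python
import numpy as np

# run chains from widely dispersed starting values for sigma^2
starts = [0.01, 1.0, 1000.0]
chains = [gibbs_normal(x, sigma2_init=s, seed=i)[500:, 0]   # mu draws, burn-in dropped
          for i, s in enumerate(starts)]

def gelman_rubin(chains):
    """Potential scale reduction factor; values near 1 suggest the chains agree."""
    c = np.asarray(chains)               # shape (m chains, n draws)
    m, n = c.shape
    W = c.var(axis=1, ddof=1).mean()     # average within-chain variance
    B = n * c.mean(axis=1).var(ddof=1)   # between-chain variance
    var_hat = (n - 1) / n * W + B / n    # pooled variance estimate
    return np.sqrt(var_hat / W)

print(gelman_rubin(chains))              # close to 1 if the chains mix well
```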
