## Stat 295 2/19/09

The assignment for 2/26 will be found here.

We had been talking about the Laplace approximation, and gave an example in terms of the beta-binomial problem. Jeff left out the subscript j so you need to go back to your notes and fix this.

yjj~bin(njj)

θj~beta(α,β)

From this model we derived the marginal density for yj|α,β. We work with these marginal densities. We then reparameterized twice to get something that looked reasonable. Chose a prior (see notes last time), plotted the contours.

Can we get an approximation to this? Here’s where the Laplace approximation comes in.

p(θ|y) ∝ f(y|θ)g(θ)= exp(h(θ)) ≈ (expand h(θ) around θ0 as in the previous blog entry).

i.e., the posterior ≈ N(θ0,-(h”(θ0))-1)

We wrote the explicit form of the Hessian where θ is a p-vector. First write the first derivative (gradient) which is a vector. So the second derivative consists of partials of the first partials, which makes it a matrix (the Hessian). It is a p x p matrix. Explicitly, here it is:

$\frac{\partial^2 h}{\partial \theta^2} = \left( \begin{array}{cccc} \frac{\partial^2 h}{\partial \theta_1^2} & \frac{\partial^2 h}{\partial \theta_1\partial \theta_2} & \cdots & \frac{\partial^2 h}{\partial \theta_1\partial \theta_p} \\ \frac{\partial^2 h}{\partial \theta_2\partial \theta_1} & \frac{\partial^2 h}{\partial \theta_2^2} & \cdots & \frac{\partial^2 h}{\partial \theta_2\partial \theta_p} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 h}{\partial \theta_p\partial \theta_1} & \frac{\partial^2 h}{\partial \theta_p\partial \theta_2} & \cdots & \frac{\partial^2 h}{\partial \theta_p^2} \end{array} \right)$

In class, we noted that the Hessian is a symmetric matrix (assuming that the function h is not pathological). It was not noted, but should have been, that the matrix is negative definite, that is, if x is any (column) vector, then xTh”(θ)x≤0, with equality only if x=0. This is a consequence of the fact that the matrix is evaluated at the maximum of h. Just as the second derivative of a function of one variable is strictly less than zero at a maximum, the Hessian, analogously, is negative definite.

The R function ‘deriv’ will take analytical derivatives and give you R code to compute it.

LearnBayes has a function Laplace that computes the posterior mode and the Hessian and spits out the approximation. It uses an R function ‘optim’ that does the dirty work.

We talked about the Newton-Raphson method for finding a root. Need second derivatives to do this. Converges quadratically (when it converges…)

Albert has redefined the definition of his ‘laplace’ function. I will post the code when Jeff sends it to me. We ran the code and got the variance-covariance matrix. Note, always check that the N-R method converged. The Laplace approximation just uses this output to give a contour plot, using Albert’s function is ‘lbinorm’. Did not look much like the actual contour plot for the real function.

Note that if Laplace is a good approximation, then a 95% credible interval is given by going out 1.96 standard deviations in each direction.

Move on to rejection sampling. Our problem was not well-approximated by Laplace, so we don’t want to use it for inference. A better terminology would be “acceptance-rejection” sampling.

It is a clever way to generate an independent sample from the posterior.

Objective is to generate a random sample from the density g(θ|x)=f(x|θ)g(θ)/m(x) where m(x)=∫f(x|θ)g(θ)dθ

Algorithm: Let p(θ) be a density that we can easily sample from (e.g., a normal or a beta or a gamma) and such that there exists a constant c satisfying g(θ|x)≤cp(θ) for all θ.

p(θ) is called a proposal or enveloping function.

To obtain a sample from the posterior distribution, do the following:

1) generate θ* from the proposal distribution and u~U(0,1).

2) Check if u≤f(x|θ*)g(θ*)/cp(θ*). If it is, accept θ=θ* as a draw from g(θ|x). If not, return to step 1.

3. Repeat steps 1-2 to obtain a sample of the desired size.

Remarks:

1) Finding a suitable p(θ) is the difficult part. It must dominate the target distribution. Also, you want the proposal to approximate the target well, otherwise you will get many rejections and the program will run quite slowly.

2) It can be shown that the acceptance rate is m/c where m=∫f(x|θ)g(θ)dθ. Since the acceptance rate goes as 1/c, if the match is bad you may need to pick a large c and hence get a low acceptance rate.

3) An advantage is that we don’t have to worry about mixing or convergence.

4) Often a fat-tailed distribution like a t is a better choice than, say, a normal…you need to have the proposal dominate out in the tails. This is what we’ll do with the beta-binomial problem.

5) (Not mentioned in class): We don’t need to know the normalizing constant m(x) for this scheme to work!!!

Example: “betabinomial.r” which will appear on the web page. Uses a t with 4 dof as the proposal distribution, with same mean and scale matrix from the Laplace approximation.

We still need to find the bounding constant c.

Procedure to find c: Find d such that log[f(x|θ)g(θ)]-log(p(θ)≤d for all θ:. Then c=exp(d). This procedure is implemented in the LearnBayes function ‘betabinT’ with the function ‘laplace’.

We ran the code to get the appropriate bounding constant. Then we obtained a sample by rejection sampling. We showed that it looks like the contours, and computed means, showed the marginal distribution, quantiles, etc.