Stat 295, 3/24/09

We started with a few words from Bill, who remarked that the idea of Gibbs sampling is to sample in turn from various conditional distributions, so that each (vector) parameter \theta_j is sampled conditional on the current values of all the other parameters \theta_{-j}, where the minus sign in the subscript indicates that \theta_j is omitted. When we did this, we assumed that we could sample easily and exactly from the conditional distributions. However, this is usually not possible, and the way around it is to use Metropolis-Hastings sampling on those parameters. This is called “Metropolis-within-Gibbs.”
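
Schematically, one sweep of Metropolis-within-Gibbs for two scalar parameters might look like the R sketch below. This is only an illustration, not anything from the lecture: log_post is a hypothetical function returning the log posterior at (theta1, theta2), and the random-walk proposal standard deviations are arbitrary.

# Minimal sketch of Metropolis-within-Gibbs for two scalar parameters.
# log_post is a hypothetical function giving the log posterior at (theta1, theta2).
mwg <- function(log_post, init, n_iter = 5000, sd_prop = c(0.5, 0.5)) {
  draws <- matrix(NA, nrow = n_iter, ncol = 2)
  cur <- init
  for (t in 1:n_iter) {
    for (j in 1:2) {
      prop <- cur
      prop[j] <- cur[j] + rnorm(1, 0, sd_prop[j])   # propose a move in theta_j only
      log_ratio <- log_post(prop[1], prop[2]) - log_post(cur[1], cur[2])
      if (log(runif(1)) < log_ratio) cur <- prop    # accept, or keep the current value
    }
    draws[t, ] <- cur
  }
  draws
}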

I remarked that you have to be careful when using Metropolis-Hastings on a parameter (like \sigma^2) that is strictly positive. One way to handle this is simply to define the conditional posterior for this parameter to be zero for \sigma^2\leq 0, so that any proposal into that region is rejected; a simpler and probably better method is to log-transform the variable: \theta=\log(\sigma^2), for example. This has many advantages. One is that the Jeffreys prior on \sigma^2 is singular at 0, whereas the corresponding prior on \log(\sigma^2) is flat.
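
As an illustration (again, names here are hypothetical, not from the lecture), a single random-walk Metropolis update on \theta=\log(\sigma^2) could be coded as below, where log_post_sigma2 stands for the log conditional posterior of \sigma^2. Working on the \theta scale means the target picks up the Jacobian \sigma^2=e^{\theta}, i.e., an extra +\theta on the log scale.

# One random-walk Metropolis update for theta = log(sigma^2).
# log_post_sigma2 is a hypothetical function for the log conditional posterior of sigma^2;
# the transformed target adds theta (the log Jacobian) on the log scale.
update_log_sigma2 <- function(theta_cur, log_post_sigma2, sd_prop = 0.5) {
  log_target <- function(theta) log_post_sigma2(exp(theta)) + theta
  theta_prop <- theta_cur + rnorm(1, 0, sd_prop)
  if (log(runif(1)) < log_target(theta_prop) - log_target(theta_cur)) theta_prop else theta_cur
}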

Jeff took over to discuss linear models. He remarked that linear models like regression, multiple regression, ANOVA,… are still a huge part of the statistician’s toolkit, even though modern computational power enables us to fit models that previously would have been intractable.

Jeff started with the simplest case: Normal, one variable. The problem is estimation for a normal population with mean \mu and variance \sigma^2. The data are n iid random variables y_1,y_2,\ldots,y_n, each N(\mu,\sigma^2). The objective is to estimate (\mu,\sigma^2).

The likelihood is

f_n(y|\mu,\sigma^2)=(2\pi\sigma^2)^{-n/2} \exp[-S_{yy}/2\sigma^2]\,\exp[-n(\mu-\bar{y})^2/2\sigma^2],

where \bar{y} is the sample mean and S_{yy}=\sum{(y_j-\bar{y})^2} (you’ve seen this notation before).
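
This form of the likelihood follows from the usual decomposition of the sum of squares about \mu,

\sum_{j=1}^n (y_j-\mu)^2 = \sum_{j=1}^n (y_j-\bar{y})^2 + n(\bar{y}-\mu)^2 = S_{yy}+n(\mu-\bar{y})^2,

since the cross term \sum_j (y_j-\bar{y})(\bar{y}-\mu) vanishes.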

We have several cases:

1. \sigma^2 known (this never happens, but it is pedagogically interesting).

We have two subcases:

1a) flat prior on \mu
1b) normal prior on \mu

(1a) is the limiting case of (1b) as the prior variance \rightarrow\infty. So the only case Jeff discussed under (1) was (1b).

2. \mu, \sigma^2 both unknown.

Again, we have two cases:

2a. Flat prior on \mu, Jeffreys prior g(\sigma^2)\propto 1/\sigma^2 on \sigma^2.
2b. Conjugate priors, i.e.,

\mu \mid \sigma^2 \sim N(\mu_0,\sqrt{\sigma^2/c_0}),
\sigma^2 \sim\mbox{inv-gamma} (\alpha_0,\beta_0)

(2a) is the limiting case of (2b) when \alpha_0=\beta_0=c_0=0. So Jeff only discussed (2b).

So we really only have to look at two cases: (1b) and (2b).

Jeff then stated, but did not prove, the preliminary result that you are to prove and turn in on 3/26/09. We will find this result very useful.

Case 1b:

g(\mu)=(2 \pi \sigma_0^2)^{-1/2}\exp[-\frac{1}{2 \sigma_0^2} (\mu-\mu_0)^2]

\begin{array}{rcl}g(\mu \mid \vec{y}) & \propto & f_n(\vec{y}\mid\mu)\,g(\mu) \\ & = & (2 \pi \sigma^2)^{-n/2}\exp\left[-\frac{S_{yy}}{2 \sigma^2}\right] \exp\left[-\frac{n}{2 \sigma^2} (\mu-\bar{y})^2\right] (2 \pi \sigma_0^2)^{-1/2}\exp\left[-\frac{1}{2 \sigma_0^2}(\mu-\mu_0)^2\right]\end{array}

Now expand, apply the useful preliminary result mentioned above, and just read off the answer, which is:

\mu\mid\bar{y} \sim N(w \mu_0 +(1-w)\bar{y},\sqrt{(1-w)\sigma^2/n})

where w=\frac{\sigma^2/n}{\sigma^2/n + \sigma_0^2}

As n\rightarrow\infty, w\rightarrow 0 and the posterior mean \rightarrow\bar{y}.

As \sigma_0\rightarrow\infty, w\rightarrow 0, the posterior mean \rightarrow\bar{y}, and the posterior standard deviation \rightarrow \sqrt{\sigma^2/n}.

Note that in this limit we recover the frequentist result (known \sigma, flat prior on \mu).

For large n, \mu\mid \vec{y} \sim N(\bar{y},\sigma/\sqrt{n})
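
As a quick numerical illustration of case (1b) (a sketch only: the data are simulated and the hyperparameter values are arbitrary), the posterior mean and standard deviation follow directly from the formulas above:

# Posterior for mu with sigma^2 known and a N(mu0, sigma0^2) prior (case 1b).
# The data are simulated here purely to illustrate the shrinkage formula.
set.seed(1)
sigma2 <- 4; mu0 <- 0; sigma0_2 <- 1             # known variance and prior hyperparameters
y <- rnorm(20, mean = 2, sd = sqrt(sigma2))      # fake data
n <- length(y); ybar <- mean(y)
w <- (sigma2 / n) / (sigma2 / n + sigma0_2)      # weight on the prior mean
post_mean <- w * mu0 + (1 - w) * ybar
post_sd <- sqrt((1 - w) * sigma2 / n)
c(post_mean = post_mean, post_sd = post_sd)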

Case 2b: \mu,\sigma^2 unknown, \mu \mid \sigma^2 \sim N(\mu_0,\sqrt{\sigma^2/c_0}),

\sigma^2 \sim\mbox{inv-gamma} (\alpha_0,\beta_0)

The calculation for \mu \mid \vec{y},\sigma^2 is very similar to case (1b). See the notes. The result is

\mu\mid\sigma^2,\vec{y} \sim N(w \mu_0 + (1-w)\bar{y},\sqrt{(1-w)\sigma^2/n}),

where w=\frac{\sigma^2/n}{\sigma^2/n + \sigma_0^2} and \sigma_0^2=\sigma^2/c_0.

\sigma^2 \mid \vec{y} \sim \mbox{inv-gamma} (\alpha_0+(n-1)/2, \beta_0+S_{yy}/2)

The priors and posteriors are in the same family; they are conjugate.
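
Taking the two distributions stated above at face value, a Monte Carlo sample from the joint posterior can be drawn by first drawing \sigma^2 and then \mu given \sigma^2. The R sketch below is illustrative (the function name is made up); since R has no built-in inverse-gamma, a draw is taken as the reciprocal of a gamma draw.

# Draw from the posterior of (mu, sigma^2) using the conjugate results stated above.
# An inverse-gamma(a, b) draw is obtained as 1 / rgamma(shape = a, rate = b).
post_sample <- function(y, mu0, c0, alpha0, beta0, n_draws = 1000) {
  n <- length(y); ybar <- mean(y); Syy <- sum((y - ybar)^2)
  sigma2 <- 1 / rgamma(n_draws, shape = alpha0 + (n - 1) / 2,
                       rate = beta0 + Syy / 2)            # sigma^2 | y
  w <- (sigma2 / n) / (sigma2 / n + sigma2 / c0)          # with sigma0^2 = sigma^2 / c0
  mu <- rnorm(n_draws, mean = w * mu0 + (1 - w) * ybar,
              sd = sqrt((1 - w) * sigma2 / n))            # mu | sigma^2, y
  data.frame(mu = mu, sigma2 = sigma2)
}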

Specific example: two predictors.

Linear regression. We’ll start with a specific example and perhaps do some general theory later. The example comes from “Bayesian Modeling Using WinBUGS” by Ntzoufras.

Example: We are interested in understanding the relation between total service time (minutes) of restocking vending machines and two predictors, x_1 is the number of cases stocked, x_2 is the distance in feet from the truck to the machine.

Model: y_i=\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i} + \epsilon_i for i=1,…,25

Assume that the errors are N(0,\sigma^2) and independent (\perp). Let \tau=1/\sigma^2\equiv precision.

We’ll use INDEPENDENT priors on (\beta_0, \beta_1, \beta_2) and \tau, specifically \beta_i \sim N(0,\sqrt{10^4}) and \tau \sim \mbox{gamma}(0.01,0.01).
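
In BUGS syntax the model might look roughly like the sketch below (not verbatim from Ntzoufras; the R object and file names are made up). Recall that BUGS parameterizes dnorm by its precision, so a prior with variance 10^4 becomes dnorm(0, 1.0E-4).

# Sketch of the regression model in BUGS syntax (precision parameterization),
# written as an R string that could be saved to a file for WinBUGS.
model_string <- "
model {
  for (i in 1:25) {
    mu[i] <- beta0 + beta1 * x1[i] + beta2 * x2[i]
    y[i] ~ dnorm(mu[i], tau)
  }
  beta0 ~ dnorm(0, 1.0E-4)
  beta1 ~ dnorm(0, 1.0E-4)
  beta2 ~ dnorm(0, 1.0E-4)
  tau ~ dgamma(0.01, 0.01)
  sigma2 <- 1 / tau
}
"
writeLines(model_string, "vending.txt")   # file name is illustrative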

We learned that Control-R (or right-clicking on a selected region) runs it in R (at least on Windows).

Note that there is a WinBUGS listserv discussion list.

A package called “arm” allows you to call WinBUGS from R. You will also need WinBUGS itself; a patch must be installed, and you must run their immortality key.

To be continued…
