Stat 295, 3/31/09

One-way ANOVA:

From last time, our model was

\begin{array}{rclr} y_{ij} & = & \alpha_i+\epsilon_{ij} \\ \alpha_i & \sim & N(\mu_\alpha,\sigma_\alpha) \\ \epsilon_{ij} & \sim & N(0,\sigma) \end{array}

This is a random effects model, with i=1,\ldots,4 groups and j=1,\ldots,6 observations per group.

Without the second line it’s just a fixed-effects ANOVA model.

Frequentists motivate the random effects model by imagining that the four \alpha\mbox{'s} are a sample from a larger population of groups; we don’t care about the identity of the individual groups. In the Bayesian approach, the random effects part is instead thought of as a prior on the \alpha\mbox{'s}. How do we choose \mu_\alpha and \sigma_\alpha? We might be able to guess \mu_\alpha from knowledge of the average time it takes a child to walk (around 11 months), but what about \sigma_\alpha? Can we fix a value, or should we put priors on them? Here \sigma is the within-group variation, \sigma_\alpha is the between-group variation, and \mu_\alpha is the mean time to walking over all groups. So we’ll augment the model by putting priors on these parameters, e.g.

\begin{array}{rclr} \frac{1}{\sigma^2} & \sim & \mbox{gamma}(0.001,0.001) \\ \mu_\alpha & \sim & N(0,10) \\\frac{1}{\sigma_\alpha^2} & \sim & \mbox{gamma}(0.001,0.001) \end{array}

The prior on \mu_\alpha is rather peculiar, as it puts probability on negative times, i.e., it allows first walking to take place before birth! Maybe a gamma prior, which is restricted to positive values, would be more appropriate. We investigated this.

In the frequentist version of the model, how do we estimate the \alpha_i? It can be shown that for this model the posterior means are essentially \hat{\alpha}_i \approx w \bar{y}_i + (1-w) \bar{y}_{..}, where \bar{y}_i is the group mean, \bar{y}_{..} is the overall mean, and w = \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma^2} + \frac{1}{n_i\sigma_\alpha^2}}. The weight w tells us how much we are pooling the data. In fixed effects we treat all groups separately, but in random effects or the Bayesian approach we let the overall population affect the individual group estimates. This produces the “shrinkage” effect. The fixed effects case corresponds to \sigma_\alpha becoming very large, so that the pooled data have little effect on the individual group estimates.
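The shrinkage formula above can be checked numerically. This is a minimal Python sketch (the course itself used R/WinBUGS); the function names are my own, and the numbers plugged in are illustrative, borrowing the within- and between-group sigmas reported later in these notes rather than refitting the model.

```python
def shrinkage_weight(sigma, sigma_alpha, n_i):
    """Pooling weight w = (1/sigma^2) / (1/sigma^2 + 1/(n_i * sigma_alpha^2))."""
    return (1 / sigma**2) / (1 / sigma**2 + 1 / (n_i * sigma_alpha**2))

def alpha_hat(ybar_i, ybar_all, sigma, sigma_alpha, n_i):
    """Approximate posterior mean: shrink the group mean toward the overall mean."""
    w = shrinkage_weight(sigma, sigma_alpha, n_i)
    return w * ybar_i + (1 - w) * ybar_all

# Illustrative values: within-group sd 0.62, between-group sd 1.59, n_i = 6.
w = shrinkage_weight(0.62, 1.59, 6)

# As sigma_alpha grows very large, w -> 1: no pooling (the fixed effects case).
w_fixed = shrinkage_weight(0.62, 1e9, 6)
```

Note that w also goes to 1 as n_i grows: with lots of data in a group, its own mean dominates and there is little shrinkage.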

We then fit the model in R and WinBUGS. First we looked at the fixed effects model, using lm(). The -1 in the call to lm() means “don’t fit an intercept,” so each group gets its own coefficient.
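To see why dropping the intercept gives one coefficient per group, here is a hedged numpy sketch (not the course's R code, and with made-up data rather than the walking data): a dummy-coded design with no intercept column makes the least-squares coefficients exactly the group means.

```python
import numpy as np

# Made-up data: 4 groups of 6 observations each (illustrative only).
rng = np.random.default_rng(0)
true_means = np.array([10.0, 11.0, 11.5, 12.5])
y = np.concatenate([m + 0.5 * rng.standard_normal(6) for m in true_means])
group = np.repeat(np.arange(4), 6)

# Dummy-coded design with NO intercept column -- the analogue of
# R's lm(y ~ groupf - 1): one indicator column per group.
X = (group[:, None] == np.arange(4)[None, :]).astype(float)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# With this design, the fitted coefficients equal the sample group means.
means = np.array([y[group == g].mean() for g in range(4)])
```

With an intercept included instead, the coefficients would be contrasts against a baseline group rather than the group means themselves.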

For comparison with the random effects model, the fixed effects fit gave group means 10.12, 11.38, 11.70, and 12.35, with overall mean 11.38.

lmer() fits linear mixed-effects models; we used it to fit the random effects model. The (1 | groupf) term means fit a different random intercept for each group.

Information on using lmer() can be found in two books in particular that Jeff mentioned: Pinheiro and Bates’s “Mixed-Effects Models in S and S-PLUS,” and Gelman and Hill’s recent book, which has examples using lmer().

We ran the code and got 10.65, 11.38, 11.58, 11.95 for the four groups.

(A table comparing all of the runs appears at the end of these notes.)

We then set up the Bayesian analysis. This time we had the WinBUGS code in the R file and saved it to disk using the write.model() function, then looked at the resulting file.

Remember when looking at the code that while R parameterizes the normal distribution by the standard deviation \sigma, BUGS uses the precision \tau = \frac{1}{\sigma^2}.
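This conversion is a frequent source of bugs when moving between R and BUGS, so it is worth spelling out. A tiny sketch (helper names are my own, for illustration):

```python
import math

def precision_to_sd(tau):
    """BUGS dnorm takes a precision tau = 1/sigma^2; recover the sd."""
    return 1.0 / math.sqrt(tau)

def sd_to_precision(sigma):
    """Convert an R-style standard deviation to a BUGS precision."""
    return 1.0 / sigma**2
```

So, for example, a precision of 4 corresponds to a standard deviation of 0.5, and a vague prior on a precision implies a very spread-out distribution on the standard deviation.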

We ran the code and got means 10.90, 11.40, 11.48, 11.70. We noted that the between-groups sigma was 1.59 whereas the within-groups sigma was 0.62: a lot less variation within groups than between them.

The example is from a book by Fisher and van Belle on biostatistics; it’s not a Bayesian book. We then tried the model with a \mbox{gamma}(1,0.1) prior on \mu_\alpha.

Jeff noted that print(date()) at the beginning and end of a calculation allows him to monitor how long the calculation takes.

With the gamma prior we got 10.98, 11.40, 11.50, 11.74. On a second run we got 10.94, 11.38, 11.49, 11.72.

Here’s a summary of the five computer runs we made. Run 1 was the fixed effects model, Run 2 the random effects (frequentist) model, Run 3 the Bayesian model with Jeff’s normal prior on \mu_\alpha, and Runs 4-5 the Bayesian model with the gamma prior on \mu_\alpha. You’ll notice that relative to Run 1, all of the other runs “shrink” the group estimates towards the overall mean. The shrinkage is greater for the Bayesian models than for the frequentist random effects model.

\begin{tabular}{||l||c||c||c||c|c||}\hline\hline&Run 1&Run 2&Run 3&Run 4& Run 5 \\ \hline Group 1&10.12&10.65&10.90&10.98&10.94 \\ \hline Group 2&11.38&11.38&11.40&11.40&11.38 \\ \hline Group 3&11.70&11.58&11.48&11.50&11.49 \\ \hline Group 4&12.35&11.95&11.70&11.74&11.72\\ \hline\hline\end{tabular}

Finally, I pointed out and Jeff reinforced that the shrinkage phenomenon was first understood and explained by the great 19th century statistician Francis Galton, a cousin of Charles Darwin. He noticed that children of tall parents tended to be shorter than their parents, and children of short parents tended to be taller than their parents. It can be understood by thinking of the variance of an observed population as being composed of an underlying variance plus an individual variance. If we’re interested in the underlying variance, it will be smaller than the variance we observe in the population.
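The variance decomposition behind Galton's observation can be made concrete with a couple of lines of arithmetic. This is an illustrative sketch with made-up numbers, not anything from the lecture:

```python
# Decompose an observed trait's variance into an underlying (between-unit)
# part and an individual (noise) part.  Numbers are illustrative only.
var_underlying = 4.0   # sigma_u^2
var_individual = 2.0   # sigma_e^2
var_observed = var_underlying + var_individual

# The best linear prediction of the underlying value given an observation
# shrinks toward the mean with slope sigma_u^2 / (sigma_u^2 + sigma_e^2),
# which is always less than 1 whenever there is any individual variance.
slope = var_underlying / var_observed
```

A slope below 1 is exactly regression to the mean: an extreme observed value is predicted to correspond to a less extreme underlying value.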

Turner asked whether the frequentist results would be the same as the Bayesian ones. The answer is “no,” because the frequentist results are in general inadmissible (a subject we’ll discuss later). The Bayesian results will be admissible as long as they are calculated using proper priors. It is also related to the Stein phenomenon.

