Stat 295, 4/2/09

I mentioned a link on Andrew Gelman’s blog about R programming style tips.

Continuing with the random effects ANOVA model for time to first walking.

Was there a significant difference between groups? We’d like to do pairwise comparisons of the groups, e.g., make inferences about \alpha_1-\alpha_2. How do we get interval estimates for that difference?

The frequentist approach would be to choose a point estimator \hat{\alpha}_i for each group, form the difference \hat{\alpha}_1-\hat{\alpha}_2, work out the sampling distribution of that quantity, and compute a confidence interval from that sampling distribution. This could be quite complicated to do analytically (though it may be possible in general by bootstrapping).

The Bayesian approach is dead simple. Everything we need is already available in the samples: a sample of the difference is the difference of the samples, taken sample by sample. So, sample by sample, we compute \alpha_1-\alpha_2 and use the resulting sample to compute credible intervals. Any function of the parameters is easy to make inferences about; just compute that function on each sample. Here our function is the difference between two \alpha\mbox{'s}, a very simple function.

So we did this. See the code.

attach.bugs() attaches the output of WinBUGS so that we can refer to the sampled quantities by name.

dim(alpha) gives the dimension of alpha, which in this case is 1002 x 4 (samples x groups).
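For example, assuming the object returned by bugs() was saved as fit (that name is a guess, not the one used in class), the sequence is roughly:

library(R2WinBUGS)   # provides bugs() and attach.bugs()
attach.bugs(fit)     # attaches fit$sims.list, so alpha etc. can be used by name
dim(alpha)           # 1002 x 4: one row per retained sample, one column per group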

Jeff wrote pairwise(), which takes a pair of groups, forms the sample of their difference, and returns selected percentiles. You can call it with any pair of columns.

We did this; the quantiles for the 1 vs. 2 comparison were

\begin{array}{rrrrrrr} 2.5\%&5\%&25\%&50\%&75\%&95\%&97.5\%\\ -2.18&-1.88&-0.83&-0.26&-0.01&0.33&0.48\end{array}
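Jeff’s actual code isn’t reproduced in these notes; a minimal sketch of a pairwise() that would produce a table like the one above might be (the argument names are my guesses):

pairwise <- function(samples, i, j,
                     probs = c(0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975)) {
  # quantiles of the posterior sample of alpha_i - alpha_j, formed row by row
  quantile(samples[, i] - samples[, j], probs)
}
pairwise(alpha, 1, 2)   # the 1 vs. 2 comparison above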

The idea is that we can get the posterior distribution of any function of the parameters \alpha and \theta, given the data \vec{y}.

For example,

\mbox{Pr}(\alpha_1>\alpha_2 \mid \vec{y})=\int\!\!\int I[\alpha_1>\alpha_2]\, g(\alpha,\theta \mid \vec{y})\,d\theta\, d\alpha,

where g(\alpha,\theta \mid \vec{y}) is the joint posterior density and I[\cdot] is the indicator function.
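With the MCMC samples in hand, this integral is estimated by a simple sample proportion. Assuming alpha is the 1002 x 4 matrix of samples from above:

mean(alpha[, 1] > alpha[, 2])   # proportion of posterior samples with alpha_1 > alpha_2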

Turner asked whether you have to keep the order of the samples together; that is, since we have several quantities in the array of samples, do we have to use the first row of samples for the various parameters together, or doesn’t it matter? The answer is yes: use the first row of samples to get the first sample of the function of the parameters you are computing, the second row for the second sample, and so on. The reason is that the parameters are correlated in the posterior; if you mixed up the rows, that correlation would be lost and the answer would be wrong.

Our next example is posterior predictive distributions.

PPDs are used, among other things, to assess whether a probability model for the data is appropriate.

Def. The PPD for a new observation \tilde{y} is

\begin{array}{rclr} f(\tilde{y}\mid \vec{y})&=&\int{ f(\tilde{y},\theta \mid \vec{y})d\theta}\\ &=&\int{ f(\tilde{y} \mid \theta,\vec{y})g(\theta \mid \vec{y})d\theta}\\ &=&\int{f(\tilde{y} \mid \theta)g(\theta \mid \vec{y})d\theta}\end{array}

where the last equality follows because \tilde{y}\bot \vec{y} (they are independent) given \theta, i.e., f(\tilde{y}, \vec{y} \mid \theta) = f(\tilde{y}\mid\theta)f(\vec{y}\mid\theta)

Note that f(\tilde{y} \mid \vec{y}) specifies how we can replicate data under our assumed model. Replicated data under the assumed model should look like the data we actually collected.

A procedure for using the PPD for model checking is as follows:

Let D(y,\theta) be a function that measures some aspect of discrepancy between the model and data. Define

\mbox{posterior p-value} =\mbox{Pr}(D(\tilde{y},\theta)>D(\vec{y},\theta)),

where the probability is taken over the joint posterior distribution of (\tilde{y},\theta) given \vec{y}.

The posterior p-value can be computed as follows

1) For t=1,\ldots,M, obtain MCMC samples \theta_t.
2) For each \theta_t, generate n observations \tilde{y}_i^t \sim f(\tilde{y}\mid\theta_t) for i=1,2,\ldots,n, and let \tilde{y}^t=(\tilde{y}_1^t,\ldots,\tilde{y}_n^t).
3) Choose a discrepancy measure D(y,\theta).
4) Compute the posterior p-value as

\frac{1}{M} \sum_{t=1}^M{I\left[D(\tilde{y}^t,\theta_t)>D(\vec{y},\theta_t)\right]}
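Here is a minimal R sketch of these four steps. It assumes theta is an M x p matrix of MCMC samples, y is the observed data, rdata(theta.t, n) simulates n observations from f(y | theta), and D(y, theta.t) is the chosen discrepancy function; all of these names are placeholders, not code from class.

posterior.p.value <- function(theta, y, rdata, D) {
  M <- nrow(theta)
  n <- length(y)
  exceed <- logical(M)
  for (t in 1:M) {
    ytilde <- rdata(theta[t, ], n)                          # step 2: replicate the data
    exceed[t] <- D(ytilde, theta[t, ]) > D(y, theta[t, ])   # step 4: the indicator
  }
  mean(exceed)   # Monte Carlo estimate of the posterior p-value
}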

We have 66 observations on the speed of light made by Simon Newcomb, who was the director of the United States Naval Observatory around the turn of the 20th century. Bill remarked that the speed of light is no longer measured: for practical reasons, a few decades ago the international bodies charged with defining units fixed the speed of light in vacuo at exactly 299,792,458 meters/second. Newcomb could not have directly timed a single round trip over the path used; the light must have been bounced back and forth many times to produce a time difference he could measure with the clocks available to him.

We’ll model differences from 24,800 nanoseconds for a single round trip as normal:

\begin{array}{rclr}y_i &\sim &N(\mu,\sigma)\\ \mu &\sim& \mbox{flat}\\ \frac{1}{\sigma^2}&\sim&\mbox{gamma}(\epsilon,\epsilon)\end{array}

The question is, is a normal model a good one for these data?

To motivate a discrepancy function, look at a histogram of the data. There is a big outlier on the left. One discrepancy measure is the minimum value of the data. We want to see how many replicated minima are less than the observed minimum, i.e., does the position of the outlier comport with the probability model that we have?

See the code.

In the code, Jeff used the R apply() function. He had a matrix ‘ynew’ with 1002 rows and 66 columns; each row is a replicate of the 66 observations that Newcomb made (that is, a draw from our assumed data model). We want to apply the function ‘min’ to each replicate, so we use mins=apply(ynew,1,min); the ‘1’ says “apply the function ‘min’ to the rows of ‘ynew’.”
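The construction of ynew isn’t shown in these notes; one way to build it, assuming attach.bugs() has made the posterior sample vectors mu and sigma available and y holds the 66 observed deviations, is:

M <- length(mu)   # 1002 retained samples of (mu, sigma)
n <- length(y)    # 66 observations
ynew <- matrix(rnorm(M * n, mean = mu, sd = sigma), nrow = M, ncol = n)
# rnorm() recycles mu and sigma so that every entry in row t is drawn from
# N(mu[t], sigma[t]); each row of ynew is one replicated data set.
mins <- apply(ynew, 1, min)   # the minimum of each replicate

The posterior p-value for this measure is then mean(mins < min(y)), the proportion of replicated minima below the observed minimum.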

To show the distribution of the replicated minima, he used truehist(mins) (truehist() is from the MASS package).

We noted a discrepancy. That point is an outlier, it seems, and the model isn’t (on this measure) a good one.

The posterior predictive p-value is 0 for this discrepancy measure. That’s an indication that getting an outlier this large is extremely improbable under our model.

Another discrepancy measure might be to look at the standardized residuals:

D(y,\theta)=\sum_i{\left(\frac{y_i-\mu}{\sigma}\right)^2}

Jeff used rowSums(), which is faster than computing the row sums with apply().
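A sketch of that computation, using the same assumed objects as above (ynew, y, and the posterior sample vectors mu and sigma):

Drep <- rowSums(((ynew - mu) / sigma)^2)         # discrepancy of each replicated data set
Dobs <- colSums(outer(y, mu, "-")^2) / sigma^2   # discrepancy of the observed data at each theta_t
mean(Drep > Dobs)                                # posterior predictive p-value (about 0.5 here)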

We looked at histograms of this discrepancy measure for the simulated and the observed data. Do they look similar? We got a posterior predictive p-value of 0.5 for this measure, which means that, by this measure, the observed data look like they came from the model.

[Figures: “Posterior Predictive Distribution” histograms]

Finally, at the end of class, Bill pointed out another use of the posterior predictive distribution. Suppose we observe the orbit of an asteroid and are worried about whether it will hit the Earth within, say, the next 50 years. Our orbit is estimated from data, so the orbital parameters have error. We can use the orbital parameters to propagate the orbit into the future and, by Monte Carlo, use the samples of the parameters to get samples of the asteroid’s predicted position at any future time. The proportion of those samples that hit the Earth is the probability of impact at that time.
