Remarks on the homework: it’s not meant to be time-consuming on the coding side; adapt last week’s code and Jeff’s code. It is more an exercise in thinking.

Comments on R

Create an output object from BUGS with hw9.sim=bugs(…). Then attributes(hw9.sim) will tell you what’s in it. Another function, names(…), is similar. The function objects() will tell you what’s in your workspace.
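For example (a sketch: `hw9.sim` here is a stand-in list, since the real object would come from actually running `bugs(…)` via R2WinBUGS):

```r
# Stand-in for the object returned by bugs(); the real one would be created by
# something like hw9.sim <- bugs(data, inits, parameters, "model.txt", ...)
hw9.sim <- list(n.chains = 3, n.iter = 1000,
                sims.matrix = matrix(rnorm(20), nrow = 10))

names(hw9.sim)       # the component names: "n.chains" "n.iter" "sims.matrix"
attributes(hw9.sim)  # the names plus any other attributes (class, dim, ...)
objects()            # everything in your workspace, e.g. "hw9.sim"
```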

We’ve just touched on the posterior predictive distribution (PPD) and its uses.

Bill asked if there is a principled method for choosing a posterior p-value. In our example we were struck by the large residual and ginned up one of our discrepancy functions to test it. Would we have done that if we hadn’t been struck by it? What if we try multiple discrepancy functions? Should we adjust the posterior p-values with something like a Bonferroni correction to account for that?

Jeff asked, what’s wrong with looking at that large residual, since we condition on the data anyway. But he agreed that large data sets will give more opportunity for *some* discrepancy function to display an anomaly. He noted that there is a huge menu of frequentist tests (e.g., for normality). One such test might be consistent with the data set, and another might reject it. Obviously, this is an area that should be (and is) a subject of active research.

[Added later]: I guess that the thing that bothered me about the particular example is that in Bayesian theory, you are supposed to ask your questions before looking at the data (i.e., what’s the likelihood? What are the priors? Now look at the data. Multiply, calculate, get posterior stuff). Looking at the data and seeing a large outlier and then asking the question seems to violate that principle. I don’t know the answer to this.
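To make the mechanics concrete, here is a posterior predictive check in R for a toy normal model, using “largest standardized residual” as the discrepancy; the model, data, and discrepancy function are all invented for this sketch:

```r
set.seed(1)
y <- c(9.8, 10.1, 9.9, 10.3, 14.2)   # toy data with one suspicious value
n <- length(y)

# Posterior draws for (mu, sigma^2) under the standard noninformative prior:
# sigma^2 | y ~ (n-1) s^2 / chisq_{n-1},  mu | sigma^2, y ~ N(ybar, sigma^2/n)
S <- 5000
sigma2 <- (n - 1) * var(y) / rchisq(S, n - 1)
mu <- rnorm(S, mean(y), sqrt(sigma2 / n))

# Discrepancy T(y, theta): largest absolute standardized residual
T.obs <- T.rep <- numeric(S)
for (s in 1:S) {
  y.rep <- rnorm(n, mu[s], sqrt(sigma2[s]))       # replicated data set
  T.obs[s] <- max(abs(y     - mu[s])) / sqrt(sigma2[s])
  T.rep[s] <- max(abs(y.rep - mu[s])) / sqrt(sigma2[s])
}
ppp <- mean(T.rep >= T.obs)   # posterior predictive p-value
ppp                           # a small value flags the outlier as surprising
```

Note that the discrepancy was chosen *after* seeing the outlier, which is exactly the issue raised above.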

Today Jeff talked about hierarchical models. They extend the two-stage (likelihood + prior) structure to a hierarchy of L levels. The joint distribution of the data and parameters is

p(y, θ_1, θ_2, …, θ_L) = f(y | θ_1) p(θ_1 | θ_2) p(θ_2 | θ_3) ⋯ p(θ_{L−1} | θ_L) p(θ_L).

The notation suppresses the dependence on parameters lower down in the hierarchy, i.e.,

p(θ_1 | θ_2) means p(θ_1 | θ_2, θ_3, …, θ_L), and p(θ_2 | θ_3) means p(θ_2 | θ_3, …, θ_L), etc. Each level depends on the levels below it only through the level immediately below.

If we have a prior p(θ | φ) that depends on a hyperparameter φ, we could specify a value for φ, OR we could put a prior on φ, which has the effect of lessening the effect on the results of choosing a particular value for φ.

Example 1: We already looked at the model

o_i | λ ~ Poisson(e_i λ), i = 1, …, n, with λ ~ Gamma(a, b),

with *a* and *b* specified by the statistician.
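A quick R sketch of this two-stage setup (assuming, as in the heart-transplant homework below, that the data are death counts o_i with exposures e_i and a single common rate λ; all numbers here are made up). Conjugacy gives λ | data ~ Gamma(a + Σ o_i, b + Σ e_i):

```r
set.seed(2)
a <- 2; b <- 2                       # values the statistician specifies
e <- c(5.3, 8.1, 3.2, 12.4)          # made-up exposures
o <- rpois(length(e), e * 0.9)       # made-up death counts, true rate 0.9

# Conjugate updating: lambda | o ~ Gamma(a + sum(o), b + sum(e))
post.a <- a + sum(o)
post.b <- b + sum(e)
draws <- rgamma(10000, post.a, post.b)
c(mc = mean(draws), exact = post.a / post.b)  # Monte Carlo vs exact mean
```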

Hierarchical models can arise naturally from the kind of data you have. So if you are comparing hospitals, the data could be multi-level just because the data are structured that way.

Counties within states, schools within counties, teachers within schools, and students of a teacher might also induce a hierarchical or multi-level structure.

Example 2:

Suppose we want to model the performance of trauma centers. We collect data on patients at n=40 different trauma centers.

Let

y_ij = 1 if the *j*th patient at the *i*th trauma center dies, and 0 otherwise.

So

y_ij ∈ {0, 1}, j = 1, …, n_i, i = 1, …, n,

where *n* = 40 (the number of trauma centers). Note that the number of patients per hospital, n_i, varies from hospital to hospital. This means, in particular, that the hospitals with large patient loads will have estimated outcomes that stay close to that hospital’s own data, and those with small patient loads will tend to regress (shrink) more toward the mean over all hospitals.

Patient level model:

y_ij | p_ij ~ Bernoulli(p_ij), where logit(p_ij) = x_ij′β + γ_i.

Here, x_ij is a vector of covariates (e.g., indicator variables like {male, female}, discrete variables such as age, continuous variables such as burn area) indicating the patient’s particular situation.

γ_i will represent the overall outcome for the individual hospital.

Note that we have one γ_i for each hospital: γ_1, …, γ_n.

Now we want to model the γ_i, because they will vary amongst the hospitals.

Hospital level model:

γ_i ~ iid N(μ_γ, σ_γ²), i = 1, …, n.

The γ_i are very interesting, as they tell us how the probability of dying varies from hospital to hospital.
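A sketch of simulating data from this two-level model as written above (the covariate, hyperparameter values, and sample sizes are all invented for illustration):

```r
set.seed(3)
n <- 40                                   # number of trauma centers
n.i <- sample(20:200, n, replace = TRUE)  # patients per center (made up)
mu.g <- -2.2; sigma.g <- 0.4              # hypothetical hyperparameter values
beta <- 1.5                               # effect of one made-up covariate

gamma.i <- rnorm(n, mu.g, sigma.g)        # hospital level: gamma_i ~ N(mu, sigma^2)
sim.center <- function(i) {
  x <- runif(n.i[i])                      # severity score in [0, 1]
  p <- plogis(gamma.i[i] + beta * x)      # patient level: inverse logit
  rbinom(n.i[i], 1, p)                    # y_ij = 1 means death
}
y <- lapply(1:n, sim.center)
round(sapply(y, mean)[1:5], 3)            # raw death rates, first 5 centers
```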

When Jeff started in statistics there was no hierarchical machinery to do this. He had to estimate the γ_i directly.

We might even let the β vary amongst hospitals (be a function of *i*, i.e., β_i). So if you did this and there were not much data on a “gunshot wound to the toe” presentation, then they’d shrink to a point.

No pooling (the γ_i estimated separately), complete pooling (all γ_i equal). The Bayesian model with γ_i ~ N(μ_γ, σ_γ²) is a compromise between these two extremes.

Amongst hospitals with lots of patients, the γ_i will be different, since the data for the individual hospital will dominate; but with relatively few patients, the γ_i for an individual hospital will shrink towards the overall mean.

Priors:

vague priors on the hyperparameters, e.g., μ_γ ~ N(0, 10^6) and σ_γ ~ Uniform(0, 100),

for example.

Next: Heart Transplant Mortality Data

The book has a chapter on hierarchical models in which this is analyzed, but it’s confusing. Jeff said that Bill’s notes have a better discussion. Jeff will do his own example.

Example: Heart Transplant Mortality Data

We have data on the number of observed deaths for *n* = 94 hospitals. Also, for the *i*th hospital we have a value e_i that represents the expected number of deaths for that hospital given that hospital’s case mix (e.g., age, smoking status, …).

o_i = obs # of deaths at hospital *i*, i = 1, …, 94

e_i = expected # of deaths based on case mix

Model:

o_i | λ_i ~ Poisson(e_i λ_i), λ_i ~ iid Gamma(a, b), i = 1, …, 94.

This differs from the homework in that we have a λ_i for each hospital, not just one λ for all hospitals.

The posterior is proportional to

[∏_{i=1}^{94} Poisson(o_i | e_i λ_i) × Gamma(λ_i | a, b)] × p(a, b),

where p(a, b) is the prior on the hyperparameters.

What is the full conditional posterior on λ_1? It is

p(λ_1 | a, b, λ_2, …, λ_94, data) ∝ (e_1 λ_1)^{o_1} exp(−e_1 λ_1) × λ_1^{a−1} exp(−b λ_1) ∝ λ_1^{o_1 + a − 1} exp(−(e_1 + b) λ_1),

or

λ_1 | a, b, data ~ Gamma(o_1 + a, e_1 + b),

and similarly for the other hospitals. Also, the hyperparameters a and b have their own full conditional, proportional to [∏_i Gamma(λ_i | a, b)] × p(a, b).

Comment: In R, you can sample all of the λ_i in one statement, because p(λ_1, …, λ_94 | a, b, data) = ∏_i p(λ_i | a, b, data), i.e., the λ_i are conditionally independent. So the alternating sampling that is implied by Gibbs sampling can be replaced by a single vector draw.
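A sketch of that single vector draw, assuming the Gamma(a, b) model above with a and b held fixed (the data are simulated, since the actual data set isn’t in these notes): each λ_i full conditional is Gamma(o_i + a, e_i + b), and rgamma is vectorized.

```r
set.seed(4)
n <- 94
e <- runif(n, 1, 30)                # made-up expected deaths (case mix)
a <- 2; b <- 2                      # hyperparameters held fixed for the sketch
o <- rpois(n, e * rgamma(n, a, b))  # made-up observed deaths

# One Gibbs scan's worth of lambda updates, as a single vector draw:
lam <- rgamma(n, o + a, e + b)

# Repeating the draw S times gives independent posterior samples here
# (with a and b fixed there is nothing else to alternate with):
S <- 5000
lam.draws <- matrix(rgamma(S * n, rep(o + a, each = S), rep(e + b, each = S)),
                    nrow = S)
post.mean <- colMeans(lam.draws)
max(abs(post.mean - (o + a) / (e + b)))  # near 0: matches the Gamma mean
```

With a prior on (a, b), the λ vector draw would alternate with (say) a Metropolis step for the hyperparameters.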

WinBUGS does all this for us of course. But sometimes you have to “roll your own.”

April 8, 2009 at 7:26 pm

bill and everyone in class,

regarding the questions about model assessment bill is raising–i think the paper by r little in the american statistician, 2006 number 3, gives a good discussion of how a bayesian might assess his/her models.

if you have a look, let me know what you think.

jeff

April 9, 2009 at 2:45 pm

This is an interesting article, worth reading.

Bill