HCOL 196, April 25, 2011

Reminder: On Wednesday and Friday, when you do your presentations, it is very important to be prompt to class as we have a limited amount of time.

On Monday I talked about another astronomical problem that we have used Bayesian methods to solve. Like the problem we discussed last time, this one involves a special kind of variable star, this time called RR Lyrae stars (again, named for the first one of its type discovered). These are fainter than the Cepheids, but they are known to have the same luminosity. So again, if we can calibrate the luminosity, we can use these to measure the distances of objects that contain them. Again like the last time, the nearest of these stars is a little too far away to measure its distance directly, and we only were able to do this with the Hubble. But we’d like an independent check. This time we can’t use the method that we did for the Cepheids, but we know that these stars have low velocities relative to the galaxy (and thus high velocities relative to the Sun, which is moving rapidly around the galaxy). To a first approximation, consider four such stars, located at 12:00, 3:00, 6:00 and 9:00 on a clock face relative to the Sun (see picture). The stars at 12:00 and 6:00 can have their velocities towards or away from the Sun measured by Doppler shift, just as we measured the velocity of the pulsations of the Cepheid variable. These velocities don’t depend on how far away the star is. The stars at 3:00 and 9:00 aren’t moving towards or away from the Sun, they are appearing to fly by us as trees at our sides would if we were driving down the highway. They have an angular velocity, and their angular velocity depends on their distances. We can measure these angular velocities by observing a star at two times, noting the change in angle, and dividing by the time interval. Then, just as in the Cepheid case, we have an angular velocity and a linear velocity (on two stars), and this allows us to calculate the distance of the stars at 3:00 and 9:00, and thus calibrate the brightness of these stars (and by extension, all RR Lyrae stars).

It’s a bit more complicated than this of course. In the first place, we aren’t going to find stars at exactly those points, so some geometric modeling is needed. Also, the stars do not have exactly zero velocities relative to the galaxy, they do have their own motions, so we’re going to have to model that statistically as well. This is easily done using the Bayesian method.

I noted that in general we cannot use the “spreadsheet” methods of the class to solve problems like this. Although the basic ideas remain the same, if you have more than two parameters of interest, then it rapidly becomes impractical. For the past quarter century, statisticians have been using another technique, which involves using various strategies to draw a large sample from the posterior distribution and then make inferences from the sample. But we start at the same basic place. With x representing the data and \theta the states of nature, we start with a prior on \theta, a likelihood, multiply them to get the joint. The problem is that to compute a posterior distribution we need to divide the joint by the marginal, and the marginal is computed by integration in general (unless the parameters are discrete so that we can sum). But integration is hard. So statisticians have learned how to avoid that step altogether by drawing from the sample.

The basic technique is called Markov chain Monte Carlo (I didn’t name it in class). Markov chains are a particular type of mathematical object, and Monte Carlo refers to the famous casino on the Mediterranean coast. A simple example is shown in the diagram, where we are trying to draw a sample from a normal (bell-shaped) distribution. We start anywhere, with some value \theta_0. Starting there, we generate a new \theta^* from a uniform distribution between \theta_0-d and \theta_0+d. We then compute the ratio of the joint distribution at the new point to that at the old point. We generate a uniform random number between 0 and 1, and compare it to the ratio. If it is less than the ratio, we jump to the new value, so \theta_1=\theta^*, otherwise we set \theta_1=\theta_0. We then repeat the procedure as long as we wish, obtaining a large sample.

If you think about it, you can easily see that if the distribution at the new point is bigger, you will definitely move there, but if it is less, you will sometimes move there and sometimes not. This means that the points will concentrate near the peak of the curve, and in fact the sample will be a very good approximation to a sample from a normal distribution.

This is a very simple example. There is an art to designing programs that will sample efficiently, but it’s not hard to design programs that will sample effectively, given enough time.

There are many variations on these schemes. Usually it is easiest to sample on a small number of parameters at a time. You can, in the following figure, sample first on \theta_1 and then on \theta_2 and then repeat.

The method was first developed at Los Angeles National Laboratory in the 1950s, and was used to develop nuclear weapons.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: