Today I introduced the idea of Ockham’s razor (sometimes spelled ‘Occam’, the Latinized version). It dates to the 14th century and was expressed in various ways by its inventor, William of Ockham. Two formulations that can be attributed to him are “Plurality must never be posited without necessity,” and “It is futile to do with more what can be done with fewer.” (These would have been written in Latin, of course.) These days, scientists interpret it to mean that we should use hypotheses that are just complex enough to explain the phenomena we want to explain, but no more complex.

For example, we determined that the hypothesis of a fair coin is simpler than the hypothesis of a biased coin where the amount of the bias is unknown. So if the data we get from coin flips are explained pretty well by a fair coin, we would probably not opt for the more complicated biased-coin hypothesis. This is in fact what happened in the case of the parapsychology experiment we discussed last time. What happened there is that the “fair, no psi” hypothesis makes a bold prediction, whereas the “biased, psi” hypothesis spends a lot of prior probability on values of the bias that are far away from the observations. The net effect is that even though the p-value was quite small, the Bayes factor still supported the “fair, no psi” position.
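This trade-off can be made quantitative. Here is a minimal sketch (my own illustrative code, not something from class) that computes the Bayes factor for “fair coin” against “biased coin with unknown bias,” assuming a uniform prior on the bias; with that conventional choice of prior, the marginal likelihood of h heads in n flips has a simple closed form.

```python
from math import comb

def bayes_factor_fair_vs_biased(heads, flips):
    """Bayes factor comparing 'fair coin' (bias exactly 0.5) against
    'biased coin' with a uniform prior on the unknown bias.

    Marginal likelihood under 'fair':   0.5 ** flips
    Marginal likelihood under 'biased': the integral of
        p**heads * (1 - p)**(flips - heads) over p in [0, 1],
        which works out to 1 / ((flips + 1) * C(flips, heads)).
    """
    p_fair = 0.5 ** flips
    p_biased = 1.0 / ((flips + 1) * comb(flips, heads))
    return p_fair / p_biased

# 60 heads in 100 flips: a mild excess, with a borderline p-value,
# yet the fair-coin hypothesis still comes out slightly ahead.
bf = bayes_factor_fair_vs_biased(60, 100)
```

The reason the fair coin survives is exactly the Ockham effect discussed above: the biased model spent most of its prior probability on biases the data rule out, so its marginal likelihood is diluted.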

It is like going into a casino and putting money on the roulette wheel. If you put all your money on one number and it wins, you’ll make a bundle; but your winning is a low-probability event. On the other hand, you could almost guarantee winning something by putting a small amount of money on every number you can. You’ll almost certainly win, but your reward will be small because you had to divide your money among a lot of possibilities.

The roulette wheel and the parapsychology experiment are similar in this way: Putting all your money (or prior probability) on one number (or one simple hypothesis) results in a huge reward (in money or posterior probability) if the outcome is predicted by your bet; but if you spread your money (or prior probability) around over many possibilities, even though you’ll get some money (or posterior probability) with near certainty, it won’t beat the successful, bold player (or hypothesis) that bets all the money (or prior probability) on the simple outcome.
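To put numbers on the analogy, here is a toy calculation (my own illustration), assuming a European wheel with 37 pockets and the standard 35-to-1 straight-up payout:

```python
# Toy European roulette wheel: 37 pockets, and a winning straight-up
# bet returns 36x the stake on that number (35-to-1 plus the stake back).
STAKE = 37.0  # total bankroll; chosen so the arithmetic comes out round

# Bold strategy: everything on a single number.
bold_win_prob = 1 / 37
bold_payout_if_win = STAKE * 36            # a bundle, but a rare event
bold_expected_return = bold_win_prob * bold_payout_if_win

# Spread strategy: one unit on each of the 37 numbers.
spread_win_prob = 1.0                      # exactly one number must come up
spread_payout = (STAKE / 37) * 36          # certain, but small
```

Both strategies have the same expected return (a slight loss, the house edge); the difference is entirely in how concentrated the payoff is, which is the role the prior plays in the hypothesis comparison.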

I spent most of the period discussing Einstein’s general theory of relativity. It is a theory of gravity that replaces Newtonian theory. It makes a number of predictions, and we will consider two of them. The first is that gravity will bend the path of light, acting as a sort of lens. In the figure below, light from a distant star that passes near the Sun is bent, so the star appears farther from the Sun (as indicated by the red line) than it actually is. The amount of bending, according to Einstein, for a ray that just grazes the surface of the Sun is 1.75 seconds of arc (written 1.75″). This is twice the amount predicted by Newton’s theory.

The British astronomer Eddington mounted an expedition in 1919 to try to detect this effect. His team took a photograph of a field of stars at night, when the Sun was on the other side of the Earth, and another photograph during a solar eclipse, when the Sun was in that field of stars (with the light of the Sun, which would otherwise drown out the stars, blocked by the Moon). Laying the two photographs on top of each other, we would see the stars in the eclipse photograph shifted away from the Sun relative to the non-eclipse photograph, with greater shifts for the closer stars. Eddington reported that the shifts were consistent with Einstein’s theory; we now know that his observations were not as accurate as he thought (this is technically a very difficult experiment). But today, using very accurate radio telescopes, we can make these observations so precisely that it is clear they are inconsistent with Newton’s theory and consistent with Einstein’s.

The other famous observation concerns the motion (precession) of the perihelion (closest point in an orbit to the Sun) of Mercury. It had been known since 1859 that this motion was inconsistent with what was known at the time about planetary orbits. It wasn’t that Newtonian theory couldn’t explain the motion, but it needed something more than the planets that were known, or some modification of the law of gravity. The situation is shown in the figure below:

One could propose explanations that would solve the problem. There could be an unknown planet close to the Sun and hard to observe, for example. Other planets had been discovered recently: Uranus (by accident) and then Neptune (by its effect on the orbit of Uranus). Surely, some astronomers thought, we ought to be able to repeat the discovery of Neptune with a planet close to the Sun! And some astronomers claimed to have seen the elusive planet, and even gave it a name, “Vulcan,” after the Roman god of fire and the forge. But these discoveries were never confirmed.

Other possibilities would be a faint ring of material near the Sun, or the Sun having a slight oblateness (elliptical cross-section). Yet another would be some subtle change in the law of gravity.

All of these explanations involve an adjustable parameter that can be chosen to match the observed precession. For example, the mass of Vulcan, the mass of the ring, the amount of the oblateness of the Sun, or the amount of the change in the law of gravity that would be required.

In the figure, we illustrate this. In any of the complex theories that involve an adjustable parameter, such as a modified law of gravity, the parameter has to be such that the effect on Mercury’s orbit is no greater than 100″/century either way (positive or negative). The reason we know this is that if it were greater, we would see effects on other planets (Venus and Earth) that are not seen. So under such a theory the prior probability is spread out over a wide range (blue graph).

On the other hand, Einstein’s theory puts the same amount of prior probability (the area is the same) into a tall, skinny rectangle (in red, and I’m sorry it’s hard to see) near 43″/century, which is what is predicted. [Note: The amount of this motion is predicted from the theory and has to do with the constant of gravity and the speed of light; it is not put into the theory from the known value of the motion.] Now, if the perihelion motion had been well away from 43″/century, as shown by the blue bell-shaped likelihood curve (representing the errors of the observations), then Einstein’s theory would be dead, because the product of prior × likelihood for the complex theory there is greater than the small amount of prior × likelihood that Einstein’s theory can get from the tail of the likelihood. But that’s not where the data lie. They lie almost smack dab on top of the tall, skinny rectangle, and now the product of prior × likelihood strongly favors Einstein’s theory.
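The same arithmetic as in the coin example applies here. Below is a rough sketch with illustrative numbers of my own choosing (a 1″ Gaussian measurement error and a flat ±100″/century prior; this is not the actual analysis in our paper) of the Bayes factor for Einstein’s spike of prior probability at 43″/century against an adjustable-parameter theory:

```python
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    """Gaussian probability density at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def bayes_factor_einstein_vs_adjustable(observed, sigma, prior_half_width=100.0):
    """Einstein: all prior mass concentrated at 43"/century (a spike).
    Adjustable theory: prior spread uniformly over +/- prior_half_width.
    The likelihood is taken as Gaussian with the stated measurement error.
    """
    EINSTEIN_PREDICTION = 43.0
    p_einstein = gaussian(observed, EINSTEIN_PREDICTION, sigma)
    # Marginal likelihood under the flat prior: since the likelihood's
    # mass sits well inside the +/- 100 range, the integral of
    # likelihood x prior is essentially 1 / (2 * prior_half_width).
    p_adjustable = 1.0 / (2.0 * prior_half_width)
    return p_einstein / p_adjustable

# Data near the prediction: the spike collects nearly all the
# likelihood, while the flat prior gets only a 1/200 sliver of it.
bf_near = bayes_factor_einstein_vs_adjustable(43.0, 1.0)
```

Had the observed precession sat far from 43″/century, the same function would return a factor near zero and Einstein’s theory would be dead, exactly as described above.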

Here’s a pointer to a paper that Jim Berger and I wrote about 20 years ago, explaining the Bayesian Ockham’s razor and the Einstein experiment. (The title when it was finally published is different from the running title. The editor of the journal didn’t like our title!)
