I started out by discussing the cartoon that I linked to a few days ago. I pointed out that both the sensitivity and the specificity of the test in the cartoon are very high (35/36, or about 97.2%). Nonetheless, the cartoon test is rather silly, and it reinforces the idea that frequentist tests only talk about what happens if you repeat them many times. The Bayesian probably knows (background information) that it is physically impossible for the Sun to go nova (its mass is too small, and it will die in an entirely different fashion). Even if it were possible, the bet is an entirely safe one: if the Sun had gone nova, no one would be around to collect on the bet!
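To make the Bayesian's reasoning concrete, here is a minimal sketch of the posterior calculation for the cartoon's detector. The detector lies with probability 1/36 (a double six), as in the cartoon; the prior of one in a billion is purely a hypothetical number chosen for illustration, and the function name is my own.

```python
from fractions import Fraction

def posterior_nova(prior_nova, p_lie=Fraction(1, 36)):
    """P(nova | detector says yes), by Bayes' theorem."""
    p_yes_given_nova = 1 - p_lie   # detector tells the truth
    p_yes_given_quiet = p_lie      # detector lies (double six)
    numerator = p_yes_given_nova * prior_nova
    return numerator / (numerator + p_yes_given_quiet * (1 - prior_nova))

# Hypothetical prior: an assumption for illustration, not a physical estimate.
prior = Fraction(1, 10**9)
post = posterior_nova(prior)

print("sensitivity = specificity =", float(Fraction(35, 36)))
print("posterior P(nova | yes)   =", float(post))
```

With any tiny prior, the posterior stays tiny even though the test is about 97% accurate, which is why the Bayesian is happy to take the bet.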

I then showed a cartoon that Larry Wasserman put on his blog. Larry’s point here is that (under most circumstances) Bayesian credible intervals don’t say anything about frequentist coverage. There are no coverage guarantees. It is true that under some special circumstances, such as the Berger-Mossman example that you calculated for an assignment, it is possible for a Bayesian credible interval to have good frequentist coverage; but in this example, it was by design, and happened because Berger and Mossman used a standard objective prior. These objective priors probably will give decent coverage in most situations (but it should be checked if coverage is important to you), just as they usually give similar results in parameter-estimation problems (e.g., regression). But in general, informative priors will not necessarily have these properties.
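A simple way to see the coverage point is a toy problem that is not the Berger-Mossman example: estimating a normal mean with known standard deviation under a N(0, tau^2) prior. The sketch below computes the exact frequentist coverage of the central 95% credible interval as a function of the true mean. The model, the function name, and the parameter values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def coverage(theta, n, tau, sigma=1.0, level=0.95):
    """Exact frequentist coverage of the central credible interval for a
    normal mean (known sigma, prior N(0, tau^2)) when the truth is theta."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    w = n * tau**2 / (n * tau**2 + sigma**2)   # shrinkage weight on xbar
    post_sd = sqrt(w) * sigma / sqrt(n)        # posterior standard deviation
    # The interval w*xbar +/- z*post_sd covers theta exactly when xbar
    # falls between these two bounds.
    lo = (theta - z * post_sd) / w
    hi = (theta + z * post_sd) / w
    sampling = NormalDist(theta, sigma / sqrt(n))  # distribution of xbar
    return sampling.cdf(hi) - sampling.cdf(lo)

for theta in (0.0, 1.0, 3.0):
    print(theta,
          round(coverage(theta, n=10, tau=1.0), 3),   # informative prior
          round(coverage(theta, n=10, tau=1e6), 3))   # nearly flat prior
```

In this toy model the nearly flat prior recovers the nominal coverage, while the informative prior over-covers near the prior mean and under-covers when the truth lies far from it.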

We returned to the perihelion motion of Mercury. The bottom line here is that the “fudge factor” theory F spreads its bets over a wide range of outcomes. It can accommodate whatever the actual outcome turns out to be, but it wastes prior probability on outcomes that do not pan out. On the other hand, Einstein’s theory E makes a very sharp and risky prediction. Since the data lie close to that prediction, it wins big time, just as when a gambler bets all his chips on one outcome and that outcome is the one that happens.
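The gambling analogy can be turned into a short calculation. In the sketch below, E predicts a single value, while F puts a broad normal prior, centered on zero, on the size of the anomalous advance. All the numbers (an observed advance of 41.6 with standard error 2.0 arcseconds per century, a prediction of 42.9, and a prior width of 50) are illustrative assumptions on my part; use the values from the notes if they differ.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative values (assumptions), in arcseconds per century.
observed, sigma = 41.6, 2.0   # measurement and its standard error
einstein = 42.9               # sharp prediction of theory E
tau = 50.0                    # width of F's prior on the fudge factor

# Under E the datum is N(einstein, sigma^2).
like_E = NormalDist(einstein, sigma).pdf(observed)

# Under F the fudge factor is N(0, tau^2), so after integrating it out
# the datum is N(0, tau^2 + sigma^2): the bet is spread thinly.
like_F = NormalDist(0.0, sqrt(tau**2 + sigma**2)).pdf(observed)

bayes_factor = like_E / like_F
print("Bayes factor in favor of E:", round(bayes_factor, 1))
```

Widening `tau` spreads F's bets even further and increases the Bayes factor for E; narrowing it toward the data does the opposite.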

I noted Berger’s “objective” prior that is symmetric about “no effect” and decreases monotonically away from “no effect”. It doesn’t support Einstein quite as much, but it provides an objective lower bound on the evidence for E.

Even if you concentrate all of the prior probability under the alternative hypothesis F on the value most favored by the data, the posterior probabilities you get for the null are significantly higher than the corresponding p-values. So p-values overstate the evidence against the null.
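Here is a rough sketch of that bound for a two-sided test of a normal mean. If the alternative puts all of its prior mass on the value the data favor most, the Bayes factor for the null cannot be smaller than exp(-z²/2), where z is the test statistic. The function names and the 50-50 prior odds are assumptions made for illustration.

```python
from math import exp
from statistics import NormalDist

def min_bayes_factor(p_value):
    """Smallest possible Bayes factor for the null (two-sided normal test),
    obtained by putting the alternative's prior mass on the observed mean."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return exp(-z * z / 2)

def min_posterior_null(p_value, prior_null=0.5):
    """Lower bound on P(null | data) for a given prior probability."""
    odds = min_bayes_factor(p_value) * prior_null / (1 - prior_null)
    return odds / (1 + odds)

for p in (0.05, 0.01, 0.001):
    print(p, round(min_posterior_null(p), 4))
```

Even with the alternative stacked in this way, the posterior probability of the null comes out several times larger than the p-value.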

Another danger is that, for a given p-value, the likelihood ratio (Bayes factor) in favor of the simpler hypothesis increases in proportion to √n, where n is the size of the data set. So the larger the data set, the more strongly the null hypothesis is supported at that fixed p-value. Jack Good suggested a way to convert p-values to Bayes factors and posterior probabilities that, as we calculated, does a pretty good job (but it is approximate).

This led to a discussion of the Jeffreys-Lindley “paradox”, whereby you can have data that simultaneously give strong evidence in favor of the null hypothesis and a very small p-value that would reject it. I gave a real-life example that I wrote a paper on, from some parapsychology research.
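The paradox is easy to reproduce numerically. The sketch below uses a toy setup, not the parapsychology data: a point null for a normal mean against an alternative with a N(0, tau^2) prior. It holds the test statistic fixed at z = 2.576 (a two-sided p-value of about 0.01) while the sample size grows. The function name and the choice tau = sigma = 1 are assumptions.

```python
from math import exp, sqrt

def bf_null(z, n, tau=1.0, sigma=1.0):
    """Bayes factor for H0: mean = 0 against H1: mean ~ N(0, tau^2),
    given z = sqrt(n) * xbar / sigma from n observations."""
    r = n * tau**2 / sigma**2
    return sqrt(1 + r) * exp(-0.5 * z * z * r / (1 + r))

z = 2.576  # same p-value (about 0.01) at every sample size
for n in (10, 100, 10_000, 1_000_000):
    print(n, round(bf_null(z, n), 3))
```

At small n the Bayes factor is below 1 and agrees with the p-value in disfavoring the null; at large n the same p-value goes with a Bayes factor well above 1, and quadrupling n roughly doubles it.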

Finally, I discussed sampling to a foregone conclusion and the Stopping Rule Principle. If you are doing frequentist analysis, you are not supposed to watch the data and stop when they give you a small enough p-value. Frequentist theory disallows this (but people do it a lot, and the parapsychologists did it in a huge way). The good news is that Bayesian analysis does not have this defect. This means that the ethical problems that arise under frequentist principles can be avoided by using Bayesian methods. The notes discuss this.
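A small simulation shows how badly watching the data can inflate the frequentist error rate. In this sketch the null hypothesis is true (standard normal data with mean zero), and one experimenter tests after every observation and stops at the first |z| > 1.96, while another tests once at a fixed sample size. The sample sizes, number of trials, seed, and function names are arbitrary choices for illustration.

```python
import random
from math import sqrt

def rejects_with_peeking(rng, max_n=300, min_n=10, z_crit=1.96):
    """Test after every observation from min_n on; stop at first rejection."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)   # data generated under the null
        if n >= min_n and abs(total) / sqrt(n) > z_crit:
            return True
    return False

def rejects_fixed_n(rng, n=300, z_crit=1.96):
    """Test exactly once, at the planned sample size."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total) / sqrt(n) > z_crit

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 1000
peeking_rate = sum(rejects_with_peeking(rng) for _ in range(trials)) / trials
fixed_rate = sum(rejects_fixed_n(rng) for _ in range(trials)) / trials

print("false rejections with peeking:", peeking_rate)
print("false rejections at fixed n:  ", fixed_rate)
```

The fixed-n experimenter rejects a true null about 5% of the time, as advertised; the one who peeks does so far more often, and the rate keeps climbing as `max_n` increases.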
