Today we looked at the homework. The second problem was similar to one you’ve already done, so we just looked at the first one. This is an example where Bayesian answers are very different from those obtained by frequentists. The idea here is that we have a precise hypothesis (the coin or die is fair) and an alternative one (that it is biased). In the first case, the probability of one outcome is specified precisely, but in the other, the probability of the outcome is unknown. Since it is unknown, the Bayesian thing to do is to regard the bias itself as a state of nature and put a prior on it. Then we have a prior on the two hypotheses (fair, biased).

This is one case where we actually need to put a *normalized* prior on the value of the bias: unlike the cases we have treated so far, the normalizing factors do not cancel in the final analysis. So in the biased case, we assumed biases of 0.05, 0.15, 0.25,…,0.95, and put a prior of 1/10 on each. (If we wanted to be more precise, we could put the prior on 0.005, 0.015, 0.025,…,0.995 and use a prior of 1/100 on each; that would require a spreadsheet calculation.) The likelihood in this case is P(data|p, biased)=p^{h}(1-p)^{t}, where h is the number of heads and t the number of tails (the data). The prior is P(p|biased)=1/10, and the joint probability is P(data|p,biased)P(p|biased)=P(data,p|biased). Summing over all values of p (getting the marginal) gives us P(data|biased). Here’s a snapshot of the whiteboard that results (not all numbers are filled in):
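The spreadsheet-style calculation can be sketched in a few lines of Python. The data here (h heads and t tails) are hypothetical, just to show the mechanics of summing the joint probabilities over the ten candidate biases:

```python
# Hypothetical data: h heads and t tails (not from the lecture).
h, t = 7, 3

# The ten candidate biases 0.05, 0.15, ..., 0.95, each with prior 1/10.
biases = [0.05 + 0.1 * i for i in range(10)]
prior = 1 / 10

# P(data, p | biased) = P(data | p, biased) * P(p | biased) for each p;
# summing over p gives the marginal P(data | biased).
joint = [p**h * (1 - p)**t * prior for p in biases]
p_data_given_biased = sum(joint)
print(p_data_given_biased)
```

Refining the grid to 100 values with prior 1/100 would just mean changing the two lines that build `biases` and `prior`; the sum converges to the integral of p^h(1-p)^t over the unit interval.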

The calculation for the fair case is easier. In our example, if the die is fair, p=1/3 and (1-p)=2/3, so the likelihood is P(data|fair)=(1/3)^{h}(2/3)^{t}.

We adopted P(fair)=P(biased)=1/2. With these, we can now calculate the joint probabilities, using the marginal for the biased case and the likelihood for the fair case:

P(data,fair)=P(data|fair)P(fair), P(data,biased)=P(data|biased)P(biased).

But also, P(data,fair)=P(fair|data)P(data) and P(data,biased)=P(biased|data)P(data). Dividing these two, the factor P(data) cancels, and we get just the ratio P(fair|data)/P(biased|data), which is the posterior odds ratio. Because we chose P(fair)=P(biased), this is also equal to B=P(data|fair)/P(data|biased), which is the Bayes factor. The probability of fair, given the data, is then B/(1+B). Here’s the whiteboard after this calculation:
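Putting the pieces together for the die example: under the fair hypothesis the success probability is exactly 1/3, while under the biased hypothesis we average the likelihood over the discrete uniform prior described above. The data (h = 3 successes, t = 7 failures) are hypothetical:

```python
# Hypothetical data: h "successes" and t "failures" in ten rolls.
h, t = 3, 7

# Likelihood under the fair hypothesis: p = 1/3 exactly.
p_data_given_fair = (1 / 3) ** h * (2 / 3) ** t

# Marginal likelihood under the biased hypothesis:
# uniform prior of 1/10 on each of the biases 0.05, 0.15, ..., 0.95.
biases = [0.05 + 0.1 * i for i in range(10)]
p_data_given_biased = sum(p**h * (1 - p) ** t * (1 / 10) for p in biases)

# With P(fair) = P(biased) = 1/2, the posterior odds equal the Bayes factor,
# and the posterior probability of fairness is B / (1 + B).
B = p_data_given_fair / p_data_given_biased
p_fair_given_data = B / (1 + B)
print(f"B = {B:.2f}, P(fair | data) = {p_fair_given_data:.2f}")
```

With these particular data the Bayes factor modestly favors the fair hypothesis, as one would expect when the observed fraction of successes is close to 1/3.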

I then described a practical application of this theory. There was a project at Princeton University which was attempting to find evidence for paranormal powers. In one of the experiments, a student was placed in front of a device that randomly flashed red and green lights, and attempted, by pure thought, to “influence” the device so as to make the number of flashes of one of the colors greater than the number of flashes of the other color. The desired color was changed from time to time, so that on some runs, the student tried to make red flash more often, and on some others, green:

I read a paper by these experimenters; they reported data on over 100 million trials that they had conducted over the years (with various students). In these trials, there was an excess of 18,471 flashes in the desired direction, less than 0.02% of the total. Even though this was a very small excess in absolute terms, the p-value, that is to say, the probability of getting an excess of 18,471 *or more* flashes, was also very small: about 0.0003. (I got the number wrong on the board; there should be one more zero!) This would be regarded as a *highly significant rejection* of the hypothesis that the device is fair.

Yet, the Bayesian calculation is very different! Doing the exact calculation that is approximated by the spreadsheet method we described above, I found that the Bayes factor was about B=12, which corresponds to a posterior probability in favor of the fair hypothesis of about 0.92! The Bayesian calculation *supports* the hypothesis that the device is fair, in contradiction to the significance test.
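A sketch of both calculations is below. The total number of trials is an assumption for illustration (about 104 million, consistent with the excess of 18,471 being just under 0.02% of the total); the post quotes only the excess and the rounded results. With a uniform prior on the bias, the marginal likelihood under the biased hypothesis integrates in closed form, so the Bayes factor reduces to B = (n+1)·C(n,h)·(1/2)^n, which we evaluate in log space:

```python
from math import erfc, exp, lgamma, log, sqrt

n = 104_490_000     # assumed total number of trials (illustrative)
excess = 18_471     # flashes above n/2 in the desired direction
h = n // 2 + excess
t = n - h

# Frequentist p-value (two-sided normal approximation to the binomial).
z = 2 * excess / sqrt(n)
p_value = erfc(z / sqrt(2))
print(f"p-value ~ {p_value:.4f}")

# Bayes factor with a uniform prior on the bias p in [0, 1]:
#   P(data | fair)   = C(n, h) (1/2)^n
#   P(data | biased) = C(n, h) * Integral of p^h (1-p)^t dp = 1 / (n + 1)
# so B = (n + 1) * C(n, h) * (1/2)^n, computed via log-gamma.
log_B = log(n + 1) + lgamma(n + 1) - lgamma(h + 1) - lgamma(t + 1) - n * log(2)
B = exp(log_B)
p_fair_given_data = B / (1 + B)
print(f"B ~ {B:.1f}, P(fair | data) ~ {p_fair_given_data:.2f}")
```

Under these assumptions the p-value comes out near 0.0003 while the Bayes factor is around 12, matching the numbers quoted above: the same data that "significantly reject" fairness actually favor it by about 12 to 1.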

I wrote a paper on this subject and published it in the same journal where the original research was reported. This led to an exchange of letters to the editor.

How to explain the discrepancy? First, note that the p-value and the posterior probability are different things. The posterior probability is the probability of fair, *given the data*. But the p-value is the probability of the data, or any data even more extreme, *given that the device is fair*. We really want the Bayesian answer, but the frequentist calculation can’t give us that.

Bayesians regard the frequentist calculation as the right answer to the wrong question. It has a number of defects: First, it doesn’t say anything about any probabilities if the device is biased, yet it purports to tell us something about the device being biased. Secondly, the probability calculated is based not only on the data that were observed (the excess of 18,471), but also on *all the possible data that were more extreme and which were not observed*! Moreover, the more extreme data are not expected to be observed, precisely because they are more extreme. There seems to be something incoherent about basing a conclusion mostly on data that were not observed and were not even expected to be observed!

Dennis Lindley, a British statistician, pointed out that just this kind of outcome can happen: a statistical significance test can reject the “fair” hypothesis with a very small p-value, yet the Bayesian calculation can strongly favor that same hypothesis. This phenomenon is known as Lindley’s paradox.
