We have chosen FRIDAY, APRIL 15, as the date for the second (and last) quiz.

You might be interested in this paper, which I wrote about the parapsychology experiment we have been discussing. The math in this paper does not go beyond what we have already seen in the course, so you should be able to understand it without difficulty.

I first drew a picture showing why the null hypothesis is favored over a complex alternative when the data (blue likelihood) are near the null, and why the alternative hypothesis is favored when the data (red likelihood) are far from the null. For each hypothesis, the product of prior times likelihood is the amount by which that hypothesis is favored. The figure also shows that going from a flat prior to one that is peaked near the data favors the alternative (complex) hypothesis.

We considered an alternative hypothesis in which the amount of bias is exactly what the data say it is. This is in the context of the parapsychology experiment, where the observed proportion is 0.500176, which differs from 0.5 by less than 0.0002 (under 0.02 percentage points). Such a small bias, even if real, has no practical significance. You could not use it to beat the casino, for example, or to make lots of money on the stock market.

This prior is the most favorable to the alternative that you can devise; yet even so, the p-value is five times smaller than the Bayes factor (which measures how strongly the data favor the alternative hypothesis over the null). This is one reason why I believe that p-values are unreliable measures of the strength of evidence against the null.
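The comparison can be sketched numerically. The snippet below is only an illustration under stated assumptions: it uses the normal approximation to the binomial, it writes the Bayes factor as the likelihood at the null divided by the likelihood at the observed value (so small values favor the alternative), and the z values are illustrative, since the number of trials is not given here. The function names are made up for this sketch.

```python
import math

def bayes_factor_point(z):
    """Likelihood at the null divided by the likelihood at the observed
    value, under the normal approximation (an assumption of this sketch)."""
    return math.exp(-0.5 * z * z)

def p_value_from_z(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(z / math.sqrt(2))

# Illustrative z values only; they are not taken from the experiment.
for z in (2.0, 3.0, 3.6, 4.0):
    bf = bayes_factor_point(z)
    p = p_value_from_z(z)
    print(f"z={z:.1f}  Bayes factor={bf:.2e}  p-value={p:.2e}  ratio={bf / p:.1f}")
```

For z somewhat above 3.5 the ratio comes out close to 5, which is the regime in which a p-value is about five times smaller than this most-favorable Bayes factor.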

(To do the calculation, we had to compute the log of the Bayes factor and then exponentiate it. Just multiplying out is beyond the capabilities of a calculator.)
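Here is a minimal sketch of why the logarithm is needed. The counts `n` and `x` are hypothetical, chosen only so that `x / n` equals 0.500176; the actual number of trials is not stated in these notes. The binomial coefficient is the same under both hypotheses, so it cancels in the ratio and is left out.

```python
import math

# Hypothetical counts (an assumption for illustration only).
n = 100_000_000      # number of trials
x = 50_017_600       # number of successes, so x / n = 0.500176

p_hat = x / n

# Multiplying out directly: the null likelihood contains 0.5**n,
# which underflows to zero in double precision.
naive_null = 0.5 ** n

# Working in logs instead: log of L(p = 0.5) / L(p = p_hat).
log_bf = x * math.log(0.5 / p_hat) + (n - x) * math.log(0.5 / (1 - p_hat))
bf = math.exp(log_bf)

print("naive null likelihood:", naive_null)
print("log Bayes factor:", log_bf)
print("Bayes factor:", bf)
```

The two large terms in `log_bf` nearly cancel, leaving a modest negative number that exponentiates without trouble.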

Jim Berger has an “objective” prior that has the following properties: Consider all priors that decrease (or do not increase) as you go away from the null. Calculate the posterior for each of these priors, and choose the one that most favors the alternative. The prior that results is flat up to the data, and then zero (and is symmetrical about the null). For this prior, the Bayes factor is more than 20 times the p-value. This again shows how unreliable a p-value is as a measure of the evidence against the null.
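The Bayes factor for a prior of this shape can be sketched as follows. As before, this assumes the normal approximation, takes the Bayes factor as null relative to alternative, and uses an illustrative z rather than the experiment's actual value; the function names are invented for the sketch. In standardized units the prior is uniform on [-z, z], where z is the observed deviation from the null.

```python
import math

def bayes_factor_uniform(z):
    """Null vs. alternative when the prior on the alternative is flat
    out to the observed deviation and zero beyond it (symmetric about
    the null). Normal approximation; an assumption of this sketch."""
    # Likelihood at the null, in standardized units.
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    # Likelihood averaged over the flat prior on [-z, z]:
    # (Phi(2z) - 1/2) / (2z), where Phi(2z) - 1/2 = erf(sqrt(2) * z) / 2.
    marginal_alt = 0.5 * math.erf(math.sqrt(2) * z) / (2 * z)
    return phi / marginal_alt

def p_value_from_z(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(z / math.sqrt(2))

z = 3.6  # illustrative value only
bf = bayes_factor_uniform(z)
p = p_value_from_z(z)
print(f"Bayes factor={bf:.2e}  p-value={p:.2e}  ratio={bf / p:.1f}")
```

Because the flat prior spreads its weight over values between the null and the data, it is less favorable to the alternative than a prior concentrated at the data, and the gap between the Bayes factor and the p-value widens.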

Finally, I argued that observed p-values aren’t actually probabilities at all. They are calculated as the probability that, in some sequence of experiments (that have not been performed), one would get results at least as extreme as the one we actually observed. But consider flipping a fair coin 101 times and getting 61 heads, then repeating the experiment and happening to get 61 heads again. (These numbers are just for illustration; the problem would arise whatever the outcomes of the experiments were.) The calculation shows that the product of the two p-values for the independent experiments is NOT the p-value for the combined experiment, as it would have to be if p-values were probabilities (that is, if they obeyed the laws of probability for independent events). The product of the two p-values is about 0.0021, but the p-value for the combined experiment (122 heads in 202 flips) is nearly twice as large, about 0.0038!
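The coin-flipping comparison can be checked with a few lines of code. This sketch assumes exact two-sided binomial p-values for a fair coin, which is a guess at how the figures were obtained; the function name is made up for illustration.

```python
from math import comb

def exact_two_sided_p(heads, flips):
    """Exact two-sided p-value for a fair coin: the probability of a
    head count at least as far from flips / 2 as the observed count."""
    dev = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= dev
    )
    return extreme / 2 ** flips

p_single = exact_two_sided_p(61, 101)     # one experiment: 61 heads in 101 flips
product = p_single * p_single             # two such experiments, p-values multiplied
combined = exact_two_sided_p(122, 202)    # both experiments pooled into one

print("single p-value:", p_single)
print("product of the two p-values:", product)
print("p-value of the combined experiment:", combined)
```

If p-values behaved like probabilities of independent events, `product` and `combined` would agree. They do not: the combined p-value comes out larger than the product.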