The first problem was the test-cheating problem. You are advising a professor with two students, one seated behind the other, who gave identical answers on a 25-question multiple-choice test: both got the same 17 questions right and gave the same wrong answers on the remaining 8. Each question has 5 choices. We can’t learn anything about possible cheating from the questions answered correctly; the students are supposed to know those answers. However, the questions answered incorrectly can give us some insight. Since we are looking at these questions precisely because they were answered incorrectly, there are four (not five) ways each could have been answered incorrectly. (If a student had chosen the correct answer at random, we could not know that, and we would not be considering the question.) Since there are four possible wrong choices, each coincidence, if it happened by chance, would have probability $1/4 = 1/2^{2}$. There is one such factor for each matching wrong answer, so the likelihood under the “no cheating” hypothesis is $(1/4)^{8} = 1/2^{16}$, and under the “cheating” hypothesis it is 1. For a prior, we noted that most students don’t cheat, so we assigned a prior probability of only 1/10 to cheating. The result of the calculation is shown in the board shot below:

It seems as if we have substantial evidence of cheating.
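As a check on the board-shot arithmetic, here is a minimal Python sketch of the two-hypothesis table, using the numbers above (8 matching wrong answers, 4 wrong choices per question, prior 1/10). The function name and its parameters are illustrative only, not taken from the course spreadsheet. The posterior odds work out to $2^{16}/9$, roughly 7,300 to 1 in favor of cheating.

```python
# Posterior probability of cheating for the two-student test problem.
# Numbers from the text: 8 matching wrong answers, 4 wrong choices
# per question, prior P(cheating) = 1/10.

def posterior_cheating(n_matching_wrong: int = 8,
                       n_wrong_choices: int = 4,
                       prior_cheat: float = 0.1) -> float:
    """Return P(cheating | data) from a two-hypothesis Bayes table."""
    # Likelihood under "cheating": copied answers match with certainty.
    like_cheat = 1.0
    # Likelihood under "no cheating": each wrong answer matches by chance
    # with probability 1/n_wrong_choices, independently of the others.
    like_no_cheat = (1.0 / n_wrong_choices) ** n_matching_wrong
    # Joint = prior * likelihood for each hypothesis.
    joint_cheat = prior_cheat * like_cheat
    joint_no_cheat = (1.0 - prior_cheat) * like_no_cheat
    # Marginal = sum of the joints; posterior = joint / marginal.
    marginal = joint_cheat + joint_no_cheat
    return joint_cheat / marginal


p = posterior_cheating()
print(f"P(cheating | data) = {p:.5f}")  # prints P(cheating | data) = 0.99986
```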

The second problem was the taxi problem. We assume that the taxis are numbered consecutively from 1. We saw 7 taxis, the highest-numbered of which was number 150, so we know that the likelihood of there being $N$ taxis is zero for $N < 150$. We do not (and according to Bayesian theory should not) build that into the prior, since the likelihood automatically takes care of it. For $N \ge 150$, the likelihood is $1/N^{7}$.
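Written out, under the assumption (implicit in the $1/N^{7}$ above) that each sighting is equally likely to be any of the $N$ taxis, independently of the other sightings, the likelihood is:

$$
P(\text{data} \mid N) =
\begin{cases}
\displaystyle\prod_{i=1}^{7} \frac{1}{N} = N^{-7}, & N \ge 150,\\[6pt]
0, & N < 150.
\end{cases}
$$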

For a prior, we noted that we are more likely to be in a small city than a large one, because small cities are more numerous. We chose a prior on $N$ proportional to $1/N$, but any other prior that decreases as $N$ increases would also have been a reasonable choice. The rest of the calculation is a routine application of our spreadsheet method and is shown below:

We noted that the posterior probability for $N=151$ is about 5% smaller than that for $N=150$, since the posterior is proportional to $1/N \times 1/N^{7} = 1/N^{8}$ and $(150/151)^{8} \approx 0.95$. Probably half of the posterior probability lies at $N \le 160$ or so, and most of the remainder at $N \le 175$. It’s a good bet, from these data, that the number of taxis in the city is approximately between 150 and 175.
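A minimal Python version of the spreadsheet might look like the sketch below. It assumes the $1/N$ prior and truncates the grid at a hypothetical `N_MAX = 1000`, since the cutoff used on the board is not recorded here; the function names are also illustrative. The first printed line should reproduce the roughly 5% drop from $N=150$ to $N=151$.

```python
# Grid ("spreadsheet") posterior for the taxi problem.
# Assumptions: taxis numbered 1..N, 7 sightings with largest number 150,
# prior proportional to 1/N, and a hypothetical grid cutoff N_MAX.

N_OBS = 7        # number of taxis seen
MAX_SEEN = 150   # largest taxi number seen
N_MAX = 1000     # hypothetical truncation of the spreadsheet rows


def taxi_posterior(n_obs=N_OBS, max_seen=MAX_SEEN, n_max=N_MAX):
    """Return {N: P(N | data)} for N = 1..n_max."""
    joints = {}
    for n in range(1, n_max + 1):
        prior = 1.0 / n  # unnormalized 1/N prior
        # Likelihood: zero if taxi number max_seen could not exist,
        # otherwise (1/N)^n_obs.
        like = 0.0 if n < max_seen else n ** -n_obs
        joints[n] = prior * like
    marginal = sum(joints.values())
    return {n: j / marginal for n, j in joints.items()}


def prob_at_most(post, upper):
    """Posterior probability that N <= upper."""
    return sum(p for n, p in post.items() if n <= upper)


post = taxi_posterior()
print(f"P(151)/P(150) = {post[151] / post[150]:.3f}")
for upper in (160, 175, 200):
    print(f"P(N <= {upper}) = {prob_at_most(post, upper):.2f}")
```

The tail beyond 1000 is negligible here because the posterior falls off as $N^{-8}$, but a cutoff close to 150, such as a spreadsheet that stops at 200, would noticeably change the interval probabilities.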

The third problem is the first part of the drug-company decision problem. There are two unknown cure rates: $r$ for the old drug and $s$ for the new one. We follow the practice of re-estimating the cure rate for the old drug, even if we have lots of data on it, because we will be using a particular sample of patients whose profile may differ from that of the general population. This means that we’ll have to evaluate the likelihood on a 10×10 grid, with the different values of $r$ corresponding to different rows and the different values of $s$ corresponding to different columns. For simplicity we can take the prior to be unnormalized, with 1 in each grid cell, which means that the joint probability in each cell equals the likelihood (apart from the factor that comes from using an unnormalized prior). For the old drug there were 25 cures and 25 non-cures; for the new one, 30 cures and 20 non-cures. This means that for cure rates $r$ and $s$, the entry in the likelihood cell will be of the form $r^{25}(1-r)^{25}s^{30}(1-s)^{20}$, as shown in the diagram for one particular cell.
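A minimal Python sketch of the likelihood grid follows. The text does not say which ten values of $r$ and $s$ were used, so the grid points 0.05, 0.15, ..., 0.95 are an assumption, as are the names below. Binomial coefficients are dropped because they are the same in every cell.

```python
# Likelihood grid for the drug comparison.
# Hypothetical grid: r and s each take the 10 midpoints 0.05, 0.15, ..., 0.95.
# Data from the text: old drug 25 cures / 25 non-cures, new drug 30 / 20.

GRID = [round(0.05 + 0.1 * i, 2) for i in range(10)]
OLD_CURE, OLD_FAIL = 25, 25
NEW_CURE, NEW_FAIL = 30, 20


def likelihood(r, s):
    """r^25 (1-r)^25 s^30 (1-s)^20 for one cell; binomial coefficients omitted."""
    return (r ** OLD_CURE * (1 - r) ** OLD_FAIL
            * s ** NEW_CURE * (1 - s) ** NEW_FAIL)


# Rows are values of r (old drug), columns are values of s (new drug).
# With a flat prior of 1 in every cell, each joint equals its likelihood.
like_grid = [[likelihood(r, s) for s in GRID] for r in GRID]

print(f"L(r=0.45, s=0.55) = {likelihood(0.45, 0.55):.3e}")
```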

Once we have calculated the likelihoods (and the joints), we add them all up to get the marginal, and then divide each joint by the marginal to obtain the corresponding posterior, cell by cell. Adding up the posterior probabilities of the cells that satisfy $s>r$ then gives us the probability that the new drug is better than the old, as shown in the board shot:

The cells with $s>r$ lie above the stair-stepped line. We could also have just added up the likelihoods above the stair-stepped line and divided the sum by the marginal; the answer would be the same, with less work.
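The following sketch computes $P(s>r \mid \text{data})$ both ways: normalizing every cell first, and summing only the joints above the diagonal before dividing once. It assumes the same hypothetical grid of midpoints 0.05 to 0.95 and a flat prior of 1 per cell; the names are illustrative. The two results should agree to rounding error.

```python
# Posterior probability that the new drug is better (s > r) on the grid.
# Hypothetical grid: r, s in {0.05, 0.15, ..., 0.95}; flat prior of 1 per cell.

GRID = [round(0.05 + 0.1 * i, 2) for i in range(10)]


def likelihood(r, s, old=(25, 25), new=(30, 20)):
    """Unnormalized likelihood r^a (1-r)^b s^c (1-s)^d for one grid cell."""
    (a, b), (c, d) = old, new
    return r ** a * (1 - r) ** b * s ** c * (1 - s) ** d


# Flat prior: each joint equals its likelihood (rows r, columns s).
joint = [[likelihood(r, s) for s in GRID] for r in GRID]
marginal = sum(sum(row) for row in joint)

# Cells with column index k > row index i have s > r (above the diagonal).
above = [(i, k) for i in range(10) for k in range(10) if k > i]

# Method 1: normalize every cell, then add the posteriors with s > r.
posterior = [[j / marginal for j in row] for row in joint]
p_new_better = sum(posterior[i][k] for i, k in above)

# Method 2 (shortcut): add the joints with s > r, then divide once.
p_shortcut = sum(joint[i][k] for i, k in above) / marginal

print(f"P(s > r | data) = {p_new_better:.3f} (cell by cell)")
print(f"P(s > r | data) = {p_shortcut:.3f} (shortcut)")
```

Cells on the diagonal, where $s=r$, count toward neither drug being better, so the answer depends somewhat on how fine the grid is.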

For Friday, I asked you to think about the decision problem that forms the second half of this problem.
