## HCOL 195 9/28/09

I started out by remarking on two of the journals. In one, there’s a nice illustration of how an initial decision may make it difficult to change one’s mind, even given more data that casts doubt on the initial decision. This involves the old medical aphorism, “When you hear hoofbeats, think horses, not zebras.” This is a reflection of our Bayesian prior probabilities: at least over here, if you hear hoofbeats, it’s quite likely that a horse is producing them, not a zebra (the situation might be reversed in parts of Africa). So the prior probability of “horse” is much bigger than that of “zebra”. In this case, the student, while a teenager, came down with stomach pains. The doctor was called and diagnosed (over the phone) a stomach virus (one was going around). A very reasonable diagnosis, given that he hadn’t seen the patient. Horses, not zebras. But the pain didn’t go away, and after a second phone call and finally an office visit, the diagnosis was the same. (Several days had passed, and viruses being self-limiting, this should have been a clue that the initial diagnosis was wrong, but the physician apparently did not think “zebras” at this point.) A day or two later, the pain was still there, and the patient noticed that the stomach was particularly sensitive to being poked in one particular spot which, the patient had just learned, was where the appendix is. Another call, and it being the weekend, the patient was whisked off to the ER, where a ruptured appendix was diagnosed and surgically removed. This was a very close call! The bottom line for decision-makers is this: don’t let an initial assessment cloud your future thinking as more data come in.

The other comment was on another journal. There’s a common mistake in probability that we see from time to time when a person wins the lottery a second time. This usually gets reported in the press and on TV as a very low probability event (i.e., p*p, if p is the probability of winning the lottery once). But this is a mistake. p*p is the probability that a particular person, chosen in advance, will win the lottery twice, if he buys exactly one ticket in each of two separate lotteries and never buys any others. But that’s not the situation here. What we actually observe is that someone, sometime, who has already won the lottery, wins it again. And for any particular lottery winner, the probability of that is just p, if he buys only one ticket ever again.

But the lottery is held frequently. If it is held weekly, for example, in any year there will be 52 lottery winners, and the probability that at least one of them will win again is approximately 52*p. The longer the lottery is held, and the more winners there are, the more likely it is that we will have winners who have won more than once.

And that’s not all! Every time a former winner enters, there is a chance that he or she will win again. If the winner enters every week, or buys multiple tickets, the probability that the former winner will win again is multiplied by the total number of tickets bought.

So, for all of these reasons, the probability that someone will win the lottery twice is much, much larger than these sensationalistic press reports suggest.
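To make this concrete, here is a minimal sketch in Python. The value of p and all of the counts (winners per year, years the lottery has run, tickets each winner keeps buying) are made-up illustrative numbers, not figures from any real lottery:

```python
# Made-up numbers for illustration only.
p = 1e-7  # assumed chance that one ticket wins a given drawing

# The sensational (wrong) figure: a particular person, chosen in
# advance, winning two pre-specified drawings on one ticket each.
p_specific_double = p * p          # ~1e-14

# The right question: among many past winners, each still buying
# tickets, what is the chance that SOMEONE wins a second time?
winners = 52 * 20                  # 20 years of weekly winners
tickets_each = 52 * 10             # each buys one ticket a week for 10 years
total_tickets = winners * tickets_each

# Probability at least one of those tickets wins (independent drawings):
p_someone_again = 1 - (1 - p) ** total_tickets

print(p_specific_double)
print(p_someone_again)   # a few percent: not remarkable at all
```

Even with these modest assumptions, the chance of *some* double winner is a dozen orders of magnitude larger than the p*p figure the press reports.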

We then discussed the fish capture-release-recapture problem, with 100 tagged fish being thrown into the lake, and after a time, we catch 10 tagged and 90 untagged fish. As with last time, this gives us a rough estimate of 1000 fish in the lake, since our sampling of 10 tagged out of 100 caught tells us that approximately 10% of the fish in the lake are tagged, and we know that 100 are tagged.
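In code, this rough estimate is just:

```python
# Rough capture-recapture estimate: the tagged fraction in the catch
# stands in for the tagged fraction in the whole lake.
tagged_released = 100
tagged_caught = 10
total_caught = 100      # 10 tagged + 90 untagged

tagged_fraction = tagged_caught / total_caught          # about 10% tagged
estimate = tagged_released * total_caught / tagged_caught   # 100 / 0.10
print(estimate)   # 1000.0
```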

But from a Bayesian point of view we want to set up a spreadsheet as we did before. So we identify the states of nature as being the different numbers of fish that there could be. We know there are at least 190 fish. We put a uniform prior on each state of nature. For simplicity, since we know that when we divide the joint probability by the marginal probability, any multiplicative constant will drop out, we simply enter 1’s in the prior column. The likelihood is a little harder. Suppose we catch first the 10 tagged and then the 90 untagged fish (we’ll discuss the problem of order in a moment). The probability of catching the first tagged fish is 100/N, if N is the number in the lake. The probability of catching the second tagged fish is 99/(N-1), of the third, 98/(N-2), and so on to the last tagged fish caught, which is 91/(N-9). The probability of catching the first untagged fish is (N-100)/(N-10), of the second (N-101)/(N-11), and so on to the last of the 90 untagged fish, (N-189)/(N-99). The probability of catching them all in this order is the product of these probabilities.
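That spreadsheet can be sketched in Python. The cap of 5000 fish is an arbitrary stand-in for the bottom of the spreadsheet; the likelihood is the product of the sequential catch probabilities just described:

```python
def likelihood(N, tagged=100, tagged_caught=10, untagged_caught=90):
    """Probability of catching 10 tagged and then 90 untagged fish,
    in that order, from a lake of N fish of which 100 are tagged."""
    if N < tagged + untagged_caught:
        return 0.0                       # fewer than 190 fish is impossible
    L = 1.0
    for i in range(tagged_caught):       # 100/N * 99/(N-1) * ... * 91/(N-9)
        L *= (tagged - i) / (N - i)
    for j in range(untagged_caught):     # (N-100)/(N-10) * ... * (N-189)/(N-99)
        L *= (N - tagged - j) / (N - tagged_caught - j)
    return L

Ns = range(190, 5001)                    # states of nature (capped at 5000)
prior = [1.0 for N in Ns]                # 1's in the prior column
joint = [p * likelihood(N) for p, N in zip(prior, Ns)]
marginal = sum(joint)                    # the normalizing sum
posterior = [j / marginal for j in joint]

# The posterior peaks near the rough estimate of 1000 fish:
mode = max(zip(posterior, Ns))[1]
print(mode)
```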

But we probably caught the fish in a different order. No worries! If you write the whole probability down, you’ll see that a different order merely shuffles the numbers in the numerator around; the overall fraction remains the same. Or, if you don’t care about the order, you can multiply by the appropriate binomial coefficient (the number of distinct orders in which a catch of 10 tagged and 90 untagged fish can occur), but that factor is the same for every state of nature, so it will cancel out when we divide.
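A quick check of both claims, assuming for illustration a lake of N = 1000 fish:

```python
from math import comb

N = 1000   # assumed lake size, for illustration only

def order_prob(sequence, tagged_total=100):
    """Probability of catching fish in the given order
    ('T' = tagged, 'U' = untagged)."""
    p, t_left, u_left, remaining = 1.0, tagged_total, N - tagged_total, N
    for fish in sequence:
        if fish == "T":
            p *= t_left / remaining
            t_left -= 1
        else:
            p *= u_left / remaining
            u_left -= 1
        remaining -= 1
    return p

p_sorted = order_prob("T" * 10 + "U" * 90)    # all tagged fish first
p_mixed = order_prob("TU" * 10 + "U" * 80)    # tagged fish interleaved
print(p_sorted, p_mixed)                      # same value either way

# Multiplying by the number of possible orders, comb(100, 10), gives the
# unordered probability, which matches the hypergeometric formula:
hyper = comb(100, 10) * comb(N - 100, 90) / comb(N, 100)
print(p_sorted * comb(100, 10), hyper)
```

The final comparison shows why the ordering factor is harmless: it multiplies the likelihood for every N by the same constant, which disappears in the normalization.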

The calculation is shown here (photo of whiteboard):