I started out by remarking on two of the journals. In one, there’s a nice illustration of how an initial decision can make it difficult to change one’s mind, even when later data casts doubt on that decision. This involves the old medical aphorism, “When you hear hoofbeats, think horses, not zebras.” This is a reflection of our Bayesian prior probabilities: at least over here, if you hear hoofbeats, it’s quite likely that a horse is producing them, not a zebra (the situation might be reversed in parts of Africa). So the prior probability of “horse” is much bigger than that of “zebra”. In this case, the student, while a teenager, came down with stomach pains. The doctor was called and diagnosed (over the phone) a stomach virus, which was going around. That was a very reasonable diagnosis, given that he hadn’t seen the patient. Horses, not zebras. But the pain didn’t go away, and after a second phone call and finally an office visit, the diagnosis was the same. (Several days had passed, and since viruses are self-limiting, this should have been a clue that the initial diagnosis was wrong, but the physician apparently did not think “zebras” at this point.) A day or two later, the pain was still there, and the patient noticed that the stomach was particularly sensitive to being poked in a particular spot that, the patient had just learned, was where the appendix is. Another call followed, and since it was the weekend, the patient was whisked off to the ER, where a ruptured appendix was diagnosed and surgically removed. This was a very close call! The bottom line for decision-makers is this: Don’t let an initial assessment cloud your future thinking as more data comes in.
The other comment was on another journal. There’s a common mistake in probability that we see from time to time when a person wins the lottery a second time. This usually gets reported in the press and on TV as a very low probability event (i.e., p*p if p is the probability of winning the lottery once). But this is a mistake. p*p is the probability that a particular person, chosen in advance, will win the lottery twice, if that person buys only two tickets, in two separate lotteries, and never buys any other tickets. But that’s not what we have here. What we have is the probability that someone, sometime, who has already won the lottery, will win it again. And that probability is just p, for any particular lottery winner, if that person buys only one ticket ever again.
But the lottery is held frequently. If it is held weekly, for example, in any year there will be 52 lottery winners, and if each of them buys just one more ticket, the probability that at least one of them will win again is approximately 52*p. The longer the lottery is held, and the more winners there are, the more likely it is that we will have winners who have won more than once.
And that’s not all! Every time a former winner enters, there is a chance that he or she will win again. If the winner enters every week, or buys multiple tickets, the probability that the former winner will win again is multiplied (approximately, since p is tiny) by the total number of tickets bought.
So, for all of these reasons, the probability that someone will win the lottery twice is much, much larger than these sensationalistic press reports suggest.
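To get a feel for the sizes involved, here is a minimal Python sketch of the argument above. The value of p is purely hypothetical (no real lottery's odds are being quoted), and the helper name `prob_at_least_one_win` is mine; it also assumes each ticket is an independent try with the same probability p.

```python
import math

def prob_at_least_one_win(p, tickets):
    """Chance of at least one win in `tickets` independent tries,
    each with win probability p: 1 - (1 - p)**tickets, computed
    stably for very small p."""
    return -math.expm1(tickets * math.log1p(-p))

p = 1e-7  # hypothetical single-ticket win probability (an assumption)

# The press-report number: a person picked in advance wins both of two lotteries.
naive = p * p

# A particular past winner buys exactly one more ticket.
one_more_ticket = prob_at_least_one_win(p, 1)

# A year of weekly lotteries gives 52 winners; each buys one more ticket.
year_of_winners = prob_at_least_one_win(p, 52)

# The same 52 winners each keep playing weekly for a year (52 tickets each).
weekly_players = prob_at_least_one_win(p, 52 * 52)

print("p*p:                       ", naive)
print("one winner, one ticket:    ", one_more_ticket)
print("52 winners, one ticket:    ", year_of_winners)
print("52 winners, 52 tickets each:", weekly_players)
```

Whatever value of p you plug in, the last three numbers are on the order of p times the number of tickets, not p*p, which is the point of the argument.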
We then discussed the fish capture-release-recapture problem, with 100 tagged fish being thrown into the lake, and after a time, we catch 10 tagged and 90 untagged fish. As with last time, this gives us a rough estimate of 1000 fish in the lake, since our sampling of 10 tagged out of 100 caught tells us that approximately 10% of the fish in the lake are tagged, and we know that 100 are tagged.
But from a Bayesian point of view we want to set up a spreadsheet as we did before. So we identify the states of nature as being the different numbers of fish that there could be. We know there are at least 190 fish (the 100 we tagged plus the 90 untagged ones we caught). We put a uniform prior on each state of nature. For simplicity, since we know that when we divide the joint probability by the marginal probability, any multiplicative constant will drop out, we simply enter 1’s in the prior column. The likelihood is a little harder. Suppose we catch first the 10 tagged and then the 90 untagged fish (we’ll discuss the problem of order in a moment). The probability of catching the first tagged fish is 100/N, if N is the number in the lake. The probability of catching the second tagged fish is 99/(N-1), of the third, 98/(N-2), and so on to the last tagged fish caught, which is 91/(N-9). At that point N-10 fish remain in the lake, of which N-100 are untagged, so the probability of catching the first untagged fish is (N-100)/(N-10), of the second (N-101)/(N-11), and so on to the last of the 90 untagged fish, (N-189)/(N-99). The probability of catching them all in this order is the product of these probabilities.
But we probably caught the fish in a different order. No worries! If you write the whole probability down, you’ll see that a different order just switches various numbers in the numerator around; the denominators are the same and the product is unchanged. Or, if you don’t know the order, you’ll multiply by the appropriate binomial coefficient (the number of orders in which 10 tagged and 90 untagged fish can turn up among the 100 caught). That factor does not depend on N, so it is the same for every state of nature and cancels when we divide.
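The spreadsheet columns described above (state of nature, prior, likelihood, joint, posterior) can be sketched in Python as follows. This is only an illustration, not the class spreadsheet itself: the function name `likelihood` and the cutoff `N_MAX` are my own choices, and the cutoff is an assumption needed because a table has to stop somewhere.

```python
def likelihood(n_fish, tagged=100, caught_tagged=10, caught_untagged=90):
    """Probability of catching the tagged fish first and then the untagged
    ones, drawing without replacement from a lake of n_fish fish."""
    if n_fish < tagged + caught_untagged:
        return 0.0  # fewer fish than we know exist
    prob = 1.0
    remaining = n_fish
    for i in range(caught_tagged):        # 100/N, 99/(N-1), ..., 91/(N-9)
        prob *= (tagged - i) / remaining
        remaining -= 1
    untagged = n_fish - tagged
    for j in range(caught_untagged):      # (N-100)/(N-10), ..., (N-189)/(N-99)
        prob *= (untagged - j) / remaining
        remaining -= 1
    return prob

N_MAX = 10_000                            # assumed upper cutoff for the table
states = list(range(190, N_MAX + 1))      # states of nature: possible N
prior = [1.0] * len(states)               # uniform prior, entered as 1's
joint = [pr * likelihood(n) for pr, n in zip(prior, states)]
marginal = sum(joint)
posterior = [j / marginal for j in joint]

# Print a few rows of the "spreadsheet".
for n in (190, 500, 1000, 1500, 3000):
    print(n, posterior[states.index(n)])
```

Because the binomial coefficient for the unknown order is the same in every row, leaving it out of `likelihood` changes the joint column but not the posterior column.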
The calculation is shown here (photo of whiteboard):
We thought about what the graph of this would look like. One student suggested that it would tail off and get smaller and smaller as the number of fish increased. Another pointed out (in response to my question) that the maximum should be around or at 1000. But that means that it should increase from 190 up to 1000, as in the picture:
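One way to check the rise-then-fall shape without plotting is to compare each likelihood with the one before it. Writing out the product for N and for N-1 and cancelling common factors gives the ratio (N-100)^2 / (N(N-190)); this ratio is my own derivation from the product described earlier, not something we wrote down in class. A small sketch using exact fractions:

```python
from fractions import Fraction

def likelihood_ratio(n_fish):
    """L(N) / L(N-1) for 100 tagged fish in the lake and a catch of
    10 tagged plus 90 untagged fish (valid for N >= 191)."""
    return Fraction((n_fish - 100) ** 2, n_fish * (n_fish - 190))

for n in (191, 500, 999, 1000, 1001, 2000):
    ratio = likelihood_ratio(n)
    if ratio > 1:
        trend = "rising"
    elif ratio == 1:
        trend = "flat"
    else:
        trend = "falling"
    print(n, float(ratio), trend)
```

The ratio is above 1 for every N below 1000, exactly 1 at N = 1000, and below 1 afterwards, so with the flat prior the curve climbs from 190, peaks at 999 and 1000 (which tie), and then tails off.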
So we can now do the same things that we did for cure rates. By adding up posterior probabilities (or computing areas under the curve) we can estimate, for example, the probability that the number of fish is between 500 and 1500, or any similar pair of numbers, or we can find a range for which the probability of the number of fish being in that range is 0.95 (the level that corresponds to the usual two-standard-deviation criterion). One student asked if this is the same as what we would get by calculating with the usual formula. The usual formula is derived for normal distributions, not for a skewed distribution like this one, but it would still be approximately right.
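Here is a rough Python sketch of those two calculations: adding up posterior probabilities over a range, and finding a central range that holds 0.95 of the probability. The names `prob_between` and `central_interval`, the cutoff `N_MAX`, and the choice of an equal-tailed range (2.5% cut from each end) are all assumptions of mine; other 95% ranges are possible for a skewed distribution.

```python
def likelihood(n_fish, tagged=100, caught_tagged=10, caught_untagged=90):
    """Probability of the catch (tagged first, then untagged), without replacement."""
    if n_fish < tagged + caught_untagged:
        return 0.0
    prob, remaining = 1.0, n_fish
    for i in range(caught_tagged):
        prob *= (tagged - i) / remaining
        remaining -= 1
    for j in range(caught_untagged):
        prob *= (n_fish - tagged - j) / remaining
        remaining -= 1
    return prob

N_MAX = 10_000                      # assumed upper cutoff for the table
states = range(190, N_MAX + 1)
joint = [likelihood(n) for n in states]   # uniform prior of 1's
total = sum(joint)
posterior = {n: w / total for n, w in zip(states, joint)}

def prob_between(lo, hi):
    """Posterior probability that lo <= N <= hi."""
    return sum(p for n, p in posterior.items() if lo <= n <= hi)

def central_interval(mass=0.95):
    """Equal-tailed range: cut (1 - mass)/2 of the probability from each end."""
    tail = (1.0 - mass) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for n in states:
        cumulative += posterior[n]
        if lower is None and cumulative >= tail:
            lower = n
        if cumulative >= 1.0 - tail:
            upper = n
            break
    return lower, upper

print("P(500 <= N <= 1500):", prob_between(500, 1500))
print("central 95% range:  ", central_interval(0.95))
```

Because the posterior has a long right tail, the range this produces is not symmetric about 1000: it extends much further above the peak than below it.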
I then changed the subject and asked, what is the value of a human life? Yes, it’s a horrible question, but it’s one that we are forced as a society to answer, since many policy questions depend on the answer: for example, is it cost-effective to require everyone to have seat belts, or health insurance, etc.? A number of approaches could be used: How much life insurance should someone with dependents take out, or how much might one expect to earn in a lifetime, or the economic loss to society if someone dies prematurely. For such government questions, it’s probably best to err on the high side, and although the class’s guesses were 1-2, maybe 3 million dollars, government agencies generally use numbers in the 4-7 million dollar range.
We discussed situations like the Terri Schiavo case, where a young woman was in a persistent vegetative state, and the family couldn’t agree on whether to withdraw artificial life support. The case went to Congress, which passed a special law, which in turn was declared unconstitutional. Eventually, the husband prevailed and she was taken off the life support, and died a few weeks later. I urged everyone to consider legal instruments that will make it clear what you want to happen should you be in such a situation. They are: A Living Will, and a Durable Power of Attorney for Health Care. The first of these states your wishes; the second allows someone you trust and choose to make decisions about health care in case you are unable to do so.