HCOL 195 11/6/09

November 6, 2009 by bayesrules

Today we discussed the ideas about money and investing found in the handout.

A new cool thing. I stumbled on this today. You will recall that we discussed the relative sizes of human cells and bacteria early in the course. I stated that bacteria are much smaller than human cells. This blog entry has a pointer to a lovely interactive page that illustrates this. By sliding the control at the bottom, you can zoom in on smaller and smaller things.

HCOL 195 11/4/09

November 5, 2009 by bayesrules

Today we discussed “gambler’s ruin.” This is the idea that if we repeatedly bet small amounts, then under some circumstances we will eventually go bust. I mentioned it in connection with the following scenario: Someone has borrowed money from a loan shark. The loan shark wants his money and interest back, and threatens to break the legs of the borrower if he doesn’t come up with the $200 owed tomorrow. The man has only $100, and his only plan is to go to the casino and attempt to win another $100. What should he do?

A student suggested that he should bet the whole $100 on one spin of the roulette wheel (on an even-money bet such as red, the probability of winning is 18/38 at an American casino). That was felt to have the greatest probability of success.

So we considered the idea of betting against a casino that has an infinite amount of resources. I drew a picture on the board:

Gambler's Ruin

In the diagram, we imagine that we start with $1 and bet it. The probability that we win is p=18/38, and the probability that we lose is q=1-p=20/38. If we lose, we bust. If we win, we go from $1 to $2. But we still might go bust: for example, we might lose the next two bets and end up with nothing, or we might bounce around a bit but eventually end up with nothing. If \(P_1\) is the probability that we eventually go bust, starting with $1, and \(P_2\) is the probability that we eventually go bust, starting with $2, then

\[ P_1 = q + pP_2, \]

that is, either we lose the first bet and bust immediately (probability q), or we win it (probability p), reach $2, and then eventually bust from there (probability \(P_2\)).

But \(P_2 = P_1^2\). The reason is that the bets don’t know how much money we have. Starting with $2, we can only bust by first dropping to $1 at some point, and the probability of ever ending up $1 below where we are now is the same as the probability of busting when we start with $1, namely \(P_1\). Once we are down to $1, the probability of then busting is again \(P_1\), so \(P_2 = P_1 \cdot P_1 = P_1^2\). Therefore,

\[ P_1 = q + pP_1^2 \]

This is a quadratic equation for \(P_1\), with two roots: 1 and \(q/p\). If \(q>p\), the root \(q/p\) is bigger than 1 and cannot be a probability, so the only admissible solution is \(P_1 = 1\). In a real casino \(q/p>1\), so if we start with $1 and keep playing indefinitely, we will eventually bust with certainty. The same is true no matter how much money we start with: if we start with $100, say, the probability is 1 that we will at some point have $99 ($1 less than we have now). But then the probability is 1 that we will at some point have $98, then $97, and so forth until we bust. We cannot beat the casino because the odds are in its favor.

The calculation of the roots is shown below and then on the right side of the board in the picture above.

Gambler's Ruin 2
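The roots can also be checked algebraically, using only \(q = 1 - p\). Rewriting the equation above as a quadratic and applying the quadratic formula:

\[
pP_1^2 - P_1 + q = 0
\quad\Longrightarrow\quad
P_1 = \frac{1 \pm \sqrt{1 - 4pq}}{2p} = \frac{1 \pm (p - q)}{2p},
\]

because \(1 - 4pq = (p+q)^2 - 4pq = (p-q)^2\). The plus sign gives \(2p/2p = 1\) and the minus sign gives \(2q/2p = q/p\). At an American roulette wheel \(q/p = (20/38)/(18/38) = 10/9 > 1\), so only the root 1 can be a probability.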

Next, we consider the situation where the casino has a finite amount of money (or, equivalently, where we have m dollars and want to gamble until we get (m+n) dollars, then quit with enough money to pay off the loan shark). So, we would like to calculate the probability that, starting with m dollars and gambling $1 at a time, we will at some point in the future have (m+n) dollars. We can calculate this using what we have done with the infinite casino. Let Q be the probability that we never get to (m+n) dollars, which is the same as the probability that we eventually bust without ever getting to (m+n) dollars; then (1-Q) is the probability that we do eventually get to (m+n) dollars. But that means that

\(P_m = Q + (1-Q)P_{m+n}\), since there are two ways to bust against an infinite casino, starting with m dollars: either bust without ever getting to (m+n) dollars (probability Q), or get to (m+n) dollars (probability 1-Q) and then go bust against the infinite casino (probability \(P_{m+n}\)). From what we’ve already learned, the probability of busting against an infinite casino, starting with m dollars, is \(r^m\), where r is the smaller of 1 and (q/p). That yields the equations on the last chart. But for us, (q/p)>1, so r=1 and the equation reduces to 1 = Q + (1-Q), which holds for every Q and so doesn’t tell us the probability of starting with m dollars, gambling $1 at a time, and eventually getting to (m+n) dollars. But if we reverse the roles of the gambler and the casino, the probability that we will get to (m+n) dollars is the same as the probability that a casino holding n dollars goes bust against an infinite opponent without ever reaching (m+n) dollars. That exchanges the roles of m and n, and of (q/p) and (p/q), and a similar formula works. That’s shown on the last chart.

Gambler's Ruin 4
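Solving the chart’s equation for Q makes the formula explicit. This is a sketch of the algebra, assuming r < 1 so the denominator is not zero:

\[
r^m = Q + (1-Q)\,r^{m+n}
\quad\Longrightarrow\quad
Q = \frac{r^m - r^{m+n}}{1 - r^{m+n}}.
\]

With the roles reversed (a casino holding n dollars, which wins each bet with probability q), the same algebra with \(s = p/q < 1\) in place of r, and n in place of m, gives the probability that we reach (m+n) dollars:

\[
P(\text{reach } m+n) = \frac{s^n - s^{m+n}}{1 - s^{m+n}}, \qquad s = \frac{p}{q}.
\]

For a single bet (m = n = 1) this reduces to \(s(1-s)/\big((1-s)(1+s)\big) = s/(1+s) = p/(p+q) = p\), as it should.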

If we evaluate this for just one bet, the probability that we win is just 18/38=0.474. If we divide our bet in two ($50 at a time) so that m=n=2, the probability of our winning is less than this, which confirms the student’s intuition that BOLD PLAY, that is, betting the maximum amount, is the best way to keep our legs from getting broken. So bet the $100 on one spin of the wheel, and hope for the best.
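Here is a small Python sketch of the reversed-roles formula, \((s^n - s^{m+n})/(1 - s^{m+n})\) with \(s = p/q\), so you can try other ways of splitting the $100. It uses exact fractions; the function name `prob_reach_goal` is just an illustrative label, and it assumes the odds favor the house (p < 1/2).

```python
from fractions import Fraction


def prob_reach_goal(m: int, n: int, p: Fraction) -> Fraction:
    """Probability of turning m betting units into m + n units, one unit per bet,
    when each bet is won with probability p (assumed 0 < p < 1/2)."""
    if not 0 < p < Fraction(1, 2):
        raise ValueError("this formula assumes the odds favor the house")
    q = 1 - p
    s = p / q  # s < 1 because q > p
    return (s**n - s**(m + n)) / (1 - s**(m + n))


p = Fraction(18, 38)  # American roulette, even-money bet such as red
for units in (1, 2, 4, 100):
    # Split the $100 stake into `units` equal bets; the goal is to double it.
    win = prob_reach_goal(units, units, p)
    print(f"{units:>3} units of ${100 // units}: P(reach $200) = {float(win):.6f}")
```

For one unit this gives exactly 18/38, about 0.474; for two $50 units it drops to about 0.448, and for one hundred $1 bets it is about 0.00003, which is the case for bold play in numbers.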

I remarked that this is why the lottery isn’t a good way to plan for your retirement. If you were to bet $40 a week for your entire life, your expected net worth from gambling at the end would be $80. (In class I said $40, but I checked it out; still not a good retirement plan!)

HCOL 195 11/2/09

November 2, 2009 by bayesrules

We decided to have the second test on Wednesday, November 18.

We then turned to the question of the death penalty. Although it is not allowed under Vermont state law, it could conceivably confront a Vermont jury convened in a federal capital case; one Vermont jury recently faced exactly this problem. I pointed out that a potential juror will be required to answer questions in voir dire, where each juror is questioned under oath about the case: Do you know the defendant or the victim, what have you heard about the case, can you render an impartial verdict, etc. In a capital case, you will also be asked whether you are opposed to the death penalty, and if you answer “yes,” then you will not be allowed to serve on that jury.

We then proceeded to consider a decision tree for a juror. This is a linked decision, so we need to have a second decision box if the jury decides to convict, since the second choice is “Death” or “Life in Prison.” See below for the general form of the tree, which we drew without many of the losses inserted.

We noted that each juror will have different losses for the outcomes. AI (acquitting an innocent defendant) is clearly the best, with a loss of 0. It seems like we ought to assign some small loss to CG, Life (convicting a guilty defendant and sentencing him to life), and since the scale is arbitrary, we assigned 1 for that case. We then considered what the loss should be for AG, and decided on 10 for this (we also discussed 100; since everyone will come up with their own loss structure and we are only illustrating the process, any reasonable value will do):

Determining loss for AG relative to CG, Life

Determining loss for CI, Death

We decided that convicting an innocent person and putting him in prison for life was pretty bad, and with a similar tree (not shown, sorry, I didn’t snap a picture of it), we settled on CI, Life=1000 for the loss. Then, we drew the above picture, and noted that CI, Death is worse (for the juror) than CI, Life. That means (if you believe it) that in the above trial decision, we should set p0 to some number less than 1, so the loss for CI, Death should be 1000/(1-p0) and will be larger than 1000. We then entered our losses, determined in this way, into the decision tree we had sketched before:

Death penalty decision tree

Because of the last observation, it’s clear that if we think that sending an innocent person to his death is worse than putting him in prison for the rest of his life, then (if we use decision theory) we will never decide on the death penalty, regardless of whether we personally approve of that penalty or are opposed to it.

HCOL 195 10/30/09

November 1, 2009 by bayesrules

Today we looked at the homework. The second problem was similar to one you’ve already done, so we just looked at the first one. This is an example where Bayesian answers are very different from those gotten by frequentists. The idea here is that we have a precise hypothesis (the coin or die is fair) and an alternative one (that it is biased). In the first case, the probability of one outcome is specified precisely, but in the other, the probability of the outcome is unknown. Since it is unknown, the Bayesian thing to do is to regard the bias itself as a state of nature, and put a prior on it. We also need a prior on the two hypotheses (fair, biased).

This is one case where we actually need to put a normalized prior on the value of the bias. Unlike the cases we have treated so far, in the final analysis in this case there is no cancellation of factors. So in the biased case, we assumed biases of 0.05, 0.15, 0.25,…,0.95, and put a prior 1/10 on each. (If we wanted to be more precise, we could put the prior on 0.005, 0.015, 0.025,…,0.995 and use a prior of 1/100 on each; that would require a spreadsheet calculation.) The likelihood under this case is \(P(\text{data}\mid p,\text{biased}) = p^h(1-p)^t\), where h is the number of heads and t the number of tails (the data). The prior is \(P(p\mid\text{biased}) = 1/10\), and the joint probability is \(P(\text{data}\mid p,\text{biased})\,P(p\mid\text{biased}) = P(\text{data},p\mid\text{biased})\). Summing over all values of p (getting the marginal) gives us \(P(\text{data}\mid\text{biased})\). Here’s a snapshot of the whiteboard that results (not all numbers are filled in):

HW 1

Spreadsheet for biased (loaded) case

The calculation for the fair case is easier. In our example, if the die is fair, p=1/3 and (1-p)=2/3, so the likelihood is \(P(\text{data}\mid\text{fair}) = (1/3)^h(2/3)^t\).

We adopted P(fair)=P(biased)=1/2. With these, we are now able to calculate the joint probabilities from the marginal for the biased case and the likelihood from the fair case:

P(data,fair)=P(data|fair)P(fair), P(data,biased)=P(data|biased)P(biased).

But also, P(data,fair)=P(fair|data)P(data) and P(data,biased)=P(biased|data)P(data). Dividing the first of these by the second, P(data) cancels and we get just the ratio P(fair|data)/P(biased|data), which is the posterior odds ratio. Because we chose P(fair)=P(biased), this is also equal to B=P(data|fair)/P(data|biased), which is the Bayes factor. Since fair and biased are the only two hypotheses, the probability of fair, given the data, is B/(1+B). Here’s the whiteboard after this calculation:

Calculation of the posterior probability of fair
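The whiteboard calculation can be sketched in Python. This is an illustration under stated assumptions: the counts `heads=15, tails=30` are hypothetical (the homework’s actual data are not reproduced here), the fair-case probability is 1/3 as in the die example, and the biased case uses the ten-point grid 0.05, 0.15, …, 0.95 with prior 1/10 on each (pass `grid_size=100` for the finer grid).

```python
def likelihood(p: float, heads: int, tails: int) -> float:
    """P(data | p) = p^h (1 - p)^t for one particular sequence of outcomes."""
    return p**heads * (1 - p) ** tails


def bayes_factor(heads: int, tails: int, p_fair: float = 1 / 3, grid_size: int = 10) -> float:
    """B = P(data | fair) / P(data | biased), with a uniform grid prior on the bias."""
    # Grid midpoints 0.05, 0.15, ..., 0.95 when grid_size = 10; prior 1/grid_size each.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Marginal likelihood: sum over the grid of likelihood times prior.
    p_data_biased = sum(likelihood(p, heads, tails) / grid_size for p in grid)
    p_data_fair = likelihood(p_fair, heads, tails)
    return p_data_fair / p_data_biased


heads, tails = 15, 30           # hypothetical data
B = bayes_factor(heads, tails)
posterior_fair = B / (1 + B)    # valid because P(fair) = P(biased) = 1/2
print(f"B = {B:.2f}, P(fair | data) = {posterior_fair:.3f}")
```

Multiplying raw probabilities like this is fine for a few dozen trials; for thousands of trials you would sum logarithms instead to avoid underflow.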

I then described a practical application of this theory. There was a project at Princeton University which was attempting to find evidence for paranormal powers. In one of the experiments, a student was placed in front of a device that randomly flashed red and green lights, and attempted, by pure thought, to “influence” the device so as to make the number of flashes of one of the colors greater than the number of flashes of the other color. The desired color was changed from time to time, so that on some runs, the student tried to make red flash more often, and on some others, green:

The experimental setup for a parapsychology experiment

I read a paper by these experimenters; they reported data on over 100 million trials that they had conducted over the years (various students). In these trials, there was an excess of 18,471 flashes in the desired direction, less than 0.02% of the total. Even though this was a very small excess in absolute terms, the p-value, that is to say, the probability of getting an excess of 18,471 or more flashes, was also very small, about 0.0003 (I got the number wrong on the board, there should be one more zero!) This would be regarded as a highly significant rejection of the hypothesis that the device is fair.

Yet, the Bayesian calculation is very different! Doing the exact calculation that is approximated by the spreadsheet method we described above, I found that the Bayes factor was about B=12, which corresponds to a posterior probability in favor of the fair hypothesis of about 0.92! The Bayesian calculation supports the hypothesis that the device is fair, in contradiction to the significance test.

I wrote a paper on this subject and published it in the same journal where the original research was reported. This led to an exchange of letters to the editor.

Bayes factor for parapsychology experiment
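For readers who want to reproduce these numbers, here is a Python sketch. It assumes a total of N = 104,490,000 trials, a figure consistent with “over 100 million” and with the excess quoted above but not stated in this post. It computes the two-sided p-value (which is what reproduces the 0.0003) and the Bayes factor, assuming a uniform prior on the bias under the “biased” hypothesis, so that every count from 0 to N is equally likely and P(data|biased) = 1/(N+1). P(data|fair) is the exact binomial probability, evaluated with log-gamma to avoid overflow.

```python
import math

N = 104_490_000          # assumed total number of trials ("over 100 million")
excess = 18_471          # flashes in the desired direction beyond N/2
k = N // 2 + excess      # observed count in the desired direction

# Frequentist: normal approximation to the binomial under "fair" (p = 1/2).
sd = math.sqrt(N) / 2
z = excess / sd
p_value = math.erfc(z / math.sqrt(2))  # two-sided: P(|Z| >= z)

# Bayesian: exact binomial likelihood under "fair", computed on the log scale.
log_p_fair = (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
              - N * math.log(2))
# Under "biased" with a uniform prior on the bias, P(k | biased) = 1 / (N + 1).
log_p_biased = -math.log(N + 1)

B = math.exp(log_p_fair - log_p_biased)
posterior_fair = B / (1 + B)   # equal prior probabilities for fair and biased
print(f"z = {z:.2f}, p-value = {p_value:.5f}, B = {B:.1f}, P(fair | data) = {posterior_fair:.3f}")
```

The same data give a p-value of about 0.0003 against fairness and a Bayes factor of about 12 in favor of it, which is exactly the conflict Lindley described.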

How to explain the discrepancy? First, note that the p-value and the posterior probability are different things. The posterior probability is the probability of fair, given the data. But the p-value is the probability of the data, or any data even more extreme, given that the device is fair. We really want the Bayesian answer, but the frequentist calculation can’t give us that.

Bayesians regard the frequentist calculation as the right answer to the wrong question. It has a number of defects: First, it doesn’t say anything about any probabilities if the device is biased, yet it purports to tell us something about the device being biased. Second, the probability calculated is based not only on the data that were observed (18,471), but also on all the possible data that were more extreme and which were not observed! Moreover, the more extreme data are not expected to be observed, just because they are more extreme. There seems to be something incoherent about basing a conclusion mostly on data that were not observed and were not even expected to be observed!

Dennis Lindley, a British statistician, pointed out that just this kind of outcome can happen: A statistical significance test (the p-value) can reject the “fair” hypothesis with a very small p-value, yet the Bayesian calculation can strongly favor that hypothesis.

HCOL 195 10/28/09

October 29, 2009 by bayesrules

Today we picked up where we left off last time. Here’s the board as we left it then:

Estimating Losses

No one thought that the value of p should be as big as 1/2; p=0.1 seems to be close to the median for the class. When you put this in, then the value of the loss for CI that makes the two branches have the same expected loss is 10.
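Written out, using the tree from the 10/26 post (a sure AG with loss 1 versus a chance node giving CI with probability p and a correct verdict with loss 0 otherwise), the indifference condition is:

\[
1 = p\,L_{CI} + (1-p)\cdot 0
\quad\Longrightarrow\quad
L_{CI} = \frac{1}{p} = \frac{1}{0.1} = 10.
\]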

Next, we put up a chart showing how a juror would decide a case. The juror would enter the value of p that he or she has estimated from the data, and pick the decision that has the lowest expected loss:

Jury Decision Chart

But as the tree shows, this means that the evidence needed to convict, and so possibly to put an innocent person in jail, isn’t really very strong. Considering that the standard should be “beyond a reasonable doubt,” a probability of 0.9 for guilt seems too low (and the class unanimously thought so). So we bumped the loss for CI up to 100, which means you’d have to be 99% sure of guilt before you’d convict:

Revised Decision Chart
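The chart’s rule can be sketched in a few lines of Python. With losses of 0 for the correct verdicts (CG, AI), convicting has expected loss (1 - g)·L_CI and acquitting has expected loss g·L_AG, where g is the juror’s probability of guilt, so conviction wins once g exceeds L_CI/(L_CI + L_AG). The function names are illustrative only.

```python
def conviction_threshold(loss_ci: float, loss_ag: float = 1.0) -> float:
    """Probability of guilt above which convicting has the lower expected loss."""
    return loss_ci / (loss_ci + loss_ag)


def decide(prob_guilt: float, loss_ci: float, loss_ag: float = 1.0) -> str:
    """Pick the verdict with the smaller expected loss (CG and AI cost 0)."""
    expected_loss_convict = (1 - prob_guilt) * loss_ci  # risk of convicting an innocent
    expected_loss_acquit = prob_guilt * loss_ag         # risk of acquitting the guilty
    return "convict" if expected_loss_convict < expected_loss_acquit else "acquit"


for loss_ci in (10, 100):
    print(f"L_CI = {loss_ci}: convict if P(guilt) > {conviction_threshold(loss_ci):.3f}")
```

With L_CI = 10 the threshold is 10/11, about 0.91; raising L_CI to 100 pushes it to 100/101, about 0.99.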

We then considered priors, and DNA evidence. For a prior, we considered the idea that in a geographic area, without any evidence (that is, picking someone at random), the prior for someone being guilty should be approximately 1/N where N is the population of the geographic area. So, for example, in Chittenden County we estimated the population at approximately 100,000 (it is actually about 50% higher than that). So the prior probability of guilt is about 1/100,000.

The question was raised, shouldn’t we use something like 1/2? After all, the person is on trial! They wouldn’t get there if they were almost sure of being innocent! The problem with this line of reasoning is that the person would have been indicted by a Grand Jury, which would have based its indictment on the very same evidence that the jury is supposed to consider in the trial. So, even if the evidence convinced the Grand Jury that a trial was warranted, i.e., that the probability of guilt was over 0.5, to use that number as a prior would in effect be using the same data twice, which is forbidden in Bayesian inference. You have to use a prior that is independent of any of the evidence that will come up in trial, one that depends only on general principles that are known outside of the details of the crime or the defendant. The population idea is one such; you might get another factor of two if the defendant were a man, since most crimes are committed by men. But that’s an insignificant factor. Also, that factor can just as well be built into the likelihood, which is probably a better place to do it.

We then considered hypothetical DNA data that has a probability of 1 in a million of matching a randomly chosen person (but a 1 in 1 chance of matching the perpetrator, of course). This is commonly thought to mean that there is a 1 in a million chance that the defendant is innocent, but this is incorrect. P(match|innocent) is not equal to P(innocent|match), and thinking that they are equal is known as the “prosecutor’s fallacy.” The actual calculation is shown in the chart below:

DNA Decision

The calculation gives a probability of guilt of about 0.9, which is insufficient to convict if the loss for CI is 100. More (independent) evidence would be needed.
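The chart’s arithmetic in Python, using the numbers above: prior 1/100,000, a match probability of 1 if guilty and 1 in a million if innocent. The function is a generic two-hypothesis Bayes update, named here for illustration.

```python
def posterior_guilt(prior: float, p_match_if_guilty: float, p_match_if_innocent: float) -> float:
    """P(guilty | match) by Bayes' theorem with two hypotheses."""
    joint_guilty = prior * p_match_if_guilty
    joint_innocent = (1 - prior) * p_match_if_innocent
    return joint_guilty / (joint_guilty + joint_innocent)


p_guilty = posterior_guilt(prior=1 / 100_000, p_match_if_guilty=1.0,
                           p_match_if_innocent=1 / 1_000_000)
print(f"P(guilty | match) = {p_guilty:.3f}, P(innocent | match) = {1 - p_guilty:.3f}")
```

P(innocent | match) comes out near 0.09, roughly 90,000 times larger than the 1-in-a-million P(match | innocent) that the prosecutor’s fallacy substitutes for it.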

We then turned to the O.J. Simpson case. One of his lawyers, Alan Dershowitz, had remarked to the press that in any given year, only 1 in 2500 batterers goes on to murder his partner. He meant this to show that it was unlikely that O.J. committed the crime, but it doesn’t take into account the fact that in a given year, only 1 in 20,000 women is killed by a random stranger:

Probabilities for OJ Simpson

When this information is entered into a Natural Frequencies chart, imagining a base population of 100,000 battered women, in a given year about 40 (that is, 100,000/2,500) will be killed by their batterer, but only about 5 (that is, 100,000/20,000) will be killed by some random stranger (of the kind that O.J. himself claimed to be “seeking”). So, given that a battered woman has been murdered, the probability that the batterer did the deed is 40/45, or about 0.89. Thus, the evidence that Dershowitz brought forward actually supports the hypothesis that O.J. did the deed, rather than undermining it.

OJ Simpson Natural Frequencies Chart
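The natural-frequencies chart translates directly into a few lines of Python (a sketch using the rates quoted above):

```python
population = 100_000                       # imagined battered women, one year
killed_by_batterer = population / 2_500    # 40
killed_by_stranger = population / 20_000   # 5

# Among battered women who were murdered, the fraction killed by the batterer.
prob_batterer = killed_by_batterer / (killed_by_batterer + killed_by_stranger)
print(f"P(batterer | murdered) = {prob_batterer:.3f}")  # 40/45
```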

HCOL 195 10/26/09

October 26, 2009 by bayesrules

Reminder: The project handout just had some ideas, you aren’t restricted to them and I am delighted when a group works on something completely new.

Note that there’s a difference between Bayesian and frequentist ideas. In particular, in Bayesian thought it is perfectly legitimate to talk about the probability of something that is already settled but just happens to be unknown to us. For example, we can talk about the probability that the Nile is over 1000 miles long. Think of it as a bet: at what odds would you be willing to take either side of a bet that the Nile is over 1000 miles long? If you would be willing to take either side at even odds (double or nothing), for example, then you think that there is a 50% probability that the Nile is over 1000 miles long. Frequentists aren’t allowed to use probability this way.

In particular, you should not be thinking of the utilities and losses we’ve been discussing in terms of many, many bets. For example, if you own a house, you shouldn’t be thinking of a lot of identical situations where your house may or may not burn down in a given year. Either it does or it doesn’t. If the house is worth, say, $200,000, and there is a 1 in 1000 chance that it burns down in a year, then the expected loss is $200, so the fair premium for a bet with an insurance company that the house will burn down would be $200, but you would never get an insurance company to take that bet at that price. They will require significantly more, to cover their fixed expenses and (over many different houses with many different customers) to have a high probability of making a profit for their shareholders.

When we discussed a sure $100,000 versus a 50:50 bet of $1,000,000 or nothing, many preferred the sure thing. This is because the additional $900,000 isn’t (for these folks) as valuable as the first $100,000.
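One standard way to model this is a concave utility of wealth. The sketch below assumes logarithmic utility and a hypothetical starting wealth of $10,000 (neither comes from the class discussion); under those assumptions the sure $100,000 has the higher expected utility even though the gamble’s expected value is $500,000.

```python
import math


def expected_log_utility(outcomes, wealth: float) -> float:
    """Expected log utility of final wealth; outcomes are (probability, gain) pairs."""
    return sum(prob * math.log(wealth + gain) for prob, gain in outcomes)


wealth = 10_000                        # hypothetical starting wealth
sure_thing = [(1.0, 100_000)]
gamble = [(0.5, 1_000_000), (0.5, 0)]  # expected value $500,000

eu_sure = expected_log_utility(sure_thing, wealth)
eu_gamble = expected_log_utility(gamble, wealth)
# Certainty equivalent: the sure gain whose utility matches the gamble's expected utility.
certainty_equivalent = math.exp(eu_gamble) - wealth

print(f"sure thing: {eu_sure:.4f}, gamble: {eu_gamble:.4f}")
print(f"certainty equivalent of the gamble: ${certainty_equivalent:,.0f}")
```

The certainty equivalent comes out around $90,500: under these assumptions, this person would trade the gamble for any sure amount above that.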

I’ve already posted the link to the podcast on Elinor Ostrom (see previous post). The podcast says everything.

I noted that the patient is the one who has the responsibility to make decisions about medical care, because the patient is the one who suffers the consequences. The role of the doctor is to explain the treatments, the consequences, and how likely the various outcomes are, in a way that the patient can understand well enough to make informed decisions. Similarly, lawyers cannot tell their clients what they should do. They are like doctors: they explain the law, and the probable consequences if the client decides on various different courses of action.

There are no riskless actions. Even just lying in bed has risks. Crossing the street, you could get hit by a bus and killed. You are willing to do this for a meal worth a few dollars only because the risk of getting hit by a bus is very low. This can be used in principle to decide on how valuable (in dollars) you think your life is.

One student showed that it is better to buy two lottery tickets on different numbers than to buy two tickets on the same number.

We then discussed the problem of which is worse: Convicting an innocent person (CI) or acquitting a guilty one (AG).

If you acquit a guilty one, then that person will be free to reoffend; on the other hand, the fact that we have his fingerprints, DNA, picture, and other information about him might serve to deter him to some degree and will make him easier to catch.

If you convict an innocent one, then the real culprit is still at large, free to commit another crime. We don’t have any accurate information about the culprit (no DNA, no picture, no name, no prints), and the police will have stopped looking for him. So he may be more prone to commit other crimes. In addition, there is an innocent person in prison, which is another bad thing.

On balance, it seems that CI is worse than AG.

This leads us to consider the following decision tree (assuming that the good outcomes, CG and AI are equally good with a loss of 0). We adopt a loss of 1 for the intermediate decision, AG. We put the worst one, CI, at the top of the chance node. I asked you to think about what value of p would make you indifferent between the two decisions. We’ll discuss it next time.

Tree to decide on how bad CI is relative to AG

Podcast about Nobel Prize winner Elinor Ostrom

October 26, 2009 by bayesrules

As I mentioned in class today, there is an interesting discussion with Nobel Prize winner Elinor Ostrom on the NPR website. It may be listened to or downloaded here.

In addition, there are several interesting letters on the mammogram/prostate cancer discussion, which can be found here.

Finally, there’s the article about the M. D. Anderson Cancer Center, in Houston, that I mentioned. You can find it here. Dr. Don Berry, who is mentioned, is a Bayesian statistician. He heads their division of quantitative sciences.

HCOL 195 10/23/09

October 25, 2009 by bayesrules

Today we discussed the homework.

First the lottery problem. There were several things here that not everyone thought of. One important thing is that there were 200,000,000 tickets and a chance of 1/80,000,000 that a particular ticket would win. This means that in a series of such lotteries, we can expect 2.5 winning tickets on average, so you’d have to split the prize about 2.5 ways, making it worth about $112,000,000 to you, not $280,000,000.

A refinement of this is to figure out the probability P(n) that there will be n=0, 1, 2,… other winners, and thus figure out the amount that you’d win in each of these cases, setting up a probability tree with many nodes instead of just putting $112 M. If p is the probability that a single ticket wins (p=1/80 M), then (1-p) is the probability that the ticket loses, and the probability that all N=200,000,000 tickets lose is \((1-p)^N \approx e^{-Np} = e^{-2.5} = 0.0821\). The probability that a specific ticket wins and all the others lose is \(p(1-p)^{N-1} \approx 0.0821\,p\), since there is hardly any difference between \((1-p)^N\) and \((1-p)^{N-1}\). But there are N tickets out there, so the probability that one of them wins and all the others lose is \(Np \times 0.0821 = 0.2052\) (I just realized I wrote the wrong number on the whiteboard). For two, the probability that a particular two win, one buying the ticket first and the other later, and all the others lose, is \(p^2 \times 0.0821\); but there are N choices for the first and (N-1) for the second, giving a factor of \(N(N-1)\), which is essentially \(N^2\), and there are two orders in which the tickets could have been bought, so this has to be multiplied by 1/2, giving a probability for two winners other than yourself of \((Np)^2 \times 0.0821/2 = 0.2565\). In general, for k other winners, the probability is \((Np)^k \times 0.0821/k!\). The calculation of the probabilities of the various branches is outlined here:

Probabilities of the branches

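The branch probabilities are the Poisson formula \((Np)^k e^{-Np}/k!\), since \(0.0821 \approx e^{-2.5}\). A Python sketch, using the $280 M jackpot and the numbers above and ignoring taxes and annuitization:

```python
import math

N = 200_000_000        # tickets sold
p = 1 / 80_000_000     # chance that any one ticket wins
lam = N * p            # expected number of other winners, 2.5
jackpot = 280_000_000


def prob_other_winners(k: int) -> float:
    """Poisson approximation to P(exactly k other winning tickets)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


print(f"(1-p)^N = {(1 - p) ** N:.4f}")  # essentially e^(-2.5)
for k in range(5):
    share = jackpot / (k + 1)           # the jackpot is split among k + 1 winners
    print(f"{k} other winners: probability {prob_other_winners(k):.4f}, your share ${share:,.0f}")
```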

The first few of these are in the picture of the completed tree (without calculations):

Lottery Tree


But there are two other flies in the ointment. First is taxes: You don’t get to keep all the money, you have to give Uncle Sam 39% and some to Jim Douglas (if you live in Vermont). Second is annuitization: The only way you can get the full jackpot is to have the lottery buy an annuity for you that will pay you the amount over 20 years in equal installments. But if you take the money immediately (probably the best choice), they will only give you the amount that they have to pay the insurance company for the annuity, which is about half of the jackpot. So you will get about 0.5*0.6=0.3 of the amounts in the figure. So, instead of $112 M, your net take after taxes would be only about $33.6 M. That’s the figure that really should be entered as the gain, and when you do this (and similarly for the other numbers), the “buy the ticket” branch will actually have a loss.
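Putting the pieces together gives a rough expected value per ticket. This sketch assumes a $1 ticket price (hypothetical; the price isn’t stated above), keeps 0.5 × 0.6 = 0.3 of the jackpot, and averages your share 1/(k+1) over the Poisson number k of other winners, an average that has the closed form (1 - e^(-2.5))/2.5.

```python
import math

jackpot = 280_000_000
p_win = 1 / 80_000_000
lam = 200_000_000 * p_win      # expected number of other winners (2.5)
keep_fraction = 0.5 * 0.6      # lump sum is about half; about 40% goes to taxes
ticket_price = 1.0             # hypothetical ticket price

# E[1/(1+K)] for K ~ Poisson(lam) equals (1 - e^(-lam)) / lam.
expected_share = (1 - math.exp(-lam)) / lam
expected_take = p_win * jackpot * keep_fraction * expected_share
print(f"expected after-tax take per ticket: ${expected_take:.2f}")
print(f"expected net gain per ${ticket_price:.0f} ticket: ${expected_take - ticket_price:.2f}")
```

Under these assumptions each ticket returns under 40 cents on the dollar in expectation.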

What does the lottery do with the $140 M that it doesn’t have to pay out? It uses it to finance its beneficiary, education mostly. So net, the lottery is actually a tax that people are willing to pay.

Then we turned to the lawsuit problem. Most groups did a pretty good job; there were some calculational glitches but they were minor. One group added a third branch, “just continue the lawsuit,” which wasn’t among the choices, but the tree says that this isn’t the best choice (I’ve added this below). The only unusual item in this tree is the second box that comes if the other side makes a counter-counter offer of $3 B. Since this choice comes later on in the logic, it is to the right of the first decision box. The final tree is here:

Lawsuit tree


HCOL 195 10/21/09

October 22, 2009 by bayesrules

This will be short. We filled out the “attitude towards risk” form and plotted our risk profile. We found three typical forms, namely:

Risk Averse Profile


The first, risk-averse profile is a very common profile; it says that a person, when considering a gain, is willing to accept less than the fair or expected value of a risky proposition in order to lock in a sure thing gain. So, for example, one might be willing to accept $4,000 as a sure thing rather than a 50-50 chance on a gain of $10,000. Similarly, one might be willing to accept a sure loss rather than run the risk of a much larger loss that only happens with some probability. Most people have a risk profile like this.

Risk Neutral Profile


This profile is risk neutral (a straight line). It represents the risk profile of a large company, like an insurance company, that has many bets out, some of which it will win and some of which it will lose, but whose aggregate outcome can be predicted statistically with high accuracy by the company’s actuaries. The difference between this kind of risk profile for an insurance company and the risk-averse profile of a typical insurance buyer (the first plot) explains why people willingly pay a fixed premium to an insurance company to (for example) avoid financial disaster if their house burns down, while at the same time the insurance company is willing to take on this risk, since it can charge each policy holder a premium that will, on average, more than cover the expected losses in aggregate. Because of this difference, the insurance company can expect a profit with a high degree of certainty, a profit that will be distributed to the shareholders as a dividend or (in the case of a mutual insurance company, which is owned by the people who have policies) as a reduction of premiums.
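To see the gap between the two profiles in numbers, here is a sketch that reuses the house example from the 10/26 post. It assumes the homeowner has log utility and $250,000 of total wealth (both hypothetical), a $200,000 house and a 1 in 1000 chance of fire; it finds the largest premium the homeowner would pay, which a risk-neutral insurer can compare with its $200 expected payout.

```python
import math

wealth = 250_000       # hypothetical total wealth, including the house
house_value = 200_000
p_fire = 1 / 1000

expected_loss = p_fire * house_value  # what a risk-neutral insurer expects to pay: $200

# Risk-averse homeowner (assumed log utility): expected utility without insurance.
eu_uninsured = (1 - p_fire) * math.log(wealth) + p_fire * math.log(wealth - house_value)
# Largest premium such that log(wealth - premium) equals the uninsured expected utility.
max_premium = wealth - math.exp(eu_uninsured)

print(f"expected loss ${expected_loss:.0f}, homeowner would pay up to ${max_premium:.0f}")
```

Under these assumptions any premium between roughly $200 and $400 leaves both sides better off by their own lights.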

Risk Seeking Profile


This last profile is anomalous: It is risk averse for gains, but risk seeking for losses. It is sometimes seen in practice at casinos, where someone who is “down” may take extraordinary risks to try and get even. This is not usually a good idea.

Finally, we had a visit from Brit Chace, the HC Student Fellowship Advisor, on fellowship and scholarship opportunities (like the Rhodes and Marshall Scholarships, which allow students to study in England, and the Goldwater Scholarships). Her office is right across the hall from our classroom, and she encourages everyone to visit with her and discuss these opportunities.

New York Times article today

October 21, 2009 by bayesrules

The Times had an article today that discusses the consequences of false positives in the context of mammography and the PSA test for prostate cancer. Worth reading!

And here’s another article that came out on Thursday morning.