Archive for October, 2009

HCOL 195 10/28/09

October 29, 2009

Today we picked up where we left off last time. Here’s the board as we left it then:

Estimating Losses

No one thought that the value of p should be as big as 1/2; p = 0.1 seems to be close to the median for the class. When you put this in, the loss for CI that makes the two branches have the same expected loss is 10, since p × (loss for CI) must equal the sure loss of 1 for AG, so the loss for CI is 1/p = 1/0.1 = 10.

Next, we put up a chart showing how a juror would decide a case. The juror would put in the value of p that he or she has estimated from the data, and pick the decision that had the lower expected loss:

Jury Decision Chart

But as the tree shows, this means that the strength of the evidence that would put an innocent person in jail isn’t really very great. Considering that the standard should be “beyond a reasonable doubt,” a probability of 0.9 for guilt seems too low (and the class unanimously thought so). So we bumped the loss for CI up to 100, which means you’d have to be 99% sure of guilt before you’d convict:

Revised Decision Chart
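A minimal Python sketch of the juror’s rule, assuming the losses used above: 0 for AI and CG, 1 for AG, and a chosen loss for CI. The function names are illustrative only. Convicting has expected loss (1 − P(guilt)) × loss(CI) and acquitting has expected loss P(guilt) × loss(AG), so the threshold for conviction is loss(CI)/(loss(CI) + loss(AG)): about 0.91 when the loss for CI is 10, and about 0.99 when it is 100.

```python
# Sketch of the jury decision chart. Assumed losses: AI = CG = 0, AG = 1, CI = loss_ci.

def indifference_loss(p: float) -> float:
    """Loss for CI that makes a sure AG (loss 1) equal to a gamble with CI at probability p."""
    return 1.0 / p

def conviction_threshold(loss_ci: float, loss_ag: float = 1.0) -> float:
    """Probability of guilt above which convicting has the lower expected loss."""
    return loss_ci / (loss_ci + loss_ag)

def decide(p_guilt: float, loss_ci: float, loss_ag: float = 1.0) -> str:
    loss_convict = (1.0 - p_guilt) * loss_ci  # wrong only if innocent (CI)
    loss_acquit = p_guilt * loss_ag           # wrong only if guilty (AG)
    return "convict" if loss_convict < loss_acquit else "acquit"

print(indifference_loss(0.1))        # 10.0
print(conviction_threshold(10))      # about 0.909
print(conviction_threshold(100))     # about 0.990
print(decide(0.95, loss_ci=10), decide(0.95, loss_ci=100))
```

With a probability of guilt of 0.95, the first loss assignment leads to conviction and the second to acquittal.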

We then considered priors, and DNA evidence. For a prior, we considered the idea that in a geographic area, without any evidence (that is, picking someone at random), the prior for someone being guilty should be approximately 1/N where N is the population of the geographic area. So, for example, in Chittenden County we estimated the population at approximately 100,000 (it is actually about 50% higher than that). So the prior probability of guilt is about 1/100,000.

The question was raised, shouldn’t we use something like 1/2? After all, the person is on trial! They wouldn’t get there if they were almost sure of being innocent! The problem with this line of reasoning is that the person would have been indicted by a Grand Jury, which would have based its indictment on the very same evidence that the jury is supposed to consider in the trial. So, even if the evidence convinced the Grand Jury that a trial was warranted, that is, that the probability of guilt was over 0.5, using that number as a prior would in effect be using the same data twice, which is forbidden in Bayesian inference. You have to use a prior that is independent of any of the evidence that will come up in trial, one that depends only on general principles that are known outside of the details of the crime or the defendant. The population idea is one such prior; you might get another factor of two if the defendant were a man, since most crimes are committed by men. But that’s an insignificant factor. Also, that factor can just as well be built into the likelihood, which is probably a better place to do it.

We then considered hypothetical DNA data that has a probability of 1 in a million of matching a randomly chosen person (but a 1 in 1 chance of matching the perpetrator, of course). This is commonly thought to mean that there is a 1 in a million chance that the defendant is innocent, but this is incorrect. P(match|innocent) is not equal to P(innocent|match), and thinking that they are equal is known as the “prosecutor’s fallacy.” The actual calculation is shown in the chart below:

DNA Decision

The calculation gives a probability of guilt of about 0.9, which is insufficient to convict if the loss for CI is 100. More (independent) evidence would be needed.
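The DNA calculation can be reproduced with exact fractions. This sketch assumes, as in class, a prior of 1/100,000 (the Chittenden County estimate), P(match | guilty) = 1 and P(match | innocent) = 1/1,000,000, with guilty and innocent as the only two states of nature.

```python
from fractions import Fraction

# Prior from the population estimate; likelihoods from the DNA test.
prior_guilty = Fraction(1, 100_000)
prior_innocent = 1 - prior_guilty
like_guilty = Fraction(1)                 # the perpetrator always matches
like_innocent = Fraction(1, 1_000_000)    # a random person matches 1 time in a million

joint_guilty = prior_guilty * like_guilty
joint_innocent = prior_innocent * like_innocent
p_guilty = joint_guilty / (joint_guilty + joint_innocent)

print(float(p_guilty))              # about 0.91, nowhere near 0.999999
print(float(p_guilty) > 100 / 101)  # False: short of the 99% needed when loss(CI) = 100
```

Confusing P(match | innocent) = 10⁻⁶ with P(innocent | match) would give 0.999999 instead; that is the prosecutor’s fallacy.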

We then turned to the O.J. Simpson case. One of his lawyers, Alan Dershowitz, had remarked to the press that in any given year, only 1 in 2500 batterers goes on to murder his partner. He meant this to show that it was unlikely that O.J. committed the crime, but it doesn’t take into account the fact that in a given year, only 1 in 20,000 women is killed by a random stranger:

Probabilities for OJ Simpson

When this information is entered into a Natural Frequencies chart, imagining a base population of 100,000 battered women, about 40 (that is, 100,000/2,500) will be killed by their batterer in a given year, but only 5 (that is, 100,000/20,000) will be killed by some random stranger (of the kind that O.J. himself claimed to be “seeking”). So, given that a battered woman has been murdered, the probability that the batterer did the deed is 40/45 ≈ 0.89, nearly 0.9. Thus, the evidence that Dershowitz brought forward actually supports the hypothesis that O.J. did the deed, rather than undermining it.

OJ Simpson Natural Frequencies Chart
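The natural-frequency argument is simple arithmetic, sketched here in Python. It assumes, as the chart does, that the woman has been murdered and that the only two possible killers are the batterer or a random stranger.

```python
# Natural frequencies for one year, imagining 100,000 battered women.
base = 100_000
killed_by_batterer = base / 2_500    # 1 in 2,500 batterers kill their partner
killed_by_stranger = base / 20_000   # 1 in 20,000 women killed by a stranger

# Among the murdered women, the fraction killed by the batterer.
p_batterer = killed_by_batterer / (killed_by_batterer + killed_by_stranger)
print(killed_by_batterer, killed_by_stranger)  # 40.0 5.0
print(round(p_batterer, 3))                    # 0.889
```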

HCOL 195 10/26/09

October 26, 2009

Reminder: The project handout just had some ideas; you aren’t restricted to them, and I am delighted when a group works on something completely new.

Note that there’s a difference between Bayesian and frequentist ideas. In particular, in Bayesian thought it is perfectly legitimate to talk about the probability of something that just happens to be unknown to us, but is perfectly certain. For example, we can talk about the probability that the Nile is over 1000 miles long. Think of it as a bet: at what odds would you be equally willing to take either side of a bet that the Nile is over 1000 miles long? If you would be willing to take either side at even odds (double or nothing), then you think that there is a 50% probability that the Nile is over 1000 miles long. Frequentists aren’t allowed to use probability this way.

In particular, you should not be thinking of the utilities and losses we’ve been discussing in terms of many, many bets. For example, if you own a house, you shouldn’t be thinking of a lot of identical situations where your house may or may not burn down in a given year. Either it does or it doesn’t. If the house is worth, say, $200,000, and there is a 1 in 1000 chance that it burns down in a year, then the fair premium, that is, the expected value of a bet with an insurance company that the house will burn down, is $200,000/1000 = $200. But you would never get an insurance company to take that bet. They will require significantly more, to cover their fixed expenses and (over many different houses with many different customers) to have a high probability of making a profit for their shareholders.

When we discussed a sure $100,000 versus a 50:50 bet of $1,000,000 or nothing, many preferred the sure thing. This is because the additional $900,000 isn’t (for these folks) as valuable as the first $100,000.
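To see how preferring the sure thing can be perfectly rational, here is a sketch with a hypothetical utility, the logarithm of total wealth, and a hypothetical starting wealth of $10,000. Neither comes from the class; any utility curve that bends downward enough would show the same effect.

```python
import math

# HYPOTHETICAL utility: log of total wealth. Not a curve measured in class.
def utility(wealth: float) -> float:
    return math.log(wealth)

current_wealth = 10_000  # hypothetical starting wealth

sure = utility(current_wealth + 100_000)
gamble = 0.5 * utility(current_wealth + 1_000_000) + 0.5 * utility(current_wealth)
print(sure > gamble)  # True: the sure $100,000 wins, despite the gamble's $500,000 expected value

# The extra $900,000 adds less utility than the first $100,000 does.
first_100k = utility(110_000) - utility(10_000)
next_900k = utility(1_010_000) - utility(110_000)
print(next_900k < first_100k)  # True
```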

I’ve already posted the link to the podcast on Elinor Ostrom (see previous post). The podcast says everything.

I noted that the patient is the one who has the responsibility to make decisions about medical care. This is because the patient is the one who suffers the consequences. The role of the doctor is to explain the treatments, the consequences, and how likely the various outcomes are, in a way that the patient can understand well enough to make informed decisions. Similarly, lawyers cannot tell their clients what they should do. They are like doctors: they explain the law, and the probable consequences if the client decides on various different courses of action.

There are no riskless actions. Even just lying in bed has risks. Crossing the street, you could get hit by a bus and killed. You are willing to do this for a meal worth a few dollars only because the risk of getting hit by a bus is very low. This can be used in principle to decide on how valuable (in dollars) you think your life is.

One student showed that it is better to buy two lottery tickets on different numbers than to buy two tickets on the same number.

We then discussed the problem of which is worse: Convicting an innocent person (CI) or acquitting a guilty one (AG).

If you acquit a guilty one, then that person will be free to reoffend; on the other hand, the fact that we have his fingerprints, DNA, picture, and other information about him might serve to deter him to some degree and will make him easier to catch.

If you convict an innocent one, then the real culprit is still at large, free to commit another crime. We don’t have any accurate information about the culprit (no DNA, no picture, no name, no prints), and the police will have stopped looking for him. So he may be more prone to commit other crimes. In addition, there is an innocent person in prison, which is another bad thing.

On balance, it seems that CI is worse than AG.

This leads us to consider the following decision tree (assuming that the good outcomes, CG and AI, are equally good, with a loss of 0). We adopt a loss of 1 for the intermediate outcome, AG. We put the worst one, CI, at the top of the chance node. I asked you to think about what value of p would make you indifferent between the two decisions. We’ll discuss it next time.

Tree to decide on how bad CI is relative to AG

Podcast about Nobel Prize winner Elinor Ostrom

October 26, 2009

As I mentioned in class today, there is an interesting discussion with Nobel Prize winner Elinor Ostrom on the NPR website. It may be listened to or downloaded here.

In addition, there are several interesting letters on the mammogram/prostate cancer discussion, which can be found here.

Finally, there’s the article about the M. D. Anderson Cancer Center, in Houston, that I mentioned. You can find it here. Dr. Don Berry, who is mentioned, is a Bayesian statistician. He heads their division of quantitative sciences.

HCOL 195 10/23/09

October 25, 2009

Today we discussed the homework.

First the lottery problem. There were several things here that not everyone thought of. One important thing is that there were 200,000,000 tickets and a chance of 1/80,000,000 that a particular ticket would win. This means that in a series of such lotteries, we can expect 2.5 tickets to win on average, so as a rough estimate the jackpot would be split about 2.5 ways, making the prize worth about $112,000,000 (that is, $280,000,000/2.5) to you, not $280,000,000.

A refinement of this is to figure out the probability P(n) that there will be n = 0, 1, 2, … other winners, and thus figure out the amount that you’d win in each of these cases, setting up a probability tree with many nodes instead of just putting in $112 M. If p is the probability that a single ticket wins (\(p = 1/80{,}000{,}000\)), then \(1-p\) is the probability that the ticket loses, and the probability that all N = 200,000,000 tickets lose is \((1-p)^N = 0.0821\). The probability that a specific ticket wins and all the others lose is \(p(1-p)^{N-1} \approx p \times 0.0821\), since there is hardly any difference between \((1-p)^N\) and \((1-p)^{N-1}\). But there are N tickets out there, so the probability that one of them wins and all the others lose is \(Np \times 0.0821 = 0.2052\) (I just realized I wrote the wrong number on the whiteboard). For two, the probability that a particular two win, one buying the ticket first and the other later, and all the others lose, is \(p^2 \times 0.0821\); but there are N choices for the first and \(N-1\) for the second, giving a factor of \(N(N-1)\), which is essentially \(N^2\), and there are two orders in which the tickets could have been bought, so this has to be multiplied by 1/2, giving a probability for two winners other than yourself of \((Np)^2 \times 0.0821/2 = 0.2565\). In general, for k other winners, the probability is \((Np)^k \times 0.0821/k!\). The calculation of the probabilities of the various branches is outlined here:

Probabilities of the branches
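The general formula for k other winners is the Poisson distribution with mean Np = 2.5, and 0.0821 is e^(−2.5). The Python sketch below reproduces the branch probabilities and, as an extra step not done in class, sums the branches to get your expected share of the jackpot if you win. It assumes a $280 M jackpot split equally among all winners, before taxes and annuitization; the variable names are illustrative.

```python
import math

N = 200_000_000          # other tickets sold
p = 1 / 80_000_000       # chance that any one ticket wins
jackpot = 280e6          # dollars, before taxes and the lump-sum discount
lam = N * p              # expected number of other winners (2.5)

def prob_other_winners(k: int) -> float:
    """Poisson approximation: (N p)^k e^{-N p} / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

# First few branches: number of other winners, probability, your share in $ millions.
for k in range(4):
    print(k, round(prob_other_winners(k), 4), round(jackpot / (k + 1) / 1e6, 1))

# Expected share if you win, summing the branches (truncated at 60 other winners).
expected_share = sum(prob_other_winners(k) * jackpot / (k + 1) for k in range(60))
print(round(expected_share / 1e6, 1))  # in millions of dollars
```

Under these assumptions the sum comes to roughly $103 M, a little below the rough $112 M estimate, because the branches with several co-winners pull the average down.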

The first few of these are in the picture of the completed tree (without calculations):

Lottery Tree

But there are two other flies in the ointment. First is taxes: You don’t get to keep all the money; you have to give Uncle Sam 39%, and some to Jim Douglas (if you live in Vermont). Second is annuitization: The only way you can get the full jackpot is to have the lottery buy an annuity for you that will pay you the amount over 20 years in equal installments. But if you take the money immediately (probably the best choice), they will only give you the amount that they would have to pay the insurance company for the annuity, which is about half of the jackpot. So you will get about 0.5 × 0.6 = 0.3 of the amounts in the figure. So your net take after taxes would be, not $112 M, but only about $33.6 M (and even if you were the only winner, only about 0.3 × $280 M = $84 M). Those are the figures that really should be entered as the gains, and when you do this (and similarly for the other numbers), the “buy the ticket” branch will actually have a loss.
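Here is a sketch of the expected value of the “buy the ticket” branch after the 0.3 factor. The ticket price is NOT given in the notes; $1 is a hypothetical value used only for illustration, and smaller prizes are ignored. The expected share over all branches uses the Poisson identity Σ P(k)/(k+1) = (1 − e^(−λ))/λ, where λ = Np is the expected number of other winners.

```python
import math

p = 1 / 80_000_000          # chance that your ticket wins the jackpot
lam = 200_000_000 * p       # expected number of other winners (2.5)
jackpot = 280e6
keep_fraction = 0.5 * 0.6   # lump sum is about half; about 60% survives taxes
ticket_price = 1.0          # HYPOTHETICAL price, not stated in the notes

# Expected gross share if you win, averaged over the number of co-winners (closed form).
expected_share = jackpot * (1 - math.exp(-lam)) / lam
expected_net_gain = p * keep_fraction * expected_share - ticket_price
print(round(expected_net_gain, 2))  # negative: on average the ticket loses money
```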

What does the lottery do with the $140 M that it doesn’t have to pay out? It uses it to finance its beneficiary, education mostly. So net, the lottery is actually a tax that people are willing to pay.

Then we turned to the lawsuit problem. Most groups did a pretty good job; there were some calculational glitches but they were minor. One group added a third branch, “just continue the lawsuit,” which wasn’t among the choices, but the tree says that this isn’t the best choice (I’ve added this below). The only unusual item in this tree is the second box that comes if the other side makes a counter-counter offer of $3 B. Since this choice comes later on in the logic, it is to the right of the first decision box. The final tree is here:

Lawsuit tree

HCOL 195 10/21/09

October 22, 2009

This will be short. We filled out the “attitude towards risk” form and plotted our risk profile. We found three typical forms, namely:

Risk Averse Profile

The first, the risk-averse profile, is very common; it says that a person, when considering a gain, is willing to accept less than the fair or expected value of a risky proposition in order to lock in a sure gain. So, for example, one might be willing to accept $4,000 as a sure thing rather than a 50-50 chance at a gain of $10,000 (whose expected value is $5,000). Similarly, one might be willing to accept a sure loss rather than run the risk of a much larger loss that only happens with some probability. Most people have a risk profile like this.
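The gap between the sure amount and the expected value can be illustrated with a hypothetical concave utility. This sketch uses an exponential utility u(x) = 1 − e^(−x/R) with an assumed risk tolerance R = $10,000; both the form and R are illustrative choices, not the class’s measured profile. The certainty equivalent is the sure amount whose utility equals the gamble’s expected utility.

```python
import math

R = 10_000.0  # HYPOTHETICAL risk tolerance in dollars

def u(x: float) -> float:
    """Hypothetical risk-averse utility (concave in x)."""
    return 1.0 - math.exp(-x / R)

def u_inverse(v: float) -> float:
    """Dollar amount whose utility is v."""
    return -R * math.log(1.0 - v)

# A 50-50 chance at $10,000 or nothing.
expected_value = 0.5 * 10_000 + 0.5 * 0
expected_utility = 0.5 * u(10_000) + 0.5 * u(0)
certainty_equivalent = u_inverse(expected_utility)

print(expected_value)                # 5000.0
print(round(certainty_equivalent))   # about 3,800: below the expected value
```

A risk-neutral (straight-line) utility would make the certainty equivalent equal to the expected value of $5,000.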

Risk Neutral Profile

This profile is risk neutral (a straight line). It represents the risk profile of a large company, like an insurance company, that has many bets out, some of which it will win and some of which it will lose, but whose overall results can be predicted statistically with high accuracy by the company’s actuaries. The difference between this kind of risk profile for an insurance company and the risk-averse profile of a typical insurance buyer (the first plot) explains why people willingly pay a fixed premium to an insurance company to (for example) avoid financial disaster if their house burns down, while at the same time the insurance company is willing to take on this risk, since it can charge each policy holder a premium that will, on average, more than cover the expected losses in aggregate. Because of this difference, the insurance company can expect a profit with a high degree of certainty, a profit that will be distributed to the shareholders as a dividend or (in the case of a mutual insurance company, which is owned by the people who have policies) as a reduction of premiums.
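Here is a sketch of why pooling lets an insurer behave as if risk neutral, reusing the $200,000 house and the 1-in-1000 annual chance of fire from the 10/26 notes. The portfolio of 100,000 independent, identical policies and the $250 premium are hypothetical numbers chosen only for illustration. The code adds up the binomial probabilities of 0, 1, 2, … claims, building each term from the previous one so that nothing underflows, and stops once the claims would use up all the premiums.

```python
n = 100_000          # HYPOTHETICAL number of independent policies
p = 1 / 1000         # chance that a given house burns down in a year
house_value = 200_000
premium = 250        # HYPOTHETICAL premium, above the fair $200

fair_premium = p * house_value   # expected loss per policy, about $200
revenue = n * premium

# P(total claims < total premiums), summing Binomial(n, p) probabilities.
pmf = (1 - p) ** n               # probability of zero claims
prob_profit = 0.0
for k in range(n + 1):
    if k * house_value >= revenue:
        break
    prob_profit += pmf
    pmf *= (n - k) / (k + 1) * p / (1 - p)   # P(k+1 claims) from P(k claims)

print(round(fair_premium))       # 200
print(round(prob_profit, 3))     # chance the insurer comes out ahead this year
```

For these made-up numbers the chance of a profit comes out at roughly 0.99, while any single homeowner faces an all-or-nothing $200,000 risk.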

Risk Seeking Profile

This last profile is anomalous: It is risk averse for gains, but risk seeking for losses. It is sometimes seen in practice at casinos, where someone who is “down” may take extraordinary risks to try to get even. This is not usually a good idea.

Finally, we had a visit from Brit Chace, the HC Student Fellowship Advisor, on fellowship and scholarship opportunities (like the Rhodes and Marshall Scholarships, which allow students to study in England, and the Goldwater Scholarships). Her office is right across the hall from our classroom, and she encourages everyone to visit with her and discuss these opportunities.

New York Times article today

October 21, 2009

The Times had an article today that discusses the consequences of false positives in the context of mammography and the PSA test for prostate cancer. Worth reading!

And here’s another article that came out on Thursday morning.

HCOL 195 10/19/09

October 20, 2009

We discussed the test. No problems with #1. On #2 the most effective way to do it is to list the possibilities and count up those that have the first child a boy (hence the king) and then count the individual cases: two brothers, two sisters, one brother and one sister. The four cases that are relevant are

BBB, BBG, BGB, BGG

Note that BBG and BGB are not the same. Therefore there is a 1/4 probability of two brothers, the same for two sisters, and a 1/2 probability of one brother and one sister. These add up to 1.
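The counting can be checked by brute-force enumeration. This sketch assumes, as the listed cases imply, three children, each equally likely to be a boy or a girl, with the king being the first-born and therefore a boy.

```python
from itertools import product

# All birth orders of three children whose first child (the king) is a boy.
families = [f for f in product("BG", repeat=3) if f[0] == "B"]

counts = {"two brothers": 0, "two sisters": 0, "one of each": 0}
for family in families:
    boys = family[1:].count("B")   # the king's two siblings
    if boys == 2:
        counts["two brothers"] += 1
    elif boys == 0:
        counts["two sisters"] += 1
    else:
        counts["one of each"] += 1

for outcome, count in counts.items():
    print(outcome, count, "/", len(families))
```

BBG and BGB are counted separately, which is why “one of each” gets 2 of the 4 equally likely cases.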

In problem #3 there are 10 SONs (states of nature: 1, 2, …, 10 fish in the lake). The likelihood for each SON is (SON/SON)*((SON-1)/SON)*(2/SON)*(2/SON). The denominator is always the SON, since that’s how many fish there are in the lake each time we catch one. The first two numerators represent the number of untagged fish left in the lake, and the second two the number of tagged fish in the lake, for the four fish we caught (the first two untagged, the last two tagged). One student started with the smallest SON = 5, but that’s not what the statement of the problem says. Also, tagged vs. untagged are not states of nature; they are data.
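A sketch of the spreadsheet for problem #3, using exact fractions. The prior is ASSUMED uniform over the 10 states of nature, since the problem’s prior is not restated in these notes. The likelihood follows the formula above: each fish is tagged (if needed) and released before the next catch, so the lake always holds SON fish.

```python
from fractions import Fraction

states = range(1, 11)  # SON = number of fish in the lake

def likelihood(n: int) -> Fraction:
    # untagged, untagged (one fish now tagged), tagged, tagged (two fish tagged)
    return Fraction(n, n) * Fraction(n - 1, n) * Fraction(2, n) * Fraction(2, n)

prior = {n: Fraction(1, 10) for n in states}          # ASSUMED uniform prior
joint = {n: prior[n] * likelihood(n) for n in states}
marginal = sum(joint.values())
posterior = {n: joint[n] / marginal for n in states}

for n in states:
    print(n, round(float(posterior[n]), 4))
```

SON = 1 gets zero posterior (you cannot catch an untagged fish after tagging the only one), and SON = 2 comes out most probable.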

Problem #4 is easiest done by using natural frequencies. If we have 2000 patients (we may as well use that number, as it is directly useful for the last question), then 1%, or 20 patients, will have the disease and 1980 will not. Of the 20 who have the disease, 19, or 95%, will test positive; the remaining one will test negative. Of the 1980 patients who don’t have the disease, 4%, or 79, will test positive (it’s really 79.2, but we can round here without appreciable error). That’s the answer to the question about the number of false positives in the group of 2000 patients. The probability of having the disease, given that you test positive, is 19/(19+79) = 19/98, or a little over 0.19.
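The same natural-frequency bookkeeping in Python, with the rates from the problem (1% prevalence, 95% of the sick and 4% of the healthy testing positive) and the same rounding to whole patients.

```python
patients = 2000
sick = round(0.01 * patients)            # 20 with the disease
healthy = patients - sick                # 1980 without it
true_positives = round(0.95 * sick)      # 19
false_positives = round(0.04 * healthy)  # 79.2, rounded to 79

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(false_positives, round(p_disease_given_positive, 3))  # 79 0.194
```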

The fifth problem has a table that represents independence. The marginals are .25 and .75 in the horizontal direction and .5, .2 and .3 in the vertical direction. Each entry in the joint table is the product of the corresponding marginals, which proves the result. To make it dependent while keeping the same marginals, you can add a fixed number to one entry and subtract it from another in each of two rows, arranged so that each of the two affected columns also gets one addition and one subtraction; this involves changing four numbers in the table.
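A sketch of both directions with exact fractions, using the marginals from problem #5. The layout (two rows with marginals .25 and .75, three columns with marginals .5, .2 and .3) is an assumed orientation, and the shift d = 1/40 is an arbitrary small number that keeps every entry non-negative.

```python
from fractions import Fraction as F

row_marg = [F(1, 4), F(3, 4)]
col_marg = [F(1, 2), F(1, 5), F(3, 10)]

def independent_table(rows, cols):
    """The only joint table with these marginals that represents independence."""
    return [[r * c for c in cols] for r in rows]

def is_independent(table):
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    # Check EVERY entry against the product of its row and column marginals.
    return all(table[i][j] == rows[i] * cols[j]
               for i in range(len(rows)) for j in range(len(cols)))

table = independent_table(row_marg, col_marg)
print(is_independent(table))       # True

# +d and -d in each of two rows, so each of two columns also gets one +d and one -d.
d = F(1, 40)
dependent = [row[:] for row in table]
dependent[0][0] += d
dependent[0][1] -= d
dependent[1][0] -= d
dependent[1][1] += d
print(is_independent(dependent))   # False
print([str(sum(r)) for r in dependent], [str(sum(c)) for c in zip(*dependent)])
```

The last line shows that the row marginals (1/4, 3/4) and column marginals (1/2, 1/5, 3/10) are unchanged.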

For the last problem, pick either gains or losses and stick to it. Losses are easiest: there is a loss of $800 × 10 million, or $8 billion, if we require installation; if we do not require it, then there will be a loss of 10,000 × $5 million, or $50 billion, due to lives lost that might have been saved. The second loss is greater, so we should reject that branch and require installation of the safety device. Note that you use each number exactly once: some students tried to use the numbers on both branches, once as a gain and once as a loss, but that doesn’t work.
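The comparison of the two branches, written as losses with each number used exactly once (figures from the problem).

```python
loss_require = 800 * 10_000_000        # $800 x 10 million: $8 billion
loss_not_require = 10_000 * 5_000_000  # 10,000 lives x $5 million: $50 billion

decision = "require installation" if loss_require < loss_not_require else "do not require"
print(loss_require, loss_not_require, decision)
```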

I asked whether people would prefer $100,000 as a sure thing or a 50% chance at $1 million and a 50% chance of nothing. About half the class preferred the sure thing, and half the gamble. We then asked: what if the probability of getting the $1 million were 0.1, 0.2, …, 0.9, 1.0? As the probability ramped up, more people were willing to take the gamble, but two students would only go for the $1 million if it were a “sure thing,” that is, if the probability were 1.

Then we talked about being on a jury. We decided that the four possibilities are: AI (acquit someone who is innocent), CI (convict someone who is innocent), AG (acquit someone who is guilty) and CG (convict someone who is guilty). We discussed which of these were the best and the worst outcomes. While it is clear that making a right decision (AI or CG) is good, and making a wrong decision (CI or AG) is bad, we didn’t come to agreement as to how to order the two good ones and the two bad ones. We’ll bring this up again later.

HCOL 195 10/16/09

October 17, 2009

Today we just talked about my experiences with the Hubble Telescope project, and in particular how the bad mirror happened and what was done about it.

Monday, I’ll return the graded tests and discuss the results.


October 14, 2009

Some important points:

There is NO journal due on Friday this week. Next journal is due on Friday, October 23.

I intend to talk about something fun on Friday, namely, how we use Bayesian inference to solve problems in astronomy. I’ll describe some work that I have been involved with.

HCOL 195 10/12/09

October 12, 2009

The first thing we looked at was the decision problem at the end of the study sheet.

Recall that the factory can produce 24 parts per day. At the beginning of the day there is a 90% chance that the machine is in a “good” state that will produce 95% of parts “good”, worth $2000 each, and 5% “bad”, worth nothing. But there is a 10% chance that the machine is in a “bad” state that will produce only 70% “good” parts and 30% “bad” parts.

We can test a part quickly and at no cost to tell if it is good or bad.

We can call a repairman who can, for $150, put the machine into its “good” state; but that takes the same time as it takes to produce a part, so if we call him in, we’ll make one fewer part.

We decided that there are three things we could do:

  1. Just run the machine and make parts, regardless of the state of the machine
  2. Call the repairman at the beginning of the day and put the machine into the “good” state, regardless
  3. Produce one part; check it. If it’s good, run the machine for the rest of the day as it is. If it’s bad, call the repairman in and have him put the machine in its “good” state and then produce the parts for the rest of the day

So we set up a decision tree to correspond to this.

Problem on decision theory from review sheet

We weren’t quite sure where to put the toll gate for the +$2000 value of the good part produced in the third scenario. We put it before the probability; I now think we should have put it afterwards. Not to worry, we’ll discuss this on Friday and in any case any decision tree I put on the exam won’t have this kind of complication.

Recall that on the exam I will want you to set up the tree, but not do the calculation. But I will expect you to explain to me how the calculation of the tree should be done.
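For anyone curious how the calculation of this tree would go, here is a Python sketch (not something you need to do on the exam). Its assumptions: each part is good independently given the machine’s state, a repair uses up one production slot, and in the third strategy the first part’s $2000 is counted only when it turns out good, that is, after the chance node. Seeing a good first part raises the probability that the machine is in its good state, by Bayes’ theorem.

```python
PARTS, VALUE, REPAIR = 24, 2000, 150
P_GOOD_STATE = 0.9
GOOD_RATE = {"good": 0.95, "bad": 0.70}  # fraction of good parts in each machine state

def expected_good_rate(p_good_state: float) -> float:
    return p_good_state * GOOD_RATE["good"] + (1 - p_good_state) * GOOD_RATE["bad"]

# 1. Just run the machine all day.
run = PARTS * VALUE * expected_good_rate(P_GOOD_STATE)

# 2. Call the repairman first, then make 23 parts in the good state.
repair = -REPAIR + (PARTS - 1) * VALUE * GOOD_RATE["good"]

# 3. Make one part and test it.
p_first_good = expected_good_rate(P_GOOD_STATE)
post_if_good = P_GOOD_STATE * GOOD_RATE["good"] / p_first_good   # Bayes update
value_if_good = VALUE + (PARTS - 1) * VALUE * expected_good_rate(post_if_good)
value_if_bad = -REPAIR + (PARTS - 2) * VALUE * GOOD_RATE["good"]  # slots: test part + repair
test_first = p_first_good * value_if_good + (1 - p_first_good) * value_if_bad

for name, value in [("run", run), ("repair", repair), ("test first", test_first)]:
    print(f"{name:10s} {value:10.2f}")
```

Under these assumptions the expected values come out to about $44,400, $43,550 and $44,591, so testing the first part is slightly better than just running the machine.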

I was asked about what marginalization means. It means adding up a joint probability over all the values of one of the variables, so as to be able to ignore that variable. So, in the tree below, to get the probability that someone will get a disease, regardless of whether or not they have the gene implicated in it, we add the probability of (disease, gene) to the probability of (disease, no gene) to get the probability of (disease). In the chart below, we had observed the result of the test (+), so this is the probability of getting the disease if you test positive.

What we said about marginalizing
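In symbols, marginalizing over y adds up the joint probability over every value of y; the disease example is the case where y is “gene” or “no gene” and everything is conditioned on the positive test:

```latex
P(x) = \sum_{y} P(x, y)
\qquad\Longrightarrow\qquad
P(\text{disease} \mid +) = P(\text{disease}, \text{gene} \mid +) + P(\text{disease}, \text{no gene} \mid +)
```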

Then we looked at two “Monty Hall” problems. Both are “4-door” problems. In both, you have picked Door #1. In the first, Monty opens one door, #2, to show a goat. Since he has a choice of three doors to open if the prize is behind the door you picked, the likelihood in that case is 1/3. He can’t open door #2 if the prize is there, so the likelihood in that case is 0. If the prize is behind either door #3 or #4 then he has a choice of two doors to open, and will open door #2 half the time, so the likelihood for those two cases is 1/2. Below is the spreadsheet we wrote down.

The first Monty Hall problem we solved

For such a simple spreadsheet with just a few numbers, you should be prepared to complete the calculation. You wouldn’t need a calculator for this, just simple skills with fractions.
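Here is one way to complete the first spreadsheet, sketched with exact fractions. It assumes equal priors of 1/4 for each door (the prior column isn’t restated above) and uses the likelihoods just derived.

```python
from fractions import Fraction as F

# You picked Door #1; Monty opened Door #2 and showed a goat.
doors = [1, 2, 3, 4]
prior = {d: F(1, 4) for d in doors}                       # ASSUMED equal priors
likelihood = {1: F(1, 3), 2: F(0), 3: F(1, 2), 4: F(1, 2)}  # P(opens #2 | prize at d)

joint = {d: prior[d] * likelihood[d] for d in doors}
marginal = sum(joint.values())
posterior = {d: joint[d] / marginal for d in doors}

print(marginal)                                   # 1/3
print({d: str(p) for d, p in posterior.items()})  # {1: '1/4', 2: '0', 3: '3/8', 4: '3/8'}
```

Switching to Door #3 or #4 raises your chance from 1/4 to 3/8.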

The second problem is different. In this one, Monty opens all the doors except Door #4 (that is, he opens doors #2 and #3). If the prize is behind Door #1, he has a choice of doors to leave shut, either #2, #3 or #4. So it’s a probability of 1/3 that he opens doors #2 and #3 and leaves #4 shut. So that’s the likelihood for that case. If the prize is behind door #2 or #3, he can’t open one of those doors and must open door #4 to leave one door shut, so the likelihood that he opens #2 and #3 in either of these cases is 0. Finally, if the prize is behind door #4, he is certain to open both #2 and #3 so the likelihood in that case is 1. Here’s the start on that spreadsheet:

The second Monty Hall problem we solved

You all know how to complete this.
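For checking your work, here is the completed spreadsheet, again assuming equal priors of 1/4. Each joint entry is prior × likelihood, and each posterior is the joint divided by the marginal 1/3:

```latex
\begin{array}{c|c|c|c|c}
\text{Prize behind} & \text{Prior} & \text{Likelihood} & \text{Joint} & \text{Posterior} \\ \hline
\#1 & 1/4 & 1/3 & 1/12 & 1/4 \\
\#2 & 1/4 & 0   & 0    & 0   \\
\#3 & 1/4 & 0   & 0    & 0   \\
\#4 & 1/4 & 1   & 1/4  & 3/4 \\ \hline
\text{Sum} & 1 &  & 1/3 & 1
\end{array}
```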

Questions were asked about independence and dependence. First, the tables. For a given set of row and column marginal probabilities, there is only one way to write down a table of joint probabilities that represents independence: if the variables are independent, then P(x,y)=P(x)P(y), so each joint probability is the product of the corresponding marginals:

Independent table of joint probabilities

Conversely, since independence means P(x,y)=P(x)P(y) for all x and y, to check whether a table represents independence, just see if every joint probability is the product of the corresponding (row, column) marginals. If they are, the table is independent; otherwise it is not. IMPORTANT: You must check every entry in the table.

To convert a table that is dependent into one that is independent, just multiply the marginals as above. To convert a table that is independent into one that is dependent, but keep the same marginals, you can add a number to one entry and subtract the same number from another in each of two rows, arranged so that each of the two columns involved also gets one addition and one subtraction, as below:

Converting an independent table into a dependent table

There is only one way that a particular set of marginals can represent independence; there are infinitely many ways that they can represent dependence.

You should be able to prove that all three of the following are equivalent (that is, if one of them is true for all x,y, then the other two are also true): P(x|y)=P(x), P(x,y)=P(x)P(y), P(y|x)=P(y). HINT: You can just use the definition of independence, and Bayes’ theorem.

We then looked at the plagiarism example. Here, in a table of 1000 numbers there will be about 100 that can be rounded either up or down because they end in a 5. By flipping a coin for each rounding, we can embed a secret code in the table. The probability that someone would independently have the same rounding pattern if he flipped coins is 1 in \(2^{100}\), or about 1 in \(10^{30}\). But if he copied our table, he would have the same rounding pattern for sure. So the spreadsheet looks like this:

Solution of the plagiarism problem

I was asked why we don’t have to have the numbers in the prior column add up to 1. The reason is that whatever we choose, the common factor will cancel out in the end. In the above problem, if we put (a,a) in the prior column, the marginal will also have a factor of a, which will cancel out, regardless of the value of a, in the posterior column:

Why we don't have to normalize the prior (a cancels out!)
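A sketch of the plagiarism spreadsheet that also checks the cancellation argument: both priors are set to the same value a, and the posterior comes out identical for several arbitrary choices of a. The function name is illustrative.

```python
from fractions import Fraction as F

# Likelihood of an identical pattern of 100 coin-flip roundings under each hypothesis.
like_copied = F(1)
like_independent = F(1, 2**100)

def posterior_copied(a: F) -> F:
    """Posterior probability of copying when both prior entries are set to a."""
    joint_copied = a * like_copied
    joint_independent = a * like_independent
    return joint_copied / (joint_copied + joint_independent)   # a cancels here

results = {a: posterior_copied(a) for a in (F(1, 2), F(1), F(7))}
print(len(set(results.values())) == 1)   # True: the choice of a doesn't matter
print(float(1 - results[F(1, 2)]))       # about 7.9e-31, roughly 1 in 10^30
```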