Today we picked up where we left off last time. Here’s the board as we left it then:
No one thought that the value of p should be as big as 1/2; p=0.1 seems to be close to the median for the class. When you put this in, the value of the loss for CI (convicting the innocent) that makes the two branches have the same expected loss is 10.
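The break-even point can be checked with a few lines of arithmetic. This is a sketch, not the board's exact tree: it assumes (as is conventional) that correct verdicts cost 0, that acquitting the guilty costs 1, and that p is the probability of innocence at the indifference point.

```python
# Break-even loss for convicting the innocent (CI), assuming:
#   - correct verdicts cost 0
#   - acquitting the guilty (AG) costs 1
#   - p is the probability the defendant is innocent
p = 0.1  # roughly the class median

# Expected loss of convicting: p * L_CI (you only lose if innocent)
# Expected loss of acquitting: (1 - p) * 1 (you only lose if guilty)
# Setting them equal and solving for L_CI:
L_CI = (1 - p) / p
print(round(L_CI, 2))  # 9.0
```

For small p this is approximately 1/p = 10, which matches the round number on the board.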
Next, we put up a chart showing how a juror would decide a case. The juror would put in the value of p that he or she has estimated from the evidence, and pick the decision that had the lowest expected loss:
But as the tree shows, this means that the strength of the evidence that would put an innocent person in jail isn’t really very great. Considering that the standard should be “beyond a reasonable doubt,” a probability of 0.9 for guilt seems too low (and the class unanimously thought so). So we bumped the loss for CI up to 100, which means you’d have to be 99% sure of guilt before you’d convict:
We then considered priors, and DNA evidence. For a prior, we considered the idea that in a geographic area, without any evidence (that is, picking someone at random), the prior for someone being guilty should be approximately 1/N where N is the population of the geographic area. So, for example, in Chittenden County we estimated the population at approximately 100,000 (it is actually about 50% higher than that). So the prior probability of guilt is about 1/100,000.
The question was raised, shouldn’t we use something like 1/2? After all, the person is on trial! They wouldn’t get there if they were almost surely innocent! The problem with this line of reasoning is that the person would have been indicted by a Grand Jury, which would have based its indictment on the very same evidence that the jury is supposed to consider in the trial. So, even if the evidence convinced the Grand Jury that a trial was warranted, i.e., that the probability of guilt was over 0.5, to use that number as a prior would in effect be using the same data twice, which is forbidden in Bayesian inference. You have to use a prior that is independent of any of the evidence that will come up at trial, one that depends only on general principles that are known outside of the details of the crime or the defendant. The population idea is one such; you might get another factor of two if the defendant were a man, since most crimes are committed by men. But that’s an insignificant factor. Also, that factor can just as well be built into the likelihood, which is probably a better place to do it.
We then considered hypothetical DNA data that has a probability of 1 in a million of matching a randomly chosen person (but a 1 in 1 chance of matching the perpetrator, of course). This is commonly thought to mean that there is a 1 in a million chance that the defendant is innocent, but this is incorrect. P(match|innocent) is not equal to P(innocent|match), and thinking that they are equal is known as the “prosecutor’s fallacy.” The actual calculation is shown in the chart below:
The calculation gives a probability of guilt of about 0.9, which is insufficient to convict if the loss for CI is 100. More (independent) evidence would be needed.
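The chart's calculation is just Bayes' rule, and it can be reproduced directly (numbers as given above; this is a sketch of the arithmetic, not the chart itself):

```python
# Bayes' rule with the priors and likelihoods from the text.
prior_guilty = 1 / 100_000              # ~1/N for Chittenden County
p_match_given_guilty = 1.0              # the perpetrator always matches
p_match_given_innocent = 1 / 1_000_000  # random-match probability

posterior_guilty = (prior_guilty * p_match_given_guilty) / (
    prior_guilty * p_match_given_guilty
    + (1 - prior_guilty) * p_match_given_innocent
)
print(round(posterior_guilty, 3))  # 0.909
```

Note how far this is from the prosecutor's-fallacy answer of 0.999999: the tiny prior (1 in 100,000) is only about ten times larger than the random-match probability (1 in 1,000,000), so a match raises the probability of guilt to only about 0.9.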
We then turned to the O.J. Simpson case. One of his lawyers, Alan Dershowitz, had remarked to the press that in any given year, only 1 in 2500 batterers goes on to murder his partner. He meant this to show that it was unlikely that O.J. committed the crime, but it doesn’t take into account the fact that in a given year, only 1 in 20,000 women is killed by a random stranger:
When this information is entered into a Natural Frequencies chart, imagining a base population of 100,000 battered women, about 40 (that is, 100,000/2,500) will be killed by their batterer, but only 5 (that is, 100,000/20,000) would be killed by some random stranger (of the kind that O.J. himself claimed to be “seeking”). So, the probability that the batterer did the deed is 40/45, nearly 0.9. Thus, the evidence that Dershowitz brought forward actually supports the hypothesis that O.J. did the deed, rather than undermining it.
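The Natural Frequencies chart amounts to the following arithmetic (numbers from the text):

```python
# Natural frequencies: imagine 100,000 battered women in a given year.
base = 100_000
killed_by_batterer = base / 2_500   # 40 (Dershowitz's 1-in-2500 figure)
killed_by_stranger = base / 20_000  # 5 (1 in 20,000 killed by a stranger)

# Among the women who are killed, what fraction were killed by their batterer?
p_batterer = killed_by_batterer / (killed_by_batterer + killed_by_stranger)
print(p_batterer)  # 0.888...
```

The counting makes the reversal intuitive: conditioning on the fact that a killing occurred, the batterer accounts for 40 of the 45 cases, so the 1-in-2500 figure cuts against Dershowitz's argument rather than for it.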