## HCOL 195 9/4/09

Today we did a coin-tossing experiment, and all of you did very well from a Bayesian point of view. Before I tossed the coin, everyone agreed that the probability of getting a head was (approximately) 0.5. After it was tossed, and after I looked at it, everyone agreed that from your point of view the probability hadn’t changed, because you had no new information with which to update it. After I announced “tails,” people’s opinions changed in different ways. One student’s opinion didn’t change; another decided that I was trustworthy and changed the probability to 0; but most were somewhere in between, because each student had to assess how reliable a witness I was, and you didn’t have much information to go on. After another student looked at the coin and also announced “tails,” more people were willing to conclude that it was indeed tails.

This experiment illustrated several points: 1) The conditional nature of probability. Probability assessments depend on the information you have, and different people with different information can legitimately make different assessments of the same (unknown) event. It follows that 2) there is a subjective element to probability statements.

I drew the breast cancer/mammogram example in two different ways on the blackboard. The first way started with a sample of 1000 women and computed, as a tree, the number of women with and without (undetected) cancer; the second level of branches then split each of those groups into the women who tested positive and those who tested negative. In the second tree, each number was simply divided by 1000, the total number of women, so that the numbers on the tree became probabilities. From the trees we were able to read off:

1. The number of women who tested positive was 108.
2. Correspondingly, the probability that a woman tested positive was 0.108. This is computed (now using conditional probability notation) as the sum $P(+)=P(+,D)+P(+,\bar{D})$, where $+$ means “tests positive,” $D$ means “has the disease,” and $\bar{D}$ means “does not have the disease.” This illustrates that if we sum probabilities over all the mutually exclusive cases, we get a probability that represents all cases together. So, if we are only interested in the number of positives, and don’t care whether a positive is due to a detected case of disease or is a false positive, we just sum over those two cases.
3. We then read off the probability tree that $P(+,D)=P(+ \mid D)\,P(D)$. We’ll talk more about this formula on Wednesday.
4. Finally, we read off the probability tree that $P(D \mid +)=\frac{P(+ \mid D)\,P(D)}{P(+)}$. This is Bayes’ theorem. It is our key equation for the entire course, and we’re going to see lots of ways of applying this very simple formula.
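The tree calculation above can be sketched in a few lines of Python. The notes only state the total of 108 positives, so the individual inputs here are assumptions: a prevalence $P(D)=0.01$, a true-positive rate $P(+ \mid D)=0.9$, and a false-positive rate $P(+ \mid \bar{D})=0.1$, numbers commonly used in this example and consistent with the 108 positives per 1000 women mentioned above.

```python
# Assumed inputs (not stated explicitly in the notes):
p_d = 0.01              # P(D): prevalence of the disease
p_pos_given_d = 0.9     # P(+|D): probability of a positive test given disease
p_pos_given_not_d = 0.1 # P(+|not D): false-positive probability

# Sum over the mutually exclusive cases (law of total probability):
# P(+) = P(+,D) + P(+,not D) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(D|+) = P(+|D)P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(f"P(+)   = {p_pos:.3f}")        # probability of testing positive
print(f"P(D|+) = {p_d_given_pos:.3f}") # probability of disease given a positive test
```

With these inputs, $P(+)=0.108$ (the 108 women per 1000), and $P(D \mid +)\approx 0.083$: even after a positive test, the probability of actually having the disease is still small, which is the point of the example.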