I posed the problem of medical tests such as mammography and prostate cancer tests. These tests are not perfect. Sometimes they will report a problem when there is none. Sometimes they will miss a problem that is there. How should we think about the results of such tests? As an example, mammography is about 90% accurate in both respects. If a woman has breast cancer, mammography will detect it correctly 90% of the time, but will miss it 10% of the time. If a woman does not have breast cancer, mammography will report correctly 90% of the time that there is no cancer, but will give a false positive 10% of the time. But only 1% of the women in the population that gets routine mammography have an undetected cancer. If a woman gets a positive result, how worried should she be? The answer is, she should be worried, but not 90% worried. We worked this out in the following tree on the board.
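To see why the answer is well below 90%, it can help to follow the branches of the tree with whole numbers of people. The sketch below is only an illustration: the cohort of 1,000 women and all variable names are assumptions made for the example, while the 1% and 90%/10% figures come from the text above.

```python
from fractions import Fraction

# Figures quoted in the text
prior = Fraction(1, 100)            # P(D): 1% have an undetected cancer
sensitivity = Fraction(9, 10)       # P(+|D): cancer detected 90% of the time
false_positive_rate = Fraction(1, 10)  # P(+|no D): false positive 10% of the time

# Assumption for illustration only: a cohort of 1,000 women
cohort = 1000

# First split of the tree: disease vs. no disease
with_disease = cohort * prior             # 10 women
without_disease = cohort - with_disease   # 990 women

# Second split: positive results on each branch
true_positives = with_disease * sensitivity              # 9 women
false_positives = without_disease * false_positive_rate  # 99 women

# Of all the women who test positive, what fraction has the disease?
all_positives = true_positives + false_positives  # 108 women
posterior = true_positives / all_positives        # 9/108 = 1/12

print(f"Positives: {all_positives}, of which true positives: {true_positives}")
print(f"P(D|+) = {posterior} = {float(posterior):.1%}")
```

With these numbers, 9 of the 108 positive results are true positives, so a positive mammogram puts the probability of disease at about 8.3%. That is much higher than the 1% she started with, but nowhere near 90%.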

Analysis of this tree led us to some basic rules that probabilities have to obey.

We can present the exact same calculation that came from the tree in a “spreadsheet” calculation.

While working on the spreadsheet, we defined some new terms. The first column has the *states of nature* (SON) we are interested in learning about, here, whether the patient has the disease or not. These are *mutually exclusive* (at most one can be true) and exhaustive (at least one must be true). P(D) is the *prior probability* that the person has the disease; we call it “prior” because it represents our best information about this, before looking at the data. Data are always known. Here the data will be the results of the mammogram, and will be either positive or negative.

P(+|D) is the *likelihood*, that is, the probability of a positive mammogram, given that the person has the disease. P(+,D) is the *joint probability*, the probability of both having the disease and getting a positive mammogram. The *conditional probability law* that we got by looking at the tree (chart 2 above) says that P(+,D)=P(+|D)P(D).

We add up all the joint probabilities to get P(+), the probability of the woman getting a positive mammogram, regardless of whether she has the disease or not. Some of these positives will be true positives (because the woman has the disease), and others will be false positives. We call P(+) the *marginal*.

Dividing each joint probability by the marginal gives us the *posterior probability*, for example P(D|+), the probability that the woman has the disease, given that the mammogram was positive. This is just using the conditional probability law again, in the form P(D|+)=P(D,+)/P(+). Note that P(D,+)=P(+,D); it doesn’t matter which order you put things in a joint probability. The probability of having the disease and getting a positive mammogram is equal to the probability of getting a positive mammogram and having the disease. The posterior probability is our goal, and the goal of every Bayesian analysis. It tells us everything that we can know about the states of nature, after we consider the data.

Spreadsheets or trees: both are correct, both are acceptable. Sometimes one is easier than the other for presenting the calculation. Use whichever method is best for you.
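The spreadsheet columns described above (prior, likelihood, joint, posterior, with the marginal as the sum of the joints) can be sketched in a few lines of Python. This is a minimal illustration rather than the spreadsheet from class: the row labels and variable names are assumptions, and the numbers are the mammography figures quoted earlier (1% prior, 90% detection, 10% false positives).

```python
# One row per state of nature: (label, prior, likelihood of a positive result).
# The priors must sum to 1 because the states are mutually exclusive and exhaustive.
rows = [
    ("D (disease)", 0.01, 0.90),  # P(D),    P(+|D)
    ("no D",        0.99, 0.10),  # P(no D), P(+|no D)
]

# Joint = likelihood * prior, the conditional probability law
joints = [prior * likelihood for _, prior, likelihood in rows]

# Marginal P(+) = sum of the joint column
marginal = sum(joints)

# Posterior = joint / marginal, the same law rearranged
posteriors = [joint / marginal for joint in joints]

# Print the table in spreadsheet layout
print(f"{'state':<14}{'prior':>8}{'likelihood':>12}{'joint':>8}{'posterior':>11}")
for (label, prior, likelihood), joint, posterior in zip(rows, joints, posteriors):
    print(f"{label:<14}{prior:>8.3f}{likelihood:>12.3f}{joint:>8.3f}{posterior:>11.3f}")
print(f"{'marginal P(+)':<34}{marginal:>8.3f}")
```

The joint column comes out to 0.009 and 0.099, the marginal is 0.108, and the posterior column is roughly 0.083 and 0.917. The posteriors sum to 1, which is a handy check that the division by the marginal was done correctly.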
