HCOL 195 10/12/09

The first thing we looked at was the decision problem at the end of the study sheet.

Recall that the factory can produce 24 parts per day. At the beginning of the day there is a 90% chance that the machine is in a “good” state that will produce 95% of parts “good”, worth \$2000 each, and 5% “bad”, worth nothing. But there is a 10% chance that the machine is in a “bad” state that will produce only 70% “good” parts and 30% “bad” parts.

We can test a part quickly and at no cost to tell if it is good or bad.

We can call a repairman who can, for \$150, put the machine into its “good” state; but that takes the same time as it takes to produce a part, so if we call him in, we’ll make one fewer part.

We decided that there are three things we could do:

1. Just run the machine and make parts, regardless of the state of the machine
2. Call the repairman at the beginning of the day and put the machine into the “good” state, regardless
3. Produce one part; check it. If it’s good, run the machine for the rest of the day as it is. If it’s bad, call the repairman in and have him put the machine in its “good” state and then produce the parts for the rest of the day

So we set up a decision tree to correspond to this.

Problem on decision theory from review sheet

We weren’t quite sure where to put the toll gate for the +\$2000 value of the good part produced in the third scenario. We put it before the probability; I now think we should have put it afterwards. Not to worry; we’ll discuss this on Friday, and in any case no decision tree I put on the exam will have this kind of complication.

Recall that on the exam I will want you to set up the tree, but not do the calculation. But I will expect you to explain to me how the calculation of the tree should be done.
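To illustrate how the calculation of the tree could go, here is a minimal Python sketch. It is not something the exam asks you to compute. It averages out each chance node and compares the three strategies. It assumes that each part’s quality is independent given the machine’s state and that the repairman’s time slot produces no part. It counts the +\$2000 for the first good part after the chance node that reveals whether that part is good, which is the “afterwards” placement mentioned above. The function names (`run`, `strategy_1`, and so on) are made up for illustration. Exact fractions are used so that the numbers come out exactly.

```python
from fractions import Fraction

PARTS_PER_DAY = 24
VALUE_GOOD_PART = 2000          # a bad part is worth nothing
REPAIR_COST = 150
P_GOOD_STATE = Fraction(9, 10)  # prior belief that the machine starts "good"
YIELD_GOOD = Fraction(95, 100)  # fraction of good parts in the good state
YIELD_BAD = Fraction(70, 100)   # fraction of good parts in the bad state


def p_good_part(p_good_state):
    """Chance node: probability that a single part is good."""
    return p_good_state * YIELD_GOOD + (1 - p_good_state) * YIELD_BAD


def run(n_parts, p_good_state):
    """Expected value of producing n_parts without intervening."""
    return n_parts * p_good_part(p_good_state) * VALUE_GOOD_PART


def strategy_1():
    # Just run the machine all day.
    return run(PARTS_PER_DAY, P_GOOD_STATE)


def strategy_2():
    # Repair first (costs one slot and $150), then run in the good state.
    return run(PARTS_PER_DAY - 1, Fraction(1)) - REPAIR_COST


def strategy_3():
    # Make one part and test it.
    p_first_good = p_good_part(P_GOOD_STATE)
    # Bayes: updated belief in the good state after seeing a good part.
    posterior_good = P_GOOD_STATE * YIELD_GOOD / p_first_good
    # Good branch: the first part is worth $2000; keep running 23 more.
    value_if_good = VALUE_GOOD_PART + run(PARTS_PER_DAY - 1, posterior_good)
    # Bad branch: first part is worthless; repair uses slot 2; 22 parts left.
    value_if_bad = run(PARTS_PER_DAY - 2, Fraction(1)) - REPAIR_COST
    return p_first_good * value_if_good + (1 - p_first_good) * value_if_bad


for name, strategy in [("1", strategy_1), ("2", strategy_2), ("3", strategy_3)]:
    print(f"Strategy {name}: ${float(strategy()):,.2f}")
```

Under these assumptions the expected values come out to \$44,400.00, \$43,550.00 and \$44,591.25. Testing one part first therefore comes out slightly ahead.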

I was asked what marginalization means. It means summing a joint probability over all the values of one of the variables, so that the variable drops out and can be ignored. So, in the tree below, to get the probability that someone will get a disease, regardless of whether or not they have the gene implicated in it, we add the probability of (disease, gene) to the probability of (disease, no gene) to get the probability of (disease). In the chart below, we had observed the result of the test (+), so this is the probability of getting the disease if you test positive.
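Here is a minimal Python sketch of marginalization. It uses hypothetical joint probabilities that are made up for illustration and are not the numbers in the chart. To get P(disease), it sums over both values of the gene variable.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint probabilities P(disease status, gene status); they sum to 1.
joint = {
    ("disease", "gene"): Fraction(12, 100),
    ("disease", "no gene"): Fraction(3, 100),
    ("no disease", "gene"): Fraction(5, 100),
    ("no disease", "no gene"): Fraction(80, 100),
}


def marginalize_out_second(joint_table):
    """Sum the joint over every value of the second variable (here, the gene)."""
    marginal = defaultdict(Fraction)  # Fraction() is 0
    for (first, _second), p in joint_table.items():
        marginal[first] += p
    return dict(marginal)


p_disease = marginalize_out_second(joint)
print(p_disease["disease"])  # P(disease, gene) + P(disease, no gene) = 3/20
```

If every entry of the joint table is conditioned on a positive test, the same sum gives P(disease | +).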

Then we looked at two “Monty Hall” problems. Both are “4-door” problems. In both, you have picked Door #1. In the first, Monty opens one door, #2, to show a goat. Since he has a choice of three doors to open if the prize is behind the door you picked, the likelihood in that case is 1/3. He can’t open door #2 if the prize is there, so the likelihood in that case is 0. If the prize is behind either door #3 or #4 then he has a choice of two doors to open, and will open door #2 half the time, so the likelihood for those two cases is 1/2. Below is the spreadsheet we wrote down.

The first Monty Hall problem we solved

For such a simple spreadsheet with just a few numbers, you should be prepared to complete the calculation. You wouldn’t need a calculator for this, just simple skills with fractions.
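For checking your arithmetic, here is a short Python sketch of the same spreadsheet. It assumes equal priors of 1/4 for the four doors. Each row multiplies prior by likelihood to get the joint column; the joint column sums to the marginal; and dividing by the marginal gives the posterior. The helper name `bayes_spreadsheet` is just for illustration.

```python
from fractions import Fraction


def bayes_spreadsheet(priors, likelihoods):
    """Return the joint column, the marginal, and the posterior column."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    marginal = sum(joint)
    posterior = [j / marginal for j in joint]
    return joint, marginal, posterior


# Prize behind door #1..#4; you picked #1 and Monty opened #2.
priors = [Fraction(1, 4)] * 4
likelihoods = [Fraction(1, 3), Fraction(0), Fraction(1, 2), Fraction(1, 2)]

joint, marginal, posterior = bayes_spreadsheet(priors, likelihoods)
print("marginal:", marginal)                       # marginal: 1/3
print("posterior:", [str(p) for p in posterior])   # ['1/4', '0', '3/8', '3/8']
```

Staying with Door #1 wins with probability 1/4, while switching to #3 or #4 wins with probability 3/8 each.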

The second problem is different. In this one, Monty opens all the doors except Door #4 (that is, he opens doors #2 and #3). If the prize is behind Door #1, he has a choice of which door to leave shut: #2, #3 or #4. So the probability is 1/3 that he opens doors #2 and #3 and leaves #4 shut, and that’s the likelihood for that case. If the prize is behind door #2 or #3, he has to leave that door shut, so he can’t open both #2 and #3; the likelihood in either of these cases is 0. Finally, if the prize is behind door #4, he is certain to open both #2 and #3, so the likelihood in that case is 1. Here’s the start on that spreadsheet:

The second Monty Hall problem we solved

You all know how to complete this.
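For checking your work, this is the completed second spreadsheet, assuming equal priors of 1/4:

```latex
\begin{array}{c|c|c|c|c}
\text{Prize behind} & \text{Prior} & \text{Likelihood} & \text{Joint} & \text{Posterior} \\ \hline
\#1 & 1/4 & 1/3 & 1/12 & 1/4 \\
\#2 & 1/4 & 0   & 0    & 0   \\
\#3 & 1/4 & 0   & 0    & 0   \\
\#4 & 1/4 & 1   & 1/4  & 3/4 \\ \hline
    &     & \text{Marginal} & 1/3 &
\end{array}
```

Switching to Door #4 wins with probability 3/4.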

Questions were asked about independence and dependence. First, the tables. For a given set of row and column marginal probabilities, there is only one way to write down a table of joint probabilities that represents independence. You use the fact that independence means P(x,y)=P(x)P(y), so each joint probability is the product of its row and column marginals:

Independent table of joint probabilities

Conversely, since independence means P(x,y)=P(x)P(y) for all x and y, to check whether a table represents independence, just see whether every joint probability is the product of the corresponding (row, column) marginals. If they all are, the table is independent; otherwise it is not. IMPORTANT: You must check every entry in the table.
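Both directions can be sketched in a few lines of Python. One function builds the single independent table for given marginals; the other checks every entry against the product of its marginals. The marginals below are hypothetical and were chosen only for illustration. Exact fractions are used so that the equality test is not thrown off by floating-point rounding. The function names are made up.

```python
from fractions import Fraction


def independent_table(row_marginals, col_marginals):
    """The one joint table with these marginals that represents independence."""
    return [[r * c for c in col_marginals] for r in row_marginals]


def is_independent(table):
    """Check EVERY entry against the product of its row and column marginals."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return all(
        table[i][j] == rows[i] * cols[j]
        for i in range(len(rows))
        for j in range(len(cols))
    )


# Hypothetical marginals, chosen only for illustration.
rows = [Fraction(3, 10), Fraction(7, 10)]
cols = [Fraction(1, 2), Fraction(1, 2)]
table = independent_table(rows, cols)
print(is_independent(table))  # True
```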

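To convert a table that is dependent into one that is independent with the same marginals, just multiply the marginals as above. To convert a table that is independent into one that is dependent but has the same marginals, pick two rows and two columns. Add some number to two diagonally opposite entries where they cross, and subtract the same number from the other two entries (keeping every entry between 0 and 1). Each row and column sum stays the same, as below: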

Converting an independent table into a dependent table
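In symbols, for a 2-by-2 table (ε is just a name for the number that is added and subtracted):

```latex
\begin{array}{c|cc|c}
      & y_1 & y_2 & \\ \hline
x_1   & P(x_1)P(y_1) + \varepsilon & P(x_1)P(y_2) - \varepsilon & P(x_1) \\
x_2   & P(x_2)P(y_1) - \varepsilon & P(x_2)P(y_2) + \varepsilon & P(x_2) \\ \hline
      & P(y_1) & P(y_2) & 1
\end{array}
```

Each row and column still sums to the same marginal, because the +ε and −ε cancel. However, whenever ε ≠ 0 the top-left entry is no longer P(x_1)P(y_1), so the table is dependent. Every allowed value of ε gives a different dependent table, which is why there are infinitely many.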

There is only one way that a particular set of marginals can represent independence; there are infinitely many ways that they can represent dependence.

You should be able to prove that all three of the following are equivalent (that is, if one of them is true for all x,y, then the other two are also true): P(x|y)=P(x), P(x,y)=P(x)P(y), P(y|x)=P(y). HINT: You can just use the definition of independence, and Bayes’ theorem.
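One way to organize the proof is as a cycle of implications. Each step uses only the definition of conditional probability or Bayes’ theorem, and the proof assumes P(x) > 0 and P(y) > 0 so that the conditionals are defined:

```latex
\begin{aligned}
P(x \mid y) = P(x)
  &\;\Longrightarrow\; P(x,y) = P(x \mid y)\,P(y) = P(x)\,P(y) \\
P(x,y) = P(x)\,P(y)
  &\;\Longrightarrow\; P(y \mid x) = \frac{P(x,y)}{P(x)} = \frac{P(x)\,P(y)}{P(x)} = P(y) \\
P(y \mid x) = P(y)
  &\;\Longrightarrow\; P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)} = \frac{P(y)\,P(x)}{P(y)} = P(x)
\end{aligned}
```

Because each statement implies the next and the last implies the first, any one of them implies the other two.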

We then looked at the plagiarism example. Here, in a table of 1000 numbers there will be about 100 that can be rounded either up or down because they end in a 5. By flipping a coin for each rounding, we can embed a secret code in the table. The probability that someone would independently have the same rounding pattern if he flipped coins is 1 in 2^100, or about 1 in 10^30. But if he copied our table, he would have the same rounding pattern for sure. So the spreadsheet looks like this:

Solution of the plagiarism problem
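Here is a minimal Python sketch of the plagiarism spreadsheet. It assumes equal priors of 1/2 for the two hypotheses, which is the (a,a) case discussed next. The hypothesis labels “copied” and “independent” are just names for the two rows. Exact fractions keep the tiny likelihood 1/2^100 from being lost to rounding.

```python
from fractions import Fraction

N_ROUNDINGS = 100  # entries ending in 5, per the example

# Two hypotheses with equal priors.
priors = {"copied": Fraction(1, 2), "independent": Fraction(1, 2)}
# Likelihood of matching our rounding pattern exactly.
likelihoods = {"copied": Fraction(1), "independent": Fraction(1, 2**N_ROUNDINGS)}

joint = {h: priors[h] * likelihoods[h] for h in priors}
marginal = sum(joint.values())
posterior = {h: joint[h] / marginal for h in joint}

print(float(likelihoods["independent"]))  # about 7.9e-31, i.e. roughly 1 in 10^30
print(posterior["independent"] == Fraction(1, 2**N_ROUNDINGS + 1))  # True
```

The posterior probability that the match arose by independent coin flips is 1/(2^100 + 1). For all practical purposes, copying is certain.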

I was asked why we don’t have to have the numbers in the prior column add up to 1. The reason is that whatever we choose, the common factor will cancel out in the end. In the above problem, if we put (a,a) in the prior column, the marginal will also have a factor of a, which will cancel out, regardless of the value of a, in the posterior column:

Why we don't have to normalize the prior (a cancels out!)
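Written out with the likelihoods from the plagiarism problem (1 if copied, 2^-100 otherwise), the factor a appears in every term of both the numerator and the marginal, so it divides out:

```latex
P(\text{copied} \mid \text{match})
  = \frac{a \cdot 1}{a \cdot 1 + a \cdot 2^{-100}}
  = \frac{1}{1 + 2^{-100}},
\qquad
P(\text{independent} \mid \text{match})
  = \frac{a \cdot 2^{-100}}{a \cdot 1 + a \cdot 2^{-100}}
  = \frac{2^{-100}}{1 + 2^{-100}}
```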