I first talked about the homework due on Wednesday.

Every group did #1 just fine. On #2, one group didn’t realize that since there were three widgets sampled, the likelihood has to have three factors, one for each widget.

On #3, the spreadsheets were all just fine. One group accidentally left off the last number in each sum when summing up to 0.05, 0.10, and 0.20: the terms for 0.045, 0.095 and 0.195 were left out of the sums. I surmise that this was just a mistake in setting up the sums in the spreadsheet.

On #4, there were two interesting errors, and both are worth paying attention to. The first one computed the likelihood by adding the probabilities of each data point (word observed), instead of multiplying. Here’s the important thing: The likelihood is *never* computed by adding, only by multiplying. When you add, you are computing the probability that event 1 *or* event 2 *or*…*or* event n occurred. But that’s not what the likelihood is. The likelihood is the probability of event 1 *and* event 2 *and*…*and* event n occurring, which is what we observed. When you see ‘or’, add; but when you see ‘and’, multiply. Likelihoods are *always* obtained by multiplying the appropriate probabilities for each event observed. The *only* time you add is when you are adding the joint probabilities to get the marginal, in preparation for computing the posterior distribution.
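Here is a minimal sketch of that spreadsheet calculation in Python. The word probabilities are made-up numbers for illustration, not the ones from the homework; the point is only where the multiplying and the adding happen.

```python
import math

# Hypothetical probabilities of each observed word under each author
# (illustrative values only, not the homework data).
word_probs = {
    "Shakespeare": [0.5, 0.4, 0.2],
    "Marlowe": [0.25, 0.4, 0.8],
}
prior = {"Shakespeare": 0.5, "Marlowe": 0.5}

# Likelihood: word 1 AND word 2 AND word 3 were observed, so multiply.
likelihood = {h: math.prod(ps) for h, ps in word_probs.items()}

# Joint = prior times likelihood, one row of the spreadsheet per hypothesis.
joint = {h: prior[h] * likelihood[h] for h in prior}

# The only addition: sum the joints to get the marginal.
marginal = sum(joint.values())

# Posterior = joint divided by marginal.
posterior = {h: joint[h] / marginal for h in joint}
print(posterior)
```

With these made-up numbers the likelihoods are 0.04 and 0.08, so the posterior comes out at about 1/3 for Shakespeare and 2/3 for Marlowe.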

The second error was even more interesting. Unfortunately, I didn’t think to take a picture of the board for this. But what this group did was to try to compute the effect of each different word separately. So they set up a spreadsheet with priors 1/2, 1/2 for the two states of nature (Shakespeare, Marlowe), and likelihoods equal to the probability of the first word under that state of nature raised to the number of instances of that word seen. Then they computed the joint and the marginal. So far, so good.

At this point they have computed a posterior distribution for the first word’s data. What they should have done is to use that as the prior for the second word’s data, then take the posterior from that and use it as the prior for the third set of data. That would have given the same answer that the other groups got by doing it all in one calculation, with all the words in a single likelihood. It illustrates a great strength of the Bayesian approach: You can use a posterior computed from a partial data set as the prior for a calculation using a new set of data. You’ll always get the same answer as you would have gotten had you handled all the data at once.
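To see this numerically, here is a small Python sketch that does the calculation both ways. The per-word probabilities and counts are hypothetical, chosen only for illustration, and `update` is just a helper name I made up for one pass through the spreadsheet.

```python
import math

# Hypothetical probability of each word under each author, and
# hypothetical counts of how often each word was seen.
word_probs = {
    "Shakespeare": [0.020, 0.010, 0.005],
    "Marlowe": [0.010, 0.015, 0.008],
}
counts = [3, 2, 4]
prior = {"Shakespeare": 0.5, "Marlowe": 0.5}


def update(prior, likelihood):
    """One spreadsheet pass: joint, marginal, posterior."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    marginal = sum(joint.values())
    return {h: joint[h] / marginal for h in joint}


# All at once: a single likelihood with one factor per word.
batch_likelihood = {
    h: math.prod(p ** n for p, n in zip(word_probs[h], counts))
    for h in prior
}
batch = update(prior, batch_likelihood)

# One word at a time: each posterior becomes the next prior.
sequential = dict(prior)
for i, n in enumerate(counts):
    likelihood = {h: word_probs[h][i] ** n for h in sequential}
    sequential = update(sequential, likelihood)

print(batch)
print(sequential)
```

The two printed distributions agree up to floating-point rounding, whatever numbers you plug in.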

Instead, this group tried to average the answers from the three calculations, with a prior of 1/2, 1/2 for each state of nature for each word’s calculation. That doesn’t work: evidence from separate pieces of data combines by multiplying likelihoods, not by averaging posteriors. To their credit, this group noted that they weren’t sure that this was the right thing to do. I encourage everyone to flag their doubts in the same way.

Now to the oil well problem. I suggested that a company is interested in drilling an oil well. If they drill it, it will cost them $12 million ($12M). If the well comes in as a “gusher” (G), then they will make $42M. If it is “dry” (D), they make nothing. The probability that the well is G is 1/3; that it is D, 2/3. There is also the option of doing a test that will tell you, with 90% accuracy, whether the well is going to be a gusher or not (that is, if it says that the well will be G, then the probability is 0.9 that it will be G; if it says that it will be D, then the probability is 0.9 that it will be D). The test costs $4M. So, what decision should the company make? Should it drill or not, and if it decides to drill, should it test beforehand and only drill if the test is positive (+) and not if it is negative (-)?

Here’s the board I drew that summarizes the problem:

We drew a decision tree:

This was a bit messy. But the new thing is the Toll Gates (marked by two triangles). When you pass through a toll gate, you have to add the amount there to the running total. *Note*: I’ve realized that I should have put the two toll gates representing the drilling of the well on the “test and then decide” branch *after* the probabilities of 0.9 or 0.1, and *not before*. That would make it clear that you should apply the toll prior to multiplying by the probability. The rest of it is as we talked about before. That is, the rules for evaluating a decision tree are:

(1) Starting on the right, add any toll gate to the amount to its right (that would be the amount at the tip of the branch when you start this). Do this for all branches coming out of that node.

(2a) If the node to the left is a chance (round) node, multiply the current amount on each branch by the probability of that branch, and add these products to get the value of the node.

(2b) If the node to the left is a decision (square) node, cut off every branch except the best one (the one with the minimum loss or, equivalently, the maximum gain), and assign the value of the remaining branch to the node.

(3) Repeat (1) and (2) all the way back to the root of the tree. Only one choice will remain, and that is the decision to take.
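These rules can be sketched as a short recursive function in Python, applied to the oil well tree. This is my own illustration and not a transcription of the board: the helper names (`leaf`, `chance`, `decision`, `evaluate`) are made up, amounts are in $M with costs entered as negative tolls, and the probability of a positive test, which isn't stated above, is assumed to be whatever makes the 0.9/0.1 test probabilities consistent with the prior of 1/3.

```python
def leaf(value):
    return {"kind": "leaf", "value": value}


def chance(*branches):
    # Each branch is (probability, toll, subtree).
    return {"kind": "chance", "branches": branches}


def decision(*branches):
    # Each branch is (label, toll, subtree).
    return {"kind": "decision", "branches": branches}


def evaluate(node):
    """Return (value of node, label of best branch or None)."""
    if node["kind"] == "leaf":
        return node["value"], None
    if node["kind"] == "chance":
        # Rules (1) and (2a): add the toll first, then weight by probability.
        total = 0.0
        for prob, toll, subtree in node["branches"]:
            value, _ = evaluate(subtree)
            total += prob * (value + toll)
        return total, None
    # Rules (1) and (2b): add the toll, keep only the best branch.
    best_value, best_label = None, None
    for label, toll, subtree in node["branches"]:
        value, _ = evaluate(subtree)
        value += toll
        if best_value is None or value > best_value:
            best_value, best_label = value, label
    return best_value, best_label


# Assumption: P(+) is chosen so that 0.9 * P(+) + 0.1 * P(-) = 1/3.
p_pos = (1 / 3 - 0.1) / (0.9 - 0.1)


def drill(p_gusher):
    # Chance node for the outcome of drilling.
    return chance((p_gusher, 0, leaf(42)), (1 - p_gusher, 0, leaf(0)))


def drill_or_not(p_gusher):
    # Decision node: pay the $12M toll to drill, or walk away.
    return decision(("drill", -12, drill(p_gusher)),
                    ("don't drill", 0, leaf(0)))


root = decision(
    ("drill", -12, drill(1 / 3)),
    ("don't drill", 0, leaf(0)),
    ("test, then decide", -4,
     chance((p_pos, 0, drill_or_not(0.9)),
            (1 - p_pos, 0, drill_or_not(0.1)))),
)

value, choice = evaluate(root)
print(choice, round(value, 3))
```

Under these assumptions, drilling without testing is worth $2M in expectation, while testing first and drilling only on a positive result is worth about $3.5M, so the function picks “test, then decide”.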

In this case, we found that the company will expect to make the most money by choosing the “test, then decide” option.
