We discussed the problem set. I said that in the future I want one paper per group; if there are disagreements within the group, they should be discussed in the paper.
On Problem 2, I noted that some groups were using set theory to try to solve it. That was not my intention. I wanted you to use just the axioms of probability theory as we discussed in class. Probability theory is a calculus on propositions A, B, …; a proposition such as A = “the Nile is longer than 1000 miles” isn’t a set. There are similarities between set theory and probability theory, but you should not confuse the two.
On Problem 3, the presentation is simplest if you ignore the actual numbers on the dice and just note whether the outcome is odd or even.
On Problem 4, the most common mistake was to treat P(y|x=1) as a single number. It is an array, because y is a vector with three components, and each component needs its own probability. What many teams actually computed was P(x=1), which is a single number.
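To make the distinction concrete, here is a small sketch with a made-up joint table (the numbers are purely illustrative, not the problem's actual values): P(x=1) comes out as a scalar, while P(y|x=1) is an array with one entry per component of y.

```python
import numpy as np

# Hypothetical joint distribution P(x, y): x takes values 0 and 1,
# y takes three values. These numbers are invented for illustration only.
P_xy = np.array([[0.10, 0.20, 0.10],   # P(x=0, y=j)
                 [0.30, 0.15, 0.15]])  # P(x=1, y=j)

# P(x=1) is a single number: sum the x=1 row over all values of y.
P_x1 = P_xy[1].sum()

# P(y | x=1) is an array, one probability per value of y,
# obtained by normalizing the x=1 row by P(x=1).
P_y_given_x1 = P_xy[1] / P_x1

print(P_x1)           # a scalar: 0.6
print(P_y_given_x1)   # an array of three probabilities summing to 1
```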
On Problem 5, most groups did fine, but some failed to note that the first two probabilities I gave were conditional probabilities, whereas the third was a joint probability. The trick is to figure out how to compute the conditional probability corresponding to the third number (0.05). This fills out the first row for independence. Then the second row is a constant multiple of the first row, and to get a table for dependence, just alter the numbers in the second row so that everything still adds up to 1.
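The construction can be sketched numerically. The marginals below are made up (they are not the problem's numbers); the point is only the mechanics: under independence the second row is a constant multiple of the first, and a dependent table is obtained by altering the second row while keeping the grand total equal to 1.

```python
import numpy as np

# Hypothetical marginals, for illustration only.
P_A = np.array([0.5, 0.3, 0.2])   # column marginals P(A=j)
P_B = 0.4                          # row marginal P(B=1)

# Independence: the table is the outer product of the marginals,
# so the second row is a constant multiple of the first.
table_indep = np.outer([1 - P_B, P_B], P_A)
assert np.allclose(table_indep[1], (P_B / (1 - P_B)) * table_indep[0])

# Dependence: alter the second row (keeping its sum at P_B, so the
# whole table still adds up to 1); the rows are no longer proportional.
table_dep = table_indep.copy()
table_dep[1] = np.array([0.05, 0.20, 0.15])  # changed by hand, sums to 0.4

print(table_dep.sum())   # still 1
```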
On Problem 6, the main problem was that some teams didn't explain in enough detail how they reached their result, or how they would explain it to a member of the general public.
We spent the majority of the time on #7. Here there are several issues. First: is this sampling with replacement, or without? My position is that in general this should be considered sampling with replacement, which contributes a factor of 1/N to the likelihood for each observation of a taxi. The reason is that if one had independent repeats of an observation of a taxi (for example, observing taxi #3 again the next morning), that would reinforce our opinion that the number of taxis is small, as against observing (at random) taxi #89 the next morning. Just because we observed taxi #3 in the evening doesn't mean we can't use our next-morning observation as new evidence. There are subtleties, though: we can't follow a taxi down the road, glance away, glance back, and call that another independent observation. Those observations aren't independent, and so you can't just multiply!
Issue number two was the prior. N is the number of taxis in the city, and you are more likely to be parachuted into a small city than a large one (because there are more small cities than large ones), so the prior P(N) should be smaller for large N than for small N. It's not certain what one should use, but several groups used P(N) = 1/N (an improper prior, but OK as long as the posterior is proper). More generally one might consider something like a power law, P(N) ∝ 1/N^α for positive α. There could be other choices.
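Putting the two issues together, the posterior can be computed numerically: a 1/N prior, a likelihood factor of 1/N per observed taxi (for N at least as large as the biggest number seen), and a normalization over a truncated grid. The observed taxi numbers below are the ones used as examples above; the grid cutoff is an arbitrary numerical choice.

```python
import numpy as np

# Taxi problem, assuming sampling with replacement: each observation
# contributes a likelihood factor 1/N, and the prior is P(N) ∝ 1/N
# (improper, but the posterior is proper once there is data).
observed = [3, 89]          # the example taxi numbers from the discussion
m = max(observed)
N_max = 10_000              # arbitrary truncation for the numerical grid

N = np.arange(1, N_max + 1)
prior = 1.0 / N
# Likelihood is (1/N)^n for N >= the largest observed number, else 0:
like = np.where(N >= m, (1.0 / N) ** len(observed), 0.0)

posterior = prior * like
posterior /= posterior.sum()

print(N[np.argmax(posterior)])   # the posterior mode sits at N = max(observed)
```

Note that the posterior is a decreasing function of N from the mode onward, which is exactly what makes the credible-interval point below the mode matter.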
Issue number three was minor, but note that not every 95% credible interval should look like a [0.025, 0.975] quantile interval. In this particular case, the largest amount of posterior probability lies below the 0.025 quantile. The whole point of a credible interval is to tell your readers where most of the posterior probability lies; it would be perverse, if that is your goal, to exclude the maximum of the posterior probability!
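As a stand-in for any posterior that piles up at one edge, consider an exponential density, whose maximum is at zero. The equal-tailed 95% interval [q(0.025), q(0.975)] excludes the small region near zero that carries the highest density; an interval anchored at zero, [0, q(0.95)], covers the same 95% of probability, is shorter, and includes the posterior maximum.

```python
import numpy as np

# Quantile function of the Exp(1) distribution: q(p) = -log(1 - p).
q = lambda p: -np.log1p(-p)

eq_tailed = (q(0.025), q(0.975))   # excludes the mode at 0
from_zero = (0.0, q(0.95))          # includes the mode, and is shorter

print(eq_tailed, from_zero)
```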
We spent the rest of the period looking at the example of normal data with unknown mean and variance. We did it with simulation, and saw that as we increased the simulation sample size, we did not come closer and closer to the mean and variance from which the data were drawn. The data are fixed; all that increasing the simulation size does is increase the precision with which we can squeeze the information out of our data, giving more accurate estimates of credible intervals, posterior means, variances, and so forth. The histograms of the posterior probabilities also get smoother and smoother as the simulation sample size gets bigger and bigger.
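A minimal sketch of that simulation, under the standard noninformative prior p(μ, σ²) ∝ 1/σ² (an assumption on my part; the class may have used a different prior): σ² given the data follows a scaled inverse-chi-squared distribution, and μ given σ² is normal around the sample mean. The data set is fixed; only the number of posterior draws changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed data set: the data never change, only the simulation size does.
data = rng.normal(loc=5.0, scale=2.0, size=30)
n, xbar, s2 = len(data), data.mean(), data.var(ddof=1)

def posterior_draws(n_sims):
    # Under the noninformative prior p(mu, sigma^2) ∝ 1/sigma^2:
    #   sigma^2 | data ~ (n-1) s^2 / chisq(n-1)
    #   mu | sigma^2, data ~ Normal(xbar, sigma^2 / n)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_sims)
    mu = rng.normal(xbar, np.sqrt(sigma2 / n))
    return mu, sigma2

for n_sims in (1_000, 100_000):
    mu, sigma2 = posterior_draws(n_sims)
    # The posterior itself is identical in both runs; the larger
    # simulation only estimates its summaries more precisely.
    print(n_sims, mu.mean(), np.quantile(mu, [0.025, 0.975]))
```

Histogramming `mu` for the two simulation sizes shows the smoothing effect: same posterior, smoother picture.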