We discussed the problem set. I said that in the future I want one paper per group; if there are disagreements within the group, they should be discussed in the paper.

On Problem 2, I noted that some groups were using set theory to try to solve it. That was not my intention. I wanted you to use just the axioms of probability theory as we discussed in class. Probability theory is a calculus on propositions A, B, …; a proposition such as A = “the Nile is longer than 1000 miles” isn’t a set. There are similarities between set theory and probability theory, but you should not confuse the two.

On Problem 3, the presentation is simplest if you ignore the actual numbers on the dice and just note whether the outcome is odd or even.

On Problem 4, the commonest problem was thinking that P(y|x=1) is a number. It is an array, because y is a vector with three components, and each component needs a probability. What many teams computed was P(x=1).
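To make the distinction concrete, here is a minimal sketch of how P(y|x=1) comes out as an array while P(x=1) is a single number. The joint table below is entirely hypothetical (it is not the table from the problem set), and it assumes y takes three values labelled 0, 1, 2.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(x, y); NOT the numbers from the problem set.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 20), (1, 1): Fraction(3, 20), (1, 2): Fraction(3, 10),
}

def marginal_x(joint, x):
    """P(x): a single number, obtained by summing the joint over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(joint, x):
    """P(y|x): one probability per value of y, i.e. an array."""
    px = marginal_x(joint, x)
    y_values = sorted(yi for (xi, yi) in joint if xi == x)
    return [joint[(x, y)] / px for y in y_values]

print(marginal_x(joint, 1))             # one number
print(conditional_y_given_x(joint, 1))  # three numbers that sum to 1
```

Dividing each entry of the x=1 row by P(x=1) is what turns the row of joint probabilities into a conditional distribution.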

On Problem 5, most groups did fine, but some didn’t note that the first two probabilities I gave were conditional probabilities, whereas the third was a *joint* probability. The trick is to figure out how to compute the conditional probability corresponding to the third number (0.05). This fills out the first row for independence. Then the second row is a constant multiple of the first row, and to get a table for dependence, just alter the numbers in the second row to make everything add up to 1.

On Problem 6, the main problem was that some teams didn’t give enough detail or explanation of how they reached the result or would explain it to a member of the general population.

We spent the majority of the time on #7. Here there are several issues. The first: is this sampling with replacement, or without? My position is that in general this should be considered sampling with replacement, which contributes a factor of 1/N to the likelihood for each observation of a taxi. The reason is that if one had independent repeats of an observation of a taxi (for example, observing taxi #3 again the next morning) then that would reinforce our opinion that the number of taxis was small, as against the next morning observing (at random) taxi number 89. Just because we observed taxi #3 in the evening doesn’t mean that we can’t use our next-morning observation as new evidence. There are some subtleties, as we can’t follow a taxi down the road and glance away and glance back and think that’s another independent observation. Those glances aren’t independent, and so you can’t just multiply!
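As a rough sketch of the difference between the two sampling schemes, the functions below compute the probability of a sequence of sightings given N taxis numbered 1..N. The function names and the example sightings are illustrative assumptions, and the sightings are assumed to be independent uniform draws in the with-replacement case.

```python
from fractions import Fraction

def likelihood_with_replacement(observed, n_taxis):
    """P(sightings | N) when every sighting is an independent uniform draw from 1..N."""
    if max(observed) > n_taxis:
        return Fraction(0)  # a taxi numbered above N cannot exist
    return Fraction(1, n_taxis) ** len(observed)  # one factor of 1/N per sighting

def likelihood_without_replacement(observed, n_taxis):
    """P(sightings | N) when a taxi, once seen, can never be seen again."""
    if max(observed) > n_taxis or len(set(observed)) != len(observed):
        return Fraction(0)  # repeats are impossible under this scheme
    prob = Fraction(1)
    for i in range(len(observed)):
        prob *= Fraction(1, n_taxis - i)  # 1/N, then 1/(N-1), and so on
    return prob

# Seeing taxi #3 twice is informative with replacement, impossible without it.
print(likelihood_with_replacement([3, 3], 10))
print(likelihood_without_replacement([3, 3], 10))
```

Under sampling without replacement a repeated sighting has probability zero, so the next-morning observation of the same taxi could not be used as evidence at all.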

Issue number two was the prior. N is the number of taxis in the city, and you are more likely to be parachuted into a small city than a large one (because there are more small cities than large ones); so the prior P(N) should be smaller for large N than for small ones. It’s not certain what one should use, but several groups used P(N)=1/N (improper prior but OK if the posterior is proper). More generally one might want to consider something like a power law, which would look like P(N) ∝ 1/N^a for positive a. There could be other choices.
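A minimal sketch of how a power-law prior combines with the 1/N-per-observation likelihood is given below. The function name, the exponent parameter, the example numbers, and the cutoff n_max are all assumptions made for illustration; the cutoff is only there so the sum can be computed numerically.

```python
def power_law_posterior(max_seen, n_obs, prior_exponent=1.0, n_max=10_000):
    """Posterior over N on a truncated grid (the cutoff n_max is an assumption).

    Prior:      P(N) proportional to N**(-prior_exponent)
    Likelihood: N**(-n_obs) for N >= max_seen, and 0 otherwise
    """
    weights = {
        n: n ** -(prior_exponent + n_obs)
        for n in range(max_seen, n_max + 1)
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Hypothetical data: one sighting, highest taxi number seen is 3.
post = power_law_posterior(max_seen=3, n_obs=1)
print(max(post, key=post.get))  # the posterior mode sits at the largest number seen
```

The posterior is proper whenever the prior exponent plus the number of observations exceeds 1, which is why the improper prior 1/N causes no trouble once a single taxi has been observed.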

Issue number 3 was minor, but just to note that not every 95% credible interval should look like a [0.025, 0.975] interval. That is because in this particular case, the largest amount of posterior probability is in the [0, 0.025] interval. The whole point of a credible interval is to tell your readers where most of the posterior probability lies. It would be perverse, if this is your goal, to exclude the maximum of the posterior probability!
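The sketch below compares an equal-tailed interval with one that starts at the posterior maximum, each holding 95% of the posterior probability. It assumes a hypothetical single sighting of taxi #100, a 1/N prior, and a numerical cutoff at n_max; none of these numbers come from the problem set.

```python
def taxi_posterior(max_seen, n_obs=1, prior_exponent=1.0, n_max=100_000):
    """Return (values, probabilities) for N = max_seen..n_max (cutoff is an assumption)."""
    values = list(range(max_seen, n_max + 1))
    weights = [n ** -(prior_exponent + n_obs) for n in values]
    total = sum(weights)
    return values, [w / total for w in weights]

def equal_tailed_interval(values, probs, mass=0.95):
    """Cut (1 - mass) / 2 of the probability off each end."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = None
    for n, p in zip(values, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = n
        if cumulative >= 1.0 - tail:
            return lower, n
    return lower, values[-1]

def highest_density_interval(values, probs, mass=0.95):
    """Start at the mode; valid here because the posterior only decreases."""
    cumulative = 0.0
    for n, p in zip(values, probs):
        cumulative += p
        if cumulative >= mass:
            return values[0], n
    return values[0], values[-1]

values, probs = taxi_posterior(max_seen=100)
print(equal_tailed_interval(values, probs))     # lower end lies above 100
print(highest_density_interval(values, probs))  # lower end is 100, the mode
```

With these assumed numbers the equal-tailed interval leaves out N=100, the single most probable value, and is also wider than the interval that starts at the mode.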

We spent the rest of the period looking at the example of normal data with unknown mean and variance. We did it with simulation, and saw that as we increased the simulation sample size, we did not come closer and closer to the mean and variance from which the data were drawn. The data are fixed, and all that increasing the simulation sample size does is increase the precision with which we can squeeze the information out of our data, by getting better credible intervals, posterior means, variances, and so forth. The histograms of the posterior probabilities do, however, get smoother and smoother as our simulation sample size gets bigger and bigger.
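Here is one way such a simulation might look; it is a sketch, not necessarily the code used in class. It assumes the common noninformative prior p(mu, sigma^2) ∝ 1/sigma^2, under which sigma^2 given the data is (n-1)s^2 divided by a chi-squared draw with n-1 degrees of freedom, and mu given sigma^2 is normal with mean ybar and variance sigma^2/n. The data values are made up.

```python
import random
import statistics

def posterior_draws(data, n_draws, seed=0):
    """Draw (mu, sigma^2) pairs from the posterior, assuming the prior 1/sigma^2."""
    rng = random.Random(seed)
    n = len(data)
    ybar = statistics.fmean(data)
    s2 = statistics.variance(data)  # sample variance with n - 1 in the denominator
    draws = []
    for _ in range(n_draws):
        # A chi-squared variate with n - 1 degrees of freedom is Gamma((n - 1)/2, scale 2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        draws.append((mu, sigma2))
    return draws

# Hypothetical fixed data set; only the number of simulation draws changes.
data = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.1, 5.9, 4.7, 5.2]
for n_draws in (100, 10_000):
    mus = [mu for mu, _ in posterior_draws(data, n_draws)]
    print(n_draws, statistics.fmean(mus), statistics.stdev(mus))
```

The posterior mean of mu settles near the sample mean of the fixed data, not near whatever mean generated the data, and its spread does not shrink as the number of draws grows; more draws only make the summaries and histograms less noisy.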

October 3, 2012 at 1:08 pm

I’m still a little confused why we would be sampling with replacement. I figured once you see a taxi, you already know it exists and shouldn’t have to count it again. And since there are a finite number of taxis, N, the second taxi you’d see would be (N-1).

October 3, 2012 at 2:45 pm

Hi Anna, It all depends on how the sampling is done. It’s not so much that you’ve seen it so you know that it exists, it is how you’ve seen it, that is, the sampling scheme.

If there were only one taxi, then you’d always see #1 and never any other number. I’m sure that in that case, if day after day you kept seeing #1 and never any other number, you’d become more and more certain that there was only one taxi.

October 5, 2012 at 12:25 am

I have a question about the taxi problem:

If you do sample with replacement and you use 1/N^3 as the likelihood, I don’t see how seeing, for example, car 3 1,000 times over and over will affect the likelihood. I understand that I’m more sure that there are 3 cars, for example, but how is that reflected in the likelihood? Is it reflected in the likelihood?

October 5, 2012 at 1:00 am

The point is that if you observe ONLY low-numbered taxis, over and over, then each time you notice a taxi you will be more and more convinced that there are only low-numbered taxis in the town. You have to think that your observations are independent (that is, seeing a taxi, glancing to the left, then glancing back and seeing the same taxi again doesn’t count as a separate, independent observation).

Here’s another example, from the first day of class: I had a two-headed coin. Suppose I toss a coin 100 times, and every time it comes up heads. Sure, it might be a fair coin, but every time that you toss it and it comes up heads again, you will believe more and more that the coin has two heads (or that I am cheating in some other way).
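The coin example can be put in numbers with Bayes’ theorem. The sketch below assumes only two hypotheses (fair coin or two-headed coin) and a prior probability of 0.01 that the coin is two-headed; both the prior value and the function name are assumptions chosen for illustration.

```python
def prob_two_headed(n_heads, prior=0.01):
    """P(two-headed | n_heads heads in a row), comparing against a fair coin only."""
    like_two_headed = 1.0         # a two-headed coin always shows heads
    like_fair = 0.5 ** n_heads    # a fair coin gives n heads in a row with prob 2**-n
    numerator = prior * like_two_headed
    return numerator / (numerator + (1.0 - prior) * like_fair)

for n in (0, 1, 5, 10, 20, 100):
    print(n, prob_two_headed(n))
```

Each additional head multiplies the fair-coin likelihood by another factor of 1/2, so the posterior probability of the two-headed coin keeps rising even though every toss shows the same thing.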

So, it is reflected in the likelihood by the fact that each observation of the same taxi (or coin toss) will add an additional factor, in the case of taxis, of 1/N, again assuming that the observation events are independent. So if you observe taxi #3, for example, 1000 times, that will be a factor of 1/N^1000 in the likelihood, which peaks very close to 3.
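To see how sharply that likelihood is peaked, it helps to work with logarithms, since 1/N^1000 underflows ordinary floating point for most N. This is a small sketch under the same independence assumption; the function name is illustrative.

```python
import math

def log_likelihood(n_taxis, max_seen, n_obs):
    """log of N**(-n_obs) for N >= max_seen; impossible (-inf) for smaller N."""
    if n_taxis < max_seen:
        return -math.inf
    return -n_obs * math.log(n_taxis)

# Taxi #3 observed 1000 times: compare N = 3 with its neighbours.
ll3 = log_likelihood(3, max_seen=3, n_obs=1000)
ll4 = log_likelihood(4, max_seen=3, n_obs=1000)
print(log_likelihood(2, max_seen=3, n_obs=1000))  # -inf: N = 2 is ruled out
print((ll4 - ll3) / math.log(10))                 # log10 of L(4) / L(3)
```

The ratio L(4)/L(3) is (3/4)^1000, roughly 125 orders of magnitude below 1, so after 1000 independent sightings of taxi #3 any N larger than 3 is effectively excluded.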