We continued the fish “catch and release” problem, which lets us estimate the number of fish in a lake: we caught 100 fish, tagged them, let them swim around, then caught another batch of 100 and noted that 10 of them were tagged. Since the second batch contained 90 untagged fish, we must have seen 100 + 90 = 190 different fish, so the experiment (not the prior) will, through the likelihood, guarantee that the posterior is 0 for N<190. Here’s the spreadsheet we constructed; we represented the nonzero terms in the likelihood with squiggles but didn’t actually calculate them (this is where a real spreadsheet would be useful).

I also put 1 for each entry in the prior. Even though this prior does not add up to 1, it doesn’t matter here: the constant factor that relates the all-1’s prior to a properly normalized one appears both in the numerator and in the marginal distribution we divide by when calculating the posterior, so it cancels out.
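To spell out the cancellation, here is a short sketch in notation we did not use in class (all of it an assumption for illustration): $D$ is the data, $p(N)$ is a properly normalized flat prior over a finite range of $N$, and $c$ is the constant with $c\,p(N) = 1$.

$$
P(N \mid D) \;=\; \frac{c\,p(N)\,P(D \mid N)}{\sum_{N'} c\,p(N')\,P(D \mid N')} \;=\; \frac{p(N)\,P(D \mid N)}{\sum_{N'} p(N')\,P(D \mid N')}
$$

The same $c$ multiplies every term in the sum, so it factors out of the denominator and cancels against the numerator.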

We figured out that the likelihood would be calculated as in the following chart. Each tagged fish we recapture gives a factor equal to the number of tagged fish left in the lake (after the ones we have already recaptured are removed), divided by the total number of fish left in the lake (after removing all the fish recaptured so far). Similarly, each untagged fish we recapture gives a factor equal to the number of untagged fish left in the lake, divided by the total number of fish left in the lake. Each fish captured reduces by one both the number of that kind of fish in the lake and the total number of fish. We first did this for N=190, and then figured out how to do it for a general N. We saw that if N<190, there will be a 0 factor somewhere in the product, which yields a 0 likelihood. Finally, we figured out how to use the factorial function n! to express the likelihood somewhat more simply.
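As a rough sketch of that calculation (not the spreadsheet from class), the following Python builds the likelihood factor by factor and then again with factorials. The function names are made up for illustration, the second catch is taken to be 100 fish (implied by 100 + 90 = 190), and the tagged fish are assumed to come out first; a different order rearranges the factors but gives the same product.

```python
from fractions import Fraction
from math import factorial

TAGGED = 100       # fish tagged in the first catch
RECAPTURED = 100   # size of the second catch (assumed; implied by 100 + 90 = 190)
SEEN_TAGGED = 10   # tagged fish found in the second catch


def likelihood_product(n):
    """Likelihood for n >= 100 fish in the lake, multiplied out one fish at a time."""
    tagged_left, untagged_left, total_left = TAGGED, n - TAGGED, n
    result = Fraction(1)
    for i in range(RECAPTURED):
        if i < SEEN_TAGGED:
            # A tagged fish: tagged fish left / all fish left.
            result *= Fraction(tagged_left, total_left)
            tagged_left -= 1
        else:
            # An untagged fish: untagged fish left / all fish left.
            # For n < 190 this count reaches 0, which zeroes the product.
            result *= Fraction(max(untagged_left, 0), total_left)
            untagged_left -= 1
        total_left -= 1
    return result


def likelihood_factorial(n):
    """The same likelihood written with factorials."""
    untagged = n - TAGGED
    unseen = untagged - (RECAPTURED - SEEN_TAGGED)  # n - 190
    if unseen < 0:
        return Fraction(0)
    numerator = (factorial(TAGGED) // factorial(TAGGED - SEEN_TAGGED)) * (
        factorial(untagged) // factorial(unseen)
    )
    denominator = factorial(n) // factorial(n - RECAPTURED)
    return Fraction(numerator, denominator)


print(likelihood_product(189))                                # 0
print(likelihood_product(500) == likelihood_factorial(500))   # True
```

Using `Fraction` keeps the arithmetic exact, so the two versions can be compared with `==` rather than with a floating-point tolerance.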

The graph of the posterior distribution looks something like this. It’s an asymmetric “bell-shaped curve”, with a maximum near 1000 and a value of 0 for N<190. On the high end it approaches 0 asymptotically, but never quite gets there unless the prior is 0 above some maximum number of fish.
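For anyone who wants to reproduce the shape numerically, here is a minimal sketch. It assumes a flat prior that is cut off at a hypothetical `N_MAX = 5000` (we did not fix a cutoff in class), a second catch of 100 fish, and helper names invented for this example. It works with logarithms via `math.lgamma` so the factorials do not overflow.

```python
from math import exp, lgamma

N_MAX = 5000  # assumed upper limit where the flat prior drops to 0


def log_likelihood(n, tagged=100, caught=100, seen_tagged=10):
    """Log of the likelihood, dropping factors that do not depend on n."""
    untagged = n - tagged
    unseen = untagged - (caught - seen_tagged)  # n - 190 with the defaults
    if unseen < 0:
        return float("-inf")  # likelihood is exactly 0 here
    # log[(n-100)! / (n-190)!] - log[n! / (n-100)!], using lgamma(k + 1) = log k!
    return (lgamma(untagged + 1) - lgamma(unseen + 1)
            - lgamma(n + 1) + lgamma(n - caught + 1))


def posterior(prior_weight=1.0):
    """Posterior over n = 100..N_MAX for a flat prior with the given weight."""
    ns = range(100, N_MAX + 1)
    logs = [log_likelihood(n) for n in ns]
    top = max(logs)  # subtract the largest log to keep exp() well scaled
    weights = [prior_weight * exp(value - top) for value in logs]
    total = sum(weights)  # plays the role of the marginal
    return {n: w / total for n, w in zip(ns, weights)}


post = posterior()
mode = max(post, key=post.get)
print(mode, post[189], post[N_MAX] > 0)
```

With these numbers the likelihoods at N = 999 and N = 1000 are exactly equal in the math, so the reported mode can be either one depending on rounding. Calling `posterior(7.0)` instead of `posterior(1.0)` gives the same distribution, which is the cancellation of the prior's overall constant in action.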

We then turned to the problem of counting German tank production, as was done in World War II by the Allies, by looking at the serial numbers of tanks that were captured. The Germans numbered their tanks sequentially, which gave the Allies a way to estimate how many tanks had been produced. We did this with a simple example. We supposed that we had captured tanks #10, #5 and #11 and asked what this model says about the number of tanks produced. The spreadsheet looks somewhat similar to the fish calculation: if we have seen Tank #11, then there have to be at least 11 tanks, so the likelihood will be 0 for N<11.

To calculate the likelihood, we see that if there are just 11 tanks, the probability of observing the first tank captured is 1/11, the second tank captured 1/10, and the third 1/9. The product of these is the likelihood if N=11. Similar considerations work for N=12, 13, and so on, except that we start with a larger denominator, again decreasing by 1 each time. We see that the likelihood decreases as N increases (for N=11, 12, 13, …). With a flat prior, this means that the posterior also decreases as N gets larger and larger.
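Here is a small sketch of that calculation for the three captured tanks. The function name and the cutoff `N_MAX = 100` for the flat prior are assumptions made for this example, not values from class.

```python
from fractions import Fraction

N_MAX = 100  # assumed largest N given any prior weight


def tank_likelihood(n, serials=(10, 5, 11)):
    """Probability of capturing exactly these tanks, in this order, out of n tanks."""
    if n < max(serials):
        return Fraction(0)  # we saw a serial number larger than n
    result = Fraction(1)
    for k in range(len(serials)):
        # One fewer tank is left to capture after each observation.
        result *= Fraction(1, n - k)
    return result


# Flat prior (all 1's), so the posterior is the likelihood divided by its sum.
weights = {n: tank_likelihood(n) for n in range(1, N_MAX + 1)}
total = sum(weights.values())
posterior = {n: w / total for n, w in weights.items()}

print(tank_likelihood(11))   # 1/990
print(tank_likelihood(12))   # 1/1320
```

The posterior is 0 up to N=10, jumps to its largest value at N=11, and falls off steadily from there.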

So the posterior distribution looks like this:

This is *not* like a bell-shaped curve! This is why taking averages and so on is not a good way to think about this problem.

Have a nice weekend!
