HCOL 195 9/23/09

We started with some announcements. In particular, I mentioned the William Lowell Putnam exam, which will be held in December. If you are interested in taking this exam, please see the math department (16 Colchester Ave.) immediately. The secretary in the math office can tell you the person you need to talk to. The application forms are due in California in about two weeks, so don’t delay.

I passed out the next assignment and remarked that the first two problems are different, in that the first one has you eat the chocolates as you pick them out, whereas in the second, the machine is producing an essentially infinite supply of widgets. In the third problem, the procedure is basically the same as what we did on Monday in class, except that you will use an electronic spreadsheet so that you can divide the x-axis into 100 rather than 10 divisions. The fourth problem shows that the methods we are using are applicable to many fields, in this case, literary analysis to attribute a text to its author. I pointed out that this problem was inspired by the problem of attributing several (about 10) of the anonymously written Federalist Papers to their actual author (Hamilton or Madison).

I had redrawn the pictures from our last class (see blog below). We attempted to use them to estimate one standard deviation. Our estimate was that 2/3 of the probability was contained between 0.20 and 0.47. For your interest, I calculated this with a spreadsheet using divisions 1/10 the size of the ones we used in class. Our estimate was spot on. We also considered the question of what the probability is that the drug is more effective than the standard drug, which we said had a cure rate of 0.2. By adding up the probability in the intervals from 0.2 to 1.0, we found that that probability comes to about 0.84.
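For those who want to check the spreadsheet arithmetic, the same discrete calculation can be sketched in a few lines of Python: a uniform prior over a grid of possible cure rates, multiplied cell by cell by the likelihood, then normalized. The counts below (6 cures out of 20 patients) are purely hypothetical stand-ins, not the actual numbers from Monday's class.

```python
# A sketch of the spreadsheet-style discrete Bayes calculation for a cure rate p.
# The counts below (6 cures out of 20 patients) are HYPOTHETICAL stand-ins,
# not the actual data from Monday's class.
cures, patients = 6, 20
bins = 100

# Midpoints of 100 equal divisions of the x-axis (possible cure rates).
ps = [(i + 0.5) / bins for i in range(bins)]

# Uniform prior (just a 1 in each cell) times the binomial likelihood
# p^cures * (1-p)^failures; the binomial coefficient is the same in every
# cell, so it cancels when we normalize.
post = [p**cures * (1 - p)**(patients - cures) for p in ps]
total = sum(post)
post = [w / total for w in post]

# Probability that the new drug beats the standard drug's cure rate of 0.2:
# add up the posterior probability in the cells from 0.2 to 1.0.
p_better = sum(w for p, w in zip(ps, post) if p > 0.2)
print(round(p_better, 2))
```

With the real class data in place of the stand-in counts, this reproduces both the 2/3-probability interval and the roughly 0.84 figure above.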

I finished the class by posing the problem of counting the fish in a lake. One student pointed out that when you want to count deer on land, you count all the deer in some sample areas, then extrapolate to the entire area under study. Theoretically this might work, although in a lake it might be difficult to count all the fish in some volume of water. So I introduced another approach, the “capture-release-recapture” method. We catch a sample of fish (say 100), tag them all, and return them to the lake (ideally we will sample the whole lake to prevent oversampling in an atypical region). We allow some time to pass, then go back and capture another sample (we said 100), and count the tagged fish (10). A student correctly gave a rough estimate of about 1000 fish: since 10 of the 100 fish in the second catch were tagged, the 100 tagged fish must make up about 10% of the lake's population, so there should be about 1000 fish in the lake.
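The student's rough estimate is the classic capture-recapture (Lincoln-Petersen) calculation: the tagged fraction of the second catch estimates the tagged fraction of the whole lake. A one-line sketch in Python, using the numbers from class:

```python
def capture_recapture(tagged, second_catch, tagged_in_second):
    """Rough population estimate: tagged / N is about tagged_in_second / second_catch,
    so N is about tagged * second_catch / tagged_in_second."""
    return tagged * second_catch / tagged_in_second

# The numbers from class: 100 tagged, 100 recaptured, 10 of them tagged.
print(capture_recapture(100, 100, 10))  # 1000.0
```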

We noted that there have to be at least 190 fish in the lake, since we tagged 100 and counted 90 untagged in the second catch. In setting up a spreadsheet calculation for a Bayesian calculation here, we decided that the states of nature (SON) should go from 191 to a million (actually, I later realized it should start at 190). An uninformative prior would put the same weight on each SON, so we can just write 1 for each, since we know that we don’t have to make the prior add up to 1. We’ll look at this more on Friday. I asked you to think about how we should determine the likelihood, defined as the probability of obtaining the data we did, given each SON. The data are: We tagged 100 fish, and on the recapture phase caught 100 fish, 10 of which were tagged.
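For readers who want a preview of Friday, the whole setup can be sketched in code. One standard choice for the likelihood here (think about whether you agree before Friday) is the hypergeometric probability of drawing 10 tagged fish in a catch of 100 from a lake of N fish, 100 of them tagged. The sketch below uses that assumption, and caps the SON grid at 10,000 rather than a million just to keep the loop quick:

```python
import math

K, n, k = 100, 100, 10   # fish tagged, size of second catch, tagged in second catch

def log_comb(a, b):
    """log of (a choose b), via lgamma to avoid enormous integers."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

# States of nature: possible lake populations N, starting at the minimum of 190.
# (Capped at 10,000 here instead of a million, purely to keep the sketch fast.)
sons = range(190, 10001)

# Uninformative prior: a 1 on every SON (no need to make it add up to 1).
# Assumed hypergeometric likelihood: P(k tagged in a catch of n | N fish, K tagged).
log_like = [log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n) for N in sons]

# Unnormalized posterior = prior * likelihood; then normalize so it sums to 1.
post = [math.exp(ll) for ll in log_like]
total = sum(post)
post = [w / total for w in post]

# The most probable SON lands near the rough capture-recapture estimate of 1000.
best = max(zip(sons, post), key=lambda t: t[1])[0]
print(best)
```

Note that the posterior mode agrees with the rough 10% argument from class, while the full posterior also tells us how uncertain that estimate is.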