One of our students reported how his stats 141 class had discussed the following problem: 48 managers were each given a copy of a promotion resume. All the resumes were identical, except that 24 of them stated that they were for a male candidate, and 24 for a female candidate. For the resumes marked as for a male candidate, 21 of the 24 managers recommended promotion. For the resumes marked as for a female candidate, 14 of the 24 managers recommended promotion.

His stats class analyzed this in a frequentist way, and determined that the one-sided tail area was 0.021 (see figure below, left-hand side). This would indicate statistically significant evidence of bias. The student wondered how we would look at this problem from a Bayesian point of view, and what results we would get.

So I looked at the problem. I knew that if there were no bias, there would be one value of the probability $p$ for promotion, but if there were bias, then there would be two values, $p_m$ and $p_f$, for men and women. All of these probabilities are unknown, and have to be estimated from the information we have.

First, consider the case of no bias. We want to know the posterior probability for various values of $p$. We know how to solve this problem with our spreadsheet method. We adopt $p$ = 0.05, 0.15, 0.25, …, 0.95, for example (or many more values if we are using a real spreadsheet for greater accuracy). The likelihood is $p^{35}(1-p)^{13}$, since there were 35 recommendations for promotion and 13 for no promotion. We would fill these values into the likelihood column. Because the bias and no-bias cases estimate different numbers of parameters, we have to use a “real” prior that adds up to 1. So we enter 0.1 into each entry in the prior column. Then the usual procedure: multiply prior times likelihood to get the joint, add the joint values to get the marginal likelihood, and divide the marginal into each joint probability to get the posterior on $p$. Actually, for this problem we don’t need the posterior probability on $p$; we just need the marginal likelihood. The process is indicated in the shot below:

In the lower right of this picture, you will see that I remarked that the marginal likelihood we got by summing is an approximation that gets better and better, the closer we space the sample values of p. In the limit, it approaches an integral. It is in fact an integral that can be evaluated exactly by integration by parts. In general, if there were m promotion recommendations and n recommendations not to promote, the value of the integral is

$$\int_0^1 p^m (1-p)^n \, dp = \frac{m!\,n!}{(m+n+1)!} = \frac{1}{(m+n+1)\binom{m+n}{m}},$$

where in the second form we’ve used the (m+n) choose m notation.
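The spreadsheet sum and its exact limit can be compared numerically. The following is a minimal sketch in Python (not the R used for the original calculation); the function names `grid_marginal` and `exact_marginal` are made up for illustration, and the grid uses equally spaced midpoints like 0.05, 0.15, …, 0.95.

```python
from math import comb

def grid_marginal(m, n, k):
    """Spreadsheet-style sum: k equally spaced values of p, prior 1/k on each."""
    ps = [(i + 0.5) / k for i in range(k)]      # k=10 gives 0.05, 0.15, ..., 0.95
    return sum((1.0 / k) * p**m * (1 - p)**n for p in ps)

def exact_marginal(m, n):
    """Integral of p^m (1-p)^n over [0, 1], written with the 'choose' form."""
    return 1.0 / ((m + n + 1) * comb(m + n, m))

m, n = 35, 13                                   # promote / do not promote
for k in (10, 100, 1000):
    print(k, grid_marginal(m, n, k))
print("exact", exact_marginal(m, n))
```

As the grid gets finer, the summed value should settle toward the exact one, which is the sense in which the sum "approaches an integral".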

The calculation for two different promotion probabilities is more complex. We arrange the values for $p_m$ along the left-hand side of the array, and those for $p_f$ along the right-hand side. The likelihood at each grid point is $p_m^{21}(1-p_m)^{3}\,p_f^{14}(1-p_f)^{10}$. We would fill in each grid point with the appropriate value of this quantity. I then asked you to imagine behind the board an array of priors of the same size (10 × 10 in the picture). Again, we want a “real” prior that adds up to 1, so we’d fill in each grid point with 0.01. I then asked you to imagine another array into which you would enter the desired values of prior × likelihood. Some of you were a bit puzzled, so I imagined turning the array on its side so that we can look along the individual tables (see right-hand side of the diagram below). That seemed to help. The board looked like this:

We then calculate the marginal likelihood by adding all of the numbers in the joint table. Equivalently, we could add all the numbers in the likelihood table and multiply by 0.01; that gives us the same answer. At the bottom of the board, we see that the sums involving the two different probabilities are independent of each other, so we can split the sum into two independent sums over one variable, and these also are approximated by integrals (which is what we really want). That’s shown in the chart below.
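The claim that the double sum splits into two independent sums is easy to check numerically. This is a small Python sketch under the same assumptions as the board (a 10 × 10 grid and a prior of 0.01 at each grid point); the names `lik_m` and `lik_f` are illustrative only.

```python
k = 10
ps = [(i + 0.5) / k for i in range(k)]          # 0.05, 0.15, ..., 0.95

def lik_m(p):
    """Likelihood factor for the male-labelled resumes: 21 of 24 promoted."""
    return p**21 * (1 - p)**3

def lik_f(p):
    """Likelihood factor for the female-labelled resumes: 14 of 24 promoted."""
    return p**14 * (1 - p)**10

prior = 1.0 / (k * k)                           # 0.01 at every grid point

# Sum of prior x likelihood over the whole two-dimensional table.
double_sum = sum(prior * lik_m(pm) * lik_f(pf) for pm in ps for pf in ps)

# The same quantity as a product of two one-dimensional sums.
product = (sum(lik_m(p) for p in ps) / k) * (sum(lik_f(p) for p in ps) / k)

print(double_sum, product)
```

The two printed numbers should agree up to floating-point rounding, which is why each one-dimensional sum can be replaced by its own integral.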

I evaluated the required integrals using the choose(a,b) function in the free statistical computer language R, which can be obtained by googling ‘cran’. Once these are evaluated, I can compare the marginal likelihoods of the two cases. I find that the marginal likelihood for one common promotion rate is smaller than for the case of two independent rates. With equal prior probabilities on the two cases, the posterior probability of no bias is 0.21, which is ten times larger than the tail area that was calculated in the class.
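For readers who want to reproduce the comparison, here is a hedged Python sketch (a stand-in for the R calculation, whose exact code is not shown in this post). It assumes equal prior probabilities of 0.5 for "no bias" and "bias"; `exact_marginal` is an illustrative helper name.

```python
from math import comb

def exact_marginal(m, n):
    """Integral of p^m (1-p)^n over [0, 1] via the binomial coefficient."""
    return 1.0 / ((m + n + 1) * comb(m + n, m))

# One common promotion rate: 35 promoted, 13 not, out of 48.
ml_one_rate = exact_marginal(35, 13)

# Two independent rates: 21 of 24 for male, 14 of 24 for female.
ml_two_rates = exact_marginal(21, 3) * exact_marginal(14, 10)

prior_no_bias = 0.5                             # assumption: equal prior weight
post_no_bias = (prior_no_bias * ml_one_rate) / (
    prior_no_bias * ml_one_rate + (1 - prior_no_bias) * ml_two_rates
)

print(ml_one_rate, ml_two_rates)
print(round(post_no_bias, 2))                   # 0.21
```

Changing `prior_no_bias` shows how strongly the 0.21 figure depends on the equal-prior assumption.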

However, the two calculations aren’t really doing the same thing. The calculation in the class computes the probability of getting data as extreme as or more extreme than what we did observe, in the direction of bias against the female candidate, given that there is no bias. It doesn’t look at bias in favor of females. Our calculation, on the other hand, assumes at the outset that no bias is as probable a priori as bias (the prior probabilities are in fact equal), and it doesn’t matter whether the bias is in favor of or against female candidates.

We really want to investigate the probability of bias against females and for males, given that bias exists. To do that, we add up only the numbers that represent bias in favor of males, and compare that to the marginal likelihood we calculated for the case of bias. This is reasonable to do, because our experience is that no one can be perfectly unbiased, so the case where the promotion probabilities don’t depend at all on gender is probably unrealistic. The case of bias in favor of males consists of all the numbers in the upper triangle of our array, where $p_m > p_f$. We’ve already (in principle) computed the marginal likelihood summing over all the numbers, so we can also compute it summing only over the upper triangle. The ratio of the two is the probability that the bias favors male candidates. This is equivalent to doing a double integral, where the upper limit on the inner integral (over $p_f$) is the value of $p_m$ in the outer integral. Unfortunately, this isn’t so simply done. It can be done in principle, because the integrands are all polynomials, but it wouldn’t work very well because integrating high-order polynomials will produce a lot of rounding error.
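The triangle sum can at least be approximated on a fine grid, which avoids the polynomial integration altogether. The sketch below is in Python and is not the method I actually used. It assumes a 400 × 400 grid and counts half of each diagonal cell (where $p_m = p_f$) toward the male-favoring triangle; both choices are mine for illustration.

```python
k = 400                                         # assumption: grid resolution
ps = [(i + 0.5) / k for i in range(k)]
lm = [p**21 * (1 - p)**3 for p in ps]           # likelihood factor in p_m
lf = [p**14 * (1 - p)**10 for p in ps]          # likelihood factor in p_f

# Sum over the whole table (the uniform prior cancels in the ratio below).
total = sum(lm) * sum(lf)

# Sum over the triangle where p_m > p_f, plus half of the diagonal cells.
triangle = 0.0
for i in range(k):
    below = sum(lf[:i])                         # grid values with p_f < p_m
    triangle += lm[i] * (below + 0.5 * lf[i])

prob_male_favored = triangle / total
print(prob_male_favored)
```

Which triangle counts as "upper" depends on how the axes are drawn on the board; the code simply keeps the cells with $p_m > p_f$.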

Here’s the right-hand board, which illustrates the idea:

I did the calculation using a completely different method. What I did was to have R draw a large number (100000) of independent samples of $p_m$ from a probability density proportional to $p_m^{21}(1-p_m)^{3}$, and similarly samples of $p_f$ from a density proportional to $p_f^{14}(1-p_f)^{10}$. I then just counted how many of the $p_m$’s were greater than the corresponding $p_f$’s. I divided this by 100000, and the result is the probability that the bias is in favor of males. That number turned out to be 0.988. This is an even stronger indication of bias than the method used in the statistics class!
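The sampling idea can be sketched as follows, in Python rather than R. A density proportional to $p^{21}(1-p)^{3}$ on [0, 1] is a Beta(22, 4) distribution, and one proportional to $p^{14}(1-p)^{10}$ is Beta(15, 11), so this sketch uses Python's `random.betavariate`. The seed and variable names are my own choices, and the post does not say which R function was used.

```python
import random

random.seed(2009)                               # assumption: fixed seed for repeatability
n_draws = 100_000

count = 0
for _ in range(n_draws):
    p_m = random.betavariate(22, 4)             # density proportional to p^21 (1-p)^3
    p_f = random.betavariate(15, 11)            # density proportional to p^14 (1-p)^10
    if p_m > p_f:
        count += 1

prob_male_favored = count / n_draws
print(prob_male_favored)
```

With 100000 draws, the Monte Carlo error on a probability near 0.99 is a few parts in ten thousand, so repeated runs should agree to about three decimal places.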

Note that there are some differences here between what we’ve done and what was done in the other class. The tail area evaluated in that class assumed that there was no bias, and calculated the probability of getting data as extreme as we did, assuming that that was true. It didn’t do any calculation that assumed that there was bias. Basically, it says that if the data are really extreme, then we are justified in rejecting the null hypothesis of no bias.

Our Bayesian calculation did the calculation assuming, in the first case, that there were two possibilities, bias or no bias, and in the second case that there was bias, and we wanted to know in what direction. And it computes the probability of the hypothesis given the data, not the other way around. It is this feature of the Bayesian way of doing things that appeals to me.
