Shouldn’t Bayes’ Theorem Be Used for Probability of Gender?


With fair lending practices, examiners require you to show that you are not discriminating against any of the defined protected classes. The catch is that, in most cases, you can't ask applicants for information about those protected classes. One of the coolest modules Visible Equity has to offer is our Fair Lending module, which calculates the borrower's probability of belonging to certain protected classes, namely race/ethnicity and gender. We've covered probability of race/ethnicity in a few blogs, such as Breaking Down Bayes' Theorem for Fair Lending and Improving the Bayesian Improved Surname Geocoding Method. In this blog, we'll cover a bit of probability of gender.

If you've read the probability of race/ethnicity blogs or our white papers, you know that Visible Equity employs a version of the Bayesian Improved Surname Geocoding (BISG) method to calculate the probabilities. As the name implies, the BISG method relies heavily on Bayes' Theorem. As a reminder, we define Bayes' Theorem as follows

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

For more specifics on what each part of this theorem means and how to apply this formula to probability of race/ethnicity, see Breaking Down Bayes' Theorem for Fair Lending.

In our Fair Lending white paper, we state: “The calculation of a borrower’s gender is the second and perhaps most straightforward estimate required in the pricing analysis. A borrower’s expected gender is estimated using a conditional probability based on first name. For example, consider the borrower with the first name of John. Given the first name of John, the conditional probability that this individual is male is calculated to be 97%. This probability is calculated as the total number of people named John that are male, divided by the total number of people named John. Therefore, the conditional probability that this individual is female is then 3%.” So the probability of a borrower named John being male is simply

$$P(\text{Male} \mid \text{John}) = \frac{\text{Male Johns}}{\text{Male Johns} + \text{Female Johns}} = 97\%$$

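You might be asking, “If you used Bayes’ Theorem to calculate a borrower’s race/ethnicity, why don’t you use it for gender as well?” Well, we’ve got two responses to that. First, the probability of race/ethnicity calculation is a little more complicated because it incorporates a borrower’s location, and Bayes’ Theorem lends itself beautifully to incorporating additional information. Second, the probability of gender calculation actually does use Bayes’ Theorem! It’s just not obvious. Let’s revisit the John example and apply Bayes’ Theorem to it, expanding the denominator P(John) over the male and female populations, as follows

$$
\begin{aligned}
P(\text{Male} \mid \text{John})
&= \frac{P(\text{John} \mid \text{Male})\,P(\text{Male})}{P(\text{John} \mid \text{Male})\,P(\text{Male}) + P(\text{John} \mid \text{Female})\,P(\text{Female})} \\[6pt]
&= \frac{\dfrac{\text{Male Johns}}{\text{Males}} \cdot \dfrac{\text{Males}}{\text{People}}}{\dfrac{\text{Male Johns}}{\text{Males}} \cdot \dfrac{\text{Males}}{\text{People}} + \dfrac{\text{Female Johns}}{\text{Females}} \cdot \dfrac{\text{Females}}{\text{People}}} \\[6pt]
&= \frac{\dfrac{\text{Male Johns}}{\text{People}}}{\dfrac{\text{Male Johns}}{\text{People}} + \dfrac{\text{Female Johns}}{\text{People}}} \\[6pt]
&= \frac{\text{Male Johns}}{\text{Male Johns} + \text{Female Johns}}
\end{aligned}
$$
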
where Male Johns is the number of people named John who are male, Males is the number of males in the entire population, Female Johns is the number of people named John who are female, Females is the number of females in the entire population, and People is the total population. As you can see, in the first line of the equation we apply Bayes' Theorem, and through the magic of algebra, the calculation reduces to one simple ratio: the total number of male Johns divided by the total number of people named John. Our calculation is just a simplified version of Bayes’ Theorem, so we know we are mathematically sound! This example can be extended to all first names with gender information from the Census. So don’t fret. The probability of gender calculations do in fact obey Bayes’ Theorem; it’s just not obvious on the surface!
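
If you'd like to check the algebra numerically, here is a minimal Python sketch. It computes the probability both ways, once as the simple ratio and once through the full Bayes' Theorem expression, and confirms that they agree. The function names and all of the counts are made up for illustration (chosen only so the result matches the 97% in the John example); they are not Census figures and this is not Visible Equity's production code.

```python
from fractions import Fraction


def gender_prob_direct(male_named, female_named):
    """Simple ratio: males with the name / everyone with the name."""
    return Fraction(male_named, male_named + female_named)


def gender_prob_bayes(male_named, female_named, males, females):
    """Same probability computed through the full Bayes' Theorem expression."""
    people = males + females                              # total population
    p_male = Fraction(males, people)                      # P(Male)
    p_female = Fraction(females, people)                  # P(Female)
    p_name_given_male = Fraction(male_named, males)       # P(John | Male)
    p_name_given_female = Fraction(female_named, females) # P(John | Female)

    numerator = p_name_given_male * p_male
    # P(John), expanded over the two genders
    evidence = numerator + p_name_given_female * p_female
    return numerator / evidence


# Hypothetical counts, for illustration only (not real Census data).
direct = gender_prob_direct(male_named=970, female_named=30)
bayes = gender_prob_bayes(male_named=970, female_named=30,
                          males=49_000, females=51_000)

print(direct, bayes, direct == bayes)  # 97/100 97/100 True
```

Using exact fractions instead of floating point makes the equality check exact. Notice that the population totals (Males and Females) cancel out, which is why changing them leaves the answer untouched.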

Keaton Baughan

Product Manager