# Breaking Down Bayes’ Theorem for Fair Lending


Institutions such as the Consumer Financial Protection Bureau (CFPB) and Visible Equity use what is known as the Bayesian Improved Surname Geocoding (BISG) method to assign probabilities of race to a person. As the name implies, the BISG method uses Bayes’ Theorem for this calculation. You’ve probably seen Bayes’ Theorem referenced in previous Visible Equity blogs, but what is it exactly?

Let’s break down Bayes’ Theorem to understand where it came from, what it’s doing, why it’s useful, and why it’s beautiful. Say we are interested in understanding the probability of event A occurring. Now let’s say we have some useful information in the form of event B that can help give us insight into event A. With this added information from B, we are ultimately interested in the conditional probability $P(A|B)$, which we read as “the probability of A given B” (or rather, the probability of event A occurring given event B has occurred).

The definition of a conditional probability is

$P(A|B) = \frac{P(A \cap B)}{P(B)}$ .

If we multiply both sides of this equation by $P(B)$, we can isolate $P(A \cap B)$ (the $\cap$ symbol stands for "intersect" or "and," so this is the probability of both A and B occurring): $P(A \cap B) = P(A|B)P(B)$. Applying the same definition with the roles of A and B reversed gives $P(B \cap A) = P(B|A)P(A)$. And since $P(A \cap B) = P(B \cap A)$, we can substitute $P(B|A)P(A)$ for the numerator in the original conditional probability definition to get

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$.

This is Bayes’ Theorem. The beauty of this theorem lies in the fact that we can find the desired conditional probability, $P(A|B)$, by switching the condition. Notice how on the left side of the equation we have $P(A|B)$, and on the right side we have $P(B|A)$. So if we can find these pieces, $P(B|A)$, $P(A)$, and $P(B)$, then we can obtain our desired probability $P(A|B)$!

Let’s manipulate the denominator of the right side a little bit. The Law of Total Probability states that $P(B) = P(B|A)P(A) + P(B|A')P(A')$. This says that we can find the probability of an event by breaking it down to conditional probabilities. Thus, we can rewrite Bayes’ Theorem as

$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B| A)P(A) + P(B| A')P(A')}$.

Note that $P(A')$ is the probability of event A not occurring, which is simply $1 - P(A)$, and $P(B|A')$ is the probability of event B occurring given that event A did not occur.
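The expanded form above translates directly into a few lines of code. Here is a minimal sketch (the function name and argument names are our own, chosen for illustration):

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Compute the posterior P(A|B) from P(B|A), the prior P(A), and P(B|A').

    The denominator uses the Law of Total Probability:
    P(B) = P(B|A)P(A) + P(B|A')P(A'), where P(A') = 1 - P(A).
    """
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    denominator = numerator + p_b_given_not_a * p_not_a
    return numerator / denominator
```

Note that the function only needs the three pieces named above; $P(B)$ is reconstructed from them rather than supplied directly.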

The unconditional probability $P(A)$ is called the prior. The name comes from the fact that this is the probability of A before we have any additional information from B. This is the piece over which we have the most control, since we choose what information goes into it. On the other hand, $P(A|B)$, the probability we ultimately want, is the probability of A after incorporating the information from B; thus, it is known as the posterior.

Let’s do a simple problem using Bayes’ Theorem. Suppose you have two bags. Assume the probability of choosing bag one is 50%, so the probability of choosing bag two is also 50%. In bag one, there are five red balls and five blue balls. In bag two, there are three red balls and seven blue balls. You put on a blindfold and pull a ball from one of the two bags at random. The ball is red. What is the probability that the red ball came from bag one?

First, what is the probability we want? If we just wanted to know the probability of choosing bag one, we already know that it is 50%. But you were blindfolded and chose a red ball, so that probability has updated. It’s the probability of pulling from bag one given you pulled a red ball, $P(Bag_{1}|Red)$. This is a conditional probability, so let’s go to Bayes’ Theorem to help out. With Bayes’ Theorem, we get

$P(Bag_{1}|Red) = \frac{P(Red|Bag_{1})P(Bag_{1})}{P(Red|Bag_{1})P(Bag_{1}) + P(Red|Bag_{2})P(Bag_{2})}$.

Do we have the pieces needed for this calculation? We certainly do. We have

$P(Bag_{1}|Red) = \frac{P(Red|Bag_{1})P(Bag_{1})}{P(Red|Bag_{1})P(Bag_{1}) + P(Red|Bag_{2})P(Bag_{2})} = \frac{(\frac{1}{2})(\frac{1}{2})}{(\frac{1}{2})(\frac{1}{2}) + (\frac{3}{10})(\frac{1}{2})}$.

This results in $\frac{5}{8} = 62.5\%$. Not too bad.
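The arithmetic above can be checked in a few lines. This is a standalone sketch of the bag example, with the probabilities written out as fractions of balls:

```python
# Bag example: P(Bag1 | Red) via Bayes' Theorem.
p_bag1, p_bag2 = 0.5, 0.5   # equal chance of picking either bag
p_red_bag1 = 5 / 10         # five red balls out of ten in bag one
p_red_bag2 = 3 / 10         # three red balls out of ten in bag two

numerator = p_red_bag1 * p_bag1
denominator = numerator + p_red_bag2 * p_bag2
posterior = numerator / denominator  # 0.625, i.e. 5/8
```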

Now we’ll apply Bayes’ Theorem to finding race probabilities for fair lending where we know a person’s surname. The probability we are interested in is $P(Race| Surname)$. So using Bayes’ Theorem, we get

$P(Race| Surname) = \frac{P(Surname| Race)P(Race)}{P(Surname)} = \frac{P(Surname| Race)P(Race)}{P(Surname| Race)P(Race) + P(Surname|Race')P(Race')}$.

So the three pieces we need are $P(Surname| Race)$, $P(Surname|Race')$, and $P(Race)$.

We can find $P(Surname| Race)$ by using data from the U.S. census.

Again, using the Law of Total Probability, $P(Surname) = P(Surname| Race)P(Race) + P(Surname|Race')P(Race')$. As stated previously, we can get $P(Surname| Race)$ from the census, and it follows that we can also get $P(Surname| Race')$ from the census. All we need now is $P(Race)$.

If you recall, $P(Race)$ is the prior, that is, this is the probability of race before we know the person’s surname. This is where the person’s location comes into play. If we know the person’s address, then we can find his/her census tract, which gives us the counts of each race in the area. Using these counts, we can get the probability of race in the area. (See our recent blog Improving the Bayesian Improved Surname Geocoding Method on how Visible Equity has further improved these calculations by way of the prior, $P(Race)$.) If we do not have the person’s address, we could use the national level probabilities of races. Once we have $P(Race)$, it follows that $P(Race')$ is simply $1 - P(Race)$. Thus, we have all the pieces to calculate our posterior.
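A sketch of how the prior could be formed from census-tract counts. The counts and race categories below are made up for illustration; this is not Visible Equity's actual data pipeline:

```python
# Hypothetical census-tract population counts by race category.
tract_counts = {"white": 1200, "black": 2100, "asian": 400, "other": 300}
total = sum(tract_counts.values())

# The prior P(Race) for each category is its share of the tract population.
priors = {race: count / total for race, count in tract_counts.items()}

# P(Race') for any category is simply 1 minus its prior.
p_not_white = 1 - priors["white"]
```

If the address (and hence the tract) is unknown, the same dictionary could be filled with national-level counts instead.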

We’ll do one last example. Suppose we want to find the probability that a person named Alex Jackson is white. Then

$P(White|Jackson) = \frac{P(Jackson|White)P(White)}{P(Jackson|White)P(White) + P(Jackson|Not \text{ } White)P(Not \text{ } White)}$

According to census data, we know that $P(Jackson|White) = 0.0017$ (of all people who claimed white on the census, 0.17% had the surname Jackson) and $P(Jackson|Not \text{ } White) = 0.0052$. Let’s say Alex comes from a census tract where 30% of the people are white. Then $P(White) = 0.3$. When we throw these numbers into Bayes’ Theorem we get

$P(White|Jackson) = \frac{0.0017 \times 0.3}{0.0017 \times 0.3 + 0.0052 \times (1-0.3)} \approx 12\%$.
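Plugging the numbers in directly gives a standalone check of this calculation:

```python
p_jackson_white = 0.0017      # P(Jackson | White), from census surname data
p_jackson_not_white = 0.0052  # P(Jackson | Not White)
p_white = 0.3                 # prior P(White), from Alex's census tract

numerator = p_jackson_white * p_white
denominator = numerator + p_jackson_not_white * (1 - p_white)
posterior = numerator / denominator  # about 0.123, i.e. roughly 12%
```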

Now we can all appreciate the power and beauty of Bayes’ Theorem.
