Data Science. Bayes theorem

Many engineers have never worked with statistics or data science. When they build data science pipelines, or rewrite code produced by data scientists into clean, easily maintained code, many nuances and misunderstandings arise on the engineering side. This series of articles is for those Data/ML engineers and for novice data scientists. I'll try to explain some basic approaches in plain English and, based on them, explain some of the Data Science model concepts.

Bayes theorem is one of the most important rules of probability theory used in Data Science. It provides us with a way to update our beliefs based on the arrival of new events.

Imagine we have two related events A and B. For example, A: I get wet today, and B: it will be rainy today. Let's calculate the probability of A given that B has already happened.

[Figure: events A and B, with the intersection A ∩ B shaded]

Since B has happened, the only part of A that still matters is the part that overlaps B, which is the shaded region A ∩ B. So the probability of A given B turns out to be:

$$ P(A|B) = \frac{P(A \cap B)}{P(B)} $$

By the same reasoning, the probability of B given that A has already occurred is:

$$ P(B|A) = \frac{P(A \cap B)}{P(A)} $$

The second equation gives P(A ∩ B) = P(B|A)P(A). Substituting this into the first equation yields:

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

That's all. These are the only steps needed to arrive at Bayes' theorem. Let's combine it all into one picture and rename the terms of the formula:

Bayes' theorem

  • P(A|B) is the posterior probability: the probability that A occurs given that B has occurred
  • P(B|A) is the likelihood: the probability of B given A
  • P(A) is the prior probability of A, and P(B) is the overall (marginal) probability of B, often called the evidence
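As a minimal sketch, the formula can be written as a small Python function. The function name `posterior` and the numbers for the rain example are made up for illustration; they do not come from any dataset.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero: B has to be possible")
    return likelihood * prior / evidence


# Hypothetical numbers for A = "I get wet today", B = "it is rainy today".
p_wet = Fraction(3, 10)            # prior P(A)
p_rain = Fraction(2, 5)            # evidence P(B)
p_rain_given_wet = Fraction(4, 5)  # likelihood P(B|A)

p_wet_given_rain = posterior(p_rain_given_wet, p_wet, p_rain)
print(p_wet_given_rain)  # 3/5
```

Exact fractions are used here only to keep the arithmetic easy to check by hand; plain floats would work the same way.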

It should be noted that for independent events P(B|A) = P(B), which is logical: if the occurrence of event A does not affect the occurrence of event B, then knowing A adds nothing to take into account. In that case the theorem reduces to P(A|B) = P(A).
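The independence property can be checked by brute-force counting. The sketch below assumes two fair six-sided dice, an example chosen here for illustration; the helper `prob` and the event functions are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda outcome: True):
    """P(event | given), computed by counting equally likely outcomes."""
    space = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in space if event(o)), len(space))


def first_is_six(o):
    return o[0] == 6


def second_is_even(o):
    return o[1] % 2 == 0


def total_at_least_ten(o):
    return sum(o) >= 10


# Independent events: conditioning on A does not change the probability of B.
print(prob(second_is_even))                          # 1/2
print(prob(second_is_even, given=first_is_six))      # 1/2

# Dependent events: conditioning on A changes the probability of B.
print(prob(total_at_least_ten))                      # 1/6
print(prob(total_at_least_ten, given=first_is_six))  # 1/2
```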

Example: If a single card is drawn from a standard deck of playing cards, the probability that the card is a king is 4/52, since there are 4 kings in a standard deck of 52 cards. Rewording this, if King is the event "this card is a king," the prior probability is P(King) = 4/52 = 1/13.

If evidence is provided (for instance, someone looks at the card) that the single card is a face card, then the posterior probability P(King|Face) can be calculated using Bayes' theorem:

P(King|Face) = P(Face|King) * P(King) / P(Face)

Since every King is also a face card, P(Face|King) = 1. Since there are 3 face cards in each suit (Jack, Queen, King) and 4 suits, the probability of a face card is P(Face) = 12/52 = 3/13. Combining these gives a likelihood ratio of P(Face|King) / P(Face) = 1 / (3/13) = 13/3.

Using Bayes' theorem gives P(King|Face) = 13/3 * 1/13 = 1/3.
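The card example can be verified by enumerating the deck. This is a sketch for checking the arithmetic; the rank and suit labels are just one way to represent a standard 52-card deck.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

kings = [card for card in deck if card[0] == "K"]
faces = [card for card in deck if card[0] in {"J", "Q", "K"}]

p_king = Fraction(len(kings), len(deck))  # prior P(King)
p_face = Fraction(len(faces), len(deck))  # evidence P(Face)
# Likelihood P(Face|King): share of kings that are face cards.
p_face_given_king = Fraction(sum(1 for c in kings if c in faces), len(kings))

# Posterior via Bayes' theorem.
via_bayes = p_face_given_king * p_king / p_face

# Same quantity by direct counting: share of face cards that are kings.
by_counting = Fraction(sum(1 for c in faces if c[0] == "K"), len(faces))

print(p_king, p_face, p_face_given_king)  # 1/13 3/13 1
print(via_bayes, by_counting)             # 1/3 1/3
```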

Intuitive understanding

Bayes imagined a man sitting with his back to a perfectly flat, perfectly square table. The man asks an assistant to throw a ball onto the table. The ball could land anywhere on the table, and he wants to figure out where it is. So he asks the assistant to throw another ball and tell him whether it landed to the left or to the right of the first ball, and in front of it or behind it. He notes this down and then asks for more and more balls to be thrown onto the table. He realizes that with this method he can keep updating his idea of where the first ball is. He can never be completely certain, but with each new piece of evidence his estimate becomes more accurate.

That is how Bayes saw the world; this was his thought experiment. It wasn't that he thought the world was not determined or that reality didn't exist, but that we can't know it perfectly, and all we can hope to do is update our understanding as more and more evidence becomes available.
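The thought experiment can be simulated. The sketch below is a simplified one-dimensional version that tracks only the left/right answers. It assumes a table of width 1, a uniform prior over the first ball's position, and a coarse grid of candidate positions; the names `grid`, `belief`, and `update` and the fixed random seed are illustrative choices.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same throws

true_x = random.random()  # hidden position of the first ball, in (0, 1)

# Candidate positions and a uniform prior belief over them.
grid = [(i + 0.5) / 200 for i in range(200)]
belief = [1 / len(grid)] * len(grid)


def update(belief, landed_left):
    """One Bayesian update: posterior is proportional to likelihood * prior."""
    # If the first ball is at x, a new uniformly thrown ball
    # lands to its left with probability x.
    likelihood = [x if landed_left else 1 - x for x in grid]
    unnormalized = [l * b for l, b in zip(likelihood, belief)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mean(belief):
    """Posterior mean, used here as the current estimate of the position."""
    return sum(x * b for x, b in zip(grid, belief))


for n in range(1, 501):
    landed_left = random.random() < true_x  # the assistant's report
    belief = update(belief, landed_left)
    if n in (1, 10, 100, 500):
        print(f"after {n:3d} balls: estimate {mean(belief):.3f}, "
              f"true position {true_x:.3f}")
```

Each throw uses the previous posterior as the new prior, which is the "keep updating" step in the story: the estimate never becomes certain, but it tends to move closer to the true position as more throws are reported.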

I recommend watching the video below; it's worth it either way: Bayes theorem


The fundamental idea of Bayesian inference is to become "less wrong" with more data. The process is straightforward: we have an initial belief, known as a prior, which we update as we gain additional information.

The conclusions drawn from Bayes' theorem are logical but counterintuitive. Almost always, people pay a lot of attention to the posterior probability, but they overlook the prior probability.
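A small numeric sketch shows how strongly the prior drives the result. The scenario is a hypothetical diagnostic test with made-up rates: A = "has the condition", B = "test is positive". The evidence P(B) is expanded with the law of total probability, P(B) = P(B|A)P(A) + P(B|not A)P(not A), which this article has not covered so far.

```python
from fractions import Fraction


def posterior_positive(prior, sensitivity, false_positive_rate):
    """P(A|B) for a positive test, with P(B) from the law of total probability."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical test characteristics, chosen only for illustration.
sensitivity = Fraction(99, 100)         # P(B|A)
false_positive_rate = Fraction(5, 100)  # P(B|not A)

# Same test, two different priors.
rare = posterior_positive(Fraction(1, 100), sensitivity, false_positive_rate)
common = posterior_positive(Fraction(1, 2), sensitivity, false_positive_rate)

print(rare)    # 1/6
print(common)  # 99/104
```

With a 1% prior, a positive result from this test still leaves only a 1 in 6 chance of having the condition, while the same result with a 50% prior gives about 95%. The likelihood is identical in both cases; only the prior differs.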

With this simple formula we can already construct some models, but let's not get ahead of ourselves. Basics first.
