Many engineers have never worked in statistics or Data Science. Yet when they build data pipelines, or rewrite code produced by Data Scientists into adequate, easily maintainable code, many nuances and misunderstandings arise on the engineering side. I've written this series of posts for those Data/ML engineers and for novice Data Scientists.

I'll try to explain some basic approaches in plain English and, building on them, explain some of the basic concepts of Data Science.


True logic of this world lies in the calculus of probabilities — James Clerk Maxwell

Data Science often uses statistical inference to make predictions and extract insights from data, and statistical inference relies on probability and its properties. So knowing probability and its applications is important for handling Data Science problems effectively.

Below, probability is explained from the frequentist point of view, since it is easier for beginners to understand.

Events. Probabilities of events

One of the basic concepts in statistics is an event. Events are simply results of experiments. Events can be certain, impossible, or random.

A certain event is an event that occurs in 100% of cases as a result of an experiment (performing certain actions under a certain set of conditions). For example, a tossed coin will certainly fall (under Earth's conditions).

An impossible event, as the name implies, is an event that cannot occur as a result of the experiment. For example, a tossed coin flying off into the sky is an "impossible" event.

Finally, an event is called random if it may or may not occur as a result of the experiment. Such an experiment must contain a fundamental element of randomness: a random event is the consequence of random factors whose influence cannot be predicted, or can be predicted only with great difficulty. For the coin toss, the random factors include the shape and physical characteristics of the coin, the strength and direction of the throw, air resistance, etc. These factors are extremely hard to predict.


Let us consider a coin flip in more detail (meaning a fair coin, one for which both results, heads and tails, are equally likely). There are 2 mutually exclusive outcomes: heads or tails. The outcome of the flip is random, since the observer cannot analyze and take into account all the factors that influence the result. What is the probability of heads? Most people answer ½, but why?

Let A be the event that the coin comes up tails. Suppose the coin is thrown n times and A occurs in m of them. The ratio

$$ \frac{m}{n} $$

is called the frequency of event A in a long series of tests. When all outcomes of an experiment are equally likely, the probability can be computed directly by counting:

The probability of an event happening = number of ways it can happen / total number of outcomes

Example: there are 4 Kings in a deck of 52 cards. What is the probability of picking a King?

Number of ways it can happen: 4 (there are 4 Kings)

Total number of outcomes: 52 (there are 52 cards in total)

So the probability = 4/52 = 1/13
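To make the counting rule concrete, here is a minimal Python sketch that builds a standard 52-card deck as (rank, suit) pairs and counts favourable outcomes. The deck representation and the helper name `probability` are illustrative choices, not part of any library; `Fraction` keeps the results exact.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) tuples (illustrative representation)
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))


def probability(event, outcomes):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


p_king = probability(lambda card: card[0] == "K", DECK)
print(p_king)  # 1/13
```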

It turns out that across different series of tests, the frequency for large n fluctuates around a constant value P(A). This value is called the probability of event A; the letter P is short for Probability.
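The stabilization of the frequency can be observed with a quick simulation. This is a sketch that assumes Python's pseudo-random generator is a good enough stand-in for a physical fair coin; `tails_frequency` is a hypothetical helper name.

```python
import random


def tails_frequency(n, seed=None):
    """Toss a simulated fair coin n times and return the frequency m/n of tails."""
    rng = random.Random(seed)
    m = sum(1 for _ in range(n) if rng.random() < 0.5)  # count the tails
    return m / n


# The frequency wanders for small n and settles near 0.5 as n grows
for n in (10, 100, 10_000, 100_000):
    print(n, tails_frequency(n, seed=42))
```

The exact numbers depend on the seed; what matters is that the larger n gets, the closer the frequency tends to stay to 0.5.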

The probability lies in the range [0, 1], where, in general, 0 indicates the impossibility of the event, and 1 indicates certainty. The higher the probability of an event, the greater the likelihood that an event will occur.

Example: What is the probability of drawing a Jack or a Queen from a well-shuffled deck of 52 cards?

There are 4 Jacks and 4 Queens, and no card is both, so the probability is simply the sum of the individual probabilities.

Number of ways it can happen: 4 (there are 4 Jacks) + 4 (there are 4 Queens) = 8

Total number of outcomes: 52 (there are 52 cards in total)

So P(Jack or Queen) = 4/52 + 4/52 = 8/52 = 2/13
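Counting over the whole deck confirms that, for events that cannot happen together, the sum of the individual probabilities equals the probability of "either". As before, the deck representation and the `probability` helper are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))


def probability(event):
    """Fraction of cards in the deck for which event(card) is true."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


p_jack = probability(lambda card: card[0] == "J")
p_queen = probability(lambda card: card[0] == "Q")
p_jack_or_queen = probability(lambda card: card[0] in ("J", "Q"))

# No card is both a Jack and a Queen, so the direct count equals the sum
print(p_jack_or_queen, p_jack + p_queen)  # 2/13 2/13
```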

Event types

Independent and dependent events

Two random events A and B are called independent if the occurrence of one of them does not change the probability of the occurrence of the other. Otherwise, events A and B are called dependent.

Counterintuitively for many people, knowing that the coin landed heads on the first toss does not provide any useful information about how it will land on the next toss. The probability of heads or tails on the next toss is still 1/2, regardless of the outcome of the first toss.

To get the probability that several independent events all occur, multiply their individual probabilities.

Example: What are the chances of getting heads 3 times in a row?

Let's list the possible outcomes of 3 coin tosses (H - heads, T - tails):

HHH, HHT, HTH, HTT, THH, THT, TTH, TTT

Number of ways it can happen: 1

Total number of outcomes: 8

So, the answer is 1/8. Alternatively, since coin tosses are independent, we can multiply the individual probabilities: P(3x heads) = P(heads) * P(heads) * P(heads) = 1/2 * 1/2 * 1/2 = 1/8
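Both routes to 1/8 can be checked in a few lines: enumerate every sequence of 3 tosses and count, then compare with the product rule. This is a minimal sketch using only the standard library.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of 3 tosses, e.g. 'HHT'
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]
print(outcomes)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# Counting: 1 favourable outcome out of 8
p_counting = Fraction(outcomes.count("HHH"), len(outcomes))

# Product rule for independent events: (1/2) * (1/2) * (1/2)
p_heads = Fraction(1, 2)
p_multiplying = p_heads ** 3

print(p_counting, p_multiplying)  # 1/8 1/8
```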

On the other hand, knowing that the first card drawn from a deck is an ace does provide useful information about the following draws. The probability of drawing another ace is 3/51 rather than 4/52, because one ace has already been removed from the deck. Something to think about in your next poker game.
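The 3/51 figure can be verified by brute force. The sketch below enumerates every ordered pair of distinct cards (drawing without replacement), keeps the pairs whose first card is an ace, and checks how often the second card is an ace too. The deck representation is an illustrative assumption.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))

# All ordered draws of two distinct cards: 52 * 51 = 2652 pairs
draws = list(permutations(DECK, 2))

first_ace = [pair for pair in draws if pair[0][0] == "A"]      # 4 * 51 pairs
both_aces = [pair for pair in first_ace if pair[1][0] == "A"]  # 4 * 3 pairs

# Among draws that start with an ace, the share whose second card is an ace
p_second_ace_given_first = Fraction(len(both_aces), len(first_ace))
print(p_second_ace_given_first)  # 1/17, i.e. 3/51
```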

Disjoint and overlapping events

Disjoint events cannot happen at the same time. A synonym for this term is "mutually exclusive".

For example, a single coin toss cannot come up both heads and tails; it is either heads or tails.

Events that are not disjoint can happen at the same time. Because they overlap, the probability of the overlap must be subtracted from the total to avoid counting it twice.

Example: What is the probability of drawing a Jack or a Red card from a well-shuffled deck of 52 cards?

Number of ways it can happen: 4 (there are 4 Jacks) + 26 (there are 26 Red cards). But 2 cards fall into both groups: the two red Jacks (♥J and ♦J) fit both criteria.

Total number of outcomes: 52 (there are 52 cards in total)

So, P(Jack or Red card) = P(Jack) + P(Red card) - P(Jack and Red card) = 4/52 + 26/52 - 2/52 = 7/13
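The subtraction of the overlap (inclusion-exclusion for two events) can be checked by counting the "either" event directly. This sketch reuses the same illustrative deck representation.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))


def probability(event):
    """Fraction of cards in the deck for which event(card) is true."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_jack(card):
    return card[0] == "J"


def is_red(card):
    return card[1] in ("hearts", "diamonds")


p_jack = probability(is_jack)
p_red = probability(is_red)
p_both = probability(lambda card: is_jack(card) and is_red(card))    # the two red Jacks
p_either = probability(lambda card: is_jack(card) or is_red(card))   # counted directly

# Adding the parts double-counts the overlap, so it is subtracted once
print(p_either, p_jack + p_red - p_both)  # 7/13 7/13
```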

Types of probabilities

Joint Probability


Joint probability is the probability of two or more events occurring simultaneously, i.e., the probability that event A occurs at the same time as event B.

For example, in a deck of 52 cards, the probability of drawing a red 6 is P(6 ∩ red) = 2/52 = 1/26, since there are two red sixes in the deck: ♦6 and ♥6. Because a card's rank and its color are independent, you can also multiply the individual probabilities: P(6 ∩ red) = P(6)P(red) = 4/52 * 26/52 = 1/26.

The symbol "∩" in joint probability denotes intersection: the event "A and B" corresponds to the intersection of the sets A and B. A Venn diagram is perhaps the best visual explanation of this.
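Python sets make the intersection view literal. This sketch represents each event as the set of cards it contains (an illustrative modelling choice) and compares the counted joint probability with the product P(6)P(red), which agrees here only because rank and color are independent.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))

# Each event is modelled as the set of cards in which it happens
sixes = {card for card in DECK if card[0] == "6"}
reds = {card for card in DECK if card[1] in ("hearts", "diamonds")}

joint = sixes & reds  # set intersection: cards that are both a 6 and red
p_joint = Fraction(len(joint), len(DECK))

# Product rule, valid here because rank and color are independent
p_product = Fraction(len(sixes), len(DECK)) * Fraction(len(reds), len(DECK))

print(sorted(joint))       # [('6', 'diamonds'), ('6', 'hearts')]
print(p_joint, p_product)  # 1/26 1/26
```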

Marginal probability

Marginal probability is the probability of a single event occurring, unconditioned on any other events.

Whenever someone asks you whether the weather today is going to be rainy or sunny (without any conditional or prior information), you are computing a marginal probability.
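One way to see where the name comes from: if you have a table of joint probabilities, a marginal probability is obtained by summing the joint probabilities over all values of the other variable (the "margin" of the table). The sketch below builds a rank/color joint table for the illustrative deck and sums it over ranks to get P(red).

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))

# Joint probabilities P(rank ∩ color) for every rank/color combination
joint = {}
for rank, suit in DECK:
    color = "red" if suit in ("hearts", "diamonds") else "black"
    joint[(rank, color)] = joint.get((rank, color), 0) + Fraction(1, len(DECK))

# Marginal P(red): sum the joint probabilities over all ranks
p_red = sum(p for (rank, color), p in joint.items() if color == "red")
print(p_red)  # 1/2
```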

Conditional probability


Conditional probability is the probability of an event given that another event has occurred (by assumption, presumption, assertion, or evidence).

When I ask you how likely it is that today will be rainy or sunny, given that I noticed the temperature is going to be above 80℉, you are computing a conditional probability. The conditional probability of event B given event A is written P(B|A).

So, we want to find the probability of event B given A. It is defined as the joint probability of A and B divided by the probability of A, the event we condition on. Thus,

$$ P(B|A) = \frac{P(A ∩ B)}{P(A)} $$
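The formula can be checked on the card deck: conditioning on A is the same as shrinking the sample space to the outcomes where A happened. In this sketch, `conditional` is a hypothetical helper and the choice of "Jack given red" is only an illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))


def probability(event):
    """Fraction of cards in the deck for which event(card) is true."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def conditional(event_b, event_a):
    """P(B|A) = P(A ∩ B) / P(A); assumes P(A) > 0."""
    p_a_and_b = probability(lambda card: event_a(card) and event_b(card))
    return p_a_and_b / probability(event_a)


def is_jack(card):
    return card[0] == "J"


def is_red(card):
    return card[1] in ("hearts", "diamonds")


p_jack_given_red = conditional(is_jack, is_red)

# Sanity check: restrict the sample space to red cards and count Jacks directly
red_cards = [card for card in DECK if is_red(card)]
p_direct = Fraction(sum(1 for card in red_cards if is_jack(card)), len(red_cards))
print(p_jack_given_red, p_direct)  # 1/13 1/13
```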

Example: A math teacher gave her class two tests. 25% of the class passed both tests, and 42% of the class passed the first test. What percentage of those who passed the first test also passed the second test?

$$ P(Second|First) = \frac{P(First ∩ Second)}{P(First)} = \frac{0.25}{0.42} \approx 0.6 = 60\% $$
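The same arithmetic in Python shows that the exact ratio is a little under 0.6 and rounds to 60%:

```python
# Shares of the class taken from the example above
p_first_and_second = 0.25  # passed both tests
p_first = 0.42             # passed the first test

p_second_given_first = p_first_and_second / p_first
print(f"{p_second_given_first:.4f}")  # 0.5952
print(f"{p_second_given_first:.0%}")  # 60%
```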


In this post, we got acquainted with the basic concepts of probability and probability algebra. In the next post, we will talk about the Bayes theorem and look at the world through the eyes of Bayes.
