Why is Bayes' Theorem different from the frequentist approach to statistics?

Let's start with an example:

When I flip a fair coin now, what is the probability of getting a tail? The frequentist approach says you have a 50% chance of getting a tail and a 50% chance of getting a head.

If I tell you that my last 100 flips of the same coin have all come up heads, what is the probability of getting a tail when I flip this coin now? Can you still say 50%? Or would you rather say less than 50%, given the additional information about the flip history of this coin? I guess the second option is more likely…

Unlike the general frequentist view of probability, Bayes' theorem takes into account additional dependencies or information related to the event. That's why “Bayes’ Theorem is a formula which converts human belief, based on evidence, into predictions.”
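The coin example above can be sketched as a Bayesian update. This is only an illustration under stated assumptions: the belief about the coin's probability of tails is modeled with a Beta distribution, and the starting point is a uniform Beta(1, 1) prior. Neither choice comes from the example itself, and the helper names `update_beta` and `prob_next_tail` are made up for this sketch.

```python
from fractions import Fraction


def update_beta(alpha, beta, tails, heads):
    """Return Beta parameters after observing the given flips.

    alpha accumulates tails, beta accumulates heads.
    """
    return alpha + tails, beta + heads


def prob_next_tail(alpha, beta):
    """Probability that the next flip is a tail under a Beta(alpha, beta) belief."""
    return Fraction(alpha, alpha + beta)


# Assumption: a uniform Beta(1, 1) prior, i.e. no initial opinion about the coin.
prior = (1, 1)
print(prob_next_tail(*prior))      # 1/2

# Evidence from the example: 100 flips, all heads.
posterior = update_beta(*prior, tails=0, heads=100)
print(prob_next_tail(*posterior))  # 1/102
```

Before seeing any flips, the sketch agrees with the 50% answer. After 100 heads in a row, the estimated chance of a tail drops to roughly 1%, which matches the intuition that the flip history should matter.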


The formula is:

    P(A|B) = P(B|A) · P(A) / P(B)

where:

  • P(A|B) is a conditional probability: the likelihood of event A occurring given that B is true.
  • P(B|A) is a conditional probability: the likelihood of event B occurring given that A is true.
  • P(A) and P(B) are the probabilities of observing A and B respectively; they are known as the marginal probabilities.

Moreover, the probability of A given B is, in general, different from the probability of B given A. Bayes' theorem describes the relationship between these two conditional probabilities.

How can the probability of one event affect the probability of another event?
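To see that the two conditional probabilities really differ, here is a small sketch using a standard 52-card deck. The events are chosen only for illustration: A is "the card is a king" and B is "the card is a face card (jack, queen or king)". The helper `bayes` is a hypothetical name for this sketch.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_king = Fraction(4, 52)          # P(A): 4 kings in the deck
p_face = Fraction(12, 52)         # P(B): 12 face cards in the deck
p_face_given_king = Fraction(1)   # P(B|A): every king is a face card

p_king_given_face = bayes(p_face_given_king, p_king, p_face)

print(p_face_given_king)  # 1
print(p_king_given_face)  # 1/3
```

Knowing the card is a king makes "face card" certain, but knowing it is a face card only makes "king" a one-in-three chance.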

As you can see from the formula, if the denominator P(B) increases while the numerator stays the same, then P(A|B) decreases. Let me give you an example that I like to use to explain this relationship. A runny nose is a symptom of measles, but it is also a very common symptom of many other illnesses, which occur far more often than measles. So, if B is a runny nose, then P(B), the frequency of runny noses in the general population, is very high, and this lowers the chance that a runny nose is a sign of measles. As a result, the more common a symptom is, the lower the probability of a measles diagnosis given that symptom; common symptoms are not strong indicators.
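The runny nose argument can be checked numerically. The numbers below are hypothetical and chosen only for illustration; they are not real epidemiological figures. P(B) is computed with the law of total probability, so making runny noses more common among people without measles is what drives P(B) up. The function name `posterior` is an assumption of this sketch.

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return (P(A|B), P(B)) using Bayes' theorem.

    P(B) comes from the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A)).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b, p_b


# Hypothetical values, for illustration only.
p_measles = 0.001             # P(A): prior probability of measles
p_runny_given_measles = 0.9   # P(B|A): runny nose given measles

# Make a runny nose more and more common among people without measles.
for p_runny_given_other in (0.01, 0.1, 0.3):
    p_measles_given_runny, p_runny = posterior(
        p_runny_given_measles, p_measles, p_runny_given_other
    )
    print(
        f"P(runny nose) = {p_runny:.4f} -> "
        f"P(measles | runny nose) = {p_measles_given_runny:.4f}"
    )
```

Each step up in P(runny nose) pushes P(measles | runny nose) down, even though P(runny nose | measles) never changes.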

That’s why descriptions of the Bayesian approach state that “every parameter is a random variable and has its own distribution.” With Bayesian methods, we can estimate the statistics of these distributions.

According to the frequentist definition of probability, only repeatable random events (like the result of flipping a coin) have probabilities, and these probabilities are equal to the long-term frequency of occurrence of the events. Frequentists argue that you cannot assign probabilities to possible parameter values. Therefore, they use point estimates (such as maximum likelihood estimates) of unknown parameters to predict new data points. In contrast, the Bayesian approach assigns probabilities to the possible values of an unknown parameter instead of relying on a single fixed value.
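A rough sketch of this contrast for a coin's probability of heads: the frequentist side returns one number (the maximum likelihood estimate), while the Bayesian side returns a distribution that can then be summarized. The data (7 heads, 3 tails) are hypothetical, the uniform Beta(1, 1) prior is an assumption of this sketch, and the function names are made up for illustration.

```python
from fractions import Fraction


def mle_heads(heads, tails):
    """Maximum likelihood point estimate of P(heads): the observed frequency."""
    return Fraction(heads, heads + tails)


def beta_posterior(heads, tails, a0=1, b0=1):
    """Posterior Beta(a, b) over P(heads), starting from a Beta(a0, b0) prior."""
    return a0 + heads, b0 + tails


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = Fraction(a, a + b)
    var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
    return mean, var


heads, tails = 7, 3  # hypothetical data: 10 flips

print(mle_heads(heads, tails))   # 7/10  (a single fixed value)

a, b = beta_posterior(heads, tails)
mean, var = beta_mean_var(a, b)
print((a, b))                    # (8, 4)  (a whole distribution)
print(mean)                      # 2/3
print(var)                       # 2/117
```

The variance is what the point estimate alone does not carry: it expresses how uncertain the estimate still is after ten flips.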

Finally, Probabilisticworld.com gives a very clear illustration of how a frequentist and a Bayesian look at the same problem. If we wanted to estimate the average height of adult females, a frequentist would say:

“I don’t know what the mean female height is. However, I know that its value is fixed (not a random one). Therefore, I cannot assign probabilities to the mean being equal to a certain value, or being less/greater than some other value. The most I can do is collect data from a sample of the population and estimate its mean as the value which is most consistent with the data.”

On the other hand, a Bayesian would reason differently:

“I agree that the mean is a fixed and unknown value, but I see no problem in representing the uncertainty probabilistically. I will do so by defining a probability distribution over the possible values of the mean and use sample data to update this distribution.”
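The two viewpoints can be put side by side in a short sketch. Everything numeric here is hypothetical: the sample heights, the prior (mean 165 cm, variance 25) and the assumption that the spread of individual heights is known (variance 49). Under those assumptions the Bayesian update has a closed form, the standard normal-normal conjugate update. The name `normal_posterior` is made up for this sketch.

```python
from statistics import mean


def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Update a normal prior over an unknown mean, given data with known variance.

    Returns the posterior mean and posterior variance of the unknown mean.
    """
    n = len(data)
    precision = 1 / prior_var + n / noise_var
    post_var = 1 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Hypothetical sample of adult female heights in cm.
heights_cm = [158.0, 171.5, 162.3, 166.8, 160.1, 169.4]

# Frequentist: a single estimate of the fixed, unknown mean.
sample_mean = mean(heights_cm)
print(f"sample mean: {sample_mean:.2f} cm")

# Bayesian: start from a prior distribution over the mean and update it.
post_mean, post_var = normal_posterior(
    prior_mean=165.0, prior_var=25.0, data=heights_cm, noise_var=49.0
)
print(f"posterior over the mean: mean {post_mean:.2f} cm, variance {post_var:.2f}")
```

The posterior mean lands between the prior mean and the sample mean, and the posterior variance is smaller than the prior variance, which is the "update this distribution" step from the quote.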

Thank you for reading this intuitive explanation of the Bayesian approach and how it differs from the frequentist approach. Posts with applied examples of Bayesian methods will follow soon.

