ML Interview Q Series: Fair vs Biased Coins: Calculating Same Outcome Flip Probability with Total Probability.
We have two coins: one that is fair and another that has a 3/4 chance of landing heads. We randomly choose one coin and flip it twice. What is the probability that both flips yield the same outcome?
Comprehensive Explanation
When we randomly pick one of the two coins, each coin is selected with probability 1/2. The first coin is fair, meaning it has an equal 1/2 chance for heads or tails. The second coin is biased, giving a 3/4 chance for heads and a 1/4 chance for tails. We want to find the probability that two consecutive flips of the chosen coin produce the same face (either both heads or both tails).
For the fair coin, the probability of getting heads is 1/2 and tails is also 1/2. The chance of getting both heads is (1/2)*(1/2) = 1/4, and the chance of getting both tails is also 1/4. So the probability of same side in two flips with the fair coin is 1/2.
For the biased coin, the probability of getting both heads is (3/4)(3/4) = 9/16, and the probability of getting both tails is (1/4)(1/4) = 1/16. Hence the probability of same side in two flips with the biased coin is 10/16 = 5/8.
Since we pick each coin with equal probability 1/2, the law of total probability gives

P(same side) = (1/2) * P_fair(same side) + (1/2) * P_biased(same side),

where P_fair(same side) = 1/2 and P_biased(same side) = 5/8. Plugging in these values:
P(same side) = (1/2)(1/2) + (1/2)(5/8) = 1/4 + 5/16 = 9/16.
Thus, the probability that both flips result in the same side is 9/16.
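For a coin with heads probability p, the same-side probability in two flips is p^2 + (1 − p)^2, which gives a quick way to verify the arithmetic exactly. A small sketch using Python's fractions module:

```python
from fractions import Fraction

def p_same(p):
    """Probability that two independent flips of a coin with P(heads)=p match."""
    return p * p + (1 - p) * (1 - p)

fair = p_same(Fraction(1, 2))      # 1/2
biased = p_same(Fraction(3, 4))    # 5/8
total = Fraction(1, 2) * fair + Fraction(1, 2) * biased
print(total)  # 9/16
```

Using Fraction instead of floats keeps every intermediate value exact, so the printed result is literally 9/16 rather than 0.5625.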
Detailed Reasoning
The problem is a straightforward application of the law of total probability. We have two scenarios based on the selection of the coin. Each scenario has a known probability of occurrence (1/2) and a conditional probability for obtaining the same outcome twice. By combining these scenarios, we arrive at the final probability.
A crucial point is recognizing that the events “pick the fair coin and get two of the same” and “pick the biased coin and get two of the same” are disjoint, and their probabilities can be added in a weighted manner by their prior probabilities (1/2 for each coin).
Practical Example in Python
Below is a simple Python snippet that runs a quick empirical simulation to confirm the analytical result. The results should converge around 9/16 (approximately 0.5625).
```python
import random

def simulate(num_trials=10_000_000):
    same_count = 0
    for _ in range(num_trials):
        # Pick one coin at random (fair coin: 0, biased coin: 1)
        coin_choice = random.randint(0, 1)
        if coin_choice == 0:  # Fair coin
            flip1 = 'H' if random.random() < 0.5 else 'T'
            flip2 = 'H' if random.random() < 0.5 else 'T'
        else:  # Biased coin
            flip1 = 'H' if random.random() < 0.75 else 'T'
            flip2 = 'H' if random.random() < 0.75 else 'T'
        if flip1 == flip2:
            same_count += 1
    return same_count / num_trials

estimated_probability = simulate()
print("Estimated Probability of Same Side:", estimated_probability)
```
This code randomly picks one of the coins, performs two flips, and checks if they match. Over many simulations, the empirical result should be close to 0.5625.
Potential Follow-Up Questions
How does this result change if the coin selection probabilities were not 1/2 and 1/2?
The law of total probability would still apply. If we pick the fair coin with probability p and the biased coin with probability 1−p, we update the calculation. Let P_f be the probability of same side with the fair coin (which is 1/2), and P_b be the probability of same side with the biased coin (which is 5/8). Then:
P(same side) = p * P_f + (1−p) * P_b.
By adjusting p, you can explore how heavily the biased coin influences the outcome.
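As a sketch (the variable names are ours), the weighted formula can be evaluated at a few values of p to see how the result interpolates between the two coins:

```python
from fractions import Fraction

P_f = Fraction(1, 2)   # same-side probability with the fair coin
P_b = Fraction(5, 8)   # same-side probability with the biased coin

def p_same_side(p):
    """P(same side) when the fair coin is chosen with probability p."""
    return p * P_f + (1 - p) * P_b

for p in [Fraction(0), Fraction(1, 2), Fraction(1)]:
    print(p, p_same_side(p))
```

At p = 0 (always the biased coin) the answer is 5/8, at p = 1 it is 1/2, and at p = 1/2 it recovers the original 9/16.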
What if we wanted the probability that exactly one flip is heads and one flip is tails?
We can either compute this directly for each coin or use 1 – P(same side) for each coin. For the fair coin, the probability of exactly one heads and one tails is 1/2. For the biased coin, it is 6/16 = 3/8 (because HT or TH each happens with probability (3/4)*(1/4) = 3/16, and there are two such permutations). Then the combined probability would be the weighted sum by the prior probabilities of picking each coin.
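Both routes agree, since P(exactly one head in two flips) = 2p(1 − p) = 1 − [p^2 + (1 − p)^2]. A quick exact check:

```python
from fractions import Fraction

def p_mixed(p):
    """P(exactly one head in two flips) = 2 p (1 - p), i.e. 1 - P(same side)."""
    return 2 * p * (1 - p)

fair = p_mixed(Fraction(1, 2))     # 1/2
biased = p_mixed(Fraction(3, 4))   # 3/8
combined = Fraction(1, 2) * fair + Fraction(1, 2) * biased
print(combined)  # 7/16
```

Note that 7/16 is exactly 1 − 9/16, the complement of the same-side answer, as it must be.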
Could the concept of Bayes’ Theorem be relevant here?
Yes, if the question asked for the probability that a chosen coin was fair given that two flips were the same, then Bayes’ Theorem would be the method of choice. In that situation, we would use:
P(coin is fair | same side) = [P(same side | coin is fair)*P(coin is fair)] / P(same side)
We already know the ingredients for that calculation. This is a typical way to show how prior and conditional probabilities interplay, which is particularly important in many Bayesian inference problems.
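As an illustrative computation (this posterior is not part of the original question), conditioning on "both flips the same" actually shifts belief toward the biased coin:

```python
from fractions import Fraction

p_same_given_fair = Fraction(1, 2)
p_same_given_biased = Fraction(5, 8)
p_fair = Fraction(1, 2)

# Law of total probability, then Bayes' Theorem.
p_same = p_same_given_fair * p_fair + p_same_given_biased * (1 - p_fair)  # 9/16
posterior_fair = p_same_given_fair * p_fair / p_same
print(posterior_fair)  # 4/9
```

Since matching flips are more likely under the biased coin, the posterior probability of the fair coin drops from the prior 1/2 to 4/9.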
Would the result differ if we flipped the coin more than two times?
Yes, if we consider more flips, then the probability of getting the same side on every flip changes. For example, with the fair coin, the probability of getting all heads or all tails with three flips is (1/8 + 1/8) = 1/4. With the biased coin, that probability is (3/4)^3 + (1/4)^3 = 27/64 + 1/64 = 28/64 = 7/16. We would then calculate the overall probability with the updated probabilities and still weigh them by the coin selection probabilities. The general pattern holds, but the actual numeric result changes according to the number of flips.
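The pattern generalizes to p^n + (1 − p)^n for n flips, so the overall answer for any n is a one-line weighted sum; a short exact computation:

```python
from fractions import Fraction

def p_all_same(p, n):
    """P(all n flips show the same face) for a coin with P(heads)=p."""
    return p**n + (1 - p)**n

for n in [2, 3, 5]:
    total = (Fraction(1, 2) * p_all_same(Fraction(1, 2), n)
             + Fraction(1, 2) * p_all_same(Fraction(3, 4), n))
    print(n, total)
```

For n = 2 this reproduces 9/16, and for n = 3 it combines 1/4 and 7/16 into 11/32.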
Are there any tricky pitfalls in this kind of problem?
A common pitfall is forgetting to weight by the probability of selecting each coin, or assuming the wrong bias for the biased coin. Another mistake is to double-count outcomes or misapply the law of total probability. Carefully partitioning the event “same side” by the event “choice of coin” avoids these issues.
All these considerations show why it is crucial to think systematically about conditional probabilities and how they interact with prior probabilities.
Below are additional follow-up questions
What if we observe two flips showing one head and one tail, but not necessarily in order (could be HT or TH)? How do we update the probability that we selected the fair coin?
To approach this problem, we can invoke the law of total probability and Bayes’ Theorem. Let F denote the event that the chosen coin is fair, and B denote that the chosen coin is the biased one. Let O denote the observation that we get one head and one tail in two flips (regardless of order).
We know:
P(F) = 1/2
P(B) = 1/2
From our original analysis:
For the fair coin, the probability of one head and one tail in two flips is 1/2.
For the biased coin (probability of heads = 3/4), the probability of one head and one tail in two flips is 3/8 (because the probability of HT is (3/4)*(1/4) = 3/16 and similarly for TH, so total is 6/16 = 3/8).
First, we compute P(O) using the law of total probability:
P(O) = P(O | F)P(F) + P(O | B)P(B) = (1/2)(1/2) + (3/8)(1/2) = 1/4 + 3/16 = 7/16.
We then use Bayes’ Theorem:
P(F | O) = [P(O | F)*P(F)] / P(O).
Substituting the values:
P(F | O) = [(1/2)(1/2)] / (7/16) = (1/4) / (7/16) = (1/4)(16/7) = 4/7.
Hence, if we observe one head and one tail in any order, the posterior probability that the chosen coin is the fair coin is 4/7. A subtlety here is to keep track of different permutations (HT or TH) and handle them carefully for each coin’s probability. The main pitfall is forgetting to consider both orders or mixing up these probabilities when applying Bayes’ Theorem.
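A small Monte Carlo check of this posterior (the simulation setup is ours): among trials that produce one head and one tail, count the fraction that came from the fair coin.

```python
import random

def posterior_fair_given_mixed(trials=200_000, seed=0):
    """Estimate P(fair coin | one head and one tail) by conditioning a simulation."""
    rng = random.Random(seed)
    mixed = mixed_and_fair = 0
    for _ in range(trials):
        fair = rng.random() < 0.5
        p_heads = 0.5 if fair else 0.75
        flip1 = rng.random() < p_heads
        flip2 = rng.random() < p_heads
        if flip1 != flip2:            # observed one head, one tail
            mixed += 1
            mixed_and_fair += fair
    return mixed_and_fair / mixed

print(posterior_fair_given_mixed())  # ≈ 4/7 ≈ 0.571
```

The estimate converges to 4/7, matching the Bayes' Theorem calculation above.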
How would the result change if the coin flips are not independent (e.g., a physically worn-out coin might give correlated flips)?
In our standard model, we assume that each flip outcome is independent. But in a real-world situation with a worn-out coin, the flips could be correlated. Perhaps if the coin lands on heads once, it might be slightly bent, affecting the chance of landing heads on the next flip, introducing dependence between flips.
If flips are correlated, we can no longer multiply probabilities directly (like P(Heads)*P(Heads)) to get the chance of two heads in a row. Instead, we would need to specify or estimate the conditional probability of the second flip given the first. For instance, if we define:
Probability of heads on the first flip as p.
Probability of heads on the second flip as p2 if the first flip was heads, and q2 if the first flip was tails.
Then the calculations become more complex. A pitfall is to carry on using the usual multiplication approach that assumes independence, which would be incorrect in the presence of correlation. In practice, you would either gather empirical data or specify a model (e.g., a Markov chain) to capture the correlation structure.
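A minimal sketch of this dependent-flip model, using p, p2, and q2 exactly as defined above:

```python
from fractions import Fraction

def p_same_correlated(p, p2, q2):
    """P(two flips match) when the second flip depends on the first:
    P(H on flip 1) = p, P(H on flip 2 | flip 1 = H) = p2,
    P(H on flip 2 | flip 1 = T) = q2."""
    return p * p2 + (1 - p) * (1 - q2)

# Independence (p2 = q2 = p) recovers the earlier biased-coin result.
print(p_same_correlated(Fraction(3, 4), Fraction(3, 4), Fraction(3, 4)))  # 5/8
# Positive correlation (heads makes a second head more likely) raises the match probability.
print(p_same_correlated(Fraction(3, 4), Fraction(7, 8), Fraction(3, 4)))  # 23/32
```

The independent case falls out as a special case, which is a useful sanity check on the dependent model.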
How can we extend this problem to multiple coins with different biases and still compute the probability of matching outcomes?
If we have multiple coins C1, C2, C3, … with different biases p1, p2, p3, … for heads, we can define:
Probability of choosing coin Ci as w_i (where the w_i sum to 1).
Probability of matching outcomes (two flips the same) with coin Ci as p_i^2 + (1 – p_i)^2.
The overall probability of the same side in two flips is then:
sum over i of ( w_i * [ p_i^2 + (1 – p_i)^2 ] ).
A potential pitfall is mislabeling or mixing up the biases p_i with their respective weights w_i. Another subtlety is to ensure you sum across all coins, and your w_i must sum to 1. Real-world issues arise if you do not accurately know each coin’s probability of heads—then you must estimate or measure them, introducing uncertainty and possible Bayesian modeling for unknown biases.
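A short sketch of the weighted-sum formula above; the original two-coin problem falls out as a special case:

```python
from fractions import Fraction

def p_same_mixture(weights, biases):
    """sum over i of w_i * (p_i^2 + (1 - p_i)^2); the weights must sum to 1."""
    assert sum(weights) == 1, "selection probabilities must sum to 1"
    return sum(w * (p * p + (1 - p) * (1 - p)) for w, p in zip(weights, biases))

# Two coins, equal selection probability: the original problem.
print(p_same_mixture([Fraction(1, 2)] * 2, [Fraction(1, 2), Fraction(3, 4)]))  # 9/16
```

The assert guards against the pitfall mentioned above of weights that do not sum to 1.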
What are potential real-world complications if you have only a small number of flips to estimate the biased coin’s probability (instead of knowing it’s 3/4 a priori)?
In reality, you often do not know the exact bias (3/4 in our theoretical case) and might only observe a few coin flips to approximate it. Common pitfalls:
Overfitting the bias: For example, if you see 3 heads out of 4 flips, you might assume p=3/4 right away, but your confidence should reflect that the sample size is small.
Not using confidence intervals or Bayesian posteriors: A single set of flips may not give a stable estimate of the bias, so ignoring confidence intervals can lead to overly certain conclusions.
Sampling bias or incomplete trials: If conditions change halfway (humidity, flipping method, etc.), your earlier flips might not reflect the same physical process for later flips.
An interviewer might want to see if you recognize that real-world data is subject to noise and that your prior knowledge about the coin’s bias could be updated using Bayesian or frequentist methods (e.g., Beta distributions in Bayesian approaches).
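If one takes the Bayesian route mentioned above, Beta-Binomial conjugacy makes the update a one-liner; the Beta(1, 1) uniform prior below is an illustrative assumption:

```python
def beta_posterior(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior -> Beta(alpha + heads, beta + tails)."""
    return alpha + heads, beta + tails

# Uniform Beta(1, 1) prior, then observe 3 heads out of 4 flips.
a, b = beta_posterior(1, 1, heads=3, tails=1)
mean = a / (a + b)   # posterior mean 4/6 ≈ 0.667, not the naive estimate 3/4
print(a, b, mean)
```

The posterior mean (2/3) is pulled toward the prior mean (1/2) relative to the raw frequency 3/4, which is exactly the shrinkage a small sample warrants.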
How might you approach this problem if you needed not just the probability of matching sides, but also the distribution of run lengths (like HHHT, TTTH, etc.) over multiple flips?
When exploring run lengths—consecutive occurrences of the same face—you need a more intricate model tracking states like “currently on a run of k heads.” For example, with a Markov process approach:
States might represent “length of current run of heads” or “length of current run of tails.”
Transition probabilities depend on the coin’s bias.
From these states, you can calculate the distribution of run lengths over many flips. Potential pitfalls include:
Failing to account for transitions between runs of heads and runs of tails accurately.
Confusing run length probabilities with single-flip probabilities.
In practice, you’d either set up a Markov chain or a dynamic programming approach to track the probabilities systematically.
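For small numbers of flips, the run-length distribution can also be obtained by brute-force enumeration rather than a full Markov chain; a sketch (exponential in n, so only for small n):

```python
from itertools import product
from fractions import Fraction

def run_length_distribution(p, n):
    """Distribution of the longest run in n flips of a coin with P(heads)=p,
    computed by exhaustive enumeration over all 2^n sequences."""
    dist = {}
    for seq in product('HT', repeat=n):
        prob = Fraction(1)
        for face in seq:
            prob *= p if face == 'H' else (1 - p)
        longest = current = 1
        for prev, nxt in zip(seq, seq[1:]):
            current = current + 1 if prev == nxt else 1
            longest = max(longest, current)
        dist[longest] = dist.get(longest, Fraction(0)) + prob
    return dist

print(run_length_distribution(Fraction(3, 4), 3))
```

For n = 3 with the biased coin this gives 3/16 for longest run 1, 3/8 for longest run 2, and 7/16 for longest run 3, and the 7/16 matches the all-same-in-three-flips figure computed earlier.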
How does the “law of total probability” generalize if we consider continuous distributions for the coin’s bias, rather than a discrete scenario (fair vs. 3/4)?
When dealing with a continuous prior over the coin’s bias (for example, p ~ Beta(α, β)), we can express the probability of the observed flips by integrating over all possible bias values. Formally, you would replace the sum in the discrete case with an integral over p from 0 to 1, weighting by the prior distribution Beta(α, β) and the likelihood of the observed data. The main pitfalls in a continuous setting are:
Incorrectly assuming a discrete model when the underlying distribution might be better treated as continuous.
Mistakes in setting up or evaluating the integral (for instance, forgetting normalization constants in a Bayesian framework).
This approach is extremely important in Bayesian analysis, where p is unknown, and you have a distribution over possible values of p.
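As a concrete special case (the prior choice is ours), a uniform Beta(1, 1) prior on the bias gives P(same side) = ∫₀¹ [p² + (1 − p)²] dp = 2/3, which a simple midpoint-rule integration confirms:

```python
def p_same_uniform_prior(n=1_000_000):
    """Midpoint-rule estimate of the integral of p^2 + (1 - p)^2 over [0, 1],
    i.e. P(same side) under a uniform Beta(1, 1) prior on the bias p."""
    total = 0.0
    for i in range(n):
        p = (i + 0.5) / n
        total += p * p + (1 - p) * (1 - p)
    return total / n

print(p_same_uniform_prior())  # ≈ 2/3
```

Note 2/3 exceeds the fair coin's 1/2: averaging over unknown biases favors matching, because extreme values of p make matching very likely.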
How do we handle a scenario where both coins are biased, but we don’t initially know which coin has which bias?
A trickier scenario arises if we have two coins with different known biases p1 and p2, but we do not know which physical coin carries which bias. When we choose a coin at random, the coin's identity is itself a random, unobserved label: the selection probability is still 1/2 per coin, but we must also marginalize over which bias the chosen coin has. Pitfalls:
Mixing up biases: When analyzing flips, you might inadvertently assign outcomes to the wrong coin.
Not properly marginalizing over which coin you have selected: If you do not properly sum or integrate over possible coin identities, your probabilities will be incorrect.
In real life, you might try flipping each coin a few times to identify which has bias p1 or p2. But if you cannot label them, you must incorporate that uncertainty into your probability model.
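One way to see the marginalization: with a symmetric 50/50 belief over the two possible labelings, averaging over labels collapses to the plain mixture of the two biases (the values 3/5 and 4/5 below are hypothetical):

```python
from fractions import Fraction

def p_same(p):
    """P(two flips match) for a coin with P(heads)=p."""
    return p * p + (1 - p) * (1 - p)

p1, p2 = Fraction(3, 5), Fraction(4, 5)   # hypothetical biases
labelings = [(p1, p2), (p2, p1)]          # which physical coin has which bias

# Average over labelings (each believed with probability 1/2),
# then over the uniform coin choice within each labeling.
total = sum(Fraction(1, 2) * (Fraction(1, 2) * p_same(a) + Fraction(1, 2) * p_same(b))
            for a, b in labelings)
print(total)
```

By symmetry the answer equals (p_same(p1) + p_same(p2)) / 2 regardless of labeling uncertainty; the labels only start to matter once flips are observed and the posterior over identities becomes asymmetric.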
How do errors in measurement or human error affect the calculation?
In a physical experiment, human error can creep in:
Miscounting heads or tails.
Mixing up which coin is which.
Possibly flipping the coin in a biased manner.
All these factors can lead to a misestimation of actual probabilities. For instance, if you record a head instead of a tail by mistake, you affect the empirical frequencies. A robust analysis might include repeated experiments, independent verifications, or inter-rater agreement (in medical or psychological contexts, for instance) to minimize these potential errors.
The takeaway is that in a purely theoretical setting, we ignore measurement error or confusion in labeling. But in real-world experiments, you have to plan for data collection procedures that are resilient to human mistakes or incomplete logging.