ML Interview Q Series: Expected Value Approach: Predicting Heads from Fair and Biased Coins
You have a collection of coins: x are normal (fair), y have heads on both sides, and z have tails on both sides. If you randomly draw a coin (with replacement) 100 times and flip each one, how many heads do you expect to observe in total?
Comprehensive Explanation
With a bag containing different types of coins, each draw can be viewed as a random trial in which the chance of picking each type depends on the counts x, y, and z. Specifically, the probability of drawing a fair coin is x / (x + y + z), a coin with both sides heads is y / (x + y + z), and a coin with both sides tails is z / (x + y + z).
Once a particular type of coin is drawn, the probability of flipping a head depends on the coin type:
For a fair coin, the probability of landing heads is 1/2.
For a double-headed coin, the probability of landing heads is 1.
For a double-tailed coin, the probability of landing heads is 0.
The expected value of the number of heads for a single draw-and-flip is the sum of each possible outcome’s probability multiplied by its respective payoff (0 or 1 head):
Probability of heads in a single trial = (x / (x + y + z)) * 1/2 + (y / (x + y + z)) * 1 + (z / (x + y + z)) * 0
This simplifies to x / (2(x + y + z)) + y / (x + y + z), which can be expressed as (x + 2y) / (2(x + y + z)).
Since there are 100 independent draws (with replacement), each having the same probability of resulting in heads, the total expected number of heads is 100 times the single-trial expectation.
Here, x is the count of fair coins, y is the count of double-headed coins, z is the count of double-tailed coins, and (x + y + z) is the total number of coins in the bag.
This expression simply multiplies the probability of getting heads on a single flip (given a randomly drawn coin) by the total number of flips (100).
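As a quick worked example, suppose x = 3, y = 2, z = 1. The single-flip heads probability is (3 + 2 × 2) / (2 × 6) = 7/12 ≈ 0.583, so over 100 draw-and-flip trials you would expect roughly 58.3 heads.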
Below is a short Python code snippet that gives an empirical demonstration of this concept by simulating many coin draws and flips:
import random

def simulate_heads(x, y, z, draws=100, trials=1_000_000):
    """
    x: number of fair coins
    y: number of double-headed coins
    z: number of double-tailed coins
    draws: number of draws (and flips) per single simulation
    trials: how many times to repeat the experiment to approximate expectation
    """
    total_heads = 0
    # Precompute selection probabilities for convenience
    total_coins = x + y + z
    prob_fair = x / total_coins
    prob_double_head = y / total_coins
    # prob_double_tail = z / total_coins  # not needed directly for the simulation

    for _ in range(trials):
        heads_count = 0
        for _ in range(draws):
            # Draw a coin uniformly at random (with replacement)
            coin_type = random.random()
            if coin_type < prob_fair:
                # Fair coin: heads with probability 1/2
                flip_result = random.choice(['H', 'T'])
                if flip_result == 'H':
                    heads_count += 1
            elif coin_type < prob_fair + prob_double_head:
                # Double-headed coin: always heads
                heads_count += 1
            else:
                # Double-tailed coin: never heads
                pass  # heads_count remains unchanged for this flip
        total_heads += heads_count

    return total_heads / trials

# Example usage
x_val, y_val, z_val = 3, 2, 1
empirical_estimate = simulate_heads(x_val, y_val, z_val, draws=100, trials=100_000)
theoretical_estimate = (100 * (x_val + 2 * y_val)) / (2 * (x_val + y_val + z_val))
print(f"Empirical Estimate of Heads: {empirical_estimate}")
print(f"Theoretical Expected Heads: {theoretical_estimate}")
What If x, y, and z Are Extremely Large?
When x, y, and z are very large, the ratios x/(x+y+z), y/(x+y+z), and z/(x+y+z) can be affected by numerical precision if computed naively in code. Python's integers are arbitrary-precision, so the counts themselves never overflow, but dividing them as floats loses precision beyond roughly 15-16 significant digits; exact rational arithmetic (for example, the fractions module) avoids this entirely. At a conceptual level, the formula remains exactly the same, relying only on the relative proportions of x, y, and z.
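Below is a minimal sketch of one way to keep the arithmetic exact for very large counts, using Python's fractions.Fraction (the function name and example values are illustrative, not from the original discussion):

from fractions import Fraction

def expected_heads_exact(x, y, z, draws=100):
    # Exact rational arithmetic: Python ints never overflow, and Fraction
    # avoids the rounding that float division would introduce.
    p_heads = Fraction(x + 2 * y, 2 * (x + y + z))
    return draws * p_heads  # still an exact Fraction

# Example with astronomically large counts
exact = expected_heads_exact(10**30, 3 * 10**30, 10**30)
print(exact)         # exact rational value (70)
print(float(exact))  # convert to float only at the very end, if needed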
How Does Drawing With Replacement Affect This Outcome?
Because each drawn coin is replaced, every new draw from the bag is an independent event with the same probability distribution over the coin types. This means we can simply multiply the single-draw heads probability by the total number of draws (100). If the draws were without replacement, the composition of the bag would change after each draw and the draws would no longer be independent; by symmetry the marginal heads probability of each draw (and hence the expected total) would actually stay the same as long as the number of draws does not exceed x + y + z, but quantities such as the variance would need to account for the changing composition.
Could There Be Any Bias If We Don’t Shuffle Or Mix The Bag Properly?
In a purely theoretical sense, as long as each coin is equally likely to be selected, there is no bias. In physical implementations, it is essential to ensure each coin is indeed equally likely. In an algorithmic simulation, using a uniform random draw from x + y + z coins ensures no systematic bias.
What If One Coin Type Greatly Dominates?
If y (the double-headed coins) is very large compared to x and z, the expected number of heads will be significantly closer to 100 because almost every draw yields a coin that lands on heads. Conversely, if z (the double-tailed coins) dominates, the expected heads will be closer to 0. Hence, the formula elegantly captures these edge cases by giving nearly 100 heads or 0 heads in extreme scenarios.
How To Generalize To Weighted Coins?
In more advanced settings, you might have coins that are not necessarily 50-50 fair or 100-0 biased. You could assign each coin type a distinct probability p_i of coming up heads. Then the expected value of heads in one flip from a randomly chosen coin becomes a weighted sum of each p_i times the probability of selecting that coin. The rest of the logic is the same, but the expression for heads probability in a single draw changes accordingly, and then you multiply by the number of draws.
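As a rough sketch of this generalization (the helper name and example probabilities below are illustrative), the weighted-coin expectation can be computed directly:

def expected_heads_weighted(counts, heads_probs, draws=100):
    """
    counts: how many coins of each type are in the bag
    heads_probs: each type's probability p_i of landing heads
    draws: number of draw-and-flip trials
    """
    total = sum(counts)
    # Weighted sum: P(select type i) * P(heads | type i), scaled by the draws
    p_heads = sum(c / total * p for c, p in zip(counts, heads_probs))
    return draws * p_heads

# The original three-type bag is just a special case of this formula:
print(expected_heads_weighted([3, 2, 1], [0.5, 1.0, 0.0], draws=100))  # ~58.33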
Practical Pitfalls In Implementation
Failure to handle floating-point arithmetic carefully can introduce minor errors in large-scale simulations. Also, conceptual mistakes like mixing up the probability of heads for fair coins (1/2) with the probability of picking a fair coin can lead to incorrect results. Verifying the final answer with a simulation is a good way to catch such mistakes.
Below are additional follow-up questions
What if the composition of the bag (x, y, z) changes midway through the 100 draws?
When you begin with a certain number of fair coins (x), double-headed coins (y), and double-tailed coins (z), the probability of drawing a particular type is x/(x+y+z), y/(x+y+z), or z/(x+y+z). However, if something alters the counts of x, y, or z after some draws—imagine discovering a damaged coin and removing it, or adding new coins with a different bias—the underlying distribution shifts.
In that case, you cannot simply multiply the old single-trial probability of heads by the total number of draws. Instead, you must break the process into separate phases:
Phase 1: For draws that occurred before the composition changed, use x/(x+y+z), y/(x+y+z), z/(x+y+z).
Phase 2: For draws after the change, use the updated distribution, say x'/(x'+y'+z'), y'/(x'+y'+z'), z'/(x'+y'+z').
You would then compute separate expectations for Phase 1 and Phase 2 and sum them. More formally, if n1 draws happen before the change and n2 draws happen after (with n1 + n2 = 100), the overall expected heads is:
Expected heads in Phase 1 + Expected heads in Phase 2 = n1 × p_old + n2 × p_new, where p_old and p_new are the respective probabilities of flipping heads in each phase.
Real-World Pitfall: It is easy to mistakenly apply the initial probability across all 100 draws. Or you might not precisely track the exact point or the nature of the distribution shift. Proper record-keeping of each draw segment is crucial for an accurate expectation.
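A minimal sketch of the two-phase calculation, assuming you know the draw counts and bag compositions for each phase (the function name and example change are illustrative):

def expected_heads_two_phases(comp_old, comp_new, n1, n2):
    """
    comp_old, comp_new: (x, y, z) tuples before and after the change
    n1, n2: number of draws in each phase (n1 + n2 = total draws)
    """
    def p_heads(comp):
        x, y, z = comp
        return (x + 2 * y) / (2 * (x + y + z))
    # Linearity of expectation: sum the two phases separately
    return n1 * p_heads(comp_old) + n2 * p_heads(comp_new)

# e.g. the single double-tailed coin is removed after 40 of the 100 draws
print(expected_heads_two_phases((3, 2, 1), (3, 2, 0), n1=40, n2=60))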
How do you handle situations where coins might become physically damaged, changing their probability of landing heads?
Sometimes a fair coin (initially 1/2 chance for heads) can become bent, slightly changing its probability. This is more complicated than having only three distinct types (fair, double-headed, double-tailed). Each coin could theoretically evolve its own bias p_i for heads over time.
In such scenarios, you can treat each coin’s probability of heads as a variable that might drift. A practical approach is to maintain a model for the likelihood of flipping heads given the current state of the coin. If you suspect a coin is damaged, you update its probability distribution accordingly.
For an analytical approach, you could define a set of possible states for a coin with associated probabilities and update those probabilities via Bayesian inference based on observed flips. This would make the “expected number of heads” a more dynamic computation, depending on how coins transition states.
Edge Case: If the coin’s damage severely skews it toward heads or tails, it might be akin to partially turning into a double-headed or double-tailed coin. You would need a robust way to detect such a scenario from empirical flips to avoid making flawed assumptions.
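One simple way to make the Bayesian updating idea concrete is a conjugate Beta-Binomial model for a single coin's unknown heads probability. The sketch below assumes a Beta prior and uses illustrative parameter values:

def update_bias_belief(alpha, beta, heads, tails):
    """
    Conjugate Beta-Binomial update for one coin's unknown heads probability.
    alpha, beta: parameters of the current Beta prior
    heads, tails: newly observed flip counts for this coin
    Returns the posterior parameters and the posterior mean bias.
    """
    alpha_post = alpha + heads
    beta_post = beta + tails
    posterior_mean = alpha_post / (alpha_post + beta_post)
    return alpha_post, beta_post, posterior_mean

# Start roughly believing the coin is fair, then observe a heads-heavy run
print(update_bias_belief(alpha=10, beta=10, heads=18, tails=2))
# -> (28, 12, 0.7): the belief drifts toward a heads-biased (possibly damaged) coin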
How would you estimate x, y, and z if they were initially unknown?
If x, y, and z are unknown but you observe flips, you could try to infer these parameters. One approach is maximum likelihood estimation (MLE). Let’s suppose you draw N coins (with replacement) and flip each. You observe H heads in total. From the mixture model:
p(Heads) = [x/(x+y+z)] × 1/2 + [y/(x+y+z)] × 1 + [z/(x+y+z)] × 0.
Denote p = (x + 2y)/(2(x + y + z)). Over N flips, you expect p × N heads. The likelihood function for observing exactly H heads is tied to the binomial distribution with parameter p. You then attempt to find x, y, z that maximize that likelihood subject to p matching the fraction of heads observed. However, you only have one equation from p = H/N, and three unknowns x, y, z. Additional constraints or prior knowledge (like total coin count or known ratio among them) would be needed to pinpoint a unique solution.
Pitfall: Overfitting or underdetermination if you do not have enough data relative to the complexity (three unknown counts). Another subtlety arises if x+y+z is extremely large, making the difference in p harder to measure for moderate H.
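The underdetermination is easy to see numerically: the flips pin down only the aggregate heads probability p, so many different (x, y, z) triples fit the data equally well. The bags below are illustrative examples that all imply the same p:

def heads_prob(x, y, z):
    return (x + 2 * y) / (2 * (x + y + z))

# Suppose we observed H = 70 heads in N = 100 draw-and-flips
p_hat = 70 / 100
# All of these bags imply p = 0.7, so the data alone cannot distinguish
# between them without extra constraints or prior knowledge:
for bag in [(3, 2, 0), (6, 4, 0), (1, 3, 1), (4, 5, 1)]:
    print(bag, heads_prob(*bag))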
Could correlation between flips occur, and how would you manage it?
In theory, if each draw is truly independent and with replacement, flips are independent of each other. But in reality, if the way you flip coins or the environment introduces a correlation (for example, if flipping is done in a manner that biases consecutive flips in the same way), the assumption of independence might break.
To handle correlation, you need a model that captures it—maybe a Markov chain or a hidden-state model that explains how a run of heads or tails in your flipping process can influence subsequent flips. The expected number of heads can still be computed, but you must account for the joint probability distribution rather than a simple product of marginal probabilities.
Edge Case: If the flipping device is malfunctioning or you have a pattern in how you select coins (even though it is said to be random with replacement), correlation creeps in. Testing for independence by running a goodness-of-fit test or checking auto-correlation in the flip outcomes can help detect such anomalies.
If you flip each drawn coin multiple times before replacing it, how does that change the expected number of heads?
In the original setup, you draw one coin and flip it exactly once. If instead you decide to flip that same coin k times before putting it back, the probability of each flip being heads (for a given coin type) does not change. But now the expected number of heads per draw is k times the probability of heads for that coin type.
Hence, for a single draw, if you pick a coin type with heads probability p_coin, you expect k × p_coin heads from that draw. Summing over the probability of each coin type:
Overall p_coin = 1/2 for fair coins, 1 for double-headed coins, and 0 for double-tailed coins. If the bag proportion for fair coins is α, double-headed is β, and double-tailed is γ (with α+β+γ=1), then your expected heads from one draw of k flips is:
k × [ α × (1/2) + β × 1 + γ × 0 ].
So if you do N draws in total, each with k flips, that is N × k flips. The expected total heads becomes N × k × [ α × (1/2) + β × 1 ].
Pitfall: People sometimes forget to multiply the single-flip probability by k when the same coin is flipped multiple times. Also, k flips of the same coin share the same underlying bias, so the variance in the total heads per draw is affected differently than if you flip different coins each time.
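A small simulation sketch of the k-flips-per-draw setup (function name and example values are illustrative), which can be checked against N × k × [α × (1/2) + β × 1]:

import random

def simulate_heads_k_flips(x, y, z, draws=100, k=5, trials=10_000):
    """Each draw picks one coin (with replacement) and flips it k times."""
    total_coins = x + y + z
    total_heads = 0
    for _ in range(trials):
        for _ in range(draws):
            r = random.random()
            if r < x / total_coins:
                p = 0.5   # fair coin
            elif r < (x + y) / total_coins:
                p = 1.0   # double-headed coin
            else:
                p = 0.0   # double-tailed coin
            # All k flips of this draw share the same underlying bias p
            total_heads += sum(random.random() < p for _ in range(k))
    return total_heads / trials

x_val, y_val, z_val, k = 3, 2, 1, 5
print(simulate_heads_k_flips(x_val, y_val, z_val, draws=100, k=k))
print(100 * k * (x_val + 2 * y_val) / (2 * (x_val + y_val + z_val)))  # theory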
How to compute the variance and confidence interval of the total heads?
If we define p = (x + 2y)/(2(x + y + z)) as the probability of getting a head on any single draw-and-flip, then for n = 100 draws (with replacement and independence) the total number of heads follows a Binomial(n, p) distribution, so
Variance of total heads = n × p × (1 − p),
where n = 100 and p is as above. This is just the binomial distribution's variance. If you want a confidence interval, you could use a normal approximation (when n is large):
Confidence interval ~ H ± z * sqrt(n * p * (1 - p)),
where H is the observed heads count (random variable in practice) and z is the appropriate z-score for your confidence level (for example, 1.96 for 95% confidence).
Potential Pitfall: When p is close to 0 or 1, the normal approximation becomes less accurate, and other methods (e.g., exact binomial confidence intervals) might be more appropriate. Another real-world subtlety is that flips might not be truly independent, which can invalidate the straightforward variance calculation.
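A minimal sketch of these calculations, here centering the normal-approximation interval on the theoretical mean to show the range in which the observed count will typically fall (the function name is illustrative; an exact binomial interval would be preferable when p is near 0 or 1):

import math

def heads_mean_var_ci(x, y, z, n=100, z_score=1.96):
    p = (x + 2 * y) / (2 * (x + y + z))
    mean = n * p
    var = n * p * (1 - p)              # binomial variance
    half_width = z_score * math.sqrt(var)
    return mean, var, (mean - half_width, mean + half_width)

mean, var, ci = heads_mean_var_ci(3, 2, 1)
print(f"Expected heads: {mean:.2f}, variance: {var:.2f}, ~95% range: {ci}")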
What if the number of draws (100) is small compared to the total variety of coins?
When n = 100 but the bag is extremely large and diverse (huge x, y, z), you are sampling only a tiny fraction of the bag. Because the draws are with replacement, the sampling variance of the head count depends on n and p rather than on the bag size, but with only n = 100 draws the observed number of heads can still deviate noticeably from the theoretical mean; the law of large numbers tightens the estimate only as n grows.
Yet, the expectation formula itself remains correct for an “infinite” bag, since drawing with replacement preserves the same distribution at each draw. Practically, if n is small, your empirical estimate might show more variance, so the actual realized heads could be off from the theoretical expectation by a larger margin.
Edge Case: If x+y+z is extremely large but the ratio x : y : z is not well-known or if it’s changing, the small sample might not accurately reflect the true composition. Sometimes, more draws or a prior understanding of x, y, z can stabilize your estimates.
In a real-world scenario, how do you address potential mechanical or human biases during flipping?
Real coin flips can introduce slight biases based on flipping technique, spin axis, or how a person catches the coin. If those biases remain constant for all coins, a fair coin might not be strictly 1/2 probability for heads. A double-headed coin is still effectively 100% heads, but a “fair” coin might have p slightly above or below 0.5.
Practically, you could run a calibration experiment with known fair coins to measure the actual flipping heads rate. Then, use that measured rate in place of 1/2 in your probability formulas. Alternatively, you might use a mechanical coin flipper designed to reduce human-based biases.
Pitfall: Assuming perfect fairness across all real flips can lead to systematic errors. Testing the flipping device or technique is part of a thorough experiment design.
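A minimal sketch of plugging a calibrated heads rate into the formula, assuming you have run a calibration experiment with known fair coins (the function name and example counts are illustrative):

def expected_heads_calibrated(x, y, z, fair_heads, fair_flips, draws=100):
    """
    fair_heads / fair_flips: outcome of a calibration run with known fair coins,
    used in place of the idealized 1/2 for the fair-coin heads rate.
    """
    p_fair = fair_heads / fair_flips
    total = x + y + z
    p_heads = (x / total) * p_fair + (y / total) * 1.0 + (z / total) * 0.0
    return draws * p_heads

# Calibration suggested the flipping technique yields heads 51% of the time
print(expected_heads_calibrated(3, 2, 1, fair_heads=510, fair_flips=1000))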
How might randomization fail if coins stick together or are not properly mixed?
When physically drawing coins from a bag, real coins can sometimes clump together or be drawn in partial groups. If your procedure inadvertently favors picking certain regions of the bag, you will not be sampling x, y, z in the correct proportions. The probability distribution x/(x+y+z), y/(x+y+z), z/(x+y+z) relies on a well-mixed bag and a truly random pick.
Countermeasure: Properly shake or stir the coins, possibly multiple times. Another approach is to number each coin from 1 to x+y+z, then use a uniform random generator to pick an index. This ensures truly uniform selection. Always be mindful of how quickly coins re-mix after each draw (with replacement) in a physical setting.
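A minimal sketch of the "number every coin and pick a uniform index" idea (the index-to-type mapping below is illustrative):

import random
from collections import Counter

def draw_coin_by_index(x, y, z):
    """Pick a uniform index in [0, x+y+z) and map it to a coin type."""
    idx = random.randrange(x + y + z)
    if idx < x:
        return 'fair'
    elif idx < x + y:
        return 'double_headed'
    else:
        return 'double_tailed'

# Quick check that the types appear in roughly the x : y : z proportions
print(Counter(draw_coin_by_index(3, 2, 1) for _ in range(60_000)))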
Edge Case: If the bag is small and the coins are physically jammed, or if static electricity causes some coins to stick together, you might effectively reduce the diversity of your draws, skewing your results.