ML Interview Q Series: Coin Toss Head-Count Comparison: How Probability and Symmetry Explain the 50% Result.
Browse all the Probability Interview Questions here.
Pete tosses n + 1 fair coins and John tosses n fair coins. What is the probability that Pete gets more heads than John? First solve it for n = 1 and n = 2, then find the probability for the general case.
Short Compact solution
For n = 1, we look at all 8 possible outcomes of the 3 coin tosses (the first 2 for Pete, and 1 for John). Exactly 4 out of these 8 outcomes yield Pete having more heads, giving 4/8 = 1/2.
For n = 2, there are 32 possible outcomes of the 5 coin tosses (the first 3 for Pete, and 2 for John). Exactly 16 out of these 32 yield Pete having more heads, again giving 16/32 = 1/2.
In the general case (where Pete tosses n + 1 coins and John tosses n coins), there are 2^(2n+1) total ways to flip all these coins. The probability that Pete gets more heads than John is given by

P(X > Y) = [ 1 / 2^(2n+1) ] * sum_{k=0}^{n} binom{n}{k} * sum_{j=k+1}^{n+1} binom{n+1}{j}
Evaluating this sum for various n shows it always equals 1/2. A simple argument is that the total number of coins is odd, so the events “Pete gets more heads” and “Pete gets more tails” partition the entire sample space and have equal probability; hence each is 1/2.
Comprehensive Explanation
Fundamental Setup
We have two players:
Pete, who flips n + 1 fair coins.
John, who flips n fair coins.
A fair coin has a 1/2 chance of landing Heads (H) or Tails (T). Because Pete flips one extra coin compared to John, the total number of coins flipped is 2n + 1, which is an odd count.
Direct Combinatorial Argument
One way to compute the probability that Pete has more heads than John is to count all outcomes of flipping (n + 1) + n = 2n + 1 coins. Each configuration can be described by the number of heads Pete obtains (denote it by random variable X) and the number of heads John obtains (denote it by random variable Y).
The probability we want is P(X > Y). Symbolically, one might write:

P(X > Y) = [ 1 / 2^(2n+1) ] * sum_{k=0}^{n} binom{n}{k} * sum_{j=k+1}^{n+1} binom{n+1}{j}
Here is the meaning of each symbol in that formula:
2^(2n+1) is the total number of ways to flip 2n + 1 fair coins.
binom{n}{k} counts how many ways John’s n coins can contain exactly k heads.
binom{n+1}{j} counts how many ways Pete’s n + 1 coins can contain exactly j heads.
The inner sum from j = k+1 to n+1 enforces that Pete’s heads j strictly exceed John’s heads k.
When you evaluate this double sum for any n, it simplifies to 1/2.
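The double sum can be evaluated exactly for any n with a few lines of Python (the function name here is mine); exact rational arithmetic removes any floating-point doubt:

```python
from fractions import Fraction
from math import comb

def p_pete_more_heads(n):
    """Exact P(Pete's n+1 fair coins show more heads than John's n coins)."""
    favorable = sum(
        comb(n, k) * comb(n + 1, j)
        for k in range(n + 1)          # John's head count k
        for j in range(k + 1, n + 2)   # Pete's head count j must exceed k
    )
    return Fraction(favorable, 2 ** (2 * n + 1))

for n in [1, 2, 5, 10, 25]:
    assert p_pete_more_heads(n) == Fraction(1, 2)
print("double sum equals 1/2 for every n tested")
```

Running this confirms the sum collapses to exactly 1/2 for every n tried.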
Symmetry Argument
An even more insightful explanation uses symmetry:
Let A be the event “Pete gets more heads than John.”
Let B be the event “Pete gets more tails than John.”
The key observation is that, because Pete has exactly one extra coin, A and B are complementary. If Pete does not have more heads (X ≤ Y), then his tail count satisfies (n + 1) − X ≥ (n + 1) − Y > n − Y, so he has more tails than John. Conversely, if X > Y, then (n + 1) − X ≤ n − Y, so he does not have more tails. Therefore A and B form a partition of all possible outcomes (they cover the entire sample space and do not overlap). Hence:
P(A) + P(B) = 1
Because the coins are fair, swapping every Head for a Tail (and vice versa) is a probability-preserving map of the sample space, and it exchanges A and B. This forces P(A) = P(B), establishing that the probability is 1/2 for all n.
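For small n, the symmetry claim can be verified exhaustively (the function name is mine): every outcome realizes exactly one of “Pete has more heads” and “Pete has more tails,” and flipping every coin swaps the two events:

```python
from itertools import product

def check_pairing(n):
    """Check: exactly one of {Pete more heads, Pete more tails} holds in every
    outcome, and flipping every coin swaps the two events."""
    for outcome in product([0, 1], repeat=2 * n + 1):   # 1 = heads, 0 = tails
        pete, john = outcome[:n + 1], outcome[n + 1:]
        more_heads = sum(pete) > sum(john)
        more_tails = (n + 1 - sum(pete)) > (n - sum(john))
        assert more_heads != more_tails                 # exactly one event occurs
        flipped = tuple(1 - c for c in outcome)         # H <-> T everywhere
        fp, fj = flipped[:n + 1], flipped[n + 1:]
        assert (sum(fp) > sum(fj)) == more_tails        # flip maps A to B
    return True

assert check_pairing(1) and check_pairing(2) and check_pairing(3)
print("pairing verified")
```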
Checking Small Cases
For n = 1: Total of 3 coins (Pete has 2, John has 1). The sample space has size 8. Counting exactly how many ways Pete can exceed John in heads yields 4 out of 8.
For n = 2: Total of 5 coins. The sample space has size 32. Counting how many ways Pete can exceed John in heads yields 16 out of 32.
Both cases confirm the probability is 1/2.
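The small-case counts can be reproduced by brute-force enumeration (the function name is mine):

```python
from itertools import product

def count_pete_wins(n):
    """Enumerate all 2^(2n+1) outcomes; count those where Pete has more heads."""
    wins = 0
    for outcome in product("HT", repeat=2 * n + 1):
        pete, john = outcome[:n + 1], outcome[n + 1:]   # first n+1 flips are Pete's
        if pete.count("H") > john.count("H"):
            wins += 1
    return wins

print(count_pete_wins(1), "of", 2 ** 3)   # 4 of 8
print(count_pete_wins(2), "of", 2 ** 5)   # 16 of 32
```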
Practical Simulation Illustration
While the analytical argument is concise, we can also verify numerically by simulating many trials in Python.
import random

def simulate_pete_vs_john(n, trials=1_000_000):
    # Estimate P(Pete's n+1 fair coins show more heads than John's n coins)
    more_heads_count = 0
    for _ in range(trials):
        # Pete flips n+1 coins
        pete_heads = sum(random.choice([0, 1]) for _ in range(n + 1))
        # John flips n coins
        john_heads = sum(random.choice([0, 1]) for _ in range(n))
        if pete_heads > john_heads:
            more_heads_count += 1
    return more_heads_count / trials

# Test for a few values of n
for test_n in [1, 2, 5, 10]:
    est_prob = simulate_pete_vs_john(test_n)
    print(f"n={test_n}, Estimated Probability Pete > John = {est_prob:.4f}")
If you run the code for large numbers of trials, you will see the simulation value hover close to 0.5 for each n.
Potential Follow-up question 1
Why does the odd total number of coins rule out a “tie” between the two symmetric events?
A common misreading is that Pete and John can never have the same number of heads; in fact X = Y is perfectly possible (for example, both get exactly one head). What the setup rules out is a tie between the two events used in the symmetry argument: “Pete gets more heads than John” and “Pete gets more tails than John.” Pete flips n + 1 coins, so his head and tail counts sum to n + 1, while John’s sum to n. If X ≤ Y, then Pete’s tail count (n + 1 − X) strictly exceeds John’s tail count (n − Y); if X > Y, then Pete’s tail count is at most John’s. Exactly one of the two events therefore occurs in every single outcome, which is precisely what the symmetry argument needs.
Potential Follow-up question 2
Would the probability still be 1/2 if the coins were not fair?
If each coin had the same bias p for landing heads and 1 - p for tails, then the problem becomes more subtle. The symmetry argument used above relies on the assumption that each coin is fair (p = 1/2), since it swaps heads and tails. For biased coins, the probability distribution shifts, and Pete’s extra coin gives a stronger or weaker advantage depending on p. Specifically, the probability of Pete having more heads is generally not 1/2 unless p = 1/2. You would then need to calculate:
P(X > Y) = sum over all k and j of [ binomial(n, k) p^k (1-p)^(n-k) * binomial(n+1, j) p^j (1-p)^(n+1-j) ] where j > k
That sum will not, in general, simplify to 1/2 for p != 1/2.
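A short calculation makes the dependence on p explicit (the function name is mine). Conditioning on Pete’s extra coin gives the identity P(X > Y) = 1/2 + (p − 1/2)·P(Y′ = Y), where Y′ is an independent copy of Y, so the answer exceeds 1/2 exactly when p > 1/2:

```python
from math import comb

def p_pete_more_heads_biased(n, p):
    """Exact P(X > Y) when every coin lands heads with the same probability p."""
    prob = 0.0
    for k in range(n + 1):                                   # John's heads
        p_john = comb(n, k) * p**k * (1 - p) ** (n - k)
        for j in range(k + 1, n + 2):                        # Pete strictly ahead
            prob += p_john * comb(n + 1, j) * p**j * (1 - p) ** (n + 1 - j)
    return prob

print(p_pete_more_heads_biased(5, 0.5))   # 0.5: the fair case
print(p_pete_more_heads_biased(5, 0.7))   # above 0.5
print(p_pete_more_heads_biased(5, 0.3))   # below 0.5
```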
Potential Follow-up question 3
How does this result connect to the concept of random walks or reflection principles?
In some probability arguments, especially those involving random walks, one can use a reflection (or symmetry) principle to conclude that half of the paths lead above a certain boundary and half lead below it—given certain conditions like no possibility of ties. Here, the reflection principle idea is that for each sequence of coin tosses where John has more heads, you can reflect a certain part of that sequence to obtain a sequence where Pete has more heads, and vice versa. Because the total number of coin flips is odd, that reflection pairs up outcomes in a one-to-one manner, each pair consisting of “Pete has more heads” vs. “Pete has fewer heads.” This deepens the understanding of why the probability is 1/2.
Potential Follow-up question 4
Could this intuition help us in other probability problems where one person flips an extra coin?
Yes. Any time there are two players flipping coins, and one flips one more coin than the other, the same logic applies for fair coins: the probability that the person with the extra coin gets more heads (or similarly, more tails) is 1/2. The main requirement is that the coins are fair and that the only difference is a single coin advantage for one player. This principle can be adapted to a wide range of combinatorial scenarios where an “odd one out” break in symmetry leads to an exact 1/2 probability split.
Below are additional follow-up questions
What if we want the exact probability distribution of (X − Y), where X is the number of heads Pete gets and Y is the number of heads John gets?
Because X is the sum of n+1 independent Bernoulli(1/2) random variables and Y is the sum of n independent Bernoulli(1/2) random variables, one could define Z = X − Y. Although we know P(X > Y) = 1/2, exploring the entire distribution of Z can be interesting for deeper insights:
First, X follows a Binomial(n+1, 1/2) distribution, and Y follows a Binomial(n, 1/2) distribution.
Z can theoretically range from −n to n + 1.
The probability P(Z = z) for a specific z is obtained by summing over all pairs (x, y) with x − y = z. Formally, for x in {0, ..., n+1} and y in {0, ..., n}, you include the term binomial(n+1, x) * (1/2)^(n+1) * binomial(n, y) * (1/2)^n whenever x − y = z.
While we know the aggregate probability P(X > Y) is 1/2, Z’s distribution itself can highlight interesting features such as the most probable differences (0 or 1), tails of the distribution, etc.
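The full pmf of Z is easy to tabulate exactly (the function name is mine); doing so also reveals that Z = 0 and Z = 1 are tied as the most probable differences:

```python
from fractions import Fraction
from math import comb

def dist_Z(n):
    """Exact pmf of Z = X - Y for X ~ Bin(n+1, 1/2), Y ~ Bin(n, 1/2), independent."""
    denom = 2 ** (2 * n + 1)
    pmf = {}
    for x in range(n + 2):
        for y in range(n + 1):
            z = x - y
            pmf[z] = pmf.get(z, Fraction(0)) + Fraction(comb(n + 1, x) * comb(n, y), denom)
    return pmf

pmf = dist_Z(4)
assert sum(pmf.values()) == 1                                     # a valid pmf
assert sum(p for z, p in pmf.items() if z > 0) == Fraction(1, 2)  # P(Z > 0) = 1/2
assert pmf[0] == pmf[1]   # Z = 0 and Z = 1 tie as the modes
```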
A subtle pitfall in real-world scenarios is that if data about individual coin flips is partially missing or erroneous (e.g., some flips are unobserved), recovering the distribution of Z might require Bayesian inference or imputation techniques. In actual data settings, you might not simply be able to apply the direct binomial model without addressing missing data issues.
How would we handle a scenario where the coins are fair but some of Pete's flips are lost, and we only observe the final tally?
Real data often introduces challenges like partial observability. Suppose we only know how many heads John gets, but for Pete, we lose the exact count for a subset of flips, or we only know partial outcomes. In this scenario:
If we do not know the full outcome of Pete’s n+1 tosses, we must model the missingness carefully. Typically, we assume a missing-at-random scenario and use a likelihood-based or Bayesian approach, incorporating the known partial outcomes for Pete and the complete outcomes for John.
A common method is to treat the unobserved flips as random variables drawn from Bernoulli(1/2), then integrate (or sum) over all possible unobserved states. This often involves constructing a posterior distribution for the number of heads Pete could have obtained.
A key pitfall is incorrectly assuming the missing flips do not affect probabilities. If the missing flips systematically occur (e.g., we only lose data in certain conditions), that biases the inference. Proper treatment requires a model for how data is lost.
This highlights that the clean, theoretical 1/2 result relies on complete and correct observation of all coin flips.
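As a minimal sketch of the likelihood approach under a missing-at-random assumption (the scenario and all names here are illustrative): suppose John’s total is fully observed, Pete’s observed flips show h_obs heads, and n_missing of his flips were lost. Averaging over the lost flips as independent Bernoulli(1/2) variables gives the conditional win probability:

```python
from fractions import Fraction
from math import comb

def p_pete_beats_given_partial(h_obs, n_missing, john_heads):
    """Under missing-at-random, each unobserved flip is Bernoulli(1/2):
    P(Pete's total > john_heads | h_obs heads seen, n_missing flips lost)."""
    need = john_heads - h_obs + 1   # minimum heads needed among the missing flips
    return sum(
        Fraction(comb(n_missing, u), 2 ** n_missing)
        for u in range(max(need, 0), n_missing + 1)
    )

# Illustration: Pete showed 2 heads in his observed flips, 2 flips were lost,
# and John finished with 3 heads. Pete needs both missing flips to be heads.
print(p_pete_beats_given_partial(2, 2, 3))  # 1/4
```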
Could we extend the problem to more than two players, each flipping different numbers of coins?
If we have multiple players—say, Pete flips n+1 coins, John flips n coins, Mary flips n−1 coins, and so on—the question becomes: “What is the probability that Pete has the most heads among all players?” The original argument that relies on pairing events (Pete has more vs. Pete has fewer) no longer directly applies, because we now must compare Pete’s heads to multiple opponents’ counts.
To compute Pete’s chance of having strictly the greatest number of heads:
We would need to enumerate or integrate across all possible head counts of each player (a multinomial-style distribution) and then consider only those outcomes where Pete’s heads exceed each other player’s heads.
For large numbers of players or large n, a direct combinatorial enumeration becomes infeasible. One might resort to approximate or asymptotic methods:
Normal approximation using the Central Limit Theorem for each player’s binomial distribution.
Monte Carlo simulation to estimate the probability.
A subtle pitfall in real-world contexts is correlating coin flips across players. If, for example, some flips are shared or correlated (not truly independent across players), the distribution changes dramatically and the straightforward binomial-based or multinomial-based approach breaks down.
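A Monte Carlo sketch of the multi-player case (function name and player counts are mine). The two-player run recovers the exact 1/2; adding a third player breaks the pairing argument and the estimate moves away from 1/2:

```python
import random

def p_first_player_strictly_most(coin_counts, trials=100_000, seed=0):
    """Monte Carlo estimate of P(player 0 ends with strictly the most heads).
    Assumes independent fair coins; coin_counts[0] is Pete's coin count."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        heads = [sum(rng.getrandbits(1) for _ in range(c)) for c in coin_counts]
        if heads[0] > max(heads[1:]):
            wins += 1
    return wins / trials

print(p_first_player_strictly_most([3, 2]))      # close to 0.5 (two players, n = 2)
print(p_first_player_strictly_most([4, 3, 2]))   # Pete vs John vs Mary (n = 3)
```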
How can the result be generalized if each coin has its own unique probability of landing heads?
In a more complex real-world situation, each coin might have its own bias. For instance, imagine a bag of coins, each slightly different in weight or shape, leading to distinct p_i for the i-th coin. Then:
Pete’s probability of getting j heads would be the sum over all combinations of j coins landing heads among his n+1 coins, each with its own p_i.
Similarly, John’s distribution for k heads among his n coins would be computed from his subset of coins with their respective probabilities.
Determining P(X > Y) then requires convolving these two complicated distributions.
A key subtlety is that the symmetry argument is completely lost if each coin is distinct. The probability might still be close to 1/2 if, on average, the biases are around 1/2 and Pete’s extra coin is also near 1/2, but this is no guarantee. Practically, one would likely use numerical approaches:
Dynamic programming or recursive calculations to find the distribution of heads for Pete and John.
Monte Carlo simulation, especially when n is large, to approximate P(X > Y).
The pitfall here is blindly applying the 1/2 result in a situation with heterogeneous coin biases. Real manufacturing processes often produce coins with minute but real bias variations, so the exact probability might deviate from 1/2 if these differences are significant.
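A sketch of the dynamic-programming route (function names are mine): build each player’s head-count pmf one coin at a time, then sum the joint probability over the region X > Y:

```python
def heads_distribution(ps):
    """DP: pmf of total heads when coin i lands heads with probability ps[i]."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)      # this coin lands tails
            new[k + 1] += q * p        # this coin lands heads
        pmf = new
    return pmf

def p_more_heads(ps_pete, ps_john):
    """P(Pete's head count strictly exceeds John's), arbitrary per-coin biases."""
    fx, fy = heads_distribution(ps_pete), heads_distribution(ps_john)
    return sum(fx[x] * fy[y] for x in range(len(fx)) for y in range(x))

# All coins fair recovers 1/2; mildly heterogeneous biases drift away from 1/2.
print(p_more_heads([0.5] * 4, [0.5] * 3))
print(p_more_heads([0.52, 0.48, 0.55, 0.5], [0.47, 0.53, 0.5]))
```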
Is there an intuitive reason why the difference of one coin leads to a 1/2 split, instead of something like 1/2 ± 1/(some factor)?
An intuitive understanding uses the fact that Pete holds exactly one extra fair coin. In every outcome, exactly one of two things happens: Pete has more heads than John, or Pete has more tails than John (both cannot hold, and both cannot fail, because Pete’s head and tail counts sum to n + 1 against John’s n). Since the coins are fair, swapping heads and tails everywhere exchanges these two events without changing probabilities, so each must have probability exactly 1/2. There is no leftover term of the form 1/2 ± 1/(some factor) because the two events split the sample space exactly in half.
A subtle mental pitfall is to think that the extra coin “slightly” boosts Pete’s odds beyond 1/2. If both players had n coins, we would have P(X > Y) strictly less than 1/2, with the remaining probability mass on the tie X = Y. The extra coin lifts P(X > Y) to exactly 1/2: conditioning on the extra coin shows it converts exactly half of the would-be ties into wins, and the heads/tails symmetry pins the result at precisely one half.
How does the result change if John is allowed to reflip any tails once, while Pete gets no reflips?
A tricky variation is if John is given a second chance to flip any tails once. This fundamentally alters the distribution of heads John ends up with:
Let Y' be John’s new random variable representing his final count of heads after the reflips.
Even though John initially flips n fair coins, each tail is then flipped again, giving a higher expected head count than simply Binomial(n, 1/2).
Formally, for each coin, there is a 1/2 chance of heads on the first flip. If it is tails (1/2 chance), John flips it again for another 1/2 chance of heads. This leads to an effective probability of heads = 1 − (1/2)*(1/2) = 3/4 for each coin. So Y' is Binomial(n, 3/4).
Pete’s distribution remains Binomial(n+1, 1/2). We can no longer expect the probability P(X > Y') to be 1/2. In fact, John is now more likely to achieve a higher proportion of heads because each coin can be flipped twice if the first flip is tails.
One subtlety is that John’s total number of Bernoulli trials now varies from sample to sample: each initial tail earns a second flip, while coins that land heads on the first try are flipped only once. There is therefore no longer a fixed, odd total of 2n + 1 fair flips, and this complexity undermines the simple symmetry argument.
A real-world pitfall would be misapplying the original 1/2 conclusion, ignoring the advantage gained by repeated flips. Once the distribution of heads changes for John, the entire combinatorial symmetry collapses.
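Under the modeling simplification above, John’s final count is Binomial(n, 3/4) while Pete stays Binomial(n+1, 1/2), and P(X > Y′) can be computed exactly (the function name is mine):

```python
from math import comb

def p_pete_beats_reflip_john(n):
    """P(X > Y'), X ~ Bin(n+1, 1/2), Y' ~ Bin(n, 3/4): John reflips each tail once."""
    prob = 0.0
    for k in range(n + 1):                                  # John's final heads
        p_john = comb(n, k) * 0.75**k * 0.25 ** (n - k)
        for j in range(k + 1, n + 2):                       # Pete strictly ahead
            prob += p_john * comb(n + 1, j) * 0.5 ** (n + 1)
    return prob

for n in [1, 2, 5, 10]:
    print(n, round(p_pete_beats_reflip_john(n), 4))  # below 1/2 and shrinking with n
```

For n = 1 this gives 0.375 and for n = 2 it gives 0.3125, already well below the fair-coin answer of 1/2.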
Can the Central Limit Theorem approximate the 1/2 result for large n, and are there accuracy pitfalls?
For large n, each of X = number of heads for Pete and Y = number of heads for John can be approximated by a normal distribution:
X ~ Normal(mean = (n+1)/2, variance = (n+1)/4)
Y ~ Normal(mean = n/2, variance = n/4)
If we define D = X − Y, then D is approximately Normal with mean = (n+1)/2 − n/2 = 1/2 and variance = (n+1)/4 + n/4 = (2n+1)/4.
We are interested in P(D > 0). The normal distribution with mean 0.5 and variance (2n+1)/4 is centered around 0.5, so the probability that a realization of D is greater than 0 is roughly the probability that a Normal(mean=0.5, std_dev=sqrt((2n+1)/4)) is positive. As n grows large, the difference of 0.5 becomes small compared to the standard deviation, which scales with sqrt(n). In the limit, the distribution becomes wide, but the symmetrical shape still suggests that the probability is close to 1/2.
A pitfall arises if n is not sufficiently large, or if the normal approximation is used incorrectly (for instance, ignoring the discrete nature of binomial distributions). For moderate n, the exact combinatorial or a more precise approximation (such as the continuity correction to the normal approximation) might be needed to get an accurate estimate.
In real-world settings, if the sample size is large, the normal approximation works reasonably, and you will see P(X > Y) remain extremely close to 1/2, consistent with the exact combinatorial result.
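The continuity-correction point can be made concrete (the function name is mine): since D = X − Y is integer-valued, P(D > 0) = P(D ≥ 1), and approximating the latter by P(Normal > 1/2) recovers 1/2 exactly, while the naive cutoff at 0 overshoots badly for small n:

```python
from math import erf, sqrt

def normal_approx_p(n, continuity=False):
    """Normal approximation to P(D > 0) for D ~ Normal(1/2, (2n+1)/4).
    With the continuity correction, P(D > 0) = P(D >= 1) ~ P(Normal > 1/2)."""
    mu, sigma = 0.5, sqrt((2 * n + 1) / 4)
    cut = 0.5 if continuity else 0.0
    # Standard normal tail via erf: P(N(mu, sigma^2) > cut)
    return 0.5 * (1 - erf((cut - mu) / (sigma * sqrt(2))))

for n in [1, 5, 20, 100]:
    print(n, round(normal_approx_p(n), 4), normal_approx_p(n, continuity=True))
```

The naive column drifts noticeably above 0.5 for small n and only approaches 0.5 as n grows, while the corrected column is exactly 0.5 for every n.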
What would happen if we were interested in the event "Pete has at least as many heads as John" instead of strictly more heads?
This would change the question to P(X >= Y). Unlike the “more heads vs. more tails” pairing used in the symmetry argument, a tie in head counts is genuinely possible here, even though Pete has one extra coin. So we need to examine how the tie event X = Y enters:
Let X be the number of heads for Pete and Y be the number of heads for John.
Because John has n coins, Y can be at most n. Because Pete has n+1 coins, X can be at most n+1.
For X and Y to be equal, we would need X = Y = some integer k. This is indeed possible if, for instance, Pete gets k heads out of n+1, and John gets k heads out of n (assuming k ≤ n). So ties are not automatically excluded when the number of coins differs by one—it depends on the actual values.
Hence the event X >= Y is the disjoint union of X = Y and X > Y, and its probability is necessarily more than 1/2:
P(X >= Y) = P(X > Y) + P(X = Y) = 1/2 + P(X = Y).
The tie probability even has a closed form. Summing over matching head counts, P(X = Y) = sum_{k} binom{n+1}{k} binom{n}{k} / 2^(2n+1), and by Vandermonde’s identity the numerator equals binom{2n+1}{n}, so P(X = Y) = binom{2n+1}{n} / 2^(2n+1). For n = 1 this gives 3/8, and indeed P(X >= Y) = 1/2 + 3/8 = 7/8.
The subtle pitfall is to automatically assume that "Pete has at least as many heads" might also be 1/2. In truth, it will be strictly greater than 1/2, because X = Y is an extra event on top of X > Y. Computing that tie probability explicitly requires enumerating all ways that Pete can match John’s head count.
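The tie probability is a few lines of exact arithmetic (the function name is mine), and the direct count matches the Vandermonde closed form binom{2n+1}{n} / 2^(2n+1):

```python
from fractions import Fraction
from math import comb

def p_tie(n):
    """P(X = Y): Pete (n+1 fair coins) and John (n fair coins) tie in heads."""
    matching = sum(comb(n + 1, k) * comb(n, k) for k in range(n + 1))
    return Fraction(matching, 2 ** (2 * n + 1))

for n in [1, 2, 5, 8]:
    # Vandermonde: sum_k binom(n+1, k) binom(n, k) = binom(2n+1, n)
    assert p_tie(n) == Fraction(comb(2 * n + 1, n), 2 ** (2 * n + 1))
    print(n, p_tie(n), Fraction(1, 2) + p_tie(n))   # P(X = Y) and P(X >= Y)
```

For n = 1 this prints a tie probability of 3/8 and P(X >= Y) = 7/8.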