ML Interview Q Series: Using Bayes' Theorem to Solve the Two-Headed Coin Identification Problem.
I have in my pocket ten coins. Nine of them are ordinary coins with equal chances of coming up head or tail when tossed, and the tenth has two heads.
(a) If I take one of the coins at random from my pocket, what is the probability that it is the coin with two heads? (b) If I toss the coin and it comes up heads, what is the probability that it is the coin with two heads? (c) If I toss the coin one further time and it comes up tails, what is the probability that it is one of the nine ordinary coins?
Compact solution
(a) The probability is 1/10, since exactly one of the ten coins is the double-headed one.
(b) We apply Bayes' Theorem. The overall probability of getting heads is 11/20, because the double-headed coin contributes heads with probability 1 while each of the nine fair coins contributes heads with probability 1/2. Hence,

P(D|H) = (1 × 1/10) / (11/20) = 2/11.
(c) If the next toss comes up tails, it cannot possibly be the double-headed coin. Therefore, it must be one of the nine ordinary coins (probability 1).
Comprehensive Explanation
We have 10 coins total: 9 fair coins (with probability of heads = 1/2 for each) and 1 special coin that always lands heads. We define the following events in plain text:
D: The chosen coin is the double-headed one.
D^c: The chosen coin is any one of the nine fair coins.
H: The event that a coin toss results in heads.
T: The event that a coin toss results in tails.
Part (a)
We pick one coin at random from the 10 coins. Because there is only 1 coin with two heads among 10 coins, the probability that we pick the double-headed coin is simply 1/10.
Part (b)
Once we see that the chosen coin produces heads on a single toss, we want the probability that this coin is the double-headed one. By Bayes' Theorem:

P(D|H) = P(H|D) P(D) / P(H)
Below is how each term fits in:
P(H|D): Probability of getting heads if the coin is double-headed. This equals 1 because a double-headed coin always lands heads.
P(D): Probability we selected the double-headed coin initially, which is 1/10.
P(H): Overall probability of getting heads from a random selection of any coin and a single toss. We can use the law of total probability:

P(H) = P(H|D) P(D) + P(H|D^c) P(D^c)
Concretely:
P(H|D) = 1.
P(D) = 1/10.
P(H|D^c) = 1/2, since an ordinary fair coin yields heads with probability 1/2.
P(D^c) = 9/10.
Thus,
P(H) = (1)(1/10) + (1/2)(9/10) = 1/10 + 9/20 = 2/20 + 9/20 = 11/20.
Substituting back:
P(D|H) = [1 * (1/10)] / (11/20) = (1/10) / (11/20) = (1/10) * (20/11) = 2/11.
Hence, seeing one head on a single toss updates the probability that we are holding the double-headed coin to 2/11.
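The calculation above can be checked with a short sketch using Python's exact rational arithmetic; the variable names here are illustrative, not from the original problem statement.

```python
from fractions import Fraction

# Prior: 1 of the 10 coins is double-headed.
p_D = Fraction(1, 10)           # P(D)
p_H_given_D = Fraction(1)       # double-headed coin always shows heads
p_H_given_fair = Fraction(1, 2) # fair coin shows heads half the time

# Law of total probability: P(H)
p_H = p_H_given_D * p_D + p_H_given_fair * (1 - p_D)

# Bayes' Theorem: P(D | H)
p_D_given_H = p_H_given_D * p_D / p_H

print(p_H)          # 11/20
print(p_D_given_H)  # 2/11
```

Using `Fraction` rather than floats keeps the answers as exact rationals, so they match the hand-derived 11/20 and 2/11 without rounding artifacts.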
Part (c)
If we toss that same coin again and observe tails on the new toss, it can no longer be the double-headed coin (which always lands heads). Therefore, it must be one of the nine fair coins with probability 1. In other words, the moment we see tails, the event “picked the double-headed coin” is ruled out entirely.
Potential Follow-up Questions
How would the answer change if the special coin had a probability p of landing heads (rather than 1)?
In that scenario, the double-headed coin would be replaced by a biased coin (with heads probability p). You could generalize the above steps:
Let D be the event that the chosen coin is the biased one with probability p of heads.
P(D) = 1/10.
The probability of heads on any toss if we randomly pick a coin becomes (p)(1/10) + (1/2)(9/10).
From that, apply Bayes’ Theorem to compute P(D|H).
This shows how flexible the approach is: just swap out the appropriate values for p, and the rest follows through the same logic.
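As a sketch of that generalization, the helper below (a hypothetical function, not part of the original problem) computes P(D|H) for an arbitrary heads probability p of the special coin, with the other nine coins fair:

```python
from fractions import Fraction

def posterior_special_given_heads(p, prior=Fraction(1, 10)):
    """P(special coin | heads) when the special coin lands heads
    with probability p and the remaining coins are fair."""
    # Law of total probability for a single toss.
    p_heads = p * prior + Fraction(1, 2) * (1 - prior)
    # Bayes' Theorem.
    return p * prior / p_heads

# Sanity check: p = 1 recovers the original answer.
print(posterior_special_given_heads(Fraction(1)))     # 2/11
# A biased coin with p = 3/4 yields a weaker update.
print(posterior_special_given_heads(Fraction(3, 4)))  # 1/7
```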
What if there were multiple coins with different probabilities of heads?
You could again apply the law of total probability by summing across all coins with their respective probabilities of landing heads, weighted by their selection probabilities. Then, for updating P(D|H), you would still use Bayes’ Theorem, but you would expand P(H) to account for each coin’s head probability multiplied by its prior probability of selection.
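A minimal sketch of that sum-over-coins computation, assuming each coin is described by a (selection probability, heads probability) pair:

```python
from fractions import Fraction

def posteriors_given_heads(coins):
    """coins: list of (prior, p_heads) pairs with priors summing to 1.
    Returns the posterior probability of each coin given one observed head."""
    # Law of total probability across all coins.
    p_heads = sum(prior * p for prior, p in coins)
    # Bayes' Theorem applied coin by coin.
    return [prior * p / p_heads for prior, p in coins]

# The original pocket, treated as two groups:
# nine fair coins (selection prob 9/10) and one double-headed coin (1/10).
coins = [(Fraction(9, 10), Fraction(1, 2)),
         (Fraction(1, 10), Fraction(1))]
print(posteriors_given_heads(coins))  # [Fraction(9, 11), Fraction(2, 11)]
```

Note that the posteriors sum to 1, and the double-headed entry reproduces the 2/11 from part (b).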
Why do we say the probability that the coin is one of the nine ordinary coins is 1 after observing a tail?
Since a coin that shows a tail cannot be the double-headed coin, we have essentially disqualified the double-headed possibility. In probability terms, P(D|T) = 0. Consequently, P(D^c|T) = 1. This logic rests on the assumption that a tail can come only from a fair coin, which holds because the double-headed coin can never produce tails.
How do we think about repeated tosses and updated probabilities?
Bayes’ Theorem can be used iteratively. After each toss, you update the probability that you have the double-headed coin (or any other type of coin). If you keep seeing heads in many consecutive tosses, it becomes increasingly likely that the coin is special. Conversely, seeing a tail even once immediately tells you you do not have the double-headed coin.
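The iterative updating described above can be sketched as a loop that feeds each posterior back in as the next prior; the `update` function here is illustrative, not from the original text:

```python
from fractions import Fraction

def update(prior_D, outcome):
    """One Bayesian update of P(double-headed) after observing a toss.
    outcome is 'H' or 'T'; fair coins show heads with probability 1/2."""
    like_D = Fraction(1) if outcome == 'H' else Fraction(0)
    like_fair = Fraction(1, 2)
    # Evidence term: total probability of this outcome.
    evidence = like_D * prior_D + like_fair * (1 - prior_D)
    return like_D * prior_D / evidence

p = Fraction(1, 10)
for toss in ['H', 'H', 'H']:
    p = update(p, toss)
    print(p)  # 2/11, then 4/13, then 8/17

# A single 'T' drives the posterior straight to 0, matching part (c).
print(update(p, 'T'))  # 0
```

Each consecutive head nudges the posterior upward, while one tail zeroes it out permanently, exactly as the text argues.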
Could a real double-headed coin still produce a tail due to error or coin flipping anomalies?
Strictly speaking, in this idealized probability puzzle, we assume the double-headed coin cannot produce tails. In a real-world scenario, physical imperfections might introduce extremely small probabilities for unexpected results. But for standard textbook probability problems, we treat the coin sides as definitive for heads/tails outcomes.
These kinds of details show your deeper awareness of how real-world nuances can shape or slightly alter the idealized theoretical results.