ML Interview Q Series: Generating Fair Results from a Biased Coin Using Symmetric Flip Pairs
You have an unfair coin with an unknown tendency for heads or tails. Explain how you could create an unbiased result (50-50 chance) using repeated flips of this coin.
Short, compact solution
Suppose the probability of landing heads is denoted by (P(H)), and the probability of landing tails is (P(T) = 1 - P(H)). A single coin flip cannot yield a fair (50-50) result unless (P(H)) happens to be exactly 0.5, which we cannot assume when it is unknown. However, flipping the coin twice produces four outcomes: HH, HT, TH, TT. Note that the probabilities of HT and TH are both (P(H)\times P(T)). Therefore, we can ignore the cases where both flips match (HH or TT) and use only the pairs that differ: whenever we see HT, we treat it as a “fair heads” outcome; if we see TH, we treat it as a “fair tails” outcome. If HH or TT appears, we discard the pair and flip again until HT or TH shows up.
Comprehensive Explanation
One of the most fundamental ways to transform a biased coin toss into a fair result is based on pairing flips and exploiting the symmetry in certain combined outcomes. Below is an in-depth discussion:
Key Insight
When the coin is flipped once, the probability of heads might be (p) and tails might be (1-p). Because (p) can be any unknown number between 0 and 1 (excluding trivial cases where (p=0) or (p=1)), one flip cannot guarantee an unbiased output.
By considering two consecutive flips, we produce four possible outcomes: HH, HT, TH, TT. The core observation is that the chance of HT is (p \times (1 - p)) and the chance of TH is ((1 - p) \times p). These two values are equal.
Step-by-Step Process
Flip the coin two times in succession.
If the two results are different (HT or TH), we have a fair way to decide between two equally probable events. We can map:
HT → fair heads
TH → fair tails
If the two results are the same (HH or TT), we discard this pair and flip two more times. We keep repeating until we get either HT or TH.
Practical Implementation
A simple Python function to generate a fair coin result from an unfair coin might be written as follows:
import random

# Example unfair coin that returns 'H' with probability p
def unfair_flip(p):
    return 'H' if random.random() < p else 'T'

# Function that yields a fair result using the unfair coin.
# `unfair_flip_func` must be a zero-argument callable, e.g. lambda: unfair_flip(0.7)
def fair_coin(unfair_flip_func):
    while True:
        flip1 = unfair_flip_func()
        flip2 = unfair_flip_func()
        if flip1 == 'H' and flip2 == 'T':
            return 'H'  # "fair heads"
        elif flip1 == 'T' and flip2 == 'H':
            return 'T'  # "fair tails"
        # If the flips are HH or TT, we discard the pair and continue

# Example usage:
# fair_result = fair_coin(lambda: unfair_flip(0.7))
In this snippet, any time we see HT or TH, we return a balanced result. If we see HH or TT, we do not return an answer but repeat the flipping process.
Expected Number of Flips
Although the process might require an arbitrarily long sequence of flips (because HH or TT can appear many times in a row), in expectation the algorithm terminates quickly. Each pair of flips yields HT or TH with probability (2p(1-p)), so the number of pairs needed is geometrically distributed, and the expected number of flips to generate one fair outcome is (2 / (2p(1-p)) = 1/(p(1-p))), which is finite whenever (0 < p < 1).
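A quick simulation can confirm the expected flip count empirically. This is a sketch; the helper name `expected_flips_simulated` is ours, and the biased coin is simulated with a seeded RNG for reproducibility:

```python
import random

def expected_flips_simulated(p, trials=20000, seed=0):
    """Empirically estimate the flips needed per fair bit from a coin with P(H)=p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while True:
            a = rng.random() < p
            b = rng.random() < p
            total += 2
            if a != b:  # HT or TH: one fair bit produced
                break
    return total / trials

# Theory: a pair differs with probability 2*p*(1-p), so the expected
# number of flips per fair bit is 2 / (2*p*(1-p)) = 1 / (p*(1-p)).
```

For example, with `p = 0.7` the simulated average should land near (1/(0.7 \times 0.3) \approx 4.76) flips per fair bit.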
Edge Cases
If (p=0) or (p=1), the coin always lands the same way, making it impossible ever to observe HT or TH. In reality, such an edge case means the procedure never terminates. Mathematically, we often assume (0 < p < 1).
In real practice, it is improbable (though not impossible) to keep getting HH or TT for many iterations in a row; a differing pair of HT or TH almost surely arises within a modest number of flips.
How would you generalize this to produce random bits for other distributions?
To generalize, you can build on the fair bit approach to sample multiple bits and combine them to represent a larger range of outcomes. For instance, if you need to pick an integer from 1 through (N) uniformly, you could repeatedly sample fair bits (0 or 1) until you have enough bits to cover the range of numbers in binary form. You would discard any bit pattern that exceeds your desired range and resample as needed, ensuring all valid outcomes remain equiprobable.
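The steps above can be sketched as follows. The helper names `fair_bit` and `uniform_int` are illustrative, and the biased coin is simulated with a fixed `p` purely for demonstration:

```python
import random

def fair_bit(p=0.7, rng=random.Random(0)):
    """One unbiased bit via the two-flip trick on a simulated p-biased coin.
    (The shared default rng keeps this example deterministic.)"""
    while True:
        a, b = rng.random() < p, rng.random() < p
        if a != b:
            return int(a)  # HT -> 1, TH -> 0, each with probability p*(1-p)

def uniform_int(n, bit_source=fair_bit):
    """Uniform integer in 1..n built from fair bits via rejection sampling."""
    k = (n - 1).bit_length()        # bits needed to cover 0..n-1 in binary
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | bit_source()
        if x < n:                   # discard bit patterns outside the range
            return x + 1
```

For example, `uniform_int(6)` simulates a fair six-sided die from the biased coin; any 3-bit pattern of 6 or 7 is discarded and resampled, keeping the six valid outcomes equiprobable.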
What if we want to minimize the expected number of coin flips?
Although flipping in pairs until we see HT or TH is the classic solution, researchers have investigated “optimal” or “near-optimal” algorithms for generating unbiased bits from a biased coin. Such approaches can reduce the expected number of flips per fair bit, but they require more complex sequential logic. The original pairing method, due to John von Neumann (hence “von Neumann extractor”), remains the simplest.
What happens if we keep observing HH or TT indefinitely?
Mathematically, there is always a nonzero probability (albeit extremely small) of not seeing HT or TH within any fixed number of tries. However, as long as (0 < p < 1), the probability of discarding (n) pairs in a row is ((1 - 2p(1-p))^n), which decays geometrically to zero. Practically, you will almost certainly see a differing pair within a feasible number of flips.
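As a quick numeric check of how fast this decays, take an illustrative bias of (p = 0.9) (a single pair is discarded with probability (1 - 2p(1-p) = 0.82)):

```python
# Probability that the first n pairs are all discarded (HH or TT)
p = 0.9
q = 1 - 2 * p * (1 - p)   # chance a single pair is HH or TT: 0.82
for n in (10, 50, 100):
    print(f"{n} discarded pairs in a row: probability {q ** n:.2e}")
```

Even for this heavily biased coin, 100 consecutive discarded pairs is about a one-in-a-billion event.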
Could we extend this idea for multi-sided “unfair dice”?
Yes. If you have a biased (k)-sided die, you can still generate fair outcomes using similar pairing or grouping techniques and ignoring cases that do not yield desired symmetrical probabilities. This typically involves more complicated mappings but follows the same principle of discarding any outcomes that do not share equal probability.
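One concrete mapping for a biased die uses the same symmetry: for any two distinct faces (a) and (b), the probability of rolling (a) then (b) equals the probability of rolling (b) then (a), so comparing two rolls (and discarding ties) yields a fair bit. A minimal sketch, with an illustrative `biased_die` helper:

```python
import random

def biased_die(probs, rng):
    """Roll a biased die with face probabilities `probs` (faces 0..k-1)."""
    r, acc = rng.random(), 0.0
    for face, p in enumerate(probs):
        acc += p
        if r < acc:
            return face
    return len(probs) - 1  # guard against floating-point round-off

def fair_bit_from_die(probs, rng=random.Random(1)):
    """By symmetry, P(a then b) = P(b then a) for distinct faces a, b,
    so comparing two rolls and discarding ties gives an unbiased bit."""
    while True:
        a, b = biased_die(probs, rng), biased_die(probs, rng)
        if a != b:
            return int(a < b)
```

Ties play the role of HH/TT here: they are discarded, which lowers efficiency but preserves exact fairness.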
How does this algorithm relate to real interview or production scenarios?
In certain cryptographic or simulation contexts, the approach is used to clean biased sources of randomness. It also forms a basic example of how to apply probability theory to develop fair procedures from imperfect sampling tools. While this might not be common in day-to-day machine learning pipelines, it is an elegant demonstration of leveraging fundamental probability properties in practice.
Below are additional follow-up questions
What if the coin is not memoryless? Could that break our approach?
If the coin flips are not independent—for instance, if the outcome of a flip depends in some way on previous flips—then the fundamental assumption behind the two-flip fairness method no longer holds. The method relies on the fact that the probability of getting HT is the product of the probabilities of H on the first flip and T on the second, and similarly for TH. If those two events are no longer guaranteed to have the same probability (because the second flip might be influenced by the first flip’s outcome), then HT and TH would not necessarily occur with identical probabilities.
In practice, if we suspect non-independence, we should perform statistical tests on sample flips to measure correlation or dependence. If strong dependence is detected, we might need to modify the coin or choose a different randomness source that guarantees independence. Alternatively, advanced correction methods (like Markov chain-based algorithms) could be employed, but the simplest solution is to ensure we truly have independent flips before applying the two-flip technique.
How does the expected number of flips scale with the coin’s bias?
If the coin’s probability of landing heads is extremely close to 1 or extremely close to 0, then the outcomes HH and TT will happen much more frequently than HT or TH. Since we discard HH and TT, we end up repeating the experiment potentially many times before obtaining an HT or TH pair. This increases the expected number of total flips: since a pair differs with probability (2p(1-p)), the expected number of flips per fair bit is (1/(p(1-p))), which grows without bound as (p) approaches 0 or 1.
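To see how steeply the cost grows with bias, a tiny computation of the expected flips per fair bit (derived from the pair-success probability (2p(1-p))):

```python
# Expected flips per fair bit is 1 / (p * (1 - p)): a pair of flips
# differs with probability 2*p*(1-p), and each attempt costs 2 flips.
for p in (0.5, 0.9, 0.99):
    print(f"p = {p}: about {1 / (p * (1 - p)):.1f} flips per fair bit")
```

A fair coin costs 4 flips per bit on average, while a coin with (p = 0.99) costs roughly 101.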
Can we extend this procedure to generate multiple fair outcomes in fewer flips?
The standard approach is to discard invalid pairs (HH or TT) and use valid pairs (HT or TH) to generate a single fair bit. However, more advanced techniques exist to extract multiple fair bits from sequences of biased coin flips without wasting as many flips. One such approach is to use algorithms based on entropy extraction, such as the von Neumann extractor extended to longer sequences or more complex combinational logic. These methods look for patterns of heads and tails that can be mapped onto one or more fair bits, but they also require more complicated bookkeeping to handle partial bits of entropy from each series of flips.
In a quick two-flip scenario, it is simplest just to do the discard method. But if you want multiple fair bits in fewer flips, you can gather a large string of outcomes and apply specialized randomness extraction techniques. This reduces the fraction of wasted flips but makes the procedure significantly more intricate to implement.
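A compact sketch of Peres’ iterated extractor illustrates the idea (this recursive formulation is one common presentation, not code from the source): it emits the usual von Neumann bits, then recursively mines both the XOR stream of each pair and the values of the discarded equal pairs for additional fair bits.

```python
def peres_extract(bits):
    """Peres' iterated von Neumann extractor (sketch): extracts unbiased
    bits from i.i.d. biased bits, recycling the discarded information."""
    if len(bits) < 2:
        return []
    out, xors, sames = [], [], []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        xors.append(a ^ b)      # derived sequence 1: XOR of each pair
        if a != b:
            out.append(a)       # classic von Neumann bit: (1,0)->1, (0,1)->0
        else:
            sames.append(a)     # derived sequence 2: value of each equal pair
    return out + peres_extract(xors) + peres_extract(sames)
```

Each recursion level squeezes more fair bits out of flips that plain von Neumann would discard, approaching the entropy of the source, at the cost of the extra bookkeeping described above.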
Are there any issues if we have a limited number of coin flips available?
If there is a strict cap on the number of flips, repeatedly discarding pairs of flips (HH or TT) might cause you to run out of flips before producing any fair bits. In scenarios where you must operate within a tight budget of flips, the probability that you fail to produce a fair bit might be substantial if the coin is extremely biased. You could implement a fallback mechanism that approximates fairness if you near the flip limit without having yet produced a valid pair, but that starts to deviate from strict fairness.
In practice, if your bias p is highly skewed and your number of flips is limited, your chance of finishing with zero fair results rises. You might need to either relax your fairness requirement or find a different method of generating randomness, such as using an external source of truly random bits.
How would we handle a scenario where the coin can land on its edge occasionally?
In real-world coin tosses, sometimes a coin might land on its edge or get stuck somewhere so it does not produce a standard heads or tails result. Even if this occurs with a very small probability, it complicates the original assumption that each flip yields a head or tail. One simple solution is to define any edge landing as a “discard and re-flip” scenario, just like HH or TT. Essentially, you treat it as an unusable outcome and keep flipping until you see one of the recognized patterns (HT or TH).
If edge landings happen more frequently, it might reduce efficiency because you are discarding even more flips. There might be approaches to incorporate a third outcome into a more general fairness extraction strategy (e.g., generating random trits), but that goes beyond the simple two-outcome scenario. If you do not carefully handle edge landings, you will introduce additional bias, since ignoring that event (or mislabeling it as H or T) can skew your distribution away from the original assumption.
Can we adapt the method if we only have partial knowledge of the coin’s bias?
Even if you do not know the exact bias p, the von Neumann two-flip method does not require you to measure it. As long as the flips are independent, the probabilities of HT and TH are both p(1−p), so their ratio remains 1:1 no matter the value of p. Hence, not knowing p does not impede the standard method of discarding HH or TT; you need not know p at all.
Problems arise, however, if your partial knowledge suggests that flips may not be truly i.i.d. For instance, if p might vary from flip to flip or if there is some pattern in heads vs. tails over time, then partial knowledge of p could imply the presence of dependence. You would need additional steps (for example, using subsequences of flips that appear uncorrelated) or more advanced randomness extraction methods to ensure fairness. But the core two-flip technique itself remains valid so long as each flip is identically and independently biased with probability p—irrespective of the numerical value of p.
What if the bias changes slowly over time?
If the bias itself evolves—perhaps due to physical changes in the coin or the flipping mechanism—the two-flip method might still produce fair outcomes, but only under certain assumptions. If the change is slow enough that consecutive flips remain nearly the same bias (and are still i.i.d. at that short timescale), then HT and TH can still be assumed equally likely within that small window of time. For example, if p is 0.6 now and might be 0.60001 later, the difference over two consecutive flips might be negligible.
However, if p drifts substantially within just two flips or the coin’s bias shifts unpredictably, independence and identical distribution might fail. The probability of heads in the first flip could differ significantly from that in the second flip. In that case, the probability of HT and TH would no longer match, invalidating the method. One practical workaround is to sample the coin in small bursts, attempting to ensure that no large environmental or physical change occurs within those bursts. Testing the coin’s outputs over short intervals to confirm stability can help, but once again, if the coin is truly changing in real time in a non-negligible way, you would need a more robust approach to guarantee fairness.