ML Interview Q Series: Generating Unbiased Bits from a Biased Coin Using Flip Pairing
You have a coin that doesn’t land heads or tails with equal probability. How would you create a procedure, using only flips of that biased coin, to produce uniformly distributed 0s and 1s?
Comprehensive Explanation
A fundamental approach to this problem is commonly known as the von Neumann extractor, which converts a biased source into an unbiased one. The overall idea is straightforward:
You flip the biased coin in pairs:
If the two consecutive flips come up heads–tails, you output 1.
If they come up tails–heads, you output 0.
If they come up heads–heads or tails–tails, you discard that pair and flip again.
The rationale behind this is that the probability of obtaining heads–tails is p * (1 - p), and tails–heads is (1 - p) * p. Both of these events share the same probability of occurring, so conditional on observing either heads–tails or tails–heads, each occurs with probability exactly 1/2. By ignoring pairs of outcomes that are the same (heads–heads or tails–tails), we neutralize the effect of the bias.
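As a concrete check, take p = 0.8: P(heads–tails) = 0.8 × 0.2 = 0.16 and P(tails–heads) = 0.2 × 0.8 = 0.16, so conditional on keeping a pair, P(output 1) = 0.16 / (0.16 + 0.16) = 1/2, and the same cancellation happens for any p.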
The cost of this procedure is that some pairs are thrown away, which can reduce efficiency. However, this discarding ensures the final bits we collect are genuinely unbiased.
Expected Number of Coin Tosses
The probability that a pair comes up either heads–tails or tails–heads is 2 p (1 - p). Hence, on average, we get one useful bit for every 1 / [2 p (1 - p)] pairs of flips. Each pair requires 2 coin tosses, so the expected number of tosses to get one unbiased bit is 1 / [p (1 - p)]. In more explicit form:
E[T] = 2 × 1 / [2 p (1 - p)] = 1 / [p (1 - p)]
where E[T] is the expected number of biased coin tosses required to produce a single unbiased bit, and p is the probability of landing heads for the biased coin.
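For instance, p = 0.5 gives E[T] = 1 / 0.25 = 4 tosses per unbiased bit, p = 0.8 gives 1 / 0.16 = 6.25, and p = 0.99 gives roughly 1 / 0.0099 ≈ 101.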
When p is very close to 0 or 1, efficiency suffers because most pairs come up heads–heads or tails–tails. Even so, the extracted bits remain exactly unbiased in all cases.
Possible Implementation in Python
import random

def biased_coin_flip(p):
    # Simulate a biased coin that returns 1 (heads) with probability p
    return 1 if random.random() < p else 0

def generate_unbiased_bit(p):
    """
    Uses two biased coin flips to generate one unbiased bit.
    Discards the outcome if both flips are the same.
    """
    while True:
        flip1 = biased_coin_flip(p)
        flip2 = biased_coin_flip(p)
        if flip1 == 0 and flip2 == 1:
            return 0  # Tails -> Heads
        elif flip1 == 1 and flip2 == 0:
            return 1  # Heads -> Tails
        # If flip1 == flip2 (HH or TT), we discard and try again.

# Example: Generating a sequence of 10 unbiased bits.
unbiased_sequence = [generate_unbiased_bit(0.8) for _ in range(10)]
print(unbiased_sequence)
In practice, if p is extremely skewed, the process might waste a large number of coin flips before yielding an unbiased bit.
What Happens If We Need More Than One Bit?
The procedure is repeated as many times as needed. Each unbiased bit is derived from separate pairs (or multiple discards until a valid pair is found). Generating multiple bits means executing the same logic repeatedly. This process remains correct no matter how many bits are produced, as long as each bit is derived from an independent and valid pair of flips.
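A minimal sketch of this repetition, assuming the generate_unbiased_bit function from the snippet above is in scope:

def generate_unbiased_bits(p, n):
    # Assumes generate_unbiased_bit(p) from the earlier implementation.
    # Each bit comes from its own independent run of paired flips.
    return [generate_unbiased_bit(p) for _ in range(n)]

# Example: 16 unbiased bits from the same biased coin.
print(generate_unbiased_bits(0.7, 16))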
Are There More Efficient Algorithms?
Yes, there are more advanced algorithms (such as those using multiple flips at a time) that can yield higher efficiency, but they can be more complex to implement and analyze. The von Neumann approach is conceptually simple and guarantees correctness but may not be the most efficient for certain values of p.
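One such refinement is commonly attributed to Peres: rather than throwing discarded pairs away entirely, it recursively reuses the XOR of every pair and the common value of each equal pair, both of which still carry entropy. The following is a rough sketch of that idea under the assumption of independent flips, not a faithful reproduction of the published algorithm:

def peres_extract(bits):
    # bits: list of 0/1 flips from the biased coin.
    if len(bits) < 2:
        return []
    vn_output = []   # classic von Neumann bits from unequal pairs
    xor_seq = []     # XOR of every pair, reused recursively
    same_vals = []   # common value of each equal pair, reused recursively
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        xor_seq.append(a ^ b)
        if a != b:
            vn_output.append(a)  # heads-tails -> 1, tails-heads -> 0
        else:
            same_vals.append(a)
    return vn_output + peres_extract(xor_seq) + peres_extract(same_vals)

# Example: compare the yield against plain pairing on the same flips.
import random
flips = [1 if random.random() < 0.8 else 0 for _ in range(10_000)]
print(len(peres_extract(flips)), "bits extracted from", len(flips), "flips")

On the same input this typically yields noticeably more bits than plain pairing, at the cost of a recursive pass over the data.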
How Does This Generalize to Larger Alphabets?
If you have a biased random source of symbols from a larger set, you can apply a similar principle: pair up draws, discard the pair if both symbols are the same, and map the two possible orderings of differing symbols to 0 and 1. The complexity increases with more possible symbols, but the core idea of pairing and discarding identical outcomes while mapping differing ones to 0 or 1 remains valid.
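A sketch of one such mapping, assuming the draws are independent and identically distributed: by symmetry the first draw is as likely to be smaller than the second as larger, so comparing the two gives a fair bit whenever they differ. The biased_symbol helper and its weights are illustrative placeholders for whatever source you actually have:

import random

def biased_symbol(weights):
    # Draw a symbol 0..k-1 from an arbitrarily biased distribution.
    return random.choices(range(len(weights)), weights=weights)[0]

def unbiased_bit_from_symbols(weights):
    # Pair up draws; "first < second" and "first > second" are equally likely,
    # so the comparison is a fair bit. Equal draws are discarded.
    while True:
        a, b = biased_symbol(weights), biased_symbol(weights)
        if a < b:
            return 0
        if a > b:
            return 1

# Example with a heavily skewed 4-symbol source.
print([unbiased_bit_from_symbols([0.7, 0.2, 0.05, 0.05]) for _ in range(10)])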
What If We Wanted to Use This in Real-Time?
If results are produced in a stream, you would still collect outcomes in pairs. Whenever a pair is identical, you discard and wait for the next pair. While this leads to potentially unpredictable waiting times, every valid pair continues to yield an unbiased bit.
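A generator-based sketch of this streaming view; flip_stream is a hypothetical iterator standing in for whatever live source feeds you raw flips:

import itertools, random

def unbiased_stream(flip_stream):
    # flip_stream: any iterator yielding 0/1 flips from the biased coin.
    it = iter(flip_stream)
    for first in it:
        second = next(it, None)
        if second is None:
            return  # stream ended mid-pair
        if first != second:
            yield first  # heads-tails -> 1, tails-heads -> 0

# Example: attach it to an endless biased source and take 5 bits as they arrive.
raw = (1 if random.random() < 0.8 else 0 for _ in itertools.count())
print(list(itertools.islice(unbiased_stream(raw), 5)))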
Could Rounding Errors Affect The Unbiasedness?
No. Once you decide to keep only those pairs that differ (heads–tails or tails–heads), the probability of heads–tails is p * (1 - p) and the probability of tails–heads is (1 - p) * p. Given that both have identical probability, conditioned on these two scenarios, the resulting bit is guaranteed to be 0 or 1 with probability 1/2 each. Thus, no rounding or floating-point approximation can affect the fundamental correctness that one event is as likely as the other.
How Could We Check That the Output is Unbiased?
One practical way is to perform hypothesis testing on the generated sequence to see if the bits appear uniform. You could use a statistical test (such as a chi-square test) on the frequency of 0s and 1s. More sophisticated randomness tests (like the NIST test suite) can further confirm that the generated bits behave according to uniform distribution properties.
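As an illustration, a minimal chi-square balance check might look like the sketch below (it reuses generate_unbiased_bit from the earlier snippet; 3.841 is the 5% critical value for one degree of freedom):

def chi_square_balance_test(bits):
    # Compare observed counts of 0s and 1s against a 50/50 expectation.
    n = len(bits)
    ones = sum(bits)
    zeros = n - ones
    expected = n / 2
    chi_sq = (ones - expected) ** 2 / expected + (zeros - expected) ** 2 / expected
    return chi_sq, chi_sq < 3.841  # True -> no evidence of imbalance at the 5% level

# Assumes generate_unbiased_bit(p) from the earlier implementation is in scope.
bits = [generate_unbiased_bit(0.8) for _ in range(10_000)]
print(chi_square_balance_test(bits))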
What if p is Unknown?
Interestingly, the von Neumann extractor does not require knowing the actual bias probability. You simply apply the rule to pair flips, discard matching pairs, and use the differing pairs for unbiased bits. The procedure is self-correcting, making it robust even if p changes over time or remains unknown.
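A quick empirical illustration of this robustness, again assuming generate_unbiased_bit from the earlier snippet: the hidden bias is varied and the output balance stays near 0.5 without the extractor ever being told p:

for hidden_p in [0.1, 0.3, 0.6, 0.95]:
    bits = [generate_unbiased_bit(hidden_p) for _ in range(20_000)]
    print(f"hidden p = {hidden_p}: fraction of 1s = {sum(bits) / len(bits):.3f}")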
Below are additional follow-up questions
What if the coin tosses are not truly independent?
Even though the classical von Neumann extractor requires that each coin toss be independent, in real scenarios a biased coin could exhibit minor dependencies across flips. If the correlation between consecutive flips is small, the procedure often still approximately holds because:
When we pair flips, a mild correlation might slightly shift the probability of heads–tails vs. tails–heads, but in many practical cases this shift is negligible.
Strong correlation, however, can break the assumption that heads–tails occurs with the same probability as tails–heads. In such a situation, the extracted bits might not be perfectly unbiased. In practical terms, one could empirically measure correlation or run additional randomness tests. If correlations are discovered to be significant, more advanced de-biasing methods (potentially requiring blocks of flips rather than pairs) might be needed.
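As an illustrative experiment (not from the original discussion), consider a "sticky" coin whose next flip repeats the previous one with high probability. For this particular source each output bit is still individually near-fair, but consecutive output bits tend to alternate far more often than chance, so the output is no longer an independent uniform stream:

import random

def sticky_coin_flips(stay_prob, n):
    # A correlated "coin": each flip repeats the previous one with probability stay_prob.
    flips, cur = [], random.randint(0, 1)
    for _ in range(n):
        if random.random() >= stay_prob:
            cur = 1 - cur
        flips.append(cur)
    return flips

flips = sticky_coin_flips(0.9, 200_000)
# Apply the pairwise rule to non-overlapping pairs of the correlated stream.
out = [flips[i] for i in range(0, len(flips) - 1, 2) if flips[i] != flips[i + 1]]
ones_fraction = sum(out) / len(out)
repeat_fraction = sum(a == b for a, b in zip(out, out[1:])) / (len(out) - 1)
print(f"fraction of 1s: {ones_fraction:.3f} (still near 0.5 for this source)")
print(f"consecutive output bits equal: {repeat_fraction:.3f} (well below 0.5 -> dependent)")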
How do we handle extremely high or extremely low bias values (for example, p = 0.9999)?
When p is very close to 1 or 0, the approach can yield very few usable pairs:
In the extreme, almost every pair will be heads–heads or tails–tails.
The probability of heads–tails or tails–heads drops significantly, so we may wait a large number of flips before obtaining a valid pair.
If we require a large volume of unbiased bits under heavy bias, the von Neumann algorithm may become impractical. One potential workaround is to gather many flips at once and apply more sophisticated data compression or advanced extractors that leverage the known bias distribution. However, even then, the fundamental premise of needing to discard a high fraction of pairs persists, so performance might still be poor.
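Concretely, with p = 0.9999 a valid pair appears with probability 2 × 0.9999 × 0.0001 ≈ 0.0002, so the expected cost is 1 / (0.9999 × 0.0001) ≈ 10,001 biased tosses per unbiased bit.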
Could we parallelize the von Neumann approach?
Yes, if you can flip many biased coins (or one coin rapidly) in parallel, you can distribute the procedure across multiple processes:
Each process pairs its own flips independently of the others.
Whenever a process observes heads–tails or tails–heads, it outputs an unbiased bit.
All the outputs from the processes can be combined into a global pool of unbiased bits.
Potential pitfalls:
Ensuring that each process’s coin tosses do not overlap in the random stream (e.g., seed management in simulations or using physically separate coin tosses in the real world).
Any correlation or shared state across parallel processes can reintroduce bias. So, isolation of parallel streams is crucial.
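A toy sketch of this parallel setup, using independently seeded generators to stand in for physically separate coins (the actual thread or process plumbing is omitted):

import random

def worker_bits(seed, p, n):
    # Each worker gets its own isolated PRNG, mimicking a physically separate coin.
    rng = random.Random(seed)
    bits = []
    while len(bits) < n:
        a = 1 if rng.random() < p else 0
        b = 1 if rng.random() < p else 0
        if a != b:
            bits.append(a)  # heads-tails -> 1, tails-heads -> 0
    return bits

# Combine the outputs of several independent "workers" into one pool.
pool = []
for seed in [11, 22, 33, 44]:
    pool.extend(worker_bits(seed, p=0.8, n=25))
print(len(pool), "bits, fraction of 1s =", sum(pool) / len(pool))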
Is there a theoretical limit on how many unbiased bits can be generated?
Yes, there is a fundamental information-theoretic limit. Each biased coin flip provides at most a certain amount of entropy. If p is known, the Shannon entropy per flip can be computed in bits with the formula -[p log2(p) + (1-p) log2(1-p)]. The von Neumann scheme does not always extract the maximum possible entropy but guarantees correctness and simplicity. More advanced extractors can come closer to the theoretical maximum but often at the cost of greater algorithmic complexity.
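The gap is easy to quantify: von Neumann pairing produces on average 2 p (1 - p) bits per pair, i.e. p (1 - p) bits per flip, versus the entropy bound above. A short illustrative comparison:

import math

def entropy_bits_per_flip(p):
    # Shannon entropy of one biased flip: -[p log2(p) + (1-p) log2(1-p)]
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in [0.5, 0.8, 0.95]:
    vn_yield = p * (1 - p)  # expected unbiased bits per biased flip with von Neumann
    print(f"p = {p}: entropy limit = {entropy_bits_per_flip(p):.3f} bits/flip, "
          f"von Neumann yield = {vn_yield:.3f} bits/flip")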
What if the bias can change dynamically over time?
The classic von Neumann approach:
Does not need to know p, nor does it need p to stay constant across pairs. It only relies on the fact that, for a given pair, the probability of heads–tails is p(t) (1 - p(t)) and the probability of tails–heads is (1 - p(t)) p(t) at the moment of flipping.
If the coin’s bias drifts slowly or changes abruptly between pairs, pairs that are the same (heads–heads or tails–tails) are still discarded, and pairs that differ are still equally likely at the moment of flipping.
Potential edge case:
If the coin flips are correlated in time or the bias changes so wildly that certain sequences become more or less probable in a short window, the pairing logic may need deeper statistical verification. However, typically, the same argument that (heads–tails) and (tails–heads) have equal probabilities in an instantaneous sense continues to hold as long as the two flips remain identically distributed but not correlated.
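A small illustrative check of this robustness under drift: below, the bias wanders over time but is held fixed within each pair (matching the caveat that the two flips of a pair be identically distributed), and the output balance stays near 0.5:

import math, random

bits = []
for k in range(200_000):
    p_t = 0.5 + 0.45 * math.sin(k / 500.0)  # slowly drifting bias, roughly in (0.05, 0.95)
    a = 1 if random.random() < p_t else 0
    b = 1 if random.random() < p_t else 0   # same bias for both flips of this pair
    if a != b:
        bits.append(a)
print(f"fraction of 1s under a drifting bias: {sum(bits) / len(bits):.3f}")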
How might we validate that the output is truly unbiased in practice?
One practical strategy is to run the final bitstream through statistical randomness tests:
A basic test is to check if the proportion of 0s vs. 1s is close to 50%.
More advanced tests include frequency tests within blocks, run-length tests, autocorrelation tests, and others in the NIST suite.
If the coin is extremely biased or if dependencies exist in flips, these tests could reveal anomalies in the output distribution. In that case, further scrutiny of the flipping process or adopting more advanced correction techniques may be necessary.
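As an example, the frequency (monobit) test from the NIST suite boils down to a short calculation; a sketch, applied to bits from the earlier generate_unbiased_bit snippet:

import math

def monobit_p_value(bits):
    # NIST-style frequency test: map bits to +/-1, sum, and normalize.
    s = sum(2 * b - 1 for b in bits)
    s_obs = abs(s) / math.sqrt(len(bits))
    return math.erfc(s_obs / math.sqrt(2))  # a very small p-value signals imbalance

# Assumes generate_unbiased_bit(p) from the earlier implementation is in scope.
bits = [generate_unbiased_bit(0.9) for _ in range(100_000)]
print(f"monobit p-value: {monobit_p_value(bits):.3f}")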
Could hardware imperfections break this algorithm?
If you’re using a physical process for biased coin flips—like a sensor or a physical noise source—there can be hardware-specific anomalies:
Sensor saturation might cause an apparent “heads” nearly every time.
Voltage spikes could produce correlated runs.
When implementing the von Neumann approach, you must ensure that each flip is adequately separated in time or measurement so that genuine independence is preserved. Persistent hardware issues leading to correlated or repeating outcomes can undermine the assumption that the pairs (heads–heads or tails–tails) are fairly detected. Careful calibration, monitoring, and device-level testing can mitigate these pitfalls.
If storage or bandwidth is limited, how do we handle the discard step?
The discard step implies that sometimes pairs of flips yield no output bit. Potential storage/bandwidth considerations include:
When flips are being read in a continuous stream, discarding just means not appending any output for certain pairs. We don’t need to store all flips if we can process them on the fly.
If we must store flips before processing (for example, in a large batch), discarding might still be memory intensive if p is near 0 or 1 because many pairs might be useless.
One practical method is streaming: we fetch the latest two flips, check if they differ, and only produce a bit if they do. This avoids large-scale storage since pairs are processed in a rolling window.
How do we combine multiple bits once they are generated?
In typical usage, each valid pair yields a single unbiased bit. These bits can be concatenated into a binary sequence or assembled into bytes for practical purposes (e.g., generating random IDs, keys, or seeds). Concatenating the bits does not introduce bias as long as each bit is independently derived from the von Neumann procedure. Ensuring there is no subtle correlation between pairs is crucial; provided the underlying flips are independent, the procedure itself guarantees independence among the bits produced from distinct valid pairs.
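A minimal sketch of packing extracted bits into bytes, assuming generate_unbiased_bit from the earlier snippet is in scope:

def random_bytes_from_coin(p, num_bytes):
    # Assumes generate_unbiased_bit(p) from the earlier implementation.
    # Pack each group of 8 unbiased bits into one byte, most significant bit first.
    out = bytearray()
    for _ in range(num_bytes):
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | generate_unbiased_bit(p)
        out.append(byte)
    return bytes(out)

print(random_bytes_from_coin(0.8, 4).hex())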
What if the time to generate bits is random and we need a steady rate?
Since the algorithm discards some pairs, the generation rate is not constant. If a system requires a steady bit rate, one approach is to buffer the generated bits:
Continue flipping and storing the unbiased bits in a buffer until they are consumed downstream.
If consumption exceeds the average production, you might deplete the buffer and fail to meet real-time demand.
In scenarios demanding consistent throughput (e.g., real-time cryptographic key generation), additional strategies—like using partial random bits in combination with pseudo-random generators—are often adopted. The unbiased bits can feed the seed of a pseudo-random generator, which then provides a steady-rate output, albeit with cryptographic strength heavily reliant on the extracted random seed.
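A minimal sketch of that last idea, assuming generate_unbiased_bit from the earlier snippet is in scope: collect a batch of unbiased bits, fold them into an integer seed, and let a pseudo-random generator supply the steady-rate output. Note that Python's random module is not cryptographically secure; this only illustrates the buffering and seeding pattern:

import random

def seed_from_coin(p, n_bits=64):
    # Assumes generate_unbiased_bit(p) from the earlier implementation.
    # Fold n_bits extracted bits into a single integer seed.
    seed = 0
    for _ in range(n_bits):
        seed = (seed << 1) | generate_unbiased_bit(p)
    return seed

prng = random.Random(seed_from_coin(0.8))
steady_output = [prng.getrandbits(8) for _ in range(10)]  # constant-rate consumer side
print(steady_output)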