ML Interview Q Series: Probability Analysis: Determining the Winner in an HH vs TH Coin Flip Game
You and a friend repeatedly flip a fair coin. The game ends the moment either HH or TH emerges in consecutive flips. If HH is the first pattern to appear, you win, and if TH appears first, your friend wins. Determine the probability that you will emerge victorious.
Short Compact Solution
One immediate way to see the solution is to note that if at any point a T is flipped, the next H leads to TH, guaranteeing your friend’s victory before you can obtain HH. Consequently, your only chance to get HH and win occurs if the first two tosses are both heads. Since each toss is fair, the probability for two heads in a row from the start is 1/4. Hence, your probability of winning is 1/4, and your friend’s probability of winning is 3/4.
Comprehensive Explanation
Probability Analysis
To understand why the probability is 1/4 that you get HH first, consider the sequence of flips from the beginning:
If the first coin is T, then as soon as you observe a head afterward, the sequence TH appears, and your friend wins. You no longer have any route to achieve HH before TH.
If the first coin is H:
If the second coin is also H, you immediately have HH, and you win right away.
If the second coin is T (so we have HT), then again, on the next head, your friend secures TH first.
Hence, the only way for you to win (HH appearing first) is if the first two flips are H followed by H. The probability of getting H on the first toss is 1/2, and H on the second toss is also 1/2. Multiplying these gives 1/4.
All other sequences lead to your friend eventually getting TH first, yielding a 3/4 probability for your friend. This reasoning can also be captured via a Markov chain or state-machine approach, but the shortcut above is much simpler for this particular game.
Markov Chain Perspective (Optional Alternative)
Although the intuitive shortcut is enough to conclude the result, one can build a Markov chain with states:
S (start)
H (just saw a single head)
T (just saw a tail)
HH (absorbing state: you win)
TH (absorbing state: friend wins)
You can analyze the transitions and solve for the probability of eventually reaching HH instead of TH. You will find that the probability of reaching HH starting from S is indeed 1/4. The key transition detail is that once you move to T, any future head creates TH immediately, making it impossible to get HH first.
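As a minimal sketch of that calculation (using the fair-coin transitions just described), each non-absorbing state contributes one linear equation for the probability of eventually reaching HH, and the system solves by simple substitution:

# Probability of reaching HH (rather than TH) from each state, fair coin.
# From T: an H ends the game as TH, a T keeps us in T, so
#   P_T = 0.5 * 0 + 0.5 * P_T  ->  P_T = 0
P_T = 0.0
# From H: an H ends the game as HH, a T moves us to T:
P_H = 0.5 * 1 + 0.5 * P_T    # = 0.5
# From S: the first flip moves to H or T with equal probability:
P_S = 0.5 * P_H + 0.5 * P_T  # = 0.25
print(P_S)  # 0.25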
Follow-Up Question 1: Why does the appearance of T at any point ruin the chance for HH?
When a T is flipped at any point before HH has appeared, your friend only needs one subsequent head to form TH, and that sequence ends the game in your friend's favor. There is no way to insert two consecutive heads before that TH completes: the tosses are sequential, so once a T has been observed, the very next H finishes TH, and it must arrive before you could ever assemble HH.
Follow-Up Question 2: How would you verify this result via simulation?
A quick simulation can be performed in Python by simulating a large number of coin flips and tracking which sequence, HH or TH, appears first. Here is an example:
import random

def simulate_game(num_simulations=10_000_000):
    wins = 0  # Count of HH wins
    for _ in range(num_simulations):
        # Keep flipping until HH or TH appears
        prev_flip = None
        while True:
            flip = random.choice(['H', 'T'])
            if prev_flip is None:
                prev_flip = flip
                continue
            # Check the pattern formed by two consecutive flips
            if prev_flip == 'H' and flip == 'H':
                # HH -> you win
                wins += 1
                break
            elif prev_flip == 'T' and flip == 'H':
                # TH -> friend wins
                break
            else:
                prev_flip = flip
    return wins / num_simulations

prob_estimate = simulate_game()
print(f"Estimated probability of winning (HH first): {prob_estimate}")
Explanation:
We repeat the experiment many times (num_simulations).
For each experiment, we flip a fair coin repeatedly until we see either HH or TH in consecutive flips.
We track how often HH appears first.
Dividing the count of HH-first outcomes by the total simulations gives the empirical probability of winning.
With a large number of simulations, the estimate should converge to about 0.25.
Follow-Up Question 3: Does changing the fairness of the coin affect the conclusion?
If the coin is biased, say with probability p of landing heads and 1−p for tails, the same first-two-flips argument still applies:
The probability of flipping two heads in a row from the start is p × p = p².
If a tail appears before HH does (that is, unless the first two flips are both heads), then the next head, which arrives with probability 1 whenever p > 0, immediately completes TH. So your probability of winning is exactly p², your friend's is 1 − p², and solving the biased chain's transition matrix confirms the same result.
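As a quick check, simulating the full biased game should converge to p²; here is a minimal sketch (the function name simulate_biased_game is illustrative):

import random

def simulate_biased_game(p, num_simulations=500_000):
    # Play the full HH-vs-TH game with P(heads) = p; estimate P(HH first).
    wins = 0
    for _ in range(num_simulations):
        prev = 'H' if random.random() < p else 'T'
        while True:
            cur = 'H' if random.random() < p else 'T'
            if cur == 'H':
                if prev == 'H':
                    wins += 1  # HH appeared first
                break          # otherwise this H completed TH
            prev = cur         # cur is 'T'; keep waiting for a head
    return wins / num_simulations

print(simulate_biased_game(0.7), 0.7 ** 2)  # both should be ~0.49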
Follow-Up Question 4: What if we extended the sequences to longer patterns?
Extending to longer patterns leads to more intricate states. For example, if you wanted the game to end on HHH or TTH, or any other combination, you would construct states that keep track of the most recent few flips. The general method is a Markov chain (equivalently, a tree of possible outcomes with absorbing states marked). A naive encoding that remembers the last k flips needs 2^k states, but pattern-matching automata such as Aho–Corasick reduce this to a number of states proportional to the total length of the patterns, and they combine naturally with Markov chain probability computations.
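As an illustrative sketch, a sliding-window simulator can estimate which of two H/T patterns appears first without writing out the chain explicitly (the helper race below is hypothetical, not a library function):

import random

def race(p1, p2, num_simulations=200_000):
    # Estimate the probability that pattern p1 appears before pattern p2
    # in a stream of fair coin flips; patterns are strings over 'H'/'T'.
    k = max(len(p1), len(p2))
    wins_p1 = 0
    for _ in range(num_simulations):
        window = ''
        while True:
            window = (window + random.choice('HT'))[-k:]
            if window.endswith(p1):
                wins_p1 += 1
                break
            if window.endswith(p2):
                break
    return wins_p1 / num_simulations

print(race('HHH', 'TTH'))  # empirical estimate of P(HHH appears first)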
Follow-Up Question 5: Are there any practical lessons or pitfalls to note if we tried to implement this logic in real experiments?
Implementations in code should be mindful of the random seed and the number of trials when estimating probabilities, to ensure stable estimates.
Floating-point arithmetic can introduce tiny deviations, but they are usually negligible in Python for probabilities of this sort.
For longer or more complex patterns, watch out for how state transitions might overlap (e.g., if your pattern ends in H, and you flip another H, it might also start a new pattern). Properly manage these overlaps in code or in your Markov chain design.
Edge cases arise if the coin is heavily biased towards heads or tails, where either pattern might appear much more frequently, and the probabilities deviate from the fair coin intuition.
All these additional considerations confirm the essential lesson: careful state management, whether in a Markov chain or in code, is crucial to get correct probabilities for pattern-related problems in coin flipping.
Below are additional follow-up questions
What is the expected number of coin flips required for the game to end, and how do you calculate it?
To find the expected number of flips until HH or TH appears, you can analyze the stochastic process step by step:
Let’s define a state machine with:
State S: The starting point (no flips yet).
State H: The most recent flip was H (but we do not yet have HH or TH).
State T: The most recent flip was T (again, no HH or TH yet).
Absorbing states HH and TH: Where the game ends.
Each non-absorbing state transitions to either H or T with probability 1/2 for a fair coin.
To calculate the expected number of steps from each state until absorption, set up equations for the expected time based on transitions:
From S:
With probability 1/2, we move to H in one flip.
With probability 1/2, we move to T in one flip.
From H:
With probability 1/2, we get another H and end at HH in one additional flip.
With probability 1/2, we flip T and move to T in one additional flip.
From T:
With probability 1/2, we flip H and end at TH in one additional flip.
With probability 1/2, we flip T and stay in T in one additional flip (though this might look repetitive, it keeps you in a state “the last flip was T,” still waiting for a concluding pattern).
Setting up the expectation equations, you solve for the average time to absorption from S. Writing E_S, E_H, and E_T for the expected number of flips remaining from each state, the transitions above give E_T = 1 + (1/2)E_T, hence E_T = 2; E_H = 1 + (1/2)E_T = 2; and E_S = 1 + (1/2)E_H + (1/2)E_T = 3. So for a fair coin, the game lasts 3 flips on average in this specific HH vs TH scenario.
A subtle point is that remaining in the T state can prolong the game if you keep flipping additional tails. Each tail keeps you in the same T state but doesn’t end the game until a head follows.
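For a fair coin, those equations solve by direct substitution, as in this small sketch:

# Expected number of flips to absorption (see the transitions above):
#   E_T = 1 + 0.5 * E_T             ->  E_T = 2
#   E_H = 1 + 0.5 * 0 + 0.5 * E_T   ->  E_H = 2
#   E_S = 1 + 0.5 * E_H + 0.5 * E_T ->  E_S = 3
E_T = 1 / (1 - 0.5)
E_H = 1 + 0.5 * E_T
E_S = 1 + 0.5 * E_H + 0.5 * E_T
print(E_S)  # 3.0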
How might correlated coin flips alter the probability of reaching HH vs TH?
In the standard problem, each flip is independent and fair. In reality or in some contrived scenario, flips might exhibit correlation:
Perhaps after getting a head, there’s a higher or lower chance of another head.
Or after a tail, there’s a different probability distribution for the next outcome.
With correlation, the reasoning “once T is seen, it’s easier for TH to form next” might change. If a tail strongly predicts another tail, then TH becomes less likely right away. This modifies all transition probabilities:
From state T, the chance of seeing an H next is not necessarily 1/2 but could be p_T(H).
From state H, the chance of seeing another H might be p_H(H).
To handle correlated flips rigorously, you’d again form a Markov chain with states S, H, T, HH, TH, but the transition probabilities between states would be determined by the correlation structure instead of 1/2. Solving for the probability of eventually reaching HH or TH first would require standard Markov chain analysis (i.e., computing absorbing probabilities via matrix operations). The key pitfall here is incorrectly assuming independence or ignoring the correlation parameters, which leads to incorrect probability estimates. If the correlation is strong, the difference between the probability of HH vs TH might shift significantly compared to 1/4 vs 3/4.
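Note that any H following a T still completes TH immediately, whatever the correlation, so HH can appear first only if the first two flips are both heads. Under a simple one-step Markov model of the flips, the answer therefore has a closed form; the parameter values below are illustrative assumptions, not part of the original problem:

# One-step Markov model: the next flip depends only on the previous one.
p_first_H = 0.5      # P(the very first flip is H), assumed here
p_H_after_H = 0.6    # P(next flip is H | previous flip was H), assumed here

# Once any T occurs, the next H completes TH, so HH can no longer appear
# first. Hence HH wins only if the first two flips are both heads:
p_hh_first = p_first_H * p_H_after_H
print(p_hh_first)    # 0.3 under these illustrative parameters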
What happens if you allow the possibility of continuing after HH or TH occurs?
In some games, you might not stop the process at the first occurrence of HH or TH but rather keep flipping and record when each pattern appears for the first time, the second time, and so on, or how frequently each occurs over a long series of flips. This changes the problem:
Instead of focusing on the immediate stopping rule (the first time HH or TH appears), you now track occurrences of these patterns in an ongoing stream.
If you were looking at the first time each pattern occurs but not stopping, you would need to handle overlapping possibilities and perhaps use advanced pattern-matching algorithms (like a finite automaton that can detect all possible pattern overlaps).
A subtle pitfall is ensuring you don’t “double count” sequences. For instance, if the sequence of flips is HHH, you might have both the substring HH in positions 1–2 and also in positions 2–3. Correctly accounting for these overlaps is crucial for accurate probability or expected-time calculations.
How does this problem change if you only look at flips in pairs?
A variant approach might be: rather than looking at a continuous sequence of coin flips, you decide to chunk them into pairs. For instance, each time you flip twice simultaneously (or in isolation from prior flips) and see if that pair is HH or TH. This changes the dynamic:
Now each pair is an independent trial.
You keep flipping pairs until you see HH or TH.
Patterns like “HT” in one pair followed by “HH” in the next are not overlapping.
In this chunked scenario, the first time you see HH in a pair, you win; the first time you see TH, your friend wins. Because each pair is an independent experiment with four equally likely outcomes (HH, HT, TH, TT), you basically check which pair appears first in the sequence of pairs. In that new framing, the probabilities shift:
The chance you get HH in a given pair is 1/4.
The chance your friend gets TH in a given pair is 1/4.
The chance neither occurs is 1/2 (HT or TT).
Because HH and TH are equally likely within each pair, the first decisive pair is equally likely to be either one, so each player now wins with probability 1/2 rather than the 1/4 vs 3/4 split of the original game. The logic is simpler in some ways but no longer matches the original problem’s consecutive-flip structure. A pitfall is conflating these two interpretations (pairs as separate trials vs. consecutive flips in one ongoing sequence).
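A short simulation of this chunked variant, as a sketch (the function name simulate_pairs_game is illustrative):

import random

def simulate_pairs_game(num_simulations=1_000_000):
    # Flip independent pairs until one is HH (you win) or TH (friend wins).
    wins = 0
    for _ in range(num_simulations):
        while True:
            pair = random.choice('HT') + random.choice('HT')
            if pair == 'HH':
                wins += 1
                break
            if pair == 'TH':
                break
    return wins / num_simulations

print(simulate_pairs_game())  # ~0.5, unlike 0.25 in the consecutive-flip game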
Could a partial observation of flips skew the probability?
In a real-world setting, you might miss some coin flips or see the results with uncertainty. For example, if you only observe every second flip or have a noisy channel that might incorrectly record H as T or vice versa:
Missing data: If you skip the outcome of certain flips, you might incorrectly deduce how soon HH or TH occurred.
Noise: If the coin is actually fair, but your measurement device occasionally mistakes a head for a tail, your effective process is not purely HH vs TH with perfect fidelity. The mismatches introduce a probability of classification error for each flip.
These factors can lead you to incorrectly calculate the probability of HH or TH first. You might incorrectly conclude that TH came first because you misread the second head as a tail. The pitfall is not accounting for the reliability of your observation mechanism or sensor when analyzing sequences.
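One way to quantify the pitfall is to compare the true winner with what a noisy observer records; the sketch below assumes each flip is independently misread with probability eps (all names and the noise model are illustrative):

import random

def winner(seq):
    # Return 'HH' or 'TH' according to the first decisive pair, else None.
    for i in range(1, len(seq)):
        if seq[i] == 'H':
            return 'HH' if seq[i - 1] == 'H' else 'TH'
    return None

def disagreement_rate(eps, num_simulations=100_000):
    # Fraction of games where the noisy observer misjudges which pattern won.
    disagree = 0
    for _ in range(num_simulations):
        true_seq, obs_seq = [], []
        # Flip until both the true and the observed games have been decided
        while winner(true_seq) is None or winner(obs_seq) is None:
            f = random.choice('HT')
            true_seq.append(f)
            obs_seq.append(f if random.random() >= eps else
                           'T' if f == 'H' else 'H')
        if winner(true_seq) != winner(obs_seq):
            disagree += 1
    return disagree / num_simulations

print(disagreement_rate(0.1))  # a nontrivial fraction of verdicts flip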
What if more than two players compete with different “target patterns” of consecutive flips?
In a more elaborate scenario, you might have multiple players each waiting for a distinct pattern of length 2. For example:
Player 1 wants HH
Player 2 wants TH
Player 3 wants HT
Player 4 wants TT
You can have a game where the moment any pattern among these four emerges, that respective player wins. Analyzing who is favored depends on how likely each pattern is to appear first. The Markov chain approach must be extended to track the possibility of reaching any one of those patterns. A tricky aspect is that some patterns might partially overlap and might appear in a single coin toss transition. For instance, if you have “H” from a previous flip, then flipping another “H” completes HH for Player 1, but flipping a “T” completes HT for Player 3. The design of the chain and analyzing the absorbing probabilities for each of the four patterns requires systematically enumerating transitions from state to state, factoring in the last flip or last two flips. The pitfall is ignoring that patterns can block or overlap each other in complicated ways.
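A generic simulator makes such races easy to explore; in the sketch below (the helper first_pattern is hypothetical), note that with all four length-2 patterns in play the game always ends on the second flip, so each player should win about 1/4 of the time:

import random
from collections import Counter

def first_pattern(patterns):
    # Flip a fair coin until one of the equal-length 'H'/'T' patterns
    # matches the most recent flips; return the winning pattern.
    k = len(patterns[0])
    history = ''
    while True:
        history += random.choice('HT')
        if len(history) >= k and history[-k:] in patterns:
            return history[-k:]

counts = Counter(first_pattern(['HH', 'TH', 'HT', 'TT'])
                 for _ in range(100_000))
print(counts)  # roughly 25% each: the first two flips always match someone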
Could you tweak the payoff structure to incorporate risk/reward trade-offs?
In some versions of coin-flip games, each player might have a different “cost” for playing or a different “payout” if they win. For instance, you could say that if you get HH first, you win $2, but your friend wins $1.50 if TH appears first. How does that change your strategy or sense of “fairness” in the game?
The probability of winning remains 1/4 for you, but the expected payout is $2 × 1/4 = $0.50 for you.
Your friend’s probability of winning is 3/4, with a payout of $1.50 × 3/4 = $1.125.
This means that in expectation, your friend has a larger expected earning.
A pitfall is failing to factor in the size of the payout alongside the probability of success. Even though your pattern is rarer, it might have a higher payoff, or vice versa. Combining probability analysis with reward structure is crucial in game-theoretic interpretations and contract/betting designs.
What if you track how many times HH appears versus how many times TH appears in a fixed number of flips?
Instead of looking at which sequence appears first, you might want to look at which sequence appears more frequently over a predetermined number of flips, say 100 flips:
In this scenario, you scan the entire sequence from flip 1 to flip 100, checking consecutive pairs.
A pair is counted as HH or TH if it appears in positions (i, i+1) for i from 1 to 99.
You then see how many times HH occurred and how many times TH occurred, and compare those counts.
Because each pair overlaps in one flip with the next pair, it’s not as straightforward as counting independent pairs. The overlap can create patterns like HHH that produce two HH occurrences, or THT that produces both TH and HT. Calculating exact probabilities requires enumerating the transitions in a more involved Markov chain or using combinatorial arguments that handle overlapping sequences carefully. A pitfall is double-counting or missing edge overlaps in the sequence.
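By linearity of expectation, each of the 99 overlapping pair positions is HH with probability 1/4 and TH with probability 1/4, so the two expected counts are equal (99/4 = 24.75) even though TH almost always appears first; the distributions still differ because HH occurrences cluster (HHH contributes two of them). A quick simulation sketch:

import random

def average_pair_counts(num_flips=100, num_trials=100_000):
    # Average counts of overlapping HH and TH pairs in fair-coin sequences.
    total_hh = total_th = 0
    for _ in range(num_trials):
        flips = [random.choice('HT') for _ in range(num_flips)]
        for a, b in zip(flips, flips[1:]):
            if a == 'H' and b == 'H':
                total_hh += 1
            elif a == 'T' and b == 'H':
                total_th += 1
    return total_hh / num_trials, total_th / num_trials

print(average_pair_counts())  # both averages approach 24.75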
What if the game continues until both HH and TH have appeared at least once?
Another variation is that the game ends only when you have seen HH at least once and TH at least once. In that setup, each player might get partial points for being the first to their pattern, but you still continue flipping until the other pattern also eventually appears.
This scenario requires you to keep track of whether you have seen HH yet and whether you have seen TH yet.
The game might end quite early if both HH and TH appear within a short span, but occasionally it could drag on, for example if TH shows up early while a long stretch without two consecutive heads delays HH.
The complexity comes in the fact that once one pattern is observed, the partial states you track must reflect that it’s already been “collected.” Only the other pattern remains to be checked.
A pitfall is conflating the event “HH was seen first” with “you eventually see TH.” They can both happen but at different times, and the probability distribution changes if the game requires that both appear. Tracking the order in which they appear and the time it takes for each pattern can be more involved than the original “first to appear” version.
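A simple simulation of this variant keeps one flag per pattern and flips until both have been collected; this is a sketch, and the function name time_until_both is illustrative:

import random

def time_until_both(num_simulations=200_000):
    # Average number of flips until HH and TH have each appeared at least once.
    total = 0
    for _ in range(num_simulations):
        seen_hh = seen_th = False
        prev = random.choice('HT')
        flips = 1
        while not (seen_hh and seen_th):
            cur = random.choice('HT')
            flips += 1
            if prev == 'H' and cur == 'H':
                seen_hh = True
            elif prev == 'T' and cur == 'H':
                seen_th = True
            prev = cur
        total += flips
    return total / num_simulations

print(time_until_both())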
What if the coin is not binary?
It may sound unusual, but suppose we have a three-sided coin or a die, and you define certain sequences for each player to try to achieve first. For instance, in a die-rolling scenario, you might be looking for “two consecutive 6s” vs. “a 4 followed by a 6.” Now the probabilities of each face are 1/6. The logic behind the competition for which pattern appears first remains the same in principle, but each pattern has different likelihoods and transition states. The pitfalls:
You might incorrectly assume that the analysis for a binary coin can be trivially applied to dice.
Overlapping sequences in multi-sided dice outcomes might be even more complex to track.
Carefully enumerating or building a Markov chain with states that reflect the partial patterns is important to avoid mistakes.
No matter how many sides the coin/die has, the concept is that you analyze states that track the relevant recent history that might complete your desired pattern. Once the pattern is complete, that state is absorbing for the respective player.
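As a sketch for the die example above ("two consecutive 6s" vs "a 4 followed by a 6"), direct simulation works; a hand calculation of the corresponding chain suggests P(66 first) = 5/12, so the estimate should land near 0.4167 (the function name die_race is illustrative):

import random

def die_race(num_simulations=200_000):
    # Race on a fair die: 'two consecutive 6s' vs 'a 4 followed by a 6'.
    wins_66 = 0
    for _ in range(num_simulations):
        prev = random.randint(1, 6)
        while True:
            cur = random.randint(1, 6)
            if prev == 6 and cur == 6:
                wins_66 += 1
                break
            if prev == 4 and cur == 6:
                break
            prev = cur
    return wins_66 / num_simulations

print(die_race())  # ~0.4167 (= 5/12); the 4-then-6 player is favored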
Are there any cryptographic or security applications of such analysis?
Interestingly, yes. Patterns of bits (heads = 1, tails = 0) appear in streams of random data. In cryptographic protocols, sometimes you need to prove that certain patterns of bits are unlikely to appear. Studying how quickly a pattern might emerge can inform:
How secure random number generators are against adversaries searching for a specific bit pattern.
The expected time or probability that a certain “attack pattern” might happen in a stream of outputs.
A pitfall is to treat all bits as isolated random events if an adversary can manipulate or influence the stream (e.g., they might bias certain flips in subtle ways). Also, real cryptographic PRNGs may not be purely random in a classical sense; they rely on deterministic algorithms seeded with randomness. Understanding their output distribution is trickier than analyzing a fair, memoryless coin toss.
These additional questions underscore how the simple idea of searching for first occurrences of short patterns in coin flips can expand into a vast range of topics including advanced Markov chain analysis, overlapping pattern issues, biased or correlated sequences, partial observation, and even cryptographic considerations.