ML Interview Q Series: Recursive Dynamic Programming for Winning Probability in Interrupted Games.
Question: Mr. Fermat and Mr. Pascal are playing a game of chance in a café in Paris. The first to win a total of ten games is the overall winner. Each of the two players has the same probability 1/2 to win any given game. The competition is interrupted at a point when Fermat has a wins and Pascal has b wins (with a < 10 and b < 10). What is the probability that Fermat would have been the overall winner if the competition had continued? Show how this problem can be solved by a recursive approach. (Hint: consider that (10 - a) + (10 - b) - 1 additional games would have been played.)
Compact solution
Define P(r, s) as the probability that Fermat eventually wins if the current scores are r for Fermat and s for Pascal. We can set up the recursion:
P(r, s) = (1/2) * P(r+1, s) + (1/2) * P(r, s+1),
with boundary conditions P(10, s) = 1 for s < 10 and P(r, 10) = 0 for r < 10.
By working backward (for example, via dynamic programming), one finds the required probability as P(a, b).
Comprehensive Explanation
The Recursive Setup
We let r be the number of games Fermat has currently won, and s be the number of games Pascal has currently won. Because each game is won by either Fermat or Pascal with equal probability 1/2, the event of Fermat eventually winning from state (r, s) can be thought of as:
With probability 1/2, Fermat wins the next game. In that case, the new state is (r+1, s).
With probability 1/2, Pascal wins the next game. The new state is then (r, s+1).
Hence, we set up the recursion:
P(r, s) = (1/2) * P(r+1, s) + (1/2) * P(r, s+1),
where P(r, s) is the probability that Fermat will ultimately reach 10 total wins before Pascal does, given Fermat currently has r wins and Pascal has s wins.
Boundary Conditions
We know that once either player reaches 10 wins, the contest is over. Therefore:
If r = 10 and s < 10, Fermat has just won the series, so P(10, s) = 1.
If s = 10 and r < 10, Pascal has just won the series, so P(r, 10) = 0.
These two conditions fully characterize the boundary values in this recursive or dynamic programming approach.
Dynamic Programming Approach
To solve for P(a, b), one can fill out a 2D table of dimension 11 x 11, representing all possible states (r, s) for 0 ≤ r ≤ 10 and 0 ≤ s ≤ 10. We typically start from the known boundary states:
P(10, s) = 1 for all s < 10
P(r, 10) = 0 for all r < 10
Then, for r = 9 down to 0 and s = 9 down to 0, we compute P(r, s) using the recursion formula. By the time we reach (a, b), the entry P(a, b) in the table is the probability that Fermat ultimately wins starting from r = a and s = b.
A pseudocode sketch:
def fermat_win_probability(a, b):
    # Create a 2D array to store probabilities, indexed by r, s
    P = [[0.0 for _ in range(11)] for _ in range(11)]
    # Fill boundary conditions
    for s in range(10):
        P[10][s] = 1.0
    for r in range(10):
        P[r][10] = 0.0
    # Fill table in backward manner
    for r in reversed(range(10)):      # 9 down to 0
        for s in reversed(range(10)):  # 9 down to 0
            P[r][s] = 0.5 * P[r+1][s] + 0.5 * P[r][s+1]
    # The answer is P[a][b]
    return P[a][b]

# Example usage:
prob = fermat_win_probability(3, 5)
print(prob)  # Probability that Fermat, with 3 wins against Pascal's 5, will reach 10 first
Alternative Closed-Form (Combinatorial) Approach
While the problem specifically requests a recursive solution, it is useful to know the direct combinatorial formula. Because each player has probability 1/2 of winning each game, at most n = (10 - a) + (10 - b) - 1 further games decide the series, and Fermat is the overall winner exactly when he wins at least (10 - a) of those n games:
P(a, b) = (1/2)^n * sum over k from (10 - a) to n of C(n, k).
This is the binomial-sum approach often seen in "Gambler's Ruin"-type problems. However, since the question explicitly requires a recursive formulation, we stick with the P(r, s) recursion and dynamic programming (or direct backward recursion) to arrive at the answer.
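The two approaches can be checked against each other. Below is a small sketch (the function names are ours, not from the original statement) that evaluates the binomial sum with math.comb and compares it with the backward DP:

```python
from math import comb

def fermat_win_prob_binomial(a, b, target=10):
    # Fermat needs m more wins, Pascal needs n more; at most m + n - 1
    # further games decide the series. Fermat wins the series iff he
    # wins at least m of those m + n - 1 hypothetical games.
    m, n = target - a, target - b
    total = m + n - 1
    return sum(comb(total, k) for k in range(m, total + 1)) / 2 ** total

def fermat_win_prob_dp(a, b, target=10):
    # Backward dynamic programming over all states (r, s).
    P = [[0.0] * (target + 1) for _ in range(target + 1)]
    for s in range(target):
        P[target][s] = 1.0
    for r in reversed(range(target)):
        for s in reversed(range(target)):
            P[r][s] = 0.5 * P[r + 1][s] + 0.5 * P[r][s + 1]
    return P[a][b]
```

For a = 3, b = 5, both functions return 281/1024, roughly 0.2744.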
Potential Follow-Up Questions
Could we solve this problem directly without recursion?
Yes, a purely combinatorial approach uses the idea that the series ends exactly when one player hits 10 wins. For instance, with Fermat at a and Pascal at b, one can think of the next ( (10 - a) + (10 - b) - 1 ) games. We sum over all ways Fermat can reach the 10th win first, factoring in that each game is won with probability 1/2. The binomial-coefficient expression emerges by counting all sequences that allow Fermat to get (10 - a) wins before Pascal gets (10 - b) wins.
In practice, the recursive or dynamic programming approach is often simpler to implement in code, since the combinatorial expression (especially with more general probabilities p and 1 - p) can become less intuitive to remember. But both methods are valid and will yield the same result.
What if the probabilities were not 1/2 for each player?
The recursion generalizes easily. If Fermat’s probability of winning any individual game is p (and Pascal’s is 1 - p), we write:
P(r, s) = p * P(r+1, s) + (1 - p) * P(r, s+1),
and keep the same boundary conditions:
P(10, s) = 1 for s < 10,
P(r, 10) = 0 for r < 10.
Then a similar dynamic programming approach would work, or one can use a more general binomial-style argument.
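A minimal sketch of that generalized DP, with p as a parameter (the function name is illustrative):

```python
def win_probability(a, b, p, target=10):
    # P(r, s) = p * P(r+1, s) + (1 - p) * P(r, s+1), with the same
    # boundary conditions as the fair case: P(target, s) = 1, P(r, target) = 0.
    P = [[0.0] * (target + 1) for _ in range(target + 1)]
    for s in range(target):
        P[target][s] = 1.0
    for r in reversed(range(target)):
        for s in reversed(range(target)):
            P[r][s] = p * P[r + 1][s] + (1 - p) * P[r][s + 1]
    return P[a][b]
```

Setting p = 0.5 recovers the fair-game answer, e.g. win_probability(3, 5, 0.5) gives 281/1024.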
Why does the “Hint” mention (10 - a) + (10 - b) - 1?
This number corresponds to the maximum number of additional games that might need to be played before one player reaches 10. More precisely, if Fermat needs (10 - a) more wins and Pascal needs (10 - b) more wins, then the total number of games that can still occur before someone reaches 10 is at most ( (10 - a) + (10 - b) - 1 ), with the final game awarding the 10th win to one of the two players. It is a handy count for framing the combinatorial approach.
How can numerical precision issues appear in real implementations?
In a dynamic programming solution, if the number of steps becomes large, floating-point imprecision might accumulate in languages or frameworks without arbitrary-precision arithmetic. In Python, this is typically not a concern for a small 11 x 11 table. However, if the logic were extended to a game up to 100 or 1000 wins, we should be mindful of summations of small probabilities and might consider the decimal module or other techniques for higher precision.
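For the fair-coin case there is an even simpler option than decimal: Python’s fractions.Fraction keeps every table entry as an exact rational, so the DP (sketched below with an illustrative name) is immune to rounding regardless of the target:

```python
from fractions import Fraction

def exact_win_probability(a, b, target=10):
    # Same backward DP, but in exact rational arithmetic: no
    # floating-point error can accumulate, even for large targets.
    half = Fraction(1, 2)
    P = [[Fraction(0)] * (target + 1) for _ in range(target + 1)]
    for s in range(target):
        P[target][s] = Fraction(1)
    for r in reversed(range(target)):
        for s in reversed(range(target)):
            P[r][s] = half * P[r + 1][s] + half * P[r][s + 1]
    return P[a][b]
```

For example, exact_win_probability(3, 5) returns Fraction(281, 1024) exactly.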
How would you extend this to more players or different stopping criteria?
You can generalize the same idea to multiple players by defining a function P(n1, n2, ..., nk) to represent the probability that a specific player (say player 1) ends up winning given current scores n1, n2, ..., nk. You then define a recursion based on the probability that each player might win the next game, and you keep boundary conditions where if any player’s total reaches the desired number of wins, that player is declared the winner. The state space grows with the number of players, but the principle remains the same.
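As a sketch of this multi-player generalization (the function name and the uniform default probabilities are our assumptions), a memoized recursion over score tuples:

```python
from functools import lru_cache

def first_player_win_prob(scores, target, probs=None):
    # Probability that player 0 reaches `target` wins first, given the
    # current score tuple. `probs[i]` is player i's chance of winning
    # any single game (defaults to a fair game among k players).
    k = len(scores)
    if probs is None:
        probs = [1.0 / k] * k

    @lru_cache(maxsize=None)
    def P(state):
        # Boundary: if any player has reached the target, the series is over.
        for i, wins in enumerate(state):
            if wins >= target:
                return 1.0 if i == 0 else 0.0
        # Otherwise condition on who wins the next game.
        total = 0.0
        for i in range(k):
            nxt = list(state)
            nxt[i] += 1
            total += probs[i] * P(tuple(nxt))
        return total

    return P(tuple(scores))
```

With two players it reproduces the earlier answers, and with three fair players at 0-0-0 it returns 1/3 by symmetry.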
All these extensions underscore the broad applicability of the recursive viewpoint.
Below are additional follow-up questions
How does the recursion change if we allow ties or repeated plays for a single game?
In some variations of games, you might have a small probability that a single game ends in a tie or requires a replay (for instance, in certain card games, if both players are tied at the end of a round, they repeat the round). This complicates the recursion because there’s a new outcome to each game besides “Fermat wins” or “Pascal wins.”
Recursion Modification: Let p, q, and t be the probabilities of Fermat winning a game, Pascal winning it, or a tie forcing a replay (with p + q + t = 1; writing the tie probability as t avoids clashing with the score variable r). Then you would write:
P(r, s) = p * P(r+1, s) + q * P(r, s+1) + t * P(r, s)
But notice the tie outcome leads back to (r, s), introducing a non-trivial fixpoint condition because the state is unchanged after a tie.
Resolving the Tie Probability: You can rearrange that equation to solve for P(r, s) in terms of P(r+1, s) and P(r, s+1):
P(r, s) = [p * P(r+1, s) + q * P(r, s+1)] / (1 - t)
provided t < 1, which eliminates the self-referential loop. The boundary conditions remain the same: if either player reaches 10, the probability is locked at 1 if it’s Fermat or 0 if it’s Pascal.
Pitfall: If the tie probability t is very close to 1, almost every game ends in a tie, leading to practical computational concerns or near-endless replays in simulation. The 1/(1 - t) factor must be handled with numerical care.
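The renormalization has a concrete interpretation: dividing by one minus the tie probability conditions on the game being decided, so each decided game goes to Fermat with probability p / (p + q) and the standard DP applies with that effective per-game probability. A sketch (tie probability written as t = 1 - p - q to avoid clashing with the score variable r; the function name is illustrative):

```python
def win_prob_with_ties(a, b, p, q, target=10):
    # p: Fermat wins a game, q: Pascal wins, t = 1 - p - q: replayed tie.
    # Conditioning on a decided game, Fermat's effective per-game win
    # probability is p / (p + q); then the ordinary backward DP applies.
    p_eff = p / (p + q)
    P = [[0.0] * (target + 1) for _ in range(target + 1)]
    for s in range(target):
        P[target][s] = 1.0
    for r in reversed(range(target)):
        for s in reversed(range(target)):
            P[r][s] = p_eff * P[r + 1][s] + (1 - p_eff) * P[r][s + 1]
    return P[a][b]
```

For symmetric p = q the ties cancel out entirely and the fair-game answer is recovered.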
How would you handle a scenario where the score is known to be incorrect or partially uncertain?
Sometimes, the exact values of a and b might be uncertain due to record-keeping errors. For instance, you only know that Fermat has somewhere between a_min and a_max wins so far.
Approach: One might define a probability distribution over the possible scores and compute a weighted average of P(r, s) for all (r, s) that satisfy the constraints. Symbolically, if you have a set of possible states (r_i, s_i) each with some probability alpha_i, you compute:
Weighted Probability = sum over i [ alpha_i * P(r_i, s_i) ].
Edge Case: If your distribution over (r, s) is wide, the final outcome probability might be very different from what it would be if you assume a single pair (a, b). You need to ensure that the alpha_i sum to 1, reflecting a valid probability distribution.
Challenge: If the range for a and b is large, the number of possible states can expand significantly, but you could still use dynamic programming for each possible (r_i, s_i) and aggregate.
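The weighted average above can be sketched directly (the function name and the dict-based interface are our assumptions):

```python
def expected_win_prob(state_probs, target=10):
    # state_probs: dict mapping possible (r, s) states to their
    # probabilities alpha_i, which must sum to 1. Returns the
    # alpha-weighted average of P(r, s) over those states.
    assert abs(sum(state_probs.values()) - 1.0) < 1e-9
    # Build the full fair-game DP table once, then aggregate.
    P = [[0.0] * (target + 1) for _ in range(target + 1)]
    for s in range(target):
        P[target][s] = 1.0
    for r in reversed(range(target)):
        for s in reversed(range(target)):
            P[r][s] = 0.5 * P[r + 1][s] + 0.5 * P[r][s + 1]
    return sum(alpha * P[r][s] for (r, s), alpha in state_probs.items())
```

Because the whole table is built once, adding more candidate states costs almost nothing.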
How does one incorporate the possibility that some rounds were already played without being recorded?
If you suspect that some number of games have been played and the exact outcome is unknown, you could theoretically be starting from a partial or unknown state (r, s). This is similar to the uncertainty scenario, but with the added complexity that the total unknown games might exceed a small margin.
Method: You can treat the unknown segment as a set of possible final scores (r', s') that might have occurred. Each potential final state (r', s') is assigned a probability. Then for each state you do the recursive calculation from that state forward, effectively turning the unrecorded portion into a partial probability distribution over states. Finally, you average over all possible states weighted by their likelihood.
Pitfall: The combinatorial explosion of possibilities can be large if many games are unrecorded. You must be mindful of computational cost and might opt for approximate methods if the unrecorded portion is very large.
How do we adapt the solution if the game was set to a different total, like first to 15 or first to 100?
The recursion itself is unchanged in structure, only the boundary conditions differ. If the target number of wins is T instead of 10, the boundary conditions become:
P(T, s) = 1 for all s < T,
P(r, T) = 0 for all r < T.
You still use
P(r, s) = p * P(r+1, s) + (1 - p) * P(r, s+1)
for each step, but now r and s each go up to T instead of 10. The logic remains exactly the same, just with a bigger table if T is large.
Pitfall for Large T: If T is very large (like 1000), the naive 2D dynamic programming table with dimension (T+1) x (T+1) becomes large. Memory usage might not be feasible if T is in the tens or hundreds of thousands, so you may need memory-efficient ways (e.g., storing only two rows of the table at a time) or a closed-form approximation.
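The row-at-a-time idea can be sketched as follows (an illustrative implementation, not from the original text): since P(r, s) depends only on row r+1 and on P(r, s+1) within the current row, one row of storage suffices.

```python
def win_prob_low_memory(a, b, target, p=0.5):
    # Keeps only one row of the DP table at a time: `above` holds
    # P(r+1, s) for all s while row r is being filled, so memory is
    # O(target) instead of O(target^2).
    above = [1.0] * target + [0.0]      # row r = target: P(target, s)
    for r in reversed(range(target)):
        row = [0.0] * (target + 1)      # row[target] = P(r, target) = 0
        for s in reversed(range(target)):
            row[s] = p * above[s] + (1 - p) * row[s + 1]
        above = row
        if r == a:
            break                        # rows below a are never needed
    return above[b]
```

This matches the full-table answer, e.g. for a = 3, b = 5, target = 10 it returns 281/1024, and it handles a fair first-to-100 series without difficulty.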
What if we need not just the probability that Fermat eventually wins, but the expected number of additional games until the winner is determined?
You can define a second function E(r, s) to represent the expected number of additional games needed for the contest to end from state (r, s). Then you can write a similar recursion:
E(r, s) = 1 + p * E(r+1, s) + (1 - p) * E(r, s+1),
since on average you play one more game, and then transition to either (r+1, s) or (r, s+1). The boundary conditions become E(T, s) = 0 if s < T and E(r, T) = 0 if r < T, because no additional games are needed once a player reaches T.
Subtlety: The expectation can grow quite large if p is close to 0.5, since it becomes likely that many games might be played before T is reached. Also, if T is large, E(r, s) can be a big number, and floating-point precision might matter.
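A sketch of the E(r, s) table (the function name is illustrative), filled exactly like the probability table but with the extra "+1" per game:

```python
def expected_remaining_games(a, b, p, target=10):
    # E(r, s) = 1 + p * E(r+1, s) + (1 - p) * E(r, s+1), with
    # E(target, s) = E(r, target) = 0: once someone reaches the
    # target, no further games are played.
    E = [[0.0] * (target + 1) for _ in range(target + 1)]
    for r in reversed(range(target)):
        for s in reversed(range(target)):
            E[r][s] = 1.0 + p * E[r + 1][s] + (1 - p) * E[r][s + 1]
    return E[a][b]
```

As a sanity check, from (9, 9) exactly one more game is always played, so E(9, 9) = 1.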
In a practical betting scenario, how do you decide payouts if the game is interrupted?
If players put bets on who would eventually win, and you interrupt at a vs. b, you could use the probability P(a, b) as the fair fraction of the pot for Fermat, and 1 - P(a, b) for Pascal.
Pitfall: Sometimes people prefer to see the payout that matches the “expected number of eventual wins.” But in a winner-takes-all setting, it is exactly the probability P(a, b) that matters. Confusions arise if participants expect partial refunds or only a fraction. The fair settlement is to multiply the total pot by P(a, b) (for Fermat) and by (1 - P(a, b)) (for Pascal).
Edge Case: If one player is extremely close to 10 compared to the other, P(a, b) might be near 0 or near 1, which can lead to disputes if the leading player believes they have a sure win. But mathematically, the fair price is still set by the probability that the trailing player can catch up.
How do you efficiently simulate the probabilities for many different pairs (a, b)?
If you want to compute P(a, b) for multiple pairs (a, b), you can compute a single dynamic programming table once and then retrieve results for each pair in O(1) time:
Build a table DP[r][s] for r, s in [0, T].
Use boundary conditions P(T, s)=1, P(r, T)=0.
Fill from T-1 down to 0.
Once the table is built, if you need the probability for (a, b), you just look up DP[a][b].
Pitfall: If T is extremely large (like thousands), building one large 2D table might become computationally expensive. One might consider a more memory-friendly approach (only storing partial arrays) or a closed-form solution if it’s feasible.
What if the game continues but at some point, new conditions or handicaps are introduced?
If the rules change mid-series (e.g., Pascal can skip a turn, or Fermat now has a higher probability p of winning each future game), you need to partition the problem:
For states up to the point of rule change, use the old recursion.
After the rule change, use the updated recursion or probability.
If the rule change is triggered once a certain state is reached, you would have different recurrences for states before that trigger vs. after that trigger.
Challenge: You might end up with a piecewise definition of the dynamic programming table. Each region in (r, s) space could correspond to a different set of rules. In practice, you compute the region-based tables with consistent boundary conditions where they overlap.
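As a concrete (hypothetical) example of such a piecewise table: suppose Fermat's per-game win probability switches from p_before to p_after once his score reaches some trigger value. The same backward sweep works; each state simply uses the probability in force there. The function name, parameters, and trigger rule below are all illustrative assumptions:

```python
def win_prob_with_handicap(a, b, p_before, p_after, trigger_r, target=10):
    # Hypothetical rule change: once Fermat's score r reaches
    # `trigger_r`, his per-game win probability switches from
    # p_before to p_after. Each row of the DP uses whichever
    # probability applies under the rules in force at that state.
    P = [[0.0] * (target + 1) for _ in range(target + 1)]
    for s in range(target):
        P[target][s] = 1.0
    for r in reversed(range(target)):
        p = p_after if r >= trigger_r else p_before
        for s in reversed(range(target)):
            P[r][s] = p * P[r + 1][s] + (1 - p) * P[r][s + 1]
    return P[a][b]
```

When p_before equals p_after, the rule change is vacuous and the ordinary answer is recovered.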
How do we handle correlated outcomes between games?
A fair assumption in the original statement is that each game is independent with probability 1/2. If in reality the outcome of each game is correlated (e.g., if a player on a “winning streak” has a higher chance of winning the next game), the Markov property of states (r, s) alone is no longer sufficient.
Method: You might expand the state to include the “streak” or some memory term capturing correlation. For instance, you can define a more general state representation (r, s, streak) and then define transitions. That quickly increases the state space.
Pitfall: Correlations can dramatically increase the complexity of the recursion or dynamic programming table. You might need approximate methods such as Monte Carlo simulations that directly model the correlation and then estimate P(a, b) from repeated simulations.
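A minimal sketch of the expanded-state idea, under a hypothetical streak model of our own choosing (Fermat's win probability shifts by a fixed bonus depending on the previous game's outcome):

```python
from functools import lru_cache

def win_prob_with_streaks(a, b, base_p, bonus, target=10):
    # Hypothetical correlation model: if Fermat won the previous game,
    # his chance of winning the next one is base_p + bonus (clamped to
    # [0, 1]); if he lost it, base_p - bonus. Extending the state to
    # (r, s, fermat_won_last) restores the Markov property.

    @lru_cache(maxsize=None)
    def P(r, s, won_last):
        if r == target:
            return 1.0
        if s == target:
            return 0.0
        p = min(1.0, max(0.0, base_p + (bonus if won_last else -bonus)))
        return p * P(r + 1, s, True) + (1 - p) * P(r, s + 1, False)

    # The previous outcome at (a, b) is unknown; average over it,
    # weighted here (an assumption) by the base win probability.
    return base_p * P(a, b, True) + (1 - base_p) * P(a, b, False)
```

With bonus = 0 the games decouple and the independent-game answer is recovered.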
How do we ensure the recursion or DP solution is efficiently parallelizable?
For large T or many states, it might be desirable to parallelize the calculation of P(r, s).
Parallel DP: Because P(r, s) depends on P(r+1, s) and P(r, s+1), you can fill out the table by diagonals or by blocks. Multiple threads could handle different diagonal slices, ensuring dependencies are respected (each diagonal depends only on the next diagonal).
Memory Bandwidth: The primary constraint might be memory bandwidth if the table is large. You might store only what you need in local caches, but be careful with synchronization.
Edge Case: For smaller T, parallelization might not offer a big benefit because the overhead of threading can exceed the computational cost.
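The diagonal dependency structure can be demonstrated even without threads. The sketch below (sequential, names illustrative) fills the table along anti-diagonals of constant r + s; every cell on diagonal d depends only on diagonal d + 1, so in a parallel implementation all cells of one diagonal could be computed concurrently:

```python
def win_prob_by_diagonals(a, b, target=10):
    # Fill along anti-diagonals d = r + s, highest first. P(r, s)
    # depends on P(r+1, s) and P(r, s+1), both on diagonal d + 1,
    # so the cells within one diagonal are mutually independent.
    P = [[0.0] * (target + 1) for _ in range(target + 1)]
    for s in range(target):
        P[target][s] = 1.0
    for d in reversed(range(2 * target - 1)):
        for r in range(max(0, d - target + 1), min(d, target - 1) + 1):
            s = d - r
            P[r][s] = 0.5 * P[r + 1][s] + 0.5 * P[r][s + 1]
    return P[a][b]
```

It produces exactly the same table as the row-by-row sweep, confirming that the diagonal schedule respects all dependencies.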