ML Interview Q Series: Winning Probability and Game Length Analysis Using Geometric Distributions
Question
You and your friend both draw a random number from {1,2,...,10} at the same time, independently of each other. This process is repeated until you have drawn one of the four numbers {1,2,3,4} or your friend has drawn one of the six numbers {5,6,...,10}. The first player to get one of his marked numbers is the winner, with the convention that you win if both of you get your marked numbers in the same draw.
a) What is your probability of winning the game?
b) What is the probability mass function of the length of the game?
Short Compact Solution
Define X1 as the number of draws required for you to draw from {1,2,3,4} (which occurs with probability p1=4/10 each draw) and X2 as the number of draws required for your friend to draw from {5,...,10} (which occurs with probability p2=6/10 each draw). Both X1 and X2 follow geometric distributions (with parameters p1 and p2 respectively).
Your probability of winning is P(X1 <= X2). Exploiting the fact that X1 and X2 are independent geometric random variables and that in the event X1 = X2 you still win, it can be shown that
P(X1 <= X2) = p1 / (p1 + p2 - p1 p2).
Plugging in p1 = 0.4 and p2 = 0.6 gives 0.4 / 0.76 ≈ 0.5263.
For the length of the game, define L = min(X1, X2). One can show that L itself is a geometric random variable with parameter p = p1 + p2 - p1 p2 = 0.76. Hence,
P(L = l) = p (1 - p)^(l - 1) = 0.76 * 0.24^(l - 1) for l = 1, 2, ...
Comprehensive Explanation
Understanding the winning probability:
At each draw, you succeed in drawing one of your four numbers with probability p1=4/10, and your friend succeeds in drawing one of his six numbers with probability p2=6/10. Let X1 be the (random) draw on which you get your first success, and X2 be the (random) draw on which your friend gets his first success. Because the draws are independent for you and your friend:
• X1 ~ Geometric(p1). This means P(X1 = j) = p1 (1 - p1)^(j-1).
• X2 ~ Geometric(p2). This means P(X2 = j) = p2 (1 - p2)^(j-1).
You win if X1 < X2 or if X1 = X2 (tie). Thus,
P(X1 <= X2) = sum over j from 1 to infinity of P(X1 = j and X2 >= j).
For each draw number j, P(X1 = j) = p1 (1 - p1)^(j-1). Independently, the event X2 >= j means your friend fails on each of the first j-1 draws, so P(X2 >= j) = (1 - p2)^(j-1). Hence,
P(X1 <= X2) = sum_{j=1 to infinity} [ p1 (1 - p1)^(j-1) * (1 - p2)^(j-1) ] = p1 * sum_{j=1 to infinity} [ ( (1 - p1)(1 - p2) )^(j-1) ].
Because sum_{j=1 to infinity} r^(j-1) = 1 / (1 - r) for |r|<1, let r = (1 - p1)(1 - p2). Then:
P(X1 <= X2) = p1 / [1 - (1 - p1)(1 - p2)].
Observe that:
1 - (1 - p1)(1 - p2) = 1 - (1 - p1 - p2 + p1 p2) = p1 + p2 - p1 p2.
Therefore:
P(X1 <= X2) = p1 / (p1 + p2 - p1 p2).
When p1 = 0.4 and p2 = 0.6, this evaluates to 0.4 / 0.76 ≈ 0.5263, which is slightly more than 1/2. That 0.5263 > 0.5 might seem counterintuitive at first because p2 > p1, but your advantage is explained by the fact that a tie (i.e., X1 = X2) is awarded to you.
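A quick numerical check of this closed form against a truncated version of the series above (a minimal sketch; the variable names are illustrative):

p1, p2 = 0.4, 0.6

# Closed form: P(X1 <= X2) = p1 / (p1 + p2 - p1*p2)
closed_form = p1 / (p1 + p2 - p1 * p2)

# Truncated series: sum over j of p1 * ((1 - p1) * (1 - p2))**(j - 1)
series = sum(p1 * ((1 - p1) * (1 - p2)) ** (j - 1) for j in range(1, 200))

print(closed_form)  # ~0.5263
print(series)       # agrees with the closed form to many decimal places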
Probability mass function of the length of the game:
The game ends at the draw number L = min(X1, X2). We want P(L = l). We note:
P(L > l) = P(X1 > l and X2 > l) = (1 - p1)^l (1 - p2)^l = ( (1 - p1)(1 - p2) )^l.
Denote p = p1 + p2 - p1 p2 for simplicity. Then 1 - p = (1 - p1)(1 - p2). Hence,
P(L > l) = (1 - p)^l.
Therefore,
P(L = l) = P(L > l - 1) - P(L > l) = (1 - p)^(l - 1) - (1 - p)^l = p (1 - p)^(l - 1).
Thus L is geometric with parameter p = p1 + p2 - p1 p2 = 0.76. Consequently, the probability that the game ends on the l-th draw is P(L = l) = 0.76 * 0.24^(l-1).
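As a sanity check, one can tabulate this PMF and verify that it sums to 1 and that its mean matches 1/p (a minimal sketch with illustrative names):

p = 0.76

pmf = [p * (1 - p) ** (l - 1) for l in range(1, 100)]

print(sum(pmf))                                    # ~1.0
print(sum(l * pmf[l - 1] for l in range(1, 100)))  # ~1/p ≈ 1.3158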
Practical insight and implementation idea:
If one wants to confirm these theoretical results with a simulation, one could run a large number of trials programmatically, count the proportion of times you win, and empirically estimate the distribution of the length of the game. For instance, in Python:
import random

def simulate(num_trials=10_000_000):
    p1 = 0.4  # Probability you draw from {1,2,3,4}
    p2 = 0.6  # Probability friend draws from {5,...,10}
    win_count = 0
    lengths = []
    for _ in range(num_trials):
        length = 0
        while True:
            length += 1
            # Draw outcomes for this round
            you_succeed = (random.random() < p1)
            friend_succeeds = (random.random() < p2)
            if you_succeed and friend_succeeds:
                # Tie => you win
                win_count += 1
                lengths.append(length)
                break
            elif you_succeed:
                win_count += 1
                lengths.append(length)
                break
            elif friend_succeeds:
                lengths.append(length)
                break
            # otherwise neither player succeeded; continue to the next draw
    empirical_win_prob = win_count / num_trials
    return empirical_win_prob, lengths

empirical_pwin, length_samples = simulate()
print("Empirical Probability of Winning:", empirical_pwin)
# One can also plot or analyze the distribution of length_samples
Over many trials, this simulation should confirm that you win with probability around 0.5263, and that the length distribution follows a geometric shape with parameter around 0.76.
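An alternative, and much faster, approach is to sample X1 and X2 directly as geometric random variables with NumPy and compare them (a sketch assuming NumPy is available; variable names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000

# NumPy's geometric sampler returns the draw number of the first success (support 1, 2, ...)
x1 = rng.geometric(0.4, size=n)  # your first success
x2 = rng.geometric(0.6, size=n)  # your friend's first success

print((x1 <= x2).mean())          # ≈ 0.5263, your winning probability
print(np.minimum(x1, x2).mean())  # ≈ 1/0.76 ≈ 1.3158, expected game length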
Follow-up Questions
1) How would the probability of winning change if the tie were awarded to your friend instead?
In that scenario, you win only if X1 < X2 strictly, meaning that if you both succeed on the same draw, your friend wins. Then we want P(X1 < X2) = P(X1 <= X2) - P(X1 = X2). The difference is exactly the probability that X1 = X2. Since X1 and X2 are independent geometric random variables, one can work out:
P(X1 = X2) = sum_{j=1 to infinity} p1 (1 - p1)^(j-1) * p2 (1 - p2)^(j-1) = p1 p2 / (p1 + p2 - p1 p2).
Combined with the original expression for P(X1 <= X2), this shows that awarding the tie to your friend simply subtracts the tie probability from your total. With p1 = 0.4 and p2 = 0.6, P(X1 = X2) = 0.24 / 0.76 ≈ 0.3158, so your winning probability drops from 0.5263 to P(X1 < X2) ≈ 0.2105, well below 0.5 (and this happens in general whenever p1 < p2).
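A small computation of both tie-breaking conventions (a sketch; names are illustrative):

p1, p2 = 0.4, 0.6
denom = p1 + p2 - p1 * p2

p_tie_to_you = p1 / denom                # P(X1 <= X2) ≈ 0.5263
p_tie = p1 * p2 / denom                  # P(X1 = X2)  ≈ 0.3158
p_tie_to_friend = p_tie_to_you - p_tie   # P(X1 < X2)  ≈ 0.2105

print(p_tie_to_you, p_tie, p_tie_to_friend)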
2) What if the numbers {1,...,4} and {5,...,10} were changed? How do we generalize?
If you had k1 winning numbers out of n total, then p1 = k1/n. If your friend had k2 winning numbers out of n total, then p2 = k2/n, assuming no overlap between the sets. By the same reasoning, your probability of winning (with the tie in your favor) is:
P(win) = p1 / (p1 + p2 - p1 p2), with p1 = k1/n and p2 = k2/n.
The parameter that governs the geometric distribution of the length is still p = p1 + p2 - p1 p2. Thus the results generalize in a straightforward way.
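A small helper expressing this generalization (a sketch; the function name is illustrative):

def win_probability(k1, k2, n):
    """P(you win, ties in your favor) when you hold k1 of n numbers and your friend holds k2 (disjoint sets)."""
    p1, p2 = k1 / n, k2 / n
    return p1 / (p1 + p2 - p1 * p2)

print(win_probability(4, 6, 10))  # ≈ 0.5263, the original game
print(win_probability(2, 2, 10))  # symmetric sets: the tie rule pushes this above 0.5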
3) Is the expected length of the game also easy to compute?
Yes. For a geometric random variable L with parameter p, the expectation E[L] is 1/p. Here p = p1 + p2 - p1 p2. Numerically in our example, p=0.76, so E[L] = 1/0.76 ≈ 1.3158 draws on average for the game to end.
4) What about real-world applications of this setup?
This scenario appears in many “race to success” processes, such as network reliability (where p1 might be the chance of a system reporting success), or in certain forms of repeated bidding or repeated exploration tasks. One must always consider carefully who wins if both succeed simultaneously, because that tie-break rule can have a noticeable effect on the overall probabilities.
Below are additional follow-up questions
1) What if p1 + p2 < 1? Could the game theoretically never end?
When p1 + p2 < 1, this means in each draw, there is a positive probability (specifically 1 - p1 - p2) that neither you nor your friend obtains a marked number. The key question is whether the game can continue indefinitely without either player drawing their respective winning numbers.
In theory, for geometric-like processes with parameter p = p1 + p2 - p1 p2, as long as p is strictly greater than 0, the probability that both players fail indefinitely is zero. The only truly degenerate case is p1 = 0 and p2 = 0 simultaneously, which gives p = 0 and a game that never ends; if only one of the probabilities is zero, that player can never win, but the other player still succeeds eventually with probability 1, so the game terminates.
In practical terms, if p > 0, the probability that the game continues past l draws is (1 - p)^l, which goes to 0 as l approaches infinity. Therefore it is almost certain that the game will eventually end. Even if p1 + p2 < 1, as long as at least one of p1 and p2 is nonzero, the game ends with probability 1; it just takes more draws on average than when p1 + p2 is large.
A potential pitfall is to assume “since p1 + p2 < 1, maybe the game can go on forever.” In practice, for an infinite sequence of independent draws, the event of never seeing a success for either player has probability zero if p>0. Yet in finite simulations, one might occasionally see extremely long sequences of no successes, which can be mistaken for an “infinite” game. But mathematically, with p>0, the expected game length is finite at 1/p, and the probability that the game literally never ends is zero.
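To make this concrete, here is a small calculation for a case with p1 + p2 < 1 (the specific values 0.1 and 0.2 are illustrative):

p1, p2 = 0.1, 0.2          # p1 + p2 < 1: many rounds can pass with no success
p = p1 + p2 - p1 * p2      # 0.28

print(1 / p)               # expected game length ≈ 3.57 draws
print((1 - p) ** 50)       # P(game still running after 50 draws) ≈ 7e-8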
2) What if the draws are not i.i.d. across rounds or across players?
Our derivation of a geometric distribution for X1 and X2 depends on each draw being an identical Bernoulli-like trial with probabilities p1 and p2, and on independence both within each player’s draws and across the two players. If these assumptions are broken—for example, if the probability p1 changes from round to round or depends on what was drawn in the previous round—then X1 and X2 would no longer be straightforward geometric random variables.
In a setting where probabilities shift every round, one typically needs a more general Markov chain or a more complex stage-based analysis to capture how the success probability evolves. You can still compute the probability distribution of the time to first success, but you must account for the changing probabilities at each step. Likewise, if the draws between you and your friend are correlated (for instance, if you are more likely to succeed in rounds where your friend fails), you lose the independence assumption that makes the simple product forms for P(X1 = j and X2 >= j) valid. Such correlations drastically complicate the analysis, requiring joint probability distributions for each round.
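When the probability varies by round, the time to first success loses its simple geometric form, but its PMF can still be computed directly as P(X1 = j) = p1(j) * prod_{i<j} (1 - p1(i)). A minimal sketch (the schedule of probabilities is illustrative):

def first_success_pmf(prob_schedule):
    """PMF of the draw on which the first success occurs, for round-varying success probabilities."""
    pmf, survive = [], 1.0
    for p_j in prob_schedule:
        pmf.append(survive * p_j)   # fail in all earlier rounds, succeed now
        survive *= (1 - p_j)
    return pmf

# Example: your success probability drifts upward over 5 rounds
print(first_success_pmf([0.2, 0.3, 0.4, 0.5, 0.6]))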
3) Could either you or your friend adopt a strategy that changes their effective probability p1 or p2 dynamically over time?
In many real-world settings, participants have agency to adjust how they play each round. For example, if you suspect your friend is “less likely” or “more likely” to succeed in a particular round, you might try a different approach (though in a literal random draw from 1..10, there's little strategy to be had). However, if we generalize to scenarios like repeated tasks with modifiable parameters, p1 and p2 become decision-dependent or might shift based on the outcomes of prior draws.
Analytically, you’d need to model each player’s strategy and how it influences their success probability each round. Techniques from stochastic processes and game theory can be applied—particularly if each side's move can alter the other's success chance in the next step. The key pitfall: the memoryless simplicity of geometric distributions breaks down. Instead, you might have time-varying probabilities p1(l), p2(l) in the l-th round, which necessitates more complex dynamic programming or Markov Decision Processes to fully characterize the probability of winning.
4) What if the sets of winning numbers for you and your friend overlap?
In the original scenario, your set {1,2,3,4} and your friend's set {5,...,10} are disjoint. But suppose the sets share some elements and, in this variant, a single number is drawn each round that both players check against their own sets, for example with your set {1,2,3,4} and your friend's set {3,4,5,6,7,8,9,10}. Then you share the winning numbers {3,4}, so whenever a 3 or a 4 is drawn, both you and your friend succeed in the same round. This raises a new challenge: the probability that both succeed is no longer simply p1 * p2, because the events "you succeed" and "your friend succeeds" are driven by the same draw and are no longer independent.
A correct analysis would require:
• Probability p_you = probability that the number belongs to your set only.
• Probability p_friend = probability that the number belongs to your friend's set only.
• Probability p_both = probability that the number is in the overlap.
Then each round, the probabilities for the outcomes (only you succeed, only your friend succeeds, both succeed, no one succeeds) are derived from p_you, p_friend, p_both. The tie event (both succeed) has probability p_both each draw. You’d have to recast the distribution for X1 and X2 carefully to reflect this new dependence structure.
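A minimal sketch of this variant, assuming a single shared draw per round and the overlapping sets from the example above (set contents and names are illustrative):

import random

your_set = {1, 2, 3, 4}
friend_set = {3, 4, 5, 6, 7, 8, 9, 10}
n = 10

# Per-round outcome probabilities derived from the sets
p_both = len(your_set & friend_set) / n      # 0.2: the overlap {3,4}
p_you = len(your_set - friend_set) / n       # 0.2: only you succeed
p_friend = len(friend_set - your_set) / n    # 0.6: only your friend succeeds

print(p_you + p_both)      # P(you succeed in a round)          = 0.4
print(p_friend + p_both)   # P(your friend succeeds in a round) = 0.8
print(p_both)              # P(both succeed) = 0.2, not 0.4 * 0.8

# Simulated winning probability with ties awarded to you
trials, wins = 200_000, 0
for _ in range(trials):
    while True:
        x = random.randint(1, n)   # single shared draw
        if x in your_set:          # covers the tie case as well
            wins += 1
            break
        if x in friend_set:
            break
print(wins / trials)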
5) How would the analysis change if each draw incurred a different cost or time penalty for you and your friend?
In a purely mathematical sense of “time to first success,” adding different costs does not alter the probability distribution of the random time L = min(X1, X2). However, if you incorporate an economic or time-based factor—such as you pay c1 for each draw, your friend pays c2 for each draw, or the game ends if your budget is depleted—then the problem extends beyond simple geometric random variables.
One scenario is: “If your cost reaches a threshold, you can no longer play, so you lose by forfeit.” Another is “You want to minimize your expected total cost subject to having a high chance of winning.” You might see situations where you reduce your probability p1 (if that’s within your control) to save cost, or your friend invests more heavily (increasing p2) early on to shorten the game. This can lead to non-trivial strategic behavior in real-world contexts like auctions or repeated investment scenarios. A pitfall is to conflate cost with probability: the original formula for P(X1 <= X2) remains correct for purely random draws, but once cost modifies your decisions or imposes constraints, the distribution of successes no longer remains the same.
6) What if there is a maximum number of draws M, and if neither player has succeeded by that time, the game is a draw?
In this truncated scenario, the length of the game L is min(X1, X2, M+1), with the event “nobody wins by the M-th draw” sending you to the (M+1)-th “draw” that indicates a forced termination with no winner. Thus the random variable L can have possible values 1, 2, ..., M, or M+1 (meaning “forced end without success”).
You can still compute:
• Probability that the game ends on or before the M-th draw = 1 - (1 - p1)^M (1 - p2)^M.
• Probability that it ends on the exact l-th draw for l <= M can be derived as in the standard geometric scenario, but truncated at M.
• Probability of a forced draw is (1 - p1)^M (1 - p2)^M, because that is the event that no successes occur in the first M draws.
This modifies the distribution and yields a final outcome that includes “no winner” with some positive probability. In practice, this occurs in real-life scenarios where attempts are limited (e.g., a maximum number of tries). A pitfall might be ignoring the possibility that neither side can succeed within the allotted draws.
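A short sketch of the truncated distribution (M = 5 is an illustrative cap):

p1, p2 = 0.4, 0.6
p = p1 + p2 - p1 * p2
M = 5  # maximum number of draws

pmf = {l: p * (1 - p) ** (l - 1) for l in range(1, M + 1)}  # game decided on draw l <= M
pmf["no winner"] = (1 - p) ** M                             # nobody succeeds within M draws

print(pmf)
print(sum(pmf.values()))  # == 1.0 (up to floating point)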
7) How can we generalize to more than two players?
If instead of two players, there are k players, each with their own success probability p_i in each draw, the time to completion is L = min(X1, X2, ..., Xk). Under independence (and disjoint sets for each player, or at least well-defined probabilities for each to succeed per draw), the probability that nobody wins in a given draw is the product of the probabilities that each player fails that round. For disjoint sets, that product is the product over i of (1 - p_i). As soon as at least one player gets a success, the game ends. In that sense:
• L is geometric with parameter p = 1 - ∏(1 - p_i).
• The probability that a particular player i is the winner depends not only on p_i but also on how ties are handled if multiple players succeed in the same draw.
Tie-breaking rules get more complicated with more than two players: you might have a hierarchy of who wins if multiple players succeed simultaneously. Or it might be a scenario that if two or more players succeed on the same draw, the game continues until there is a unique success. Each tie-breaking convention significantly alters the winning probabilities. Failing to properly define and handle simultaneous successes among multiple players is a potential pitfall in multi-player generalizations.
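A sketch of the k-player case under one possible tie-break rule, namely that the lowest-indexed successful player wins (the probabilities below are illustrative):

import random

def simulate_multiplayer(probs, trials=200_000):
    """Simulate the race; ties go to the lowest-indexed player among those who succeed."""
    wins = [0] * len(probs)
    for _ in range(trials):
        while True:
            successes = [i for i, p_i in enumerate(probs) if random.random() < p_i]
            if successes:
                wins[successes[0]] += 1  # tie-break: lowest index wins
                break
    return [w / trials for w in wins]

probs = [0.4, 0.3, 0.2]
print(1 - (1 - 0.4) * (1 - 0.3) * (1 - 0.2))  # per-round stopping probability p = 0.664
print(simulate_multiplayer(probs))             # empirical winning probabilities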
8) If we wanted to track not just the length of the game but also the total number of successes each player got along the way, how would that be approached?
In the basic game, it ends immediately at the first success for either side, so one typically never sees a second success. However, if a variation allowed the game to continue for a fixed number of total draws—perhaps awarding points for each success—and one still compares who first reached a certain threshold of successes, the distribution becomes more complex. You would then keep track of how many times each player has succeeded up to each draw.
A typical approach is to model (X1, X2) as a discrete-time Markov chain on the states that represent the count of each player’s successes so far. The chain terminates upon a state that meets a “winning condition” (for example, your successes >= 1 or your friend’s successes >= 1 for the original version). If the objective is to count total successes up to a maximum number of draws, the process runs for those draws, and the distribution of (final success counts) can be calculated by enumerating all possible paths or using matrix-based Markov chain computations. The pitfall is trying to apply simpler memoryless arguments: once you allow the possibility of multiple successes, you lose the straightforward geometric structure, and a more general combinatorial or Markov chain approach is required.
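For a race-to-a-threshold variant, one way to carry out this Markov chain computation is a memoized recursion over the pair of success counts; the thresholds and probabilities below are illustrative, and with thresholds of 1 the answer collapses back to p1 / (p1 + p2 - p1 p2):

from functools import lru_cache

def race_win_probability(p1, p2, t1, t2):
    """P(you reach t1 successes no later than your friend reaches t2), ties in your favor."""
    q1, q2 = 1 - p1, 1 - p2

    @lru_cache(maxsize=None)
    def win(a, b):
        # a, b = successes accumulated so far by you and your friend
        if a >= t1:
            return 1.0   # you have hit your threshold (checked first, so ties go to you)
        if b >= t2:
            return 0.0   # your friend got there strictly first
        # Condition on one more round; rounds with no success just repeat the state,
        # so divide by 1 - q1*q2 instead of recursing on (a, b).
        num = p1 * p2 * win(a + 1, b + 1) + p1 * q2 * win(a + 1, b) + q1 * p2 * win(a, b + 1)
        return num / (1 - q1 * q2)

    return win(0, 0)

print(race_win_probability(0.4, 0.6, 1, 1))  # ≈ 0.5263, matches the original game
print(race_win_probability(0.4, 0.6, 3, 3))  # first to 3 successes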