ML Interview Q Series: Baseball Series Winner & Length: Joint Probability Mass Function Calculation
In the final of baseball's World Series, two teams play a series of at most seven games, which ends as soon as one of the teams has won four games. The two teams are unevenly matched, and the probability that the weaker team wins any given game is 0.45. Let X be 1 if the stronger team is the overall winner, and 0 otherwise. The random variable Y is defined as the number of games the final takes. What is the joint probability mass function of X and Y?
Short Compact Solution
For k = 4, 5, 6, 7, the joint PMF is:
P(X=1, Y=k) = (k-1 choose 3) (0.55)^4 (0.45)^(k-4)
P(X=0, Y=k) = (k-1 choose 3) (0.45)^4 (0.55)^(k-4)
Comprehensive Explanation
Overview of the scenario
We have two teams competing in a best-of-seven series, which ends as soon as one team reaches four total wins. The stronger team has probability 0.55 of winning a single game, while the weaker team has probability 0.45 of winning a single game. Define:
X = 1 if the stronger team is the overall winner; 0 if the weaker team is the winner.
Y is the total number of games played in the series, which can be 4, 5, 6, or 7.
Reasoning behind the PMF formula
Number of games (Y = k). The possible values of k are 4, 5, 6, or 7, because a team might achieve the necessary four wins as early as game 4 or as late as game 7.
Condition for X = 1 (stronger team wins). If the stronger team wins the series in exactly k games, then:
The stronger team has exactly 3 wins in the first k-1 games.
The stronger team wins the k-th game (the series-ending game).
Counting arrangements of the first k-1 games. The number of ways to choose which 3 of the first k-1 games are won by the stronger team is (k-1 choose 3).
Probability of each specific arrangement. In each arrangement for X = 1:
The probability of 3 wins by the stronger team in the first k-1 games is (0.55)^3 (0.45)^( (k-1) - 3 ).
The final (k-th) game is also a win by the stronger team, so we multiply by another factor of 0.55. Combining these exponents, (0.55)^3 * 0.55 = (0.55)^4 accounts for the four wins. The stronger team also suffers k-4 losses in the k-game series, which contributes (0.45)^(k-4). Hence the probability that the stronger team wraps up the series at game k is:
(k-1 choose 3) * (0.55)^4 * (0.45)^(k-4).
Condition for X = 0 (weaker team wins). By similar logic, for the weaker team to win in exactly k games:
The weaker team has exactly 3 wins in the first k-1 games.
The weaker team wins the k-th game. The number of ways is again (k-1 choose 3), but now each arrangement’s probability is (0.45)^4 (0.55)^(k-4).
Consolidating the final PMF. Putting it all together, we obtain:
P(X=1, Y=k) = (k-1 choose 3) (0.55)^4 (0.45)^(k-4), for k=4,...,7.
P(X=0, Y=k) = (k-1 choose 3) (0.45)^4 (0.55)^(k-4), for k=4,...,7.
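As a quick numerical check, these closed-form values can be tabulated directly. The sketch below uses only the standard library; the helper name joint_pmf is just for illustration.

from math import comb

def joint_pmf(x, k, p=0.55):
    # P(X=x, Y=k): the series winner needs 3 wins in the first k-1 games plus the k-th game
    win = p if x == 1 else 1 - p
    return comb(k - 1, 3) * win**4 * (1 - win)**(k - 4)

for k in [4, 5, 6, 7]:
    print(f"P(X=1, Y={k}) = {joint_pmf(1, k):.5f}   P(X=0, Y={k}) = {joint_pmf(0, k):.5f}")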
Why the binomial coefficient (k-1 choose 3)?
The key insight is that the series ends exactly on the k-th game. This means whichever team wins the series must have exactly 3 wins in the first k-1 games and then secure the 4th win on the k-th game. The factor (k-1 choose 3) counts the distinct ways of distributing exactly 3 winning games among those first k-1 games.
Why does k range from 4 to 7?
The best-of-seven format ends as soon as one team has 4 total wins. The earliest this can happen is when one team wins 4-0 (k=4). The latest is when the teams tie 3-3 after 6 games and the 7th game decides the final outcome (k=7).
Relationship to classic Bernoulli processes
One can view each game as a Bernoulli trial with probability p=0.55 of the stronger team winning, but the process stops as soon as a team collects 4 wins. This is essentially a negative binomial concept, though truncated because we only allow up to 7 total games.
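To make that connection concrete, the following check (assuming SciPy is installed) compares the closed-form expression for X = 1 with a negative binomial PMF evaluated at k - 4 failures before the 4th success.

from math import comb
from scipy.stats import nbinom

p = 0.55
for k in [4, 5, 6, 7]:
    closed_form = comb(k - 1, 3) * p**4 * (1 - p)**(k - 4)
    # nbinom.pmf(j, n, p) is the probability of j failures before the n-th success
    negative_binomial = nbinom.pmf(k - 4, 4, p)
    print(k, round(closed_form, 6), round(negative_binomial, 6))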
Possible Follow-up Questions
1) Why is the exponent of 0.55 exactly 4 in P(X=1, Y=k)?
Because to win the series in k games, the stronger team must have exactly 4 wins in total. One of those wins is specifically the last (k-th) game; in the first k-1 games the team must have exactly 3 wins, so 3 plus the final win gives 4 in total. Hence the factor 0.55^4 accounts for the four wins.
2) How do we interpret the factor (k-1 choose 3)?
It counts all the possible ways in which the stronger team can distribute its 3 wins within the first k-1 games. For instance, if k=6, then across the first 5 games, you choose which 3 games were won by the stronger team. There are (5 choose 3) ways to do that.
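To see those arrangements concretely, they can be enumerated with the standard library (a small illustration):

from itertools import combinations

# Which 3 of the first 5 games are won by the stronger team when the series ends in game 6
arrangements = list(combinations(range(1, 6), 3))
print(len(arrangements))  # 10, i.e. (5 choose 3)
print(arrangements)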
3) Could we have an alternative expression for the probability if we considered the weaker team’s perspective?
Yes. If you focus on the weaker team, you get an analogous expression for P(X=0, Y=k), using 4 wins by the weaker team (probability 0.45 each) and k-4 losses (probability 0.55 each). The binomial coefficient remains (k-1 choose 3), but the roles of 0.55 and 0.45 flip.
4) Are there any boundary conditions or edge cases not covered?
We do not consider k < 4, because you cannot finish a best-of-seven series in fewer than 4 games. Also, once a team has 4 wins, no further games are played, so k cannot exceed 7.
5) How could we simulate this in Python?
A simple Monte Carlo simulation could involve repeatedly simulating a best-of-seven series. In each series, keep playing until one team reaches 4 wins, track which game ended the series, and record whether the stronger team was the winner. You then empirically estimate P(X=1, Y=k) or P(X=0, Y=k) by counting the outcomes over many trials.
import random
import collections

def simulate_series(prob_stronger=0.55, trials=1_000_000):
    outcomes = collections.Counter()
    for _ in range(trials):
        stronger_wins = 0
        weaker_wins = 0
        game_count = 0
        # Play until one team reaches 4 wins
        while stronger_wins < 4 and weaker_wins < 4:
            game_count += 1
            if random.random() < prob_stronger:
                stronger_wins += 1
            else:
                weaker_wins += 1
        # X = 1 if the stronger team wins, 0 otherwise; Y = number of games played
        X = 1 if stronger_wins == 4 else 0
        Y = game_count
        outcomes[(X, Y)] += 1
    # Convert counts to empirical probabilities
    for k in [4, 5, 6, 7]:
        print(f"P(X=1, Y={k}) ~ {outcomes[(1, k)] / trials:.5f}")
        print(f"P(X=0, Y={k}) ~ {outcomes[(0, k)] / trials:.5f}")

simulate_series()
Such a simulation aligns well with the derived closed-form formulas, but the analytical expressions are more precise for an exact probability.
Below are additional follow-up questions
1) How would the joint PMF change if the probability of the stronger team winning each game varied from game to game?
In real sporting events, conditions such as home-field advantage, injuries, or fatigue might alter the probability from one game to the next. Instead of having a single constant probability p = 0.55, you might have p1, p2, p3, … for each game in the series.
To generalize the PMF, you would no longer treat each game as a Bernoulli trial with the same fixed probability. Instead, each individual path to a series conclusion (a specific sequence of wins and losses) would have a probability equal to the product of the game-specific probabilities along that path. The single binomial coefficient (k-1 choose 3) would be replaced by an explicit enumeration of all the ways the eventual winner can collect exactly 3 wins in the first k-1 games, with each arrangement weighted by the appropriate p_i or (1 - p_i) factors, and then multiplied by the probability of the winner taking the k-th game. In other words, you sum over all orderings that produce exactly 3 wins for the series winner in the first k-1 games and multiply by the probability of the final game's result.
A practical pitfall is that one might naively continue to use a single binomial coefficient times (p^wins)(1-p)^losses formula, which is no longer valid if each game has a different probability. The correct solution needs a more manual enumeration or a dynamic programming approach to track partial sums of probabilities through the series.
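One way to organize that enumeration is dynamic programming over the series state. The sketch below assumes hypothetical per-game probabilities (P_GAME) for the stronger team; the function name series_pmf_variable_p is illustrative.

# Hypothetical per-game win probabilities for the stronger team in games 1 through 7
P_GAME = [0.60, 0.55, 0.50, 0.55, 0.60, 0.52, 0.48]

def series_pmf_variable_p(p_game=P_GAME):
    # state[(s, w)] is the probability the series is still alive with the stronger team
    # on s wins and the weaker team on w wins
    state = {(0, 0): 1.0}
    pmf = {}
    for game_idx, p in enumerate(p_game, start=1):
        next_state = {}
        for (s, w), prob in state.items():
            for (s2, w2), q in [((s + 1, w), p), ((s, w + 1), 1 - p)]:
                if s2 == 4 or w2 == 4:
                    key = (1 if s2 == 4 else 0, game_idx)   # (X, Y)
                    pmf[key] = pmf.get(key, 0.0) + prob * q
                else:
                    next_state[(s2, w2)] = next_state.get((s2, w2), 0.0) + prob * q
        state = next_state
    return pmf

pmf = series_pmf_variable_p()
print(sum(pmf.values()))  # should be 1.0 up to floating-point error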
2) How would you compute the expected value of Y (the number of games in the series) from this PMF?
The expected number of games E[Y] is found by summing k * P(Y = k) over all k in {4, 5, 6, 7}. Formally:
E[Y] = 4 * P(Y=4) + 5 * P(Y=5) + 6 * P(Y=6) + 7 * P(Y=7).
You can obtain P(Y=k) by summing over both possibilities for X (X=1 or X=0):
P(Y=k) = P(X=1, Y=k) + P(X=0, Y=k).
Then you compute each probability from the joint PMF, plug them into the above expression, and you get the expected series length. A subtlety is that sometimes you might accidentally sum only one side (e.g., focusing on the stronger team), so be sure you add both X=1 and X=0 probabilities for the complete distribution. Also, always check that the probabilities across k=4 to 7 sum to 1, to confirm there are no missing states.
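A short, self-contained calculation from the closed-form PMF (for p = 0.55 it gives roughly E[Y] ≈ 5.78):

from math import comb

p = 0.55
p_y = {k: comb(k - 1, 3) * (p**4 * (1 - p)**(k - 4) + (1 - p)**4 * p**(k - 4))
       for k in [4, 5, 6, 7]}

expected_y = sum(k * prob for k, prob in p_y.items())
print({k: round(v, 5) for k, v in p_y.items()})
print(f"E[Y] = {expected_y:.4f}")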
3) How do you verify that the joint PMF sums to 1 across all valid (X, Y) pairs?
When you have P(X=1, Y=k) and P(X=0, Y=k) for k in {4, 5, 6, 7}, you can verify:
Sum_{k=4 to 7} [P(X=1, Y=k) + P(X=0, Y=k)] = 1.
In other words, you exhaustively include all ways a best-of-seven series can end: either the stronger team wins (X=1) in 4, 5, 6, or 7 games, or the weaker team does (X=0) in 4, 5, 6, or 7 games. A potential pitfall is forgetting to include all four possible values of k or misplacing an exponent, which can cause the sum to deviate slightly from 1 due to arithmetic or combinatorial errors. If the sum is not exactly 1 in your derivation or code, it’s an immediate sign of a mistake, such as miscounting or a missing factor.
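A quick numerical check along these lines, reusing the same closed-form expression:

from math import comb

p = 0.55
total = sum(comb(k - 1, 3) * (p**4 * (1 - p)**(k - 4) + (1 - p)**4 * p**(k - 4))
            for k in [4, 5, 6, 7])
print(total)  # should be 1.0 up to floating-point rounding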
4) How do you handle a scenario where the series can end in fewer than four games if, for example, the stronger team concedes or there is a cancellation?
Although not typical in a World Series, you could imagine hypothetical scenarios where the match ends prematurely (e.g., a disqualification in game 3). The standard best-of-seven logic no longer applies. You’d need to treat the distribution over Y differently, because Y could be 3 (or even fewer) if external circumstances terminate the series. You would define new possible values of Y to include 1, 2, or 3 if those outcomes are truly possible. Then you’d build a joint probability model that includes disqualification or cancellation events (maybe with some probability q for an early termination at each stage). This can significantly alter the normal binomial-based approach. A common pitfall is to continue using the standard formula that implicitly assumes a 7-game maximum with no external interruption.
5) How do you incorporate the possibility of ties or extra innings in a single game outcome?
If every game must result in a win for one team or the other, there is no tie in an individual game. However, suppose the league has a rule that allows a game to be replayed if it ends in a tie. Then individual games would have random lengths, or you would have to keep repeating a tied game until one team wins. The simpler best-of-seven framework assumes each single game yields a definitive winner. If you introduce possible ties, the distribution over the total number of games played before one team reaches 4 wins becomes more complicated: you would have to condition on how many attempts each game took, or track how many tie results occurred before a final victor was decided. A typical mistake is to ignore, or not properly account for, the probability mass that goes into games repeated due to ties.
6) How could we adapt this analysis if we cared only about how many games the stronger team won, regardless of whether they won the series?
You might be interested in the random variable Z = number of games won by the stronger team, rather than specifically who won the series or how many total games were played. Because the series stops once one team reaches 4 wins, Z can only take the values 0, 1, 2, 3, or 4: Z = 4 exactly when the stronger team wins the series (X = 1), and Z = Y - 4 when the weaker team reaches 4 wins first (X = 0). You would have to consider carefully each condition under which the stronger team's total ends up at each of those values. A common pitfall is mixing up the unconditional probabilities with those conditional on the series finishing at a specific length.
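Under the original constant-p model, that observation lets you reuse the joint PMF directly (a small sketch; joint_pmf is the same illustrative helper as in the earlier sketch):

from math import comb

def joint_pmf(x, k, p=0.55):
    # P(X=x, Y=k) from the closed-form derivation above
    win = p if x == 1 else 1 - p
    return comb(k - 1, 3) * win**4 * (1 - win)**(k - 4)

# Z = games won by the stronger team at the moment the series ends
p_z = {z: 0.0 for z in range(5)}
for k in [4, 5, 6, 7]:
    p_z[4] += joint_pmf(1, k)        # stronger team wins the series: Z = 4
    p_z[k - 4] += joint_pmf(0, k)    # weaker team wins in k games: Z = k - 4
print(p_z)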
7) How do you handle “momentum shifts” or psychological factors that affect probabilities after a streak of wins or losses?
A purely mathematical model (like Bernoulli trials) assumes independence: the probability of winning a given game is the same regardless of the sequence of previous wins or losses. In reality, a team’s performance might be influenced by momentum (e.g., after winning two games in a row, confidence is higher, or after falling behind 3-0, morale is lower). To accommodate that, you might incorporate a Markov chain with states reflecting the current series standing (e.g., (wins_stronger, wins_weaker)) and include a transition probability that depends on the state. This is more complex and means you cannot just write a neat closed-form expression in the same way. Instead, you would set up and solve the Markov chain by enumerating all state transitions until one team reaches 4 wins. The subtlety is that the probability can vary depending on (wins_stronger - wins_weaker) or other psychological variables, which can lead to a broad range of possible expansions to the model.
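A minimal sketch of such a state-dependent model, where the single-game win probability is an arbitrary function of the current standing. The momentum_p function uses made-up numbers purely for illustration, and series_pmf_markov is an illustrative name, not a standard routine.

def momentum_p(s_wins, w_wins):
    # Hypothetical state-dependent probability that the stronger team wins the next game:
    # a 0.55 baseline nudged by the current lead (illustrative, not an empirical model)
    return min(0.9, max(0.1, 0.55 + 0.02 * (s_wins - w_wins)))

def series_pmf_markov(win_prob=momentum_p):
    # Enumerate every series path, multiplying state-dependent transition probabilities
    pmf = {}

    def walk(s_wins, w_wins, games, prob):
        if s_wins == 4 or w_wins == 4:
            key = (1 if s_wins == 4 else 0, games)
            pmf[key] = pmf.get(key, 0.0) + prob
            return
        p = win_prob(s_wins, w_wins)
        walk(s_wins + 1, w_wins, games + 1, prob * p)
        walk(s_wins, w_wins + 1, games + 1, prob * (1 - p))

    walk(0, 0, 0, 1.0)
    return pmf

print(series_pmf_markov())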
8) How would you confirm the correctness of your PMF derivations using simulations in large-scale practice?
One robust method is to run a Monte Carlo simulation of the best-of-seven series many times. For each simulated series, you record (X, Y) — i.e., which team won and how many total games were played. After enough trials, you estimate P(X=1, Y=k) and P(X=0, Y=k) by counting the relative frequencies of those outcomes. If your analytic formula is correct, the simulation results should closely approximate those theoretical probabilities. Pitfalls here include not running enough simulation trials to achieve stable estimates, which can create large sampling variance and mislead you about whether your formula is correct or incorrect. Another issue is coding mistakes: for instance, continuing the series even after a team hits 4 wins would artificially inflate some of the probabilities of higher Y values.
9) In a real competition, could the actual probability p=0.55 differ from game to game if players are fatigued or the pitching rotation changes?
Yes. In something like baseball, the pitching rotation can significantly change the chances each team has in each new game, because certain pitchers might only start once or twice in a best-of-seven series. If the stronger team’s best pitcher is available in game 1 and then again in game 5, the probability of winning in those games might be different from games where a less dominant pitcher is used. Another influence might be traveling for away games, which could shift the probability. Pitfalls include simplifying a real sport to a single p=0.55 that does not reflect actual complexities. In practice, an advanced model might break down the probability on a per-game basis, similar to the variable probability approach, or even condition on who the starting pitchers are in each game and whether it’s a home or away game.
10) How might you extend this model to a best-of-n series for any even or odd n?
The question states a best-of-seven format, but sometimes you might have a best-of-3, best-of-5, best-of-9, etc. The logic generalizes: the series ends as soon as one team reaches r wins, where r = (n+1)/2 if n is odd and r = n/2 + 1 if n is even (in both cases r = floor(n/2) + 1). The fundamental approach is the same: one team must achieve the required number of wins, with r = 4 in the best-of-7 case. The maximum possible number of games is n. The binomial reasoning requires exactly r-1 wins in the first k-1 games and a win in the k-th game. Keep an eye on the range of k, which now runs from r to n. Also note that if n is even, the series still ends as soon as one team reaches r = n/2 + 1 wins, so there is no final "tie-breaking" game beyond n. A typical mistake is to forget to adjust the binomial coefficient or to sum over the wrong range of k when the best-of-n format differs from the usual best-of-7.
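A hedged sketch of that generalization for an odd n (the function name best_of_n_pmf is illustrative):

from math import comb

def best_of_n_pmf(n, p=0.55):
    # Joint PMF of (X, Y) for a best-of-n series with n odd, where r = (n + 1) // 2 wins are needed
    r = (n + 1) // 2
    pmf = {}
    for k in range(r, n + 1):
        pmf[(1, k)] = comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
        pmf[(0, k)] = comb(k - 1, r - 1) * (1 - p)**r * p**(k - r)
    return pmf

print(best_of_n_pmf(7))                  # reproduces the best-of-seven formulas above
print(sum(best_of_n_pmf(5).values()))    # best-of-5 uses r = 3 and k = 3, 4, 5; sums to 1.0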