ML Interview Q Series: Expected Remaining White Balls: Solved with Indicator Variables and Symmetry
Question: A bag contains R red balls and W white balls. You repeatedly draw one ball at random, without replacement, and stop as soon as all red balls have been removed. What is the expected number of white balls remaining in the bag when you stop?
Short Compact solution
Label each white ball with an index k from 1 to W. Let I_k be an indicator variable that is 1 if white ball k is still in the bag when all red balls have been drawn, and 0 otherwise. By symmetry, the probability that any specific white ball survives until all R red balls are taken out is 1/(R+1). Therefore, the expected value of I_k is 1/(R+1). Summing across all W white balls gives the expected number of white balls left: E[X] = W · 1/(R+1) = W/(R+1).
Comprehensive Explanation
To see why each white ball has probability 1/(R+1) of remaining, imagine all R red and W white balls laid out in a uniformly random order. The moment the last red ball appears in that sequence, we stop, and the white balls positioned after that last red ball are exactly the ones remaining. For a given white ball k, the event “white ball k is still in the bag when we stop” is therefore equivalent to “white ball k appears after all R red balls in the random permutation.” Now restrict attention to the relative order of white ball k and the R red balls: by symmetry, these R+1 balls are equally likely to appear in any of their relative orders, so white ball k is last among them with probability 1/(R+1).
The expected value of the indicator I_k (which is 1 if white ball k remains) is therefore 1/(R+1). By linearity of expectation, we add up these expected values for all W white balls to get W/(R+1).
This reasoning does not depend on the actual distribution of draws at each step. Instead, it leverages the fact that in a random permutation of the entire set of balls, each white ball is equally likely to appear after all R red balls.
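Because the argument is purely combinatorial, it can be checked exactly for small R and W by enumerating every ordering. The sketch below (function name is our own) averages the number of whites after the last red ball over all permutations and confirms it equals W/(R+1) exactly:

```python
from fractions import Fraction
from itertools import permutations

def exact_expected_white_left(R, W):
    """Enumerate all orderings of R reds and W (distinct) whites and
    average the number of whites positioned after the last red."""
    balls = ['R'] * R + [f'W{i}' for i in range(W)]
    total = Fraction(0)
    count = 0
    for perm in permutations(balls):
        last_red = max(i for i, b in enumerate(perm) if b == 'R')
        total += sum(1 for b in perm[last_red + 1:] if b.startswith('W'))
        count += 1
    return total / count

# Matches W/(R+1) exactly for small cases.
assert exact_expected_white_left(2, 3) == Fraction(3, 2 + 1)
assert exact_expected_white_left(3, 2) == Fraction(2, 3 + 1)
```

Keep R + W small (say, at most 8) since the enumeration is factorial in the number of balls.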
Potential Follow-up Questions
What if we wanted the distribution, not just the expectation, of the number of white balls remaining?
One can derive the full distribution by noting that each white ball has probability 1/(R+1) of landing “after the last red ball.” However, these events are not independent across white balls, so the count is not a Binomial(W, 1/(R+1)) random variable. Obtaining the exact distribution requires a more careful combinatorial argument. Nonetheless, the expectation is correct and easily computed via indicator variables, because linearity of expectation does not require independence.
Could we solve this problem with an alternative direct argument (without indicator variables)?
Yes, there is a direct “gaps” argument: in any arrangement of the R red and W white balls, the R reds split the whites into R+1 gaps — before the first red, between consecutive reds, and after the last red. Because all permutations of these R+W balls are equally likely, each gap contains the same expected number of whites, namely W/(R+1). The gap after the last red ball consists of exactly the whites remaining when we stop, so the answer is W/(R+1). Equivalently, the expected position of the last red ball is R+W − W/(R+1) = R(R+W+1)/(R+1).
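As a quick sanity check on the position of the last red ball, one can enumerate all distinct orderings for small R and W and verify that its average 1-based position equals R(R+W+1)/(R+1) = R+W − W/(R+1). A minimal sketch (our own naming):

```python
from fractions import Fraction
from itertools import permutations

def exact_expected_last_red_position(R, W):
    """Average 1-based position of the last red ball over all
    distinct orderings of R reds and W whites."""
    balls = ['R'] * R + ['W'] * W
    positions = [max(i for i, b in enumerate(p) if b == 'R') + 1
                 for p in set(permutations(balls))]
    return Fraction(sum(positions), len(positions))

R, W = 3, 4
assert exact_expected_last_red_position(R, W) == Fraction(R * (R + W + 1), R + 1)
```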
How would this change if balls were drawn with replacement until we got R red balls in total?
Drawing with replacement changes the process significantly because the red and white balls are not being removed from the population. The time to get R red balls is now a negative binomial process. The number of white balls seen up to the Rth red ball can be modeled by the distribution of the counts of white draws up to that point. However, you would not be “removing” white balls from the bag, so the concept of “remaining white balls” is no longer well-defined in the same sense, as the total number of white balls in the bag remains constant. A different question would have to be posed to capture “how many white draws occur” or “how many draws in total” until R red draws are observed.
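For the with-replacement variant, the number of white draws before the R-th red draw follows a negative binomial distribution with success probability p = R/(R+W), whose mean is R(1−p)/p = W. The sketch below (function names are ours) computes that mean and checks it against a seeded simulation:

```python
import random

def expected_white_draws_with_replacement(R, W):
    """Mean white draws before the R-th red when drawing WITH replacement:
    negative-binomial mean R*(1-p)/p with p = R/(R+W), which simplifies to W."""
    p = R / (R + W)
    return R * (1 - p) / p

def simulate_white_draws(R, W, num_sims=20_000, seed=0):
    """Seeded Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_sims):
        reds = whites = 0
        while reds < R:
            if rng.random() < R / (R + W):
                reds += 1
            else:
                whites += 1
        total += whites
    return total / num_sims

assert expected_white_draws_with_replacement(5, 3) == 3.0
```

Note the contrast with the without-replacement problem: here the answer is W, not W/(R+1), because draws never deplete the bag.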
Can we simulate this scenario easily in Python?
Absolutely. For instance, we can simulate random draws from the bag (without replacement) multiple times and observe the average number of white balls left. Here is a simple example:
```python
import random
import statistics

def simulate_expected_white_left(R, W, num_sims=10_000):
    results = []
    for _ in range(num_sims):
        # Create the bag: 'R' for red, 'W' for white
        bag = ['R'] * R + ['W'] * W
        random.shuffle(bag)
        # Draw until all R red balls are out
        red_count = 0
        index = 0
        while red_count < R:
            if bag[index] == 'R':
                red_count += 1
            index += 1
        # Count the white balls remaining after the last red
        white_left = bag[index:].count('W')
        results.append(white_left)
    return statistics.mean(results)

# Example usage
print(simulate_expected_white_left(R=5, W=3, num_sims=100_000))
```
If you run this simulation, the result will be close to W/(R+1), illustrating the validity of the analytical solution.
Are there edge cases where R=0 or W=0?
If R=0, there are no red balls to remove, so we stop immediately and never remove any white ball: the expected number of white balls remaining is simply W. The formula agrees, since W/(R+1) = W/(0+1) = W.
If W=0, the bag contains only red balls. We stop once all red balls are removed, leaving no white balls anyway. In that trivial case, the expected number of white balls remaining is 0. The formula also gives W/(R+1)=0/(R+1)=0, which holds true.
These edge cases demonstrate that the formula still works correctly in degenerate scenarios.
Below are additional follow-up questions
What if we want to compute the variance of the number of white balls remaining?
To find the variance, we need to consider not just the expectation of the number of white balls remaining but also the second moment. Let X be the total number of white balls left. X can be expressed as X = I_1 + I_2 + ... + I_W, where each I_k is an indicator for the k-th white ball remaining. We already know E[I_k] = 1/(R+1). To compute Var(X), we need E[I_k I_j] for k ≠ j. The subtlety is that these indicators I_k and I_j are not fully independent, because if one white ball remains in the bag, that might affect (albeit slightly) the probability of another also remaining.
Concretely, we have:
E[X] = W/(R+1).
Var(X) = E[X^2] - (E[X])^2.
We can expand E[X^2] = E[(I_1 + ... + I_W)^2] = ∑_k E[I_k] + 2 ∑_{k<j} E[I_k I_j], using the fact that I_k^2 = I_k. Each E[I_k] is 1/(R+1). Each E[I_k I_j] is the probability that both white balls k and j land after the last red ball. Restricting attention to these two whites and the R reds, both whites must occupy the last two of the R+2 relative positions, which happens with probability 2/((R+1)(R+2)). Plugging in gives E[X^2] = W/(R+1) + 2W(W−1)/((R+1)(R+2)), from which Var(X) = E[X^2] − (W/(R+1))^2.
This is a common point where candidates might slip up, because the dependence among these indicators is not always trivial. Ensuring you properly handle the combinatorial aspect (or direct probability arguments) to compute E[I_k I_j] is crucial to obtaining the correct variance.
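To guard against exactly this slip, the closed form built from E[I_k] = 1/(R+1) and the joint probability E[I_k I_j] = 2/((R+1)(R+2)) can be checked against brute-force enumeration for small cases. A sketch (function names are ours):

```python
from fractions import Fraction
from itertools import permutations

def variance_white_left(R, W):
    """Var(X) from E[I_k] = 1/(R+1) and E[I_k I_j] = 2/((R+1)(R+2))."""
    e_single = Fraction(1, R + 1)
    e_pair = Fraction(2, (R + 1) * (R + 2))
    ex = W * e_single
    ex2 = W * e_single + W * (W - 1) * e_pair
    return ex2 - ex ** 2

def brute_force_variance(R, W):
    """Exact variance by enumerating every ordering of the balls."""
    balls = ['R'] * R + ['W'] * W
    xs = []
    for p in permutations(balls):
        last_red = max(i for i, b in enumerate(p) if b == 'R')
        xs.append(p[last_red + 1:].count('W'))
    mean = Fraction(sum(xs), len(xs))
    return Fraction(sum(x * x for x in xs), len(xs)) - mean ** 2

assert variance_white_left(2, 3) == brute_force_variance(2, 3)
```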
If the drawing process were not uniformly random, how would that affect our expectation?
In the standard problem, each draw is assumed to be uniformly random from the remaining balls. Suppose we alter that assumption so some balls might be more likely to be drawn than others. For instance, imagine a scenario in which red balls are heavier, or the process is biased toward picking larger/smaller balls.
The key to the original conclusion that the expected number of white balls remaining is W/(R+1) relies on symmetry: we treat each ball as equally likely to appear in each position in a random ordering. If the drawing mechanism is not uniform, this symmetry argument breaks down. In such a biased setting, the probability that a specific white ball remains until after all red balls are removed could differ from 1/(R+1). Without uniformity or a clear symmetry, we need to explicitly model the biased drawing probabilities at each step. The result for the expected number of white balls left might no longer be a neat fraction W/(R+1), and each white ball’s survival probability might differ, depending on the bias.
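One simple way to model such a bias is to give each red ball a sampling weight relative to each white ball. The seeded simulation below is a sketch under that assumption (the weighting scheme and names are ours, not part of the original problem); with weight 1 it recovers W/(R+1), and with a heavy red weight the reds exit early, leaving noticeably more whites behind:

```python
import random

def simulate_biased(R, W, red_weight, num_sims=20_000, seed=1):
    """Draw without replacement where each red is red_weight times as
    likely to be picked as each white; return the mean number of whites
    remaining when the last red is drawn. red_weight=1 is the uniform case."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_sims):
        reds, whites = R, W
        while reds > 0:
            # Probability the next draw is red under the weighting.
            p_red = reds * red_weight / (reds * red_weight + whites)
            if rng.random() < p_red:
                reds -= 1
            else:
                whites -= 1
        total += whites
    return total / num_sims

uniform = simulate_biased(3, 3, red_weight=1)   # close to W/(R+1) = 0.75
biased = simulate_biased(3, 3, red_weight=10)   # substantially larger
```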
What if we replace each red ball with a white ball immediately upon drawing it?
In this modified process, whenever we draw a red ball, we immediately put a new white ball back into the bag (without returning the red ball). Over time, the number of white balls in the bag increases while the number of red balls decreases. We want to know the expected number of white balls left when the last original red ball is removed.
This scenario no longer has a simple combinatorial symmetry because the composition of the bag changes in a non-traditional way. The time to remove all R original red balls might grow, and you might add up to R new white balls if you remove all red balls one by one. A direct combinatorial formula becomes more involved, and you typically have to set up a more advanced Markov chain or a Pólya’s urn-type argument. The expectation is unlikely to remain as straightforward as W/(R+1).
What happens if the process is stopped once at least M red balls (where M < R) are drawn?
Sometimes, you only need M of the R red balls, and you stop early as soon as the M-th red ball is drawn. In that case, you want the expected number of white balls remaining at that moment, and some red balls are still in the bag when you stop. In the random-permutation viewpoint, the relevant landmark is the position of the M-th red ball rather than the last one. The indicator approach still applies: a given white ball survives exactly when it appears after the M-th red ball. Restricting attention to that white ball and the R reds, its relative rank among these R+1 balls is uniform, so it falls after the M-th red with probability (R+1−M)/(R+1). Summing over all whites, the expected number remaining is W(R+1−M)/(R+1); setting M = R recovers the original answer W/(R+1).
Pitfalls here include forgetting that the M-th red ball is not necessarily the last red ball; there are additional red balls that remain in the bag. The effect is that each white ball’s “chance” of appearing after that M-th red ball changes depending on how many red balls remain to be placed in the sequence.
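The relative-order argument gives survival probability (R+1−M)/(R+1) per white ball, hence an expectation of W(R+1−M)/(R+1). This can be verified by enumeration for small cases; the helper below (our own naming) counts whites after the M-th red over all distinct orderings:

```python
from fractions import Fraction
from itertools import permutations

def exact_expected_whites_after_mth_red(R, W, M):
    """Average number of whites positioned after the M-th red ball,
    over all distinct orderings of R reds and W whites (1 <= M <= R)."""
    balls = ['R'] * R + ['W'] * W
    arrangements = set(permutations(balls))
    total = 0
    for p in arrangements:
        red_positions = [i for i, b in enumerate(p) if b == 'R']
        mth = red_positions[M - 1]
        total += p[mth + 1:].count('W')
    return Fraction(total, len(arrangements))

# Matches W*(R+1-M)/(R+1).
assert exact_expected_whites_after_mth_red(3, 2, 2) == Fraction(2 * (3 + 1 - 2), 3 + 1)
```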
What if balls were partially defective so sometimes a ball is destroyed upon drawing?
In some real-world applications, a ball (or item) might have a probability p of being destroyed (removed) each time it is drawn, whether or not it’s red or white. This modifies the problem drastically: the number of total balls can shrink faster than just from removing one ball per draw. You would have to track probabilities of each ball’s state (destroyed or still in the bag) as the draws progress.
A naive approach might try to adapt the same random permutation logic, but that logic depends on each ball exactly occupying one distinct position in a sequence of R+W draws. If a ball can be destroyed with probability p at each draw, the distribution of final bag contents is not as simple. You could attempt a more general state-based approach or dynamic programming to figure out the expected count of white balls left by the time all original red balls have either been removed or destroyed. The complexity increases significantly, and the simple 1/(R+1) symmetry for each white ball no longer holds.
How does this result generalize to multiple colors beyond just red and white?
Imagine we have C different colors instead of just red and white. Suppose each time we draw one ball at random (without replacement), and we stop when all balls of a specific color (say color #1) have been removed. Now we want to know the expected number of balls of each other color that remain.
A straightforward extension can be made if we label all balls of each color and then use symmetry arguments specific to each color. If the color of interest has R_1 balls, and we are focusing on some other color with R_2 balls, the probability that a given ball of color #2 remains after all R_1 color #1 balls are drawn is typically 1/(R_1+1). But this only applies if we are ignoring all other colors and focusing solely on color #1 and color #2. If we mix in more colors and the stop condition is “all of color #1 is out,” each color #2 ball’s survival probability might still come out to 1/(R_1+1) by an extension of the same combinatorial argument. However, ensuring no hidden assumptions is crucial (such as each color truly being distinct and no partial stops occur for other colors).
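The claim that extra colors do not disturb the 1/(R_1+1) survival probability can be checked by enumeration. The sketch below (names are ours) tracks one specific color-2 ball and measures how often it lands after the last color-1 ball, with and without color-3 balls present:

```python
from fractions import Fraction
from itertools import permutations

def survival_probability_color2(n1, n2, n3):
    """P(a given color-2 ball sits after the last color-1 ball) in a
    uniform ordering that also contains n3 color-3 balls."""
    # 'b' is the single tracked color-2 ball; the rest are indistinct.
    balls = ['A'] * n1 + ['b'] + ['B'] * (n2 - 1) + ['C'] * n3
    arrangements = set(permutations(balls))
    hits = sum(1 for p in arrangements
               if p.index('b') > max(i for i, x in enumerate(p) if x == 'A'))
    return Fraction(hits, len(arrangements))

# 1/(n1+1), regardless of how many color-3 balls are present.
assert survival_probability_color2(2, 2, 0) == Fraction(1, 3)
assert survival_probability_color2(2, 2, 2) == Fraction(1, 3)
```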
Could sampling strategies or reservoir sampling ideas extend to a similar question?
Sometimes you have a streaming or sampling context. For example, in reservoir sampling, you have a stream of R+W items, and you randomly choose k of them without knowing the full size in advance. If you only stop when you have encountered all items of a certain type, you might wonder about how many items of the other type remain unchosen. However, the classic reservoir sampling procedure is somewhat different from drawing without replacement from a fixed bag. The logic of expected counts might still hinge on random permutation arguments but must be carefully adapted to streaming contexts.
A pitfall is assuming the same 1/(R+1) probability applies when the sampling method or stopping criterion changes drastically. Each modification can break the symmetry or independence that you relied on in the simpler scenario.
What if we are interested in the probability that exactly k white balls remain, instead of the expected number?
While we know the expectation of the number of white balls left, the probability that exactly k white balls remain is a more nuanced distribution question. One might guess it is given by certain combinations of red and white positions in a random permutation, specifically counting the permutations in which exactly k white balls come after the last red ball. If we let X be the number of white balls remaining, we want P(X=k). A high-level approach is:
Choose which k white balls end up after the last red ball.
Choose an ordering for the R red balls and (W–k) white balls that appear before the last red ball.
Place those k white balls in the positions following the last red ball.
One must carefully account for the fact that the last red ball’s position is simultaneously chosen in the arrangement. The combinatorial formula can be derived, but it is more intricate than simply stating the expectation. A common pitfall is to treat all events as independent and incorrectly write a binomial expression, whereas the correct distribution typically involves hypergeometric-style reasoning with constraints on the position of the last red ball.
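One clean way to get P(X=k) is to read the shuffled bag from the back: exactly k whites remain iff the last k balls are white and the (k+1)-th ball from the end is red. Multiplying those conditional probabilities gives a product formula, which the sketch below (our own naming) checks against direct enumeration:

```python
from fractions import Fraction
from itertools import permutations

def prob_exactly_k_whites_left(R, W, k):
    """P(X = k): the last k balls are white and the (k+1)-th from the
    end is red, reading the random arrangement from the back."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(W - i, R + W - i)
    return p * Fraction(R, R + W - k)

def brute_force_prob(R, W, k):
    """Exact P(X = k) by enumerating all distinct orderings."""
    balls = ['R'] * R + ['W'] * W
    arrangements = set(permutations(balls))
    hits = sum(1 for p in arrangements
               if p[len(p) - k:].count('W') == k and p[len(p) - k - 1] == 'R')
    return Fraction(hits, len(arrangements))

assert prob_exactly_k_whites_left(2, 3, 1) == brute_force_prob(2, 3, 1)
assert sum(prob_exactly_k_whites_left(2, 3, k) for k in range(4)) == 1
```

As a sanity check, summing k * P(X=k) over k recovers the expectation W/(R+1) derived earlier.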