ML Interview Q Series: Conditional Probability for Sequential Ball Draws Without Replacement
Browse all the Probability Interview Questions here.
An urn contains r red balls and b blue balls, where r≥1 and b≥3. Three balls are selected, without replacement, from the urn. Using the notion of conditional probability to simplify the problem, find the probability of drawing the sequence Blue, Red, Blue.
Short Compact solution
Comprehensive Explanation
The fundamental idea is that we want the probability of drawing a blue ball on the first draw, then a red ball on the second draw, and finally a blue ball on the third draw, with no replacement between draws. To handle such a sequence problem, we use the chain rule of conditional probabilities.
Let B1 be the event that the first ball is blue, R2 be the event that the second ball is red, and B3 be the event that the third ball is blue. Then:
P(B1 R2 B3) = P(B3 given R2 and B1) times P(R2 given B1) times P(B1)
We break it down as follows:
First draw (Blue): The probability of drawing a blue ball first is b / (r + b), because there are b blue balls in total and r red balls, so b out of r+b total balls are blue.
Second draw (Red given the first was Blue): After drawing a blue ball in the first draw (and not replacing it), the urn has b-1 blue balls and r red balls, for a total of (r + b - 1) balls. Therefore, the probability of drawing a red ball at this second draw is r / (r + b - 1).
Third draw (Blue given the first was Blue and the second was Red): Now we have drawn one blue ball and one red ball. Thus, we have b-1 blue balls left and r-1 red balls left if we needed to consider a different color—however, because the second draw was red, the remaining count of red balls is r - 1 only if that is relevant for other probabilities. For the third draw, we specifically want another blue ball. The number of blue balls remaining after removing one blue ball on the first draw is b - 1. But the second draw was red, so the number of blue balls is still b - 1, and the total number of balls left is (r + b - 2). Therefore, the probability of drawing a blue ball on the third draw, given the first two were B and R, is (b - 1) / (r + b - 2).
Multiplying these probabilities together yields the probability for the specific sequence Blue, Red, Blue:
(b / (r + b)) × (r / (r + b - 1)) × ((b - 1) / (r + b - 2))
Each factor corresponds to the probability of drawing the required color at each draw, taking into account the urn's changing composition due to the lack of replacement.
This same approach extends to any particular sequence of draws of different colors. The key is to use conditional probability each time, updating the total number of balls and the number of the relevant color.
Follow-up Question 1
How does this result compare to drawing three balls at once and simply counting how many ways to get the same color sequence?
If you think of drawing three balls simultaneously, you do not see an order in which they are drawn, so you must carefully account for permutations of chosen colors in that scenario. In our problem, the order matters: specifically, Blue, then Red, then Blue. If you were to compute it via a combinatorial approach in a single draw, you would have to multiply by the number of permutations that place blues and reds in the correct positions. However, because we explicitly care about the draw order, the most straightforward route is to use conditional probabilities or the direct hypergeometric approach that incorporates ordering.
In a simultaneous-draw framing, the probability that exactly two blues and one red appear in a specific ordered arrangement is the same as we obtained, but you would have to count permutations for an ordered sequence. The chain-rule method is often simpler when the problem specifically states an ordered sequence.
Follow-up Question 2
Why do we multiply probabilities instead of adding them?
We multiply them because we are dealing with the intersection of multiple dependent events in a specific order. For dependent events, where the outcome of one draw changes the probability of what happens on subsequent draws, the joint probability is the product of conditional probabilities.
If you were dealing with a scenario of mutually exclusive events (like “drawing a ball that is either blue or red in one single draw”), you might add probabilities. But here, we want B1 and R2 and B3 all happening in that exact sequence, so we multiply.
Follow-up Question 3
What are the edge cases when r or b takes smaller values?
The problem requires at least 1 red ball and at least 3 blue balls, since we want two blues to appear in the sequence. If b < 3, you cannot draw Blue twice, and that probability becomes zero. Similarly, if r < 1, you cannot draw Red in the second draw. Hence, the formula only truly makes sense for r≥1 and b≥3. If the parameters do not satisfy these constraints, the probability of Blue, Red, Blue is obviously 0 because one of the required draws cannot occur.
Follow-up Question 4
How would this probability change if the draws were performed with replacement?
With replacement, each draw is an independent event. The probability of drawing a blue ball on any single draw would remain b / (r + b), and the probability of drawing a red ball on any single draw would remain r / (r + b). For a sequence of Blue, Red, Blue, you would simply multiply (b / (r + b)) × (r / (r + b)) × (b / (r + b)), because none of the urn’s contents change from one draw to the next. That is different from the without-replacement scenario where the composition of the urn changes after each draw.
Follow-up Question 5
Can we confirm the probability by using a direct count-based approach?
Yes. You can count the ways to pick an ordered triple of balls (Blue, Red, Blue). Specifically, the number of ways to pick a blue ball for the first position is b, then a red ball for the second is r, and then a blue ball for the third is b - 1 (assuming the first ball was already blue). Overall, that results in b × r × (b - 1) ordered choices. Meanwhile, the total number of ways to draw three balls in sequence from r + b is (r + b)(r + b - 1)(r + b - 2). Dividing the favorable outcomes by the total outcomes gives the same fraction we computed through conditional probability.
The conditional-probability approach is often clearer when the question explicitly focuses on sequential events, but the combinatorial approach is a good consistency check.
Follow-up Question 6
Why do we write the probability in terms of conditional probabilities rather than just counting?
Both approaches are correct, but conditional probability is especially intuitive for sequential draw problems. It makes each step transparent and straightforward to interpret. For instance, in many data science or machine learning settings where events happen step by step and conditions evolve over time, it is much more intuitive to update probabilities with each event. This is especially helpful in Markov chain modeling or Bayesian reasoning, where the chain rule of probability neatly expresses the sequence of events.
When dealing with real-time predictions (such as predicting that an event will happen on the first step, then another event on the second, etc.), expressing the probability via conditional probabilities often ties in neatly with how probabilities are updated incrementally.
Follow-up Question 7
How can this concept be applied in machine learning or data science contexts?
In machine learning, you often deal with sequential data or processes (for example, time series analysis, reinforcement learning, or Markov decision processes). Conditional probabilities allow you to model transitions from one state to another, updating your beliefs or predictions at each step. The approach demonstrated by the urn example generalizes to hidden Markov models, Bayesian networks, and other sequential models where the occurrence of one event modifies the probability of the next event.
This basic probability principle is also central in dealing with real-world data cleaning problems or hyperparameter search processes, where you might sample configurations sequentially without replacement. Understanding how probabilities evolve step by step is critical for designing and analyzing algorithms that adapt to changes in data distributions or environment states.