ML Interview Q Series: Markov Chain Ball Swapping: Probability of Reaching All Black State Within 5 Picks.
A jar contains three white balls and two black balls. Each time you pick at random one ball from the jar. If it is a white ball, a black ball is inserted instead; otherwise, a white ball is inserted instead. You continue until all balls in the jar are black. What is the probability that you need no more than five picks to achieve this?
Short Compact solution
The problem is equivalent to finding the probability that the process ends in either 3 picks or 5 picks. The probability of finishing in exactly 3 picks is 3/5 × 2/5 × 1/5 = 6/125.
The probability of finishing in exactly 5 picks also comes out to 6/125: exactly one of the first four draws must be a black ball (one black-to-white replacement), which must then be undone by an extra white draw, with the fifth draw completing the all-black jar. Summing these yields a total probability of 12/125.
Comprehensive Explanation
To understand why these events happen in exactly 3 or 5 picks, consider the sequence of transformations:
We start with 2 black and 3 white.
Each time we draw a ball:
If we draw a white ball (3 out of 5 at the start), we replace it with a black ball, increasing the number of black balls by 1.
If we draw a black ball (2 out of 5 at the start), we replace it with a white ball, decreasing the number of black balls by 1.
We are interested in the event that all 5 balls become black no later than the fifth draw.
Ending in exactly 3 picks
Reaching all black balls in 3 picks means that in the first 3 draws, you must have picked a white ball every single time (i.e., you keep adding black balls). Concretely:
Initially: 2 black, 3 white
First pick must be white (probability 3/5) → new state: 3 black, 2 white
Second pick must be white (now probability 2/5) → new state: 4 black, 1 white
Third pick must be white (probability 1/5) → new state: 5 black
Therefore, the probability is 3/5 × 2/5 × 1/5 = 6/125.
Ending in exactly 5 picks
Finishing exactly on the fifth pick means you do not succeed by the end of the third pick: among the first four picks, exactly one must be a black ball (a step backward from black to white), and the fifth pick must take you from 4 black to 5 black. The black draw can only occur on pick 1, 2, or 3, since three white draws in a row would already end the process:
Black on pick 1 (B, W, W, W, W): 2/5 × 4/5 × 3/5 × 2/5 × 1/5 = 48/3125
Black on pick 2 (W, B, W, W, W): 3/5 × 3/5 × 3/5 × 2/5 × 1/5 = 54/3125
Black on pick 3 (W, W, B, W, W): 3/5 × 2/5 × 4/5 × 2/5 × 1/5 = 48/3125
These sum to 150/3125 = 6/125. Adding the two probabilities (finishing in 3 picks or 5 picks) gives 12/125.
Markov chain viewpoint
Another way to see this problem is by treating the count of black balls as a Markov chain with possible states from 0 black to 5 black (though you start at state 2). You transition +1 black with probability (number of white)/(5) and −1 black with probability (number of black)/(5). The probability that you reach state 5 (all black) for the first time on pick 3 or pick 5 can be computed using standard Markov chain first-passage probabilities. In this small-state problem, direct enumeration of sequences (the approach used above) tends to be simpler.
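As a minimal sketch of that Markov chain computation (using numpy; the state is the number of black balls, and the variable names are ours), one can iterate the one-step transition matrix with state 5 made absorbing and read off the first-passage probabilities:

```python
import numpy as np

# States 0..5 = number of black balls; state 5 is absorbing.
P = np.zeros((6, 6))
for n in range(5):
    P[n, n + 1] = (5 - n) / 5   # draw white -> gain a black ball
    if n > 0:
        P[n, n - 1] = n / 5     # draw black -> lose a black ball
P[5, 5] = 1.0                   # all black: stay there

dist = np.zeros(6)
dist[2] = 1.0                   # start with 2 black balls
absorbed_by = [0.0]             # P(reached 5 black within k picks)
for k in range(1, 6):
    dist = dist @ P
    absorbed_by.append(dist[5])

# First-passage probability on pick k = absorbed_by[k] - absorbed_by[k-1]
first_passage = [absorbed_by[k] - absorbed_by[k - 1] for k in range(1, 6)]
print(first_passage)            # picks 1..5: nonzero only at picks 3 and 5
print(absorbed_by[5])           # should equal 12/125 = 0.096
```

The cumulative absorption probability after five steps reproduces the 12/125 from the direct enumeration.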
Example of a simulation approach
Below is a Python snippet illustrating how one might empirically verify the probability via Monte Carlo simulation:
import random

def simulate_one_run():
    # 3 white, 2 black initially
    black = 2
    white = 3
    picks = 0
    while black < 5:
        picks += 1
        # Draw a ball:
        # probability of drawing black = black/5,
        # probability of drawing white = white/5
        if random.random() < black / 5:
            # We drew black, so we convert it to white
            black -= 1
            white += 1
        else:
            # We drew white, so we convert it to black
            white -= 1
            black += 1
    return picks

N = 10_000_000
count_no_more_than_5 = 0
for _ in range(N):
    picks_needed = simulate_one_run()
    if picks_needed <= 5:
        count_no_more_than_5 += 1

print("Estimated probability:", count_no_more_than_5 / N)
If you run this simulation with a large number of trials, you will see the result approach 12/125 ≈ 0.096.
Possible Follow-Up Questions
Why can't the process end in exactly 4 picks?
To move from 2 black to 5 black, you need a net gain of 3 black balls. Each pick changes the black count by exactly 1 (up for a white draw, down for a black draw), so after k picks the net change has the same parity as k. A net change of +3 is odd, so the process can only end on an odd-numbered pick. Concretely, ending on pick 4 would require w white draws and b black draws with w − b = 3 and w + b = 4, which has no integer solution. Hence, 3 picks or 5 picks are the only ways to reach 5 black within 5 draws.
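The parity argument can be double-checked by exact enumeration. Below is a small sketch using `fractions.Fraction` (the recursion and names are ours) that computes the exact first-passage probability for each pick count:

```python
from fractions import Fraction

def first_passage(n_black, k):
    """Exact probability of first reaching 5 black on pick k, starting from n_black."""
    if n_black == 5:
        # Already all black: first passage happened at pick 0, never later.
        return Fraction(1) if k == 0 else Fraction(0)
    if k == 0:
        return Fraction(0)
    p_white = Fraction(5 - n_black, 5)   # draw white -> +1 black
    p_black = Fraction(n_black, 5)       # draw black -> -1 black
    total = p_white * first_passage(n_black + 1, k - 1)
    if n_black > 0:
        total += p_black * first_passage(n_black - 1, k - 1)
    return total

print([first_passage(2, k) for k in range(1, 6)])
```

Running this shows probability 6/125 at picks 3 and 5 and exactly 0 at picks 1, 2, and 4, matching the analysis.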
How would we extend this to different initial counts?
You could generalize the approach. Suppose you start with W white and B black balls, total T = W + B. Each draw changes a ball’s color from white to black or vice versa. If you want the probability of reaching all-black states within some fixed number of draws, you again track possible sequences of color draws. Alternatively, you set up the Markov chain with states 0 through T (the number of black balls) and calculate first-passage probabilities to the state T from the initial state B.
What is the expected number of picks to reach all black?
You can set up a system of linear equations for the expected time to go from a given number of black balls to T black. Specifically, let E(n) = expected number of picks to reach T black from n black. Then for n in [1, T−1]:
E(n) = 1 + (n/T) * E(n−1) + ((T−n)/T) * E(n+1)
with boundary conditions E(T) = 0 (you are already at T black) and E(0) = 1 + E(1), since from 0 black every ball is white and the next draw is certain to add a black ball. Solving this linear system gives the answer. In the specific case of starting with 2 black and 3 white, you solve the system for E(0) through E(4) and read off E(2).
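The system is small enough to solve directly. A sketch with numpy, assuming the boundary conditions stated above; for this jar it yields E(2) = 241/6 ≈ 40.17 picks:

```python
import numpy as np

# Unknowns E(0)..E(4), with E(5) = 0.
# Equation for each n: E(n) - (n/5) E(n-1) - ((5-n)/5) E(n+1) = 1.
A = np.zeros((5, 5))
b = np.ones(5)
for n in range(5):
    A[n, n] = 1.0
    if n > 0:
        A[n, n - 1] -= n / 5
    if n + 1 < 5:
        A[n, n + 1] -= (5 - n) / 5
    # The E(n+1) term vanishes when n + 1 == 5 because E(5) = 0.

E = np.linalg.solve(A, b)
print(E[2])   # expected number of picks starting from 2 black
```

Note that even though the walk starts at 2 black, the full system over states 0 to 4 is needed, because the chain can wander down to 1 or 0 black before finishing.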
How do we verify these theoretical probabilities in production code?
In practical data science or simulations, you could:
Write a simulation (like the code above).
Cross-check partial event probabilities (e.g., finishing in 3 picks, finishing in 5 picks) for consistency.
Use built-in libraries to handle Markov chain computations in more complex scenarios.
These checks confirm your analytical solution.
Below are additional follow-up questions
What if we wanted to know the probability of finishing in exactly 6 picks or more?
A crucial point in the original solution is that the process can only end in 3 or 5 picks if we restrict to no more than 5 picks. But you might wonder how this logic extends beyond 5 picks:
Detailed Explanation
After 5 picks, if the jar is not yet all black, the process continues. By the same parity argument as before, each pick changes the black count by ±1, so the net change after k picks has the same parity as k; a net gain of +3 requires an odd number of picks. Finishing exactly on pick 6 (or any even pick) is therefore impossible, and the next possible finishing times after 5 are 7, 9, 11, and so on.
In principle, we can track sequences or use Markov chain first-passage times. For the Markov chain approach, let p(n, k) = probability that starting from n black balls, you first reach 5 black on the kth pick. Then we can write recursive relationships based on the transition probabilities: from n black, you go to n+1 black with probability (5−n)/5, or to n−1 black with probability n/5. Summing p(2, k) over k = 3, 5, 7, … gives 1 on an infinite horizon.
A potential pitfall is forgetting that multiple transitions back and forth between states can occur, especially if the process takes many picks. Over more draws, you might see state changes such as 2 → 3 → 2 → 3 → 4 → 3 → 4 → 5, and so on.
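That first-passage recursion can be run as dynamic programming over a long horizon, confirming that the distribution lives on odd pick counts and sums to 1. A plain-Python sketch (names ours):

```python
# dist[n] = P(chain is at n black and has never yet hit 5), states 0..4
dist = [0.0] * 5
dist[2] = 1.0                        # start with 2 black balls
first_passage = {}
for k in range(1, 2001):
    new = [0.0] * 5
    hit = 0.0
    for n in range(5):
        if dist[n] == 0.0:
            continue
        up, down = (5 - n) / 5, n / 5
        if n + 1 == 5:
            hit += dist[n] * up      # first reach 5 black on pick k
        else:
            new[n + 1] += dist[n] * up
        if n > 0:
            new[n - 1] += dist[n] * down
    first_passage[k] = hit
    dist = new

print(first_passage[3], first_passage[5])   # 6/125 each
print(sum(first_passage.values()))          # approaches 1 over the long horizon
```

All even pick counts come out exactly zero, because mass alternates between even and odd states and the absorbing move (4 black → 5 black) can only fire on an odd pick.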
Is the process guaranteed to terminate in a finite number of draws?
Since every draw can flip a ball color in either direction, one might worry about the possibility of oscillation or cycling (e.g., going back and forth between states).
Detailed Explanation
This process, despite having transitions in both directions, is a finite-state Markov chain with states {0 black, 1 black, 2 black, 3 black, 4 black, 5 black}. A fundamental result in Markov chain theory states that if there is a positive probability to move from any state to the absorbing state (in this case, 5 black is absorbing), you will eventually reach it with probability 1, given infinite time.
In some draws, you might revert from 4 black to 3 black, or from 3 black back to 2 black. But there is always a nonzero probability of drawing the right color that moves you closer to 5 black. Over many draws, you will eventually accumulate enough white-ball picks to reach 5 black permanently.
A subtle point is the worry that the chain could get stuck in a loop, such as toggling between 2 black and 3 black forever. While it is possible to remain in these states for many draws, the probability of never leaving them over infinite draws is 0, because each draw provides a fresh opportunity to move toward 5 black.
How would we compute the variance or higher moments of the time to absorption?
It is often useful in stochastic processes to look beyond just probabilities of finishing by a certain pick and examine the distribution’s variance, skew, or other moments.
Detailed Explanation
One systematic method involves the Markov chain’s fundamental matrix. For a chain with transition matrix P (after removing the absorbing state), you can find the matrix N = (I − Q)^(-1), where Q is the submatrix of transition probabilities among transient states. Then the expected time spent in each transient state and higher moments can be derived from N.
To compute variance of the number of picks, you typically use formulas of the form Var(T) = E(T^2) − [E(T)]^2. The quantity E(T^2) can be found via specialized matrix derivations or known identities for absorbing Markov chains.
A possible pitfall is ignoring that partial dependencies between state transitions can complicate direct attempts at moment calculations without the Markov chain approach or a carefully enumerated set of paths.
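A sketch of that fundamental-matrix computation with numpy, using the standard absorbing-chain identities t = N·1 and Var(T) = (2N − I)t − t∘t (where ∘ is the elementwise product):

```python
import numpy as np

# Transient states 0..4 (number of black balls); state 5 is absorbing.
Q = np.zeros((5, 5))
for n in range(5):
    if n + 1 < 5:
        Q[n, n + 1] = (5 - n) / 5   # the move 4 -> 5 leaves the transient block
    if n > 0:
        Q[n, n - 1] = n / 5

N = np.linalg.inv(np.eye(5) - Q)    # fundamental matrix
t = N @ np.ones(5)                  # expected picks to absorption from each state
var = (2 * N - np.eye(5)) @ t - t * t

print(t[2])                         # expected picks from 2 black
print(var[2])                       # variance of the pick count from 2 black
```

The entry t[2] matches the answer from the linear-equation approach for E(2), which is a useful cross-check on the Q matrix.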
What if the process involved adding more balls to the jar instead of simply swapping colors?
Sometimes, a question might be modified: If you draw a white ball, you add a black ball in addition (so the jar grows in total size), or perhaps a real-world process where items get duplicated in some color.
Detailed Explanation
The entire probability space changes if the total number of balls is not fixed at 5. You lose the simple property that “probability of drawing a black ball is black_count/5.” Instead, if the jar size is dynamic, you must keep track of the changing denominator as well.
This becomes a branching-type problem if you are increasing the number of items. The probability that all balls eventually become black could actually be 1 or less than 1 depending on how additions occur.
A pitfall is to apply the same logic of states 0 to 5 black, ignoring that the total might exceed 5. In that scenario, the Markov chain would have a different structure, possibly infinite states if you can keep growing the jar.
Could the problem change if we labeled the balls distinctly (e.g., Ball #1, Ball #2, etc.)?
Another twist arises if the question cares about the identity of each ball rather than just the color counts.
Detailed Explanation
In the classical approach, all balls of the same color are identical. Hence, we only track the count of black or white balls.
If each ball had a unique label, we might ask: “What is the probability that each labeled ball eventually becomes black for the first time by pick 5?” However, the unlabeled version only cares about the total color composition.
A pitfall is to overcomplicate with labeled states when the problem only requires the aggregate number of black balls. Conversely, if an interviewer modifies the question to track each labeled ball’s color history, the solution approach and state space become more intricate.
How would you analyze the scenario if the probability of converting a ball was not 100% (e.g., we draw a white ball, but only with probability p do we switch it to black, otherwise we put it back as white)?
Sometimes in real-world processes, the “color switch” is not guaranteed. Instead, there is a probability p < 1 that the change actually happens.
Detailed Explanation
In that case, your transition probabilities from n black to (n+1) black or (n−1) black would become:
Probability of going from n to n+1 black = (T−n)/T * p.
Probability of going from n to n−1 black = n/T * p.
Probability of staying in the same state = 1 − p (when a ball does not get converted).
You still have a Markov chain with T+1 states (0 to T black). However, when p < 1 a draw is no longer guaranteed to change the state, so the time to absorption grows and the analysis becomes somewhat more involved.
A common pitfall is to assume you can keep the same logic that each draw definitely flips a color. Now, a certain fraction of draws will do nothing, which can increase the expected time to reach all black and change the probabilities of finishing by certain pick counts.
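Under the model above (a drawn ball is converted with probability p regardless of color, else returned unchanged), every effective move is preceded by a geometric number of do-nothing draws, so expected times simply scale by 1/p. A sketch that solves the modified linear system, using p = 0.5 purely for illustration:

```python
import numpy as np

def expected_picks(p):
    # E(n) = 1 + p*(n/5)*E(n-1) + p*((5-n)/5)*E(n+1) + (1-p)*E(n), with E(5) = 0,
    # which rearranges to: p*E(n) - p*(n/5)*E(n-1) - p*((5-n)/5)*E(n+1) = 1.
    A = np.zeros((5, 5))
    b = np.ones(5)
    for n in range(5):
        A[n, n] = p
        if n > 0:
            A[n, n - 1] -= p * n / 5
        if n + 1 < 5:
            A[n, n + 1] -= p * (5 - n) / 5
    return np.linalg.solve(A, b)

E_full = expected_picks(1.0)   # the original process
E_half = expected_picks(0.5)   # conversions succeed only half the time
print(E_full[2], E_half[2])    # the latter is exactly twice the former
```

With p = 1 this reduces to the original chain, and halving p doubles every expected hitting time, confirming the 1/p scaling.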
Could negative binomial or other discrete distributions be relevant here?
Sometimes, instead of enumerating paths, a candidate might mention well-known discrete distributions like geometric, negative binomial, or binomial to short-circuit the problem.
Detailed Explanation
The direct mapping to a negative binomial distribution does not hold here because each pick changes the composition in a self-referential way. In a negative binomial setting, each trial is independent with a fixed probability of “success,” but in our scenario, the probability of drawing white changes after each pick.
While it might look at first like “We need 3 successes (white draws) before a certain number of failures,” that is not entirely correct. Each success changes the fraction of white/black balls in the jar. Hence we cannot treat the draws as independent or identically distributed.
A pitfall is to try to directly apply a negative binomial formula without noticing that the probability of success is not constant across draws. The Markov chain approach or direct enumeration are safer methods.
How might approximate methods (like continuous approximations or large-limit approximations) help if we scaled up the jar size?
For much larger T, enumerating paths becomes infeasible.
Detailed Explanation
One could use a diffusion approximation, which treats the fraction of black balls as a continuous variable x in [0, 1]. Then, the fraction changes by ±1/T in each step with probabilities depending on x. This can be modeled by a stochastic differential equation (SDE) for large T.
Such SDE approximations (e.g., Ornstein–Uhlenbeck or a drift-diffusion process) give insight into how quickly we expect to hit x = 1. They are approximate but can be very accurate when T is large.
A key pitfall is ignoring the boundary behavior near x = 1 or x = 0. In small jars, boundary effects are huge (1 out of 5 is a big fraction). For large T, boundary effects matter less in early or mid dynamics but are still crucial near x = 1, so specialized boundary-correction techniques might be needed.
How do we handle concurrency or parallel draws in a real-world system?
Imagine a scenario where multiple draws happen in parallel (e.g., several picks occur simultaneously from disjoint subsets of the jar). While not exactly the same problem, it’s a question interviewers might pose to see if you can generalize.
Detailed Explanation
Parallel draws from the same jar complicate the model. If you draw from disjoint subsets, the transitions are not strictly a single-ball replacement event. You can flip multiple balls at once.
You could approximate by splitting the jar into smaller independent compartments, but strictly, each parallel draw still depends on the overall state if the compartments overlap.
A pitfall is to assume you can just multiply probabilities from independent single draws. Once concurrency is introduced, the single-step transitions are replaced by multi-step transitions that might jump multiple states at once in a Markov chain perspective.
What is a simple sanity check on the probabilities computed?
Even in a small problem, it’s good practice to see if probabilities sum to 1 over all possible pick counts.
Detailed Explanation
The chain must eventually reach 5 black, so the probabilities that it finishes on pick k must sum to 1 over k = 3, 5, 7, … (all even pick counts are impossible by the parity argument).
A quick sanity check is to confirm that p(3) + p(5) = 12/125 for finishing by pick 5, which is consistent with simulation results.
A potential error is forgetting that the distribution over finishing times extends beyond 5 picks. If your sum of probabilities for picks 3, 4, and 5 already exceeds 1, that’s a sure sign your computations are incorrect.
Could these probabilities be related to gambler’s ruin probabilities?
This process has similarities to gambler's ruin, where a random walk moves between two boundaries; here the absorbing boundary is at 5 black. Note that 0 black is not absorbing in this problem, since from an all-white jar the next draw is certain to add a black ball.
Detailed Explanation
Gambler’s ruin typically says that if a gambler starts with i dollars and can either gain or lose 1 dollar on each bet with certain probabilities, the probability of eventually reaching some total M is well-studied. Here, “i” corresponds to the initial number of black balls (2) and “M” to 5 black. Unlike the classic setup, however, 0 black is not an absorbing boundary: the walk simply bounces back up.
The difference is that gambler’s ruin often has a fixed probability p of winning each bet. In this ball problem, p changes as the fraction of black to white changes. Thus, it’s not exactly the classic gambler’s ruin.
A common pitfall is to assume gambler’s ruin formula applies directly. In our scenario, the fraction of black vs. white changes with each pick, so the probability of “winning” or “losing” at each step is dynamic rather than constant.
Could we accidentally misinterpret the procedure if we draw a black ball?
One final subtlety is that “otherwise, a white ball is inserted” could be misread as an event separate from removing the black ball. It is critical to confirm whether the black ball is removed and replaced by a white ball, or whether a white ball is added in addition to the black ball.
Detailed Explanation
The correct interpretation is that if you draw a black ball, you remove it from the jar and insert a white ball in its place, so the total count remains 5.
A misinterpretation might lead you to incorrectly model the jar’s size as 6 if you think you’re adding a white ball on top of the black ball that was drawn.
This drastically changes the probability structure if you do not keep the total at 5. So always confirm the logic that “insert instead” means removal+insertion of the same slot, not an additional ball.