ML Interview Q Series: Probability Calculation for Most Likely Sequence with a Biased Die.
Question
A six-sided die has four green and two red faces and is balanced so that each face is equally likely to come up. The die will be rolled several times. You must choose one of the following three sequences of colours; you will win £25 if the first rolls of the die give the sequence that you have chosen:
R G R R R
G R G R R R
G R R R R R
Without making any calculations, explain which sequence you choose. (In a psychological experiment, 63% of 260 students who had not studied probability chose the second sequence. This is evidence that our intuitive understanding of probability is not very accurate. These and other similar experiments are reported by A. Tversky and D. Kahneman, “Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment,” Psychological Review 90 (1983), pp. 293–315.)
Short Compact solution
The second sequence is the first one with an extra green in front. To win with G R G R R R, roll 1 must be green and rolls 2 to 6 must then show R G R R R, which they do with exactly the same probability as rolls 1 to 5 would. The third sequence opens with G R R R R. That pattern has the same colours as R G R R R (one green, four reds) and hence the same probability, and it is then followed by one more red. So each of the other two sequences amounts to a pattern exactly as likely as the first sequence, plus one extra roll. Since that extra roll succeeds with probability less than one, both are strictly less probable. Hence, without explicit calculation, the first sequence, R G R R R, has the highest probability of occurring.
A direct calculation also confirms this. If p(R) = 1/3 and p(G) = 2/3, then:
Probability of R G R R R is (1/3)^4 × (2/3) = 2/243
Probability of G R G R R R is (2/3)^2 × (1/3)^4 = 4/729
Probability of G R R R R R is (2/3) × (1/3)^5 = 2/729
Numerically, 2/243 is about 0.0082, which is greater than 0.0055 (4/729) and also greater than 0.0027 (2/729). Hence, the first sequence is indeed the most probable.
Comprehensive Explanation
Probability Setup
When a fair six-sided die has four green faces and two red faces, the probability of rolling a red face (R) is 1/3, and the probability of rolling a green face (G) is 2/3. Because each roll is independent of previous rolls, the probability of any specific sequence of outcomes is simply the product of the probabilities of each individual outcome.
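As a minimal sketch of this product rule, the per-roll probabilities can be derived from the face counts and multiplied with Python's `fractions.Fraction`, which keeps the arithmetic exact. The helper name `sequence_probability` is our own and does not come from the source.

```python
from fractions import Fraction

# Face counts taken from the question: four green, two red, all equally likely.
GREEN_FACES, RED_FACES = 4, 2
TOTAL_FACES = GREEN_FACES + RED_FACES

# Per-roll colour probabilities as exact fractions.
P = {"G": Fraction(GREEN_FACES, TOTAL_FACES), "R": Fraction(RED_FACES, TOTAL_FACES)}

def sequence_probability(colours: str) -> Fraction:
    """Probability that independent rolls produce `colours` in exactly this order."""
    prob = Fraction(1)
    for colour in colours:
        prob *= P[colour]  # independence: multiply per-roll probabilities
    return prob

print(P["G"], P["R"])              # 2/3 1/3
print(sequence_probability("RG"))  # 2/9
```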
Key Observations Without Direct Calculation
Same Colour Counts in the First Five Rolls: The first five rolls of the third sequence, G R R R R, contain one green and four reds, exactly like the first sequence, R G R R R. The probability of an ordered sequence depends only on how many greens and reds it contains, so these two five-roll patterns are equally likely.
Extra Roll: The second and third sequences each need one more roll than the first.
- The third sequence is G R R R R followed by an extra R, so its probability is P(R G R R R) × p(R).
- The second sequence, G R G R R R, is the first sequence with an extra G placed in front. Rolls 2 to 6 show R G R R R with the same probability as rolls 1 to 5 do, so its probability is p(G) × P(R G R R R).

In each case the first sequence's probability is multiplied by a number strictly less than 1, which reduces it.
Hence, “R G R R R” is more likely than either “G R G R R R” or “G R R R R R”. Each of the other two sequences amounts to a pattern exactly as likely as the first, plus one extra roll that may fail.
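Both observations can be checked with exact arithmetic. This sketch assumes the three sequences are R G R R R, G R G R R R and G R R R R R, and the helper `prob` is a hypothetical name. It confirms that the second and third probabilities are the first one times p(G) and times p(R) respectively.

```python
from fractions import Fraction

p = {"G": Fraction(2, 3), "R": Fraction(1, 3)}

def prob(colours: str) -> Fraction:
    # Product of per-roll probabilities, valid because rolls are independent.
    result = Fraction(1)
    for c in colours:
        result *= p[c]
    return result

seq1, seq2, seq3 = "RGRRR", "GRGRRR", "GRRRRR"

# Sequence 2 is an extra G followed by sequence 1: one extra factor of p(G).
assert seq2 == "G" + seq1
print(prob(seq2) / prob(seq1))  # 2/3

# Sequence 3 opens with G R R R R, which has the same colour counts as
# sequence 1 (hence the same probability), then needs one extra R.
assert prob(seq3[:5]) == prob(seq1)
print(prob(seq3) / prob(seq1))  # 1/3
```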
Direct Numerical Calculation
We label the three sequences as follows:
Sequence 1: R G R R R
Sequence 2: G R G R R R
Sequence 3: G R R R R R
Let p(R) = 1/3 and p(G) = 2/3.
Probability of Sequence 1
The factor (1/3) appears four times (once for each R) and (2/3) appears once (for the G): P(R G R R R) = (1/3)^4 × (2/3) = 2/243 ≈ 0.0082.
Probability of Sequence 2
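The factor (2/3) appears twice (for the two G's) and (1/3) four times: P(G R G R R R) = (2/3)^2 × (1/3)^4 = 4/729 ≈ 0.0055, which is smaller than 2/243. Equivalently, it is p(G) × 2/243: the probability of sequence 1 times the probability of the extra leading green.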
Probability of Sequence 3
The factor (2/3) appears once and (1/3) five times: P(G R R R R R) = (2/3) × (1/3)^5 = 2/729 ≈ 0.0027, which is even smaller (it equals p(R) × 2/243). Therefore, the first sequence is the most probable of the three.
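The following short sketch reproduces the three calculations exactly and ranks the sequences. The names `sequences` and `probs` are illustrative, not from the source, and `math.prod` requires Python 3.8 or later.

```python
import math
from fractions import Fraction

p = {"G": Fraction(2, 3), "R": Fraction(1, 3)}
sequences = {"Sequence 1": "RGRRR", "Sequence 2": "GRGRRR", "Sequence 3": "GRRRRR"}

# Exact probability of each sequence: product of per-roll probabilities.
probs = {name: math.prod(p[c] for c in s) for name, s in sequences.items()}

# List the sequences from most to least probable.
for name, pr in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pr} ≈ {float(pr):.4f}")
```

Running it should list Sequence 1 (2/243 ≈ 0.0082), then Sequence 2 (4/729 ≈ 0.0055), then Sequence 3 (2/729 ≈ 0.0027).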
Underlying Principle of Independence
In independent trials, the probability of a particular ordered sequence of outcomes is the product of the probabilities of the individual outcomes. It therefore depends only on how many greens and reds the sequence specifies, not on their order. Suppose one sequence is another pattern of equal probability plus one additional specified roll. Then its probability is that pattern's probability multiplied by the probability of the extra outcome. Every colour has probability strictly less than 1 (here 1/3 or 2/3), so adding a roll always reduces the overall probability.
Common Pitfalls and Additional Insights
Conjunction Fallacy: People often judge a sequence that “feels more representative” to be more likely. G R G R R R contains two greens, so it looks more like the output of a mostly green die than R G R R R does. Yet it requires R G R R R (on rolls 2 to 6) and an extra green as well. Rating it as more probable is an instance of the conjunction fallacy, driven by the representativeness heuristic.
Order vs. Count: For an ordered sequence of fixed length, only the number of greens and reds matters, so R G R R R and G R R R R are equally likely. What separates the three given sequences is their length. Each additional specified roll multiplies the probability by 1/3 (for R) or 2/3 (for G), which strictly decreases it.
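The conjunction can also be seen without the p(G), p(R) shortcut, by enumerating all 6^6 = 46,656 equally likely face outcomes of six rolls. In this sketch (the helper `prob_of_event` is hypothetical), winning with G R G R R R is a sub-event of “rolls 2 to 6 show R G R R R”. That event is exactly as likely as winning with R G R R R.

```python
from fractions import Fraction
from itertools import product

# The six equally likely faces, labelled by colour (four green, two red).
FACES = ["G", "G", "G", "G", "R", "R"]

def prob_of_event(event, n_rolls=6):
    """Exact probability of `event`, a predicate on a tuple of rolled colours,
    obtained by enumerating all 6**n_rolls equally likely face outcomes."""
    hits = sum(1 for rolls in product(FACES, repeat=n_rolls) if event(rolls))
    return Fraction(hits, len(FACES) ** n_rolls)

p_seq1 = prob_of_event(lambda r: r[:5] == tuple("RGRRR"))     # win with sequence 1
p_seq2 = prob_of_event(lambda r: r == tuple("GRGRRR"))        # win with sequence 2
p_shifted = prob_of_event(lambda r: r[1:] == tuple("RGRRR"))  # R G R R R on rolls 2-6

print(p_seq1, p_shifted, p_seq2)  # 2/243 2/243 4/729
```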
Potential Follow-Up Questions
1) How might we simulate this scenario in Python to verify empirically?
You can run a large number of Monte Carlo trials. In each trial, roll a virtual die six times (enough to settle all three bets) and record which sequences appear at the start of the rolls:
```python
import random

def roll_die():
    # 4 green (G) faces and 2 red (R) faces, each face equally likely
    faces = ['G', 'G', 'G', 'G', 'R', 'R']
    return random.choice(faces)

def simulate(n=1_000_000):
    # Pure Python is slow; raise n (e.g. to 10_000_000) for tighter estimates
    seq1 = ['R', 'G', 'R', 'R', 'R']       # five rolls
    seq2 = ['G', 'R', 'G', 'R', 'R', 'R']  # six rolls
    seq3 = ['G', 'R', 'R', 'R', 'R', 'R']  # six rolls
    seq1_count = seq2_count = seq3_count = 0
    for _ in range(n):
        rolls = [roll_die() for _ in range(6)]
        # Check whether the opening rolls match each sequence
        if rolls[:5] == seq1:
            seq1_count += 1
        if rolls == seq2:
            seq2_count += 1
        if rolls == seq3:
            seq3_count += 1
    print("Empirical Probability Seq1:", seq1_count / n)
    print("Empirical Probability Seq2:", seq2_count / n)
    print("Empirical Probability Seq3:", seq3_count / n)

simulate()
```
As n grows large, the empirical probabilities for each sequence will converge to the theoretical values: approximately 0.0082 (first), 0.0055 (second), and 0.0027 (third).
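The three estimates can only be told apart reliably when the sampling error is well below the gaps between the theoretical values (about 0.0027). As a rough guide, assume each trial is an independent Bernoulli draw; the standard error of a sample proportion is then sqrt(p(1 − p)/n). This sketch tabulates it for two values of n, and the helper name `standard_error` is ours.

```python
import math

# Theoretical probabilities of the three sequences.
THEORY = {"Seq1": 2 / 243, "Seq2": 4 / 729, "Seq3": 2 / 729}

def standard_error(p, n):
    # Standard deviation of the sample proportion over n independent trials.
    return math.sqrt(p * (1 - p) / n)

for n in (10_000, 1_000_000):
    for name, p in THEORY.items():
        print(f"n={n:>9,}  {name}: {p:.4f} ± {standard_error(p, n):.5f}")
```

With n = 1,000,000 all three standard errors are below 0.0001, so the estimates should be clearly separated. With n = 10,000 they are roughly 0.0005 to 0.0009, so a single run could occasionally rank the sequences wrongly.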
2) What if we only asked whether the first five rolls contained one green and four reds, without caring about the order?
The number of greens in five independent rolls follows a binomial distribution with 5 trials and success probability p(G) = 2/3 (so each red has probability 1/3). So we compute:
Probability = (Number of ways to choose where the single green appears) × (2/3)^1 × (1/3)^4 = 5 × 2/243 = 10/243 ≈ 0.041.
Each of the five orderings with one green and four reds, including R G R R R and G R R R R, has the same probability, 2/243. This is why the first five rolls of the third sequence are exactly as likely as the whole of the first sequence. The differences among the given sequences arise only once you fix the length of the sequence and the colour of every roll in it.
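The following minimal sketch evaluates the binomial formula with `math.comb` and cross-checks it by listing the distinct orderings of one green and four reds. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

p_G, p_R = Fraction(2, 3), Fraction(1, 3)

# Binomial formula: choose the position of the single green, then multiply.
binomial = comb(5, 1) * p_G**1 * p_R**4
print(binomial, f"{float(binomial):.4f}")  # 10/243 0.0412

# Cross-check: list every distinct ordering of one G and four Rs.
orderings = sorted(set(permutations("GRRRR")))
each = [p_G ** seq.count("G") * p_R ** seq.count("R") for seq in orderings]
print(len(orderings), set(each))  # 5 {Fraction(2, 243)}
assert sum(each) == binomial
```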
3) Could the second sequence ever be more probable if the die were biased differently?
Not relative to the first sequence, for any die on which both colours can appear. G R G R R R is R G R R R preceded by a green, so its probability is p(G) × P(R G R R R), which is strictly smaller whenever 0 < p(G) < 1. The same holds for the third sequence: G R R R R R is G R R R R (as likely as R G R R R) followed by a red, so its probability is p(R) × P(R G R R R). The bias does decide how the second and third sequences compare. Their probabilities are p(G)^2 × p(R)^4 and p(G) × p(R)^5, so their ratio is p(G)/p(R). The second beats the third exactly when green is more likely than red, as it already is here (4/729 versus 2/729).
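These claims can be spot-checked by sweeping the bias. This sketch assumes the three sequences are R G R R R, G R G R R R and G R R R R R. It tries P(G) = 1/20, 2/20, …, 19/20 with exact fractions, and the function name `probs` is hypothetical.

```python
from fractions import Fraction

def probs(p_g):
    """Probabilities of the three sequences for a die with P(G) = p_g."""
    p_r = 1 - p_g
    seq1 = p_r**4 * p_g     # R G R R R
    seq2 = p_g**2 * p_r**4  # G R G R R R
    seq3 = p_g * p_r**5     # G R R R R R
    return seq1, seq2, seq3

for k in range(1, 20):
    p_g = Fraction(k, 20)
    s1, s2, s3 = probs(p_g)
    assert s2 < s1 and s3 < s1             # neither ever beats sequence 1
    assert (s2 > s3) == (p_g > 1 - p_g)    # 2 beats 3 iff P(G) > P(R)
print("all checks passed")
```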
4) Does the order of outcomes matter if we just care about the total counts of red and green in the first six rolls?
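Suppose the question were only about the total counts of red and green (e.g., “exactly one green and five reds in the first six rolls”). Then every arrangement of one green and five reds would count, and each arrangement has the same probability, (2/3) × (1/3)^5 = 2/729. The count event therefore has probability 6 × 2/729 = 4/243 ≈ 0.016, six times that of any single arrangement. Once the exact order is specified (e.g., “the first roll is red, the second is green, the third is red, and so on”), only one arrangement qualifies. The order within a sequence does not change its probability, but the number of rolls specified and the colour counts do, which is why the three given sequences differ.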
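The following sketch enumerates every ordered six-roll colour sequence with exactly one green. It shows that each arrangement has the same probability and that the count-only event is their sum. The variable names are our own.

```python
import math
from fractions import Fraction
from itertools import product

p = {"G": Fraction(2, 3), "R": Fraction(1, 3)}

# All ordered six-roll colour sequences that contain exactly one green.
one_green = [s for s in product("GR", repeat=6) if s.count("G") == 1]
arrangement_probs = [math.prod(p[c] for c in s) for s in one_green]

print(len(one_green))          # -> 6
print(set(arrangement_probs))  # -> {Fraction(2, 729)}: order does not matter
print(sum(arrangement_probs))  # -> 4/243: the count-only event
```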
5) How does this relate to real-world data science or machine learning scenarios?
In practice, this demonstrates how humans often underestimate or overestimate probabilities based on intuitive heuristics, especially in sequential data. In ML, especially when designing time-series models or generative models (e.g., language models), it is crucial to remember that the probability of a specific sequence is typically the product of step-by-step conditional or independent probabilities. Misinterpretation of independence or the influence of additional conditions can lead to suboptimal models or incorrect conclusions about data patterns.
Such examples underscore the importance of carefully distinguishing between intuitive hunches about probability and the strict mathematical definitions in a data-driven or model-based environment.
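Sequence models usually compute this product in log space. They sum log P(x_t | x_1, …, x_{t-1}) over positions (the chain rule), which avoids floating-point underflow for long sequences. The sketch below is our own illustration, not a specific library's API: `sequence_log_prob` and `die_roll` are hypothetical names. The die's conditional probability simply ignores the history because the rolls are independent.

```python
import math
from typing import Callable

# Hypothetical interface: cond_prob(history, x) returns P(x | history).
CondProb = Callable[[str, str], float]

def sequence_log_prob(seq: str, cond_prob: CondProb) -> float:
    """Chain rule in log space: log P(x_1..x_T) = sum_t log P(x_t | x_1..x_{t-1})."""
    return sum(math.log(cond_prob(seq[:t], x)) for t, x in enumerate(seq))

def die_roll(history: str, x: str) -> float:
    # Independent rolls: the history is ignored.
    return {"G": 2 / 3, "R": 1 / 3}[x]

for seq in ("RGRRR", "GRGRRR", "GRRRRR"):
    lp = sequence_log_prob(seq, die_roll)
    print(f"{seq}: log P = {lp:.4f}, P = {math.exp(lp):.4f}")
```

Swapping `die_roll` for a function that does look at `history` (say, a Markov model) would leave `sequence_log_prob` unchanged. That separation is the design point of the sketch.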