ML Interview Q Series: Calculating Dice Roll Probabilities Using Combinatorics and the Inclusion-Exclusion Principle.

May 15, 2025

Browse all the Probability Interview Questions here.

You roll a fair die six times. (a) What is the probability that three of the six possible outcomes do not show up and each of the other three possible outcomes shows up two times? (b) What is the probability that some outcome shows up at least three times?

Short Compact solution

For part (a), the probability that three of the six faces never appear and the remaining three faces each appear exactly twice is

For part (b), define A_i as the event that face i appears exactly three times, for i = 1,...,6. By applying the inclusion-exclusion principle and then adding the probability that some face shows up exactly 4, 5, or 6 times, we get

Probability(some outcome appears at least three times) = 0.3151 + 0.0522 = 0.3672.

Comprehensive Explanation

Part (a): Probability that exactly three faces each appear twice

When rolling a fair six-sided die six times, we have 6⁶ equally likely outcomes. We want the probability that exactly three distinct faces show up, with each of those three faces appearing exactly twice.

To count the number of favorable outcomes, we do the following. First, we choose which 3 faces (out of the 6) will appear. Then, once those faces are chosen, we must distribute exactly two appearances to each of these three chosen faces across the 6 rolls.

Step 1: Choose the 3 faces out of 6. The number of ways to do that is (6 choose 3).

Step 2: We have 6 distinct roll positions, and we want to place exactly 2 occurrences of face 1, 2 occurrences of face 2, and 2 occurrences of face 3. The multinomial coefficient for arranging these three faces in the six slots can be broken down as a product of binomial coefficients: Choose 2 positions (out of 6) for the first chosen face, then 2 of the remaining 4 positions for the second chosen face, and finally 2 of the remaining 2 positions for the third chosen face. This gives (6 choose 2)(4 choose 2)(2 choose 2).

Multiplying these counts and dividing by 6⁶ (the total number of possible outcomes of 6 rolls) gives the desired probability:

This is the probability that exactly three faces appear and that each such face appears exactly twice.

Part (b): Probability that some face appears at least three times

We want the probability that there is at least one face i for which the count of that face is 3 or more in six rolls. We split the calculation into two parts:

The probability that some face appears exactly three times.
The probability that some face appears exactly 4, 5, or 6 times.

We first define A_i as the event "face i appears exactly three times." We use the inclusion-exclusion principle to find P(A₁ ∪ A₂ ∪ ... ∪ A₆).

Probability that a specific face appears exactly three times

To have a specific face i appear exactly three times, we choose which 3 roll positions (out of 6) yield face i, and the remaining 3 positions must show any of the other 5 faces:

P(A_i) = (6 choose 3) * (5^3) / 6⁶.

Using inclusion-exclusion for events A₁, ..., A₆

Inclusion-exclusion for two events A_i and A_j is used to correct for double counting:

P(A_i ∪ A_j) = P(A_i) + P(A_j) – P(A_i ∩ A_j).

For our six events:

P(A₁ ∪ ... ∪ A₆) = ∑ P(A_i) – ∑ P(A_i ∩ A_j) + (higher-order terms).

However, the main contributions come from: ∑ P(A_i) over i = 6 * P(A₁) (by symmetry), ∑ P(A_i ∩ A_j) over i ≠ j.

It turns out that P(A_i ∩ A_j) = ( (6 choose 3)(3 choose 3) / 6⁶ ) or something that must be carefully computed: it’s the probability that face i appears exactly three times and face j appears exactly three times. But the final result from the snippet is:

P(A₁ ∪ ... ∪ A₆) = 6 × P(A₁) – (6 choose 2) × P(A₁ ∩ A₂) = 0.3151 (approximately).

Probability of having a face appear 4, 5, or 6 times

Next, we add the probability that some face shows up exactly 4 times, plus exactly 5 times, plus exactly 6 times. The probability that a specific face shows up k times is:

( (6 choose k) (5^(6 – k)) ) / 6⁶.

So the probability that “some face” shows up exactly k times is 6 times that (because there are 6 faces). Summing from k = 4 to 6, we get around 0.0522.

Adding these contributions together:

0.3151 (from at least one face appearing exactly three times) + 0.0522 (from at least one face appearing 4, 5, or 6 times) ≈ 0.3673.

Hence the probability is approximately 0.3672 or 0.3673 depending on rounding.

Potential Follow-Up Question 1: Why do we use the inclusion-exclusion principle in part (b)?

When we say "some outcome shows up exactly three times," that event can happen for any of the 6 faces. If we simply summed P(A₁), P(A₂), ..., P(A₆), we would double-count the scenarios in which two different faces both appear three times. Inclusion-exclusion corrects for that double counting by subtracting off P(A_i ∩ A_j) for i ≠ j.

In more detail, to find P(A₁ ∪ ... ∪ A₆):

P(A₁ ∪ ... ∪ A₆) = ∑ P(A_i) – ∑ P(A_i ∩ A_j) + ∑ P(A_i ∩ A_j ∩ A_k) – ...

Higher-order intersections (like three different faces each appearing exactly three times) have zero probability if all rolls are only six in total, so the series stops early. This is why inclusion-exclusion is the cleanest way to account for overlapping events.

Potential Follow-Up Question 2: Could we directly compute the probability for part (b) without inclusion-exclusion?

It is theoretically possible but typically more cumbersome. One could enumerate each possible pattern (e.g., exactly one face appears 3 times, a different face appears 4 times, etc.) but that quickly becomes complicated. Inclusion-exclusion offers a systematic approach to avoid overcounting or undercounting when dealing with multiple overlapping events.

Potential Follow-Up Question 3: How does this generalize to n rolls or dice with more faces?

For n rolls of a fair m-sided die, the event “some face appears at least k times” would again involve partitioning the n outcomes, and you would similarly use inclusion-exclusion if you needed the probability that at least one face meets some frequency threshold. The counting arguments would generalize as follows: choose which face(s) meet the threshold, count the number of ways the dice rolls can yield those thresholds, and carefully handle overlap among events with multiple faces meeting the threshold at once. As n or m grows, direct enumeration becomes more difficult, and inclusion-exclusion or generating functions can help handle the combinatorial complexity more systematically.

Potential Follow-Up Question 4: How can we implement a simulation to verify these probabilities?

One can write a simple Python script using random die rolls, run it many times, and estimate the probabilities by the empirical frequency of the events. For instance:

import random
import math

trials = 10_000_000
count_a = 0  # Count how many times part (a) event happens
count_b = 0  # Count how many times part (b) event happens

for _ in range(trials):
    rolls = [random.randint(1, 6) for __ in range(6)]
    face_counts = [rolls.count(i) for i in range(1, 7)]

    # Part (a): check if exactly three faces have count=2
    if face_counts.count(2) == 3:
        count_a += 1

    # Part (b): check if any face count >= 3
    if any(fc >= 3 for fc in face_counts):
        count_b += 1

prob_a = count_a / trials
prob_b = count_b / trials

print("Estimated Prob (Part a):", prob_a)
print("Estimated Prob (Part b):", prob_b)

This Monte Carlo approach will, with enough trials, approximate the analytical results 0.0386 and 0.3672, confirming the correctness of the derived probabilities.

Below are additional follow-up questions

Potential Follow-Up Question 1: How would the probability change if the die were biased (not fair)?

One subtle issue is that the entire reasoning for parts (a) and (b) assumes each face has a probability 1/6. If the die is biased, then the probability model 6^6 for equally likely outcomes is no longer valid. Instead, each die face i would have a probability p_i (with p₁ + ... + p₆ = 1), and the probability of a particular sequence of length 6 would be the product of the corresponding p_i values for each roll. This complicates both questions:

For part (a), the multinomial coefficient logic still applies for counting how many ways to assign rolls to faces, but each assignment has a different probability depending on p₁, ..., p₆. The expression would become a multinomial expansion:
Probability = ( (6 choose 2) (4 choose 2) (2 choose 2) ) × p_i1² p_i2² p_i3²
but we would sum over all ways of choosing which 3 faces appear exactly twice.
For part (b), the inclusion-exclusion principle can still be used, but P(A_i) = Probability that face i appears exactly 3 times is (6 choose 3) p_i³ (1 − p_i)³ only if we assume “any other face” is grouped together. In reality, we would need to carefully treat the events "face i appears k times" by considering all possible distributions of the remaining rolls among the other faces. So it’s much more complicated than in the fair case, because p_i can vary.

A big pitfall is to assume that the simple binomial or multinomial counts for the fair case still apply directly, which would yield incorrect answers if the die is loaded or if probabilities differ across faces.

Potential Follow-Up Question 2: What if we want the probability that no face appears at least three times?

Sometimes in problems like part (b), people might be interested in the complement event: “No face appears three or more times,” which translates to “each face appears at most twice.” The approach used in part (b) can be adapted:

Probability(no face appears at least 3 times) = 1 − Probability(some face appears at least 3 times).
From part (b), we have Probability(some face appears at least 3 times) ≈ 0.3672. So the probability that no face appears three or more times is roughly 1 − 0.3672 = 0.6328.

One subtlety is that this approach hinges on the correctness of the computed probability for part (b). If there were any mistake or if the die isn’t fair, the resulting complement would also be off. Another pitfall is that events like “no face appears at least three times” can be enumerated by considering patterns like “(2,2,1,1,0,0), (2,2,2,0,0,0), (2,1,1,1,1,0), etc.,” but enumerating all patterns directly can get complicated, especially if the number of rolls increases or if there are more faces.

Potential Follow-Up Question 3: Could we use generating functions or probability mass functions for this problem?

Yes, generating functions can be employed to handle the distribution of counts across multiple die faces:

For a single die face with probability 1/6, the probability generating function for that face in one roll is G_single(x) = 1/6 ⋅ x + 5/6 ⋅ 1 (because with probability 1/6 you “gain” one count of that face, and with probability 5/6 you gain none).
For six rolls, you take [G_single(x)]⁶, and then the coefficient of x^k in that expansion tells you the probability that this face appears k times. But to see how at least one face appears three or more times, you’d need to consider multiple faces simultaneously, or approach with multivariate generating functions. That method can be powerful but is not trivial to implement for multiple faces. One can also combine it with inclusion-exclusion or with expansions over all faces.

A subtlety is that the dimensionality of the generating function grows quickly if we track all faces simultaneously. A typical pitfall might be to oversimplify by ignoring the joint distribution across faces and only focus on one face at a time, which can lead to undercounting or overcounting when multiple faces have significant frequencies.

Potential Follow-Up Question 4: How reliable is the result with a finite number of simulation runs?

While a simulation can verify our probabilities, there are practical pitfalls:

Randomness and Variance: With fewer simulation trials, the estimates for low-probability events (like a face appearing exactly 6 times) can have high variance. You might need millions of runs to get a stable estimate within a small margin of error.
Confidence Intervals: If we want to be sure the estimated probability is within, say, ±0.001 of the true probability with high confidence, we have to compute confidence intervals (for instance, using a normal approximation or a Clopper-Pearson approach). Failing to account for these intervals can lead to misinterpretation: we might erroneously conclude that our simulation-based estimate conflicts with a correct theoretical result just because we do not have enough trials.
Random Seeds and Implementation Errors: It is easy to code a simulation incorrectly, perhaps by mixing up indexing or miscounting outcomes. Such bugs can distort the results in subtle ways. Always verifying with known smaller problems or using partial enumerations for checks helps mitigate this pitfall.

Potential Follow-Up Question 5: What if we only roll the die four or five times instead of six? How does the logic adapt?

The same general methods apply, but with fewer total rolls:

Part (a) would no longer make sense as stated (it asks for three faces each appearing exactly twice, which is impossible with fewer than six rolls). However, you might have a question like: “Probability that exactly two faces appear at least once, or each appears exactly k times,” etc. The counting approach and dividing by total outcomes 6ⁿ would still be valid.
For the “at least three times” event, if we roll four or five times, the probability of a face appearing three or more times could be computed either by direct enumeration or using smaller binomial/multinomial calculations. The only real pitfall is forgetting to re-check which outcomes are possible: for instance, if n=4, you can’t have a face appear five times.
A subtle real-world edge case arises if your number of rolls is so small that certain events become impossible, or if you artificially keep rolling until you see a certain event. That can bias your sampling if you inadvertently condition on partial outcomes.

Potential Follow-Up Question 6: What if we are interested in the distribution of the maximum count rather than just “at least three times”?

Sometimes, the question might not just be “some face appears at least three times,” but rather “what is the probability the maximum count across all faces is exactly x?” You can derive this from:

P(Max count = x) = P(Max count ≤ x) − P(Max count ≤ x−1).

To find P(Max count ≤ x) you might use the complementary approach and consider P(Max count > x), or apply direct enumeration if x is relatively small. One subtlety is that “at least three times” is effectively “Max count ≥ 3.” If we want the exact distribution of the maximum count, we can break it down over x = 1, 2, 3, 4, 5, 6. Each is computed by enumerating possible ways to assign counts. A pitfall is to forget that these events (max count = 3, 4, 5, 6) can overlap in complicated ways if you are not systematic, so carefully enumerating or using inclusion-exclusion is key.

Potential Follow-Up Question 7: How would this analysis extend if we rolled the die N times, with N significantly larger than 6?

When N is large, we frequently rely on limit theorems:

By the law of large numbers, the fraction of times each face appears is close to 1/6. So the event that some face appears more than (1/6 + ε)N times might become increasingly unlikely (by the Chernoff bound or similar large-deviation inequalities).
However, for moderate frequencies like “at least 3 times,” when N is large, that becomes extremely likely. In fact, for N much bigger than 6, having no face appear 3 times is close to impossible. One subtlety is that as N grows, “some face appears at least k times” might turn into a near certainty for relatively small k, but a more interesting question might be “some face appears at least αN times.”
A pitfall is to treat large N the same way as small N and try enumerating events as we did for N=6. That approach becomes intractable quickly. Instead, using binomial approximations, Poisson approximations, or normal approximations for the count distribution of each face can be more practical.

Potential Follow-Up Question 8: In real-world experiments, how might data contamination or partial observations affect these probabilities?

In practical scenarios (e.g., rolling a physical die under experimental conditions), data might be missing or partially observed:

Missing Rolls: If some rolls were not recorded, you have an incomplete dataset. That complicates the event definitions. For example, you might only know 4 out of 6 outcomes. The probability that a certain face appeared three times in total might be updated using conditional probabilities or Bayesian inference if partial data is observed.
Human or Mechanical Error: Suppose we suspect that when the same face appears on consecutive rolls, we might record data incorrectly due to glare or distractions. This introduces biases. One must correct or model these biases, or the computed probability will differ from the standard theoretical one.
Pitfalls: Without acknowledging these real-world effects, we might assume that the clean theoretical model applies directly, but the empirical frequencies could deviate. Being aware of potential systematic errors helps avoid incorrect conclusions.

Potential Follow-Up Question 9: What if we cared about the probability that exactly two faces each appear at least three times?

This is a different event than “some face appears at least three times.” Specifically, we want the probability that there exist exactly two distinct faces that each show up at least three times in the six rolls, while the other faces show up strictly fewer than three times:

Potential enumeration route: You can define events B_i,j = “face i and face j each appear at least three times.” Then you would want P(B_i,j) over all distinct i,j. But there could be overlap events with more faces also appearing three times. You would adapt the inclusion-exclusion principle to handle triple overlaps or more. In fact, with 6 rolls, you cannot have three distinct faces each appear 3 times, so that helps limit the complexity.
Pitfall: Sometimes people treat these events as though they are disjoint (they’re not). The event that face 1 and face 2 each appear at least three times can indeed overlap with the event that face 1 and face 3 each appear at least three times, if the rolls allow face 1 to appear three times and face 2, face 3 each appear three times. However, with six total rolls, you quickly see that certain overlaps are impossible: three or more distinct faces all having at least three counts cannot occur in only six rolls. Still, being systematic with overlaps is crucial to get the correct probability.
Real-World Edge: This might model the scenario, for example, of wanting to see if exactly two machines in a production line produce a certain number of defective parts. The combinatorial logic is analogous, but many might overlook constraints or misapply binomial assumptions when the sample size is constrained to six.

Potential Follow-Up Question 10: How do we handle the possibility of ties in the number of occurrences?

When analyzing events like “some face appears at least three times,” it doesn’t matter if more than one face also crosses that threshold. However, for more nuanced questions like “which face has the highest number of occurrences?” or “does exactly one face have more occurrences than every other face?,” you can get ties:

Ties complicate direct counting if you want “a single face that strictly outnumbers the others.” For example, you can’t have two faces each appearing three times if you want a unique winner. The solution again depends on systematically enumerating the ways the six rolls can be distributed among the faces.
Pitfall: A naive approach might just take the most frequent face in each simulated outcome and count how often that face is strictly above the others, forgetting that multiple faces might tie for that count. This can yield an overestimate or underestimate of the probability, depending on how you handle ties.
Real-World Example: In certain gambling or gaming scenarios, you might track which symbol (face) appears the most across a small number of draws. Ties can reduce payouts or lead to re-rolls. An accurate model of tie probabilities can help design game rules or identify expected payouts.

Rohan's Bytes

Discussion about this post