ML Interview Q Series: Modular Arithmetic Explains Why Ten Dice Sum Divisible by 6 Probability is 1/6

May 19, 2025

Suppose you roll ten fair dice at once. What is the probability that the sum of their top faces ends up divisible by 6?

Short Compact solution

Consider the sum of the first nine dice. This sum must be in one of the following residue classes modulo 6: 0, 1, 2, 3, 4, or 5. No matter which residue the sum of these nine dice falls into, there is exactly one face on the tenth die that will make the overall total (including that tenth die) congruent to 0 modulo 6. Therefore, the chance is always 1 out of 6. As a result, the final probability that the total on all ten dice is divisible by 6 is simply 1/6.

Comprehensive Explanation

Core Idea and Modular Arithmetic

When we talk about the probability that the sum of dice rolls is divisible by 6, a very useful tool is modular arithmetic. Specifically, we look at sums modulo 6. Whenever we roll several dice, the total sum can be described by its remainder when divided by 6 (i.e., which residue class the sum belongs to).

Generalization to Any Number of Dice

Interestingly, the same reasoning extends to any number of dice. If you roll n dice, consider the sum of the first (n−1) dice. That sum is again in {0,1,2,3,4,5} modulo 6, and exactly one face of the n-th die can correct that sum to make the total divisible by 6. Thus, you can generalize to n fair dice and see that the probability remains 1/6 for the sum being 0 modulo 6.
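The generalization can be checked exactly rather than by simulation. The sketch below is one possible way to do it, assuming independent fair dice; the function name `residue_distribution` and its parameters are illustrative choices, not part of the original solution. It tracks the distribution of the running sum modulo 6 with exact fractions, adding one die at a time.

```python
from fractions import Fraction

def residue_distribution(num_dice, modulus=6, faces=range(1, 7)):
    """Exact distribution of (sum of num_dice fair dice) mod modulus."""
    faces = list(faces)
    face_prob = Fraction(1, len(faces))
    # Before any die is rolled the sum is 0, so residue 0 has probability 1.
    dist = [Fraction(0)] * modulus
    dist[0] = Fraction(1)
    for _ in range(num_dice):
        new_dist = [Fraction(0)] * modulus
        for residue, prob in enumerate(dist):
            for face in faces:
                # Each face moves the running residue by its value.
                new_dist[(residue + face) % modulus] += prob * face_prob
        dist = new_dist
    return dist

for n in (1, 2, 10):
    print(n, residue_distribution(n))
```

For every n ≥ 1 the six entries come out equal to 1/6, which is the statement above in computational form.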

Illustrative Example

Imagine you only had two dice. The first die (call it X) shows 1, 2, 3, 4, 5, or 6, so X mod 6 takes each of the values 1, 2, 3, 4, 5, and 0 exactly once (a 6 leaves remainder 0). For each possible residue of the first die, there is exactly one face of the second die that yields a sum divisible by 6: the pairs (1, 5), (2, 4), (3, 3), (4, 2), (5, 1), and (6, 6). That is 6 of the 36 equally likely outcomes, so the probability is 6/36 = 1/6, matching the general principle.

A Small Python Simulation

Below is a brief Python snippet to simulate the rolling of 10 dice many times and estimate the probability that the sum is divisible by 6. This provides empirical verification that the probability approaches 1/6.

import random

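def roll_dice(num_dice=10, num_simulations=1_000_000):
    """Estimate P(sum of num_dice fair dice is divisible by 6) by simulation."""
    count_divisible_by_6 = 0
    for _ in range(num_simulations):
        # Roll num_dice fair six-sided dice and add up the faces.
        total = sum(random.randint(1, 6) for _ in range(num_dice))
        if total % 6 == 0:
            count_divisible_by_6 += 1
    # Fraction of simulated rolls whose total was a multiple of 6.
    return count_divisible_by_6 / num_simulations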

estimated_probability = roll_dice()
print(f"Estimated Probability: {estimated_probability:.5f}")

If you run this code, the output should be very close to 0.16666…, which is 1/6. The more simulations you run, the closer the estimate should get to 1/6.

Follow-up question: What if some dice are not fair (i.e., loaded dice)?

If the dice are not fair, each face no longer has the same probability, so the step “one face out of six corrects the sum to 0 modulo 6, and it appears with probability 1/6” needs care. Two cases are worth separating. If at least one die is still fair and the dice are independent, the result is unchanged: condition on the sum of all the other dice, whatever their biases, and exactly one face of the fair die makes the total divisible by 6, so the probability is still exactly 1/6. If every die is loaded, that argument is no longer available. You can still use modular arithmetic, but you must track each die’s probabilities for faces 1 through 6, and the probability that the running sum X satisfies X ≡ r (mod 6) becomes a more complicated function of those probabilities. The “needed face” then appears with a probability that depends on r, so the probability of the total being divisible by 6 will, in general, deviate from 1/6.
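To make the loaded-dice case concrete, here is a minimal sketch that folds one die at a time into an exact distribution over residues modulo 6. The loaded die used here (face 6 with probability 1/4, every other face 3/20) is a made-up example, and the helper names `add_die` and `prob_divisible_by_6` are illustrative.

```python
from fractions import Fraction

def add_die(dist, face_probs):
    """Fold one die into a distribution over residues mod 6.

    face_probs[k] is the probability of face k + 1.
    """
    new_dist = [Fraction(0)] * 6
    for residue, prob in enumerate(dist):
        for k, p_face in enumerate(face_probs):
            new_dist[(residue + k + 1) % 6] += prob * p_face
    return new_dist

def prob_divisible_by_6(dice):
    """Exact P(total divisible by 6) for independent dice."""
    dist = [Fraction(1)] + [Fraction(0)] * 5  # empty sum has residue 0
    for face_probs in dice:
        dist = add_die(dist, face_probs)
    return dist[0]

fair = [Fraction(1, 6)] * 6
# Hypothetical loaded die: face 6 has probability 1/4, the others 3/20 each.
loaded = [Fraction(3, 20)] * 5 + [Fraction(1, 4)]

print(prob_divisible_by_6([loaded] * 3))           # three loaded dice
print(prob_divisible_by_6([loaded] * 3 + [fair]))  # same, plus one fair die
```

With three of these loaded dice the result is 67/400 rather than 1/6. Adding a single fair die to the same roll brings it back to exactly 1/6.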

Follow-up question: What if the number of dice changes?

The same fundamental reasoning applies for any number of fair six-sided dice. Suppose you have n dice. You look at the sum of the first (n−1) dice and find its value modulo 6. There will be exactly one face on the n-th die that forces the entire sum to be divisible by 6. Hence, when each die is fair, the probability remains 1/6 for any n. In short, if you vary n, as long as the dice are fair, you still end up with a 1/6 chance for the total to be divisible by 6.

Follow-up question: Why does the same logic not trivially apply to other moduli?

If you check divisibility by a different number, say 7, the question becomes: “What is the probability that the sum is divisible by 7?” Here the “exactly one face corrects the sum” logic breaks down. The faces 1 through 6 cover the residues 1 through 6 modulo 7 but never 0, so if the first n−1 dice already sum to a multiple of 7, no face of the last die keeps the total divisible by 7; for every other residue, exactly one face works. With a single die, for example, the sum is never divisible by 7. Writing p_n for the probability with n dice, this gives p_n = (1 − p_{n−1})/6 with p_1 = 0, so p_2 = 1/6, p_3 = 5/36, and the values oscillate around 1/7 and converge to it without ever equalling it. For other combinations of modulus and face count, detailed enumeration or generating function methods may be needed. Therefore, while the 1/6 argument works perfectly for modulo 6 with standard six-sided dice, the situation is different when you change the modulus or the number of faces on the dice.
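The modulo-7 case can be checked exactly for small numbers of dice. The sketch below assumes independent fair six-sided dice; the function name `prob_sum_divisible` is illustrative. It compares the exact value with the closed form (1 + 6·(−1/6)^n)/7, which follows from a standard roots-of-unity calculation and is not part of the original text.

```python
from fractions import Fraction

def prob_sum_divisible(num_dice, modulus):
    """Exact P(sum of num_dice fair six-sided dice is divisible by modulus)."""
    dist = [Fraction(0)] * modulus
    dist[0] = Fraction(1)  # empty sum has residue 0
    for _ in range(num_dice):
        new_dist = [Fraction(0)] * modulus
        for residue, prob in enumerate(dist):
            for face in range(1, 7):
                new_dist[(residue + face) % modulus] += prob / 6
        dist = new_dist
    return dist[0]

for n in range(1, 6):
    exact = prob_sum_divisible(n, 7)
    closed_form = (1 + 6 * Fraction(-1, 6) ** n) / 7
    print(n, exact, closed_form, float(exact))
```

The two columns agree, and the values alternate below and above 1/7 ≈ 0.142857 while getting closer to it.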

Follow-up question: Could the distribution of sums still be approximately uniform for large numbers of dice?

When rolling many dice and considering the sum modulo 6, one might wonder how close to uniform the residues are. The sum itself is approximately normally distributed for large numbers of dice, but its residue modulo 6 behaves differently. For fair dice, every residue class (0 through 5) is exactly equally likely for any number of dice, because a single fair die already shifts the residue by each of 0, 1, 2, 3, 4, and 5 with probability 1/6. For dice that are not exactly uniform modulo 6, each additional independent die still “mixes” the residues, and the distribution typically approaches uniformity as more dice are added. This argument is closely related to the concept of cyclic groups and random walks in modular spaces.

Thus, for fair dice, no matter how many you roll, the probability that the total is a multiple of 6 remains 1/6.

Below are additional follow-up questions

What if the dice are of different types (e.g., some 6-sided, some 8-sided) and we want to know the probability that the total is divisible by 6?

When mixing dice of different shapes, each die has its own set of possible face values and respective probabilities. The “exactly one face out of six fixes the sum” argument cannot be applied to an 8-sided die, because its faces {1, 2, 3, 4, 5, 6, 7, 8} cover the residues 1 and 2 modulo 6 twice and the others once. It can, however, still be applied to any fair 6-sided die in the mix. In this situation:

1. Determine Each Die’s Contribution to Modulo 6: Every die shifts the running total’s remainder modulo 6. For a fair 6-sided die, the faces 1 through 6 hit each residue exactly once. For a fair 8-sided die, the faces 1 through 8 give residues 1, 2, 3, 4, 5, 0, 1, 2, so residues 1 and 2 each have probability 2/8 and the others 1/8.
2. Calculate the Combined Residue Probabilities: Compute the probability distribution over residue classes for each type of die, then convolve these distributions across all dice. If at least one die is a fair 6-sided die rolled independently of the others, the result is exactly uniform: condition on the sum of all the other dice, and exactly one face of that die makes the total divisible by 6. If no die is uniform modulo 6 (for example, only 8-sided dice), the convolution has to be carried out.
3. Exact vs. Approximate Methods
- Exact: You can use generating functions or dynamic programming to compute precise probabilities for each residue class modulo 6.
- Approximate: With many dice, none of which is uniform modulo 6, the residues tend to mix toward uniform, but the result is not exactly 1/6, and how close it gets depends on how many dice of each type you have.
4. Potential Pitfalls
- Applying the “1 out of 6” argument to an 8-sided die, whose faces 7 and 8 repeat the residues 1 and 2.
- Assuming the distribution is uniform when there is no fair 6-sided die and only a few dice. An actual calculation (or a simulation) is needed to confirm the probability.

In short, as long as at least one independent fair 6-sided die is included, the probability that the sum is divisible by 6 is still exactly 1/6. When no die is uniform modulo 6, 1/6 is no longer guaranteed and a detailed probabilistic or enumerative analysis is required to find the exact value.
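A short dynamic-programming sketch can settle any particular mix of dice. It assumes independent fair dice with faces numbered 1 through their size; the function name `residues_mod6` is illustrative.

```python
from fractions import Fraction

def residues_mod6(die_sizes):
    """Exact distribution of the total mod 6 for independent fair dice.

    die_sizes lists the number of faces of each die (faces 1..size).
    """
    dist = [Fraction(1)] + [Fraction(0)] * 5  # empty sum has residue 0
    for size in die_sizes:
        new_dist = [Fraction(0)] * 6
        for residue, prob in enumerate(dist):
            for face in range(1, size + 1):
                new_dist[(residue + face) % 6] += prob / size
        dist = new_dist
    return dist

print(residues_mod6([8]))        # one 8-sided die
print(residues_mod6([8, 8]))     # two 8-sided dice
print(residues_mod6([8, 8, 6]))  # two 8-sided dice and one 6-sided die
```

Two 8-sided dice on their own give 5/32 for residue 0. Once a single fair 6-sided die is added, every residue has probability exactly 1/6.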

What if the dice have some dependency or correlation between their outcomes?

In standard probability settings for dice, we assume each die roll is independent of all others. If, however, there is a correlation or constraint among dice—perhaps a physical constraint or a tricky puzzle scenario—then the usual multiplication of probabilities is invalid. Examples might include dice attached by a mechanism that forces their faces to move in tandem, or a special rule in a board game that re-rolls certain dice based on the outcome of others.

Impact on the Modular Sum: When dice outcomes are correlated, the distribution of the total sum changes. You cannot simply treat each die as an independent contributor to the sum’s residue class modulo 6.
Complex Computations: To find the probability of a divisible-by-6 sum, you would need to model or measure how one die’s face correlates with the next. In such a scenario, enumerating all possible outcomes with their joint probabilities may be required.
Edge Cases:
- Perfect Negative Correlation: If, for some reason, one die always “compensates” for another, the total might deviate significantly from the usual uniform distribution among modulo classes.
- Partial Correlation: Even mild correlations can skew the distribution somewhat.
Practical Real-World Issues: Real dice might have minuscule correlations if they bounce or if a non-random rolling mechanism is used. Typically, those correlations are small enough that the 1/6 conclusion remains an excellent approximation, but in strictly controlled or contrived scenarios, the standard independence assumptions can fail.
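Two extreme couplings of a pair of dice illustrate how far correlation can move the answer. Both couplings below are hypothetical constructions for illustration, and the joint distributions are written out explicitly as dictionaries.

```python
from fractions import Fraction

def prob_divisible_by_6(joint):
    """joint maps (first_face, second_face) pairs to their probabilities."""
    return sum(p for (a, b), p in joint.items() if (a + b) % 6 == 0)

faces = range(1, 7)
# Two independent fair dice: all 36 pairs equally likely.
independent = {(a, b): Fraction(1, 36) for a in faces for b in faces}
# Hypothetical coupling: the second die always copies the first.
copied = {(a, a): Fraction(1, 6) for a in faces}
# Hypothetical coupling: the second die always shows 7 minus the first.
opposite = {(a, 7 - a): Fraction(1, 6) for a in faces}

print(prob_divisible_by_6(independent))  # 1/6
print(prob_divisible_by_6(copied))       # 1/3, from the sums 6 and 12
print(prob_divisible_by_6(opposite))     # 0, because the sum is always 7
```

In each case every individual die is still fair on its own. Only the joint behavior changes, which is why independence matters for the 1/6 result.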

What if we keep rolling dice until a certain condition is met, and then check if the sum is divisible by 6?

In some games, you might keep rolling dice until you reach a target or a stopping criterion, and then look at the resulting sum modulo 6. This changes the distribution of final sums dramatically.
1. Changing the Stopping Rule: If you stop rolling as soon as the total sum exceeds a certain threshold (for example, greater than 50), the final sum lands in a narrow band just above that threshold. Hence, the final sum is not spread evenly over the residue classes, and the probability of it being divisible by 6 can differ from 1/6.
2. Analysis Approach
  - You need to condition on the event that you stopped rolling at a specific time.
  - The sums that appear at the moment you stop are not equally likely to be any residue class modulo 6, because some sums are reached more often than others as you approach the stopping condition.
3. Example: Suppose you roll one die at a time and stop the very moment your cumulative total is at least 20. The final sum is then always between 20 and 25, with values closer to 20 more likely. The only multiple of 6 in that range is 24, so the probability of a final sum divisible by 6 is the probability of stopping at exactly 24.
4. Real-World Implications: In many board games, a turn might end once you pass a certain tile or achieve a certain minimum sum. This modifies the probability distribution of final outcomes.
Therefore, the neat 1/6 result holds only for a fixed number of dice, each rolled exactly once with no additional conditions.
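The stop-at-20 example can be computed exactly with a small dynamic program. This is a sketch under the assumption that one fair die is rolled at a time and rolling stops as soon as the running total reaches the threshold; the function name `final_sum_distribution` is illustrative.

```python
from fractions import Fraction

def final_sum_distribution(threshold=20):
    """Roll one fair die at a time, stop once the running total >= threshold.

    Returns a dict mapping each possible final total to its probability.
    """
    # visit[s] = probability that the running total equals s at some point.
    visit = {0: Fraction(1)}
    final = {}
    # Totals are processed in increasing order, so visit[total] is complete
    # by the time it is used (it only receives mass from smaller totals).
    for total in range(threshold):
        p = visit.get(total, Fraction(0))
        for face in range(1, 7):
            nxt = total + face
            target = final if nxt >= threshold else visit
            target[nxt] = target.get(nxt, Fraction(0)) + p / 6
    return final

final = final_sum_distribution(20)
for total in sorted(final):
    print(total, float(final[total]))
print("P(divisible by 6) =", float(final[24]))
```

The printed probabilities decrease from 20 to 25, and the value for 24 comes out clearly below 1/6.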

How does one leverage generating functions to solve this exactly for 10 dice?

Generating functions are a powerful tool to analyze the probability distribution of dice sums.
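1. Constructing the Generating Function: For a single fair, 6-sided die, the generating function (in variable x) capturing the possible sums is:

$$G(x) = \frac{1}{6}\left(x + x^2 + x^3 + x^4 + x^5 + x^6\right)$$

2. Extending to 10 Dice: For 10 such dice, the generating function is the 10th power of G(x):

$$G(x)^{10} = \frac{1}{6^{10}}\left(x + x^2 + x^3 + x^4 + x^5 + x^6\right)^{10}$$

The coefficient of $x^s$ in $G(x)^{10}$ is the probability that the ten dice sum to s, so the probability of a total divisible by 6 is the sum of the coefficients of $x^{12}, x^{18}, \ldots, x^{60}$.

3. Subtlety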
  - Human Error in Expansion: Manually expanding such a polynomial is prone to errors. In practice, a computer algebra system is used.
  - Deep Understanding: Generating functions illustrate a general approach that also works for “what if some dice have different faces” or if you want to find the probability the sum is divisible by some other integer.
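In place of a computer algebra system, a few lines of plain Python can expand the polynomial. The sketch below works with integer counts, that is, the polynomial x + x^2 + … + x^6 without the 1/6 factor, so that coefficients count outcomes; `poly_mul` is an illustrative helper name.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# x + x^2 + ... + x^6: one way to roll each face (counts, not probabilities).
single_die = [0, 1, 1, 1, 1, 1, 1]

# Raise the single-die polynomial to the 10th power.
ten_dice = [1]
for _ in range(10):
    ten_dice = poly_mul(ten_dice, single_die)

# Add up the coefficients whose exponent (the total) is a multiple of 6.
favourable = sum(count for total, count in enumerate(ten_dice) if total % 6 == 0)
print(favourable, 6 ** 10, favourable / 6 ** 10)
```

The count of favourable outcomes is 6^9 = 10,077,696 out of 6^10 = 60,466,176, which is exactly 1/6.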

Could the result be different if the faces of the dice are relabeled but still show six distinct values?

Imagine you take a standard die and relabel its faces with numbers that are distinct but not 1 to 6 in the usual order. For instance, you might label them {0, 2, 3, 5, 7, 11} or any set of six unique integers.
1. Key Observation: Regardless of the numbering scheme, if each face is equally likely, you still have an equal probability of landing on any of the six labels. The question is whether this preserves the “1 out of 6 times the sum is divisible by 6” logic, and that depends only on the residues of the labels modulo 6.
  - If the labels cover all 6 residues once each: Then effectively you haven’t changed the modular behavior, and the probability remains 1/6.
  - If they do not cover all 6 residues: Then you may not have the same balancing effect. For instance, if your labels modulo 6 are {0, 0, 1, 2, 3, 4}, a single die lands on residue 0 with probability 2/6 instead of 1/6. The set {0, 2, 3, 5, 7, 11} above reduces to {0, 2, 3, 5, 1, 5} modulo 6, which repeats 5 and misses 4.
2. Practical Example: Suppose the die faces are {0, 1, 2, 3, 4, 5} in place of {1, 2, 3, 4, 5, 6}. Modulo 6, this is {0, 1, 2, 3, 4, 5}, i.e., each residue exactly once. So each roll changes the sum’s residue class in a uniform way, giving the same 1/6 result. But if your labeling leads to an uneven coverage of residues, you lose that uniformity.
Hence, for any relabeling that yields a perfect one-to-one coverage of the residues modulo 6, the probability remains 1/6. Otherwise, it may deviate.
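The residue test for a relabeled die is easy to automate. The sketch below assumes each label is equally likely and the dice are independent; the helper names and the second label set ([7, 14, 21, 10, 5, 18]) are made up for illustration.

```python
from collections import Counter
from fractions import Fraction

def residue_counts(labels):
    """How many faces fall in each residue class 0..5 modulo 6."""
    counts = Counter(label % 6 for label in labels)
    return [counts.get(r, 0) for r in range(6)]

def prob_divisible_by_6(labels, num_dice):
    """Exact P(total divisible by 6) for num_dice fair dice with these labels."""
    dist = [Fraction(1)] + [Fraction(0)] * 5  # empty sum has residue 0
    for _ in range(num_dice):
        new_dist = [Fraction(0)] * 6
        for residue, prob in enumerate(dist):
            for label in labels:
                new_dist[(residue + label) % 6] += prob / len(labels)
        dist = new_dist
    return dist[0]

for labels in ([0, 1, 2, 3, 4, 5], [7, 14, 21, 10, 5, 18], [0, 2, 3, 5, 7, 11]):
    print(labels, residue_counts(labels), prob_divisible_by_6(labels, 3))
```

The first two label sets hit every residue once and give 1/6. The set {0, 2, 3, 5, 7, 11} gives 35/216 with three dice. For that particular set the value happens to equal 1/6 with one or two dice, so a check with only a couple of dice can be misleading.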

If we consider extremely large numbers of dice, could slight imperfections in the dice cause a measurable deviation?

When rolling, say, millions of dice, you might suspect that even tiny manufacturing biases or asymmetries could accumulate. For the sum modulo 6, independent dice behave the other way around.
1. Mixing Effect: If each die is truly fair, the 1/6 conclusion holds exactly. If each die has a small bias (e.g., face 6 is slightly more likely), a single die deviates from the uniform distribution across the six residues, but each additional independent die mixes the residues further, so the deviation of the total’s residue distribution from uniform shrinks rapidly as dice are added.
2. Deviation Scale
  - Small but Non-Zero Bias: The bias shows up most clearly with few dice. With one or two biased dice, the sum lands measurably more often in certain residue classes.
  - Detecting Bias: You would need a large enough sample of rolls, and the per-face frequencies of an individual die are a more sensitive signal than the sum modulo 6. In real-world casino settings, dice are replaced frequently to minimize such issues.
3. Realistic Outcome: In normal everyday rolling conditions with widely accepted manufacturing tolerances, the deviation from 1/6 is negligible. Only in extremely high-precision or large-scale data collection might you see a difference.
4. Pitfall: If you assume fairness without testing, you might incorrectly conclude exactly 1/6 for a roll of one or a few biased dice. In a high-stakes environment (e.g., casinos), rigorous testing or frequent dice replacement ensures fairness is maintained.
Thus, while the 1/6 result is theoretically exact for ideal fair dice, real-world production or usage quirks can introduce small deviations. These are detectable with few dice and sufficiently large samples, and they fade as the number of independent dice in the sum grows.
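How a per-die bias carries over to the total can be checked with exact arithmetic. The sketch below uses a hypothetical biased die (face 6 with probability 0.17, the other five faces 0.166 each) and assumes all dice share that bias and are independent; the function name `deviation_from_sixth` is illustrative.

```python
from fractions import Fraction

# Hypothetical biased die: face 6 has probability 17/100, the others 83/500 each.
biased = [Fraction(83, 500)] * 5 + [Fraction(17, 100)]

def deviation_from_sixth(face_probs, num_dice):
    """Exact P(total divisible by 6) minus 1/6 for identical independent dice."""
    dist = [Fraction(1)] + [Fraction(0)] * 5  # empty sum has residue 0
    for _ in range(num_dice):
        new_dist = [Fraction(0)] * 6
        for residue, prob in enumerate(dist):
            for k, p_face in enumerate(face_probs):
                # face_probs[k] is the probability of face k + 1.
                new_dist[(residue + k + 1) % 6] += prob * p_face
        dist = new_dist
    return dist[0] - Fraction(1, 6)

for n in (1, 2, 3, 5, 10):
    print(n, float(deviation_from_sixth(biased, n)))
```

For this example the deviation is about 0.0033 with one die and shrinks by a factor of 250 with each die added, so for large numbers of such dice it is far too small to measure in the total.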

Rohan's Bytes
