ML Interview Q Series: Sum of Two Dice: Probability Distribution and Expected Value Calculation
Question
For two standard dice, all 36 outcomes of a throw are equally likely. Find P(X1 + X2 = j) for all j and calculate E(X1 + X2). Confirm that E(X1) + E(X2) = E(X1 + X2).
Short Compact solution
The possible sums are j = 2, 3, …, 12. For j = 2,…,7, P(X1 + X2 = j) = (j − 1)/36, and for j = 8,…,12, P(X1 + X2 = j) = (13 − j)/36.
Each die has expectation E(Xi) = 7/2. Therefore, E(X1) + E(X2) = 7. Directly summing over all possible totals also shows that E(X1 + X2) = 7, confirming the result.
Comprehensive Explanation
Distribution of the sum of two fair dice
Each die (X1 and X2) can take values 1 through 6, each with probability 1/6. When rolling two dice, there are 6 × 6 = 36 total equally likely outcomes for the pair (X1, X2). The sum S = X1 + X2 can range from 2 to 12.
To see why the probability function looks like (j−1)/36 for j in 2,…,7 and (13−j)/36 for j in 8,…,12, note that:
For j = 2, only the pair (1,1) works, so there is 1 favorable outcome → P(S=2) = 1/36.
For j = 3, there are pairs (1,2) and (2,1), so 2 favorable outcomes → P(S=3) = 2/36.
This pattern continues up to j = 7, where there are 6 favorable outcomes (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) → P(S=7) = 6/36.
After j = 7, the pattern reverses, because for j = 8 there are 5 favorable outcomes, for j = 9 there are 4, and so forth, down to j = 12 which has 1 favorable outcome.
In a concise formula:

P(X1 + X2 = j) = (j − 1)/36 for j = 2, 3, …, 7
P(X1 + X2 = j) = (13 − j)/36 for j = 8, 9, …, 12

Here, j is the total sum (an integer) taking values from 2 to 12.
Expected value of each die
A single fair six-sided die has outcomes {1, 2, 3, 4, 5, 6}, each with probability 1/6. The expectation E(Xi) for one die Xi is:
E(Xi) = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21 / 6 = 3.5
Hence, E(X1) = 3.5 and E(X2) = 3.5.
Expected value of the sum
By linearity of expectation:
E(X1 + X2) = E(X1) + E(X2) = 3.5 + 3.5 = 7
By directly summing over the distribution of j:

E(X1 + X2) = Σ j × P(X1 + X2 = j), with the sum running over j = 2, …, 12
If you substitute the probabilities above and multiply them by j, you obtain:
(2 × 1 + 3 × 2 + 4 × 3 + 5 × 4 + 6 × 5 + 7 × 6 + 8 × 5 + 9 × 4 + 10 × 3 + 11 × 2 + 12 × 1) / 36 = 252 / 36 = 7
So either way, E(X1 + X2) = 7, which is consistent with E(X1) + E(X2).
Python enumeration example
Below is a small Python code snippet illustrating how to compute these probabilities and the expected value by enumerating all pairs (x1, x2):
```python
from collections import Counter

# All possible outcomes for two dice
outcomes = [(x1, x2) for x1 in range(1, 7) for x2 in range(1, 7)]

# Count frequencies of sums
sum_counts = Counter(x1 + x2 for x1, x2 in outcomes)

# Probability distribution
prob_sum = {s: count / 36 for s, count in sum_counts.items()}

# Expected value of the sum
expected_sum = sum(s * p for s, p in prob_sum.items())

print(f"Distribution: {prob_sum}")
print(f"Expected sum: {expected_sum}")
```
If you run this code, you will see that prob_sum exactly matches the theoretical distribution and expected_sum prints out 7.0.
Potential Follow-Up Questions
How can we generalize the probability of sums to more than two dice?
For n fair six-sided dice, the total number of outcomes is 6^n. One can count the number of ways each possible sum can be formed by enumerating all n-tuples or by using combinatorial arguments (e.g., counting how many solutions exist to X1 + X2 + … + Xn = k for k from n to 6n). Another approach involves the use of generating functions:
The generating function for one six-sided die is (x + x^2 + x^3 + x^4 + x^5 + x^6)/6. For n dice, we take this expression to the nth power. The coefficient in front of x^k in that expanded polynomial then gives the probability of obtaining a sum k.
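As a sketch of how that expansion can be carried out numerically: raising the generating function to the nth power is equivalent to repeatedly convolving the single-die distribution with itself. The helper dice_sum_pmf below is a hypothetical name introduced here purely for illustration:

```python
from collections import defaultdict

def dice_sum_pmf(n, sides=6):
    """PMF of the sum of n fair dice, computed by repeated convolution
    (equivalent to expanding the generating function to the nth power)."""
    pmf = {0: 1.0}  # the "sum" of zero dice is 0 with probability 1
    for _ in range(n):
        new_pmf = defaultdict(float)
        for total, p in pmf.items():
            for face in range(1, sides + 1):
                new_pmf[total + face] += p / sides
        pmf = dict(new_pmf)
    return pmf

print(dice_sum_pmf(2)[7])   # 6/36 ≈ 0.1667, matching the two-dice case above
print(dice_sum_pmf(3)[10])  # 27/216 = 0.125
```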
Why does E(X1 + X2) = E(X1) + E(X2) even if the dice are not independent?
The key point is that linearity of expectation does not require independence:
E(X1 + X2) = E(X1) + E(X2)
This holds for any random variables X1 and X2, whether independent or not. However, to derive the specific distribution P(X1 + X2 = j) easily, we typically rely on independence for factorizing probabilities. If dice are not fair (or not independent), you can still compute E(X1 + X2) = E(X1) + E(X2) by linearity, but P(X1 + X2 = j) would need additional knowledge of how the dice outcomes are correlated or how their probabilities are weighted.
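To illustrate this distinction, here is a minimal simulation sketch in which the second die is deliberately constructed as X2 = 7 − X1, so the two dice are perfectly dependent:

```python
import random

random.seed(0)

# Perfectly dependent dice: X2 is a deterministic function of X1.
trials = 100_000
x1 = [random.randint(1, 6) for _ in range(trials)]
x2 = [7 - v for v in x1]  # X2 = 7 - X1, so X1 + X2 = 7 on every roll

mean_x1 = sum(x1) / trials
mean_x2 = sum(x2) / trials
mean_sum = sum(a + b for a, b in zip(x1, x2)) / trials

print(mean_x1 + mean_x2)  # 7.0 (up to floating-point rounding)
print(mean_sum)           # 7.0: every roll of this pair sums to exactly 7
```

Here the distribution of the sum is a point mass at 7 rather than the triangular distribution derived above, yet E(X1 + X2) = E(X1) + E(X2) = 7 still holds.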
What if the dice are “loaded”? How does that affect the calculation?
If each die is biased so that the probability of rolling a particular face is not 1/6, the general approach still applies, but the probability distribution for each X1 and X2 would be different. Specifically, if the first die has probabilities p1(1), p1(2), …, p1(6), and the second has p2(1), p2(2), …, p2(6), then we compute:
P(X1 + X2 = j) by summing p1(a) × p2(b) over all pairs (a, b) such that a + b = j.
E(X1) by summing a × p1(a) over a = 1..6, and similarly E(X2). Then E(X1 + X2) = E(X1) + E(X2) by linearity of expectation. The exact distribution of the sum simply changes due to the new probabilities.
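A minimal sketch of this computation, using made-up face probabilities chosen purely for illustration (the first die is loaded toward 1, the second toward 6):

```python
# Hypothetical loaded dice: face -> probability (each distribution sums to 1).
p1 = {1: 0.25, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.15}
p2 = {1: 0.10, 2: 0.10, 3: 0.10, 4: 0.10, 5: 0.10, 6: 0.50}

# P(X1 + X2 = j): sum p1(a) * p2(b) over all pairs (a, b) with a + b = j.
sum_pmf = {}
for a, pa in p1.items():
    for b, pb in p2.items():
        sum_pmf[a + b] = sum_pmf.get(a + b, 0.0) + pa * pb

e_x1 = sum(a * pa for a, pa in p1.items())     # 3.25
e_x2 = sum(b * pb for b, pb in p2.items())     # 4.5
e_sum = sum(j * p for j, p in sum_pmf.items())

print(e_x1 + e_x2)  # 7.75, by linearity
print(e_sum)        # 7.75, matching the direct computation
```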
Could we use these methods in real-world scenarios?
Absolutely. In real applications (e.g., probability models in finance or operations research), we often deal with sums of random variables. The principle that E(X1 + X2 + … + Xn) = E(X1) + E(X2) + … + E(Xn) is universally useful. If the random variables are independent, the probability distribution for their sum can sometimes be derived more simply (via convolution or generating functions). If not, more careful approaches or additional data about correlation structures must be used.
These questions highlight the importance of understanding probability distributions, the linearity of expectation, and how to compute probabilities for sums of random variables—essential skills for data science and machine learning interviews.