ML Interview Q Series: Expected Value for All Die Faces: The Coupon Collector Problem Explained.

May 24, 2025


How many throws of a fair six-sided die should you expect on average so that all six faces will have appeared at least once?

Short Compact solution

Let k denote the number of distinct faces observed so far (k ranges from 0 to 5 before the collection is complete). When k distinct faces have already shown up, the probability that the next throw shows a new face is (6−k)/6. The number of additional throws needed to see the next new face therefore follows a geometric distribution with success probability (6−k)/6, whose mean is 6/(6−k). Summing these mean waiting times over k from 0 to 5:

$$\sum_{k=0}^{5} \frac{6}{6-k} = 6\left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}\right) = 6 \cdot \frac{49}{20} = 14.7$$

That total is exactly 14.7 throws.
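As a quick check of the arithmetic, the sketch below adds the six stage means with exact rational arithmetic. The helper name `expected_rolls_fair_die` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction

def expected_rolls_fair_die(sides: int = 6) -> Fraction:
    """Sum the geometric-stage means sides/(sides - k) for k = 0..sides-1, exactly."""
    return sum((Fraction(sides, sides - k) for k in range(sides)), Fraction(0))

total = expected_rolls_fair_die(6)
print(total, float(total))  # 147/10 14.7
```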

Comprehensive Explanation

The problem of finding the expected number of rolls to see each face of a fair six-sided die at least once is a classic instance of the Coupon Collector's Problem. In general, if there are n distinct, equally likely items (here, the n = 6 faces of the die), the expected number of independent trials T needed to collect every item at least once is

$$\mathbb{E}[T] = \sum_{k=0}^{n-1} \frac{n}{n-k} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n,$$

where H_n is the nth harmonic number.

For a six-sided die, we let n=6. The reasoning can be broken down into the stages of collecting new faces:

Initially, no faces are observed (k = 0). The first roll immediately shows a new face (k = 1). Once you have k distinct faces, the probability of seeing a new face on the next roll is (6−k)/6. Because each roll is independent, you can model the time to get from k distinct faces to k+1 distinct faces as a geometric random variable with parameter (6−k)/6. The mean of that geometric distribution is 6/(6−k). You then add these mean waiting times from k = 0 up to k = 5.

Concretely:

Going from 0 to 1 distinct face: you always succeed on the first roll, so the expected number of rolls is 6/6=1.
Going from 1 to 2 distinct faces: success probability is 5/6, so the expected waiting is 6/5.
Going from 2 to 3 distinct faces: success probability is 4/6, so the expected waiting is 6/4.
Going from 3 to 4 distinct faces: success probability is 3/6, so the expected waiting is 6/3.
Going from 4 to 5 distinct faces: success probability is 2/6, so the expected waiting is 6/2.
Going from 5 to 6 distinct faces: success probability is 1/6, so the expected waiting is 6/1.

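Summing these gives a harmonic-number structure:

$$1 + \frac{6}{5} + \frac{6}{4} + \frac{6}{3} + \frac{6}{2} + \frac{6}{1} = 6\left(\frac{1}{6} + \frac{1}{5} + \frac{1}{4} + \frac{1}{3} + \frac{1}{2} + 1\right) = 6 H_6 = 14.7$$

Hence, on average, it takes exactly 14.7 throws of a fair six-sided die to see all six faces at least once.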
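The stage-by-stage breakdown can also be checked empirically. The sketch below (hypothetical helper `stage_waiting_means`, with an arbitrary seed and trial count) charges each roll to the stage k during which it happened and averages over many runs; the estimates should sit near the theoretical values 1, 1.2, 1.5, 2, 3, and 6.

```python
import random

def stage_waiting_means(trials: int = 20_000, sides: int = 6, seed: int = 0) -> list[float]:
    """Estimate the mean number of rolls spent at each stage k (k distinct faces seen)."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    totals = [0] * sides  # totals[k] accumulates rolls made while exactly k faces were seen
    for _ in range(trials):
        seen = set()
        while len(seen) < sides:
            k = len(seen)  # stage before this roll
            seen.add(rng.randint(1, sides))  # this roll (success or not) is charged to stage k
            totals[k] += 1
    return [t / trials for t in totals]

for k, est in enumerate(stage_waiting_means()):
    print(f"stage {k}->{k + 1}: simulated {est:.3f}, theory {6 / (6 - k):.3f}")
```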

Follow-up question: Why is each stage modeled with a geometric distribution?

This follows from the independence of the rolls. Once you hold k distinct faces, every subsequent roll shows a new face with the same probability p = (6−k)/6, regardless of what happened on earlier rolls, and that probability stays fixed until a new face actually appears. The number of rolls until that first success, counting the successful roll itself, is therefore geometric with parameter p, and a geometric random variable has mean 1/p. Applying this at each stage gives the terms 6/(6−k) in the sum.
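For completeness, here is a short derivation of the 1/p mean. Write G for the number of rolls up to and including the first new face; G exceeds t exactly when the first t rolls all fail, so P(G > t) = (1 − p)^t, and the tail-sum formula for nonnegative integer random variables gives:

$$\mathbb{E}[G] = \sum_{t=0}^{\infty} \Pr(G > t) = \sum_{t=0}^{\infty} (1-p)^{t} = \frac{1}{1-(1-p)} = \frac{1}{p}, \qquad p = \frac{6-k}{6} \;\Rightarrow\; \mathbb{E}[G] = \frac{6}{6-k}$$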

Follow-up question: What if the die is biased?

With a biased die, each face has its own probability p_1, …, p_6. The chance of seeing a new face on the next roll is then the sum of the probabilities of the faces not yet observed, so it depends on which faces you have already seen, not only on how many. The stage-by-stage argument therefore no longer reduces to a single geometric parameter per stage; this is the Coupon Collector's Problem with non-uniform probabilities. The expected time can still be computed exactly, for example with a Markov chain whose state is the set of faces seen so far (2^6 = 64 states), or with an inclusion-exclusion sum over subsets of faces. Neither is as compact as the uniform-case n·H_n, and both grow exponentially with the number of faces.
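One exact route for the non-uniform case is inclusion-exclusion: P(T > t) = Σ over nonempty subsets S of (−1)^(|S|+1) (1 − p_S)^t, where p_S is the total probability of the faces in S, and summing over t gives E[T] = Σ over nonempty S of (−1)^(|S|+1) / p_S. The sketch below assumes the probabilities are supplied as `Fraction` values (so the sum-to-one check is exact); the function name and the loaded-die probabilities in the example are hypothetical. Its cost grows as 2^n, which is fine for six faces but not for large n.

```python
from fractions import Fraction
from itertools import combinations

def expected_rolls_biased(probs: list[Fraction]) -> Fraction:
    """Exact E[T] to see every face, via inclusion-exclusion over face subsets.

    Assumes every probability is positive and the list sums to exactly 1.
    """
    if sum(probs) != 1 or any(p <= 0 for p in probs):
        raise ValueError("probabilities must be positive and sum to 1")
    n = len(probs)
    total = Fraction(0)
    for size in range(1, n + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(probs, size):
            total += sign / sum(subset)  # sum over t of (1 - p_S)^t equals 1 / p_S
    return total

fair = [Fraction(1, 6)] * 6
print(expected_rolls_biased(fair))  # 147/10
biased = [Fraction(1, 2)] + [Fraction(1, 10)] * 5  # hypothetical loaded die
print(expected_rolls_biased(biased))  # 1439/63, about 22.84
```

For the fair die the formula reproduces 6·H_6 = 14.7; in the loaded example the five rare faces dominate the waiting time.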

Follow-up question: What about the distribution of the number of rolls, not just the mean?

Although the expected value gets most of the attention, the full distribution is also of interest. Because the six stage waiting times are independent geometric random variables, the total number of rolls T is their sum, so its probability mass function is the convolution of the six geometric pmfs; equivalently, it can be computed with a Markov chain whose state is the number of faces collected so far. Variances of independent stages add, giving Var(T) = Σ_{k=0}^{5} (1 − p_k)/p_k² with p_k = (6−k)/6, which equals 38.99, a standard deviation of about 6.24 rolls. The distribution is right-skewed: T can never be less than 6, while the last stage alone has mean 6 and standard deviation √30 ≈ 5.5. Inclusion-exclusion also gives the exact cumulative distribution, P(T ≤ t) = Σ_{j=0}^{6} (−1)^j C(6, j) (1 − j/6)^t.
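The Markov-chain view translates directly into a small dynamic program. The sketch below (the function name `completion_pmf` and the cutoff `max_rolls` are hypothetical choices) tracks the probability of holding exactly k distinct faces after each roll and records the mass that reaches all six faces for the first time. Truncating at 200 rolls discards a tail probability below 1e-15.

```python
def completion_pmf(sides: int = 6, max_rolls: int = 200) -> list[float]:
    """pmf[t] = P(the last unseen face first appears on roll t), for t = 0..max_rolls."""
    state = [0.0] * (sides + 1)  # state[k] = P(exactly k distinct faces seen, not yet finished)
    state[0] = 1.0
    pmf = [0.0] * (max_rolls + 1)
    for t in range(1, max_rolls + 1):
        nxt = [0.0] * (sides + 1)
        for k in range(sides):
            p_new = (sides - k) / sides  # chance this roll shows an unseen face
            nxt[k] += state[k] * (1.0 - p_new)
            nxt[k + 1] += state[k] * p_new
        pmf[t] = nxt[sides]  # probability of completing on exactly this roll
        nxt[sides] = 0.0  # remove absorbed mass so it is counted only once
        state = nxt
    return pmf

pmf = completion_pmf()
mean = sum(t * p for t, p in enumerate(pmf))
var = sum(t * t * p for t, p in enumerate(pmf)) - mean ** 2
print(round(mean, 6), round(var, 6), round(pmf[6], 6))  # 14.7 38.99 0.015432
```

The value pmf[6] = 6!/6^6 is the chance that the first six rolls are all different.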

Follow-up question: How can we verify these results empirically?

We can run a simulation in Python that repeatedly rolls a fair six-sided die until all faces have been seen, and then record how many rolls it took. Averaging over many trials gives a Monte Carlo approximation to the expected number of rolls. A simple code snippet might look like this:

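```python
import random

def simulate_dice_rolls(num_experiments=100_000):
    """Monte Carlo estimate of the mean number of rolls needed to see all six faces."""
    total_rolls = 0
    for _ in range(num_experiments):
        seen_faces = set()
        roll_count = 0
        while len(seen_faces) < 6:  # keep rolling until every face has appeared
            face = random.randint(1, 6)  # fair die: each face with probability 1/6
            seen_faces.add(face)
            roll_count += 1
        total_rolls += roll_count
    return total_rolls / num_experiments

print(simulate_dice_rolls(100_000))
```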

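With 100,000 experiments the estimate typically lands within about ±0.04 of 14.7: the standard deviation of the roll count is about 6.24, so the standard error is roughly 6.24/√100,000 ≈ 0.02. Increasing the number of experiments, say to one million, tightens the estimate further.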

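Follow-up question: Does this approach generalize if we have more than six sides?

Yes. For a fair die with n sides (or any n equally likely items), the same stage-by-stage argument applies: with k faces collected, a new face appears with probability (n−k)/n, so the expected total is n(1 + 1/2 + ... + 1/n) = n·H_n. For large n this grows like n ln n + γn + 1/2, where γ ≈ 0.5772 is the Euler-Mascheroni constant.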
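To see how quickly the expectation grows with the number of sides, the sketch below compares the exact value n·H_n with the asymptotic approximation n ln n + γn + 1/2 (γ ≈ 0.5772 is the Euler-Mascheroni constant). The function names are hypothetical, and the listed values of n are arbitrary examples.

```python
import math
from fractions import Fraction

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (not provided by the math module)

def expected_rolls(n: int) -> Fraction:
    """Exact expected rolls to see all n equally likely faces: n * H_n."""
    return n * sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))

def expected_rolls_approx(n: int) -> float:
    """Asymptotic approximation n ln n + gamma * n + 1/2."""
    return n * math.log(n) + EULER_GAMMA * n + 0.5

for n in (6, 20, 100):
    print(n, float(expected_rolls(n)), round(expected_rolls_approx(n), 3))
```

For n = 6 the approximation is already within about 0.014 of the exact 14.7, because the next correction term is −1/(12n).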

Below are additional follow-up questions

What if we are only interested in collecting a subset of the faces, not all six?

In some variations of this problem, one might only want to see three specific faces (e.g., faces 1, 3, and 5) rather than all six. This changes the success probability at each stage:

Only rolls that land on a face in the desired subset can make progress; all other rolls are wasted.
If the subset has m faces and you have already seen k of them, m − k needed faces remain.
The probability that the next roll shows one of those remaining faces is (m − k)/6, so the expected wait for that stage is 6/(m − k).
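Summing the stage means 6/(m − k) for k = 0, …, m − 1 gives 6·H_m; for the three-face example {1, 3, 5} that is 6(1 + 1/2 + 1/3) = 11 rolls. The sketch below computes this exactly and checks it with a seeded simulation; the function names, seed, and trial count are hypothetical choices for illustration.

```python
import random
from fractions import Fraction

def expected_rolls_for_subset(subset_size: int, sides: int = 6) -> Fraction:
    """Exact mean: sum over k of sides / (subset_size - k), i.e. sides * H_subset_size."""
    return sum((Fraction(sides, subset_size - k) for k in range(subset_size)), Fraction(0))

def simulate_subset(targets: set[int], sides: int = 6, trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate of rolls needed until every face in `targets` has appeared."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        remaining = set(targets)  # target faces not yet seen
        rolls = 0
        while remaining:
            remaining.discard(rng.randint(1, sides))  # non-target faces change nothing
            rolls += 1
        total += rolls
    return total / trials

print(expected_rolls_for_subset(3))  # 11
print(simulate_subset({1, 3, 5}))  # should be close to 11
```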