ML Interview Q Series: Sum of Binomials is Binomial, Conditional Distribution is Hypergeometric.
Question: Let X and Y be independent random variables, where X is binomial with parameters n and p, and Y is binomial with parameters m and p. (a) Explain, in terms of Bernoulli experiments, why X+Y is binomial with parameters n+m and p, and then give a formal proof. (b) Show that for a fixed k, the probabilities P(X = j | X + Y = k) for j = 0,…,k form a hypergeometric distribution.
Short Compact solution
One way to see that X + Y follows a binomial distribution with parameters n + m and p is to imagine a total of n + m independent Bernoulli trials, each with probability p of success. The random variable X counts the number of successes in the first n trials, and Y counts the number of successes in the remaining m trials. By definition, X + Y then counts the total number of successes across all n + m trials, so X + Y is binomial(n + m, p).
Formally, because X and Y are independent, the probability P(X + Y = k) is the sum of P(X = r) P(Y = k − r) over r = 0,…,k. Substituting the binomial PMFs and applying Vandermonde's identity, that summation simplifies to the Binomial(n + m, p) PMF. For the conditional distribution, P(X = j | X + Y = k) can be written as the ratio of P(X = j, Y = k − j) to P(X + Y = k). After simplification, those probabilities match a hypergeometric distribution with parameters n, m, and k.
Comprehensive Explanation
Intuitive Reasoning with Bernoulli Trials
We can think of a total of n + m independent Bernoulli trials, each having success probability p. The variable X counts how many successes occur in the first n trials, while Y counts how many successes occur in the last m trials. Because each trial is independent and has the same probability p of success, if we sum up successes over all n + m trials, that total X + Y must be binomial with parameters n + m and p.
Formal Proof That X + Y Is Binomial(n + m, p)
Because X is binomial with parameters n and p, it follows that for any integer r:
P(X = r) = (n choose r) p^r (1 − p)^(n − r)
Similarly, Y is binomial with parameters m and p, so for integer s:
P(Y = s) = (m choose s) p^s (1 − p)^(m − s)
Given X and Y are independent, the joint probability P(X = r, Y = s) is the product:
P(X = r, Y = s) = P(X = r) P(Y = s)
Hence, if we want P(X + Y = k), we sum over all ways X = r and Y = k − r can occur:
P(X + Y = k) = sum_{r=0..k} P(X = r) P(Y = k − r)
Substituting the binomial PMFs for X and Y, we get:
P(X + Y = k) = sum_{r=0..k} [ (n choose r) p^r (1 − p)^(n − r) * (m choose (k − r)) p^(k − r) (1 − p)^(m − (k − r)) ]
Combine powers of p and (1 − p):
= sum_{r=0..k} [ (n choose r) (m choose (k − r)) ] p^k (1 − p)^(n + m − k)
By Vandermonde's identity,
sum_{r=0..k} (n choose r) (m choose (k − r)) = (n + m choose k)
Therefore,
P(X + Y = k) = (n + m choose k) p^k (1 − p)^(n + m − k)
Hence, X + Y follows a binomial distribution with parameters n + m and p.
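As a quick numerical sanity check (a minimal sketch in Python; the helper name binom_pmf and the values n = 5, m = 7, p = 0.3 are arbitrary choices), we can confirm that the convolution of the two binomial PMFs equals the Binomial(n + m, p) PMF for every k:

from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, m, p = 5, 7, 0.3  # arbitrary example parameters
for k in range(n + m + 1):
    # Convolution: sum over r of P(X = r) * P(Y = k - r)
    conv = sum(binom_pmf(r, n, p) * binom_pmf(k - r, m, p)
               for r in range(max(0, k - m), min(n, k) + 1))
    assert abs(conv - binom_pmf(k, n + m, p)) < 1e-12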
Conditional Distribution Is Hypergeometric
We now look at P(X = j | X + Y = k). By definition,
P(X = j | X + Y = k) = P(X = j, Y = k − j) / P(X + Y = k).
Because X and Y are independent:
P(X = j, Y = k − j) = P(X = j) P(Y = k − j).
Thus:
P(X = j, Y = k − j) = (n choose j) p^j (1 − p)^(n − j) (m choose (k − j)) p^(k − j) (1 − p)^(m − (k − j)).
Meanwhile, we have already shown:
P(X + Y = k) = (n + m choose k) p^k (1 − p)^(n + m − k).
Hence,
$$ P(X = j \mid X + Y = k) = \frac{\binom{n}{j}\,\binom{m}{k-j}\,p^{k}\,(1 - p)^{\,n+m-k}}{\binom{n+m}{k}\,p^{k}\,(1 - p)^{\,n+m-k}}. $$
Canceling out the common p^k and (1 − p)^(n + m − k), we get:
P(X = j | X + Y = k) = (n choose j) (m choose (k − j)) / (n + m choose k).
This is precisely the hypergeometric distribution with parameters n, m, and k. In other words, given the total number of successes k across n + m trials, the probability that j of those successes occurred in the first n trials matches the hypergeometric probability formula.
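A similar quick check (again a sketch with arbitrary example values; the helper is mine, not part of any particular library) computes P(X = j | X + Y = k) from the binomial PMFs and confirms it matches the hypergeometric formula above:

from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, m, p, k = 5, 7, 0.3, 4  # arbitrary example values
for j in range(max(0, k - m), min(n, k) + 1):
    # Conditional probability computed from the joint and marginal binomial PMFs
    conditional = binom_pmf(j, n, p) * binom_pmf(k - j, m, p) / binom_pmf(k, n + m, p)
    # Hypergeometric PMF: C(n, j) C(m, k - j) / C(n + m, k)
    hypergeom = comb(n, j) * comb(m, k - j) / comb(n + m, k)
    assert abs(conditional - hypergeom) < 1e-12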
Follow-up Question 1: Interpretation via Sampling Without Replacement
An interviewer might ask how the hypergeometric distribution, usually introduced via sampling without replacement, arises here. The binomial setting itself corresponds to independent trials (sampling with replacement). However, once we condition on X + Y = k total successes, every placement of those k successes among the n + m trial positions is equally likely. Asking how many of the k successes fall in the first n trials is then equivalent to drawing k tickets without replacement from an urn of n + m tickets, of which n are labeled “first block” and m are labeled “second block”, and counting the first-block tickets drawn. That is exactly the hypergeometric setup, so the conditional distribution P(X = j | X + Y = k) is hypergeometric.
Follow-up Question 2: Boundary and Edge Cases
Interviewers might also test understanding by asking about boundary cases:
When p = 0, both X and Y are almost surely 0, so X + Y is 0. This is consistent with binomial parameters n + m and p = 0.
When p = 1, X = n and Y = m almost surely, so X + Y = n + m. This again matches the binomial(n + m, 1) distribution.
When k = 0, then X + Y = 0 implies X = 0 and Y = 0 with probability 1. Similarly, when k = n + m, that means all trials succeeded. These edge outcomes also align with standard binomial properties.
Follow-up Question 3: Practical Implementation with Python
An interviewer may ask how to simulate or compute these probabilities in code. For instance, using Python:
from math import comb
import random

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * (p**k) * ((1 - p)**(n - k))

def hypergeom_pmf(j, n, m, k):
    # P(X = j | X + Y = k): probability that j of the k total successes
    # fall in the first n of the n + m trials
    return comb(n, j) * comb(m, k - j) / comb(n + m, k)

# Simulate X ~ Binomial(n, p) and Y ~ Binomial(m, p)
n, m, p = 5, 7, 0.3
num_samples = 1_000_000
count_XplusY = [0] * (n + m + 1)
for _ in range(num_samples):
    # Generate X and Y as counts of successes in independent Bernoulli(p) trials
    X = sum(random.random() < p for _ in range(n))
    Y = sum(random.random() < p for _ in range(m))
    count_XplusY[X + Y] += 1

# Empirical distribution of X + Y
empirical_probs = [cnt / num_samples for cnt in count_XplusY]
# Compare with the theoretical Binomial(n + m, p) PMF
theoretical_probs = [binomial_pmf(k, n + m, p) for k in range(n + m + 1)]
This code illustrates generating independent binomial samples for X and Y, and then checking how often their sum equals each integer k from 0 to n + m, comparing with the binomial(n + m, p) formula for verification.
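The hypergeom_pmf helper above is not exercised by that loop. A small extension (a sketch that reuses the same n, m, p, num_samples, and helpers from the code above, with an arbitrary choice of k) records X whenever X + Y equals a fixed k and compares the conditional frequencies with the hypergeometric PMF:

k_fixed = 4  # condition on X + Y = 4 (arbitrary choice between 0 and n + m)
cond_counts = [0] * (min(n, k_fixed) + 1)
total_at_k = 0
for _ in range(num_samples):
    X = sum(random.random() < p for _ in range(n))
    Y = sum(random.random() < p for _ in range(m))
    if X + Y == k_fixed:
        cond_counts[X] += 1
        total_at_k += 1

# Empirical conditional distribution of X given X + Y = k versus the hypergeometric PMF
for j in range(min(n, k_fixed) + 1):
    print(j, cond_counts[j] / total_at_k, hypergeom_pmf(j, n, m, k_fixed))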
Follow-up Question 4: Typical Pitfalls
An interviewer may probe whether a candidate knows common pitfalls. One mistake might be to assume X + Y could be something other than binomial just because X and Y are different binomial distributions (with same p but different n and m). Another subtlety is forgetting the independence condition. If X and Y were not independent, the result for X + Y might not remain binomial. Finally, not recognizing the hypergeometric structure when conditioning on X + Y = k is a common oversight.
All these nuances help emphasize the importance of understanding both combinatorial identities and independence in probability theory.
Below are additional follow-up questions
Follow-up Question 1: Behavior When n or m Is Random
Suppose instead of having fixed integers n and m, we let them be random variables themselves (still independent of each other and of any Bernoulli trials). If n and m are random (with some known distribution on the nonnegative integers) and conditional on n and m we define X ~ Binomial(n, p) and Y ~ Binomial(m, p), how would we characterize the distribution of X+Y?
Answer Explanation (Detailed)
Once n and m are random, X and Y become mixtures of binomial distributions. Each distribution for n or m, combined with the binomial distribution for X or Y, leads to a more complex compound distribution.
If n and m are independent and identically distributed discrete random variables, the sum X+Y still depends on the realized values of n and m, but we typically cannot collapse X+Y into a simple Binomial(n+m, p) unless we condition on n and m.
Unconditionally, X + Y is a mixture of binomials, weighted by the distribution of n + m. One clean special case: if n ∼ Poisson(λ1) and m ∼ Poisson(λ2) independently, the Poisson thinning property gives X ∼ Poisson(λ1 p) and Y ∼ Poisson(λ2 p) exactly, so X + Y ∼ Poisson((λ1 + λ2) p).
The main pitfall is to assume independence and identical distribution of n and m alone is enough to yield a straightforward binomial distribution for X+Y. One must carefully handle the distribution of n+m. Conditioned on n+m, it remains binomial. Unconditionally, it becomes a more general compound distribution.
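A minimal numerical sketch of that Poisson special case (assuming a single block with n ∼ Poisson(λ) and X | n ∼ Binomial(n, p); the truncation point and the values λ = 6, p = 0.3 are arbitrary) confirms that the unconditional PMF of X matches Poisson(λp):

from math import comb, exp, factorial

def poisson_pmf(j, lam):
    return exp(-lam) * lam**j / factorial(j)

lam, p = 6.0, 0.3  # arbitrary example parameters
for j in range(15):
    # P(X = j) = sum over n of P(N = n) * P(Binomial(n, p) = j), truncated at n = 150
    mixed = sum(poisson_pmf(n, lam) * comb(n, j) * p**j * (1 - p)**(n - j)
                for n in range(j, 150))
    # Thinning: the mixture should equal the Poisson(lam * p) PMF
    assert abs(mixed - poisson_pmf(j, lam * p)) < 1e-10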
Follow-up Question 2: Effect of Varying p Over Trials
Imagine the scenario where in the first n trials the success probability is p1, while in the next m trials the success probability is p2, with p1 ≠ p2. What is the distribution of X+Y?
Answer Explanation (Detailed)
If p1 ≠ p2, then X ∼ Binomial(n, p1) and Y ∼ Binomial(m, p2), and X+Y generally will not be a simple Binomial(n+m, p). We lose the uniform success probability assumption required for a binomial distribution with n+m trials.
We still have:
P(X = r) = (n choose r) p1^r (1 − p1)^(n−r)
P(Y = s) = (m choose s) p2^s (1 − p2)^(m−s)
Because X and Y remain independent, the PMF of X + Y is the convolution of two binomial PMFs with different success probabilities. The sum is a special case of the Poisson binomial distribution (a sum of independent Bernoulli trials with non-identical probabilities); with only two blocks of trials, it can still be written out as an explicit convolution sum (see the sketch below).
A common pitfall is to automatically label X+Y with a binomial distribution even though p1 ≠ p2. One must carefully handle the fact that each trial block uses a different probability of success.
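A short sketch of that convolution (the helper names and parameter values are illustrative) computes the exact PMF of X + Y when p1 ≠ p2 and checks that it is a valid distribution:

from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, m, p1, p2 = 5, 7, 0.2, 0.6  # arbitrary example parameters

def sum_pmf(k):
    # Convolution of Binomial(n, p1) and Binomial(m, p2) at the point k
    return sum(binom_pmf(r, n, p1) * binom_pmf(k - r, m, p2)
               for r in range(max(0, k - m), min(n, k) + 1))

pmf = [sum_pmf(k) for k in range(n + m + 1)]
assert abs(sum(pmf) - 1.0) < 1e-12
# The result is generally NOT Binomial(n + m, p) for any single p when p1 != p2.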
Follow-up Question 3: Moment Generating Functions (MGFs)
Explain how to use the moment generating functions of X and Y to prove that X+Y ∼ Binomial(n+m, p), assuming X ∼ Binomial(n, p), Y ∼ Binomial(m, p), and independence.
Answer Explanation (Detailed)
The MGF of a binomial(n, p) random variable X is M_X(t) = (1 − p + p e^t)^n.
For Y ∼ Binomial(m, p), its MGF is M_Y(t) = (1 − p + p e^t)^m.
If X and Y are independent, the MGF of their sum is the product of their individual MGFs: M_{X+Y}(t) = M_X(t) × M_Y(t) = (1 − p + p e^t)^n × (1 − p + p e^t)^m = (1 − p + p e^t)^(n+m).
This is exactly the MGF of a Binomial(n+m, p) random variable. Therefore X+Y follows Binomial(n+m, p).
A subtle real-world pitfall is mixing up MGF with the characteristic function or probability generating function. While all are valid approaches, one must ensure the algebra and conditions for using them hold, including independence.
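A small numeric check of the MGF argument (a sketch; the evaluation points and parameters are arbitrary) computes M_X(t) and M_Y(t) directly from the binomial PMF and verifies that their product equals the closed-form Binomial(n + m, p) MGF:

from math import comb, exp

def binom_mgf_from_pmf(t, n, p):
    # E[e^{tX}] computed directly from the Binomial(n, p) PMF
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * exp(t * k) for k in range(n + 1))

n, m, p = 5, 7, 0.3  # arbitrary example parameters
for t in (-1.0, -0.2, 0.0, 0.5, 1.3):
    mx = binom_mgf_from_pmf(t, n, p)
    my = binom_mgf_from_pmf(t, m, p)
    closed_form = (1 - p + p * exp(t)) ** (n + m)
    # M_X(t) * M_Y(t) should equal the Binomial(n + m, p) MGF
    assert abs(mx * my - closed_form) < 1e-9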
Follow-up Question 4: Handling Dependence Between X and Y
What happens if X and Y are not independent? For instance, X counts successes in n trials, and Y counts successes in those same n trials plus an additional m new trials (so the m new ones are disjoint, but the first n are common). Is X+Y still binomial(n+m, p)?
Answer Explanation (Detailed)
In that scenario, X is the number of successes in the first n trials, but Y includes successes from the first n trials (shared) and from additional m trials. Hence Y is no longer counting entirely disjoint trials; the first n are overlapping with X.
This creates dependence. Specifically, if X was high, then Y is more likely to be high (because part of Y is the same random outcome as the first n trials).
As a result, you cannot simply do P(X+Y = k) = sum P(X=r)P(Y=k−r) because they’re not independent. The sum X+Y in this overlapping scenario is not binomial(n+m, p). The distribution is more complicated and typically does not have a simple closed-form.
A major pitfall is to overlook correlation introduced by partial overlap in the trials. Real-world experiments can sometimes have partial overlap if the second “sample” reuses data from the first sample.
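A quick simulation sketch of this overlapping-trials scenario (with arbitrary n, m, p) shows that the variance of X + Y clearly exceeds the Binomial(n + m, p) variance (n + m)p(1 − p), so the sum cannot be Binomial(n + m, p):

import random

n, m, p = 5, 7, 0.3        # arbitrary example parameters
num_samples = 200_000
totals = []
for _ in range(num_samples):
    shared_successes = sum(random.random() < p for _ in range(n))  # first n trials, shared
    new_successes = sum(random.random() < p for _ in range(m))     # m additional trials
    X = shared_successes
    Y = shared_successes + new_successes  # Y reuses the outcomes of the first n trials
    totals.append(X + Y)

mean = sum(totals) / num_samples
var = sum((t - mean) ** 2 for t in totals) / num_samples
print("empirical variance of X + Y:", var)             # roughly 4*n*p*(1-p) + m*p*(1-p)
print("Binomial(n + m, p) variance:", (n + m) * p * (1 - p))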
Follow-up Question 5: Large n+m and Central Limit Theorem (CLT) Perspective
If n+m is large, how does the distribution of X+Y behave, and when does a normal approximation become relevant?
Answer Explanation (Detailed)
For large n+m, the binomial(n+m, p) distribution can be well-approximated by a normal distribution with mean (n+m)p and variance (n+m)p(1 − p). This is a direct consequence of the Central Limit Theorem (CLT) or the De Moivre–Laplace theorem (a specific case of CLT for binomial distributions).
Specifically, as n+m → ∞, (X+Y − (n+m)p) / sqrt((n+m)p(1 − p)) converges in distribution to a standard normal random variable.
Pitfalls include using the normal approximation for small n+m or extreme values of p near 0 or 1. In those cases, the approximation can be poor. Another pitfall is forgetting the continuity correction factor if the interviewer wants a more accurate normal approximation to the binomial.
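The sketch below (arbitrary parameters; the standard normal CDF is built from math.erf) compares the exact binomial probability of an interval with the normal approximation, with and without the continuity correction:

from math import comb, erf, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

n_total, p = 200, 0.3  # think of n_total as a large n + m
mu = n_total * p
sigma = sqrt(n_total * p * (1 - p))

a, b = 50, 70  # probability of a <= X + Y <= b
exact = sum(binom_pmf(k, n_total, p) for k in range(a, b + 1))
plain = norm_cdf((b - mu) / sigma) - norm_cdf((a - mu) / sigma)
corrected = norm_cdf((b + 0.5 - mu) / sigma) - norm_cdf((a - 0.5 - mu) / sigma)
print(exact, plain, corrected)  # the continuity-corrected value is typically closer to exact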
Follow-up Question 6: Relationship to Negative Binomial and Other Distributions
How does the distribution of X+Y relate to other well-known discrete distributions like the negative binomial? Could X+Y become negative binomial under certain transformations or conditions?
Answer Explanation (Detailed)
The negative binomial distribution typically arises when counting the number of trials needed to achieve a certain number of successes. That is a different process-based definition than simply adding two binomial counts that come from a fixed number of trials.
However, there is a relationship between sums of independent geometric or certain negative binomial variables. For instance, if X and Y were both geometric with parameter p, their sum might have a shifted negative binomial distribution. But that is conceptually different from X and Y being binomial(n, p) and binomial(m, p).
A pitfall is to conflate negative binomial with a sum of binomial distributions. While sums of binomial distributions with the same p remain binomial with a larger n, negative binomial does not typically appear as X+Y for binomial random variables unless there is a reinterpretation of how the random variables are defined.
Follow-up Question 7: Real-World Experimentation and Overdispersion
In real-world data, we often see that the empirical variance of the observed “success” counts exceeds the theoretical variance of the binomial model, (n + m)p(1 − p). How might this discrepancy occur, and does that mean the binomial model is invalid?
Answer Explanation (Detailed)
Overdispersion can arise if the probability of success p is not truly constant across all n+m trials, or if trials are not strictly independent. For example, external factors, batch effects, or correlation between individual trials can inflate the variance beyond the binomial assumption.
Overdispersion does not necessarily invalidate the binomial model altogether, but it signals that either the data might need a beta-binomial model (where p is itself random) or a mixed model that accommodates variations across trials.
A pitfall is to blindly apply the binomial distribution to data that show strong clustering or correlation of successes. The correct approach is to diagnose whether the binomial assumption is truly appropriate or if a more complex model is needed.
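A brief sketch of overdispersion (assuming p varies from experiment to experiment as a Beta(a, b) draw, which is the beta-binomial setup; all parameter values are illustrative) compares the empirical variance of the counts with the binomial variance implied by the average success rate:

import random

n_trials = 50          # trials per experiment (think n + m)
a, b = 2.0, 4.0        # Beta parameters controlling how much p varies between experiments
num_experiments = 50_000

counts = []
for _ in range(num_experiments):
    p_i = random.betavariate(a, b)  # p is random per experiment
    counts.append(sum(random.random() < p_i for _ in range(n_trials)))

mean = sum(counts) / num_experiments
var = sum((c - mean) ** 2 for c in counts) / num_experiments
p_bar = mean / n_trials
print("empirical variance:", var)
print("binomial variance at the average p:", n_trials * p_bar * (1 - p_bar))
# The empirical variance is noticeably larger: the classic signature of overdispersion.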
Follow-up Question 8: Partial Observability in Practice
In practice, one may only observe X+Y for some experiments but not observe X and Y individually. How does one infer p or test the binomial hypothesis in such partial observability cases?
Answer Explanation (Detailed)
If we only observe the total number of successes out of n+m trials (i.e., X+Y) but do not know how many came from the first n or the last m, we can still estimate p by MLE: the empirical fraction of successes in n+m trials is (X+Y)/(n+m).
However, testing whether the distribution truly splits into X ∼ Binomial(n, p) and Y ∼ Binomial(m, p) might require additional data or assumptions. We need to know that the first n and last m trials are truly independent and identically distributed.
A common pitfall is to assume independence and identical distribution of the two sets of trials without verifying the conditions. If the environment changes halfway through data collection, the first n trials might differ systematically from the last m.
Follow-up Question 9: Confidence Intervals for p and for X+Y
If we want to provide a confidence interval for p based on observed X+Y, how do we proceed, and what complexities can arise if we also want to bound the distribution of X+Y itself?
Answer Explanation (Detailed)
When we have a single binomial sample of size n+m, we can build a confidence interval for p using standard binomial-proportion confidence interval methods (e.g., Clopper–Pearson, Wilson, or normal approximation intervals).
Complexity arises if we want separate estimates of p for the two subsets of trials or if n and m are large but not necessarily the same order of magnitude. The intervals might need separate data to identify differences in success rates.
Another complexity is if X and Y each correspond to different underlying populations or treatments. Then forming a single estimate p from X+Y might average out meaningful differences. This leads to potential biases.
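As one concrete option, a Wilson score interval for p based only on the observed total of successes out of n + m trials can be computed as in the sketch below (z = 1.96 for an approximate 95% interval; the observed counts are made up for illustration):

from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    # Wilson score confidence interval for a binomial proportion
    phat = successes / trials
    denom = 1 + z**2 / trials
    center = (phat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * sqrt(phat * (1 - phat) / trials + z**2 / (4 * trials**2))
    return center - half_width, center + half_width

# Example: observed X + Y = 37 successes out of n + m = 120 trials
print(wilson_interval(37, 120))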
Follow-up Question 10: Multiple Comparisons with Several Binomial Sums
If we run multiple binomial experiments (e.g., X1 ∼ Binomial(n1, p), X2 ∼ Binomial(n2, p), …, Xk ∼ Binomial(nk, p)) and sum them all, how do we handle statistical inferences (confidence intervals or hypothesis tests) in a scenario with many sums?
Answer Explanation (Detailed)
In principle, if all Xi are independent Binomial(ni, p) with the same p, the sum S = X1 + X2 + … + Xk is Binomial(n1 + n2 + … + nk, p). In that sense, the combined inference for p could use S as a single count out of (n1 + n2 + … + nk) total trials.
However, if each Xi is used to make separate decisions or separate inferences, we face multiple-comparison challenges. One example is controlling the overall Type I error rate across k tests.
A subtle pitfall is ignoring correlation or differences among the k experiments. If p changes or if the trials for each Xi differ in crucial ways, we can overstate confidence. Multiple testing corrections (like Bonferroni or false discovery rate methods) may be needed.
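A sketch of the pooled-versus-per-experiment view (the exact two-sided binomial test here is implemented directly from the PMF; the counts and the null value p0 are made up for illustration) applies a Bonferroni-adjusted threshold to each experiment and also tests the pooled sum:

from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def exact_binom_pvalue(x_obs, n, p0):
    # Two-sided exact test: sum the probabilities of all outcomes
    # that are no more likely than the observed one under H0: p = p0
    p_obs = binom_pmf(x_obs, n, p0)
    return sum(binom_pmf(x, n, p0) for x in range(n + 1)
               if binom_pmf(x, n, p0) <= p_obs + 1e-12)

p0 = 0.3                                     # null hypothesis success probability
counts = [(40, 100), (28, 120), (55, 150)]   # (successes, trials) per experiment, illustrative
alpha = 0.05
bonferroni_alpha = alpha / len(counts)       # per-test threshold controlling family-wise error

for i, (x, n) in enumerate(counts):
    pval = exact_binom_pvalue(x, n, p0)
    print(f"experiment {i}: p-value {pval:.4f}, reject at Bonferroni level: {pval < bonferroni_alpha}")

# Pooled inference: under a common p, the sum of counts is Binomial(total trials, p)
x_pool = sum(x for x, _ in counts)
n_pool = sum(n for _, n in counts)
print("pooled p-value:", exact_binom_pvalue(x_pool, n_pool, p0))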