ML Interview Q Series: Discrete Variables: Joint Probability Construction and Linearity of Expectation.
Question
X takes values 1, 2, 3, 4 each with probability 1/4, and Y takes values 1, 2, 4, 8 with probabilities 1/2, 1/4, 1/8, 1/8 respectively. Write out a table of probabilities for the 16 paired outcomes (X, Y) which is consistent with the individual distributions of X and Y. From that table, find all the possible values of X + Y along with their matching probabilities, and confirm that E(X + Y) = E(X) + E(Y).
Short Compact solution
There are 16 possible (X, Y) pairs, and infinitely many valid ways to assign probabilities that respect the marginal distributions of X and Y. One example is:
p(1,4) = 1/8
p(1,8) = 1/8
p(2,2) = 1/4
p(3,1) = 1/4
p(4,1) = 1/4
and the remaining eleven pairs assigned probability 0. This ensures the marginals for X and Y match their specified distributions (note the p(1,8) entry, which supplies the 1/8 of mass that Y = 8 requires). Across the full 16-pair grid the possible values of X + Y run from 2 to 12; in this particular construction only the sums 4, 5, and 9 carry positive probability. For instance:
P(X + Y = 4) = p(2,2) + p(3,1) = 1/4 + 1/4 = 1/2.
Summing appropriately gives E(X + Y) = E(X) + E(Y) = 5/2 + 5/2 = 5. Thus the equality of expectations holds.
Comprehensive Explanation
When two discrete random variables X and Y are specified with individual distributions:
X can be 1, 2, 3, 4 each with probability 1/4.
Y can be 1, 2, 4, 8 with probabilities 1/2, 1/4, 1/8, 1/8 respectively.
the joint distribution P(X = i, Y = j) must be chosen so that:
For each i in {1, 2, 3, 4}, the sum of P(X = i, Y = j) over all j in {1, 2, 4, 8} must equal 1/4.
For each j in {1, 2, 4, 8}, the sum of P(X = i, Y = j) over all i in {1, 2, 3, 4} must match the corresponding probability of Y = j (namely 1/2, 1/4, 1/8, 1/8).
In principle, there are infinitely many ways to pick the probabilities of the 16 (i, j) combinations to satisfy these marginal constraints. One specific example is:
Assign all the “mass” to only a few pairs: p(1,4) = 1/8, p(1,8) = 1/8, p(2,2) = 1/4, p(3,1) = 1/4, p(4,1) = 1/4
Assign 0 probability to the remaining 11 pairs.
It is straightforward to verify that each of the X = i events ends up with 1/4 total probability and each Y = j event ends up with its correct marginal probability. For instance:
X = 1 arises if we pick (1,4) or (1,8), each with probability 1/8; all other (1, j) pairs carry 0. Hence the total is 1/8 + 1/8 = 1/4.
X = 2 arises if we pick (2,2) with probability 1/4.
X = 3 arises if we pick (3,1) with probability 1/4.
X = 4 arises if we pick (4,1) with probability 1/4.
All of these sum to 1. Each event X = i has probability 1/4, consistent with X's distribution.
Similarly, the probability Y = 1 comes from (3,1) and (4,1), totaling 1/4 + 1/4 = 1/2. Probability Y = 2 comes from (2,2), which is 1/4. Probability Y = 4 comes from (1,4), which is 1/8, and probability Y = 8 comes from (1,8), which is also 1/8. Other valid allocations could place the Y = 8 mass on different pairs and adjust the remaining probabilities so that the marginals stay consistent.
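The marginal bookkeeping above can be checked mechanically. The sketch below uses exact fractions and the five-pair allocation discussed in the text (including p(1,8) = 1/8 so that Y = 8 receives its mass); the variable names are illustrative, not part of the original problem.

```python
from fractions import Fraction as F

# Five-pair allocation; every other (x, y) pair has probability 0.
joint = {(1, 4): F(1, 8), (1, 8): F(1, 8), (2, 2): F(1, 4),
         (3, 1): F(1, 4), (4, 1): F(1, 4)}

# Marginal of X: sum over y for each fixed x.
marg_x = {x: sum(p for (i, j), p in joint.items() if i == x) for x in [1, 2, 3, 4]}
# Marginal of Y: sum over x for each fixed y.
marg_y = {y: sum(p for (i, j), p in joint.items() if j == y) for y in [1, 2, 4, 8]}

assert all(marg_x[x] == F(1, 4) for x in [1, 2, 3, 4])
assert marg_y == {1: F(1, 2), 2: F(1, 4), 4: F(1, 8), 8: F(1, 8)}
print("marginals verified:", marg_x, marg_y)
```

Using `fractions.Fraction` avoids floating-point round-off, so the marginal checks can use exact equality.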
Possible values of X + Y and probabilities
Once we know the joint distribution, we can compute the probability of each possible sum. For example, in the illustrative distribution:
X + Y = 5 if (1,4) or (4,1). Probability is p(1,4) + p(4,1) = 1/8 + 1/4 = 3/8.
X + Y = 4 if (2,2) or (3,1). Probability is p(2,2) + p(3,1) = 1/4 + 1/4 = 1/2.
X + Y = 9 if (1,8). Probability is p(1,8) = 1/8. The remaining sums (2, 3, 6, 7, 8, 10, 11, 12) can appear under other allocations, but in the given example they have probability 0 since only five (i, j) pairs are nonzero.
Verifying E(X+Y) = E(X) + E(Y)
A key property in probability theory is the linearity of expectation, which states:
E(X + Y) = E(X) + E(Y)
even if X and Y are dependent. Computing each expectation directly from its marginal distribution:
E(X) = 1*(1/4) + 2*(1/4) + 3*(1/4) + 4*(1/4) = 10/4 = 5/2
E(Y) = 1*(1/2) + 2*(1/4) + 4*(1/8) + 8*(1/8) = 1/2 + 1/2 + 1/2 + 1 = 5/2
Hence, E(X) + E(Y) = 5/2 + 5/2 = 5.
If we compute E(X + Y) from the specific joint assignment, we would sum (x + y)*p(x, y) over all pairs (x, y). In the example:
For (1,4): (1+4)*(1/8) = 5/8
For (1,8): (1+8)*(1/8) = 9/8
For (2,2): (2+2)*(1/4) = 1
For (3,1): (3+1)*(1/4) = 1
For (4,1): (4+1)*(1/4) = 5/4
Summing: 5/8 + 9/8 + 1 + 1 + 5/4. Converting everything to eighths: 5/8 + 9/8 + 8/8 + 8/8 + 10/8 = 40/8 = 5, in exact agreement with E(X) + E(Y).
Regardless of how you distribute probabilities among the pairs, as long as the marginals are respected, linearity of expectation guarantees E(X+Y) = E(X) + E(Y) = 5. No independence between the random variables is required.
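To illustrate that the value 5 does not depend on the particular coupling, the sketch below evaluates E(X+Y) under two different joint tables with the same marginals: the sparse allocation from the text (extended with p(1,8) = 1/8) and the independent coupling p(x, y) = p(x)*p(y). Both are constructed here purely for demonstration.

```python
from fractions import Fraction as F
from itertools import product

px = {x: F(1, 4) for x in [1, 2, 3, 4]}
py = {1: F(1, 2), 2: F(1, 4), 4: F(1, 8), 8: F(1, 8)}

# Coupling 1: the sparse five-pair allocation.
sparse = {(1, 4): F(1, 8), (1, 8): F(1, 8), (2, 2): F(1, 4),
          (3, 1): F(1, 4), (4, 1): F(1, 4)}
# Coupling 2: the independent coupling p(x, y) = p(x) * p(y).
indep = {(x, y): px[x] * py[y] for x, y in product(px, py)}

for name, joint in (("sparse", sparse), ("independent", indep)):
    e_sum = sum((x + y) * p for (x, y), p in joint.items())
    print(name, "E(X+Y) =", e_sum)  # 5 in both cases
```

The two couplings induce very different dependence between X and Y, yet the expectation of the sum is identical.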
Further Discussion and Follow-up Questions
Why does E(X + Y) = E(X) + E(Y) even if X and Y are dependent?
Linearity of expectation does not rely on independence. In general, for any two discrete random variables X and Y (regardless of correlation), we have, in plain text form:
E(X + Y) = sum over all pairs (x, y) of (x + y) * p(x, y).
We can separate this summation into:
sum over x of x * [sum over y of p(x, y)] plus sum over y of y * [sum over x of p(x, y)].
The inner sums yield marginal distributions, giving E(X) + E(Y). This property, called the linearity of expectation, is valid universally.
If E(X+Y) = E(X) + E(Y), can we say Var(X+Y) = Var(X) + Var(Y)?
Not necessarily. The variance of a sum of random variables depends on both their individual variances and their covariance. Specifically, in plain text form:
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
If X and Y are independent, Cov(X, Y) = 0, and then Var(X+Y) = Var(X) + Var(Y). But if there is dependence, the covariance term is generally nonzero.
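The variance identity can be verified numerically on the sparse coupling from this problem (extended with p(1,8) = 1/8), where X and Y are strongly dependent. This is a sketch; the helper `ev` is introduced here for convenience and is not part of the original solution.

```python
from fractions import Fraction as F

joint = {(1, 4): F(1, 8), (1, 8): F(1, 8), (2, 2): F(1, 4),
         (3, 1): F(1, 4), (4, 1): F(1, 4)}

def ev(f):
    """Expectation of f(x, y) under the joint table."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

ex, ey = ev(lambda x, y: x), ev(lambda x, y: y)
var_x = ev(lambda x, y: (x - ex) ** 2)
var_y = ev(lambda x, y: (y - ey) ** 2)
cov = ev(lambda x, y: (x - ex) * (y - ey))
var_sum = ev(lambda x, y: (x + y - ex - ey) ** 2)

print("Cov(X, Y) =", cov)  # nonzero: X and Y are dependent under this coupling
assert var_sum == var_x + var_y + 2 * cov
```

Here the covariance is negative (large X values are paired with small Y values), so Var(X+Y) is strictly smaller than Var(X) + Var(Y).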
How do we construct a valid joint distribution if we want to respect given marginals?
One approach is to use the definition of conditional probabilities:
P(X = i, Y = j) = P(X = i) * P(Y = j | X = i).
You can choose a conditional distribution P(Y = j | X = i) in many ways, as long as each row sums to 1 and each column’s sum ultimately matches the given marginal for Y. In some problems, the data or assumptions might guide you to specific forms of dependence or independence. If X and Y are independent, then P(X = i, Y = j) = P(X = i) * P(Y = j). But here, we do not assume independence, so we can allocate probabilities in many ways.
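The row-by-row construction P(X = i, Y = j) = P(X = i) * P(Y = j | X = i) can be sketched directly. The conditional table below is one hypothetical choice (it reproduces the sparse example from this article); each row sums to 1, and mixing the rows with the uniform X marginal recovers Y's marginal.

```python
from fractions import Fraction as F

px = {x: F(1, 4) for x in [1, 2, 3, 4]}
# One hypothetical choice of conditionals P(Y = j | X = i); each row sums to 1.
cond = {1: {4: F(1, 2), 8: F(1, 2)},
        2: {2: F(1)},
        3: {1: F(1)},
        4: {1: F(1)}}

# Joint via p(i, j) = p(i) * p(j | i).
joint = {(x, y): px[x] * cond[x][y] for x in cond for y in cond[x]}

# Check the induced marginal of Y against the target (1/2, 1/4, 1/8, 1/8).
marg_y = {}
for (x, y), p in joint.items():
    marg_y[y] = marg_y.get(y, F(0)) + p
print("induced marginal of Y:", marg_y)
```

Any other family of rows that mixes to the same Y marginal would be an equally valid joint distribution.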
What if a question specifically requires X and Y to be independent?
In that situation, you must assign p(i, j) = p(X = i)p(Y = j) for all i, j. Then each pair’s probability is (1/4)(1/2) = 1/8 for X = i, Y = 1, etc. That is a single unique way of building the joint distribution. By contrast, when X and Y are not required to be independent, you can have multiple valid allocations for their joint distribution.
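Under independence the joint table is fully determined by the marginals, as a short sketch confirms:

```python
from fractions import Fraction as F
from itertools import product

px = {x: F(1, 4) for x in [1, 2, 3, 4]}
py = {1: F(1, 2), 2: F(1, 4), 4: F(1, 8), 8: F(1, 8)}

# Under independence the joint is forced: p(i, j) = p(X = i) * p(Y = j).
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

assert joint[(1, 1)] == F(1, 8)  # (1/4)*(1/2), as stated in the text
assert sum(joint.values()) == 1
print("E(X+Y) =", sum((x + y) * p for (x, y), p in joint.items()))  # 5
```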
How do we ensure Y = 8 gets probability 1/8 in the joint table?
Among the pairs (i, 8) for i in {1, 2, 3, 4}, the total probability must equal 1/8. One simple choice is to assign p(1,8) = 1/8 and zero to (2,8), (3,8), and (4,8), then arrange the remaining (i, j) entries so that each P(X = i) still sums to 1/4 and each P(Y = j) matches its marginal. The 1/8 of mass for Y = 8 could equally be split across the column in other ways, as long as all marginal constraints remain satisfied.
Could you show a Python snippet that checks E(X+Y) numerically from a table?
Below is a minimal Python example that constructs a joint probability table and verifies the marginal probabilities and the expectation of X+Y:
joint_probs = {
(1,1): 0.0,
(1,2): 0.0,
(1,4): 1/8,
(1,8): 1/8,
(2,1): 0.0,
(2,2): 1/4,
(2,4): 0.0,
(2,8): 0.0,
(3,1): 1/4,
(3,2): 0.0,
(3,4): 0.0,
(3,8): 0.0,
(4,1): 1/4,
(4,2): 0.0,
(4,4): 0.0,
(4,8): 0.0
}
# Compute E(X), E(Y), E(X+Y)
E_X = 0.0
E_Y = 0.0
E_XplusY = 0.0
for (x, y), p in joint_probs.items():
    E_X += x * p
    E_Y += y * p
    E_XplusY += (x + y) * p

# Verify the marginals against the specified distributions
marginal_X = {i: sum(p for (x, y), p in joint_probs.items() if x == i) for i in (1, 2, 3, 4)}
marginal_Y = {j: sum(p for (x, y), p in joint_probs.items() if y == j) for j in (1, 2, 4, 8)}
print("Marginal of X:", marginal_X)
print("Marginal of Y:", marginal_Y)

print("E(X) =", E_X)
print("E(Y) =", E_Y)
print("E(X+Y) =", E_XplusY)
print("Check if E(X+Y) == E(X) + E(Y):", abs(E_XplusY - (E_X + E_Y)) < 1e-9)
For the table to be a fully valid assignment, the (i, 8) column must carry total probability 1/8; with p(1,8) = 1/8 in place, the marginals of Y (1/2 for y=1, 1/4 for y=2, 1/8 for y=4, 1/8 for y=8) are all respected, and the check prints E(X+Y) = 5.0. The snippet illustrates a basic approach for verifying expectations from any joint table.
All these details underscore that while many joint distributions can respect identical marginals, each distribution can yield very different correlations or dependencies between X and Y. Nonetheless, E(X+Y) = E(X) + E(Y) always remains valid.