ML Interview Q Series: Deriving the Erlang Density for X+Y via Integration of the Joint PDF.
Question
The joint density function of the random variables X and Y is given by
f(x, y) = (1/2)(x + y) e^(−(x + y)) for x > 0, y > 0,
and f(x, y) = 0 otherwise. What is the density function of the random variable X + Y?
Short Compact solution
From the provided steps of the derivation, the PDF of Z = X + Y is found to be
fZ(z) = (1/2) z² e^(−z) for z > 0,
and fZ(z) = 0 otherwise. This corresponds to an Erlang distribution (a special case of the Gamma) with shape parameter 3 and scale parameter 1.
Comprehensive Explanation
To understand why Z = X+Y has the PDF above, it helps to see how the density of a sum of two continuous random variables can be obtained via integration or via known properties of specific distributions. Here is a high-level approach explaining the result in detail:
1. Interpreting the Joint PDF
The given joint PDF f(x,y) = (1/2)(x+y)e^{-(x+y)} for x>0, y>0 indicates that the density depends on x+y. Because exponentials and gamma-type functions often appear when dealing with sums, it is natural to suspect that X+Y might yield a gamma/Erlang distribution when integrated properly over the domain x>0, y>0.
2. Distribution Function Approach
One typical method is to compute P(Z ≤ z) by integrating f(x,y) over the region {x>0, y>0, x+y ≤ z}. For a fixed z>0, that region is 0 < x < z and 0 < y < z−x. Symbolically, we do:
P(Z ≤ z) = ∫(x=0 to z) [ ∫(y=0 to z−x) (1/2)(x+y)e^{-(x+y)} dy ] dx.
Differentiating that expression with respect to z gives fZ(z), the PDF of Z. Carrying out the inner integral and then differentiating, the result simplifies to (1/2) z² e^(−z) for z > 0.
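As a quick sanity check, this integrate-then-differentiate computation can be reproduced symbolically. Below is a minimal sketch, assuming sympy is available:
import sympy as sp
x, y, z = sp.symbols('x y z', positive=True)
# The given joint PDF
f = sp.Rational(1, 2) * (x + y) * sp.exp(-(x + y))
# CDF of Z: integrate f over the region {0 < x < z, 0 < y < z - x}
cdf = sp.integrate(sp.integrate(f, (y, 0, z - x)), (x, 0, z))
# Differentiate the CDF with respect to z to recover the PDF of Z
pdf = sp.simplify(sp.diff(cdf, z))
print(pdf)  # expected: z**2*exp(-z)/2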
3. Convolution View
Another perspective is through the convolution formula for the sum of two continuous random variables:
fZ(z) = ∫(0 to z) fX,Y(x, z−x) dx,
This formula applies to the joint PDF directly; independence is not required. Along the line x + y = z, the integrand is constant: f(x, z−x) = (1/2)(x + (z−x)) e^(−(x + (z−x))) = (1/2) z e^(−z), independent of x. Integrating this constant over the interval 0 < x < z multiplies it by the interval's length z, leading again to fZ(z) = (1/2) z² e^(−z).
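To make the convolution view concrete, here is a small numerical check, assuming scipy is available: for a few values of z, the integral of f(x, z−x) over 0 < x < z should match (1/2) z² e^(−z).
import numpy as np
from scipy.integrate import quad
def joint_pdf(x, y):
    # f(x, y) = (1/2)(x + y) e^(-(x + y)) on x > 0, y > 0
    return 0.5 * (x + y) * np.exp(-(x + y))
for z in [0.5, 1.0, 2.0, 5.0]:
    conv, _ = quad(lambda x: joint_pdf(x, z - x), 0.0, z)
    theory = 0.5 * z**2 * np.exp(-z)
    print(f"z={z}: convolution={conv:.6f}, theory={theory:.6f}")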
4. Gamma/Erlang Interpretation
The final expression fZ(z) = (1/2) z² e^(−z), z > 0, is precisely the PDF of an Erlang (or Gamma) distribution with shape parameter k = 3 and scale parameter θ = 1. For a gamma distribution with shape k and rate λ (equivalently scale θ = 1/λ), the PDF is λ^k z^(k−1) e^(−λz) / Γ(k). Here k = 3, λ = 1, and Γ(3) = 2, yielding (z^(3−1) e^(−z)) / 2 = (z² e^(−z)) / 2.
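If scipy is available, this identification can be confirmed pointwise: scipy.stats.gamma with shape a=3 and scale 1 should reproduce (1/2) z² e^(−z) exactly.
import numpy as np
from scipy.stats import gamma
z = np.linspace(0.1, 10, 50)
derived = 0.5 * z**2 * np.exp(-z)       # the PDF derived above
reference = gamma.pdf(z, a=3, scale=1)  # Gamma(shape=3, scale=1), i.e. Erlang(3, 1)
print(np.allclose(derived, reference))  # expected: True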
5. Why Shape Parameter is 3
Since the final PDF carries a factor z², it matches z^(k−1) with k = 3. In distribution, Z therefore behaves exactly like the sum of three independent exponential(1) random variables, even though X and Y do not individually follow standard exponential distributions here. The given joint PDF is structured so that X + Y ends up as Erlang(3, 1).
Potential Follow-up Question: Why does shape parameter 3 appear if we only have two variables X and Y?
This is a natural question, since one might initially think the sum of two random variables would yield a shape parameter of 2 in the Erlang family. However, the joint PDF here is not the product of two exponentials (which would be the case if X and Y were independent exponential(1) random variables). Instead, the joint density f(x, y) = (1/2)(x + y) e^(−(x + y)) carries an extra factor of (x + y). Integrating over the region x > 0, y > 0, x + y = z picks up one factor of z from this term and a second factor of z from the length of the segment on which x + y = z, producing the z² in the final expression and hence a gamma with shape parameter 3.
Potential Follow-up Question: Can we verify this using moment generating functions or Laplace transforms?
Yes. If we define M_XY(s, t) = E[e^(sX + tY)], the moment generating function of the pair (X, Y), we can evaluate it and then set s = t for the sum. Computing M_Z(u) = E[e^(u(X+Y))] directly from the derived density gives M_Z(u) = (1 − u)^(−3) for u < 1, exactly the MGF of a gamma distribution with shape 3 and rate 1. Equivalently, the Laplace transform of z² e^(−z) / 2 is (1 + s)^(−3), matching that of the gamma(3, 1).
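A minimal symbolic sketch of this check, assuming sympy is available: integrate e^(−sz) against the derived density and compare with (1 + s)^(−3).
import sympy as sp
z = sp.symbols('z', positive=True)
s = sp.symbols('s', positive=True)  # restrict to s > 0 so the transform converges
pdf = sp.Rational(1, 2) * z**2 * sp.exp(-z)
laplace = sp.integrate(sp.exp(-s * z) * pdf, (z, 0, sp.oo))
print(sp.simplify(laplace - (1 + s)**-3))  # expected: 0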
Potential Follow-up Question: How would I generate random samples for Z from X,Y in a simulation?
If you wanted to simulate the random variable Z = X+Y from the given joint PDF, you could:
Generate (X,Y) from their joint distribution by an appropriate accept-reject or transformation method.
Sum them to get Z.
Or you could note that Z itself is Erlang(3, 1) and generate it directly, either with a gamma sampler or by summing three i.i.d. exponential(1) random variables. Sampling the pair (X, Y) from (1/2)(x + y) e^(−(x + y)) requires more care (see the follow-up on simulation below), but you can skip that step for Z since its distribution is already known.
Below is a brief Python example that uses direct sampling for Z as a gamma(3,1). If we wanted to do a more thorough simulation, we would sample from the pair (X,Y) to confirm that the sum indeed matches the derived distribution. For large N, the empirical distribution of X+Y should converge to the theoretical Erlang(3,1).
import numpy as np
import matplotlib.pyplot as plt
N = 1_000_000
# Erlang with shape=3, rate=1 can be implemented as Gamma(k=3, theta=1/rate=1):
samples = np.random.gamma(shape=3, scale=1, size=N)
# Plot a histogram to check shape
plt.hist(samples, bins=100, density=True, alpha=0.5, label='Simulated')
z = np.linspace(0, 15, 200)
pdf = 0.5 * z**2 * np.exp(-z)
plt.plot(z, pdf, 'r-', label='Theoretical PDF')
plt.legend()
plt.show()
This code compares the simulated histogram to the theoretical PDF fZ(z) = (1/2) z² e^(−z) for z>0.
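As a cross-check on the Erlang interpretation, the samples can instead be built by summing three i.i.d. exponential(1) draws and compared against the gamma(3, 1) law, for example with a Kolmogorov–Smirnov test (a sketch, assuming scipy is available):
import numpy as np
from scipy.stats import kstest
rng = np.random.default_rng(0)
N = 100_000
# The sum of three i.i.d. exponential(1) draws should be Erlang(3, 1)
samples = rng.exponential(scale=1.0, size=(N, 3)).sum(axis=1)
# KS test against Gamma(shape=3, loc=0, scale=1); a large p-value
# indicates no detectable discrepancy
stat, pvalue = kstest(samples, 'gamma', args=(3, 0, 1))
print(f"KS statistic: {stat:.4f}, p-value: {pvalue:.3f}")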
Potential Follow-up Question: What are some boundary and regularity conditions to keep in mind?
One key point is that both x and y are strictly positive, which means the support is x>0, y>0. As a result, Z>0. Also, the integrals that define P(Z ≤ z) are well-defined for z>0 because the exponential factor ensures the integrand decays quickly. Outside of z>0, the density is zero.
All these factors confirm that the final answer for the PDF of Z is indeed (1/2) z² e^(−z), z>0, consistent with a gamma/Erlang distribution of shape parameter 3, rate 1.
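As a final one-line verification that the answer is a valid density, its total mass should be 1; a sketch with sympy:
import sympy as sp
z = sp.symbols('z', positive=True)
total = sp.integrate(sp.Rational(1, 2) * z**2 * sp.exp(-z), (z, 0, sp.oo))
print(total)  # expected: 1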
Below are additional follow-up questions
Could the random variable Z = X + Y be negative?
It is worth examining whether Z can ever be negative given the specified joint distribution f(x, y) = (1/2)(x + y) e^(−(x + y)) for x > 0, y > 0. Since both x and y are constrained to be strictly positive, the sum x + y is itself strictly positive, so the support of Z is z > 0 and P(Z < 0) = 0. One subtlety is the boundary point z = 0, but neither x nor y can be exactly 0 under the given support. Hence fZ(z) = 0 for z ≤ 0, making it impossible for Z to take negative values or zero.
A potential pitfall is if someone incorrectly includes z=0 in the domain or assumes the distribution is valid for z≤0. That would contradict the original definition of x>0, y>0. In practical data analysis scenarios, one must ensure that the original random variables cannot be zero or negative. If the real-world scenario allows x=0 or y=0, we would need to adjust the domain or distribution accordingly.
Could X and Y each be recognized as standard Gamma random variables individually?
One might ask if X alone or Y alone, by marginalizing out the other variable, follows a gamma distribution with a certain shape or rate. To investigate X, one would integrate f(x, y) over y>0:
marginal fX(x) = ∫(y=0 to ∞) (1/2)(x + y) e^(−(x + y)) dy, for x>0.
Factoring out e^(−x) gives fX(x) = (1/2) e^(−x) ∫(y=0 to ∞) (x + y) e^(−y) dy. Evaluating that integral over y explicitly:
∫(0 to ∞) (x + y) e^(−y) dy = x ∫(0 to ∞) e^(−y) dy + ∫(0 to ∞) y e^(−y) dy = x(1) + 1.
Hence fX(x) = (1/2)(x + 1) e^(−x). This is a legitimate PDF on x > 0 (in fact, an equal-weight mixture of an exponential(1) density and a gamma(2, 1) density), but because of the (x + 1) factor it does not reduce to a single exponential or a standard gamma distribution with integer shape. A similar argument applies to Y. Consequently, X and Y do not individually follow a one-parameter exponential or a basic Erlang distribution. This can be a pitfall if someone incorrectly assumes that X and Y must each be exponential. Instead, the structure of the joint density is such that X + Y turns out to be Erlang(3, 1).
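A quick numerical check of this marginal, assuming scipy is available: integrate the joint density over y at a few fixed x values and compare against (1/2)(x + 1) e^(−x); integrating the marginal formula itself over x should give total mass 1.
import numpy as np
from scipy.integrate import quad
def joint_pdf(x, y):
    return 0.5 * (x + y) * np.exp(-(x + y))
for x in [0.5, 1.0, 3.0]:
    marginal, _ = quad(lambda y: joint_pdf(x, y), 0.0, np.inf)
    formula = 0.5 * (x + 1) * np.exp(-x)
    print(f"x={x}: integrated={marginal:.6f}, formula={formula:.6f}")
total, _ = quad(lambda x: 0.5 * (x + 1) * np.exp(-x), 0.0, np.inf)
print(f"total mass: {total:.6f}")  # expected: 1.0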
What if the shape parameter were not an integer? Could the sum still be gamma?
In this problem, the shape parameter of the resulting sum Z is 3, which is an integer, making it an Erlang distribution. In general, sums of certain continuous distributions can result in gamma distributions that need not have an integer shape parameter. However, for sums of i.i.d. exponential random variables with rate λ, the shape parameter does become an integer equal to the number of exponentials being summed. In contrast, this problem’s joint density is crafted such that even though only two variables are being summed, the shape parameter is 3. This underscores that an Erlang distribution (integer shape) can emerge from more complex dependencies, not merely from summing three i.i.d. exponentials.
A subtle pitfall would be to assume that each new random variable we add to the sum always increments the shape by 1 in a naive manner. Here, the structure of the joint distribution contributed an additional factor that mimics having a total shape of 3. To generalize, a sum of variables with certain correlated distributions can yield gamma-like outcomes with shape parameters that differ from the number of terms being added.
How does one confirm the correctness of the derived PDF for Z in a real-world computational setting?
To verify fZ(z) = (1/2) z^2 e^(−z) for z>0 in a simulation, one would want to sample from the pair (X,Y) under the given joint density. Because f(x, y) = (1/2) (x + y) e^(−(x + y)), we cannot simply generate X and Y independently. One approach is to use a direct method:
Consider using acceptance-rejection:
Propose (X', Y') from a simpler known joint distribution supported on x > 0, y > 0. Note that with independent exponential(1) proposals the density ratio is (1/2)(x + y), which is unbounded, so a heavier-tailed proposal is needed, for instance independent exponentials with rate strictly less than 1, which make the ratio bounded.
Compute the ratio of the target density to the proposal density to decide acceptance or rejection.
Once (X, Y) is accepted, compute Z = X+Y.
Compare the empirical histogram of Z to the theoretical Erlang(3,1) PDF.
A subtle pitfall arises if you incorrectly sample X and Y as though they were independent exponentials. That sampling would not match f(x,y), so the resulting sum distribution would not be the correct Erlang(3,1). Another challenge is ensuring the acceptance-rejection bounding function is chosen so that the acceptance rate is not prohibitively low.
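There is also an exact alternative to acceptance-rejection, based on a change of variables rather than on anything stated in the original solution: substituting z = x + y shows that the pair (X, Z) has joint density (1/2) z e^(−z) on 0 < x < z, so conditional on Z = z, X is uniform on (0, z). One can therefore draw Z ~ gamma(3, 1) and then place X uniformly inside (0, Z). A minimal sketch:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(42)
N = 100_000
# Z = X + Y is Erlang(3, 1); conditional on Z = z, X is Uniform(0, z)
Z = rng.gamma(shape=3, scale=1.0, size=N)
X = rng.uniform(0.0, 1.0, size=N) * Z
Y = Z - X
# Non-trivial sanity check: the marginal of X should match (1/2)(x + 1)e^(-x)
plt.hist(X, bins=100, density=True, alpha=0.5, label='Simulated X')
x = np.linspace(0, 12, 200)
plt.plot(x, 0.5 * (x + 1) * np.exp(-x), 'r-', label='Marginal formula')
plt.legend()
plt.show()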
How does knowledge of covariance or correlation between X and Y factor into the derivation for Z?
From the given joint PDF, one can in principle compute E[X], E[Y], E[XY], and thus find Cov(X, Y). Carrying out these integrals (using the marginal fX(x) = (1/2)(x + 1) e^(−x) derived above) gives E[X] = E[Y] = 3/2 and E[XY] = 2, hence Cov(X, Y) = 2 − 9/4 = −1/4, and with Var(X) = Var(Y) = 7/4 a correlation of −1/7. If X and Y were independent, their joint PDF would factor into f(x) g(y), which it does not here. The negative correlation matches the intuition that when x is large, y is somewhat less likely to also be large, although the relationship is more intricate than a simple linear trend.
Knowing that X and Y are dependent or have some correlation does not obstruct the fact that Z = X+Y is gamma(3,1). A pitfall arises if someone incorrectly assumes that only sums of independent exponentials produce a gamma distribution. Dependence does not necessarily preclude gamma sums, as demonstrated by this example. The main lesson is that the details of the joint PDF can lead to surprising or non-obvious results for the distribution of sums.
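These moment computations can be reproduced symbolically; a short sketch, assuming sympy is available:
import sympy as sp
x, y = sp.symbols('x y', positive=True)
f = sp.Rational(1, 2) * (x + y) * sp.exp(-(x + y))
def expect(g):
    # E[g(X, Y)] under the joint density f
    return sp.integrate(sp.integrate(g * f, (y, 0, sp.oo)), (x, 0, sp.oo))
EX, EXY, EX2 = expect(x), expect(x * y), expect(x**2)
cov = sp.simplify(EXY - EX**2)      # E[Y] = E[X] by symmetry; expected: -1/4
var = sp.simplify(EX2 - EX**2)      # expected: 7/4
print(EX, cov, sp.simplify(cov / var))  # correlation expected: -1/7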
If one wanted the PDF of max(X,Y) or min(X,Y), how would we proceed?
Although the primary question focuses on X+Y, sometimes one might be interested in the distribution of maximum or minimum. For example, let M = max(X,Y). Then:
P(M ≤ m) = P(X ≤ m, Y ≤ m),
and we would integrate over the square region 0 < x < m, 0 < y < m. Because X and Y are dependent through (x + y) e^(−(x + y)), the integration must be carried out over the joint density rather than via a product of marginals. After finding the CDF by integrating f(x, y) over the square, one differentiates to get the PDF. A subtlety is that the boundaries are not just x < m and y < m, but also x > 0 and y > 0. In this particular case the double integral does come out in closed form, P(M ≤ m) = (1 − e^(−m))(1 − (m + 1) e^(−m)), although for dependent pairs such max/min computations need not simplify as nicely as the sum did.
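As a sketch of how the computation goes, sympy can carry out this double integral, and in this particular case a closed form emerges:
import sympy as sp
x, y, m = sp.symbols('x y m', positive=True)
f = sp.Rational(1, 2) * (x + y) * sp.exp(-(x + y))
# P(max(X, Y) <= m): integrate f over the square (0, m) x (0, m)
cdf_max = sp.simplify(sp.integrate(sp.integrate(f, (y, 0, m)), (x, 0, m)))
print(cdf_max)  # equivalent to (1 - exp(-m))*(1 - (m + 1)*exp(-m))
# Differentiate the CDF to obtain the PDF of the maximum
pdf_max = sp.simplify(sp.diff(cdf_max, m))
print(pdf_max)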
A pitfall here is to try a standard maximum-of-two-independent-variables approach, P(M ≤ m) = P(X ≤ m)P(Y ≤ m), which only applies if X and Y are independent. In the presence of correlation, that factorization is invalid. So you must integrate over the correct region of the joint distribution carefully.