ML Interview Q Series: Bivariate Normal Parameters from Linear Combinations of Independent Standard Normals
Let \(Z_1\) and \(Z_2\) be independent random variables each having the standard normal distribution. Define
\[ X = Z_1 + 3Z_2, \qquad Y = Z_1 + Z_2. \]
Argue that the joint distribution of \((X, Y)\) is a bivariate normal distribution, and find its parameters.
Short Compact solution
Any linear combination of the random variables \(X\) and \(Y\) can be written as \(aZ_1 + bZ_2\) for some constants \(a\) and \(b\). Because \(Z_1\) and \(Z_2\) are each normally distributed and independent, every such linear combination is also normally distributed. Hence, by a standard result (Rule 12.3 in many texts), the vector \((X, Y)\) must be a bivariate normal vector.
The parameters (means, variances, and correlation) follow from direct calculation. The expectations are
\(\mathrm{E}[X] = 0\)
\(\mathrm{E}[Y] = 0\)
so \(\mu_1 = 0\) and \(\mu_2 = 0\).
Next, one finds the variances:
\(\mathrm{Var}(X) = 10\)
\(\mathrm{Var}(Y) = 2\)
Finally, computing \(\mathrm{Cov}(X, Y)\) gives
\[ \mathrm{Cov}(X, Y) = 4. \]
Hence the correlation \(\rho\) is
\[ \rho = \frac{4}{\sqrt{10 \times 2}} = \frac{4}{\sqrt{20}} = \tfrac{2}{5}\sqrt{5} \approx 0.894. \]
Thus the distribution of \((X, Y)\) is bivariate normal with
\[ \mu_1 = 0, \quad \mu_2 = 0, \quad \sigma_1^2 = 10, \quad \sigma_2^2 = 2, \quad \rho = \tfrac{2}{5}\sqrt{5}. \]
Comprehensive Explanation
Why ((X, Y)) is Bivariate Normal
A well-known property of jointly normal (Gaussian) vectors is that all of their linear combinations are themselves (univariate) normal. Equivalently, if \(Z_1\) and \(Z_2\) are independent standard normal random variables, then any vector whose components are linear combinations of \(Z_1\) and \(Z_2\) forms a multivariate (in this case, bivariate) normal vector.
Concretely, observe that
\(X = Z_1 + 3Z_2\)
\(Y = Z_1 + Z_2\)
Any linear combination \(\alpha X + \beta Y\) can be rewritten as \(\alpha(Z_1 + 3Z_2) + \beta(Z_1 + Z_2)\), which simplifies to
\[ (\alpha + \beta) Z_1 + (3\alpha + \beta) Z_2. \]
Since \(Z_1\) and \(Z_2\) are independent normals, any such linear combination is also normal. Therefore, the pair \((X, Y)\) must be jointly normal.
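As a quick numerical sanity check of this identity, the variance of \(\alpha X + \beta Y\) should match \((\alpha + \beta)^2 + (3\alpha + \beta)^2\); here is a minimal sketch (the values of \(\alpha\) and \(\beta\) are arbitrary illustrations):
import torch
torch.manual_seed(0)
alpha, beta = 0.7, -1.3  # arbitrary illustration values
Z1 = torch.randn(200_000)
Z2 = torch.randn(200_000)
X = Z1 + 3 * Z2
Y = Z1 + Z2
# Var(alpha*X + beta*Y) should equal (alpha + beta)^2 + (3*alpha + beta)^2
combo = alpha * X + beta * Y
theoretical = (alpha + beta) ** 2 + (3 * alpha + beta) ** 2
print(combo.var().item(), theoretical)  # both ~1.0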
Mean and Variance Calculations
Because \(Z_1\) and \(Z_2\) each have mean 0, we immediately get:
\(\mathrm{E}[X] = \mathrm{E}[Z_1] + 3\,\mathrm{E}[Z_2] = 0.\)
\(\mathrm{E}[Y] = \mathrm{E}[Z_1] + \mathrm{E}[Z_2] = 0.\)
Hence \(\mu_1 = 0\) and \(\mu_2 = 0\).
For the variances:
\(\mathrm{Var}(X) = \mathrm{Var}(Z_1 + 3Z_2) = \mathrm{Var}(Z_1) + 9\,\mathrm{Var}(Z_2) = 1 + 9 = 10.\)
\(\mathrm{Var}(Y) = \mathrm{Var}(Z_1 + Z_2) = \mathrm{Var}(Z_1) + \mathrm{Var}(Z_2) = 1 + 1 = 2.\)
Covariance and Correlation
Using bilinearity of covariance and the independence of \(Z_1\) and \(Z_2\):
\[ \mathrm{Cov}(X, Y) = \mathrm{Cov}(Z_1 + 3Z_2,\; Z_1 + Z_2) = \mathrm{Var}(Z_1) + 3\,\mathrm{Var}(Z_2) = 1 + 3 = 4. \]
Thus the correlation \(\rho\) is
\[ \rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{4}{\sqrt{20}} = \tfrac{2}{5}\sqrt{5} \approx 0.894. \]
Putting it all together, one describes \((X, Y)\) fully by the means \(\mu_1 = 0\), \(\mu_2 = 0\), the variances \(\sigma_1^2 = 10\), \(\sigma_2^2 = 2\), and the correlation \(\rho = \tfrac{2}{5}\sqrt{5}\).
Follow Up Question 1
How would the conclusion change if \(Z_1\) and \(Z_2\) were not independent?
If \(Z_1\) and \(Z_2\) are not independent (yet still individually normal), one would need to check whether they jointly form a bivariate normal vector. In particular, if \((Z_1, Z_2)\) itself is a bivariate normal vector (but possibly correlated), then any linear combination of them is still normal, and \((X, Y)\) would still remain bivariate normal. The main difference would be in computing the parameters:
You would still have \(X = Z_1 + 3Z_2\) and \(Y = Z_1 + Z_2\).
The means remain \(\mathrm{E}[X] = 0\) and \(\mathrm{E}[Y] = 0\) as before (each \(Z_i\) still has mean 0), but
The variances and covariance would pick up additional cross terms involving \(\mathrm{Cov}(Z_1, Z_2)\), as made explicit below.
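Concretely, writing \(c = \mathrm{Cov}(Z_1, Z_2)\), bilinearity of variance and covariance gives
\[ \mathrm{Var}(X) = 1 + 9 + 6c, \qquad \mathrm{Var}(Y) = 1 + 1 + 2c, \qquad \mathrm{Cov}(X, Y) = 4 + 4c, \]
which reduce to \(10\), \(2\), and \(4\) when \(c = 0\).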
If \((Z_1, Z_2)\) were not jointly normal, however, then \((X, Y)\) need not be jointly normal. So the independence (or at least joint normality) of \(Z_1\) and \(Z_2\) is the crucial assumption that guarantees the bivariate normality of \((X, Y)\).
Follow Up Question 2
When would the correlation between \(X\) and \(Y\) be zero, or \(\pm 1\)?
Correlation \(= 0\): This would mean \(\mathrm{Cov}(X, Y) = 0\), which requires the coefficient vectors to be orthogonal with respect to the covariance structure of \((Z_1, Z_2)\). In our example, you would need \(\mathrm{Cov}(Z_1 + 3Z_2,\, Z_1 + Z_2) = 0\); with independent \(Z_1, Z_2\) this covariance equals \(4\), so zero correlation would require either different coefficients or a suitable correlation between \(Z_1\) and \(Z_2\).
Correlation \(= 1\) or \(-1\): Perfect (positive or negative) correlation would mean \((X, Y)\) lie on a straight line with probability 1, i.e., \(Y = aX + b\) almost surely for some constants \(a, b\). For linear combinations of independent normals, this happens exactly when one variable's coefficient vector is a scalar multiple of the other's. Since \((1, 3)\) and \((1, 1)\) are not proportional, our \(X\) and \(Y\) cannot have correlation \(\pm 1\).
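More generally, for independent standard normals, if \(X = a_1 Z_1 + a_2 Z_2\) and \(Y = b_1 Z_1 + b_2 Z_2\), the correlation is the cosine of the angle between the coefficient vectors:
\[ \rho = \frac{a_1 b_1 + a_2 b_2}{\sqrt{(a_1^2 + a_2^2)(b_1^2 + b_2^2)}}. \]
Hence \(\rho = 0\) exactly when \((a_1, a_2) \perp (b_1, b_2)\), and \(\rho = \pm 1\) exactly when one coefficient vector is a scalar multiple of the other (the Cauchy-Schwarz equality case). For \((1, 3)\) and \((1, 1)\) this gives \(\rho = 4/\sqrt{20}\), matching the earlier computation.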
Follow Up Question 3
Can you show a quick Python code snippet (e.g. using PyTorch) to generate samples of \((X, Y)\)?
Below is an illustrative Python snippet. We draw large batches of \(Z_1, Z_2\) from \(\mathcal{N}(0,1)\) independently, then form \(X\) and \(Y\):
import torch
# Number of samples
N = 100000
# Draw Z1, Z2 from standard normal
Z1 = torch.randn(N)
Z2 = torch.randn(N)
# Define X = Z1 + 3*Z2, Y = Z1 + Z2
X = Z1 + 3 * Z2
Y = Z1 + Z2
# Now X and Y hold samples from the desired distribution
# You could compute empirical statistics to check
mean_X = X.mean()
mean_Y = Y.mean()
var_X = X.var()
var_Y = Y.var()
# Unbiased sample covariance (divide by N - 1, matching torch's var() default)
cov_XY = ((X - mean_X) * (Y - mean_Y)).sum() / (N - 1)
print("Empirical mean(X):", mean_X.item())
print("Empirical mean(Y):", mean_Y.item())
print("Empirical var(X):", var_X.item())
print("Empirical var(Y):", var_Y.item())
print("Empirical cov(X,Y):", cov_XY.item())
# correlation
corr_XY = cov_XY / (var_X.sqrt() * var_Y.sqrt())
print("Empirical correlation:", corr_XY.item())
In practice, you should see the empirical values converge toward the theoretical values: mean(X) = 0, mean(Y) = 0, var(X) = 10, var(Y) = 2, cov(X, Y) = 4, and correlation(X, Y) = 4 / sqrt(20) ≈ 0.894.
This straightforward simulation is a good check that highlights how linear transformations of independent normal variables produce correlated bivariate normal variables.
Below are additional follow-up questions
What if we are interested in the distribution of X + Y or any other linear combination of X and Y?
If \(X\) and \(Y\) are jointly normal, any linear combination of them, such as \(U = aX + bY\), will also be normally distributed. This holds because, for jointly Gaussian variables, linear combinations preserve normality. Specifically, if \((X, Y)\) follows a bivariate normal distribution, then the marginal distributions of \(X\) and \(Y\) are each univariate normal (which we already know), and any linear combination of \(X\) and \(Y\) is also normal with mean \(a\,\mathrm{E}[X] + b\,\mathrm{E}[Y]\) and variance \(a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)\). A quick check with our \((X, Y)\) is sketched below.
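For instance, \(U = X + Y\) should be normal with mean \(0\) and variance \(10 + 2 + 2 \times 4 = 20\); a minimal simulation check:
import torch
torch.manual_seed(0)
Z1 = torch.randn(200_000)
Z2 = torch.randn(200_000)
X = Z1 + 3 * Z2
Y = Z1 + Z2
# U = X + Y = 2*Z1 + 4*Z2, so Var(U) = 4 + 16 = 20,
# matching Var(X) + Var(Y) + 2*Cov(X, Y) = 10 + 2 + 2*4 = 20
U = X + Y
print(U.mean().item(), U.var().item())  # ~0 and ~20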
A potential pitfall is assuming that this property (all linear combinations are normal) holds for variables that are not truly jointly Gaussian but might simply each be marginally normal. Marginal normality alone does not guarantee that all linear combinations are normal. One must verify joint Gaussianity. This can be a real-world issue if your data appear “unimodal” and individually normal yet show hidden dependence structures that break the bivariate normal assumption.
How do we compute conditional distributions such as p(X | Y = y) in a bivariate normal setting?
In the bivariate normal case, the conditional distribution of one variable given the other is always normal. Specifically, if \((X, Y)\) has mean vector \((\mu_1, \mu_2)\), variances \(\sigma_1^2, \sigma_2^2\), and correlation \(\rho\), then the conditional distribution of \(X \mid (Y = y)\) is normal with:
Mean \(= \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}\,(y - \mu_2)\)
Variance \(= (1 - \rho^2)\,\sigma_1^2\)
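Plugging in the parameters derived earlier (\(\mu_1 = \mu_2 = 0\), \(\sigma_1^2 = 10\), \(\sigma_2^2 = 2\), \(\mathrm{Cov}(X, Y) = 4\)), and noting that \(\rho\,\sigma_1/\sigma_2 = \mathrm{Cov}(X, Y)/\mathrm{Var}(Y)\):
\[ \mathrm{E}[X \mid Y = y] = \frac{4}{2}\,y = 2y, \qquad \mathrm{Var}(X \mid Y = y) = \left(1 - \frac{16}{20}\right) \times 10 = 2. \]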
In real practice, a subtlety arises if the correlation \(\rho\) is close to \(\pm 1\). This leads to near-degenerate conditional distributions with extremely small variance. Numerically, such scenarios can be unstable when working with finite floating-point precision. Implementations must handle these edge cases carefully, or you might see large rounding errors or ill-conditioned covariance matrices.
How do we generate bivariate normal samples with a specified correlation?
A common method is to start from two independent standard normals and then introduce the desired correlation using a linear transform. Specifically, if you want random variables \((X, Y)\) with a given correlation \(\rho\), you can do:
Draw \(Z_1, Z_2\) i.i.d. from \(\mathcal{N}(0, 1)\).
Construct \[ X = \sigma_1 Z_1 + \mu_1, \qquad Y = \rho\,\sigma_2 Z_1 + \sqrt{1 - \rho^2}\,\sigma_2 Z_2 + \mu_2. \]
Check that \(X\) and \(Y\) have the desired means \(\mu_1, \mu_2\), variances \(\sigma_1^2, \sigma_2^2\), and correlation \(\rho\); a simulation check is sketched after this list.
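A minimal sketch of this recipe in PyTorch; the target parameter values below are arbitrary illustrations, and torch.corrcoef is assumed available (recent PyTorch versions):
import torch
torch.manual_seed(0)
N = 200_000
mu1, mu2 = 1.0, -2.0        # hypothetical target means
sigma1, sigma2 = 3.0, 0.5   # hypothetical target standard deviations
rho = 0.7                   # hypothetical target correlation
Z1 = torch.randn(N)
Z2 = torch.randn(N)
# The linear transform from the steps above
X = sigma1 * Z1 + mu1
Y = rho * sigma2 * Z1 + (1 - rho**2) ** 0.5 * sigma2 * Z2 + mu2
# Empirical correlation should be close to rho
corr = torch.corrcoef(torch.stack([X, Y]))[0, 1]
print(corr.item())  # ~0.7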
An alternative general approach for the multivariate case is to apply the Cholesky decomposition of the desired covariance matrix to a vector of i.i.d. standard normals. The main pitfall here is ensuring that the covariance matrix you provide is positive semidefinite. If it is not (e.g., if the specified correlation does not fall between \(-1\) and \(+1\)), then you cannot generate a legitimate bivariate normal.
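A sketch of the Cholesky route, assuming torch.linalg.cholesky and torch.cov from recent PyTorch versions; the covariance matrix below reproduces our \((X, Y)\):
import torch
torch.manual_seed(0)
# Desired covariance matrix (must be positive semidefinite)
cov = torch.tensor([[10.0, 4.0],
                    [4.0, 2.0]])
L = torch.linalg.cholesky(cov)   # lower-triangular factor with cov = L @ L.T
Z = torch.randn(2, 200_000)      # rows of i.i.d. standard normals
samples = L @ Z                  # row 0 plays the role of X, row 1 of Y
print(torch.cov(samples))        # empirical covariance, close to cov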
How do we verify in practice whether data follows a bivariate normal distribution?
One practical approach is to use graphical methods:
Plot a scatter plot of \((X, Y)\) and look for elliptical contours, which are characteristic of bivariate normal data (a plotting sketch follows this list).
Use a Q-Q plot (quantile-quantile plot) for each variable marginally, though this only checks univariate normality.
Examine contour plots or 2D histograms to see if the shape matches the elliptical pattern typical of a bivariate Gaussian.
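One way to produce the scatter and marginal Q-Q plots, assuming matplotlib and SciPy are available, applied here to simulated samples of our \((X, Y)\):
import matplotlib.pyplot as plt
import torch
from scipy import stats
torch.manual_seed(0)
Z1, Z2 = torch.randn(50_000), torch.randn(50_000)
X, Y = (Z1 + 3 * Z2).numpy(), (Z1 + Z2).numpy()
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].scatter(X, Y, s=1, alpha=0.1)         # expect an elliptical cloud
axes[0].set_title("Scatter of (X, Y)")
stats.probplot(X, dist="norm", plot=axes[1])  # marginal Q-Q plot for X
axes[1].set_title("Q-Q plot of X")
stats.probplot(Y, dist="norm", plot=axes[2])  # marginal Q-Q plot for Y
axes[2].set_title("Q-Q plot of Y")
plt.tight_layout()
plt.show()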
Statistical tests such as Mardia’s test for multivariate normality can be applied, but they might have limited power in smaller samples. A pitfall is that real-world data often deviates from normality in the tails (e.g., heavier or lighter tails) even if it looks “normal enough” near the center. This deviation can be critical in risk-sensitive applications, such as finance, where tail behavior matters more than average-case scenarios.
How can maximum likelihood estimation (MLE) be carried out for bivariate normal parameters?
For bivariate normal data \(\{(x_i, y_i)\}_{i=1}^{n}\), the MLE for the means, variances, and correlation involves:
Estimating \(\mu_1\) and \(\mu_2\) by the sample averages.
Estimating \(\sigma_1^2, \sigma_2^2\) by the sample variances.
Estimating \(\rho\) by the sample correlation, i.e., the sample covariance divided by \(\sigma_1\,\sigma_2\). A minimal sketch follows this list.
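A minimal sketch of these closed-form MLEs, fit here to simulated data so the estimates can be checked against the known parameters:
import torch
torch.manual_seed(0)
# Simulated data from our (X, Y); replace with real observations in practice
Z1, Z2 = torch.randn(100_000), torch.randn(100_000)
x, y = Z1 + 3 * Z2, Z1 + Z2
mu1_hat, mu2_hat = x.mean(), y.mean()
# MLE variances divide by n, not n - 1 (the difference vanishes for large n)
var1_hat = ((x - mu1_hat) ** 2).mean()
var2_hat = ((y - mu2_hat) ** 2).mean()
cov_hat = ((x - mu1_hat) * (y - mu2_hat)).mean()
rho_hat = cov_hat / (var1_hat.sqrt() * var2_hat.sqrt())
# Expect approximately 0, 0, 10, 2, 0.894
print(mu1_hat.item(), mu2_hat.item(), var1_hat.item(),
      var2_hat.item(), rho_hat.item())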
The likelihood function for the bivariate normal is well-known, and its log-likelihood is simpler to work with. The main difficulty arises if the sample covariance matrix becomes close to singular (particularly in small-sample scenarios or if the data are nearly collinear). Numerical instability can lead to inaccurate parameter estimates or non-invertible covariance matrices. In practice, a regularization or shrinkage approach might be used to handle such cases robustly.
What if we consider more than two linear combinations, leading to a higher-dimensional normal?
The principles generalize. If \((Z_1, Z_2)\) is a bivariate normal vector (in particular, if each is standard normal and they have some correlation structure), any higher-dimensional set of variables \((U_1, U_2, \ldots, U_k)\) formed by linear transformations of \((Z_1, Z_2)\) remains jointly normal. The dimension of the resulting vector is limited, however, by the fact that you only have two base random variables: while you can form many linear combinations, the resulting vector can span at most a 2-dimensional subspace.
A subtle point is that, to get a true (non-degenerate) \(k\)-variate normal for \(k > 2\), you need \(k\) independent normal "sources." If you only have two independent normals, you can produce multiple correlated variables that lie in a 2D subspace, but not a general \(k\)-dimensional normal distribution.
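A sketch illustrating this degeneracy: three linear combinations of only two sources yield a rank-2 covariance matrix (the third row of A is a hypothetical extra combination added for illustration):
import torch
torch.manual_seed(0)
Z = torch.randn(2, 100_000, dtype=torch.float64)  # only two independent sources
# Three linear combinations of the two sources
A = torch.tensor([[1.0, 3.0],
                  [1.0, 1.0],
                  [2.0, -1.0]], dtype=torch.float64)
U = A @ Z                              # three jointly normal variables
C = torch.cov(U)                       # 3x3 empirical covariance, ~ A @ A.T
print(torch.linalg.matrix_rank(C))     # 2: (U1, U2, U3) lie in a 2-D subspace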
How would non-stationary or time-varying properties affect the bivariate normal assumption?
In many real-world cases, the distributions of \(X\) and \(Y\) may evolve over time (e.g., changing means, variances, or correlation). If we assume a stationary bivariate normal process but the real data violates stationarity, parameter estimates that assume constancy can be misleading. One could see temporal drift in correlation, or variance that changes seasonally. A typical solution is to break the timeline into segments short enough to assume approximate stationarity within each segment, or to apply more advanced models like state-space models or time-varying covariance models (e.g., GARCH for financial data). The pitfall is ignoring these dynamics and applying a static bivariate normal model, which leads to poor forecasts and risk assessments.
Can the correlation be misleading if X and Y are not centered or if they exhibit outliers?
Yes, correlation can be heavily influenced by outliers, especially in small sample sizes. For example, a few extreme values in the tails of one variable can inflate or deflate the empirical correlation. Additionally, if \(X\) and \(Y\) have strong nonlinear relationships, the correlation might fail to capture that structure. Even with a bivariate normal assumption, you generally want to center the data (subtract the mean) before computing the sample covariance or correlation to get correct parameter estimates. Forgetting to de-mean is a common pitfall: it causes the sample correlation to deviate from the theoretical correlation we derived. A small demonstration of outlier sensitivity follows.
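A minimal sketch of how a single extreme point can distort the empirical correlation of our \((X, Y)\) in a small sample (torch.corrcoef is assumed available):
import torch
torch.manual_seed(0)
Z1, Z2 = torch.randn(200), torch.randn(200)   # deliberately small sample
X, Y = Z1 + 3 * Z2, Z1 + Z2
def corr(a, b):
    return torch.corrcoef(torch.stack([a, b]))[0, 1].item()
print("clean sample:", corr(X, Y))            # ~0.89
# Inject a single extreme outlier and recompute
X_out = torch.cat([X, torch.tensor([50.0])])
Y_out = torch.cat([Y, torch.tensor([-50.0])])
print("with outlier:", corr(X_out, Y_out))    # drops sharply, typically negative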
Is it possible for one marginal distribution to be normal while the other marginal distribution is skewed?
If the pair \((X, Y)\) is truly bivariate normal, then both marginals must themselves be normal. A scenario where one marginal is skewed and the other is normal implies that \((X, Y)\) cannot be bivariate normal, because in a true bivariate Gaussian, each component (and any linear combination) is guaranteed to be univariate Gaussian.
In practice, you might see data that appear univariate normal for one variable but not for the other, raising red flags that the joint distribution is not bivariate normal. One must examine both marginals — and, ideally, their joint contour plots or other diagnostic checks — to ensure that the bivariate normal assumption is valid.
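A classic counterexample makes the distinction between marginal and joint normality concrete: take \(X \sim \mathcal{N}(0, 1)\) and \(W = S \cdot X\) for an independent random sign \(S\). Both marginals are standard normal, yet \((X, W)\) is not jointly normal, because the linear combination \(X + W\) has an atom at zero. A quick sketch:
import torch
torch.manual_seed(0)
N = 100_000
X = torch.randn(N)
S = (torch.randint(0, 2, (N,)) * 2 - 1).float()   # random sign, independent of X
W = S * X                                          # W is marginally N(0, 1)
# If (X, W) were jointly normal, X + W would be normal.
# Instead it equals exactly 0 whenever S = -1, i.e. with probability 1/2:
print((X + W == 0).float().mean().item())          # ~0.5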