ML Interview Q Series: Calculating Normal Distribution Probabilities and Percentiles via Standardization
Suppose X is N(10, 1). Find (i) P[X > 10.5], (ii) P[9.5 < X < 11], (iii) x such that P[X < x] = 0.95. Use Standard Normal tables.
Short Compact Solution
Let Z denote a N(0,1) random variable. Then:
• For part (i): P[X > 10.5] = P[Z > 0.5] = 1 - Φ(0.5) = 0.3085
• For part (ii): P[9.5 < X < 11] = P[-0.5 < Z < 1] = Φ(1) - Φ(-0.5) = 0.5328
• For part (iii): We need x such that P[X < x] = 0.95. Transforming to Z: (x - 10)/1 = 1.645, which gives x = 11.645
Comprehensive Explanation
Transforming X to the Standard Normal Variable
Because X has mean μ = 10 and standard deviation σ = 1, we define the standard normal variable Z as:
Z = (X - μ) / σ = (X - 10) / 1
Here:
X is our original normally distributed variable with mean 10 and standard deviation 1.
μ is the mean of X (10).
σ is the standard deviation of X (1).
Z is a standard normal random variable, which means Z ~ N(0, 1).
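The transformation above can be sketched in a few lines using only Python's standard library. The helper names `standardize` and `phi` are illustrative assumptions rather than anything from the original text; `phi` relies on the identity Φ(z) = (1 + erf(z/√2))/2.

```python
import math

def standardize(x, mu, sigma):
    """Map a value of X ~ N(mu, sigma^2) to its z-score (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = standardize(10.5, mu=10, sigma=1)  # z-score of 10.5 under N(10, 1)
p_below = phi(z)                       # P[X <= 10.5]
print(z, round(p_below, 4))
```

Any probability statement about X can then be rewritten as a statement about `phi` evaluated at the standardized bound.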
Part (i): Computing P[X > 10.5]
We first convert the event X > 10.5 into an event on Z. The standardized threshold is (10.5 - 10) / 1 = 0.5.
Hence, P[X > 10.5] = P[Z > 0.5] = 1 - P[Z <= 0.5] = 1 - Φ(0.5).
From standard normal tables, Φ(0.5) is approximately 0.6915, so P[X > 10.5] = 1 - 0.6915 = 0.3085.
Part (ii): Computing P[9.5 < X < 11]
We convert the bounds to Z:
Lower bound: (9.5 - 10) / 1 = -0.5
Upper bound: (11 - 10) / 1 = 1
Therefore, P[9.5 < X < 11] = P[-0.5 < Z < 1] = Φ(1) - Φ(-0.5).
Using standard normal tables: Φ(1) ≈ 0.8413 and, by symmetry, Φ(-0.5) = 1 - Φ(0.5) ≈ 1 - 0.6915 = 0.3085, so P[-0.5 < Z < 1] = 0.8413 - 0.3085 = 0.5328.
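Printed tables often list Φ only for non-negative z, in which case a negative argument is handled through the symmetry Φ(-z) = 1 - Φ(z). The sketch below mimics that lookup; the dictionary `TABLE` and the function `phi_from_table` are hypothetical names, and the two entries are simply the rounded values quoted in this solution.

```python
# Positive-z table entries as quoted in the text (rounded to 4 decimals).
TABLE = {0.5: 0.6915, 1.0: 0.8413}

def phi_from_table(z):
    """Look up Phi(z), using Phi(-z) = 1 - Phi(z) for negative z."""
    if z < 0:
        return 1.0 - TABLE[-z]
    return TABLE[z]

lower = phi_from_table(-0.5)   # 1 - 0.6915
upper = phi_from_table(1.0)
p_between = upper - lower      # P[-0.5 < Z < 1]
print(round(lower, 4), round(p_between, 4))
```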
Part (iii): Finding x such that P[X < x] = 0.95
We want the 95th percentile of X. We know that:
P[X < x] = 0.95
Transforming to Z:
P[Z < (x - 10)/1] = 0.95.
The 95th percentile of the standard normal distribution is often denoted z_0.95 ≈ 1.645. Hence,
(x - 10)/1 = 1.645, so x = 10 + 1.645 = 11.645.
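The same two steps (find the standard normal quantile, then undo the standardization via x = μ + σ·z) can be sketched with the standard library's `statistics.NormalDist`, available in Python 3.8 and later. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 10, 1
z_95 = NormalDist().inv_cdf(0.95)  # 95th percentile of N(0, 1)
x_95 = mu + sigma * z_95           # undo the standardization
print(round(z_95, 3), round(x_95, 3))
```

The unrounded quantile is slightly below 1.645, so results computed in software can differ from the table-based answer in the fourth decimal place.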
Follow-up Question 1
Why do we use Z ~ N(0,1) in solving problems involving X ~ N(μ, σ²)?
A standard normal variable Z is a special case of the normal distribution with mean 0 and variance 1. By converting any normal random variable X ~ N(μ, σ²) to Z using the formula (X - μ)/σ, we can leverage standardized tables or well-known software functions that give the cumulative distribution function (CDF) for Z. This standardization is a universal way of handling any normal distribution without compiling separate tables for each possible (μ, σ).
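The reason a single table suffices can be written out in one short derivation. This is a standard argument, sketched here under the assumption σ > 0, so that dividing by σ preserves the direction of the inequality:

```latex
\[
\begin{aligned}
P[X \le x]
  &= P\!\left[\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right] \\
  &= P\!\left[Z \le \frac{x - \mu}{\sigma}\right] \\
  &= \Phi\!\left(\frac{x - \mu}{\sigma}\right).
\end{aligned}
\]
```

Every normal CDF is therefore the standard normal CDF evaluated at a shifted and rescaled argument.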
Follow-up Question 2
How do we compute these probabilities in Python without using manual tables?
In Python, one can use libraries such as scipy.stats to compute normal distribution probabilities and quantiles. For example:
from scipy.stats import norm
# Part (i): P[X > 10.5], X ~ N(10, 1)
p_i = 1 - norm.cdf(10.5, loc=10, scale=1)
# Part (ii): P[9.5 < X < 11]
p_ii = norm.cdf(11, loc=10, scale=1) - norm.cdf(9.5, loc=10, scale=1)
# Part (iii): x such that P[X < x] = 0.95
x_95 = norm.ppf(0.95, loc=10, scale=1)
print(p_i, p_ii, x_95)
Here:
norm.cdf(x, loc=μ, scale=σ) returns the value Φ((x - μ)/σ).
norm.ppf(q, loc=μ, scale=σ) is the inverse CDF (i.e., the quantile function): it returns the x for which P[X < x] = q.
Follow-up Question 3
Are there any edge cases if σ ≠ 1?
Not in the method itself. The transformation is always Z = (X - μ)/σ; when σ ≠ 1, the division simply stops being trivial. You still look up the resulting z-value in standard normal tables or use software for Φ, which is the main reason standardization is so widely used: it always brings the distribution back to mean 0 and standard deviation 1. For example, if μ = 10 and σ = 2, then P[X > 12] = P[Z > (12 - 10)/2] = P[Z > 1] = 1 - Φ(1) ≈ 1 - 0.8413 = 0.1587. Two things to watch: σ must be positive, and the notation N(μ, σ²) specifies the variance, so take its square root before dividing.
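As a quick check of the σ = 2 example, the sketch below computes the tail probability both ways: via the standardized threshold and directly from N(10, 2²). It uses `statistics.NormalDist` from the standard library, whose second argument is the standard deviation (not the variance); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 10, 2        # values from the example in the text
z = (12 - mu) / sigma    # standardized threshold

p_tail = 1 - NormalDist().cdf(z)              # P[Z > 1] = 1 - Phi(1)
p_direct = 1 - NormalDist(mu, sigma).cdf(12)  # same probability without standardizing
print(z, round(p_tail, 4))
```

Both routes agree, which is exactly what standardization guarantees.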
Follow-up Question 4
Why is 1.645 used for the 95th percentile instead of 1.96?
The value 1.645 is associated with a one-sided 95th percentile, meaning P[Z < 1.645] = 0.95. On the other hand, 1.96 is associated with a two-sided 95% confidence interval, where we typically want the central area between -1.96 and 1.96 to be 0.95. So 1.96 is the z-value used when we talk about two-sided coverage. If the question explicitly asks for the one-sided 95th percentile cut-off, 1.645 is correct.
These nuances often arise when interpreting confidence intervals vs. percentile cut-offs. In summary, 1.645 leaves a single 5% tail above that point, while 1.96 splits the 5% across both tails, 2.5% on each side.
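The distinction is easy to confirm numerically. The following sketch, again assuming Python 3.8+ for `statistics.NormalDist`, computes both critical values and checks that the central interval for the two-sided value covers 95%.

```python
from statistics import NormalDist

std = NormalDist()
z_one_sided = std.inv_cdf(0.95)   # 5% in the upper tail only
z_two_sided = std.inv_cdf(0.975)  # 2.5% in each tail

# Central area between -z and +z for the two-sided value.
central = std.cdf(z_two_sided) - std.cdf(-z_two_sided)
print(round(z_one_sided, 3), round(z_two_sided, 2), round(central, 2))
```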