ML Interview Q Series: Analyzing Linear Probability Density Functions: Normalization, CDF, Quantiles, and Interval Probabilities
Question
Let the random variable X have the density
f(x) = k x for 0 <= x <= 3,
and 0 otherwise. Find the constant k that makes this a valid probability density function. Next, find x1 and x2 such that P(X <= x1) = 0.1 and P(X <= x2) = 0.95. Finally, compute P(|X - 1.8| < 0.6).
Short Compact Solution
To ensure the integral from 0 to 3 of f(x) dx equals 1, we solve for k and find k = 2/9.
For any c in (0, 3), the cumulative distribution function is P(X <= c) = c^2 / 9. Therefore:
x1 is found by solving x1^2 / 9 = 0.1, which gives x1 = sqrt(0.9) ≈ 0.9487
x2 is found by solving x2^2 / 9 = 0.95, which gives x2 = sqrt(8.55) ≈ 2.9240
The event |X - 1.8| < 0.6 is equivalent to 1.2 < X < 2.4, so
P(1.2 < X < 2.4) = (2.4^2 - 1.2^2) / 9 = 4.32 / 9 = 0.48.
Comprehensive Explanation
Finding k
A probability density function (pdf) f(x) must integrate to 1 over its domain. Given f(x) = k x for 0 <= x <= 3 and 0 otherwise, we solve:
∫[0 to 3] k x dx = 1.
When integrating k x from 0 to 3:
The indefinite integral of k x is k * (x^2 / 2).
Evaluating from 0 to 3, we get k * (3^2 / 2) - k * (0^2 / 2) = k * (9 / 2).
So we have k * (9/2) = 1. Solving for k:
9k / 2 = 1 => k = 2/9.
Hence the valid pdf is f(x) = (2/9) x for 0 <= x <= 3.
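As a quick sanity check on the normalization, the sketch below computes k exactly with rational arithmetic and then integrates the resulting density numerically. The helper names normalizing_constant and midpoint_integral are hypothetical, chosen only for this illustration, and the midpoint rule is just one convenient way to approximate the integral.

```python
from fractions import Fraction

def normalizing_constant(b):
    """Return k such that the integral of k*x over [0, b] equals 1 (hypothetical helper)."""
    # The integral of x over [0, b] is b^2 / 2, so k = 2 / b^2.
    return Fraction(2) / Fraction(b) ** 2

def midpoint_integral(f, lo, hi, n=10_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

k = normalizing_constant(3)
total = midpoint_integral(lambda x: float(k) * x, 0.0, 3.0)

print(k)      # 2/9
print(total)  # approximately 1.0, up to floating-point rounding
```

The exact value confirms k = 2/9, and the numerical integral should agree with 1 to within rounding error.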
Cumulative Distribution Function (CDF)
For 0 <= x <= 3, the CDF F(x) = P(X <= x) is obtained by integrating the pdf from 0 to x:
F(x) = ∫[0 to x] (2/9) t dt = (2/9) * (x^2 / 2) = x^2 / 9.
When x < 0, F(x) = 0, and when x > 3, F(x) = 1, because the density is zero outside [0, 3] and all of the probability mass lies in that interval.
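The piecewise CDF can be written as a small function. This is a minimal sketch; the function name cdf is an assumption made for illustration, and the branches simply mirror the three cases described above.

```python
def cdf(x):
    """CDF of the density f(x) = (2/9) x on [0, 3] (illustrative helper)."""
    if x < 0:
        return 0.0          # no probability mass below 0
    if x > 3:
        return 1.0          # all probability mass lies at or below 3
    return x * x / 9.0      # F(x) = x^2 / 9 on [0, 3]

for x in (-1.0, 0.0, 1.5, 3.0, 4.0):
    print(x, cdf(x))
```

Evaluating at a few points on and outside [0, 3] shows the function rising from 0 to 1 and staying constant beyond the support.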
Finding x1 and x2 for given probabilities
We want x1 such that P(X <= x1) = 0.1. Using the CDF for 0 <= x <= 3:
x1^2 / 9 = 0.1.
Solving for x1:
x1^2 = 0.9, so x1 = sqrt(0.9) ≈ 0.9487 (we take the positive root because X only takes values in [0, 3]).
Similarly, for x2 such that P(X <= x2) = 0.95:
x2^2 / 9 = 0.95.
So x2^2 = 8.55, giving x2 = sqrt(8.55) ≈ 2.9240.
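Solving F(x) = p for x amounts to inverting the CDF. The sketch below does this in closed form for the two probabilities in the question and then plugs the results back into x^2 / 9 as a round-trip check. The function name quantile is hypothetical.

```python
import math

def quantile(p):
    """Inverse of F(x) = x^2 / 9 on [0, 3]; p must lie in [0, 1] (illustrative helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return 3.0 * math.sqrt(p)  # solve x^2 / 9 = p for the nonnegative root

x1 = quantile(0.10)
x2 = quantile(0.95)

print(round(x1, 4), round(x2, 4))
# Round trip: these should reproduce 0.10 and 0.95 up to rounding error.
print(x1 ** 2 / 9, x2 ** 2 / 9)
```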
Finding P(|X - 1.8| < 0.6)
The event |X - 1.8| < 0.6 is equivalent to 1.8 - 0.6 < X < 1.8 + 0.6, i.e. 1.2 < X < 2.4. We compute:
P(1.2 < X < 2.4) = F(2.4) - F(1.2),
where F(x) = x^2 / 9. Thus:
F(2.4) = 2.4^2 / 9 = 5.76 / 9 = 0.64, F(1.2) = 1.2^2 / 9 = 1.44 / 9 = 0.16,
so P(1.2 < X < 2.4) = 0.64 - 0.16 = 0.48.
Hence the probability that |X - 1.8| < 0.6 is 0.48.
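Because 1.8 and 0.6 are not exactly representable in binary floating point, one way to confirm the value 0.48 without rounding noise is to redo the computation with rational numbers. The sketch below assumes the CDF F(x) = x^2 / 9 derived above; the function name cdf is hypothetical.

```python
from fractions import Fraction

def cdf(x):
    """Exact CDF F(x) = x^2 / 9 on [0, 3], clamped outside the support (illustrative)."""
    x = min(max(Fraction(x), Fraction(0)), Fraction(3))
    return x * x / 9

center = Fraction(18, 10)      # 1.8
half_width = Fraction(6, 10)   # 0.6
lo, hi = center - half_width, center + half_width  # 6/5 and 12/5

prob = cdf(hi) - cdf(lo)
print(prob, float(prob))  # 12/25 0.48
```

The exact answer is 12/25, which matches 0.64 - 0.16 = 0.48.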
Follow-up Question 1: What if the domain were [0, b] instead of [0, 3]?
If the domain of X were changed to 0 <= x <= b (for some positive b), the pdf would be f(x) = k x on that interval. We would still require:
∫[0 to b] k x dx = 1.
Performing the integral:
k * (b^2 / 2) = 1 => k = 2 / b^2.
Then the corresponding CDF for x in [0, b] would be x^2 / b^2.
To find a quantile c such that P(X <= c) = α for some α in (0,1), we solve:
c^2 / b^2 = α => c = b * sqrt(α).
Setting b = 3 recovers the results above: k = 2/9, F(x) = x^2 / 9, and quantiles of the form 3 * sqrt(α).
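The formulas for a general upper endpoint b can be bundled into one small factory function. This is only a sketch under the assumptions in this section (density k x on [0, b] with b > 0); the name make_linear_density and the returned helper names are hypothetical.

```python
import math

def make_linear_density(b):
    """Return (pdf, cdf, quantile) for f(x) = k x on [0, b], with k = 2 / b^2."""
    if b <= 0:
        raise ValueError("b must be positive")
    k = 2.0 / b ** 2

    def pdf(x):
        return k * x if 0.0 <= x <= b else 0.0

    def cdf(x):
        x = min(max(x, 0.0), b)   # clamp to the support [0, b]
        return (x / b) ** 2       # x^2 / b^2

    def quantile(alpha):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must be in [0, 1]")
        return b * math.sqrt(alpha)

    return pdf, cdf, quantile

pdf, cdf, quantile = make_linear_density(3.0)
print(pdf(1.5), cdf(1.5), quantile(0.25))
```

With b = 3.0, the point x = 1.5 has density 1/3 and CDF value 0.25, and the 0.25 quantile maps back to 1.5.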
Follow-up Question 2: Is this distribution related to any known family?
The function f(x) = k x on [0, b] is a triangular distribution whose mode coincides with its upper endpoint b (lower limit 0, mode b, upper limit b). It is also a scaled Beta(2,1) distribution: if Y ~ Beta(2,1) on [0,1], then Y has pdf 2y for 0 <= y <= 1, and X = bY has pdf (2 / b^2) x for x in [0, b]. It is rarely given a name of its own the way the normal or uniform distributions are, but it can be viewed as a “Beta-type” distribution on [0, b].
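The Beta(2,1) connection can be checked empirically. The sketch below draws Y from Python's standard-library random.betavariate(2, 1), scales by b = 3, and compares the empirical CDF of the samples with x^2 / b^2. The helper names, the sample size, and the seed are arbitrary choices made for this illustration.

```python
import random

def sample_scaled_beta(b, n, seed=0):
    """Draw n samples of X = b * Y with Y ~ Beta(2, 1) (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same samples
    return [b * rng.betavariate(2, 1) for _ in range(n)]

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(s <= x for s in samples) / len(samples)

b = 3.0
samples = sample_scaled_beta(b, n=100_000)

# Compare the simulated CDF with the theoretical value x^2 / b^2.
for x in (0.9487, 1.5, 2.4):
    print(x, empirical_cdf(samples, x), (x / b) ** 2)
```

The two columns should agree to roughly two decimal places at this sample size; the agreement is statistical, not exact.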
Follow-up Question 3: How do we find the median or a general q-th quantile?
To find the median m, we solve P(X <= m) = 0.5, which means F(m) = 0.5. In this problem (with b=3, k=2/9):
m^2 / 9 = 0.5 => m^2 = 4.5 => m = sqrt(4.5) ≈ 2.1213.
In general, for the q-th quantile with q in (0, 1), we set x^2 / 9 = q and take the positive root, giving x = 3 sqrt(q). The same formula covers any percentile: the median is the case q = 0.5.
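When a CDF has no convenient closed-form inverse, a quantile can still be found numerically. As a cross-check on the median, the sketch below inverts F(x) = x^2 / 9 by bisection and compares the result with sqrt(4.5). Bisection is a technique brought in here purely for illustration, not something the solution above relies on, and the name quantile_bisect is hypothetical.

```python
import math

def cdf(x):
    """F(x) = x^2 / 9 on [0, 3], clamped outside the support."""
    x = min(max(x, 0.0), 3.0)
    return x * x / 9.0

def quantile_bisect(q, lo=0.0, hi=3.0, tol=1e-12):
    """Find x in [lo, hi] with cdf(x) = q by bisection (cdf is nondecreasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < q:
            lo = mid   # the quantile lies to the right of mid
        else:
            hi = mid   # the quantile lies at or to the left of mid
    return (lo + hi) / 2.0

median = quantile_bisect(0.5)
print(median, math.sqrt(4.5))
```

Both printed values should agree to many decimal places, which supports the closed-form answer.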
Follow-up Question 4: How would we compute E[X] and Var(X)?
Once we know f(x) = (2/9)x for x in [0,3], we can compute expectations:
E[X] = ∫[0 to 3] x f(x) dx = ∫[0 to 3] x * (2/9)x dx = (2/9) ∫[0 to 3] x^2 dx.
Since ∫[0 to 3] x^2 dx = 3^3 / 3 = 9, this gives (2/9) * 9 = 2.
So E[X] = 2.
To find Var(X), we use Var(X) = E[X^2] - (E[X])^2. First we compute E[X^2]:
E[X^2] = ∫[0 to 3] x^2 * (2/9)x dx = (2/9) ∫[0 to 3] x^3 dx = (2/9) * (3^4 / 4) = (2/9) * (81 / 4) = 162 / 36 = 4.5.
Hence Var(X) = 4.5 - (2)^2 = 4.5 - 4 = 0.5.
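The moment calculations can be reproduced exactly with rational arithmetic. The sketch below uses the closed form E[X^n] = 2 b^n / (n + 2), which follows from integrating x^n * (2 / b^2) x over [0, b]; that general formula is a small extension of the two integrals above, and the name raw_moment is hypothetical.

```python
from fractions import Fraction

def raw_moment(n, b=3):
    """E[X^n] for f(x) = (2 / b^2) x on [0, b].

    Integral of x^n * (2 / b^2) * x over [0, b]
      = (2 / b^2) * b^(n + 2) / (n + 2)
      = 2 * b^n / (n + 2).
    """
    return Fraction(2) * Fraction(b) ** n / (n + 2)

mean = raw_moment(1)            # E[X]
second = raw_moment(2)          # E[X^2]
variance = second - mean ** 2   # Var(X) = E[X^2] - (E[X])^2

print(mean, second, variance)   # 2 9/2 1/2
```

The output matches E[X] = 2, E[X^2] = 4.5, and Var(X) = 0.5.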
Follow-up Question 5: What if we want P(|X - a| < d) for general a and d?
In general, P(|X - a| < d) = P(a - d < X < a + d). We would use the CDF:
F(x) = x^2 / 9, for 0 <= x <= 3, and clamp it to 0 or 1 if x is outside [0,3].
So the probability would be:
P(a - d < X < a + d) = F(a + d) - F(a - d),
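provided d > 0 and 0 <= a - d < a + d <= 3. If an endpoint falls outside [0, 3], we clip it to the nearest boundary before applying x^2 / 9, which is the same as using F(x) = 0 for x < 0 and F(x) = 1 for x > 3. Because X is continuous, strict and non-strict inequalities give the same probability.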
This approach is a common step in dealing with absolute-value inequalities involving random variables.
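A minimal sketch of the general case, including the clipping of endpoints to the support, is given below. The function names cdf and prob_within are hypothetical, and the default b = 3.0 matches the problem in this article.

```python
def cdf(x, b=3.0):
    """F(x) = x^2 / b^2 on [0, b], with x clamped to the support first."""
    x = min(max(x, 0.0), b)
    return (x / b) ** 2

def prob_within(a, d, b=3.0):
    """P(|X - a| < d) = F(a + d) - F(a - d); returns 0 when d <= 0."""
    if d <= 0:
        return 0.0
    return cdf(a + d, b) - cdf(a - d, b)

print(prob_within(1.8, 0.6))  # interval (1.2, 2.4) lies inside [0, 3]
print(prob_within(2.8, 0.6))  # upper endpoint 3.4 is clipped to 3
print(prob_within(0.2, 0.5))  # lower endpoint -0.3 is clipped to 0
print(prob_within(5.0, 1.0))  # interval lies entirely outside the support
```

The first call reproduces the 0.48 from the original question, and the last returns 0 because the interval (4, 6) does not intersect [0, 3].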