ML Interview Q Series: Geometric Probability: Solving Sum Inequalities for Uniform Random Variables in (0,1).
Question
Choose three random numbers X_1, X_2, X_3 from (0, 1), independently of each other. What is P(X_1 > X_2 + X_3)? What is the probability that the largest of the three random numbers is greater than the sum of the other two?
Short Compact solution
By independence, each triple (x_1, x_2, x_3) in (0,1)^3 has joint density 1. Consider the region C = {0 < x_1, x_2, x_3 < 1 : x_1 > x_2 + x_3}. Integrating the joint density over C gives 1/6. By symmetry, each of the three numbers has the same chance to exceed the sum of the other two, and these events are mutually exclusive. Hence the probability that the largest random number is greater than the sum of the other two is 3 × (1/6) = 1/2.
Comprehensive Explanation
Setting up the problem
We select three real numbers X_1, X_2, X_3 independently from the interval (0,1). We want two main probabilities:
P(X_1 > X_2 + X_3)
The probability that the largest of {X_1, X_2, X_3} is strictly greater than the sum of the other two.
First probability: P(X_1 > X_2 + X_3)
Because X_1, X_2, X_3 are independent and uniformly distributed on (0,1), the joint probability density function (pdf) f(x_1, x_2, x_3) = 1 for 0 < x_1, x_2, x_3 < 1 and is 0 otherwise.
We want P(X_1 > X_2 + X_3), which corresponds to integrating the joint density over the region where x_1 > x_2 + x_3, subject to 0 < x_1, x_2, x_3 < 1. In mathematical form:
P(X_1 > X_2 + X_3) = ∫_0^1 ∫_0^{x_1} ∫_0^{x_1 − x_2} 1 dx_3 dx_2 dx_1
Explanation of the integral setup:
We fix x_1 in the range (0,1).
For a given x_1, we need x_2 + x_3 < x_1, so we let x_2 range from 0 to x_1.
For each x_2 in that range, we then require x_3 < x_1 - x_2, so x_3 goes from 0 to x_1 - x_2.
Since the pdf is 1 everywhere inside (0,1)^3, the volume of this region is just the triple integral of 1 over that region.
Carrying out the integrals step by step: the innermost integral over x_3 gives x_1 − x_2; integrating x_1 − x_2 over x_2 from 0 to x_1 gives x_1^2/2; and integrating x_1^2/2 over x_1 from 0 to 1 gives 1/6.
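As a quick sanity check, the triple integral can also be evaluated symbolically. A minimal sketch, assuming the sympy library is available:
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
# Integrate the constant density 1 over 0 < x3 < x1 - x2, 0 < x2 < x1, 0 < x1 < 1
result = sp.integrate(1, (x3, 0, x1 - x2), (x2, 0, x1), (x1, 0, 1))
print(result)  # 1/6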
Second probability: The largest number is greater than the sum of the other two
We want P( max{X_1, X_2, X_3} > (sum of the other two) ). Observe:
The event X_1 > X_2 + X_3
The event X_2 > X_1 + X_3
The event X_3 > X_1 + X_2
Because the X_i are i.i.d. (independent and identically distributed), these three events have the same probability, which we have already found to be 1/6 each. They are also mutually exclusive (no two of them can hold simultaneously when all X_i > 0). Therefore:
P(max{X_1, X_2, X_3} > sum of the other two) = 3 × (1/6) = 1/2
Hence, the probability that the single largest random number among the three exceeds the sum of the other two is 1/2.
Intuitive geometric interpretation
An alternative way to see this is to think about the unit cube (0,1)^3 in 3D space. We are looking at the portion of the cube where one coordinate is larger than the sum of the other two. That portion is symmetrical across all three coordinates. Each portion amounts to a volume of 1/6, and there are three disjoint portions that correspond to each variable being the one that exceeds the sum of the remaining two.
Connection to a general n-variable case
There is a well-known extension: for X_1, X_2, …, X_n i.i.d. Uniform(0,1), the probability that a particular X_i is greater than the sum of the remaining n−1 variables is 1/n!. By symmetry, any one of the X_i’s can be the one that is larger than the sum of the others, and those events are disjoint. Hence,
P(X_1 > X_2 + … + X_n) = 1/n!
and
P( max{X_1, …, X_n} > sum of the others ) = n × (1/n!) = 1/(n−1)!
For n = 3, 1/n! = 1/6 and 1/(n−1)! = 1/2, which matches the results computed above.
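A small Monte Carlo sketch of the general-n claim, with n = 4 chosen purely for illustration (the targets are 1/4! = 1/24 and 1/3! = 1/6):
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 2_000_000
X = rng.random((N, n))                            # N samples of n uniform(0,1) draws
totals = X.sum(axis=1)
p_single = np.mean(X[:, 0] > totals - X[:, 0])    # P(X_1 > sum of the others)
max_vals = X.max(axis=1)
p_max = np.mean(max_vals > totals - max_vals)     # P(max > sum of the others)
print(p_single, "vs 1/24 =", 1 / 24)
print(p_max, "vs 1/6 =", 1 / 6)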
Potential Follow-Up Questions
Could we use order statistics instead?
Yes. We could label X_(1) ≤ X_(2) ≤ X_(3) as the sorted values. Then X_(3) is the maximum. One might try to find P(X_(3) > X_(1) + X_(2)) using the joint distribution of order statistics. However, direct integration in the original variables is usually simpler, since each region is more straightforward to set up.
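For completeness, here is a brief numerical sketch of the order-statistics route: sort each sample and test X_(3) > X_(1) + X_(2) directly (the sample size is an arbitrary choice):
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.random((1_000_000, 3)), axis=1)  # each row is (X_(1), X_(2), X_(3))
print(np.mean(X[:, 2] > X[:, 0] + X[:, 1]))      # estimate of P(X_(3) > X_(1) + X_(2)), ~0.5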
How would we verify the result by simulation?
One can quickly verify the result using a Monte Carlo approach. For instance:
import numpy as np

N = 10_000_000  # number of Monte Carlo samples

# Three independent uniform(0,1) samples of size N
X1 = np.random.rand(N)
X2 = np.random.rand(N)
X3 = np.random.rand(N)

# First probability: P(X1 > X2 + X3), target 1/6
count1 = np.sum(X1 > X2 + X3)
prob1 = count1 / N

# Second probability: P(largest > sum of the other two), target 1/2
max_vals = np.maximum(X1, np.maximum(X2, X3))
sum_pairs = X1 + X2 + X3 - max_vals  # sum of the other two
count2 = np.sum(max_vals > sum_pairs)
prob2 = count2 / N

print("Estimated P(X1 > X2 + X3) =", prob1)
print("Estimated Probability that largest > sum of others =", prob2)
Running this simulation should yield approximate results near 1/6 for P(X_1 > X_2 + X_3) and near 1/2 for the largest-sum probability, providing empirical confirmation.
Why are the events X_1 > X_2 + X_3, X_2 > X_1 + X_3, X_3 > X_1 + X_2 mutually exclusive?
Since X_1, X_2, X_3 are all positive (in fact in (0,1)), suppose that X_1 > X_2 + X_3 and X_2 > X_1 + X_3 held simultaneously. Adding the two inequalities gives X_1 + X_2 > X_1 + X_2 + 2X_3, i.e., X_3 < 0, which is impossible. The same argument rules out every other pair of events.
What if the numbers are not uniform on (0,1)?
If the X_i come from a distribution other than Uniform(0,1), the integrals and probabilities generally change. The uniform distribution simplifies things greatly because the joint pdf is constant (equal to 1) in the unit hypercube. For other distributions, one would need to integrate their joint pdf appropriately over the region defined by X_1 > X_2 + X_3, or rely on more advanced distributional properties if available.
How does this relate to practical situations?
These kinds of problems appear in reliability engineering or risk assessment, where the condition “one random quantity exceeding the sum of the others” might represent scenarios in which one factor dominates the rest. The uniform assumption is a simplifying abstraction, but the same geometric integration idea can be extended to other distributions, provided we handle the pdf carefully.
Edge cases to consider
Very small random variables: In the uniform (0,1) setting, any X_i can be arbitrarily small, but this does not affect the overall probability—everything is handled uniformly by the integration.
Numerical precision: In a simulation with extremely large N, floating-point rounding might slightly affect the empirical estimates. But for a moderately large N, the results converge fairly quickly to the exact theoretical values.
All these considerations underscore how the 1/6 and 1/2 probabilities arise specifically from uniform distributions over (0,1) and the simple geometry of the unit cube.
Below are additional follow-up questions
How does the problem change if the three random numbers can be correlated instead of independent?
If X_1, X_2, X_3 are not independent, we can no longer assume that the joint density factorizes into the product of individual densities. Consequently, the probability P(X_1 > X_2 + X_3) would require integrating the actual joint density f(x_1, x_2, x_3) over the region x_1 > x_2 + x_3 within the unit cube (assuming they are still bounded in (0,1)), but now f(x_1, x_2, x_3) might vary from point to point.
For instance, if there is a positive correlation between X_1 and X_2, whenever X_2 takes on larger values, X_1 would also tend to be large, making P(X_1 > X_2 + X_3) potentially smaller or larger depending on how X_3 behaves. One cannot simply multiply marginal probabilities. The key pitfall is that any symmetrical arguments (e.g., each event having the same probability) might break if correlations differ across pairs of variables.
In such cases, symmetry might not exist or might be different (e.g., if X_1 and X_2 are correlated differently than X_2 and X_3). Furthermore, the events {X_1 > X_2 + X_3}, {X_2 > X_1 + X_3}, and {X_3 > X_1 + X_2} might not all be equally likely. One would have to carefully account for how correlation structures influence each part of the space. Integration (or simulation) is typically the surest way to get an exact result.
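To illustrate, here is a hedged simulation sketch using a Gaussian copula to correlate X_1 and X_2 while keeping uniform(0,1) marginals; the correlation value rho = 0.5 is an arbitrary choice, and scipy is assumed to be available:
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho = 0.5                                    # illustrative correlation between X_1 and X_2
cov = [[1.0, rho, 0.0],
       [rho, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
Z = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=1_000_000)
U = norm.cdf(Z)                              # uniform(0,1) marginals, X_1 and X_2 correlated
print(np.mean(U[:, 0] > U[:, 1] + U[:, 2]))  # generally differs from 1/6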
What if each X_i is drawn from a discrete distribution rather than a continuous one?
When X_1, X_2, X_3 come from a discrete distribution, the probability mass function (pmf) replaces the probability density function, and we would sum over discrete outcomes rather than integrate over a region. The logic of “X_1 > X_2 + X_3” remains, but we must account for how many discrete combinations (x_1, x_2, x_3) satisfy x_1 > x_2 + x_3. Each valid combination has probability p(x_1) × p(x_2) × p(x_3) if the variables remain independent.
A subtlety arises if the discrete distribution has limited support. For example, if the values each X_i can take lie in {0, 1, 2}, many or all outcomes might trivially fail or satisfy x_1 > x_2 + x_3. We can no longer rely on the geometry of the unit cube; we must do a finite summation. Also, if the discrete support is large, enumerating all possibilities can be cumbersome, so a simulation-based approach or partial summation might be more practical.
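For concreteness, a minimal enumeration sketch for an assumed uniform discrete distribution on {0, 1, 2}:
from itertools import product

support = [0, 1, 2]                # an assumed uniform discrete support
p = 1 / len(support)               # pmf of each value
prob = sum(p ** 3 for x1, x2, x3 in product(support, repeat=3) if x1 > x2 + x3)
print(prob)                        # 4/27 for this particular support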
What changes if the random variables are selected from (0, b) for some b > 1?
If X_1, X_2, X_3 are uniform over (0, b) with b > 1, the event X_1 > X_2 + X_3 still has a certain volume in the 3D space. However, the region is now within the cube (0, b)^3, and the condition x_1 > x_2 + x_3 represents a plane bounding a region in a cube of side length b.
One can rescale the problem by dividing everything by b. In that case, Y_1 = X_1 / b, Y_2 = X_2 / b, Y_3 = X_3 / b each lie in (0, 1). Then P(X_1 > X_2 + X_3) is the same as P(b Y_1 > b Y_2 + b Y_3) = P(Y_1 > Y_2 + Y_3). Because the factor b cancels out, the probability is unchanged from the (0,1) scenario, leading to the same 1/6 result for a given variable exceeding the sum of the other two, and 1/2 for the largest exceeding the sum of the other two. The pitfall here is to assume the scaling somehow changes the probability. In fact, uniform scaling in the domain does not alter the ratio of volumes for these events.
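A one-off simulation check of this scale invariance, with b = 5 as an arbitrary illustrative choice:
import numpy as np

rng = np.random.default_rng(0)
b = 5.0                                       # illustrative choice of b > 1
X = b * rng.random((1_000_000, 3))            # uniform on (0, b)
print(np.mean(X[:, 0] > X[:, 1] + X[:, 2]))   # still ~1/6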
Can the maximum still exceed the sum of the other two if negative values are allowed?
If the domain includes negative values, say X_1, X_2, X_3 ∈ (−a, a) for some a > 0, the condition "X_1 > X_2 + X_3" interacts with the possibility that X_2 + X_3 is negative. In that case the chance for one variable to exceed the sum of the others can actually be higher, since a negative X_2 + X_3 makes the inequality easier to satisfy.
For a uniform distribution on (−a, a), the geometry of the region x_1 > x_2 + x_3 is more complex because x_2 + x_3 can be negative or positive. Meanwhile, if a is large, the sum x_2 + x_3 can vary widely. The symmetry argument that each variable is equally likely to be the largest still holds (since the distribution is symmetrical across indices), but the probabilities themselves may not reduce to a simple fraction like 1/6 or 1/2. One would typically have to integrate over the full domain (−a, a)^3, or employ some geometric approach that accounts for the shape of the region where x_1 > x_2 + x_3.
The pitfall is to assume that the ratio of volumes remains as simple as in the (0, 1) case; negative values significantly change the geometry and can make the region for x_1 > x_2 + x_3 either more or less “favorable” depending on how x_2 and x_3 can be negative.
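A hedged sketch of how one might probe the uniform(−a, a) case numerically (a = 1 is an arbitrary choice; note that P(X_1 > X_2 + X_3) becomes exactly 1/2 here, since X_1 − X_2 − X_3 is symmetric about zero):
import numpy as np

rng = np.random.default_rng(0)
a = 1.0                                         # illustrative half-width of the interval
X = rng.uniform(-a, a, size=(1_000_000, 3))
p1 = np.mean(X[:, 0] > X[:, 1] + X[:, 2])       # exactly 1/2 by symmetry, not 1/6
max_vals = X.max(axis=1)
p_max = np.mean(max_vals > X.sum(axis=1) - max_vals)  # no simple closed form claimed here
print(p1, p_max)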
How does this probability behave as the number of variables grows very large?
For n independent uniform (0,1) random variables X_1, X_2, …, X_n, one might ask: what is the probability that a particular X_i exceeds the sum of the other n−1? As hinted earlier, this probability for a specific X_i is 1/n!. Then, the probability that any one of them is greater than the sum of the others is n × (1/n!) = 1/(n−1)!.
For large n, (n−1)! grows extremely fast, so the probability that any one variable exceeds the sum of the others becomes very small. Hence, seeing such a “dominant” value is increasingly unlikely as n grows. The pitfall is failing to realize that (n−1)! gets large rapidly, so for, say, n=10, 1/(9)! = 1/362880, which is already a tiny probability. This can be counterintuitive if someone expects that with more random draws, the chance of a very large outlier might increase. However, in a relative sense, all uniform(0,1) draws remain bounded, and the requirement to exceed the sum of the other nine draws becomes much more stringent as n increases.
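Two lines of code make the factorial decay concrete:
import math

for n in [3, 5, 10, 20]:
    print(n, 1 / math.factorial(n - 1))   # P(max exceeds the sum of the other n-1)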
Could the condition be relaxed to X_1 >= X_2 + X_3 instead of a strict inequality?
If the problem asks for X_1 >= X_2 + X_3, the difference is that we include the boundary where X_1 = X_2 + X_3. In the continuous case of uniform(0,1) variables, the probability that X_1 = X_2 + X_3 exactly is zero because we are dealing with a set of measure zero in the continuous space. Thus P(X_1 >= X_2 + X_3) = P(X_1 > X_2 + X_3) in the continuous uniform setting.
The subtlety appears in discrete distributions, or in cases with a probability mass on boundaries (e.g., a mixture distribution that can take on exact values with nonzero probability). In those scenarios, the boundary condition could add some nontrivial amount to the probability. A common mistake is to assume that the boundary event never contributes if the distribution is purely continuous, but if there's any discrete mixture part (like point masses at 0 or 1), one must handle that carefully.
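A tiny enumeration with three fair dice (an assumed discrete example) shows the boundary contributing real probability mass:
from itertools import product

faces = range(1, 7)                       # three fair dice
p = (1 / 6) ** 3
strict = sum(p for t in product(faces, repeat=3) if t[0] > t[1] + t[2])
weak = sum(p for t in product(faces, repeat=3) if t[0] >= t[1] + t[2])
print(strict, weak)                       # 20/216 vs 35/216: the boundary carries real mass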
In what situations would the events {X_i > sum of the others} not be mutually exclusive?
In general, for positive random variables, these events remain mutually exclusive. If X_1 > X_2 + X_3, there is no room for X_2 to also be greater than X_1 + X_3, or X_3 to be greater than X_1 + X_2, because that would require contradictory inequalities among positive quantities. However, if random variables can take negative values, one might conceive pathological distributions where more than one of these conditions could hold simultaneously, for example if some variables are negative enough.
Specifically, if X_2 and X_3 can be negative while X_1 is positive, both conditions can hold at once: for example, X_1 = 0.5, X_2 = −0.25, X_3 = −0.9 gives X_2 + X_3 = −1.15 < 0.5 = X_1 and X_1 + X_3 = −0.4 < −0.25 = X_2, so X_1 > X_2 + X_3 and X_2 > X_1 + X_3 hold simultaneously. This is a major pitfall when extending the problem to signed random variables: mutual exclusivity can break down, so you cannot sum probabilities of individual events to find the total probability without subtracting off the intersections. Thorough analysis or direct computation of P( max X_i > sum of others ) is required.
How could non-uniform distributions (like Beta, Gamma, etc.) impact this problem?
For non-uniform continuous distributions on (0,1), the integration over the region x_1 > x_2 + x_3 is weighted by the joint density, which now depends on parameters of the distribution. For example, if X_i are Beta(α, β), the pdf is not constant, so some parts of (0,1) have higher density than others. Intuitively, if α < 1, the distribution has more mass near 0, possibly making large values rarer, which affects the probability that one variable dominates.
Similarly, if α > 1, the distribution is skewed more heavily toward 1, possibly increasing or decreasing the chance that a single variable is sufficiently large relative to the other two, depending on their joint behavior. A common pitfall is to assume symmetry arguments alone will yield the same 1/6 and 1/2 results; while symmetry among the variables might still hold, the probability that X_1 is greater than X_2 + X_3 depends on how the distribution’s shape allocates mass in that region of the cube. Numerical integration or advanced analytical tools are generally needed to find an exact probability for each condition.
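As an illustration, a hedged simulation sketch with Beta-distributed variables; the shape parameters α = 2, β = 5 are arbitrary choices:
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 5.0                    # illustrative Beta shape parameters
X = rng.beta(alpha, beta, size=(1_000_000, 3))
p1 = np.mean(X[:, 0] > X[:, 1] + X[:, 2])
max_vals = X.max(axis=1)
p_max = np.mean(max_vals > X.sum(axis=1) - max_vals)
print(p1, p_max)                          # generally not 1/6 and 1/2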