ML Interview Q Series: Geometric Probability: Forming a Triangle from Random Stick Breaks
📚 Browse the full ML Interview series here.
This problem was asked by Google.
Assume you take have a stick of length 1 and you break it uniformly at random into three parts. What is the probability that the three parts form a triangle?
Approach with the random break representation
Geometric probability interpretation
Practical simulation hint
One might confirm this probability via simulation in a language such as Python:
import numpy as np
n_samples = 10_000_000
break_points = np.sort(np.random.rand(n_samples, 2), axis=1)
X = break_points[:, 0]
Y = break_points[:, 1] - break_points[:, 0]
Z = 1 - break_points[:, 1]
valid_triangles = (X + Y > Z) & (X + Z > Y) & (Y + Z > X)
estimated_probability = np.mean(valid_triangles)
print(estimated_probability)
This code will consistently give an approximation near 0.25.
Possible follow-up questions and detailed answers
How could we generalize this idea to sticks broken into more than three parts?
One extends the triangle inequality constraints to higher-dimensional simplexes. For four segments, the condition to form a tetrahedron in some geometric sense is not analogous to a simple triangle condition in 1D, but one could ask similar inequalities about side lengths if we interpret them differently. In practice, for n breaks, the segments are drawn from a Dirichlet distribution (special case is all parameters = 1). One would need to characterize the geometric constraints for forming the desired shape (e.g., a 1D polygon interpretation of “successive segments can fold into a closed loop”).
Is there an intuitive explanation of why the probability is 1/4?
It boils down to avoiding any segment being over half of the total. If you imagine picking the first break point, you want neither side of that point to be too large. Then the second point must be placed so that the middle piece remains small enough. These constraints carve out a region in the allowable space that takes up exactly 1/4 of the total ways two random cuts can occur.
What if the breaks are not uniform?
Why does sorting the break points matter?
1) Can we solve this problem using direct integration instead of a geometric area argument?
Answer and Detailed Explanation:
Absolutely. While the geometric approach (viewing the problem as finding the area of a region in the unit square) is often the quickest route, you can also solve the problem by setting up and evaluating integrals that directly encode the “triangle-forming” inequalities.
Set Up Random Variables Let (U_1) and (U_2) be i.i.d. uniform random variables in ((0,1)). Define the ordered versions:
[
\huge X = \min(U_1, U_2), \quad Y = \max(U_1, U_2). ] So our break points on the stick of length 1 are at positions (X) and (Y) with (0 < X < Y < 1). The three pieces are:
[
\huge \text{Piece 1: } X, \quad \text{Piece 2: } Y - X, \quad \text{Piece 3: } 1 - Y. ]
Express the Probability Condition To form a valid triangle, each piece must be less than 0.5:
[
\huge X < 0.5, \quad (Y - X) < 0.5, \quad (1 - Y) < 0.5. ] Equivalently, [ \huge X < 0.5, \quad Y - X < 0.5, \quad Y > 0.5. ] The last inequality is rearranged from (1 - Y < 0.5 \implies Y > 0.5).
Set Up the Joint Density The joint density function (f_{X,Y}(x,y)) for (x < y) (given two i.i.d. uniform(0,1) variables) is:
[
\huge f_{X,Y}(x,y) = \begin{cases} 2, & 0 < x < y < 1, \ 0, & \text{otherwise}. \end{cases} ] The factor 2 arises because there is a 50% chance that (U_1 < U_2) and a 50% chance that (U_2 < U_1). Restricting to (x<y) accounts for one half of the square; hence we double the density in that triangular region.
Define the Integration Limits We want to integrate this joint density over the region:
[
\huge 0 < x < 0.5, \quad 0.5 < y < 1, \quad y - x < 0.5. ] The condition (y - x < 0.5) implies (y < x + 0.5). Combine that with (y > 0.5).
First, (x) ranges from (0) to (0.5).
For each fixed (x), (y) must be between (\max(0.5, x)) and (\min(1, x + 0.5)). But we already know (x < 0.5), so (\max(0.5, x) = 0.5) because (x) is never bigger than 0.5.
Hence (y) ranges from (0.5) to (x + 0.5).
Perform the Integration The probability is:
[
\huge \int_{x=0}^{0.5} \int_{y=0.5}^{x + 0.5} 2 ,dy,dx. ] Here the factor 2 is from the joint density (f_{X,Y}(x,y)).
Integrate with respect to (y): [ \huge \int_{y=0.5}^{x+0.5} 2 ,dy = 2 \Bigl[,y\Bigr]_{0.5}^{x+0.5} = 2 \bigl((x+0.5) - 0.5\bigr) = 2x. ]
Then integrate with respect to (x): [ \huge \int_{0}^{0.5} 2x ,dx = \Bigl[x^2\Bigr]_{0}^{0.5} = (0.5)^2 - 0^2 = 0.25. ]
Normalize by the Total Feasible Region The total area in the ((x,y)) plane with (0 < x < y < 1) is (1/2). However, notice that when we wrote the joint density as (2) in the region (0<x<y<1), we already accounted for the fact that the entire unit square is “halved.” So the integral (\int \int 2,dx,dy) over (0<x<y<1) already sums to 1 in total.
This means that the integral result (0.25) is directly the probability, without needing an additional division by (1/2).
Hence by direct integration we arrive at the probability (\boxed{\tfrac{1}{4}}). This matches the geometric argument.
2) What if we break the stick sequentially: first one cut, then pick one of the two resulting pieces at random to cut again?
Answer and Detailed Explanation:
This scenario changes the distribution of the final three segments because the second break is not necessarily anywhere along the original stick—it only happens on one of the two initial pieces, chosen at random.
Setup
You first pick a point (U) uniformly in ((0,1)) on the original stick. This splits the stick into segments of lengths (U) and (1-U).
With probability (1/2), you choose the segment of length (U) for the second break; with probability (1/2), you choose the segment of length (1-U).
Suppose you pick the segment of length (U). Then a second break point (V) is chosen uniformly in ((0,U)). So you end up with final segments of lengths (V), (U - V), and (1-U).
Alternatively, if you pick the segment of length (1-U), then a second break point (V) is chosen uniformly in ((0, 1-U)). The final segments in that case are (U), (V), and ((1-U) - V).
Check Triangle Inequalities Each final set of three segments must still satisfy (a < b + c), (b < a + c), and (c < a + b). Because the distribution of segment lengths is now more complicated than in the original problem (where the two break points are chosen simultaneously and sorted), we cannot use the same simple geometric region.
Approach to the Probability Computation
We might integrate over all possible values of (U) and then condition on which half (the (U) segment or the (1-U) segment) was chosen.
The second break point (V) is uniformly distributed within whichever segment we chose.
After enumerating these cases, we check the probability that the resulting lengths can form a triangle in each case, and then average over the two equally likely scenarios.
Result It turns out that the probability for this sequential-breaking approach is different (and it is slightly larger than (1/4)). A more detailed integral calculation or a simulation approach will yield the numerical answer (approximately (0.25 + \text{something small})). Indeed, simulation is often easier:
import numpy as np
n_samples = 10_000_000
first_break = np.random.rand(n_samples)
# Decide which segment to break again
pick_first_segment = np.random.rand(n_samples) < 0.5
# For those who pick the first segment
second_break_1 = np.random.rand(n_samples) * first_break
# final segments: second_break_1, first_break - second_break_1, 1 - first_break
# For those who pick the second segment
second_break_2 = np.random.rand(n_samples) * (1 - first_break)
# final segments: first_break, second_break_2, (1 - first_break) - second_break_2
# Combine cases, check triangle conditions, average, etc.
By carefully tracking the lengths in each scenario, you can compute the fraction that forms a valid triangle. This yields a probability strictly greater than (1/4) but less than (1/2). Exact solutions involve piecewise integrals but typically simulation suffices to confirm the difference.
3) What is the distribution of the largest piece among the three segments?
Answer and Detailed Explanation:
When two points cut a unit stick at random (simultaneously chosen, then sorted), we get three segments ((X,, Y,, Z)). A natural question is: Which piece is typically the largest, and how is its length distributed?
Symmetry Considerations By symmetry, each of the three pieces has the same chance of being the largest. Indeed, the problem does not prefer any portion of the stick: the random breaks are uniform, and all permutations of “which piece is on the left/middle/right” are equally likely before we enforce (X < Y). Hence, the distribution of the “largest piece” is the same no matter which piece we label as (X), (Y), or (Z).
Compute CDF of the Largest Piece Let (M = \max(X, Y, Z)). Then [ \huge P(M \leq m) = P(X \leq m, , Y \leq m, , Z \leq m). ] However, since (X + Y + Z = 1), it’s only possible for all three to be (\leq m) if (3m \geq 1 \Rightarrow m \geq \frac{1}{3}). Below (\frac{1}{3}), (P(M \leq m)=0) because at least one piece must be (\geq \frac{1}{3}).
For (m) between (\frac{1}{3}) and (1), more sophisticated integrals or geometric arguments can be used to find the exact distribution. One approach is:
[
\huge P(M \leq m) = \int_{\substack{x+y+z=1 \ x,y,z \leq m}} f_{X,Y,Z}(x,y,z), d\text{(region)}. ] Given the uniform breaks, ((X,Y,Z)) (with an appropriate factor for the ordering condition) correspond to the region (0 < X < Y < 1) or the entire space of two break points.
Key Insight for the Broken Stick
We know that (P(M<0.5) = \frac{1}{4}) from the “triangle forming” result. This is the probability that no piece exceeds 0.5. Equivalently, it is the probability that (M < 0.5). So the cumulative distribution function (F_M(0.5) = 0.25).
For other values of (m\in(\tfrac{1}{3}, \tfrac{1}{2})), you can similarly set up integrals or do a direct simulation to estimate.
Practical Simulation The simplest way for many is to simulate large samples of uniform break points, compute (\max(X, Y, Z)) for each, and histogram the values. From that, you can approximate both the PDF and CDF of (M).
Interpretation
The largest piece often ends up being around (0.5) or less in the fraction of cases that lead to a triangle. In all other cases, the largest piece is above 0.5, hence no triangle forms.
The mean of (M) (largest piece) is around (0.5). More precise derivations show it is exactly (11/18 \approx 0.6111). This is a known result that can be found in the literature on the broken stick problem.
4) Could there be a purely combinatorial or symmetry-based argument for the probability of 1/4?
Answer and Detailed Explanation:
Yes, you can use symmetry and elementary probability steps without explicitly computing areas or integrals, though it is still somewhat geometric in nature:
Basic Logic The event that the three pieces form a triangle is the event that no single piece is bigger than 1/2. Now, consider picking two break points (U_1, U_2) in the interval ((0,1)). Without loss of generality, label the cut points so that (U_1) is the smaller and (U_2) the larger, i.e., (U_1 < U_2).
Probabilistic Partition
The length of the left segment is (U_1).
The length of the middle segment is (U_2 - U_1).
The length of the right segment is (1 - U_2).
Because the stick is uniform, the distribution of these lengths has strong symmetry.
Symmetry Trick
The probability that (U_1 > 0.5) is (\frac{1}{2}) for an unsorted break point, but we must be careful after imposing (U_1<U_2).
Alternatively, one can argue that among the 6 possible orderings of two uniform cut points and the endpoints ({0,1}), each ordering is equally likely. Then you systematically check how often the largest gap is > 0.5.
Simpler Counting/Geometry In a more intuitive sense, if you take the square ((0,1)\times(0,1)) for ((U_1,U_2)), the largest segment condition translates to “any dimension > 0.5 in one of the directions.” By partitioning the square into simpler regions and identifying which fraction satisfies “largest piece < 0.5,” one finds it is (\frac{1}{4}).
Though these arguments ultimately hinge on geometry or partitioning the unit square, they avoid the explicit area integrals, relying more on symmetry and direct counting of simpler shapes.
5) How does changing the stick length from 1 to some other length (L) affect the probability?
Answer and Detailed Explanation:
If we replace the unit stick with a stick of length (L), you might wonder if the probability changes. The short answer is: it does not—the probability remains (\tfrac{1}{4}). Here’s why:
Scale Invariance If the stick has length (L) instead of 1, then picking two random points along ((0,L)) uniformly is equivalent to scaling the problem by (L).
Suppose we draw (U_1, U_2 \sim \text{Uniform}(0,L)) and let (X, Y, Z) be the resulting segments.
Dividing everything by (L), define: [ \huge X' = X/L, \quad Y' = Y/L, \quad Z' = Z/L. ]
Notice (X' + Y' + Z' = 1).
Triangle Inequality is Scale-Invariant The condition for (X, Y, Z) to form a triangle is:
[
\huge X < Y + Z, \quad Y < X + Z, \quad Z < X + Y. ] Dividing every term by (L) preserves the inequality:
[
\huge \frac{X}{L} < \frac{Y}{L} + \frac{Z}{L}, \quad \text{etc.} ] So (X, Y, Z) form a triangle if and only if (X', Y', Z') form a triangle.
Uniformity Holds After Scaling The distribution of the points (\frac{U_1}{L}, \frac{U_2}{L}) in ((0,1)) is simply a uniform distribution as well. Thus all statements about “two uniform breaks in ((0,1))” apply directly.
Hence, changing the stick length to (L) does not alter the probability of forming a triangle. You can conceptually “shrink or stretch” the entire scenario without affecting the fundamental geometry of how two break points are chosen.
6) What if we specifically want the probability of forming an acute triangle?
Answer and Detailed Explanation:
Forming any valid triangle requires that no piece exceed half the total length, but forming an acute triangle adds the stricter condition that all angles must be less than (90^\circ).
Criteria for an Acute Triangle In terms of side lengths (a, b, c), a triangle is acute if and only if:
[
\huge a^2 + b^2 > c^2,\quad a^2 + c^2 > b^2,\quad b^2 + c^2 > a^2. ] This is stricter than the standard triangle inequality.
Random Stick Segments
We have (X, Y, Z) such that (X + Y + Z = 1).
For an acute triangle, each side squared plus another side squared must exceed the square of the remaining side.
Geometric Region or Simulation
One can set up integrals to represent the region ({(X,Y,Z): X+Y+Z=1, X<0.5, Y<0.5, Z<0.5, ; X^2 + Y^2 > Z^2, ;\ldots }). This gets rather involved analytically.
A practical approach is to simulate many random breaks and compute the fraction that yields an acute triangle.
Known Result The probability that a random triangle (from a random stick-breaking process) is acute is about 0.125 (1/8). Different references give slightly varying approximations around 0.11–0.12 due to finite sampling, but it is well-documented that the probability is near 0.125.
Interpretation The additional requirement that all angles be acute imposes a significant restriction on how the lengths relate, which is why the probability (about 1/8) is smaller than the 1/4 probability of simply forming any triangle.
7) If a triangle does form, what is the expected perimeter of that triangle?
Answer and Detailed Explanation:
Since the stick has length 1, the perimeter of the formed “triangle” is always (X + Y + Z = 1). Therefore, if a triangle forms, its perimeter is 1 by definition (because we’re simply re-labelling the original stick’s total length into three pieces).
Key Observation In typical geometry, the perimeter of a triangle is the sum of its sides. Here, no matter how you break the stick, if the three segments do indeed satisfy the triangle inequality, you have not lost or added length; it must still total 1.
No Variation Because the entire stick is 1, the “perimeter” is always 1, whether or not the triangle inequality is satisfied. When the triangle inequality fails, we say it’s not a valid triangle, but if it is valid, the perimeter is exactly 1.
Hence, the expected perimeter of the triangle (conditional on it forming) is trivially (\boxed{1}).
8) Could we form a right triangle from such random breaks, and what is the probability?
Answer and Detailed Explanation:
Strict Right Triangle Condition For side lengths (a, b, c), forming a perfect right triangle means one side satisfies (a^2 + b^2 = c^2). But in a continuous probability setting (where (a, b, c) are drawn from a continuous distribution), the probability of hitting this exact equality is 0.
Why Probability Is Zero
If you imagine the space of possible lengths, the set of points satisfying (a^2 + b^2 = c^2) is a 2D surface embedded in a higher-dimensional volume, which typically has measure zero in that continuous space.
For discrete distributions or rational constraints, you might get a non-zero probability, but in the continuous uniform break scenario, the chance of exactly hitting a right-triangle triple is zero.
Therefore, the probability is (\boxed{0}). Any right triangle that might appear is a zero-probability event in the continuum sense.
9) Are the side lengths ((X, Y, Z)) themselves identically distributed as random variables?
Answer and Detailed Explanation:
This is a subtle question: we know from symmetry that before labeling them as “the left piece,” “the middle piece,” and “the right piece,” all pieces are drawn from the same conceptual distribution. But once we define:
[ \huge X = U_1, \quad Y = U_2 - U_1, \quad Z = 1 - U_2 \quad \text{(with } 0<U_1<U_2<1\text{)}, ] the random variables (X, Y, Z) as labeled are not individually identical in distribution.
Why They Are Not Identically Distributed
(X) is the smaller of the two random cut points; (Y) is the gap between the smaller and larger cut point; (Z) is the remaining portion to the end.
The position of (X) is constrained to be less than (Y) (since (X<U_2)) in a certain sense, and (Z) is forced to be the leftover portion.
However, If You Randomly Pick One Piece If instead of labeling them as “left,” “middle,” “right,” you just break the stick and then pick one piece at random from the three resulting pieces, that chosen piece has the same distribution regardless of which piece it is. By symmetry, each piece is equally likely to end up in your hand.
Mathematical Confirmation
The unconditional distributions of the three labeled segments differ. For instance, the expected value (E[X]) differs from (E[Y]) because (X) tends to be systematically smaller as the leftmost segment.
But if you sample from ({X, Y, Z}) with probability 1/3 each, that “randomly chosen segment” has a common distribution.
In short: No, labeled as (X, Y, Z,) they are not i.i.d. random variables. Yes, if you pick a segment at random among the three, the distribution is the same by symmetry.
10) Could we extend the reasoning to find the probability distribution of angles in the triangle that does form?
Answer and Detailed Explanation:
Yes. Once three lengths ((X, Y, Z)) form a valid triangle, one can compute the angles (\alpha, \beta, \gamma). For instance, by the Law of Cosines:
[ \huge \alpha = \arccos\Bigl(\frac{Y^2 + Z^2 - X^2}{2YZ}\Bigr), ] and similarly for (\beta) and (\gamma).
Angle Distribution The angles are random variables depending on (X, Y, Z). One could, in principle, integrate the joint density over the valid region (where (X+Y+Z=1) and no side exceeds 0.5), and then transform these lengths to angles ((\alpha,\beta,\gamma)). This is a complicated transformation.
Practical Simulation
Generate many random breaks ((X, Y, Z)).
Discard the cases that do not form a triangle.
For those that form a triangle, use the Law of Cosines to find each of the three angles.
Plot a histogram or empirical distribution of (\alpha, \beta, \gamma).
Known Outcomes
On average, the angles will be somewhat skewed, and it’s more likely to get scalene (unequal) triangles than near-equilateral ones.
The distribution of angles is continuous and often cannot be reduced to a simple closed form.
Insight Observing the distribution of angles reveals that the largest angle is quite frequently obtuse or near (90^\circ). Only in a fraction (about 1/8 of all triangles) are all angles acute (see the earlier question on acute triangles).
Thus, while it is feasible, finding a closed-form solution is cumbersome; simulation is the most straightforward approach to fully characterize how the angles are distributed.
Summary
The classic broken stick problem where two uniform cuts are made on a stick of length 1 produces three random segments. The probability that these segments can form a triangle is (1/4). We arrived at that result through:
Geometric Region Argument: Identifying the portion of the unit square ((U_1, U_2)) in which no segment exceeds half the total.
Direct Integration: Setting up definite integrals that yield 0.25.
Symmetry Arguments: Reasoning that the triangle condition (no side > 0.5) is only satisfied in one-fourth of the cases.
Beyond the core result, the above follow-up questions highlight a variety of deeper explorations: sequential breaking vs. simultaneous breaking, distributions of the largest piece, forming acute vs. right triangles, and examining angle distributions. Each extension demonstrates both the richness of the core problem and how subtle changes in assumptions can alter the outcome.