ML Interview Q Series: Unit Square Probability: Bounding Manhattan Distance Geometrically.

May 15, 2025

Browse all the Probability Interview Questions here.

Question: You choose at random a point (x, y) in the unit square { (x, y) : 0 ≤ x, y ≤ 1 }. The Manhattan distance of (x, y) to the origin (0, 0) is defined as |x| + |y|. For 0 ≤ a ≤ 2, what is the probability that the Manhattan distance of this point to (0,0) is no more than a?

Short Compact solution

Consider the region in the unit square where x + y ≤ a. Since x and y are both non-negative in this square, the Manhattan distance |x| + |y| is simply x + y.

When 0 ≤ a ≤ 1, the subset of points (x, y) satisfying x + y ≤ a forms a right triangle of side length a, giving area = (1/2)*a².

When 1 ≤ a ≤ 2, you can think of the entire unit square minus the region where x + y > a. This leftover region is another right triangle whose side length is 2 - a, so its area is (1/2)(2 - a)². Hence the probability is 1 - (1/2)(2 - a)².

Therefore, the probability P(a) equals (1/2)a² for 0 ≤ a ≤ 1, and 1 - (1/2)(2 - a)² for 1 ≤ a ≤ 2.

Comprehensive Explanation

Understanding the Problem

We randomly select a point (x, y) within the unit square defined by 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. The Manhattan distance to the origin (0, 0) in this context becomes x + y because x ≥ 0 and y ≥ 0, so |x| + |y| = x + y.

The question asks for the probability that x + y ≤ a for a range 0 ≤ a ≤ 2. Geometrically, within the first quadrant, the set of all points such that x + y ≤ a is the region on or below the line x + y = a.

Geometry of the Region for 0 ≤ a ≤ 1

If 0 ≤ a ≤ 1, the line x + y = a intersects the x-axis at x = a, y = 0 and the y-axis at x = 0, y = a. This forms a right triangle in the lower-left corner of the unit square. The area of such a right triangle is (1/2)baseheight. Here, the base = a and the height = a, hence the area is (1/2)*a².

This region exactly captures the points whose Manhattan distance to the origin is ≤ a. Because we assume a uniform distribution over the unit square (with total area = 1), the probability is the area of that triangular region:

Geometry of the Region for 1 ≤ a ≤ 2

When a > 1, the line x + y = a starts to cut across the square in such a way that a portion of the top-right corner (where x + y > a) is excluded. Another way to see this: for 1 ≤ a ≤ 2, the points x + y ≤ a cover most of the square except for a smaller right triangle in the top-right corner (where x + y > a).

This smaller triangle in the top-right corner has side length (2 - a). Its area is (1/2)*(2 - a)². Hence the area corresponding to x + y ≤ a is the entire unit square area 1 minus that small triangle:

So the probability is:

Putting it all together in a piecewise manner:

Edge Cases

a = 0:
- The region x + y ≤ 0 is simply the origin (0, 0). Because we are dealing with continuous probabilities, the area is 0. This matches P(a) = (1/2)*0² = 0.
a = 2:
- The region x + y ≤ 2 within the unit square basically includes all points in the unit square (because the maximum possible x + y in the unit square is 1 + 1 = 2). So the probability is 1 - (1/2)*(2 - 2)² = 1.

Both of these boundary conditions align with the piecewise formula above.

Alternative Ways to Solve

Integration Approach: We can integrate the uniform density 1 over the region x + y ≤ a, x ∈ [0, 1], y ∈ [0, 1]. This naturally splits into two ranges of a (0 ≤ a ≤ 1 and 1 ≤ a ≤ 2) because the line x + y = a intersects the boundary of the unit square differently in each scenario.
Cumulative Distribution for X+Y: If X and Y are independent and uniformly distributed on [0,1], we can recall that the distribution of X+Y is a well-known triangular distribution on [0,2]. Integrating the corresponding PDF from 0 to a also yields the same result.

Practical Considerations

When using simulation, you can generate many points (x, y) uniformly in the unit square and estimate the fraction for which x + y ≤ a. This is a straightforward Monte Carlo approach and will empirically approach the exact probability for large sample sizes.
If you were dealing with negative coordinates or a larger domain, you would have to adjust for the absolute value terms in the Manhattan distance. In the full plane, the shape for |x| + |y| ≤ a would be a diamond shape. Restricting to the first quadrant, it becomes the triangle we’ve examined.

Possible Follow-up Questions

How would you simulate this in Python to verify the result?

You could do a simple Monte Carlo simulation. Below is a sketch of code:

import numpy as np

def estimate_probability(a, num_samples=10_000_000):
    # Generate random points in [0,1] x [0,1]
    x = np.random.rand(num_samples)
    y = np.random.rand(num_samples)

    # Compute Manhattan distance
    distances = x + y

    # Check how many are <= a
    count = np.sum(distances <= a)

    # Return empirical probability
    return count / num_samples

# Example usage:
for test_a in [0.5, 1.0, 1.5, 2.0]:
    print(f"a = {test_a}, estimated prob = {estimate_probability(test_a)}")

One would compare the output against the formula:

(1/2)*a² for 0 ≤ a ≤ 1
1 - (1/2)*(2 - a)² for 1 ≤ a ≤ 2

What if the question allowed negative coordinates?

If x and y could be negative as well, the shape |x| + |y| ≤ a (for a > 0) would form a diamond centered at the origin. Within a 2x2 square (say from -1 to +1 in each dimension), the fraction of area corresponding to |x| + |y| ≤ a would differ. One can derive a piecewise formula accounting for the diamond shape fully.

How does this relate to the distribution of X+Y when X and Y are uniform on [0,1]?

If X and Y are uniform[0,1], the random variable S = X + Y follows a well-known triangular distribution on [0,2]. Its PDF increases linearly from 0 to 1, then decreases linearly from 1 to 2. The CDF for S at point a matches exactly with the piecewise function we derived for P(a).

Why do we split at a = 1?

Because the line x + y = a changes how it intersects the unit square at the point where a = 1. For smaller a, the region x + y ≤ a is a simple right triangle in the bottom-left. For larger a, we must subtract away the triangular region in the top-right instead, because x + y = a extends beyond x=1 or y=1.

Could we solve it by simple reasoning or symmetry?

Yes. For 0 ≤ a ≤ 1, it’s just the triangle below x + y = a. For 1 ≤ a ≤ 2, you can use symmetry: the area x + y ≤ a in [0,1]×[0,1] is equivalent to 1 minus the area x + y > a. That leftover region is a right triangle of side length 2 - a. Hence we get 1 - (1/2)*(2 - a)².

All these considerations ensure that you fully grasp not just the result, but why it is piecewise and how geometry, integration, and distribution-based arguments align perfectly to yield the same probability function.

Below are additional follow-up questions

What if the distribution of (x, y) is not uniform in the unit square?

In many real-world situations, the point (x, y) might come from a different distribution rather than the uniform one. For instance, x could be drawn from a normal distribution truncated to [0,1], and y might be drawn from some skewed distribution, also truncated to [0,1]. In such cases, the probability that x + y ≤ a (for 0 ≤ x, y ≤ 1) can no longer be computed simply by geometric area. Instead, you would integrate the joint probability density function f(x, y) over the region x + y ≤ a.

As a general approach for a non-uniform distribution:

Identify the region R where x + y ≤ a and x, y in [0,1].
Compute ∫∫ f(x, y) dx dy over R.

Potential pitfalls and subtlety:

If the distribution is separable, i.e., f(x, y) = f(x)*g(y) and each is known, the integration might be simpler but still requires splitting the integral depending on whether a ≤ 1 or a ≥ 1.
If the variables are correlated, then f(x, y) ≠ f(x)*g(y). This can complicate the integration significantly, as you must account for the joint density’s correlation structure.

How would the result change if we considered a rectangle [0, M]×[0, N] for M, N > 0 instead of a unit square?

If instead of a unit square, we sample (x, y) uniformly from a rectangle [0, M]×[0, N], the maximum Manhattan distance in that region could be M + N. Hence a would vary in the range 0 ≤ a ≤ (M + N).

The boundary line x + y = a, for a ≤ min(M, N), encloses a right triangle in the corner of the rectangle. For a > min(M, N), the shape’s geometry changes, similar to how it does in the unit-square case when a crosses 1. You must carefully check whether a exceeds M, N, or both. Thus, the formula becomes piecewise with:

For small a, you get an area that is (1/2)*a², but scaled appropriately if M ≠ N.
For large a, you subtract the area of the excluded region from M*N.

Pitfalls and subtlety:

When a is small (less than both M and N), the shape is straightforwardly a triangle of side a. But once a surpasses one of the sides, the portion that remains outside might not be a simple triangle anymore if a is large enough to exceed either M or N but not both.
A robust approach is to split the derivation by checking which part of the rectangle boundary is intersected first by the line x + y = a.

How does this formula extend to higher dimensions for an L1 ball?

Moving to three or more dimensions is a natural follow-up. In d dimensions, the Manhattan distance (also called the L1 norm) of a point (x1, x2, …, xd) to the origin is x1 + x2 + … + xd, assuming all xi ≥ 0 for simplicity. The region x1 + x2 + … + xd ≤ a is a d-dimensional simplex. If we sample uniformly in a hypercube [0, 1]^d, we want the volume of that simplex for 0 ≤ a ≤ d.

The volume of the region { (x1, …, xd) : x1 + … + xd ≤ a } in nonnegative coordinates is a^d / d!. However, once a > 1, the intersection of that simplex with [0,1]^d again becomes more intricate because portions of the hypercube get excluded. One would typically subtract the volume of the region x1 + … + xd > a.

Pitfalls and subtlety:

As dimension grows, geometric intuition gets trickier. The formula for volume of a d-dimensional simplex is straightforward, but combining that with boundaries of a hypercube requires piecewise analysis similar to the 2D case.
Implementing volume integrals in high dimensions can be subject to numerical instability or large computational cost unless you use a direct known formula.

Could rounding or floating-point precision affect a simulation-based estimate?

Yes, in practical computational settings, if you use floating-point arithmetic to check x + y ≤ a, you might encounter floating-point rounding, especially if a is close to 1 or 2. If your simulation is large, these precision discrepancies can bias the count slightly.

Pitfalls and subtlety:

If a is near 1, the difference between x + y and a might be on the order of machine epsilon. This can lead to borderline points being miscounted. Over many samples, the bias could become noticeable if the threshold is tested in a naive way.
In high-performance simulations or large-scale machine learning systems, these boundary effects might matter if you need extremely precise results (like for a reliability test).

What if we only want the probability that x + y < a strictly, not ≤ a?

Mathematically, for continuous distributions, P(x + y ≤ a) is the same as P(x + y < a) because the probability that x + y = a is zero (a single line in the plane). In discrete or hybrid distributions, however, “≤” vs. “<” could matter.

Pitfalls and subtlety:

If your distribution is partly discrete (e.g., x, y might be integers on a grid), then the line x + y = a might contain multiple integer lattice points. In that case, there is a positive probability mass on that line.
Continuous uniform sampling over a region, however, makes the measure of that line zero, so ≤ and < remain equivalent.

How might we incorporate constraints or transformations such as x being scaled by a factor before computing Manhattan distance?

If we introduce a scaling factor to one coordinate, say we compute distance = α·x + β·y, we then look at α·x + β·y ≤ a. The geometry changes to a region bounded by the line β·y = a - α·x. If α, β > 0, that line intersects the axes at (a/α, 0) and (0, a/β). The probability is again the area of a triangle for sufficiently small a, but the threshold for the piecewise changes occurs at different values. You would need to account for the new intersection points within [0,1]×[0,1].

Pitfalls and subtlety:

If α or β is zero or negative, or if x, y can be outside [0,1], you must carefully handle the geometry.
If α, β are large, the line might not intersect the boundaries the same way as in the unit case. The approach remains piecewise, checking which boundary lines are crossed first.

How would you handle a scenario where the boundary of selection is not a perfect square?

Sometimes, we pick points from an arbitrary polygon or region in the plane, not necessarily a square. Even if the Manhattan distance formula is the same, the uniform sampling region might be complex in shape, making the area of the subset more difficult to calculate directly. You might resort to:

Decomposing the region into simpler shapes or using computational geometry routines to find intersection areas.
Using a Monte Carlo method that filters out points outside the polygon, then estimates probabilities by counting the proportion with |x| + |y| ≤ a.

Pitfalls and subtlety:

Polygon edges could create additional intersection segments with the line x + y = a.
Numerical methods for clipping or polygon area calculation can lead to small inaccuracies or corner-case errors for polygons with many edges or arcs.

How does the piecewise approach generalize if we only sample the triangle 0 ≤ x ≤ 1, 0 ≤ y ≤ 1−x?

In this scenario, the sampling region is itself a triangle in the plane. The region 0 ≤ x ≤ 1, 0 ≤ y ≤ 1−x can be seen as all points under the line x + y = 1 in the first quadrant. If we measure the probability that x + y ≤ a, for 0 ≤ a ≤ 2, the geometry is quite different. For a ≤ 1, the feasible set with x + y ≤ a is simply a smaller subtriangle within that bigger triangle. For a > 1, effectively the entire region might get included once a is large enough to exceed the line x + y = 1.

Pitfalls and subtlety:

The maximum possible value of x + y in that region is 1, so if a ≥ 1, the probability is 1. This is a special edge case different from the unit square example.
Ensuring you account for the boundaries that exist in the region is crucial; a naive approach might incorrectly assume the region still extends to (1,1).

How can sampling from a discrete grid change the approach to calculating probabilities?

If (x, y) are chosen uniformly at random from a finite set of points, say a grid of step size δ in [0, 1]×[0, 1], then the probability is count of points satisfying x + y ≤ a divided by the total number of grid points in the square. The geometry intuition still helps, but you have discrete points that approximate the continuous region.

Pitfalls and subtlety:

When a is small or near an integer boundary, the number of grid points inside the set can jump at discrete intervals of a.
The uniform distribution might no longer reflect real continuous sampling, so if you try to approximate a purely continuous problem with a grid, you introduce some approximation error. The smaller δ is, the closer you get to the true continuous probability.

If we randomly choose (x, y) from the boundary alone, does this probability question make sense?

If the point (x, y) is uniformly chosen from the perimeter of the unit square instead of the entire area, the probability that x + y ≤ a is now a ratio of boundary lengths on which x + y ≤ a. This is a completely different measure: rather than area, we’re considering length along the edges.

Pitfalls and subtlety:

Along the boundary of the unit square, you have four line segments. The condition x + y ≤ a might intersect each edge in different ways.
Because it’s no longer a 2D measure, the geometry of “area” is no longer relevant, so the piecewise formula does not apply.

Could the result differ if “distance” was interpreted slightly differently?

Yes. Sometimes the Manhattan distance might be written as D = |x - x0| + |y - y0| for some center (x0, y0). The formula for the probability region then depends on how that diamond shape intersects the square. If (x0, y0) ≠ (0, 0), you need to shift the region accordingly. The shape remains a diamond in general, but the overlapping portion with the square can shift or be cut off differently.

Pitfalls and subtlety:

If the chosen point (x0, y0) is near or outside the square, large portions of that diamond might not lie within the region of interest.
If (x0, y0) is on an edge, it might drastically reduce or enlarge the feasible area for certain distances a.

Rohan's Bytes

Discussion about this post