ML Interview Q Series: Monte Carlo Simulation: Estimating Pi Using Geometric Probability
Browse all the Probability Interview Questions here.
Question
Consider the following experiment. You draw a square of width 1 foot on the floor. Inside the square, you inscribe a circle of diameter 1 foot. The circle will just fit inside the square. You then throw a dart at the square in such a way that it is equally likely to fall on any point of the square. What is the probability that the dart falls inside the circle, and how might this process be used to estimate the value of pi?
Short Compact solution
All points in the square are equally likely, so the probability of landing inside the circle is the ratio of the circle’s area to the square’s area. The square’s area is 1, and the circle’s area is pi/4 (because the radius is 1/2). Hence, the probability is pi/4. If we do not know pi, we can estimate it by performing this random experiment many times and recording the fraction of times the dart falls inside the circle. Multiplying that fraction by 4 yields an approximate value of pi.
Comprehensive Explanation
Relationship Between Areas
We have a square of side length 1 foot. Hence, its area is 1 foot^2.
Inside that square, we inscribe a circle whose diameter is also 1 foot. The circle’s radius is therefore 0.5 foot. The area of a circle is pi * (radius^2). Thus:
Because the dart is uniformly distributed over the square’s surface, the probability of hitting any particular region within the square is proportional to that region’s area. Hence, the probability that the dart lands inside the circle is the ratio of the circle’s area (pi/4) to the square’s area (1):
Estimating pi
If you repeat this experiment many times, the proportion of darts that land inside the circle approximates pi/4. Solving for pi, we get:
Therefore, one can empirically estimate the value of pi by multiplying that observed fraction by 4.
Monte Carlo Perspective
In Machine Learning and Data Science, such a procedure is a classic Monte Carlo method. Monte Carlo methods estimate numerical quantities by repeated random sampling. In this case, we are estimating pi by sampling points uniformly in a 1x1 square. The fraction of those points that lie in the inscribed circle converges to pi/4 as the number of trials grows very large. By multiplying the fraction by 4, we approximate pi.
Below is a simple Python snippet demonstrating this technique:
import random
def estimate_pi(num_samples=10_000_000):
inside_count = 0
for _ in range(num_samples):
x = random.random() # in [0, 1)
y = random.random() # in [0, 1)
# Check if the point is inside the circle centered at (0.5, 0.5) with radius 0.5
if (x - 0.5)**2 + (y - 0.5)**2 <= 0.5**2:
inside_count += 1
return 4.0 * inside_count / num_samples
approx_pi = estimate_pi()
print("Estimated pi:", approx_pi)
Potential Follow-up question: Accuracy and Number of Samples
A common question is how many dart throws are required to achieve a desired accuracy. In general, the error in Monte Carlo estimation decreases on the order of 1/sqrt(N) where N is the number of samples. To achieve more precise estimates, you need more dart throws.
Potential Follow-up question: Implementation Details in Real Life
In a real physical experiment, uniform throwing over the square might be hard to achieve perfectly. Deviations from uniformity will bias the estimate of pi. One might need to ensure randomization is as uniform as possible or apply appropriate corrections if the distribution is not perfectly uniform.
Potential Follow-up question: Effect of Imperfect Geometry
In real setups, if the circle is not perfectly inscribed or the square is not drawn precisely, the area ratio changes and biases the estimate. One needs to measure the diameter and square dimensions carefully. Even a small deviation in the circle’s radius can change the ratio significantly.
Potential Follow-up question: Generalization to Higher Dimensions
This method generalizes to higher-dimensional volumes. For example, to estimate the volume of a hypersphere inside a hypercube, one can randomly sample points in the hypercube and compute the fraction that lie within the hypersphere. This leads to estimations of constants analogous to pi in higher dimensions, although variance tends to grow in higher dimensions.
Potential Follow-up question: Does This Method Work If Distribution Is Non-uniform?
The key assumption in this method is that the dart (random point) is uniformly distributed in the square. If points are chosen from a non-uniform distribution, the probability measure changes and we cannot simply take the fraction as pi/4. One would have to apply methods such as importance sampling or re-weight the sample points accordingly.
All these considerations highlight both the power of Monte Carlo methods and the care needed to handle real-world or high-dimensional generalizations.