ML Interview Q Series: Analyzing Defective Bolts: Poisson Approximation and Binomial Distribution of Perfect Boxes.
Suppose that 0.3% of bolts made by a machine are defective, the defectives occurring at random during production. If the bolts are packaged in boxes of 100, what is the Poisson approximation that a given box will contain x defectives? Suppose you buy 8 boxes of bolts. What is the distribution of the number of boxes with no defective bolts? What is the expected number of boxes with no defective bolts?
Short Compact solution
Let D be the number of defective bolts in a box of 100. Then D follows a Binomial distribution with parameters 100 and 0.003. When we approximate this binomial distribution by a Poisson with parameter 0.3, the probability that D equals x defectives is approximately $e^{-0.3} (0.3)^x / x!$. In particular, the probability that D=0 is approximately 0.7408.
If we buy 8 boxes of bolts, let N be the number of boxes with no defective bolts. Treating the boxes as independent, N follows a Binomial distribution with n=8 and p≈0.7408, so the expected value of N is 8 * 0.7408 ≈ 5.926.
Comprehensive Explanation
Underlying Binomial Setting
A box contains 100 bolts. Each bolt has a probability 0.003 of being defective. Assuming independence of defects from bolt to bolt, the total count of defectives D in a box follows a Binomial distribution:

$$P(D = x) = \binom{100}{x} (0.003)^x (0.997)^{100 - x}, \quad x = 0, 1, \dots, 100$$
In this expression, x is the number of defective bolts in the box, and the distribution is written Binomial(n=100, p=0.003). Because n is fairly large (100) and p is small (0.003), it is common to approximate this Binomial distribution by a Poisson with parameter λ = n * p = 100 * 0.003 = 0.3.
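As a quick numerical check of the exact model, the sketch below evaluates the Binomial(100, 0.003) probabilities for small x using only the standard library. The helper name `binomial_pmf` is illustrative and does not come from the original solution.

```python
import math

def binomial_pmf(x, n, p):
    """Exact Binomial(n, p) probability of observing x defectives."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 100, 0.003

# Exact probabilities of 0, 1, 2 and 3 defectives in one box
exact = [binomial_pmf(x, n, p) for x in range(4)]
for x, prob in enumerate(exact):
    print(f"P(D={x}) = {prob:.4f}")

# The mean of the exact distribution equals n * p
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
print(f"mean = {mean:.4f}")
```

The mean n * p is the quantity that the Poisson approximation reuses as its single parameter.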
Poisson Approximation
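Using the Poisson approximation, we set λ = 0.3. The approximate probability that a box contains x defectives is given by:

$$P(D = x) \approx \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-0.3} (0.3)^x}{x!}, \quad x = 0, 1, 2, \dots$$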
Here, λ=0.3 captures the average number of defectives per box. The probability of having no defective bolts in a box is then:
$$P(D = 0) = \frac{e^{-0.3} (0.3)^0}{0!} = e^{-0.3} \approx 0.7408$$
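The Poisson probabilities for the first few values of x can be tabulated with a few lines of Python. This is a minimal sketch; `poisson_pmf` is a hypothetical helper name, and only the standard `math` module is assumed.

```python
import math

def poisson_pmf(x, lam):
    """Poisson(lam) probability of observing exactly x defectives."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 100 * 0.003  # lambda = n * p

# Approximate probabilities of 0, 1, 2 and 3 defectives in one box
poisson = [poisson_pmf(x, lam) for x in range(4)]
for x, prob in enumerate(poisson):
    print(f"P(D={x}) ~ {prob:.4f}")

# Probability that a box contains at least one defective bolt
p_at_least_one = 1 - poisson[0]
print(f"P(D>=1) ~ {p_at_least_one:.4f}")
```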
Distribution of Boxes With No Defects
When you purchase 8 boxes, the number of boxes that have zero defective bolts can be viewed as a Binomial random variable N with parameters n=8 and p=0.7408, where p=0.7408 is the probability that a single box has no defectives (as just calculated via the Poisson approximation).
Thus:

$$P(N = k) = \binom{8}{k} (0.7408)^k (0.2592)^{8 - k}, \quad k = 0, 1, \dots, 8$$
Expected Value of N
The expected number of such “perfect” boxes (boxes with no defectives) is n * p = 8 * 0.7408 ≈ 5.926.
Hence, on average, out of 8 boxes, you would expect about 5.926 boxes to be completely defect-free.
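To make the distribution of N concrete, the following sketch lists P(N = k) for k = 0, ..., 8 and recovers the expected value by summing k * P(N = k). The variable names are illustrative, and the per-box probability is taken from the Poisson approximation.

```python
import math

n_boxes = 8
p_perfect = math.exp(-0.3)  # Poisson estimate of P(a box has no defectives)

def binomial_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Full distribution of the number of perfect boxes among the 8 bought
pmf_n = [binomial_pmf(k, n_boxes, p_perfect) for k in range(n_boxes + 1)]
for k, prob in enumerate(pmf_n):
    print(f"P(N={k}) = {prob:.4f}")

# Expected value from the definition; it should agree with n_boxes * p_perfect
expected_n = sum(k * prob for k, prob in enumerate(pmf_n))
print(f"E[N] = {expected_n:.4f}")
```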
Potential Follow-Up Question: Why is the Poisson approximation valid here?
The Poisson approximation to the Binomial is typically justified when n is large and p is small, such that n * p = λ remains moderate. In this problem, n=100 and p=0.003 so λ=0.3, which is reasonably small. This scenario matches the standard rule of thumb for the Poisson approximation (np ≤ 10 or so, and p quite small). Mathematically, as n→∞ and p→0 in such a way that n*p converges to λ, the Binomial distribution converges to Poisson(λ).
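The limiting statement can be illustrated numerically. The sketch below holds λ = 0.3 fixed, sets p = λ / n for increasing n, and records the largest gap between the Binomial and Poisson probabilities over x = 0, ..., 10. The choice of n values and the range of x are assumptions made for the illustration.

```python
import math

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 0.3
max_gaps = {}
for n in (10, 100, 1000):
    p = lam / n  # keep n * p fixed at lam while n grows
    # Largest pointwise difference between the two PMFs on x = 0..10
    max_gaps[n] = max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11)
    )
    print(f"n={n:5d}  p={p:.5f}  max gap={max_gaps[n]:.6f}")
```

The gap should shrink as n grows, which is the practical content of the convergence result.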
Potential Follow-Up Question: Could we have used a normal approximation?
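A normal approximation to Binomial(100, 0.003) would have mean μ = 100 * 0.003 = 0.3 and variance σ² = 100 * 0.003 * (1 - 0.003) ≈ 0.2991. While a normal approximation can be written down, it is a poor fit here: the count is heavily skewed and concentrated at zero, whereas the normal curve is symmetric and places noticeable probability on negative values. A common rule of thumb asks for both np and n(1 - p) to be at least about 5 before using the normal approximation, and here np is only 0.3. The Poisson approximation is more precise in capturing the probability of rare events (like zero defectives).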
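The difference in accuracy can be checked directly. This sketch estimates P(D = 0) three ways: exactly, with the Poisson approximation, and with a normal approximation using a continuity correction (the probability that the normal variable falls between -0.5 and 0.5). The continuity correction is an assumption of this illustration, not something the original solution specifies.

```python
import math

def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.003
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

p0_exact = (1 - p) ** n
p0_poisson = math.exp(-mu)
# Normal estimate of P(D = 0) with continuity correction
p0_normal = normal_cdf((0.5 - mu) / sigma) - normal_cdf((-0.5 - mu) / sigma)

print(f"exact   P(D=0) = {p0_exact:.4f}")
print(f"Poisson P(D=0) = {p0_poisson:.4f}")
print(f"normal  P(D=0) = {p0_normal:.4f}")
```

When run, the Poisson value should sit very close to the exact one, while the normal estimate lands well below it.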
Potential Follow-Up Question: What happens if the defect rate changes?
If the defect rate p changes significantly, say increases to 0.03, then λ = 100 * 0.03 = 3. In that case, you can still approximate the Binomial with a Poisson(3), although it might start losing some accuracy if p is not very small. Nevertheless, for moderate λ, Poisson is still frequently a good approximation, especially if the underlying assumptions of independence remain valid.
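A side-by-side comparison for the higher defect rate shows how much accuracy is given up. The sketch below tabulates the exact Binomial(100, 0.03) and Poisson(3) probabilities for x = 0, ..., 6; the range of x is an arbitrary choice for the illustration.

```python
import math

n, p = 100, 0.03
lam = n * p

rows = []
for x in range(7):
    exact = math.comb(n, x) * p**x * (1 - p)**(n - x)
    approx = math.exp(-lam) * lam**x / math.factorial(x)
    rows.append((x, exact, approx))
    print(f"x={x}  binomial={exact:.4f}  poisson={approx:.4f}")

# Largest pointwise discrepancy over the tabulated values
max_gap = max(abs(exact - approx) for _, exact, approx in rows)
print(f"max gap = {max_gap:.4f}")
```

The discrepancies should be larger than in the p = 0.003 case but still small in absolute terms.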
Potential Follow-Up Question: What if defects are not independent across bolts?
The entire Binomial (and thus Poisson) approach relies on the assumption that each bolt’s defect status is independent. In reality, manufacturing processes could introduce correlations: if one bolt is defective, certain production conditions might raise or lower the chances of another bolt being defective. If such dependence is strong, the binomial model may misrepresent the true probabilities. One would need more sophisticated models (e.g., Beta-Binomial or other compound distributions) to capture correlated defects appropriately.
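As one possible illustration of the Beta-Binomial alternative, the sketch below assumes that the defect probability itself varies from box to box according to a Beta(α, β) distribution with mean 0.003, which makes defects within a box positively correlated. The values α = 0.3 and β = 99.7 are hypothetical and chosen only so that the mean matches the original defect rate.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x, n, alpha, beta):
    """Beta-Binomial probability of x defectives among n bolts."""
    log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(
        log_comb + log_beta(x + alpha, n - x + beta) - log_beta(alpha, beta)
    )

n = 100
alpha, beta = 0.3, 99.7  # hypothetical; mean alpha / (alpha + beta) = 0.003

pmf = [beta_binomial_pmf(x, n, alpha, beta) for x in range(n + 1)]
total = sum(pmf)
mean = sum(x * prob for x, prob in enumerate(pmf))

p0_correlated = pmf[0]
p0_independent = (1 - 0.003) ** n

print(f"sum of PMF         = {total:.6f}")
print(f"mean defectives    = {mean:.4f}")
print(f"P(D=0) correlated  = {p0_correlated:.4f}")
print(f"P(D=0) independent = {p0_independent:.4f}")
```

With the same average defect rate, clustering defects into fewer boxes raises the chance that any given box is defect-free, so the independent model understates P(D = 0) under these assumed parameters.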
Potential Follow-Up Question: Implementation details in Python
Below is a brief code snippet demonstrating how one might use Python for approximating these probabilities and performing the necessary calculations. This code uses only the standard math library, although scipy (through scipy.stats) provides built-in Poisson and Binomial probability functions, and numpy provides random samplers for both distributions:
import math
# Poisson parameter
lam = 0.3
# Probability of 0 defects in a box using Poisson
p_zero_poisson = math.exp(-lam) * (lam**0) / math.factorial(0)
print("Probability of 0 defects (Poisson approximation):", p_zero_poisson)
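# Probability that exactly k of the 8 boxes have no defects;
# the number of perfect boxes is Binomial(8, p_zero_poisson)
def binomial_pmf(k, n, p):
    return math.comb(n, k) * (p**k) * ((1 - p)**(n - k))

print("Probability that all 8 boxes have no defects:", binomial_pmf(8, 8, p_zero_poisson))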
# Expected number of boxes with no defects
expected_value = 8 * p_zero_poisson
print("Expected number of boxes with no defects:", expected_value)
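A simulation offers an independent sanity check on the analytic numbers. The sketch below draws each bolt as defective with probability 0.003, estimates the fraction of perfect boxes, and scales it to 8 boxes. The seed, the number of trials, and the helper name `box_is_perfect` are arbitrary choices for this illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def box_is_perfect(n_bolts=100, p_defect=0.003):
    """Simulate one box; True if none of its bolts is defective."""
    return all(random.random() >= p_defect for _ in range(n_bolts))

n_trials = 20000
perfect = sum(box_is_perfect() for _ in range(n_trials))

# Monte Carlo estimate of P(box has no defectives)
p_perfect_sim = perfect / n_trials
# Estimated expected number of perfect boxes among 8
expected_perfect_in_8 = 8 * p_perfect_sim

print(f"simulated P(perfect box) = {p_perfect_sim:.4f}")
print(f"simulated E[N] for 8 boxes = {expected_perfect_in_8:.3f}")
```

The estimates should fall close to 0.74 and 5.93 respectively, up to Monte Carlo noise.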
In an interview context, you should be able to discuss why the Poisson approximation is appropriate, the assumptions involved, and how to adapt your approach if those assumptions were to change.