ML Interview Q Series: Characterizing the Poisson Distribution via a Unique Functional Expectation Identity
A random variable X is Poisson-distributed with mean λ.
(a) Show that E[λ g(X+1) - X g(X)] = 0 for any bounded function g(x) defined on the nonnegative integers.
(b) Conversely, show that if X is a nonnegative integer-valued random variable with E[X] = λ satisfying E[λ g(X+1) - X g(X)] = 0 for every bounded function g(x), then X must be Poisson-distributed with parameter λ.
Short Compact solution
Part (a) follows by directly summing over k from 0 to infinity and using the fact that X has the Poisson(λ) mass function p_k = e^(-λ) λ^k / k!. By the substitution rule for expectations,
E[λ g(X+1)] = ∑_{k=0}^∞ λ g(k+1) e^(-λ) λ^k / k!,  E[X g(X)] = ∑_{k=0}^∞ k g(k) e^(-λ) λ^k / k!,
and after shifting the index k + 1 → l in the first sum the two sums coincide, showing E[λ g(X+1) - X g(X)] = 0.
For part (b), define p_j = P(X = j). Choose g(k) as an indicator that k equals some fixed i ≥ 1. Then E[λ g(X+1) - X g(X)] = 0 becomes λ p_{i-1} - i p_i = 0. Solving the recursion p_i = (λ / i) p_{i-1} leads to the form p_i = (λ^i / i!) p_0. Imposing ∑_{i=0}^∞ p_i = 1 then determines p_0 = e^(-λ), so p_i = e^(-λ) λ^i / i! for all i ≥ 0.
Hence X must be Poisson(λ).
Comprehensive Explanation
Understanding why E[ λ g(X+1) - X g(X) ] = 0 holds for a Poisson(λ) random variable
For a Poisson-distributed random variable X with parameter λ, we know that p_k = P(X = k) = e^(-λ) λ^k / k!. The goal is to show E[λ g(X+1) - X g(X)] = 0 for every bounded g.
Recall that for any function h(X),
E[h(X)] = ∑_{k=0}^∞ h(k) p_k.
Applying this to h(X) = λ g(X+1) and h(X) = X g(X), we get
• E[λ g(X+1)] = ∑_{k=0}^∞ λ g(k+1) p_k
• E[X g(X)] = ∑_{k=0}^∞ k g(k) p_k.
Substitute p_k = e^(-λ) λ^k / k! into these sums. Then
E[λ g(X+1)] = ∑_{k=0}^∞ λ g(k+1) e^(-λ) (λ^k / k!) = e^(-λ) ∑_{k=0}^∞ (λ^(k+1) / k!) g(k+1).
Similarly,
E[X g(X)] = ∑_{k=0}^∞ k g(k) e^(-λ) (λ^k / k!) = e^(-λ) ∑_{k=0}^∞ (k λ^k / k!) g(k).
Shift the index in the first sum by letting k + 1 = l (so k = l - 1); it becomes e^(-λ) ∑_{l=1}^∞ (λ^l / (l-1)!) g(l). In the second sum the k = 0 term vanishes, and k λ^k / k! = λ^k / (k-1)!, so it equals e^(-λ) ∑_{k=1}^∞ (λ^k / (k-1)!) g(k). The two sums are identical, hence their difference is zero:
E[λ g(X+1) - X g(X)] = 0.
Why this condition characterizes the Poisson distribution
Part (b) asks us to show that if X is a nonnegative integer-valued random variable with E[X] = λ, and the condition E[λ g(X+1) - X g(X)] = 0 holds for all bounded g, then X must be Poisson(λ).
The essential insight is to recognize that choosing g as an indicator for the event X = i isolates the probabilities p_i and p_{i-1}. Specifically, for fixed i ≥ 1, let g_i(x) = 1 if x = i and 0 otherwise. Then
E[λ g_i(X+1)] = λ p_{i-1},  E[X g_i(X)] = i p_i.
Hence the condition becomes:
λ p_{i-1} - i p_i = 0,
which rearranges to p_i = (λ / i) p_{i-1}. Iterating this relationship yields
p_i = (λ^i / i!) p_0.
Since ∑_{i=0}^∞ p_i = 1 and ∑_{i=0}^∞ λ^i / i! = e^λ (the exponential series), normalization forces p_0 = e^(-λ). Consequently,
p_i = e^(-λ) λ^i / i!,
which is exactly the Poisson distribution with parameter λ.
This shows that if X has expectation λ and satisfies the indicated moment condition for all bounded g, that condition is strong enough to force X to be Poisson-distributed.
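To make the recursion concrete, here is a minimal Python sketch (in the same spirit as the verification code further below) that iterates p_i = (λ / i) p_{i-1} from an unnormalized p_0 = 1, normalizes, and compares against the closed-form Poisson pmf; truncating the support at 30 terms is ample for λ = 2.5.

import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.5
# Build p_i from the recursion p_i = (lam / i) * p_{i-1}, starting from an
# unnormalized p_0 = 1; the normalizing constant then approximates e^lam.
unnormalized = [1.0]
for i in range(1, 30):
    unnormalized.append(unnormalized[-1] * lam / i)
total = sum(unnormalized)
p = [u / total for u in unnormalized]

# The recursion plus normalization reproduces the closed-form Poisson pmf.
for i in range(5):
    print(i, p[i], poisson_pmf(i, lam))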
Potential Follow-Up Questions
How does choosing an indicator function g help prove the characterization?
Choosing an indicator function g_i(x) = 1(x = i) zeroes out all terms in the expectation except those involving P(X = i) or P(X = i-1). This transforms the equality E[λ g(X+1) - X g(X)] = 0 into a direct algebraic relationship involving just p_i and p_{i-1}. Because this relationship holds for all i ≥ 1, you get a simple recurrence that completely identifies the distribution.
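A quick numerical illustration of this collapse, reusing the poisson_pmf helper from the verification code below: plugging the indicator g_i into the expectation reduces it to the two-term expression λ p_{i-1} - i p_i, and both evaluations are zero for a Poisson pmf.

import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, i = 2.5, 4
g_i = lambda x: 1.0 if x == i else 0.0  # indicator of the event {x == i}

# The full expectation, computed term by term...
expectation = sum((lam * g_i(k + 1) - k * g_i(k)) * poisson_pmf(k, lam)
                  for k in range(100))
# ...collapses to the two-term expression lam * p_{i-1} - i * p_i.
direct = lam * poisson_pmf(i - 1, lam) - i * poisson_pmf(i, lam)
print(expectation, direct)  # both are ~0 up to floating-point error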
Why do we need the condition to hold for all bounded functions g?
Requiring the condition for all bounded g in particular allows every indicator function, and finite linear combinations of indicator functions can represent any bounded function supported on finitely many nonnegative integers. Since the indicator choices alone already pin down every probability p_i, this class of test functions is rich enough to determine the entire distribution.
Could any other distribution satisfy the same condition for all bounded g?
No. The step-by-step indicator approach systematically derives a unique set of probabilities that satisfy the recursion λ p_{i-1} = i p_i. The only probability mass function (with ∑ p_i = 1) that meets this condition is the Poisson(λ) distribution. Therefore, no other discrete distribution can satisfy it for all bounded g, as the sketch below illustrates for one alternative.
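As an illustration of this uniqueness, the following sketch uses a geometric distribution on the nonnegative integers chosen to have the same mean λ; a single indicator test already produces a visibly nonzero value.

lam = 2.5
p = 1.0 / (1.0 + lam)  # geometric on {0, 1, 2, ...} with mean lam
geom_pmf = lambda k: p * (1.0 - p)**k

i = 1
g_i = lambda x: 1.0 if x == i else 0.0
violation = sum((lam * g_i(k + 1) - k * g_i(k)) * geom_pmf(k)
                for k in range(2000))
print(violation)  # = lam*p_0 - 1*p_1, clearly nonzero for the geometric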
Is the condition E[ λ g(X+1) - X g(X) ] = 0 related to a martingale or moment generating property?
Yes. The quantity λ g(X+1) - X g(X) is precisely the Stein (Chen-Stein) operator for the Poisson distribution, and the identity E[λ g(X+1) - X g(X)] = 0 is the starting point of Stein's method for Poisson approximation: a nonnegative integer-valued random variable satisfies it for all bounded g exactly when it is Poisson(λ). There is also a process-level interpretation: this operator arises from the generator of an immigration-death process (immigration rate λ, unit per-capita death rate) whose stationary distribution is Poisson(λ), so the identity expresses stationarity of that process. Relatedly, for a rate-λ Poisson process N(t), compensating functionals of N(t) with integrals of this generator yields martingales, which is the rigorous version of the martingale intuition.
How would one verify this in code for a fixed g?
Below is a simple Python snippet that numerically verifies the identity for a particular g by truncating the expectation sum at a reasonable max_k. Note that the theorem assumes bounded g, but because a Poisson random variable has finite moments of every order, the identity also holds for polynomially growing choices such as g(x) = (x+1)^2, which is used here for demonstration; any genuinely bounded g works as well.
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * (lam**k) / math.factorial(k)

def check_condition(lam, g, max_k=50):
    # Approximate E[lam*g(X+1) - X*g(X)] by truncating the sum at max_k;
    # increase max_k for larger lam so the neglected tail stays negligible.
    expectation = 0.0
    for k in range(max_k + 1):
        p_k = poisson_pmf(k, lam)
        term = lam * g(k + 1) - k * g(k)
        expectation += term * p_k
    return expectation

# Example usage:
lam = 2.5
g = lambda x: (x + 1)**2  # unbounded, but all Poisson moments are finite,
                          # so the identity still holds for this choice
res = check_condition(lam, g, max_k=50)
print("Approximate E[lambda*g(X+1) - X*g(X)]:", res)
If you run this code for a reasonable range of k up to, say, 50 or 60 (depending on how large λ is), you should see the approximate expectation close to 0. This reflects the theoretical property we proved.
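As an independent cross-check, one can also estimate the expectation by Monte Carlo sampling rather than truncated summation; this sketch assumes numpy is available and uses a genuinely bounded g.

import numpy as np

lam = 2.5
rng = np.random.default_rng(0)
samples = rng.poisson(lam, size=1_000_000)

g = lambda x: 1.0 / (1.0 + x)  # a genuinely bounded g on the nonnegative integers
estimate = np.mean(lam * g(samples + 1) - samples * g(samples))
print(estimate)  # ~0 up to Monte Carlo noise (on the order of 1e-3)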
What if λ is not the true mean of X?
If E[X] ≠ λ, the condition cannot hold for all bounded g: the simplest test function g ≡ 1 already gives E[λ g(X+1) - X g(X)] = λ - E[X], which is nonzero. In fact, the indicator argument shows that the condition for all bounded g by itself forces the Poisson(λ) recursion, and hence E[X] = λ; stating E[X] = λ in part (b) makes explicit a constraint that the functional identity already implies. The sketch below demonstrates the g ≡ 1 test.
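Here is a small sketch of that g ≡ 1 test, evaluating the identity with a mismatched parameter (wrong_lam, a name introduced here just for illustration) against a Poisson with true mean 2.5.

import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

true_lam, wrong_lam = 2.5, 3.0
# With g = 1, the identity reads E[wrong_lam - X] = wrong_lam - E[X].
value = sum((wrong_lam * 1.0 - k * 1.0) * poisson_pmf(k, true_lam)
            for k in range(100))
print(value)  # = wrong_lam - true_lam = 0.5, not 0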
What if X takes values outside the nonnegative integers?
The statement of the problem restricts X to the nonnegative integers. If X were allowed to take on other integer values (like negative integers) or real values, the relationship E[ λ g(X+1) - X g(X) ] = 0 could break down or at least not imply a Poisson distribution in any straightforward sense. The characterization is unique to discrete, nonnegative integer-valued random variables.
All of these considerations reinforce why the given condition strongly implies that X must be Poisson with parameter λ, completing the argument and analysis.
Below are additional follow-up questions
Could the result still hold if we only checked the condition for a limited subset of functions g?
A key part of the proof is that the condition holds for all bounded functions g. If one only checks the condition on a narrower class of functions—say only polynomials up to a certain degree—there’s a risk of missing important constraints that pin down the entire distribution. Essentially, using all bounded g is equivalent to using all finite linear combinations of indicator functions for every integer i. This ability to choose indicator functions isolates the individual probabilities p_i. If we restrict g to a smaller family (e.g., polynomials of degree up to n), we might only isolate certain linear constraints on p_i, which might still be satisfied by distributions other than the Poisson. Hence, we would lose uniqueness and fail to prove that X is Poisson(λ).
Pitfall: A naive approach might attempt to test polynomials of low degree and erroneously conclude that the distribution is Poisson(λ). However, a different distribution could also satisfy those low-degree polynomial conditions but deviate significantly when tested with more general bounded g. This subtlety highlights the necessity of spanning the entire space of bounded functions on the nonnegative integers.
Real-world issue: In practical data scenarios, we might only test a handful of g functions and see approximate validity of E[λ g(X+1) – X g(X)] ≈ 0. It does not guarantee Poisson(λ) unless we have theoretical or empirical reason to test a sufficiently rich family of g.
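To make this pitfall concrete, here is a sketch with a hand-built counterexample: a pmf with mass 1/2 at 0 and 1/2 at 2 has mean 1 and variance 1, so it passes the g = 1 and g = x tests for λ = 1, yet fails an indicator test.

lam = 1.0
# A non-Poisson pmf with mean 1 and variance 1: mass 1/2 at 0 and 1/2 at 2.
pmf = {0: 0.5, 1: 0.0, 2: 0.5}

def lhs(g):
    return sum((lam * g(k + 1) - k * g(k)) * p_k for k, p_k in pmf.items())

print(lhs(lambda x: 1.0))                     # 0.0: the g = 1 (mean) test passes
print(lhs(lambda x: float(x)))                # 0.0: the g = x (variance) test passes
print(lhs(lambda x: 1.0 if x == 1 else 0.0))  # 0.5: an indicator test fails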
If the distribution were truncated, does this characterization still hold?
A truncated distribution is one where X has a maximum possible value N (for example, X can only take values 0 through N). In that scenario, even if we impose E[λ g(X+1) – X g(X)] = 0 for bounded g, the indicator-based argument is no longer straightforward. For i close to N, p_i might not follow the same recursion because there is no probability mass for i > N. Hence, the standard step λ p_{i-1} = i p_i fails once i > N.
Potential pitfall: If X were artificially truncated, it might initially satisfy λ p_{i-1} = i p_i for i < N, but at i = N we must have p_N absorb the tail that would otherwise extend to infinity for a Poisson distribution. Consequently, you cannot conclude the distribution is infinite-support Poisson. You either end up with a truncated Poisson distribution (with a normalizing constant different from e^(-λ)) or some other finite-support discrete distribution.
Real-world issue: In real-world applications, certain count data processes might have an upper bound (e.g., maximum capacity in a queue). Even though the generation mechanism is nearly Poisson, the truncation introduces boundary effects that violate the recursion at high count values. The condition E[λ g(X+1) – X g(X)] = 0 for all g no longer strictly holds; thus, the distribution cannot be the full, infinite-support Poisson.
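A brief sketch of the boundary effect, using a Poisson(2.5) truncated at N = 5: interior indicator tests still pass, but the indicator at i = N + 1 cannot.

import math

lam, N = 2.5, 5
# Truncated Poisson on {0, ..., N}: proportional to lam^k / k!, renormalized.
weights = [lam**k / math.factorial(k) for k in range(N + 1)]
Z = sum(weights)
pmf = [w / Z for w in weights]

# Interior indicators still pass: lam * p_{i-1} - i * p_i = 0 for 1 <= i <= N.
print(lam * pmf[2] - 3 * pmf[3])  # ~0 up to floating-point error

# The boundary indicator g_{N+1} fails: E[lam * g(X+1)] = lam * p_N > 0,
# while E[X * g(X)] = (N+1) * P(X = N+1) = 0 because X never exceeds N.
print(lam * pmf[N])  # strictly positive, so the identity breaks at the boundary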
If g is not bounded, would the same proof technique still apply?
The statement explicitly requires g to be bounded on the nonnegative integers. If g is unbounded, the quantity E[λ g(X+1) – X g(X)] might not exist (it could be infinite or undefined). The entire derivation relies on the expectation being finite and well-defined so we can interchange sums, use the indicator trick, etc.
Pitfall: Trying to apply the same argument with unbounded g could cause divergence issues. For instance, if g(x) grows too quickly as x → ∞, we might not have absolute summability, and the expression E[λ g(X+1)] or E[X g(X)] may fail to converge.
Real-world issue: In practice, if we suspect extremely large counts or heavy-tailed distributions, certain function choices might lead to infinite expectations and break the standard manipulations. One must confirm that E[g(X)] is finite for whichever g is being tested.
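For instance, with the deliberately unbounded choice g(k) = k!, each term of E[λ g(X+1)] simplifies to e^(-λ) λ^(k+1) (k+1), which grows without bound for λ > 1; the partial sums in this sketch illustrate the divergence.

import math

lam = 2.5
g = lambda k: float(math.factorial(k))  # a deliberately unbounded choice of g

partial = 0.0
for k in range(60):
    p_k = math.exp(-lam) * lam**k / math.factorial(k)
    partial += lam * g(k + 1) * p_k  # term = e^(-lam) * lam^(k+1) * (k+1)
    if (k + 1) % 10 == 0:
        print(k + 1, partial)  # the running sum keeps growing instead of converging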
What happens if λ ≤ 0 or if X can take negative integer values?
The Poisson distribution is defined only for λ > 0 and nonnegative integer X. If λ ≤ 0, the usual parameterization of the Poisson distribution ceases to make sense. Similarly, if X can take negative values, the entire sum-based argument that uses p_k for k ≥ 0 does not apply; we would need a different approach or a different distribution family (e.g., if X can be any integer, we might consider other discrete distributions like the Skellam distribution or similar).
Pitfall: One might attempt to use the same recursion λ p_{i-1} = i p_i for negative indices i–1. But once negative indices are allowed, p_{-1}, p_{-2}, etc. might not even be defined in a typical Poisson context, and the recursion no longer matches up with a standard known distribution.
Real-world issue: In actual analytics, integer-valued data that can go negative might represent gains/losses, net changes, or certain financial instruments. Poisson-based assumptions are then inappropriate. The condition E[λ g(X+1) – X g(X)] = 0 is specifically tailored to the nonnegative, λ > 0, infinite-support scenario.
Does the variance also play a critical role in confirming the Poisson identity?
For a Poisson(λ) random variable, mean = variance = λ. One might wonder if simply matching the mean to λ is enough or if matching the variance also matters. In this problem’s statement, the key condition with the bounded function g fully enforces that distribution’s structure. Mean alone is not typically enough to characterize a distribution (for example, many distributions share the same mean but differ in variance).
Pitfall: Attempting to use only E[X] = λ to claim that X is Poisson(λ) is not justified—other discrete distributions can have the same mean. We need the extra constraint from E[λ g(X+1) – X g(X)] = 0 for all bounded g to fix the distribution as Poisson(λ).
Real-world issue: In some data applications, we only observe the mean count but not the higher moments. This can be misleading if we assume Poisson(λ) without checking variance or additional constraints. Overdispersed or underdispersed data sets occur commonly in fields like insurance or genomics. The distribution might differ significantly from Poisson despite the same mean.
Could a mixture of Poisson distributions satisfy E[λ g(X+1) – X g(X)] = 0?
Consider a scenario where X is drawn from a mixture of two Poisson distributions with different λ values (e.g., λ1 and λ2). The mixture itself might still have E[X] = λ if the weighted average of λ1 and λ2 is λ. But does it satisfy the same indicator-based recursion?
To see whether a mixture can satisfy the recursion:
• For i ≥ 1, we require λ p_{i-1} = i p_i.
• In a mixture, p_i is a combination of e^(-λ1) (λ1^i / i!) and e^(-λ2) (λ2^i / i!), weighted by the mixture probabilities α and 1 − α.
• This combination typically fails to satisfy the recursion for all i ≥ 1 unless λ1 = λ2 (in which case it is not really a mixture but a single Poisson).
Pitfall: One might guess that as long as E[X] = λ, any mixture of distributions might do the trick. But the condition E[λ g(X+1) – X g(X)] = 0 is very strict and requires a single parameter λ driving the entire distribution’s factorial moment structure. Mixtures usually break that structure unless the parameters happen to be the same.
Real-world issue: In real data, we often see overdispersion relative to a single Poisson. A standard approach is to model it with a Poisson-gamma mixture (the negative binomial). This mixture can match the mean but definitely does not satisfy E[λ g(X+1) – X g(X)] = 0 for all bounded g, because the recursion fails in ways that reflect the extra dispersion. Thus, such distributions are ruled out by the condition in the problem.
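The failure is easy to exhibit numerically; this sketch checks the recursion for an equal mixture of Poisson(1) and Poisson(4), whose overall mean is 2.5.

import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Equal mixture of Poisson(1) and Poisson(4); the overall mean is 2.5.
alpha, lam1, lam2 = 0.5, 1.0, 4.0
lam = alpha * lam1 + (1 - alpha) * lam2
mix_pmf = lambda k: alpha * poisson_pmf(k, lam1) + (1 - alpha) * poisson_pmf(k, lam2)

# The recursion lam * p_{i-1} - i * p_i would be 0 for a true Poisson(lam).
for i in range(1, 5):
    print(i, lam * mix_pmf(i - 1) - i * mix_pmf(i))  # visibly nonzero for the mixture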
Could the statement be generalized to multidimensional Poisson distributions or vector-valued X?
This question touches on the idea of multinomial or multivariate Poisson distributions. For a random vector X = (X_1, X_2, …, X_n) with each component nonnegative integer-valued, we might try an analogous condition of the form E[λ_j g(X+e_j) – X_j g(X)] = 0 for each j. Here, e_j is the standard basis vector that increments the jth component by 1. However, the full formal statement would require a more elaborate approach to show that each X_j must be marginally and jointly distributed in a certain way.
Pitfall: Simply writing E[λ g(X+e_j) – X_j g(X)] = 0 for each j is no longer a single recursion but a system of coupled recursions. If the components of X are correlated, we need additional constraints that handle the joint probabilities p_{k1, k2, …, kn}. In the univariate case, the recursion is straightforward and leads to Poisson. In the multivariate setting, the “Poisson” concept might mean an independent Poisson vector or a more complex correlation structure that may or may not satisfy a direct extension of the recursion.
Real-world issue: Many real-world count datasets are multidimensional (e.g., number of successes in multiple categories). Poisson distributions can appear in each dimension but may have correlations. For modeling multiple correlated count variables, one might use log-linear models or copula-based approaches. The simple univariate recursion formula does not trivially generalize.
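For the simplest multivariate case, a vector of independent Poisson components, the componentwise identity can at least be checked by simulation; the sketch below (assuming numpy and independence between components) is only illustrative and says nothing about correlated constructions.

import numpy as np

# Componentwise analogue for a vector of independent Poissons:
# E[lam_j * g(X + e_j) - X_j * g(X)] = 0 for each coordinate j.
rng = np.random.default_rng(0)
lams = np.array([1.0, 3.0])
X = rng.poisson(lams, size=(1_000_000, 2))

g = lambda x: 1.0 / (1.0 + x[:, 0] + x[:, 1])  # a bounded g of the whole vector

for j in range(2):
    e_j = np.zeros(2)
    e_j[j] = 1.0
    est = np.mean(lams[j] * g(X + e_j) - X[:, j] * g(X))
    print(j, est)  # ~0 up to Monte Carlo noise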