ML Interview Q Series: Does lower satisfaction among users enabling location-sharing imply the feature caused their dissatisfaction?
📚 Browse the full ML Interview series here.
Comprehensive Explanation
The question highlights a crucial distinction between correlation and causation. Observing that individuals who use the location-sharing feature are less happy does not automatically establish the feature as the direct cause of their dissatisfaction. There could be many underlying reasons, such as self-selection bias or confounding variables that correlate with the willingness to share location (for example, users who travel frequently might use the feature more but also be more critical of certain app functionalities).
Why Correlation Is Not Always Causation
When a survey shows that a specific group (those who turn on location sharing) reports lower satisfaction, we only know that two events are related (using location sharing and being less satisfied). This relationship might arise from:
Self-selection: The users who choose to enable location sharing may differ systematically from those who do not. For instance, users with high expectations or strong privacy concerns may adopt the feature cautiously or skeptically, and those same traits make them more likely to notice flaws and report lower satisfaction, regardless of what the feature itself does.
Confounding variables: Users who engage more deeply with an app might explore more features (like location sharing) and simultaneously find more points of critique.
Measurement bias: Survey responses might differ based on the attitudes or demographics of those who choose to take part in the new feature.
Approaches To Test For Causation
A strong way to determine whether location sharing truly makes people less happy is to move from observational data to experimental data. The gold standard approach involves randomizing who gets access or encouragement to use a feature, then measuring the difference in outcomes. In causal inference terms, one might attempt to measure the “Average Treatment Effect” (ATE):
ATE = E[Y(1)] - E[Y(0)]
Where:
Y(1) is the outcome (e.g., reported happiness level) if a user is assigned to the “treatment” group that uses the feature.
Y(0) is the outcome if a user is assigned to the “control” group that does not use or is not prompted to use the feature.
E[·] denotes the expected value (average) of the outcome.
By randomly assigning users to the feature or to a no-feature group, we minimize systematic differences between the groups aside from the treatment itself. If this randomization is successful, any significant difference in average happiness can be more confidently attributed to the feature.
Example: Hypothetical A/B Test Implementation
import numpy as np
from scipy.stats import ttest_ind
# Let's say we randomly assign users to two groups:
# group_A uses the location feature (treatment),
# group_B does not use it (control).
# We measure their happiness on a scale 1-10.
# Randomly generate some example data
np.random.seed(42)
group_A_happiness = np.random.normal(loc=7.0, scale=1.0, size=500) # location-sharers
group_B_happiness = np.random.normal(loc=7.2, scale=1.0, size=500) # non-location-sharers
# Perform a two-sample t-test
t_stat, p_value = ttest_ind(group_A_happiness, group_B_happiness)
print("Mean of group A (treatment):", np.mean(group_A_happiness))
print("Mean of group B (control):", np.mean(group_B_happiness))
print("T-statistic:", t_stat)
print("P-value:", p_value)
In this example, we simulate and measure the average happiness in each group. If the difference in means is statistically significant (with a sufficiently low p-value) and random assignment is valid, we can infer a causal relationship. If the difference is not significant, we do not have evidence to assert that location sharing makes users less happy.
Follow-Up Questions
If the survey just shows correlation, how can we be sure there’s no unseen variable driving the results?
Confounding variables can make it appear that using the feature causes lower satisfaction, whereas the real driver might be something else. One way to reduce the influence of unknown factors is random assignment, which is the essence of A/B testing. If randomization is not feasible, advanced observational techniques like propensity score matching can help approximate randomization by pairing users with similar characteristics except for whether they use the feature.
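As a concrete illustration, here is a minimal sketch of propensity score matching on simulated data, assuming scikit-learn is available. All column names and coefficients are invented, and the simulation deliberately contains no true feature effect, so matching should shrink the naive gap caused by the confounder.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Simulated observational data: frequent travelers are more likely to enable
# location sharing AND tend to be more critical of the app overall, i.e. a
# confounder. The feature itself has no effect on happiness in this simulation.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.normal(35, 10, n),
    "sessions_per_week": rng.poisson(5, n),
    "travels_often": rng.binomial(1, 0.3, n),
})
logit = -1.5 + 0.01 * df["age"] + 0.05 * df["sessions_per_week"] + 2.0 * df["travels_often"]
df["uses_feature"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))
df["happiness"] = 7.2 - 0.6 * df["travels_often"] + rng.normal(0, 1, n)

# 1. Estimate propensity scores: P(uses_feature | covariates).
X = df[["age", "sessions_per_week", "travels_often"]]
df["propensity"] = LogisticRegression().fit(X, df["uses_feature"]).predict_proba(X)[:, 1]

# 2. Match each feature user to the non-user with the closest propensity score.
treated = df[df["uses_feature"] == 1]
control = df[df["uses_feature"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["propensity"]])
_, idx = nn.kneighbors(treated[["propensity"]])
matched_control = control.iloc[idx.ravel()]

# 3. Compare average happiness: naive vs. matched comparison.
print("Naive difference:  ", treated["happiness"].mean() - control["happiness"].mean())
print("Matched difference:", treated["happiness"].mean() - matched_control["happiness"].mean())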
Could user demographics or usage patterns affect the perception of this feature?
Yes, different segments of users can have different preferences and behaviors. For instance, privacy-minded users might be especially critical of any “tracking” feature, so their self-reported happiness could be lower for reasons not related to the feature’s mechanics but rather their general apprehension about sharing data. Segmenting analyses by demographic or usage pattern can help uncover such nuances and reveal whether the drop in satisfaction is universal or localized to specific user groups.
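A small illustration of such a segmented breakdown, assuming a hypothetical survey table with a segment label, a feature-usage flag, and a happiness score:
import pandas as pd

# Hypothetical survey responses with a user-segment label.
df = pd.DataFrame({
    "segment":      ["privacy_minded", "privacy_minded", "casual", "casual",
                     "frequent_traveler", "frequent_traveler", "casual", "casual"],
    "uses_feature": [1, 0, 1, 0, 1, 0, 1, 0],
    "happiness":    [5.5, 6.9, 7.0, 7.2, 7.8, 7.4, 6.8, 7.1],
})

# Mean happiness by segment and feature usage. A gap that appears only in one
# segment suggests the dissatisfaction is localized rather than universal.
summary = df.groupby(["segment", "uses_feature"])["happiness"].mean().unstack()
print(summary)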
How do we handle ethical or privacy concerns in testing a location-sharing feature?
When dealing with features that involve sensitive data like location, one must ensure informed user consent and adhere to privacy regulations. Ethical considerations might prevent forcing users into location sharing, so purely random assignment might be impossible. Instead, we could randomize invitations or nudges to enable the feature, rather than forcibly enabling it. All the while, data handling must be transparent and compliant with data protection laws.
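One common way to analyze such a randomized invitation (an encouragement design) is to treat the nudge as an instrument: compare outcomes by assignment, then scale by how much the nudge actually moved adoption. A minimal sketch on simulated data, with all numbers invented for illustration:
import numpy as np

# Hypothetical encouragement design: we randomize the invitation to enable
# location sharing, not the feature itself, and record who actually enabled it.
rng = np.random.default_rng(1)
n = 10_000
encouraged = rng.binomial(1, 0.5, n)                  # randomized nudge
uses_feature = rng.binomial(1, 0.1 + 0.4 * encouraged)  # uptake is higher when nudged
happiness = 7.0 - 0.2 * uses_feature + rng.normal(0, 1, n)

# Intent-to-treat effect: difference in happiness by assignment.
itt = happiness[encouraged == 1].mean() - happiness[encouraged == 0].mean()
# First stage: how much the nudge moved actual usage.
uptake = uses_feature[encouraged == 1].mean() - uses_feature[encouraged == 0].mean()
# Wald / instrumental-variable estimate of the effect on users who enabled
# the feature because of the nudge (the compliers).
late = itt / uptake
print("ITT:", itt, "| Uptake difference:", uptake, "| LATE:", late)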
What if many users who enable location sharing do not respond to surveys or do so differently?
Response bias is a common concern. Users who are unhappy may be more vocal and more likely to complete surveys, skewing results. Adjusting for survey response rates or weighting responses by known user characteristics can help mitigate these biases. Where possible, measuring happiness through indirect metrics (e.g., app engagement, churn, or usage patterns) can complement or validate self-reported surveys.
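A minimal sketch of reweighting responses toward known population shares (a simple post-stratification weight), assuming product analytics tells us what fraction of the user base each group represents; the group labels and shares below are hypothetical:
import numpy as np
import pandas as pd

# Hypothetical respondents: heavy users answer surveys more often than light
# users, so the raw sample over-represents them.
responses = pd.DataFrame({
    "user_type": ["heavy", "heavy", "heavy", "light", "light"],
    "happiness": [6.2, 6.5, 6.0, 7.5, 7.3],
})
population_share = {"heavy": 0.3, "light": 0.7}        # known from analytics
sample_share = responses["user_type"].value_counts(normalize=True)

# Weight each response by (population share / sample share) for its group.
responses["weight"] = responses["user_type"].map(
    lambda t: population_share[t] / sample_share[t]
)
print("Unweighted mean:", responses["happiness"].mean())
print("Weighted mean:  ", np.average(responses["happiness"], weights=responses["weight"]))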
How can we further validate that the feature isn’t just correlated but truly causing dissatisfaction?
Beyond A/B testing and controlling for confounders, one could attempt time-series analyses where we monitor user happiness before and after they adopt the feature. If we see a clear drop right after adoption (and not before), that strengthens the case for causation. However, only a well-designed experiment or a sequence of corroborating studies can confirm cause-effect with high confidence.
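A toy version of that before/after idea, contrasting the change among adopters with the change among a comparable non-adopting group (a difference-in-differences style check); the panel below is entirely hypothetical:
import pandas as pd

# Hypothetical panel: happiness for the same users before and after one cohort
# adopted location sharing. Comparing adopters' change to non-adopters' change
# helps rule out a platform-wide dip unrelated to the feature.
panel = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3, 3, 4, 4],
    "adopted":   [1, 1, 1, 1, 0, 0, 0, 0],
    "period":    ["before", "after"] * 4,
    "happiness": [7.4, 6.8, 7.1, 6.6, 7.3, 7.2, 7.0, 7.1],
})
means = panel.groupby(["adopted", "period"])["happiness"].mean().unstack()
change_adopters = means.loc[1, "after"] - means.loc[1, "before"]
change_others = means.loc[0, "after"] - means.loc[0, "before"]
print("Difference-in-differences estimate:", change_adopters - change_others)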
By addressing these follow-up questions thoroughly, you demonstrate a solid understanding of the potential pitfalls in inferring causation from correlations and the importance of designing proper experiments or analyses to identify the true impact of an optional feature like location sharing.
Below are additional follow-up questions
How might user churn or changes in long-term behavior affect the interpretation of dissatisfaction?
One subtle challenge is that users who strongly dislike the location-sharing feature might abandon the app entirely, leaving behind a pool of users whose average satisfaction could appear artificially high. Over time, this self-selection could make the feature’s impact look less severe because the most dissatisfied individuals have already left. On the other hand, newer users might join already expecting location sharing as standard, thus bringing different baseline expectations. Measuring satisfaction only at a single time point fails to capture this evolving user population. Analyzing longitudinal data to see how satisfaction scores and churn rates change over time is critical. If churn correlates strongly with the introduction or usage of location sharing, there is a stronger indication that the feature may be contributing to overall dissatisfaction.
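As a complement to survey scores, one simple retention check is to compare churn rates between feature users and non-users, for example with a chi-squared test on a contingency table; the counts below are invented for illustration:
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 90-day retention snapshot:
#                      stayed  churned
contingency = np.array([[820,   180],    # enabled location sharing
                        [880,   120]])   # did not enable
chi2, p_value, dof, expected = chi2_contingency(contingency)
print("Churn rate (feature users):", 180 / 1000)
print("Churn rate (non-users):    ", 120 / 1000)
print("Chi-squared p-value:", p_value)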
Could differences in how intensely or frequently users leverage the feature reveal more nuanced insights?
Many features, including location sharing, have gradients of usage rather than a simple on/off switch. Some users might use location sharing only when traveling, others might keep it on permanently, while some might never enable it. Each usage pattern could yield distinct satisfaction outcomes. Those who only enable location sharing occasionally may find it useful in specific scenarios, leading to moderate or even positive app evaluations, while heavy users might experience more performance or privacy concerns. Fine-grained tracking of usage frequency, combined with satisfaction metrics, helps reveal these complexities. Segmenting the user base by usage intensity might uncover subgroups for which the feature is beneficial or detrimental in different ways.
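A quick sketch of bucketing usage intensity and comparing satisfaction per bucket, using simulated data with hypothetical cut points:
import numpy as np
import pandas as pd

# Simulated usage log: hours of location sharing per month and a happiness
# score that declines slightly with heavier usage.
rng = np.random.default_rng(2)
usage_hours = rng.exponential(scale=20, size=1000)
happiness = 7.5 - 0.01 * usage_hours + rng.normal(0, 1, 1000)
df = pd.DataFrame({"usage_hours": usage_hours, "happiness": happiness})

# Bucket users into intensity tiers (cut points are illustrative).
df["intensity"] = pd.cut(df["usage_hours"],
                         bins=[0, 5, 40, np.inf],
                         labels=["occasional", "regular", "heavy"])
print(df.groupby("intensity", observed=True)["happiness"].agg(["mean", "count"]))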
Are there external factors, such as concurrent events or competitor features, that could influence user satisfaction independently?
Users’ perceptions could be shaped by outside influences rather than just the new feature. For instance, if there is a growing privacy-related controversy elsewhere in the tech industry, public sentiment might shift against any form of location-based services, causing a drop in satisfaction unrelated to the actual function of the feature within this particular app. Similarly, a competitor’s recent release of a highly polished location-based tool might raise user expectations. When investigating a drop in satisfaction, it is important to examine external events, industry trends, and competitor actions to ensure that the correlation with a new feature is not coincidental.
Could the design or user interface of the location-sharing feature itself cause friction or dissatisfaction, rather than the concept of location sharing?
Sometimes users might appreciate the core utility of a feature but dislike how it is implemented. For example, the feature’s interface might be confusing, slow, or cluttered with too many permission dialogues. Battery drain or excessive notifications can also irritate users. The dissatisfaction might arise from poor implementation details rather than from location sharing as a concept. Thorough user experience (UX) testing, user feedback sessions, and data on app performance metrics can reveal if user frustration stems from these design and implementation factors. If so, refining the user interface and optimizing performance might resolve the dissatisfaction without removing the feature.
In what ways could user distrust or privacy anxieties amplify negative sentiment independent of actual data usage?
Users may feel uncomfortable if they perceive that the app is collecting more data than necessary, even if the app follows strict privacy guidelines. The “fear of being tracked” can overshadow any practical benefits. This is especially critical if the app’s privacy policy, data retention, or handling is not clearly communicated. Even if no real privacy violations occur, poor communication can trigger suspicion or frustration, causing dissatisfaction to skyrocket for reasons unrelated to actual functionality. Transparent data-handling policies and user education about what is (and isn’t) done with location data could mitigate these anxieties.
How does the method of asking about satisfaction influence the likelihood of detecting a causal relationship?
Survey design and delivery method can bias results. Leading questions about privacy or invasive app behavior can predispose users to respond negatively about the feature. The timing of the survey might also matter—if the survey appears immediately after prompting users to share their location, dissatisfaction might be higher than if the survey appears at a random time. Additionally, if the survey is too long or confusing, fewer users might respond, creating self-selection biases that skew the overall results. Designing surveys with neutral language, clear instructions, and randomization of question order can help ensure that the measured dissatisfaction is more genuinely related to the feature’s experience.
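A small sketch of one of these mitigations, per-respondent randomization of question order; the question wording is illustrative:
import random

# Hypothetical survey items; shuffling per respondent reduces order effects,
# e.g., an early privacy question priming negative answers about the feature.
QUESTIONS = [
    "How satisfied are you with the app overall?",
    "How useful do you find location sharing?",
    "How comfortable are you with how the app handles your data?",
]

def build_survey(respondent_id):
    # Deterministic per-respondent shuffle so each user sees a stable order.
    order = QUESTIONS.copy()
    random.Random(respondent_id).shuffle(order)
    return order

print(build_survey(respondent_id=42))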
Could certain subsets of the user population, such as power users or business travelers, provide misleading overall satisfaction metrics?
Different user groups might use the feature with unique motivations and expectations. Business travelers might rely on location sharing to coordinate with colleagues and thus be more willing to tolerate potential drawbacks if the feature satisfies a business need. Conversely, casual users might enable location sharing experimentally and become annoyed if it does not add immediate value, leading to negative overall impressions. If a disproportionate number of survey respondents come from one user group, the data might wrongly generalize their experience to the entire population. Stratifying or weighting responses based on user profiles—e.g., usage intensity, frequency of travel, type of network usage—can clarify which groups find real benefit and which groups are dissatisfied.
How might partial rollouts or A/B tests fail to fully capture user-level and network-level effects?
When only some users receive the location-sharing feature, there is a risk that the overall ecosystem effect remains hidden. For instance, location sharing could become more valuable if many friends also have it enabled. Alternatively, if the feature creates clutter or notifications that spill over into communication channels, non-participants might also become dissatisfied. A carefully designed A/B test might need to account for network-level behaviors where the sum of participants’ experiences differs from the individual-level results. Randomizing at the cluster or social-group level, rather than purely at the individual level, may be necessary to grasp the broader implications. Without accounting for these network effects, the results might underestimate or overestimate the feature’s impact on satisfaction.
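A minimal sketch of assigning treatment at the cluster level rather than the user level, assuming we already have some grouping of users into social clusters (the cluster assignment here is simulated):
import numpy as np
import pandas as pd

# Hypothetical user table with a social-cluster label (e.g., friend groups or
# geographic communities). The whole cluster gets the same assignment so that
# network spillovers stay within an arm instead of contaminating the control.
rng = np.random.default_rng(7)
users = pd.DataFrame({
    "user_id": range(10_000),
    "cluster_id": rng.integers(0, 500, 10_000),
})

clusters = users["cluster_id"].unique()
treated_clusters = set(rng.choice(clusters, size=len(clusters) // 2, replace=False))
users["treatment"] = users["cluster_id"].isin(treated_clusters).astype(int)

# The analysis should also respect the clustering, e.g., by comparing
# cluster-level mean outcomes or using cluster-robust standard errors.
print(users["treatment"].value_counts())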