ML Interview Q Series: How would you estimate default probability 𝑝 using MLE, given recent loan outcomes and assumptions?
Comprehensive Explanation
Assumptions and Data Generating Process
A common assumption in this scenario is that each loan outcome follows a Bernoulli distribution with default probability p. In other words, each outcome x takes the value 1 (default) with probability p and the value 0 (no default) with probability 1 - p. Another typical assumption is that these observations are independent and identically distributed (i.i.d.).
Likelihood Function and MLE Derivation
Given a set of n Bernoulli trials x1, x2, ..., xn, the likelihood function of the parameter p can be written as:
L(p) = product_{i=1 to n} p^(x_i) * (1 - p)^(1 - x_i) = p^(sum_{i=1 to n} x_i) * (1 - p)^(n - sum_{i=1 to n} x_i)
where sum_{i=1 to n} x_i is the total number of defaults in the dataset, and n - sum_{i=1 to n} x_i is the total count of non-defaults.
To find the maximum likelihood estimate of p, we take the log of the likelihood (the log-likelihood), differentiate it with respect to p, set that derivative equal to zero, and solve for p:
log L(p) = (sum_{i=1 to n} x_i) * log(p) + (n - sum_{i=1 to n} x_i) * log(1 - p)
d/dp log L(p) = (sum_{i=1 to n} x_i) / p - (n - sum_{i=1 to n} x_i) / (1 - p) = 0
Solving gives:
hat_p = (1/n) * sum_{i=1 to n} x_i
Here, sum_{i=1 to n} x_i is the count of defaults, and n is the total number of loans in the dataset.
Applying to the Provided Data
The outcomes for the last 10 loans are: [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]. Counting the 1’s, we see that sum_{i=1 to n} x_i = 3 (three defaults). Since n = 10, we can directly compute the MLE:
hat_p = 3 / 10 = 0.3
Thus, based on the given data and under the Bernoulli i.i.d. assumption, the MLE for p (the probability of default on a new loan) is 0.3.
Python Code Example
import numpy as np

# Outcomes of the last 10 loans: 1 = default, 0 = no default
data = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

# For i.i.d. Bernoulli data, the MLE of p is the sample mean
p_hat = np.mean(data)
print(p_hat)  # Should print 0.3
This straightforward approach uses the sample mean of the Bernoulli outcomes as the MLE of p.
What if the data changes in size or composition?
If additional data are collected (more than 10 loans or different outcomes), the method remains the same: sum the defaults and divide by the total number of loans to estimate p. The MLE framework easily extends to larger datasets without changing the fundamental formula.
How do you handle a situation with zero defaults in the sample?
If all outcomes were 0, the MLE would be sum_{i=1 to n} x_i / n = 0. From a purely frequentist MLE perspective, this would suggest p = 0. However, in practical terms, we might impose a prior (Bayesian approach) or implement smoothing to avoid concluding exactly p=0, because we often expect the chance of default to be strictly greater than zero in the real world.
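As a small sketch of this idea, assume a Beta(alpha, beta) prior used purely for illustration (the values below are not from the question); the posterior mean gives a smoothed estimate that stays strictly between 0 and 1 even when no defaults are observed:

# Sketch: smoothing the estimate with a Beta prior (Laplace-style smoothing).
# alpha and beta are illustrative prior pseudo-counts, not values from the text.
data = [0] * 10          # hypothetical sample with zero observed defaults
alpha, beta = 1.0, 1.0   # Beta(1, 1) prior = add-one (Laplace) smoothing
defaults = sum(data)
n = len(data)

p_mle = defaults / n                                   # plain MLE -> 0.0
p_smoothed = (defaults + alpha) / (n + alpha + beta)   # posterior mean -> 1/12 ≈ 0.083

print(p_mle, p_smoothed)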
How would you incorporate explanatory variables (e.g., loan features)?
When you have additional features about each loan (credit score, income, etc.), a common approach is to model p using a logistic regression or another classification method. In logistic regression, p would be modeled as:
p = 1 / (1 + exp(-(beta0 + beta1*x1 + ... + betak*xk)))
where betas are parameters learned from the data via maximum likelihood methods extended to this generalized linear model setting.
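A minimal sketch of this approach with scikit-learn is shown below; the feature interpretation (e.g., standardized credit score and income) and the simulated data are assumptions made for illustration only:

# Sketch: estimating default probability as a function of loan features
# with logistic regression. Features and data here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))          # columns: e.g., standardized credit_score, income
true_logits = -1.5 - 1.0 * X[:, 0] + 0.5 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-true_logits)))   # simulated default indicators

model = LogisticRegression()
model.fit(X, y)

# Predicted probability of default for a new loan with given (standardized) features
new_loan = np.array([[0.2, -0.5]])
print(model.predict_proba(new_loan)[:, 1])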
Potential Edge Cases and Real-World Considerations
In practice, not all loan outcomes may be strictly independent—for instance, macroeconomic conditions affect multiple loans at once. Violations of independence assumptions can create biases in the MLE. Additionally, the dataset might be imbalanced, especially if defaults are rare. In such cases, logistic regression or other models with regularization or Bayesian priors often provide more robust estimates. Also, data may be censored: some loans might still be active, with no final outcome (default or fully paid). Handling such data generally requires survival analysis or other specialized methods.
How would you verify or validate this model?
Typically, you would split historical data into training and test sets, fit the model on the training set, and measure its predictive performance on the test set (e.g., using log loss or AUC). Model calibration checks are also important—comparing predicted probabilities vs. actual default rates in grouped data can reveal if the model tends to overestimate or underestimate risk.
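A short sketch of such a validation loop follows, using synthetic data and default scikit-learn settings purely as an assumption; in practice X and y would come from historical loans:

# Sketch: train/test split plus log loss and AUC on held-out loans.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + X[:, 0]))))   # simulated outcomes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

print("log loss:", log_loss(y_test, probs))
print("AUC:", roc_auc_score(y_test, probs))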
How does MLE compare to MAP estimation in this context?
Maximum a Posteriori (MAP) estimation incorporates a prior distribution on p. For example, if you have a Beta(alpha, beta) prior, the posterior distribution for p after observing Bernoulli data is Beta(alpha + sum(x_i), beta + n - sum(x_i)), and the MAP estimator (the posterior mode) is (alpha + sum(x_i) - 1) / (alpha + beta + n - 2), provided both posterior parameters exceed 1 so that the mode lies in the interior of (0, 1). This approach is particularly helpful when the dataset is small or there is external knowledge (a prior) about the plausible range of default rates.
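As a small numeric sketch of this formula on the 10-loan sample, with an illustrative Beta(2, 8) prior (the prior parameters are an assumption, not given in the question):

# Sketch: MAP estimate for p under a Beta(alpha, beta) prior.
data = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
alpha, beta = 2.0, 8.0        # illustrative prior, roughly centered on 0.2
k = sum(data)                 # observed defaults
n = len(data)

p_mle = k / n
p_map = (alpha + k - 1) / (alpha + beta + n - 2)   # mode of Beta(alpha + k, beta + n - k)

print(p_mle, p_map)   # 0.3 vs (2 + 3 - 1) / (2 + 8 + 10 - 2) = 4/18 ≈ 0.222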
These considerations highlight how crucial it is to understand the assumptions behind maximum likelihood approaches and the ways in which real-world conditions can affect or modify these methods.
Below are additional follow-up questions
How would you address shifting market conditions that might invalidate the i.i.d. assumption?
A common pitfall is assuming that the probability of default (p) remains stable over time. In reality, economic cycles, interest rate changes, and regulatory shifts can cause p to vary. If the i.i.d. assumption no longer holds, the MLE estimate might be overly simplistic. One approach to handle time-varying conditions is to incorporate a time-series model or use a rolling window to update estimates of p. In practice, you could apply models such as state-space models or time-series regression where p can shift as macroeconomic indicators change. Another strategy is to apply online learning techniques that update parameter estimates continually as new loans and their outcomes arrive. A subtle issue here is deciding how quickly to forget old data since too short a window may lose valuable historical patterns, while too long a window might dilute more recent information.
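A minimal sketch of a rolling-window estimate is below; the window length and the outcome sequence are arbitrary assumptions chosen only to show the mechanics:

# Sketch: rolling-window estimate of p to track a drifting default rate.
import numpy as np

outcomes = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # ordered in time
window = 5

rolling_p = [outcomes[max(0, t - window + 1): t + 1].mean()
             for t in range(len(outcomes))]
print(np.round(rolling_p, 2))   # later windows reflect the rising default rate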
What if certain loans have partial outcomes or are restructured rather than clearly defaulting or not defaulting?
In many real-world scenarios, loans can be partially repaid, restructured, or settled. This complicates a straightforward Bernoulli framework, where the only outcomes are 0 or 1. One way to manage this scenario is to define a new target variable that captures different degrees of default severity. You might adopt a multi-state model in which each state indicates a different loan outcome stage (e.g., no distress, 30-days past due, 60-days past due, restructured, etc.). Then, a multinomial model (or a survival analysis approach) could be used. Pitfalls arise when mapping complex restructuring statuses to a simplistic binary outcome, risking loss of nuanced data. A potential edge case is that some restructured loans could later move to full default, so timing of observation and final outcome definitions are critical.
How do you construct confidence intervals or uncertainty estimates around your MLE for p?
Once you compute p-hat = sum(x_i)/n, you often need to express uncertainty about that estimate. One standard approach is to use the asymptotic normality of the MLE for Bernoulli parameters. Under large-n assumptions, p-hat is approximately normal with mean p and variance p*(1-p)/n. A simple approximate 95% confidence interval might be:
p-hat ± 1.96 * sqrt(p-hat * (1 - p-hat) / n)
However, potential pitfalls include small sample sizes, where a normal approximation is not accurate. In such cases, one may use exact methods (e.g., Clopper-Pearson intervals) or apply a Bayesian approach with a Beta prior. Another subtlety occurs when p-hat=0 or p-hat=1, which breaks a naive asymptotic approach and calls for alternative interval constructions.
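A sketch comparing the normal-approximation interval with an exact Clopper-Pearson interval built from Beta quantiles, reusing the 10-loan sample and assuming scipy is available:

# Sketch: 95% intervals for p-hat = 3/10 via normal approximation and Clopper-Pearson.
import numpy as np
from scipy.stats import beta

k, n = 3, 10
p_hat = k / n

# Normal (Wald) approximation
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Exact Clopper-Pearson interval from Beta quantiles
lower = beta.ppf(0.025, k, n - k + 1) if k > 0 else 0.0
upper = beta.ppf(0.975, k + 1, n - k) if k < n else 1.0

print("Wald:", wald)
print("Clopper-Pearson:", (lower, upper))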
How might you extend the model if default probabilities differ significantly across different subpopulations?
When you have heterogeneous borrower populations (e.g., different credit scores, industries, or regions), a single p may be insufficient. You could stratify the data by meaningful categories and compute subgroup-specific p-hat. Alternatively, you might use logistic regression to estimate p as a function of multiple borrower features. Even in the MLE context of a purely Bernoulli model, you can segment the dataset and produce separate maximum likelihood estimates for each segment. A pitfall is that having many segments with limited data in each can lead to unreliable estimates. Ensuring each segment has enough observations is crucial. Another subtlety is dealing with overlapping segments or new types of borrowers not represented in the historical data.
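A brief sketch of segment-level MLEs with pandas; the "region" labels and outcomes are invented for illustration:

# Sketch: per-segment MLEs via a simple group-by.
import pandas as pd

df = pd.DataFrame({
    "region":  ["north", "north", "south", "south", "south", "east", "east", "east"],
    "default": [0,        1,       0,       0,       1,       0,      0,      0],
})

segment_p = df.groupby("region")["default"].agg(["mean", "count"])
print(segment_p)   # "mean" is the per-segment MLE; "count" flags thin segments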
How would you handle missing or incomplete loan outcome data?
Sometimes you won’t have the final status of all loans. For instance, some loans are still active, and their eventual outcome (default or no default) is unknown. Simply omitting these loans can introduce bias if the set of known outcomes is not representative. One approach is to incorporate survival analysis, which models time-to-event (in this case, time-to-default) instead of a single snapshot. Alternatively, imputation methods or ignoring partial data under the assumption of Missing Completely at Random (MCAR) can be used, though that assumption is often violated. The major pitfall is that active loans with no final outcome may be systematically different from older loans. A subtle real-world issue is that newer loans might be higher risk or might have come from different underwriting criteria, so ignoring them until they either mature or default can bias p-hat.
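One common way to let censored (still-active) loans contribute information is a Kaplan-Meier fit on time-to-default; the sketch below assumes the lifelines package is available, and the durations and event flags are invented placeholders:

# Sketch: Kaplan-Meier estimate of survival (non-default) over loan age,
# which lets still-active (censored) loans contribute partial information.
from lifelines import KaplanMeierFitter

durations = [12, 24, 6, 36, 18, 9, 30, 15]      # months observed so far
events    = [1,  0,  1, 0,  0,  1, 0,  0]       # 1 = defaulted, 0 = still active (censored)

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events)

# Estimated probability of defaulting within 24 months = 1 - S(24)
print(1 - kmf.survival_function_at_times(24).iloc[0])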
How do you adjust for potential correlation among loans (e.g., multiple loans from the same borrower or from the same region)?
One key assumption in a basic Bernoulli MLE approach is independence. But if multiple loans originate from the same borrower or region, their defaults might be correlated. This correlation invalidates simple Bernoulli independence assumptions, and the variance of p-hat might be underestimated. You could address this by clustering standard errors when estimating p or by applying hierarchical models. For example, a hierarchical (multilevel) model can learn variation at the group level (borrower, branch, region) and also account for the overall default rate. A subtlety is distinguishing between short-term correlation (e.g., shared macroeconomics) and structural correlation (e.g., repeated loans to the same borrower). In large datasets, ignoring these correlations might artificially inflate confidence in the estimate.
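A sketch of a cluster bootstrap for the standard error of p-hat, resampling whole borrower groups rather than individual loans; the cluster labels and outcomes are invented:

# Sketch: cluster bootstrap standard error for p-hat when loans within a
# borrower/region cluster may be correlated.
import numpy as np

rng = np.random.default_rng(42)
clusters = {                      # borrower_id -> that borrower's loan outcomes
    "b1": [0, 0, 1],
    "b2": [1, 1],
    "b3": [0, 0, 0, 0],
    "b4": [0, 1],
}
keys = list(clusters)

boot_estimates = []
for _ in range(2000):
    sampled = rng.choice(keys, size=len(keys), replace=True)   # resample whole clusters
    outcomes = [x for k in sampled for x in clusters[k]]
    boot_estimates.append(np.mean(outcomes))

print("p-hat:", np.mean([x for v in clusters.values() for x in v]))
print("cluster-bootstrap SE:", np.std(boot_estimates))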
How do you incorporate domain knowledge or expert judgment into the MLE framework?
A strictly frequentist MLE approach uses only the observed data. However, in practice, domain experts may have strong beliefs about typical default rates or early signals. One way to formalize this knowledge is via a Bayesian framework, placing a prior distribution over p. For instance, if experts believe p is typically around 0.05 with some uncertainty, you might use a Beta(alpha, beta) prior that centers around 0.05. In the Bernoulli case, the posterior remains a Beta distribution, and you can derive a more informed estimate (MAP). A subtle pitfall is that strong priors might dominate the data, especially in small-sample settings, potentially overshadowing real changes in default behavior. Balancing domain knowledge with empirical observations remains an important design choice.
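As a small numeric sketch of that trade-off (the prior strengths below are illustrative assumptions), the posterior mean under a weak prior moves toward the data, while a strong prior centered on 0.05 barely moves:

# Sketch: how prior strength affects the Beta-Bernoulli posterior mean.
# Both priors are centered near 0.05; their pseudo-count strength differs.
data = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
k, n = sum(data), len(data)

for alpha, beta in [(1, 19), (50, 950)]:          # weak vs strong prior, mean = 0.05
    post_mean = (alpha + k) / (alpha + beta + n)
    print(f"Beta({alpha},{beta}) prior -> posterior mean = {post_mean:.3f}")
# Weak prior: (1+3)/(1+19+10) ≈ 0.133; strong prior: (50+3)/(50+950+10) ≈ 0.052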
How might outliers or mislabeled outcomes in the historical data affect the MLE estimate?
When data contain errors—such as misclassification of a loan’s true default status—the MLE p-hat may be skewed. For instance, if some defaults were mislabeled as non-defaults, you would underestimate the true default probability. In a Bernoulli context, there is no built-in mechanism to discount or adjust for label noise. One remedy is to perform data cleaning or quality checks to reduce the chance of label errors. Another approach might be to model label noise explicitly by assuming a certain probability that any given label is flipped. This, however, increases the complexity of the likelihood function. A subtle consideration is whether label errors are random or systematically related to certain borrower characteristics, in which case ignoring them can cause more severe biases.
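A tiny sketch of the explicit noise-model idea: if the flip rates were known (the values below are assumptions, and such rates are rarely known in practice), the observed positive rate could be inverted to recover the underlying default rate:

# Sketch: correcting p-hat for known label-flip rates.
# fn = P(default labeled as non-default), fp = P(non-default labeled as default).
q_observed = 0.25   # observed fraction labeled "default" (illustrative)
fn, fp = 0.10, 0.02

# Observed rate: q = p*(1 - fn) + (1 - p)*fp  =>  solve for p
p_corrected = (q_observed - fp) / (1 - fn - fp)
print(round(p_corrected, 3))   # ≈ 0.261, slightly above the naive 0.25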
How do you handle performance metrics beyond accuracy when validating the model’s predicted probabilities?
MLE for p focuses on fitting the probability of default. In practical settings, you often care about how well the model separates defaulters from non-defaulters and how well calibrated those probabilities are. Standard evaluation metrics include AUC for discrimination and Brier Score or log loss for calibration. A pitfall is relying solely on accuracy in a setting with class imbalance: if the majority of loans do not default, a naive predictor of p=0 might show high accuracy but no real value in identifying risk. Another subtlety is that regulators or internal policies may require certain risk thresholds, so interpretability and alignment with business objectives (like maximizing expected profit or minimizing risk) can be more crucial than standard metrics alone.
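A short sketch of why accuracy can mislead under class imbalance, comparing a "never default" baseline against a better-separated probability model using the Brier score; the data and the informed model's probabilities are synthetic assumptions:

# Sketch: accuracy vs Brier score on an imbalanced default dataset.
# A trivial "never default" model looks great on accuracy but adds no risk signal.
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss

rng = np.random.default_rng(7)
y = rng.binomial(1, 0.05, size=10_000)            # ~5% default rate (synthetic)

naive_probs = np.zeros_like(y, dtype=float)       # always predict p = 0
informed_probs = np.clip(0.05 + 0.8 * y + rng.normal(0, 0.05, size=y.size), 0, 1)

print("naive accuracy:", accuracy_score(y, naive_probs > 0.5))   # ~0.95, yet useless
print("naive Brier:", brier_score_loss(y, naive_probs))          # ~0.05
print("informed Brier:", brier_score_loss(y, informed_probs))    # clearly lower

The point of the comparison is that the Brier score penalizes the naive model for assigning zero probability to every eventual default, even though its raw accuracy looks impressive.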