ML Interview Q Series: Assessing Fake Review Flags: Applying Bayes' Theorem with Imbalanced Data
We have historical data indicating that only 2% of reviews are fraudulent and 98% are genuine. When a review is actually fake, our ML model flags it as fake 95% of the time. When a review is truly legitimate, the model correctly labels it as genuine 90% of the time. If the model predicts a certain review to be fake, what is the probability it is indeed fake?
Comprehensive Explanation
A common approach for such problems is Bayesian inference, specifically Bayes’ Theorem. In plain-text form, Bayes’ Theorem for a review being fake given that the model detected it as fake can be written as:
P(Fake|DetectedFake) = [P(DetectedFake|Fake) * P(Fake)] / [P(DetectedFake|Fake) * P(Fake) + P(DetectedFake|Legitimate) * P(Legitimate)]
Here is what each term means:
P(Fake) is the prior probability that any given review is fake, which is 0.02.
P(Legitimate) is the prior probability that a review is legitimate, which is 0.98.
P(DetectedFake|Fake) is the probability that the model flags a review as fake given it is actually fake, which is 0.95.
P(DetectedFake|Legitimate) is the probability that the model flags a review as fake given it is actually legitimate. Because the model is correct 90% of the time for legitimate reviews, it is incorrect 10% of the time; so this value is 0.10.
Substituting these into the formula:
P(DetectedFake|Fake) * P(Fake) = 0.95 * 0.02 = 0.019
P(DetectedFake|Legitimate) * P(Legitimate) = 0.10 * 0.98 = 0.098
Hence:
P(Fake|DetectedFake) = 0.019 / (0.019 + 0.098) = 0.019 / 0.117 ≈ 0.162 (16.2%)
So when the model flags a review as fake, there is only around a 16.2% chance it is truly fraudulent. This highlights the effect of having a low prior probability for fake reviews—most flagged cases turn out to be legitimate despite the seemingly high accuracy for detecting fake reviews.
Below is a quick Python snippet to demonstrate how one could compute this probability programmatically:
prior_fake = 0.02
prior_legit = 0.98
prob_detected_if_fake = 0.95
prob_detected_if_legit = 0.10 # Probability of labeling a legit review as fake
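# Bayes' theorem: P(Fake | DetectedFake)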
posterior = (prob_detected_if_fake * prior_fake) / (
(prob_detected_if_fake * prior_fake) + (prob_detected_if_legit * prior_legit)
)
print(f"Probability that a flagged review is actually fake: {posterior * 100:.2f}%")
Why the Probability is Not as High as Expected
Many find it unintuitive that even with a strong detection probability (95%), the posterior probability that a flagged review is indeed fake is relatively low (about 16.2%). The key driver is the imbalance in the prior distribution: only 2% of all reviews are fake. Even if your model is highly accurate at identifying fakes, the large proportion of legitimate reviews can still lead to a significant proportion of false alarms.
Potential Follow-Up Questions
How does this relate to confusion matrix metrics, such as precision and recall?
Precision (for fake detection) corresponds to the fraction of detected fake reviews that are actually fake. Here, we computed that as about 16.2%. This is precisely the Bayesian posterior probability. Recall (for fake detection) would represent the fraction of truly fake reviews that are flagged as fake (which is 95%). Even with a high recall, precision can be low if the prior probability of the positive class (fake reviews) is very low and if there is a non-negligible false-positive rate.
When we translate these probabilities into a confusion matrix (a short snippet computing precision and recall from these cells follows the list):
True Positives: 95% of the 2% that are fake.
False Positives: 10% of the 98% that are legitimate.
False Negatives: 5% of the 2% that are fake.
True Negatives: 90% of the 98% that are legitimate.
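Putting those four cells together, precision and recall fall out directly. Below is a minimal sketch in the same style as the snippet above, with each cell expressed as a fraction of all reviews:
prior_fake, prior_legit = 0.02, 0.98
tp = 0.95 * prior_fake   # truly fake and flagged as fake
fp = 0.10 * prior_legit  # legitimate but flagged as fake
fn = 0.05 * prior_fake   # truly fake but missed
tn = 0.90 * prior_legit  # legitimate and labeled genuine
precision = tp / (tp + fp)  # ~0.162, matches the Bayesian posterior above
recall = tp / (tp + fn)     # 0.95 by construction
print(f"precision = {precision:.3f}, recall = {recall:.3f}")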
Why does class imbalance matter so much?
Class imbalance (98% legitimate vs. 2% fake) skews the results significantly. Even a small error rate on the dominant class (legitimate reviews) produces a large number of false positives, which dilutes the fraction of flagged reviews that are truly fake (the precision). In real-world scenarios with highly imbalanced data, techniques like oversampling, undersampling, or using alternative metrics (for example, precision-recall curves instead of ROC curves) can be helpful.
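As a rough, hedged illustration of the resampling idea (not a full pipeline), one could oversample the minority class with NumPy before training; X and y below are synthetic placeholders, not real review data:
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: X is a feature matrix, y has 1 for fake, 0 for legitimate.
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.02).astype(int)

fake_idx = np.where(y == 1)[0]
legit_idx = np.where(y == 0)[0]

# Sample the fake class with replacement until it matches the legitimate class size.
oversampled_fake_idx = rng.choice(fake_idx, size=len(legit_idx), replace=True)
balanced_idx = np.concatenate([legit_idx, oversampled_fake_idx])

X_balanced, y_balanced = X[balanced_idx], y[balanced_idx]
print(y_balanced.mean())  # roughly 0.5 after oversampling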
Are there ways to improve the probability that flagged reviews are truly fake?
There are several strategies:
Adjust the decision threshold: Instead of using the default probability cut-off, raise it to reduce false positives. This can increase precision at the cost of recall (a minimal sketch follows this list).
Collect more representative data: A better dataset with more examples of fake reviews may help the model distinguish fake from legitimate more effectively.
Incorporate additional features: Beyond textual or rating features, you could incorporate user-behavior metrics, IP addresses, or timing patterns to give more signals about whether a review is fake.
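Below is a minimal sketch of the threshold adjustment mentioned in the list above, using synthetic labels and scores as stand-ins for a real model's outputs; the exact numbers are illustrative only:
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins: true labels (2% fake) and imperfect model scores.
y_true = (rng.random(20000) < 0.02).astype(int)
scores = np.clip(0.7 * y_true + rng.normal(0.15, 0.2, size=y_true.size), 0, 1)

for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")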
In a production scenario, how would we handle the trade-off between false positives and false negatives?
This depends on the cost of each error type:
False Positive: A legitimate user’s review being flagged as fake. This might harm user trust.
False Negative: A fake review being allowed to remain labeled as legitimate. This might harm product credibility or lead to manipulations of product ratings.
Often, these trade-offs are managed via a custom threshold on the model’s score. In some cases, a separate cost-sensitive model is trained to incorporate the business impact directly into the loss function.
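As a hedged sketch of how such a custom threshold can be derived, standard decision theory says to flag a review when its (well-calibrated) predicted probability of being fake exceeds cost_fp / (cost_fp + cost_fn); the costs below are made-up numbers:
cost_fp = 1.0   # assumed cost of flagging a legitimate review
cost_fn = 10.0  # assumed cost of letting a fake review through
# Flag when the expected cost of not flagging (cost_fn * p) exceeds that of flagging (cost_fp * (1 - p)).
threshold = cost_fp / (cost_fp + cost_fn)
print(f"flag reviews whose predicted fake probability exceeds {threshold:.3f}")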
How would one handle limited labeled data for fake reviews?
If labeled data for the fake class is scarce, possible approaches include:
Data augmentation: Simulating or synthetically generating fake reviews.
Semi-supervised learning: Using a large corpus of unlabeled data and a smaller labeled dataset to bootstrap classification.
Anomaly detection methods: Treating fakes as outliers or anomalies (see the sketch after this list).
Active learning: Iteratively labeling uncertain predictions to expand the training dataset.
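As one hedged sketch of the anomaly-detection route, scikit-learn's IsolationForest can be fit on per-review features and used to score how anomalous each review looks; the feature matrix here is a synthetic placeholder:
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# Hypothetical numeric features per review (e.g., length, rating deviation, account age).
X = rng.normal(size=(5000, 4))

# contamination is a rough guess at the fake-review rate, not a learned quantity.
detector = IsolationForest(contamination=0.02, random_state=0)
detector.fit(X)
labels = detector.predict(X)            # -1 = flagged as anomalous, 1 = looks normal
scores = detector.decision_function(X)  # lower scores mean more anomalous
print(f"flagged {np.sum(labels == -1)} of {len(X)} reviews as anomalous")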
Would including user-behavior data help?
Yes. Models that only focus on the text can miss critical signals like:
Sudden spikes in review volume from an account.
Repetitive rating behavior or identical text repeated multiple times.
Time-of-day patterns or suspicious IP addresses.
Combining user-level features with text-based signals often significantly improves the detection of fraudulent reviews.
Does Bayes’ Theorem still hold if new information contradicts the prior?
Bayes’ Theorem always holds, as it is a fundamental mathematical rule for updating probabilities given new evidence. If new data reveals that the true percentage of fake reviews differs from the original assumption, simply update P(Fake) to reflect the new knowledge. Over time, you might recalculate your prior probabilities and refine your model’s predictions accordingly.
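For instance, if monitoring later suggested that roughly 5% of reviews are fake (a made-up figure), the posterior could be recomputed with the same likelihoods and the new prior:
prob_detected_if_fake = 0.95
prob_detected_if_legit = 0.10

def posterior_fake(prior_fake):
    prior_legit = 1.0 - prior_fake
    numerator = prob_detected_if_fake * prior_fake
    return numerator / (numerator + prob_detected_if_legit * prior_legit)

print(posterior_fake(0.02))  # ~0.162 with the original prior
print(posterior_fake(0.05))  # ~0.333 with the hypothetical updated prior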
These clarifications and follow-up questions are often used to gauge not just your ability to apply Bayes’ Theorem to a straightforward problem, but also to test your real-world practical knowledge of how to handle imbalance, interpret confusion matrix metrics, and manage production constraints.
Below are additional follow-up questions
What if the model’s cost of misclassifying a fake review is drastically higher than misclassifying a legitimate review?
When the cost of letting a fake review pass as legitimate is significantly more detrimental than flagging a genuine review as fake, the model design might shift to prioritize recall for detecting fakes. One pitfall is that you can inadvertently increase false positives, annoying legitimate users who find their genuine reviews flagged. In cost-sensitive settings, you can assign higher loss penalties to false negatives (fake reviews missed) than to false positives. Methods to handle this include:
Using weighted loss functions, where instances belonging to the fake class carry a higher penalty when misclassified (a minimal sketch follows this answer).
Adjusting the decision threshold on predicted probabilities to minimize the overall “cost” rather than just optimizing for accuracy or F1.
Periodically re-evaluating the monetary or reputational consequences of false positives vs. false negatives to ensure these trade-offs reflect real-world business needs.
Subtle issues include correctly quantifying these costs, which can be intangible or vary over time. If you overestimate the cost of missing a fake review, you may end up with an overly sensitive detector that overwhelms your manual review process. Conversely, underestimating the cost might open the door for harmful fake reviews to slip through.
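One hedged way to encode such asymmetric penalties is a class-weighted loss; the sketch below uses scikit-learn's LogisticRegression on synthetic data, and the 20x weight is illustrative rather than tuned:
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
# Synthetic stand-in data: 2% fake (label 1), simple numeric features with a slight signal.
X = rng.normal(size=(5000, 4))
y = (rng.random(5000) < 0.02).astype(int)
X[y == 1] += 1.0

# Penalize misclassified fakes 20x more than misclassified legitimate reviews.
model = LogisticRegression(class_weight={0: 1, 1: 20}, max_iter=1000)
model.fit(X, y)
print(f"fraction of reviews flagged as fake: {model.predict(X).mean():.3f}")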
Could model calibration help address the mismatch between actual fake prevalence and model outputs?
Model calibration techniques adjust the predicted probabilities so that they reflect real-world frequencies. For example, if the model outputs a “0.80 chance of being fake,” a well-calibrated model means that roughly 80% of instances with similar scores are truly fake. Common methods (a brief sketch of both follows below):
Platt scaling (training a logistic regression on model scores vs. ground-truth labels).
Isotonic regression (a non-parametric approach that can better fit more complex calibration curves).
A major pitfall is that calibration depends heavily on representative validation data. If your validation set does not match the true distribution of legitimate vs. fake reviews or changes over time (concept drift), the calibration may become outdated quickly. In heavily imbalanced data, calibration can still be challenging, as small errors in the tail distribution can lead to large swings in predicted probability.
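A minimal sketch of both methods with scikit-learn's CalibratedClassifierCV, on synthetic data with a random forest as the base model (whose scores are often not well calibrated); settings are illustrative, not tuned:
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
# Synthetic stand-in data with a 2% positive (fake) class.
X = rng.normal(size=(10000, 4))
y = (rng.random(10000) < 0.02).astype(int)
X[y == 1] += 1.0

# Platt scaling fits a sigmoid on the base model's scores; isotonic regression is non-parametric.
platt = CalibratedClassifierCV(RandomForestClassifier(n_estimators=100, random_state=0), method="sigmoid", cv=3)
isotonic = CalibratedClassifierCV(RandomForestClassifier(n_estimators=100, random_state=0), method="isotonic", cv=3)
platt.fit(X, y)
isotonic.fit(X, y)
print(platt.predict_proba(X[:5])[:, 1])
print(isotonic.predict_proba(X[:5])[:, 1])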
How do we handle sudden changes in user behavior or new attack vectors that create concept drift?
Concept drift occurs when the data distribution shifts over time. For instance, scammers might adopt new phrasing or account-creation strategies to bypass detection. In dealing with drift:
Incremental or online learning approaches can update the model with recent examples, keeping the model relevant to evolving tactics.
Periodic re-training with rolling windows of data allows the model to adapt. However, if the window is too short, the model might forget older patterns that are still relevant; if too long, it might not adapt fast enough to new threats.
Out-of-distribution detection and anomaly detection can alert the team to abrupt changes.
Edge cases include major promotions or holidays that cause legitimate but unusual spikes in activity. The model could incorrectly interpret these genuine surges as suspicious. Balancing daily or weekly incremental updates with stable offline re-training strategies is an ongoing challenge.
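A hedged sketch of incremental updating with scikit-learn's SGDClassifier and partial_fit, where each synthetic mini-batch stands in for a day's worth of newly labeled reviews (the loss name assumes scikit-learn 1.1 or later):
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(6)
# loss="log_loss" gives a logistic-regression-style linear model trained by SGD.
model = SGDClassifier(loss="log_loss", random_state=0)
classes = np.array([0, 1])  # all classes must be declared on the first partial_fit call

for day in range(10):
    # Each batch stands in for newly labeled reviews arriving that day.
    X_batch = rng.normal(size=(500, 4))
    y_batch = (rng.random(500) < 0.02).astype(int)
    X_batch[y_batch == 1] += 1.0
    model.partial_fit(X_batch, y_batch, classes=classes)

print(model.coef_)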
Are there specific evaluation pitfalls when dealing with highly imbalanced data beyond accuracy, precision, and recall?
Yes. A few pitfalls and alternative metrics to consider:
Accuracy becomes misleading when the majority class dominates, as a trivial classifier predicting everything as legitimate can achieve very high accuracy.
The ROC curve can also be overly optimistic in highly skewed datasets, because true negatives are so prevalent. A tiny false positive rate can still mean a large absolute number of false positives.
Precision-Recall (PR) curves are often a better indicator of performance in imbalanced settings. A high area under the PR curve suggests the model handles the rare class well across different thresholds (a short comparison sketch follows this answer).
Confusion matrix breakdown by different segments (time, user demographics, or device type) can reveal pockets of the data where performance might degrade.
One subtlety is that you might need separate strategies for different segments if certain user groups or geographies exhibit different patterns of legit vs. fake reviews. Evaluating the model globally may hide poor performance in niche areas.
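To make the ROC-versus-PR point concrete, here is a hedged sketch comparing ROC AUC with average precision (a summary of the PR curve) on synthetic, heavily imbalanced scores; exact values will vary:
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(7)
# Synthetic scores: 2% positives (fake), modest separation between the classes.
y_true = (rng.random(50000) < 0.02).astype(int)
scores = rng.normal(size=y_true.size) + 1.5 * y_true

print(f"ROC AUC:           {roc_auc_score(y_true, scores):.3f}")
print(f"Average precision: {average_precision_score(y_true, scores):.3f}")
# ROC AUC tends to look flattering here, while average precision reflects how hard
# it is to keep precision high on the rare fake class.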
How can the prior probability of fake reviews shift rapidly (for example, due to a targeted attack) and how should we adapt?
In events like a coordinated “review bombing,” the percentage of fake reviews might temporarily spike above 2%. If your model was trained on a 2% prior, it might become too conservative. Ways to adapt:
Dynamically adjust the prior as soon as you detect anomalies in incoming data, possibly through online Bayesian updating or using a monitoring system that flags unusual changes in distribution.
Provide a manual override or alert system for suspicious spikes, so human reviewers can investigate and re-label suspicious cases in near real-time.
Retrain or fine-tune the model frequently when a sudden shift is detected, ensuring that newly labeled data reflects the updated distribution.
A major pitfall here is overreacting to short-lived events. You might end up reconfiguring your entire pipeline for a temporary surge that recedes quickly, which can make the model unstable and cause confusion among end-users. Balancing short-term data patterns vs. long-term stable prior estimates is key.
What if malicious actors adapt to detection patterns by writing more sophisticated or carefully disguised fake reviews?
Sophisticated adversaries may use techniques like:
Varying their language models to mimic real human writing more convincingly.
Moderating their review frequency or rating patterns to appear “normal.”
Using compromised accounts with established reputations.
Mitigation requires:
Adversarial machine learning research to identify newly emerging patterns.
Ensemble approaches that combine content-based signals with behavioral or network-based signals. Even if the text reads as genuine, unusual timing or suspicious connection patterns might be detected.
Continuous updates of the model with newly discovered fake samples to keep pace with adversaries.
An edge case is when attackers insert a low-level “smear” campaign that stays under the radar but accumulates subtle influence over time. If the volume is small, it might not trigger typical anomaly flags, requiring deeper insight or correlation with external data to catch slow-acting, stealthy campaigns.
Can an online or streaming learning approach help maintain model accuracy?
Yes, especially in a system where reviews are posted continuously. Online learning offers:
The ability to update model parameters incrementally as new labeled examples arrive.
The opportunity to adapt quickly to distribution shifts (like concept drift or spikes of suspicious activity).
Reduced need for frequent full retrains, which can be computationally expensive.
However, pitfalls include:
Potential to overfit rapidly if the data distribution is noisy or the labeling process is error-prone.
Forgetting older patterns that might still be relevant if the window size for updates is too narrow.
Complexity in engineering an online or streaming pipeline that can handle a large-scale, real-time data flow.
How does external or domain-specific knowledge factor into improving detection?
Incorporating domain knowledge often yields a stronger model by adding features that purely content-based models might miss. Examples:
Verification of user identity or linking reviews to verified purchases.
Using graphs or social network analysis to identify suspicious review clusters or spam rings.
Checking metadata such as the ratio of five-star vs. one-star reviews per reviewer, or how many reviews they post in a short timespan.
A subtlety is the privacy vs. performance trade-off. Some potentially useful data—like user location or social media accounts—may be restricted or protected, which complicates building a robust feature set. Another pitfall is that domain knowledge can become obsolete if the market or user behavior changes, so you have to regularly revisit and validate your domain-based assumptions.
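As a small hedged sketch of turning such domain signals into model features, per-reviewer aggregates can be computed with pandas and broadcast back onto each review; the column names below are hypothetical:
import pandas as pd

# Hypothetical review log; in practice this would come from the review database.
reviews = pd.DataFrame({
    "reviewer_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "rating": [5, 5, 3, 1, 1, 1],
    "verified_purchase": [False, False, True, False, False, False],
})

# Per-reviewer behavioral aggregates, broadcast back to each review row.
reviews["reviews_by_user"] = reviews.groupby("reviewer_id")["rating"].transform("count")
reviews["user_mean_rating"] = reviews.groupby("reviewer_id")["rating"].transform("mean")
reviews["user_verified_share"] = reviews.groupby("reviewer_id")["verified_purchase"].transform("mean")
print(reviews)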