ML Interview Q Series: Validating Probabilities: Fixing Incoherent Odds Using Axioms and Softmax Activation.
Question
You consult Joe the bookie as to the form in the 2.30 at Ayr. He tells you that, of 16 runners, the favourite has probability 0.3 of winning, two other horses each have probability 0.20 of winning, and the remainder each have probability 0.05 of winning, excepting Desert Pansy, which has a worse than no chance of winning. What do you think of Joe’s advice?
Short Compact Solution
Assume there are 16 possible winners, one for each horse. Joe's stated probabilities sum to 1.3 rather than 1.0, which makes them incoherent, however tempting they might look from the bettor's standpoint. Furthermore, Desert Pansy cannot truly have a "worse than no chance" of winning, and any unmentioned horse (for instance "Dobbin") should also have a nonzero probability of winning.
Comprehensive Explanation
Joe provides probabilities for each of the 16 horses: 0.3 for the favorite, 0.20 for each of two horses, and 0.05 each for the remaining horses, while saying Desert Pansy has worse than no chance of winning. Summing 0.3 + (2 × 0.20) + (12 × 0.05), and giving Desert Pansy zero, we reach a total of 1.3. However, a key axiom of probability is that the sum of all probabilities for mutually exclusive outcomes in a complete sample space must be exactly 1. Joe's total of 1.3 violates that principle, indicating his advice is not a valid probability model.
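As a quick sanity check, the arithmetic above can be reproduced directly. The list below encodes Joe's stated figures, with Desert Pansy clamped to zero:

```python
# Joe's stated win probabilities for the 16 runners:
# 1 favourite at 0.3, two horses at 0.20, twelve others at 0.05,
# and Desert Pansy at (at best) 0.
probs = [0.3] + [0.20] * 2 + [0.05] * 12 + [0.0]

total = sum(probs)
print(len(probs))        # 16 runners
print(round(total, 2))   # 1.3, not 1.0 -> incoherent
```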
Probability theory requires that if we let p(i) be the probability that horse i wins, then

$$\sum_{i=1}^{16} p(i) = 1, \quad \text{with } 0 \le p(i) \le 1 \text{ for each } i.$$
Because Joe's stated probabilities add up to 1.3, his distribution is "incoherent." In an idealized, frictionless setting, a gambler could exploit this by trading against Joe at his own stated prices (the classic "Dutch book" argument) and lock in a guaranteed profit from the extra 0.3 of "probability mass." Also, assigning "worse than no chance," that is, effectively a negative probability, to Desert Pansy is nonsensical in standard probability theory, since probabilities must be at least 0.
From a more applied perspective, to rectify Joe’s advice you would need to rescale or adjust probabilities so that they sum to 1 and remain nonnegative. Additionally, every horse in the race (including Desert Pansy) must have at least some nonnegative probability of winning.
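A minimal sketch of that repair, assuming we clamp any negative value to zero and then rescale (the −0.1 assigned to Desert Pansy is an invented stand-in for "worse than no chance"):

```python
# Joe's figures, with an illustrative negative value for Desert Pansy.
raw = [0.3] + [0.20] * 2 + [0.05] * 12 + [-0.1]

# Step 1: clamp negatives to zero (probabilities cannot be negative).
clamped = [max(p, 0.0) for p in raw]

# Step 2: rescale so the distribution sums to exactly 1.
total = sum(clamped)
normalized = [p / total for p in clamped]

print(round(sum(normalized), 10))  # 1.0
```

Rescaling preserves the relative ranking of the horses while restoring coherence; in practice you might instead re-elicit the probabilities from scratch.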
What is a coherent probability distribution and why must probabilities sum to 1?
A coherent probability distribution is one that satisfies basic axioms of probability: each probability is between 0 and 1, and the total probability for all possible outcomes is exactly 1. Mathematically, for a set of N mutually exclusive and collectively exhaustive events, p(1) + p(2) + ... + p(N) = 1. Failing this indicates logical inconsistency because it implies that the measure of “likelihood” for all outcomes combined is not properly normalized.
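These two axioms translate directly into a small checking function, sketched here for a finite set of mutually exclusive outcomes:

```python
def is_coherent(probs, tol=1e-9):
    """Check that every probability lies in [0, 1] and that the
    probabilities of all mutually exclusive outcomes sum to 1."""
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) <= tol

print(is_coherent([0.5, 0.5]))                                  # True
print(is_coherent([0.3] + [0.20] * 2 + [0.05] * 12 + [0.0]))    # False: sums to 1.3
```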
Why do bookmakers sometimes produce probabilities summing to more than 1?
In real-world betting, bookmakers set prices (odds) that include a margin or "overround," which lets them profit regardless of which horse wins. This does not mean the true probabilities exceed 1 in any mathematical sense; the overround is a markup, so the probabilities implied by the odds appear to sum to more than 1. Joe's figures, by contrast, were framed as actual probabilities rather than betting odds, so his total of 1.3 is a fundamental error rather than a typical bookmaker margin.
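To illustrate, the implied probability of an outcome quoted at decimal odds o is 1/o; the odds below are made up for a hypothetical three-horse race:

```python
# Hypothetical decimal odds for a three-horse race.
decimal_odds = [2.0, 3.0, 4.0]

# Implied probability of each outcome is 1 / (decimal odds).
implied = [1.0 / o for o in decimal_odds]
overround = sum(implied)

print(round(overround, 4))  # 1.0833: the excess over 1 is the bookmaker's margin
```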
What is arbitrage and how does it relate to sums of probabilities exceeding 1?
Arbitrage refers to a situation in which a gambler can place bets on certain outcomes and guarantee a profit no matter what happens. If the sum of probabilities from a bookmaker was presented as real probabilities and exceeded 1, it could indicate an arbitrage opportunity in an idealized, frictionless scenario. By carefully distributing your total stake over the horses with inflated probabilities, you could theoretically lock in gains. In practice, there are transaction costs, maximum bet limits, and other real-world constraints that complicate this idea.
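A sketch of the idealized Dutch-book argument against Joe: suppose, hypothetically, he will buy a ticket on each horse i at price p(i) that pays 1 unit if that horse wins, with no frictions. Selling him all 16 tickets then guarantees a profit:

```python
# Joe's stated probabilities, used as his hypothetical ticket prices.
probs = [0.3] + [0.20] * 2 + [0.05] * 12 + [0.0]

collected = sum(probs)       # Joe pays p(i) per ticket: 1.3 in total
worst_case_payout = 1.0      # exactly one horse wins, so at most one ticket pays 1
guaranteed_profit = collected - worst_case_payout

print(round(guaranteed_profit, 2))  # 0.3 no matter which horse wins
```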
Could a horse truly have negative probability of winning?
No. Probability must lie between 0 and 1. Negative probability values do not have a place in conventional probability theory. Saying a horse has a “worse than no chance” or negative probability is simply a mistake or a humorous expression. At minimum, the horse’s probability is 0 if the bookmaker or a model believes it cannot possibly win.
If you were building a machine learning model for horse race predictions, how would you address these probability constraints?
In a machine learning model for multiclass classification (where each horse is a class), you typically ensure that all class probabilities are nonnegative and sum to 1. One common technique is to use a softmax layer in neural networks. After computing raw scores (logits) for each class, a softmax function transforms them into valid probabilities that sum to 1. For example, in PyTorch you might use:
import torch
import torch.nn as nn
# Suppose logits is the output vector of raw scores of shape [batch_size, n_horses]
logits = torch.randn(1, 16) # example
softmax = nn.Softmax(dim=1)
probabilities = softmax(logits)
print(probabilities)
print(probabilities.sum(dim=1)) # This will be 1.0 for each batch element
For training, you also need a proper loss function such as cross-entropy. Note that PyTorch's nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so you apply softmax explicitly only when you need the probabilities themselves. This keeps the model's predictions probabilistic and coherent.
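For intuition, the softmax and cross-entropy computations can be sketched in plain Python (no PyTorch) for a single toy example; the logit values here are arbitrary:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    # Negative log-probability assigned to the correct class.
    return -math.log(probabilities[true_class])

logits = [2.0, 0.5, -1.0, 0.0]   # raw scores for a toy 4-horse race
probs = softmax(logits)

print(round(sum(probs), 10))     # 1.0: softmax output is a valid distribution
print(cross_entropy(probs, 0) < cross_entropy(probs, 2))  # True: higher logit, lower loss
```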
Could you clarify how to check for coherence in a Bayesian setting?
In a Bayesian setting, coherence requires that all posterior distributions and prior distributions obey the same probability rules. Even when applying Bayes' theorem with a prior distribution p(H) over hypotheses H and a likelihood model p(D|H) over data D, the resulting posterior must also be a valid probability distribution. Formally:

$$p(H \mid D) = \frac{p(D \mid H)\,p(H)}{p(D)}$$
Here, p(D) is the total marginal probability of data D, computed by summing or integrating over all hypotheses. After performing this calculation, each posterior probability must lie between 0 and 1, and the sum/integral of posterior probabilities over all hypotheses H must be 1. Any model that produces posterior sums inconsistent with 1 would be deemed incoherent and mathematically invalid.
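A minimal numeric sketch of this normalization, using made-up prior and likelihood values over three hypotheses:

```python
priors = [0.5, 0.3, 0.2]        # p(H) over three hypothetical hypotheses
likelihoods = [0.9, 0.4, 0.1]   # p(D | H) for the observed data D

# Marginal p(D): sum over all hypotheses of p(D | H) * p(H).
p_d = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior p(H | D) via Bayes' theorem; dividing by p(D) enforces coherence.
posteriors = [l * p / p_d for l, p in zip(likelihoods, priors)]

print(round(sum(posteriors), 10))  # 1.0: the posterior is a valid distribution
```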
These are the core considerations you must understand when someone, like Joe the bookie, provides probabilities for an event that should have a well-defined and mutually exclusive set of outcomes.