ML Interview Q Series: Network Path Reliability Analysis using the Inclusion-Exclusion Principle
Consider a communication network with four nodes n₁, n₂, n₃ and n₄ and six directed links l₁ = (n₁, n₂), l₂ = (n₁, n₃), l₃ = (n₂, n₃), l₄ = (n₃, n₂), l₅ = (n₂, n₄) and l₆ = (n₃, n₄). A message must be sent from source node n₁ to destination node n₄. Each link lᵢ has probability pᵢ of functioning independently of the others. A path is considered functioning only if all its links function. Use the inclusion-exclusion principle to find the probability that there exists at least one functioning path from n₁ to n₄. Also simplify the result when pᵢ = p for all i.
Short Compact solution
By labeling the four distinct paths from n₁ to n₄ as:
• Path 1: (l₁, l₅)
• Path 2: (l₂, l₆)
• Path 3: (l₁, l₃, l₆)
• Path 4: (l₂, l₄, l₅)
and using the inclusion-exclusion formula on these path events A₁, A₂, A₃, A₄, the probability that at least one path is functioning is
P(A₁ ∪ A₂ ∪ A₃ ∪ A₄) = p₁p₅ + p₂p₆ + p₁p₃p₆ + p₂p₄p₅ − p₁p₂p₅p₆ − p₁p₃p₅p₆ − p₁p₂p₄p₅ − p₁p₂p₃p₆ − p₂p₄p₅p₆ + p₁p₂p₃p₅p₆ + p₁p₂p₄p₅p₆.
When pᵢ = p for all i, it simplifies to
P(A₁ ∪ A₂ ∪ A₃ ∪ A₄) = 2p² + 2p³ − 5p⁴ + 2p⁵.
Comprehensive Explanation
Inclusion-Exclusion Principle for Probability of “At Least One Path Functions”
We define the events Aⱼ to be “Path j functions,” for j in {1, 2, 3, 4}. A path functions if and only if all of its links function. The probability that there is at least one functioning path from n₁ to n₄ is P(A₁ ∪ A₂ ∪ A₃ ∪ A₄).
To compute P(A₁ ∪ A₂ ∪ A₃ ∪ A₄), we apply the inclusion-exclusion principle:
P(A₁ ∪ A₂ ∪ A₃ ∪ A₄) = Σ P(Aⱼ) − Σ P(AⱼAₖ) + Σ P(AⱼAₖAₗ) − P(A₁A₂A₃A₄),
where the first sum runs over the four single paths, the second over all pairs j < k, and the third over all triples j < k < l.
In words:
• Add all single probabilities.
• Subtract the probabilities of all pairwise intersections.
• Add back the probabilities of all triple intersections.
• Subtract the probability that all four occur together.
Computing Single-Path Probabilities
Each link lᵢ has probability pᵢ of functioning. If a path is composed of links lᵢ₁, lᵢ₂, …, then its probability of functioning is pᵢ₁ × pᵢ₂ × …. For the four paths we have:
A₁ corresponds to path (l₁, l₅). Probability(A₁) = p₁ p₅.
A₂ corresponds to path (l₂, l₆). Probability(A₂) = p₂ p₆.
A₃ corresponds to path (l₁, l₃, l₆). Probability(A₃) = p₁ p₃ p₆.
A₄ corresponds to path (l₂, l₄, l₅). Probability(A₄) = p₂ p₄ p₅.
Computing Pairwise, Triple, and Four-Way Intersections
AⱼAₖ denotes the event “Path j and Path k both function.” This happens precisely when every link appearing in either path, i.e. the union of the two paths’ link sets, is functioning. For example, Paths 1 and 2 together use links l₁, l₂, l₅, l₆, so P(A₁A₂) = p₁p₂p₅p₆.
Similarly for triple intersections AⱼAₖAₗ and the four-way intersection A₁A₂A₃A₄.
When we write out all of these intersection probabilities explicitly, we obtain:
• Singles: P(A₁) = p₁p₅, P(A₂) = p₂p₆, P(A₃) = p₁p₃p₆, P(A₄) = p₂p₄p₅.
• Pairs: P(A₁A₂) = p₁p₂p₅p₆, P(A₁A₃) = p₁p₃p₅p₆, P(A₁A₄) = p₁p₂p₄p₅, P(A₂A₃) = p₁p₂p₃p₆, P(A₂A₄) = p₂p₄p₅p₆, P(A₃A₄) = p₁p₂p₃p₄p₅p₆.
• Triples: P(A₁A₂A₃) = p₁p₂p₃p₅p₆, P(A₁A₂A₄) = p₁p₂p₄p₅p₆, P(A₁A₃A₄) = P(A₂A₃A₄) = p₁p₂p₃p₄p₅p₆.
• Four-way: P(A₁A₂A₃A₄) = p₁p₂p₃p₄p₅p₆.
Summing these with the alternating + and − signs from inclusion-exclusion, the p₁p₂p₃p₄p₅p₆ terms cancel (they enter with coefficient −1 from the pairs, +2 from the triples, and −1 from the four-way term). Collecting the remaining terms, we get exactly:
P(A₁ ∪ A₂ ∪ A₃ ∪ A₄) = p₁p₅ + p₂p₆ + p₁p₃p₆ + p₂p₄p₅ − p₁p₂p₅p₆ − p₁p₃p₅p₆ − p₁p₂p₄p₅ − p₁p₂p₃p₆ − p₂p₄p₅p₆ + p₁p₂p₃p₅p₆ + p₁p₂p₄p₅p₆
This is the probability that at least one of the four paths is functioning.
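To double-check the algebra, one can automate the inclusion-exclusion sum directly: the probability of any intersection of path events is simply the product of pᵢ over the union of the involved link sets. The small sketch below does exactly that for the four paths; the helper names and the sample probabilities are illustrative, not part of the original statement.

import itertools

# 0-based link indexes; these sets correspond to the events A1..A4 above
PATHS = [{0, 4}, {1, 5}, {0, 2, 5}, {1, 3, 4}]

def inclusion_exclusion(p):
    total = 0.0
    for r in range(1, len(PATHS) + 1):
        sign = (-1) ** (r + 1)
        for combo in itertools.combinations(PATHS, r):
            used_links = set().union(*combo)   # links required by every path in the combo
            term = 1.0
            for i in used_links:
                term *= p[i]
            total += sign * term
    return total

print(inclusion_exclusion([0.9, 0.8, 0.7, 0.6, 0.95, 0.85]))

Its output matches the closed-form expression above and the brute-force enumeration shown later in this article.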
Special Case: pᵢ = p for All i
When every link lᵢ has the same reliability p, we replace each pᵢ by p in the above expression. The two two-link paths each contribute p², the two three-link paths each contribute p³, the five surviving pairwise terms each contribute p⁴, and the two surviving triple terms each contribute p⁵. The final simplified form is:
P(A₁ ∪ A₂ ∪ A₃ ∪ A₄) = 2p² + 2p³ − 5p⁴ + 2p⁵.
This expression is obtained by systematically grouping terms so that equal powers of p combine.
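As a sanity check of the simplification, one can expand the general expression symbolically after setting every pᵢ = p. The snippet below assumes sympy is available; it is only a verification aid, not part of the derivation.

import sympy as sp

p = sp.symbols('p')
p1 = p2 = p3 = p4 = p5 = p6 = p
expr = (p1*p5 + p2*p6 + p1*p3*p6 + p2*p4*p5
        - p1*p2*p5*p6 - p1*p3*p5*p6 - p1*p2*p4*p5 - p1*p2*p3*p6 - p2*p4*p5*p6
        + p1*p2*p3*p5*p6 + p1*p2*p4*p5*p6)
print(sp.expand(expr))   # 2*p**5 - 5*p**4 + 2*p**3 + 2*p**2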
Why Inclusion-Exclusion?
In many network-reliability or path-availability questions, directly counting “at least one path” can lead to overcounting intersections of events (where multiple paths might function simultaneously). Inclusion-exclusion is a systematic way to account for those overlaps correctly.
Potential Pitfalls
One common pitfall is to naively sum probabilities of path events without subtracting overlaps, leading to an overestimation of the probability. Another subtlety is to ensure that all paths are accounted for, especially in directed networks where loops or cross-links (like l₃ and l₄ connecting n₂ ↔ n₃) can complicate the analysis.
Practical Example with Code
In practice, if the number of paths is small, one could brute-force compute the probability by enumerating each subset of links that function. Below is a Python snippet that demonstrates enumeration for this small example:
import itertools

# Suppose p = [p1, p2, p3, p4, p5, p6]
def probability_of_some_path(p):
    # The probability of a particular up/down configuration of the six links
    # is the product of p_i over links that are up times (1 - p_i) over links
    # that are down. We sum this over every configuration in which at least
    # one of the four paths is fully functioning.
    links = range(6)  # indexes 0..5 correspond to p1..p6
    total_prob = 0.0
    # All possible states: 2^6
    for state in itertools.product([0, 1], repeat=6):
        # state[i] = 1 means link i is functioning, 0 means it is down
        # compute the probability of this state
        prob_state = 1.0
        for i in links:
            if state[i] == 1:
                prob_state *= p[i]
            else:
                prob_state *= (1 - p[i])
        # check if any path from n1 -> n4 is up
        # path 1: l1, l5     -> indexes (0, 4)
        # path 2: l2, l6     -> indexes (1, 5)
        # path 3: l1, l3, l6 -> indexes (0, 2, 5)
        # path 4: l2, l4, l5 -> indexes (1, 3, 4)
        path1 = (state[0] == 1 and state[4] == 1)
        path2 = (state[1] == 1 and state[5] == 1)
        path3 = (state[0] == 1 and state[2] == 1 and state[5] == 1)
        path4 = (state[1] == 1 and state[3] == 1 and state[4] == 1)
        if path1 or path2 or path3 or path4:
            total_prob += prob_state
    return total_prob
This brute-force approach can confirm the derived algebraic expression for a given numeric set of probabilities.
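For instance, one can compare the brute-force result with the closed-form expression for an arbitrary (illustrative) choice of link reliabilities:

p = [0.9, 0.8, 0.7, 0.6, 0.95, 0.85]
p1, p2, p3, p4, p5, p6 = p
closed_form = (p1*p5 + p2*p6 + p1*p3*p6 + p2*p4*p5
               - p1*p2*p5*p6 - p1*p3*p5*p6 - p1*p2*p4*p5 - p1*p2*p3*p6 - p2*p4*p5*p6
               + p1*p2*p3*p5*p6 + p1*p2*p4*p5*p6)
print(probability_of_some_path(p), closed_form)   # the two values agree up to floating-point error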
Possible Follow-Up Questions
How does one deal with more complicated networks with many possible paths?
In larger networks, the principle is the same, but direct inclusion-exclusion quickly becomes unwieldy because of many overlaps. Often, techniques like minimal path sets, minimal cut sets, or specialized reliability polynomials are used. One might also rely on efficient algorithms or even Monte Carlo simulation for approximate solutions.
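As a rough illustration of the Monte Carlo option, the sketch below samples link states and counts how often at least one path is up. The function name, path encoding, sample count, and seed are illustrative assumptions, not a prescribed method.

import random

def monte_carlo_reliability(p, paths, n_samples=100_000, seed=0):
    # paths: each path given as a tuple of 0-based link indexes
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        up = [rng.random() < pi for pi in p]
        if any(all(up[i] for i in path) for path in paths):
            hits += 1
    return hits / n_samples

# The four paths from the question, with 0-based link indexes
paths = [(0, 4), (1, 5), (0, 2, 5), (1, 3, 4)]
print(monte_carlo_reliability([0.9] * 6, paths))   # close to 2p^2 + 2p^3 - 5p^4 + 2p^5 at p = 0.9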
What if the links are not independent?
If the links’ functioning are correlated (for example, if l₃ is more likely to fail when l₁ fails), the simple product pᵢ × pⱼ × … would not be valid. We would need to incorporate the joint probability distribution of the link failures, and the algebraic approach would become more involved.
How do we ensure numerical stability in a real implementation?
When probabilities are very close to 1 or 0, floating-point round-off can cause errors when multiplying many link reliabilities or summing many tiny state probabilities. Working in log space (and using functions such as log1p and expm1 near 1) helps maintain numerical stability.
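A small sketch of the log-space idea, under the assumption that the two paths being combined are link-disjoint and therefore independent (as Path 1 and Path 2 are here): products of link reliabilities become sums of logs, and the union of two independent path events is combined via log1p/expm1 to preserve precision.

import math

def path_log_reliability(link_probs):
    # log P(all links on the path function); summing logs avoids underflow on long paths
    return sum(math.log(p) for p in link_probs)

def union_of_two_independent_paths(logp_a, logp_b):
    # P(A or B) = 1 - (1 - P(A)) * (1 - P(B)) for independent (link-disjoint) path events
    pa, pb = math.exp(logp_a), math.exp(logp_b)
    return -math.expm1(math.log1p(-pa) + math.log1p(-pb))

log_p1 = path_log_reliability([0.999999, 0.999999])   # Path 1: l1, l5 (illustrative values)
log_p2 = path_log_reliability([0.999999, 0.999999])   # Path 2: l2, l6
print(union_of_two_independent_paths(log_p1, log_p2))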
How would this generalize if different nodes had multiple links leaving them?
The same method applies: enumerate all distinct paths from the source to the destination, define each path event, and apply inclusion-exclusion. In practice, one often uses more advanced methods (like factoring out common link sets or applying cut-set expansions) to avoid summing over an exponential number of path events.
These details show how the inclusion-exclusion principle is both powerful and can be generalized. For small networks like the one in the question, we can handle the computations directly and verify correctness by brute force or by carefully enumerating all intersections.
Below are additional follow-up questions
How does the approach change if the network can have cycles that create infinitely many distinct paths?
When there are cycles in a directed network, it is theoretically possible for a path to loop around and then proceed to the destination, generating infinitely many simple path variants. From a reliability perspective, an “infinite number of paths” seems problematic. However, in practice, we usually focus on “simple paths” (i.e., paths that do not revisit nodes). A path that loops back to the same node multiple times typically does not contribute additional distinct routes in terms of reliability because it involves extra links that all need to be functioning simultaneously.
One common way to handle cycles is:
Identify all simple paths (those that visit each node at most once) using a depth-first search that tracks visited nodes.
Use these simple paths in an inclusion-exclusion approach.
Potential pitfalls or edge cases:
If the graph has complex cycles, enumerating all simple paths can still grow exponentially in the worst case. You might adopt more advanced algorithms, such as minimal path set generation, to avoid explicit enumeration of every path.
In real networks where loops can provide failover or alternative paths, the reliability might actually improve. But you must be cautious about double-counting path events if you try to consider every looped path.
Some networks allow repeated traversals of a node with incremental cost or risk, which may be relevant if each pass through a node or link increases chance of failure. In standard reliability theory, though, each link is typically considered independently and used once in a functioning route.
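A minimal sketch of the simple-path enumeration step, using a depth-first search that tracks visited nodes (the data layout and function name are illustrative). Applied to the network in the question, it recovers exactly the four paths used above.

def simple_paths(links, source, target):
    # links: list of (u, v) tuples; returns each simple path as a tuple of link indexes
    out = {}
    for idx, (u, v) in enumerate(links):
        out.setdefault(u, []).append((idx, v))
    results = []

    def dfs(node, visited, used_links):
        if node == target:
            results.append(tuple(used_links))
            return
        for idx, nxt in out.get(node, []):
            if nxt not in visited:                      # never revisit a node
                dfs(nxt, visited | {nxt}, used_links + [idx])

    dfs(source, {source}, [])
    return results

links = [(1, 2), (1, 3), (2, 3), (3, 2), (2, 4), (3, 4)]   # l1..l6
print(simple_paths(links, 1, 4))   # (0, 2, 5), (0, 4), (1, 3, 4), (1, 5) in DFS order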
Is there a way to exploit minimal path sets or cut sets to handle large graphs more efficiently?
Yes. Instead of enumerating all possible paths, many reliability analyses use concepts like minimal path sets or minimal cut sets:
A minimal path set is a set of links that together form a functioning route from source to destination and that is minimal in the sense that removing any link from the set breaks that route. Once you have these minimal path sets, you can apply inclusion-exclusion to just those sets.
A minimal cut set is a set of links whose simultaneous failure disconnects the source from the destination, minimal in the sense that every proper subset still leaves at least one route intact. By enumerating these minimal cut sets, you can invert the problem and compute the system’s probability of failing to have a path.
Potential pitfalls or edge cases:
Identifying minimal path sets or cut sets can still be exponential in the worst case. However, in many practical network topologies, the total number of minimal path (or cut) sets is far smaller than the total number of paths.
Implementation details can be tricky. One must ensure that all minimal sets are found without duplicates and that partial or non-minimal sets are not included.
If dependencies exist among links (e.g., common-mode failures), the usual minimal path or cut set formulas need adjustment to account for correlated behaviors.
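For a network as small as the one in the question, the minimal cut sets can even be found by brute force, as in the sketch below (the search order and helper names are illustrative). It reports the four minimal cut sets {l₁, l₂}, {l₅, l₆}, {l₁, l₄, l₆} and {l₂, l₃, l₅}.

import itertools
from collections import deque

LINKS = [(1, 2), (1, 3), (2, 3), (3, 2), (2, 4), (3, 4)]   # l1..l6, 0-based indexes

def still_connected(removed):
    # BFS from n1 using only the links not in `removed`; is n4 reachable?
    adj = {}
    for i, (u, v) in enumerate(LINKS):
        if i not in removed:
            adj.setdefault(u, []).append(v)
    seen, queue = {1}, deque([1])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return 4 in seen

minimal_cuts = []
for size in range(1, 7):                            # smaller cuts are found first
    for subset in itertools.combinations(range(6), size):
        if not still_connected(set(subset)):
            # keep only minimal cuts: discard any superset of a cut already found
            if not any(set(c) < set(subset) for c in minimal_cuts):
                minimal_cuts.append(subset)
print(minimal_cuts)   # [(0, 1), (4, 5), (0, 3, 5), (1, 2, 4)]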
How do we handle situations where link probabilities are time-dependent or where link states can change over time?
When link probabilities vary over time, the concept of a single, static probability pᵢ for each link is no longer valid. Instead, you might have pᵢ(t), a function giving the link’s reliability at a specific time t. Analyzing the probability of a path functioning from t=0 to some deadline T then becomes a question of time-integrated or time-based reliability.
In such scenarios, approaches may include:
Markov processes: Model each link as a state machine (up or down) with transition rates, creating a continuous-time Markov chain for the entire network. You then compute the probability that the network is connected from node n₁ to node n₄ at or before time T.
Discretized intervals: Approximate the system’s state at discrete points in time and estimate the probability of path availability.
Potential pitfalls or edge cases:
The number of states in a Markov model grows exponentially with the number of links, so this can quickly become infeasible for large networks.
Correlations might increase if link failures are triggered by external events or if repair processes restore some links faster than others.
Simplistic approaches that assume independence at each time step can lead to unrealistic reliability calculations if actual data show strong correlations over time.
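As a concrete (and deliberately simplified) illustration of the Markov-process idea above: if each link is modeled as an independent two-state chain with failure rate λ and repair rate μ, its availability at time t starting from “up” is μ/(λ+μ) + λ/(λ+μ)·e^(−(λ+μ)t). Plugging these pᵢ(t) into the static formula gives the probability that some path is up at time t, not the probability of uninterrupted service over [0, T], which requires a full network-level chain. The rates below are illustrative.

import math

def availability(lam, mu, t):
    # 2-state Markov link (up/down) that starts in the "up" state
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def network_reliability(p):
    p1, p2, p3, p4, p5, p6 = p
    return (p1*p5 + p2*p6 + p1*p3*p6 + p2*p4*p5
            - p1*p2*p5*p6 - p1*p3*p5*p6 - p1*p2*p4*p5 - p1*p2*p3*p6 - p2*p4*p5*p6
            + p1*p2*p3*p5*p6 + p1*p2*p4*p5*p6)

for t in (0.0, 1.0, 5.0, 50.0):
    p_t = [availability(lam=0.1, mu=1.0, t=t)] * 6
    print(t, network_reliability(p_t))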
What if the cost or weight on each link matters, for example if we want the most reliable path rather than just any functioning path?
In many real-world scenarios, each link may have both a reliability probability and a cost metric (such as bandwidth, latency, or expense). Simply having any functioning path might not be sufficient; we often want the path with the highest end-to-end reliability or the lowest cost while meeting a reliability threshold.
Handling it typically requires:
A dynamic programming or graph search approach (like Dijkstra’s algorithm) in which each edge is given the weight −log(pᵢ), so that minimizing the total weight maximizes the product of link reliabilities (the logarithm turns products into sums); a short sketch appears at the end of this answer.
Alternatively, multi-objective optimization can be done if the problem is to maximize reliability and minimize cost.
Potential pitfalls or edge cases:
Converting probabilities to “costs” using log(1/pᵢ) can cause numerical issues if pᵢ is extremely close to 0 or 1.
Multi-objective optimization might yield a set of Pareto-optimal solutions rather than a single unique solution, so one must decide how to trade off reliability and cost.
If the network’s reliability must be computed across multiple potential paths simultaneously (like load balancing or multi-path routing), the analysis gets more complicated because multiple routes may share links and thereby share failure modes.
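Here is a sketch of the most-reliable-path idea using Dijkstra’s algorithm with edge weight −log(pᵢ). The node labels follow the question; the helper names and sample probabilities are illustrative.

import heapq
import math

def most_reliable_path(links, probs, source, target):
    adj = {}
    for (u, v), p in zip(links, probs):
        adj.setdefault(u, []).append((v, -math.log(p)))   # weight = -log(reliability)
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best.get(u, float("inf")):
            continue                                      # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        return None, 0.0
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), math.exp(-best[target])  # convert weight back to a probability

links = [(1, 2), (1, 3), (2, 3), (3, 2), (2, 4), (3, 4)]   # l1..l6
print(most_reliable_path(links, [0.9, 0.8, 0.7, 0.6, 0.95, 0.85], 1, 4))
# -> ([1, 2, 4], 0.855): the single most reliable path is (l1, l5) with 0.9 * 0.95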
How does the reliability calculation change if each link can be in partial failure states (multi-state reliability) rather than just up or down?
In multi-state reliability models, each link can have several discrete performance levels (for instance, fully functional, partially functional with reduced capacity, or completely failed). The probability distribution then is over those states, not a single up/down variable.
To adapt inclusion-exclusion or path-based analysis, you must:
Define what it means for a link to “function” at a required level. For example, you may need a certain throughput or bandwidth threshold.
Translate partial states into an event “this link meets or exceeds the required performance.”
Potential pitfalls or edge cases:
If partial failures lead to different capacities, you might have to consider whether the entire path can carry the required data rate. This is more akin to a flow problem where each link state affects its capacity, and you need a minimum end-to-end flow.
The number of states and the complexity of the intersection events grows quickly. Simply enumerating states for each link and summing them might become intractable.
You might need advanced reliability or flow-based methods that incorporate multi-state edges, such as specialized network flow reliability techniques or Markov reward models.
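A tiny sketch of the “collapse to a threshold” step described above: each link’s multi-state capacity distribution is reduced to the probability that it meets the required level, which then plays the role of pᵢ in the binary path analysis. The state distributions and threshold below are invented for illustration.

def prob_meets_threshold(state_dist, threshold):
    # state_dist maps a capacity level to its probability
    return sum(prob for capacity, prob in state_dist.items() if capacity >= threshold)

link_states = {
    "l1": {100: 0.7, 50: 0.2, 0: 0.1},    # full, degraded, failed
    "l5": {100: 0.8, 50: 0.1, 0: 0.1},
}
required_capacity = 50
p1 = prob_meets_threshold(link_states["l1"], required_capacity)   # 0.9
p5 = prob_meets_threshold(link_states["l5"], required_capacity)   # 0.9
print(p1 * p5)   # reliability of path (l1, l5) at the 50-unit requirement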
Can we validate these reliability formulas with real data or empirical network measurements?
Yes, empirical validation is often crucial for confirming whether the theoretical model matches real-world behavior:
Collect link failure/availability logs over a period and estimate pᵢ from historical data.
Compare the theoretical formula’s predicted “probability of at least one functioning path” to the observed frequency of successful transmissions from n₁ to n₄.
Potential pitfalls or edge cases:
The assumption of link independence may not hold in practice (e.g., concurrent storms or power outages that affect multiple links).
Rare events can be hard to measure accurately in real data. A link that rarely fails might require an extremely long observation period to get a reliable failure rate estimate.
Network conditions often change over time (equipment upgrades, new routing policies, traffic patterns), so the historical data might not remain valid.
How do we extend this approach to scenarios where the network needs more than one path simultaneously?
If the requirement is that the network must have at least two distinct functioning paths (for instance, to ensure redundancy), then:
You redefine your target event to be “at least two disjoint paths exist that are both functioning at the same time.”
You might need to consider link-disjoint or node-disjoint paths, depending on whether a shared node or link violates the redundancy requirement.
Inclusion-exclusion then must be applied to events that represent “two or more disjoint paths function.”
Potential pitfalls or edge cases:
Counting disjoint paths that simultaneously function can be challenging, because paths that are link-disjoint may still share nodes (and vice versa), so the answer depends on which definition of “disjoint” the redundancy requirement uses.
If partial overlap is allowed, you have to track exactly which links are shared. The problem complexity increases significantly.
In practice, a simpler approach might be to measure the reliability of each pair of disjoint paths using specialized algorithms (e.g., flow-based or minimal cut-based methods) rather than enumerating all path combinations.
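One way to make the “at least two link-disjoint functioning paths” event concrete is to enumerate link states (as in the earlier brute force) and, for each state, check whether the max flow from n₁ to n₄ with unit edge capacities is at least 2; by Menger’s theorem that flow value equals the number of link-disjoint paths. The sketch below assumes networkx is available and is meant only to illustrate the idea.

import itertools
import networkx as nx

LINKS = [(1, 2), (1, 3), (2, 3), (3, 2), (2, 4), (3, 4)]   # l1..l6

def prob_two_link_disjoint_paths(p):
    total = 0.0
    for state in itertools.product([0, 1], repeat=6):
        prob = 1.0
        for i in range(6):
            prob *= p[i] if state[i] else (1 - p[i])
        G = nx.DiGraph()
        G.add_nodes_from([1, 2, 3, 4])
        for i, (u, v) in enumerate(LINKS):
            if state[i]:
                G.add_edge(u, v, capacity=1)
        flow_value, _ = nx.maximum_flow(G, 1, 4)           # = number of link-disjoint paths
        if flow_value >= 2:
            total += prob
    return total

# For this particular network the result equals p1*p2*p5*p6, since two link-disjoint
# paths require both links leaving n1 and both links entering n4 to be up.
print(prob_two_link_disjoint_paths([0.9, 0.8, 0.7, 0.6, 0.95, 0.85]))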
What if the number of links or nodes is large but we still want an exact solution?
When the network scales up, enumerating all paths via inclusion-exclusion or even enumerating all minimal path sets can become computationally infeasible. Yet, sometimes an exact solution is still desired for mission-critical systems. Possible methods:
Use algebraic decomposition techniques, such as factoring out certain cut vertices or cut edges. This partitions the network into sub-networks.
Use symbolic reliability calculators or specialized graph decomposition algorithms that exploit repeated patterns in large networks.
Potential pitfalls or edge cases:
Decomposition approaches can miss certain cross-connections if the decomposition is not done carefully.
Symbolic methods might have worst-case exponential blow-up if the network does not have a structure amenable to factorization.
If exact solutions are too expensive, fallback methods like Monte Carlo simulation or bounding techniques (upper/lower bounds) are often used, but these might not provide exact results.
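A compact sketch of the factoring (pivotal decomposition) idea: condition on one link being up or down, recurse on the two simpler problems, and combine with R = pₑ·R(e up) + (1 − pₑ)·R(e down). Real implementations add series-parallel reductions between pivots to keep the recursion small; the bare version below (with illustrative names) is exponential in the worst case but exact.

from collections import deque

def reachable(edges, source, target):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return target in seen

def reliability(links, probs, source, target):
    # Pivot on the first link whose state is still uncertain.
    for i, pi in enumerate(probs):
        if 0.0 < pi < 1.0:
            up = probs[:i] + [1.0] + probs[i + 1:]
            down = probs[:i] + [0.0] + probs[i + 1:]
            return (pi * reliability(links, up, source, target)
                    + (1 - pi) * reliability(links, down, source, target))
    # Every link is now surely up (prob 1) or surely down (prob 0): just test connectivity.
    alive = [lk for lk, pi in zip(links, probs) if pi == 1.0]
    return 1.0 if reachable(alive, source, target) else 0.0

links = [(1, 2), (1, 3), (2, 3), (3, 2), (2, 4), (3, 4)]   # l1..l6
print(reliability(links, [0.9, 0.8, 0.7, 0.6, 0.95, 0.85], 1, 4))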
How can we incorporate repair rates or downtime into the reliability model?
In many real-world settings, links are not permanently failed once they go down; they can be repaired with some rate. This introduces a process over time rather than a static snapshot. Common approaches:
Continuous-time Markov chains: Each link is up or down, and repairs transition a down link to up.
Discrete-time availability models: A link’s availability is the fraction of time it is up in steady state, given some mean time to failure and mean time to repair.
Potential pitfalls or edge cases:
Building a Markov chain for the entire network can lead to a huge state space if each of the L links can be up or down independently (2^L states).
If repairs have significant correlations (e.g., shared maintenance crews or limited spare parts), independence assumptions break down.
In high-availability environments, it might be more relevant to measure metrics like “mean time to first path failure” or “probability that at least one path is up at a random point in time in steady state,” both of which require more specialized formulas.
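For the steady-state view mentioned above, each link’s long-run availability under a simple up/down-with-repair model is MTTF / (MTTF + MTTR); those availabilities can then be used as the pᵢ in the static formula to obtain “probability that at least one path is up at a random point in time in steady state.” The figures below are invented for illustration.

mttf = [1000.0, 800.0, 900.0, 900.0, 1200.0, 1100.0]   # mean time to failure (hours), l1..l6
mttr = [4.0, 6.0, 5.0, 5.0, 3.0, 4.0]                  # mean time to repair (hours)
steady_state_p = [f / (f + r) for f, r in zip(mttf, mttr)]
print(steady_state_p)   # these values play the role of p1..p6 in the static analysis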
What if the directionality or orientation of links can change over time (dynamic direction)?
Some advanced wireless networks or networks with directional antenna setups can “reverse” link direction based on conditions. If a link lᵢ can either be (nᵢ, nⱼ) or (nⱼ, nᵢ) depending on external factors, reliability must consider the probability distribution over which orientation is active, as well as whether it is functioning:
You might define states for each link that combine orientation and up/down status, each with its own probability.
Then the question “is there a functioning path from n₁ to n₄?” depends on the actual orientation and state distribution.
Potential pitfalls or edge cases:
Each link now has more states (up in one orientation, up in the reverse orientation, or down) instead of a simple up/down. If multiple links can reorient, the joint state space compounds, complicating enumeration.
In real implementations, orientation changes might correlate with environmental conditions (like wind or interference), violating the independence assumption.
These additional scenarios highlight the many complexities that can arise when analyzing network reliability in real-world situations and how specialized methods or approximations are often required once you move beyond a simple, static, independent-link model.