Expected Value: Predicting Average Outcomes
Expected value is one of the most practical concepts in probability. It tells us what to expect "on average" from a random process—essential for AI systems that need to make decisions under uncertainty.
What is Expected Value?
The expected value (or expectation) is the average outcome you'd get if you repeated an experiment infinitely many times.
Notation: E[X] or μ
Formula (discrete):
E[X] = Σ xᵢ × P(X = xᵢ)
Multiply each outcome by its probability, then sum.
Simple Example: Dice Roll
For a fair 6-sided die:
E[X] = 1×(1/6) + 2×(1/6) + 3×(1/6) + 4×(1/6) + 5×(1/6) + 6×(1/6)
= (1 + 2 + 3 + 4 + 5 + 6) / 6
= 21/6
= 3.5
The expected value is 3.5—even though you can never actually roll 3.5!
This means: if you roll the die many times, your average will approach 3.5.
Expected Value in AI
Classification Confidence
A classifier outputs probabilities [0.8, 0.15, 0.05] for three classes. Assign a value of 1 to the correct class (the first one here) and 0 to the others.
Expected "correctness":
E[Correct] = 1×0.8 + 0×0.15 + 0×0.05 = 0.8
The model expects to be correct 80% of the time on inputs like this.
Recommendation Systems
Suppose a model predicts the probability of each star rating a user would give a movie. The expected rating:
E[Rating] = 5×0.1 + 4×0.3 + 3×0.4 + 2×0.15 + 1×0.05
= 0.5 + 1.2 + 1.2 + 0.3 + 0.05
= 3.25 stars
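As a quick sketch in code, using the made-up rating distribution above:

```python
# Predicted distribution over star ratings (5 stars down to 1)
ratings = [5, 4, 3, 2, 1]
probs = [0.1, 0.3, 0.4, 0.15, 0.05]

expected_rating = sum(r * p for r, p in zip(ratings, probs))
print(expected_rating)  # ≈ 3.25
```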
Reinforcement Learning
Expected reward from an action:
E[Reward] = Σ reward × P(reward | action)
AI agents choose actions that maximize expected reward.
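A minimal sketch of that rule, with hypothetical reward distributions for two actions:

```python
# Hypothetical: each action maps reward -> P(reward | action)
actions = {
    "explore": {10: 0.2, 0: 0.8},  # E[Reward] = 2.0
    "exploit": {3: 1.0},           # E[Reward] = 3.0
}

def expected_reward(dist):
    return sum(reward * p for reward, p in dist.items())

best_action = max(actions, key=lambda a: expected_reward(actions[a]))
print(best_action)  # "exploit"
```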
Properties of Expected Value
Linearity
The expected value of a sum equals the sum of expected values:
E[X + Y] = E[X] + E[Y]
This works even if X and Y are dependent!
Example: Rolling two dice
E[Sum] = E[Die1] + E[Die2] = 3.5 + 3.5 = 7
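You can check this with a quick simulation; the sample means approximate the expectations:

```python
import random

n = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]

mean_die1 = sum(d1 for d1, _ in rolls) / n
mean_die2 = sum(d2 for _, d2 in rolls) / n
mean_sum = sum(d1 + d2 for d1, d2 in rolls) / n

print(mean_sum, mean_die1 + mean_die2)  # both ≈ 7.0
```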
Scaling
E[aX] = a × E[X]
E[X + b] = E[X] + b
E[aX + b] = a × E[X] + b
Example: If E[X] = 10, then E[3X + 5] = 3(10) + 5 = 35
Non-Linearity Warning
For non-linear functions, the expectation cannot be moved inside the function:
E[X²] ≠ (E[X])² (in general)
E[log(X)] ≠ log(E[X])
This matters for loss functions: for a non-linear loss, the expected loss is not the loss of the expected prediction (Jensen's inequality).
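A numeric check with the fair die makes the gap concrete:

```python
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

e_x = sum(v * p for v in values)             # E[X] = 3.5
e_x_squared = sum(v**2 * p for v in values)  # E[X²] = 91/6

print(e_x_squared)  # ≈ 15.17
print(e_x ** 2)     # 12.25, not equal!
```

The difference between these two numbers, E[X²] − (E[X])², is exactly the variance, which the next lesson covers.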
Expected Value of Common Distributions
| Distribution | Expected Value |
|---|---|
| Bernoulli(p) | p |
| Binomial(n, p) | n × p |
| Geometric(p) | 1/p |
| Poisson(λ) | λ |
| Uniform(a, b) | (a + b) / 2 |
| Normal(μ, σ²) | μ |
| Exponential(λ) | 1/λ |
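One way to sanity-check this table is to sample from each distribution and compare sample means against the formulas. A sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# (samples, theoretical expected value)
checks = {
    "Bernoulli(0.3)": (rng.binomial(1, 0.3, n), 0.3),
    "Binomial(10, 0.3)": (rng.binomial(10, 0.3, n), 10 * 0.3),
    "Geometric(0.3)": (rng.geometric(0.3, n), 1 / 0.3),
    "Poisson(4)": (rng.poisson(4, n), 4.0),
    "Uniform(2, 8)": (rng.uniform(2, 8, n), (2 + 8) / 2),
    "Normal(1, 2²)": (rng.normal(1, 2, n), 1.0),
    # NumPy's exponential takes the scale 1/λ, not the rate λ
    "Exponential(0.5)": (rng.exponential(1 / 0.5, n), 1 / 0.5),
}

for name, (samples, theory) in checks.items():
    print(f"{name}: sample mean {samples.mean():.3f} vs theory {theory:.3f}")
```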
Expected Loss in Machine Learning
Training minimizes expected loss over the data distribution:
E[Loss] = Σ L(model(x), y) × P(x, y)
In practice, we approximate with the training set:
Empirical Loss ≈ (1/n) × Σ L(model(xᵢ), yᵢ)
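A toy sketch of the empirical approximation; the model, loss, and data here are all made up just to show the shape of the computation:

```python
def model(x):
    return 2 * x  # stand-in for a trained model

def loss(pred, y):
    return (pred - y) ** 2  # squared error

data = [(1, 2.1), (2, 3.9), (3, 6.2)]  # (x, y) pairs

empirical_loss = sum(loss(model(x), y) for x, y in data) / len(data)
print(empirical_loss)  # sample average approximating E[Loss]
```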
Cross-Entropy Loss
For classification:
E[Cross-Entropy] = -E[log P(correct class)]
Minimizing this means maximizing the probability assigned to correct answers.
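A minimal sketch: average the negative log-probability assigned to the correct class over a small, made-up batch:

```python
import math

# Hypothetical P(correct class) for each example in a batch
p_correct = [0.8, 0.6, 0.95]

cross_entropy = -sum(math.log(p) for p in p_correct) / len(p_correct)
print(cross_entropy)  # higher P(correct class) gives lower loss
```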
Expected Value in Decision Making
The Decision Rule
When choosing between actions, pick the one with highest expected value:
Action A: Win $100 with P=0.3, lose $20 with P=0.7
E[A] = 100×0.3 + (-20)×0.7 = 30 - 14 = $16
Action B: Win $40 with P=0.8, lose $10 with P=0.2
E[B] = 40×0.8 + (-10)×0.2 = 32 - 2 = $30
Choose B for higher expected value.
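The same comparison in code, using the payoffs and probabilities above:

```python
# Each action: list of (payoff, probability) pairs
actions = {
    "A": [(100, 0.3), (-20, 0.7)],
    "B": [(40, 0.8), (-10, 0.2)],
}

def ev(outcomes):
    return sum(payoff * p for payoff, p in outcomes)

for name, outcomes in actions.items():
    print(f"E[{name}] = ${ev(outcomes):.0f}")  # E[A] = $16, E[B] = $30

print("Choose", max(actions, key=lambda a: ev(actions[a])))  # B
```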
Expected Value vs. Risk
Expected value doesn't capture risk:
Option A: Guaranteed $1,000,000
E[A] = $1,000,000
Option B: $10,000,000 with P=0.11, $0 with P=0.89
E[B] = 10,000,000×0.11 + 0×0.89 = $1,100,000
Option B has higher expected value, but most people would choose A!
This is why variance (next lesson) matters.
Sample Mean vs. Expected Value
Expected Value (E[X]): Theoretical average (requires knowing the distribution)
Sample Mean (x̄): Average of observed samples
x̄ = (1/n) × Σ xᵢ
The Law of Large Numbers says: as n → ∞, the sample mean approaches the expected value.
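A quick demonstration for the die: as the sample size grows, the sample mean settles near 3.5.

```python
import random

for n in [10, 100, 10_000, 1_000_000]:
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)  # drifts toward E[X] = 3.5
```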
Computing Expected Values
From a Probability Distribution
```python
def expected_value(values, probabilities):
    """Return E[X] = Σ xᵢ × P(X = xᵢ) for a discrete distribution."""
    return sum(v * p for v, p in zip(values, probabilities))

# Dice example
values = [1, 2, 3, 4, 5, 6]
probs = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
ev = expected_value(values, probs)  # 3.5
```
From Samples (Monte Carlo)
```python
import random

def monte_carlo_expected_value(sample_fn, n_samples=10_000):
    """Estimate E[X] by averaging n_samples draws from sample_fn."""
    return sum(sample_fn() for _ in range(n_samples)) / n_samples

# Example: expected value of a dice roll
ev = monte_carlo_expected_value(lambda: random.randint(1, 6))  # ≈ 3.5
```
Expected Value in Neural Networks
Forward Pass
Each layer computes weighted sums of its inputs. When the weights are nonnegative and sum to 1 (as with softmax attention weights), that sum is exactly an expected value:
output = Σ wᵢ × inputᵢ
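For instance, softmax attention weights form a probability distribution, so the attention output is an expected value of the inputs. A sketch with made-up scores:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([2.0, 1.0, 0.1])  # nonnegative, sums to 1
inputs = [4.0, 1.0, -2.0]           # made-up input values

output = sum(w * x for w, x in zip(weights, inputs))  # E[input] under weights
print(output)
```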
Batch Normalization
Uses sample means and variances (approximating E[x] and Var[x]) to normalize activations, with a small constant ε for numerical stability:
x_normalized = (x - E[x]) / sqrt(Var[x] + ε)
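A minimal sketch of the normalization step, assuming NumPy and omitting batch norm's learned scale and shift parameters:

```python
import numpy as np

def batch_normalize(x, eps=1e-5):
    """Normalize each feature using the batch's sample mean and variance
    (estimates of E[x] and Var[x]); eps avoids division by zero."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

batch = np.array([[1.0, 50.0], [2.0, 60.0], [3.0, 70.0]])
print(batch_normalize(batch))  # each column now has mean ≈ 0, variance ≈ 1
```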
Dropout Expectation
During training with dropout rate p:
- Each neuron is independently "dropped" (set to 0) with probability p
- At inference, no neurons are dropped; instead, every activation is multiplied by the keep probability (1 - p)
This ensures E[training output] = E[inference output].
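A sketch of this classic formulation (many modern frameworks use "inverted dropout," dividing by 1 − p during training instead, but the expectation argument is the same):

```python
import random

p = 0.5  # dropout rate: probability a neuron is dropped

def train_forward(activations):
    # Each activation survives with probability 1 - p, else becomes 0
    return [a if random.random() > p else 0.0 for a in activations]

def inference_forward(activations):
    # Scale by the keep probability so outputs match the training expectation
    return [a * (1 - p) for a in activations]

acts = [2.0, 4.0, 6.0]
print(inference_forward(acts))  # [1.0, 2.0, 3.0], the expected train output
```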
Conditional Expected Value
Expected value given some condition:
E[X | Y = y]
Example: Expected height given someone is a basketball player vs. general population.
Law of Total Expectation
E[X] = Σ E[X | Y = y] × P(Y = y)
The overall expected value is a weighted average of conditional expected values.
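A numeric check of the law, reusing the height example with made-up numbers:

```python
# Hypothetical population: 2% basketball players, 98% everyone else
# Each entry: (P(Y = y), E[X | Y = y]) with heights in cm
groups = [(0.02, 198.0), (0.98, 170.0)]

overall = sum(p_y * e_x_given_y for p_y, e_x_given_y in groups)
print(overall)  # 0.02*198 + 0.98*170 = 170.56 cm
```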
Summary
- Expected value is the average outcome over many repetitions
- Calculate by summing (outcome × probability) for all outcomes
- Expected value is linear: E[X + Y] = E[X] + E[Y]
- AI systems often optimize for maximum expected reward/minimum expected loss
- Sample means approximate expected values (Law of Large Numbers)
- Expected value doesn't capture risk—that's what variance is for
Next, we'll learn about variance and standard deviation—measuring how spread out outcomes are around the expected value.

