Expected Value: Predicting Average Outcomes
Expected value is one of the most practical concepts in probability. It tells us what to expect "on average" from a random process—essential for AI systems that need to make decisions under uncertainty.
What is Expected Value?
The expected value (or expectation) is the average outcome you'd get if you repeated an experiment infinitely many times.
Notation: E[X] or μ
Formula (discrete):
E[X] = Σ xᵢ × P(X = xᵢ)
Multiply each outcome by its probability, then sum.
Simple Example: Dice Roll
For a fair 6-sided die:
E[X] = 1×(1/6) + 2×(1/6) + 3×(1/6) + 4×(1/6) + 5×(1/6) + 6×(1/6)
= (1 + 2 + 3 + 4 + 5 + 6) / 6
= 21/6
= 3.5
The expected value is 3.5—even though you can never actually roll 3.5!
This means: if you roll the die many times, your average will approach 3.5.
Expected Value in AI
Classification Confidence
A classifier outputs probabilities [0.8, 0.15, 0.05] for three classes. Assign a value of 1 to the correct class (the first one here) and 0 to the others.
Expected "correctness":
E[Correct] = 1×0.8 + 0×0.15 + 0×0.05 = 0.8
The model expects to be correct 80% of the time on inputs like this.
Recommendation Systems
Suppose a model predicts the probability of each star rating a user would give a movie. The expected rating:
E[Rating] = 5×0.1 + 4×0.3 + 3×0.4 + 2×0.15 + 1×0.05
= 0.5 + 1.2 + 1.2 + 0.3 + 0.05
= 3.25 stars
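As a quick sketch in code, using the made-up rating distribution above:

```python
# Predicted distribution over star ratings (5 stars down to 1)
ratings = [5, 4, 3, 2, 1]
probs = [0.1, 0.3, 0.4, 0.15, 0.05]

expected_rating = sum(r * p for r, p in zip(ratings, probs))
print(expected_rating)  # ≈ 3.25
```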
Reinforcement Learning
Expected reward from an action:
E[Reward] = Σ reward × P(reward | action)
AI agents choose actions that maximize expected reward.
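A minimal sketch of that rule, with hypothetical reward distributions for two actions:

```python
# Hypothetical: each action maps reward -> P(reward | action)
actions = {
    "explore": {10: 0.2, 0: 0.8},  # E[Reward] = 2.0
    "exploit": {3: 1.0},           # E[Reward] = 3.0
}

def expected_reward(dist):
    return sum(reward * p for reward, p in dist.items())

best_action = max(actions, key=lambda a: expected_reward(actions[a]))
print(best_action)  # "exploit"
```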
Properties of Expected Value
Linearity
The expected value of a sum equals the sum of expected values:
E[X + Y] = E[X] + E[Y]
This works even if X and Y are dependent!
Example: Rolling two dice
E[Sum] = E[Die1] + E[Die2] = 3.5 + 3.5 = 7
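You can check this with a quick simulation; the sample means approximate the expectations:

```python
import random

n = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]

mean_die1 = sum(d1 for d1, _ in rolls) / n
mean_die2 = sum(d2 for _, d2 in rolls) / n
mean_sum = sum(d1 + d2 for d1, d2 in rolls) / n

print(mean_sum, mean_die1 + mean_die2)  # both ≈ 7.0
```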
Scaling
E[aX] = a × E[X]
E[X + b] = E[X] + b
E[aX + b] = a × E[X] + b
Example: If E[X] = 10, then E[3X + 5] = 3(10) + 5 = 35
Non-Linearity Warning
For non-linear functions, the expectation cannot be moved inside the function:
E[X²] ≠ (E[X])² (in general)
E[log(X)] ≠ log(E[X])
This matters for loss functions: for a non-linear loss, the expected loss is not the loss of the expected prediction (Jensen's inequality).
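A numeric check with the fair die makes the gap concrete:

```python
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

e_x = sum(v * p for v in values)             # E[X] = 3.5
e_x_squared = sum(v**2 * p for v in values)  # E[X²] = 91/6

print(e_x_squared)  # ≈ 15.17
print(e_x ** 2)     # 12.25, not equal!
```

The difference between these two numbers, E[X²] − (E[X])², is exactly the variance, which the next lesson covers.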
Expected Value of Common Distributions
| Distribution | Expected Value |
|---|---|
| Bernoulli(p) | p |
| Binomial(n, p) | n × p |
| Geometric(p) | 1/p |
| Poisson(λ) | λ |
| Uniform(a, b) | (a + b) / 2 |
| Normal(μ, σ²) | μ |
| Exponential(λ) | 1/λ |
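One way to sanity-check this table is to sample from each distribution and compare sample means against the formulas. A sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# (samples, theoretical expected value)
checks = {
    "Bernoulli(0.3)": (rng.binomial(1, 0.3, n), 0.3),
    "Binomial(10, 0.3)": (rng.binomial(10, 0.3, n), 10 * 0.3),
    "Geometric(0.3)": (rng.geometric(0.3, n), 1 / 0.3),
    "Poisson(4)": (rng.poisson(4, n), 4.0),
    "Uniform(2, 8)": (rng.uniform(2, 8, n), (2 + 8) / 2),
    "Normal(1, 2²)": (rng.normal(1, 2, n), 1.0),
    # NumPy's exponential takes the scale 1/λ, not the rate λ
    "Exponential(0.5)": (rng.exponential(1 / 0.5, n), 1 / 0.5),
}

for name, (samples, theory) in checks.items():
    print(f"{name}: sample mean {samples.mean():.3f} vs theory {theory:.3f}")
```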
Expected Loss in Machine Learning
Training minimizes expected loss over the data distribution:
E[Loss] = Σ L(model(x), y) × P(x, y)
In practice, we approximate with the training set:
Empirical Loss ≈ (1/n) × Σ L(model(xᵢ), yᵢ)
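A toy sketch of the empirical approximation; the model, loss, and data here are all made up just to show the shape of the computation:

```python
def model(x):
    return 2 * x  # stand-in for a trained model

def loss(pred, y):
    return (pred - y) ** 2  # squared error

data = [(1, 2.1), (2, 3.9), (3, 6.2)]  # (x, y) pairs

empirical_loss = sum(loss(model(x), y) for x, y in data) / len(data)
print(empirical_loss)  # sample average approximating E[Loss]
```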
Cross-Entropy Loss
For classification:
E[Cross-Entropy] = -E[log P(correct class)]
Minimizing this means maximizing the probability assigned to correct answers.
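A minimal sketch: average the negative log-probability assigned to the correct class over a small, made-up batch:

```python
import math

# Hypothetical P(correct class) for each example in a batch
p_correct = [0.8, 0.6, 0.95]

cross_entropy = -sum(math.log(p) for p in p_correct) / len(p_correct)
print(cross_entropy)  # higher P(correct class) gives lower loss
```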
Expected Value in Decision Making
The Decision Rule
When choosing between actions, pick the one with highest expected value:
Action A: Win $100 with P=0.3, lose $20 with P=0.7
E[A] = 100×0.3 + (-20)×0.7 = 30 - 14 = $16
Action B: Win $40 with P=0.8, lose $10 with P=0.2
E[B] = 40×0.8 + (-10)×0.2 = 32 - 2 = $30
Choose B for higher expected value.
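The same comparison in code, using the payoffs and probabilities above:

```python
# Each action: list of (payoff, probability) pairs
actions = {
    "A": [(100, 0.3), (-20, 0.7)],
    "B": [(40, 0.8), (-10, 0.2)],
}

def ev(outcomes):
    return sum(payoff * p for payoff, p in outcomes)

for name, outcomes in actions.items():
    print(f"E[{name}] = ${ev(outcomes):.0f}")  # E[A] = $16, E[B] = $30

print("Choose", max(actions, key=lambda a: ev(actions[a])))  # B
```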
Expected Value vs. Risk
Expected value doesn't capture risk:
Option A: Guaranteed $1,000,000
E[A] = $1,000,000
Option B: $10,000,000 with P=0.11, $0 with P=0.89
E[B] = 10,000,000×0.11 + 0×0.89 = $1,100,000
Option B has higher expected value, but most people would choose A!
This is why variance (next lesson) matters.
Sample Mean vs. Expected Value
Expected Value (E[X]): Theoretical average (requires knowing the distribution)
Sample Mean (x̄): Average of observed samples
x̄ = (1/n) × Σ xᵢ
The Law of Large Numbers says: as n → ∞, the sample mean approaches the expected value.
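A quick demonstration for the die: as the sample size grows, the sample mean settles near 3.5.

```python
import random

for n in [10, 100, 10_000, 1_000_000]:
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)  # drifts toward E[X] = 3.5
```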
Computing Expected Values
From a Probability Distribution
```python
def expected_value(values, probabilities):
    """Return E[X] = Σ xᵢ × P(X = xᵢ) for a discrete distribution."""
    return sum(v * p for v, p in zip(values, probabilities))

# Dice example
values = [1, 2, 3, 4, 5, 6]
probs = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
ev = expected_value(values, probs)  # 3.5
```
From Samples (Monte Carlo)
```python
import random

def monte_carlo_expected_value(sample_fn, n_samples=10_000):
    """Estimate E[X] by averaging n_samples draws from sample_fn."""
    return sum(sample_fn() for _ in range(n_samples)) / n_samples

# Example: expected value of a dice roll
ev = monte_carlo_expected_value(lambda: random.randint(1, 6))  # ≈ 3.5
```
Expected Value in Neural Networks
Forward Pass
Each layer computes weighted sums of its inputs. When the weights are nonnegative and sum to 1 (as with softmax attention weights), that sum is exactly an expected value:
output = Σ wᵢ × inputᵢ
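For instance, softmax attention weights form a probability distribution, so the attention output is an expected value of the inputs. A sketch with made-up scores:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([2.0, 1.0, 0.1])  # nonnegative, sums to 1
inputs = [4.0, 1.0, -2.0]           # made-up input values

output = sum(w * x for w, x in zip(weights, inputs))  # E[input] under weights
print(output)
```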
Batch Normalization
Uses sample means and variances (approximating E[x] and Var[x]) to normalize activations, with a small constant ε for numerical stability:
x_normalized = (x - E[x]) / sqrt(Var[x] + ε)
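A minimal sketch of the normalization step, assuming NumPy and omitting batch norm's learned scale and shift parameters:

```python
import numpy as np

def batch_normalize(x, eps=1e-5):
    """Normalize each feature using the batch's sample mean and variance
    (estimates of E[x] and Var[x]); eps avoids division by zero."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

batch = np.array([[1.0, 50.0], [2.0, 60.0], [3.0, 70.0]])
print(batch_normalize(batch))  # each column now has mean ≈ 0, variance ≈ 1
```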
Dropout Expectation
During training with dropout rate p:
- Each neuron is independently "dropped" (set to 0) with probability p
- At inference, no neurons are dropped; instead, every activation is multiplied by the keep probability (1 - p)
This ensures E[training output] = E[inference output].
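A sketch of this classic formulation (many modern frameworks use "inverted dropout," dividing by 1 − p during training instead, but the expectation argument is the same):

```python
import random

p = 0.5  # dropout rate: probability a neuron is dropped

def train_forward(activations):
    # Each activation survives with probability 1 - p, else becomes 0
    return [a if random.random() > p else 0.0 for a in activations]

def inference_forward(activations):
    # Scale by the keep probability so outputs match the training expectation
    return [a * (1 - p) for a in activations]

acts = [2.0, 4.0, 6.0]
print(inference_forward(acts))  # [1.0, 2.0, 3.0], the expected train output
```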
Conditional Expected Value
Expected value given some condition:
E[X | Y = y]
Example: Expected height given someone is a basketball player vs. general population.
Law of Total Expectation
E[X] = Σ E[X | Y = y] × P(Y = y)
The overall expected value is a weighted average of conditional expected values.
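A numeric check of the law, reusing the height example with made-up numbers:

```python
# Hypothetical population: 2% basketball players, 98% everyone else
# Each entry: (P(Y = y), E[X | Y = y]) with heights in cm
groups = [(0.02, 198.0), (0.98, 170.0)]

overall = sum(p_y * e_x_given_y for p_y, e_x_given_y in groups)
print(overall)  # 0.02*198 + 0.98*170 = 170.56 cm
```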
Summary
- Expected value is the average outcome over many repetitions
- Calculate by summing (outcome × probability) for all outcomes
- Expected value is linear: E[X + Y] = E[X] + E[Y]
- AI systems often optimize for maximum expected reward/minimum expected loss
- Sample means approximate expected values (Law of Large Numbers)
- Expected value doesn't capture risk—that's what variance is for
Next, we'll learn about variance and standard deviation—measuring how spread out outcomes are around the expected value.

