Events and Sample Spaces
Before we can calculate probabilities, we need to define what we're measuring. This is where sample spaces and events come in—the fundamental building blocks of probability theory.
Sample Space: All Possibilities
The sample space (denoted Ω or S) is the set of all possible outcomes of an experiment or situation.
Examples
Coin flip:
S = {Heads, Tails}
Rolling a die:
S = {1, 2, 3, 4, 5, 6}
Language model predicting the next word after "The cat":
S = {sat, ran, slept, jumped, meowed, is, was, ...}
// Includes every word in the vocabulary (often 50,000+ tokens)
Image classifier output:
S = {cat, dog, bird, car, person, ...}
// Includes every class the model can predict
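Sample spaces map naturally onto Python sets. Here's a minimal sketch; the classifier class names are illustrative placeholders, not any particular model's label set:

```python
# Sample spaces represented as Python sets of outcomes
coin_flip = {"Heads", "Tails"}
die_roll = {1, 2, 3, 4, 5, 6}

# Illustrative stand-in for an image classifier's output space
classifier_classes = {"cat", "dog", "bird", "car", "person"}

print(len(coin_flip))           # 2
print(len(die_roll))            # 6
print(len(classifier_classes))  # 5
```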
Events: Subsets of Outcomes
An event is any subset of the sample space—a collection of outcomes we're interested in.
Simple Events
A simple event is a single outcome:
- Event A: "Rolling a 3"
- Event B: "The model predicts 'cat'"
Compound Events
A compound event combines multiple outcomes:
- Event C: "Rolling an even number" = {2, 4, 6}
- Event D: "Model predicts an animal" = {cat, dog, bird, fish, ...}
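Because events are just subsets of the sample space, Python's subset operator checks them directly; a small sketch using the die-roll example:

```python
sample_space = {1, 2, 3, 4, 5, 6}   # die roll

event_a = {3}                       # simple event: "rolling a 3"
event_c = {2, 4, 6}                 # compound event: "rolling an even number"

# Every event must be a subset of the sample space
print(event_a <= sample_space)      # True
print(event_c <= sample_space)      # True
```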
Visualizing Sample Spaces
For small sample spaces, we can list all outcomes. For AI applications, sample spaces are often enormous:
| Application | Sample Space Size |
|---|---|
| Coin flip | 2 |
| Die roll | 6 |
| Deck of cards | 52 |
| MNIST digit classification | 10 |
| ImageNet classification | 1,000 |
| GPT vocabulary | ~50,000 |
| Possible chess games | ~10^120 |
Probability Over Sample Spaces
Every outcome in the sample space has a probability, and all probabilities must sum to 1:
∑ P(outcome) = 1
for all outcomes in S
Fair Die Example
For a fair die:
- P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
- Sum: 6 × (1/6) = 1 ✓
Biased Classifier Example
A sentiment classifier might have:
- P(positive) = 0.45
- P(negative) = 0.35
- P(neutral) = 0.20
- Sum: 0.45 + 0.35 + 0.20 = 1 ✓
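A quick way to sanity-check any discrete distribution is to sum its probabilities. Here's a minimal sketch covering both examples above:

```python
import math

fair_die = {face: 1 / 6 for face in range(1, 7)}
sentiment = {"positive": 0.45, "negative": 0.35, "neutral": 0.20}

# Both distributions must sum to 1; isclose absorbs floating-point rounding
print(math.isclose(sum(fair_die.values()), 1.0))   # True
print(math.isclose(sum(sentiment.values()), 1.0))  # True
```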
Event Operations
We can combine events using set operations:
Union (OR): A ∪ B
All outcomes in A or B (or both).
Example: "Rolling a 1 OR rolling a 2" = {1, 2}
Intersection (AND): A ∩ B
Outcomes in both A and B.
Example: "Rolling an even number AND rolling less than 4"
- Even = {2, 4, 6}
- Less than 4 = {1, 2, 3}
- Intersection = {2}
Complement (NOT): A'
All outcomes not in A.
Example: "Not rolling a 6" = {1, 2, 3, 4, 5}
Mutually Exclusive Events
Events are mutually exclusive (or disjoint) if they can't both happen:
- Rolling a 1 and rolling a 6 are mutually exclusive
- A classifier predicting "cat" and predicting "dog" for the same image are mutually exclusive events (assuming single-label classification)
For mutually exclusive events:
P(A or B) = P(A) + P(B)
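Disjointness is easy to verify with set intersection, and for disjoint events the addition rule applies directly; a sketch assuming a fair die:

```python
fair_die = {face: 1 / 6 for face in range(1, 7)}

a = {1}  # "rolling a 1"
b = {6}  # "rolling a 6"

# Mutually exclusive: the events share no outcomes
print(a & b == set())                       # True

# Addition rule for disjoint events: P(A or B) = P(A) + P(B)
p_a_or_b = sum(fair_die[o] for o in a | b)
print(p_a_or_b)                             # 0.333... (= 1/6 + 1/6)
```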
Events in AI Systems
Text Classification
Sample space: All possible class labels
S = {spam, not_spam}
Events might include:
- A: "Email is classified as spam"
- B: "Confidence > 0.9"
Object Detection
Sample space: All possible combinations of (class, bounding_box)
The space is infinite because bounding boxes have continuous coordinates!
Generative Models
For a language model generating a 10-word sentence with a 50,000-word vocabulary:
Sample space size = 50,000^10 ≈ 10^47 possible sentences!
This is why sampling strategies (which we'll cover later) are so important.
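Python's arbitrary-precision integers make it easy to verify that estimate:

```python
vocab_size = 50_000
sentence_length = 10

num_sentences = vocab_size ** sentence_length
print(f"{num_sentences:.2e}")   # 9.77e+46, i.e. roughly 10^47
```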
Calculating Event Probabilities
The probability of an event is the sum of probabilities of all outcomes in that event:
P(A) = ∑ P(outcome)
for all outcomes in A
Example: Even Number on a Die
Event A = "Rolling an even number" = {2, 4, 6}
P(A) = P(2) + P(4) + P(6)
= 1/6 + 1/6 + 1/6
= 3/6 = 0.5
Example: High-Confidence Prediction
If a model's probability distribution for classifying an image is:
- P(cat) = 0.75
- P(dog) = 0.15
- P(other) = 0.10
Event B = "Model predicts cat or dog" = {cat, dog}
P(B) = P(cat) + P(dog) = 0.75 + 0.15 = 0.90
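Both calculations reduce to summing outcome probabilities over the event; a minimal helper sketch:

```python
def event_probability(distribution, event):
    """P(A): sum the probabilities of every outcome in the event."""
    return sum(distribution[outcome] for outcome in event)

fair_die = {face: 1 / 6 for face in range(1, 7)}
image_probs = {"cat": 0.75, "dog": 0.15, "other": 0.10}

print(event_probability(fair_die, {2, 4, 6}))          # 0.5 (up to float rounding)
print(event_probability(image_probs, {"cat", "dog"}))  # 0.9
```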
Sample Spaces in Neural Networks
The final layer of a classification neural network produces a probability distribution over the sample space of classes.
For a 3-class classifier:
Input Image → Neural Network → [0.7, 0.2, 0.1]
↓ ↓ ↓
cat dog bird
This output is a probability distribution over the sample space {cat, dog, bird}.
The softmax function (which we'll explore in Module 3) ensures:
- All values are between 0 and 1
- All values sum to 1
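As a preview, here is a minimal softmax sketch in plain Python; the logits are illustrative, chosen to reproduce the distribution above:

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into a probability distribution."""
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.75, 0.06])            # illustrative logits
print([round(p, 2) for p in probs])           # [0.7, 0.2, 0.1]
print(sum(probs))                             # 1.0 (up to float rounding)
```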
Summary
- The sample space contains all possible outcomes
- An event is a subset of outcomes we care about
- Probabilities across the sample space must sum to 1
- Events can be combined with union, intersection, and complement
- AI systems define sample spaces based on their possible outputs
- Neural networks output probability distributions over their sample space
Next, we'll explore conditional probability—how the probability of one event changes given knowledge of another.

