Events and Sample Spaces
Before we can calculate probabilities, we need to define what we're measuring. This is where sample spaces and events come in—the fundamental building blocks of probability theory.
Sample Space: All Possibilities
The sample space (denoted Ω or S) is the set of all possible outcomes of an experiment or situation.
Examples
Coin flip:
S = {Heads, Tails}
Rolling a die:
S = {1, 2, 3, 4, 5, 6}
Language model predicting the next word after "The cat":
S = {sat, ran, slept, jumped, meowed, is, was, ...}
// Includes every word in the vocabulary (often 50,000+ tokens)
Image classifier output:
S = {cat, dog, bird, car, person, ...}
// Includes every class the model can predict
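Sample spaces map naturally onto Python sets. Here's a minimal sketch; the classifier class names are illustrative placeholders, not any particular model's label set:

```python
# Sample spaces represented as Python sets of outcomes
coin_flip = {"Heads", "Tails"}
die_roll = {1, 2, 3, 4, 5, 6}

# Illustrative stand-in for an image classifier's output space
classifier_classes = {"cat", "dog", "bird", "car", "person"}

print(len(coin_flip))           # 2
print(len(die_roll))            # 6
print(len(classifier_classes))  # 5
```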
Events: Subsets of Outcomes
An event is any subset of the sample space—a collection of outcomes we're interested in.
Simple Events
A simple event is a single outcome:
- Event A: "Rolling a 3"
- Event B: "The model predicts 'cat'"
Compound Events
A compound event combines multiple outcomes:
- Event C: "Rolling an even number" = {2, 4, 6}
- Event D: "Model predicts an animal" = {cat, dog, bird, fish, ...}
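Because events are just subsets of the sample space, Python's subset operator checks them directly; a small sketch using the die-roll example:

```python
sample_space = {1, 2, 3, 4, 5, 6}   # die roll

event_a = {3}                       # simple event: "rolling a 3"
event_c = {2, 4, 6}                 # compound event: "rolling an even number"

# Every event must be a subset of the sample space
print(event_a <= sample_space)      # True
print(event_c <= sample_space)      # True
```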
Visualizing Sample Spaces
For small sample spaces, we can list all outcomes. For AI applications, sample spaces are often enormous:
| Application | Sample Space Size |
|---|---|
| Coin flip | 2 |
| Die roll | 6 |
| Deck of cards | 52 |
| MNIST digit classification | 10 |
| ImageNet classification | 1,000 |
| GPT vocabulary | ~50,000 |
| Possible chess games | ~10^120 |
Probability Over Sample Spaces
Every outcome in the sample space has a probability, and all probabilities must sum to 1:
∑ P(outcome) = 1
for all outcomes in S
Fair Die Example
For a fair die:
- P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
- Sum: 6 × (1/6) = 1 ✓
Biased Classifier Example
A sentiment classifier might have:
- P(positive) = 0.45
- P(negative) = 0.35
- P(neutral) = 0.20
- Sum: 0.45 + 0.35 + 0.20 = 1 ✓
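A quick way to sanity-check any discrete distribution is to sum its probabilities. Here's a minimal sketch covering both examples above:

```python
import math

fair_die = {face: 1 / 6 for face in range(1, 7)}
sentiment = {"positive": 0.45, "negative": 0.35, "neutral": 0.20}

# Both distributions must sum to 1; isclose absorbs floating-point rounding
print(math.isclose(sum(fair_die.values()), 1.0))   # True
print(math.isclose(sum(sentiment.values()), 1.0))  # True
```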
Event Operations
We can combine events using set operations:
Union (OR): A ∪ B
All outcomes in A or B (or both).
Example: "Rolling a 1 OR rolling a 2" = {1, 2}
Intersection (AND): A ∩ B
Outcomes in both A and B.
Example: "Rolling an even number AND rolling less than 4"
- Even = {2, 4, 6}
- Less than 4 = {1, 2, 3}
- Intersection = {2}
Complement (NOT): A'
All outcomes not in A.
Example: "Not rolling a 6" = {1, 2, 3, 4, 5}
Mutually Exclusive Events
Events are mutually exclusive (or disjoint) if they can't both happen:
- Rolling a 1 and rolling a 6 are mutually exclusive
- A classifier predicting "cat" and predicting "dog" for the same image are mutually exclusive events (assuming single-label classification)
For mutually exclusive events:
P(A or B) = P(A) + P(B)
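Disjointness is easy to verify with set intersection, and for disjoint events the addition rule applies directly; a sketch assuming a fair die:

```python
fair_die = {face: 1 / 6 for face in range(1, 7)}

a = {1}  # "rolling a 1"
b = {6}  # "rolling a 6"

# Mutually exclusive: the events share no outcomes
print(a & b == set())                       # True

# Addition rule for disjoint events: P(A or B) = P(A) + P(B)
p_a_or_b = sum(fair_die[o] for o in a | b)
print(p_a_or_b)                             # 0.333... (= 1/6 + 1/6)
```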
Events in AI Systems
Text Classification
Sample space: All possible class labels
S = {spam, not_spam}
Events might include:
- A: "Email is classified as spam"
- B: "Confidence > 0.9"
Object Detection
Sample space: All possible combinations of (class, bounding_box)
The space is infinite because bounding boxes have continuous coordinates!
Generative Models
For a language model generating a 10-word sentence with a 50,000-word vocabulary:
Sample space size = 50,000^10 ≈ 10^47 possible sentences!
This is why sampling strategies (which we'll cover later) are so important.
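Python's arbitrary-precision integers make it easy to verify that estimate:

```python
vocab_size = 50_000
sentence_length = 10

num_sentences = vocab_size ** sentence_length
print(f"{num_sentences:.2e}")   # 9.77e+46, i.e. roughly 10^47
```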
Calculating Event Probabilities
The probability of an event is the sum of probabilities of all outcomes in that event:
P(A) = ∑ P(outcome)
for all outcomes in A
Example: Even Number on a Die
Event A = "Rolling an even number" = {2, 4, 6}
P(A) = P(2) + P(4) + P(6)
= 1/6 + 1/6 + 1/6
= 3/6 = 0.5
Example: High-Confidence Prediction
If a model's probability distribution for classifying an image is:
- P(cat) = 0.75
- P(dog) = 0.15
- P(other) = 0.10
Event B = "Model predicts cat or dog" = {cat, dog}
P(B) = P(cat) + P(dog) = 0.75 + 0.15 = 0.90
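Both calculations reduce to summing outcome probabilities over the event; a minimal helper sketch:

```python
def event_probability(distribution, event):
    """P(A): sum the probabilities of every outcome in the event."""
    return sum(distribution[outcome] for outcome in event)

fair_die = {face: 1 / 6 for face in range(1, 7)}
image_probs = {"cat": 0.75, "dog": 0.15, "other": 0.10}

print(event_probability(fair_die, {2, 4, 6}))          # 0.5 (up to float rounding)
print(event_probability(image_probs, {"cat", "dog"}))  # 0.9
```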
Sample Spaces in Neural Networks
The final layer of a classification neural network produces a probability distribution over the sample space of classes.
For a 3-class classifier:
Input Image → Neural Network → [0.7, 0.2, 0.1]
↓ ↓ ↓
cat dog bird
This output is a probability distribution over the sample space {cat, dog, bird}.
The softmax function (which we'll explore in Module 3) ensures:
- All values are between 0 and 1
- All values sum to 1
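As a preview, here is a minimal softmax sketch in plain Python; the logits are illustrative, chosen to reproduce the distribution above:

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into a probability distribution."""
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.75, 0.06])            # illustrative logits
print([round(p, 2) for p in probs])           # [0.7, 0.2, 0.1]
print(sum(probs))                             # 1.0 (up to float rounding)
```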
Summary
- The sample space contains all possible outcomes
- An event is a subset of outcomes we care about
- Probabilities across the sample space must sum to 1
- Events can be combined with union, intersection, and complement
- AI systems define sample spaces based on their possible outputs
- Neural networks output probability distributions over their sample space
Next, we'll explore conditional probability—how the probability of one event changes given knowledge of another.

