From Scalars to Tensors
Throughout this course, you have worked with scalars, vectors, and matrices. These are all special cases of a more general structure called a tensor. Tensors are the core data structure of modern deep learning: every input, output, weight, and gradient in a neural network is stored as a tensor. Understanding tensors means understanding the language that AI frameworks speak.
Building Up: Scalars, Vectors, Matrices, and Beyond
You already know the building blocks. A tensor is simply the generalization of these familiar objects to any number of dimensions.
| Structure | Rank (Order) | Dimensions | Example |
|---|---|---|---|
| Scalar | 0 | 0D | A single number: 7.5 |
| Vector | 1 | 1D | A list of numbers: [3, 7, 2] |
| Matrix | 2 | 2D | A grid of numbers: 3x4 table |
| 3D Tensor | 3 | 3D | A cube of numbers |
| 4D Tensor | 4 | 4D | A batch of cubes |
| nD Tensor | n | nD | An n-dimensional array |
The rank (also called order) of a tensor is the number of dimensions, or axes, it has. A scalar has rank 0, a vector has rank 1, a matrix has rank 2, and so on. The shape describes the size along each axis.
Shape and Rank in Practice
Every tensor has a shape, which is a tuple of integers describing how many elements exist along each axis.
- Scalar: shape = (), rank = 0 (just the number 5.0)
- Vector: shape = (3,), rank = 1 ([3, 7, 2])
- Matrix: shape = (3, 4), rank = 2 (3 rows, 4 columns)
- 3D Tensor: shape = (2, 3, 4), rank = 3 (2 layers of 3x4 matrices)
In Python, you can always check a tensor's shape: array.shape in NumPy, tensor.shape in both PyTorch and TensorFlow. The shape tells you everything about the structure of your data.
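To make these shapes concrete, here is a minimal NumPy sketch that builds one tensor of each rank and inspects its shape and ndim (NumPy's name for rank):

```python
import numpy as np

scalar = np.array(5.0)         # rank 0
vector = np.array([3, 7, 2])   # rank 1
matrix = np.zeros((3, 4))      # rank 2
cube = np.zeros((2, 3, 4))     # rank 3

print(scalar.shape, scalar.ndim)  # () 0
print(vector.shape, vector.ndim)  # (3,) 1
print(matrix.shape, matrix.ndim)  # (3, 4) 2
print(cube.shape, cube.ndim)      # (2, 3, 4) 3
```

Note that a rank-1 tensor of length 3 has shape (3,), a one-element tuple, not the bare number 3.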
3D Tensors: Batches of Data
A 3D tensor is a stack of matrices. The most common use in AI is a batch of data samples.
Grayscale image batch: shape = (batch_size, height, width)
Example: shape = (32, 28, 28)
This represents 32 grayscale images, each 28x28 pixels (like the MNIST handwritten digit dataset). Each "slice" along the first axis is one image matrix.
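A short NumPy sketch of such a batch, showing that indexing along the first axis recovers a single image matrix:

```python
import numpy as np

# A batch of 32 grayscale 28x28 images (MNIST-sized), filled with random pixels
batch = np.random.rand(32, 28, 28)

# Slicing along the first (batch) axis yields one image: a rank-2 tensor
first_image = batch[0]
print(first_image.shape)  # (28, 28)
```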
4D Tensors: Color Images
Color images add a channels dimension for red, green, and blue.
Color image batch: shape = (batch, channels, height, width)
Example: shape = (32, 3, 224, 224)
This represents 32 color images, each with 3 color channels (RGB) and a resolution of 224x224 pixels. This channels-first layout is the PyTorch convention and the standard input shape for image classification models like ResNet; TensorFlow defaults to channels-last, i.e., (batch, height, width, channels).
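Indexing the channel axis of such a batch pulls out a single color plane. A small NumPy sketch using the channels-first layout:

```python
import numpy as np

# 32 RGB images, 224x224, channels-first: (batch, channels, height, width)
images = np.zeros((32, 3, 224, 224))

# Fixing the channel axis to 0 selects the red plane of every image
red_channel = images[:, 0]
print(red_channel.shape)  # (32, 224, 224)
```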
5D Tensors: Video Data
Video adds a frames (time) dimension on top of color images.
Video batch: shape = (batch, frames, channels, height, width)
Example: shape = (8, 16, 3, 112, 112)
This represents 8 video clips, each containing 16 frames of 3-channel color images at 112x112 resolution. Video understanding models such as SlowFast operate on 5D tensors of this kind, though the exact axis order varies: PyTorch video models typically place channels before frames, i.e., (batch, channels, frames, height, width).
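Peeling off axes one at a time shows how the 5D structure nests: a clip is a 4D tensor, and a frame within a clip is a 3D color image. A NumPy sketch using the (batch, frames, channels, height, width) layout from the example above:

```python
import numpy as np

# 8 clips x 16 frames x 3 channels x 112x112 pixels
videos = np.zeros((8, 16, 3, 112, 112))

clip = videos[0]       # one clip: (16, 3, 112, 112)
frame = videos[0, 5]   # frame 5 of clip 0: (3, 112, 112)
print(clip.shape, frame.shape)
```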
Tensor Terminology
A few terms you will encounter frequently:
- Axis (or dimension): One of the indices needed to locate an element. A 3D tensor has 3 axes.
- Shape: The tuple of sizes along each axis, e.g., (32, 3, 224, 224).
- Rank: The total number of axes. Also called order or ndim (number of dimensions).
- Element: A single number within the tensor, accessed by specifying one index per axis.
- Slice: A sub-tensor obtained by fixing one or more axes to specific values.
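The terms above map directly onto indexing syntax. A minimal NumPy sketch on a rank-3 tensor:

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)  # rank-3 tensor, shape (2, 3, 4)

t.ndim            # rank: 3 axes
t.shape           # shape: (2, 3, 4)
element = t[1, 2, 3]  # one index per axis -> a single number (23)
sub = t[0]            # fix the first axis -> a slice of shape (3, 4)
print(element, sub.shape)
```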
Why AI Needs Higher Dimensions
Neural networks do not process one data sample at a time. They process batches for efficiency. This immediately adds one dimension. Then the data itself may be multi-dimensional (images have height, width, and channels). Combine these and you quickly reach 4D or 5D tensors.
- Batching: Processing 32 or 64 samples at once for faster GPU utilization
- Channels: Color images have RGB channels; feature maps in CNNs have dozens or hundreds of channels
- Sequences: Text data has a sequence length dimension (one position per token)
- Attention heads: Transformers split representations across multiple attention heads, adding yet another dimension
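The last two bullets can be sketched with hypothetical but typical sizes (batch of 2, sequence length 10, model width 64, 8 heads; these numbers are illustrative, not from any specific model). Splitting the embedding axis across heads adds a dimension:

```python
import numpy as np

batch, seq_len, d_model, n_heads = 2, 10, 64, 8

# Token embeddings: one vector of width d_model per token position
x = np.zeros((batch, seq_len, d_model))

# Split the embedding axis into (heads, per-head width): one extra axis
heads = x.reshape(batch, seq_len, n_heads, d_model // n_heads)
print(heads.shape)  # (2, 10, 8, 8)
```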
AI Frameworks and Tensors
All major AI frameworks are built around tensor operations.
| Framework | Tensor Type | Example Creation |
|---|---|---|
| NumPy | ndarray | np.zeros((3, 4, 5)) |
| PyTorch | Tensor | torch.zeros(3, 4, 5) |
| TensorFlow | Tensor | tf.zeros((3, 4, 5)) |
These libraries provide optimized operations on tensors that run on CPUs and GPUs. When you hear "tensor" in the context of TensorFlow or PyTorch, it means exactly the multi-dimensional array we have been describing -- the generalization of vectors and matrices to any number of dimensions.
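The three creation calls from the table produce equivalent tensors. A NumPy sketch, with the PyTorch and TensorFlow counterparts shown as comments (they require those libraries to be installed):

```python
import numpy as np

t = np.zeros((3, 4, 5))      # NumPy ndarray of zeros
print(t.shape)               # (3, 4, 5)

# Equivalent in the other frameworks:
#   torch.zeros(3, 4, 5)     # PyTorch takes sizes as separate arguments
#   tf.zeros((3, 4, 5))      # TensorFlow takes a shape tuple, like NumPy
```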
Summary
- Scalars, vectors, and matrices are tensors of rank 0, 1, and 2 respectively
- A tensor is a multi-dimensional array of numbers with a defined shape and rank
- 3D tensors commonly represent batches of data (e.g., a batch of grayscale images)
- 4D tensors represent batches of multi-channel data (e.g., color image batches)
- 5D tensors represent batches of sequential multi-channel data (e.g., video)
- AI needs higher-dimensional tensors because of batching, channels, sequences, and attention heads
- PyTorch, TensorFlow, and NumPy are all built around efficient tensor operations
Now that you understand tensor structure, the next lesson covers how to manipulate tensors -- reshaping, broadcasting, slicing, and the operations that make deep learning possible.

