From Scalars to Tensors
Throughout this course, you have worked with scalars, vectors, and matrices. These are all special cases of a more general structure called a tensor. Tensors are the core data structure of modern deep learning: every input, output, weight, and gradient in a neural network is stored as a tensor. Understanding tensors means understanding the language that AI frameworks speak.
Building Up: Scalars, Vectors, Matrices, and Beyond
You already know the building blocks. A tensor is simply the generalization of these familiar objects to any number of dimensions.
| Structure | Rank (Order) | Dimensions | Example |
|---|---|---|---|
| Scalar | 0 | 0D | A single number: 7.5 |
| Vector | 1 | 1D | A list of numbers: [3, 7, 2] |
| Matrix | 2 | 2D | A grid of numbers: 3x4 table |
| 3D Tensor | 3 | 3D | A cube of numbers |
| 4D Tensor | 4 | 4D | A batch of cubes |
| nD Tensor | n | nD | An n-dimensional array |
The rank (also called order) of a tensor is the number of dimensions, or axes, it has. A scalar has rank 0, a vector has rank 1, a matrix has rank 2, and so on. The shape describes the size along each axis.
Shape and Rank in Practice
Every tensor has a shape, which is a tuple of integers describing how many elements exist along each axis.
- Scalar: shape = (), rank = 0 (just the number 5.0)
- Vector: shape = (3,), rank = 1 ([3, 7, 2])
- Matrix: shape = (3, 4), rank = 2 (3 rows, 4 columns)
- 3D Tensor: shape = (2, 3, 4), rank = 3 (2 layers of 3x4 matrices)
In Python, you can always check a tensor's shape: array.shape in NumPy, tensor.shape in both PyTorch and TensorFlow. The shape tells you everything about the structure of your data.
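To make these shapes concrete, here is a minimal NumPy sketch that builds one tensor of each rank and inspects its shape and ndim (NumPy's name for rank):

```python
import numpy as np

scalar = np.array(5.0)         # rank 0
vector = np.array([3, 7, 2])   # rank 1
matrix = np.zeros((3, 4))      # rank 2
cube = np.zeros((2, 3, 4))     # rank 3

print(scalar.shape, scalar.ndim)  # () 0
print(vector.shape, vector.ndim)  # (3,) 1
print(matrix.shape, matrix.ndim)  # (3, 4) 2
print(cube.shape, cube.ndim)      # (2, 3, 4) 3
```

Note that a rank-1 tensor of length 3 has shape (3,), a one-element tuple, not the bare number 3.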
3D Tensors: Batches of Data
A 3D tensor is a stack of matrices. The most common use in AI is a batch of data samples.
Grayscale image batch: shape = (batch_size, height, width)
Example: shape = (32, 28, 28)
This represents 32 grayscale images, each 28x28 pixels (like the MNIST handwritten digit dataset). Each "slice" along the first axis is one image matrix.
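A short NumPy sketch of such a batch, showing that indexing along the first axis recovers a single image matrix:

```python
import numpy as np

# A batch of 32 grayscale 28x28 images (MNIST-sized), filled with random pixels
batch = np.random.rand(32, 28, 28)

# Slicing along the first (batch) axis yields one image: a rank-2 tensor
first_image = batch[0]
print(first_image.shape)  # (28, 28)
```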
4D Tensors: Color Images
Color images add a channels dimension for red, green, and blue.
Color image batch: shape = (batch, channels, height, width)
Example: shape = (32, 3, 224, 224)
This represents 32 color images, each with 3 color channels (RGB) and a resolution of 224x224 pixels. This channels-first layout is the PyTorch convention and the standard input shape for image classification models like ResNet; TensorFlow defaults to channels-last, i.e., (batch, height, width, channels).
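Indexing the channel axis of such a batch pulls out a single color plane. A small NumPy sketch using the channels-first layout:

```python
import numpy as np

# 32 RGB images, 224x224, channels-first: (batch, channels, height, width)
images = np.zeros((32, 3, 224, 224))

# Fixing the channel axis to 0 selects the red plane of every image
red_channel = images[:, 0]
print(red_channel.shape)  # (32, 224, 224)
```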
5D Tensors: Video Data
Video adds a frames (time) dimension on top of color images.
Video batch: shape = (batch, frames, channels, height, width)
Example: shape = (8, 16, 3, 112, 112)
This represents 8 video clips, each containing 16 frames of 3-channel color images at 112x112 resolution. Video understanding models such as SlowFast operate on 5D tensors of this kind, though the exact axis order varies: PyTorch video models typically place channels before frames, i.e., (batch, channels, frames, height, width).
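Peeling off axes one at a time shows how the 5D structure nests: a clip is a 4D tensor, and a frame within a clip is a 3D color image. A NumPy sketch using the (batch, frames, channels, height, width) layout from the example above:

```python
import numpy as np

# 8 clips x 16 frames x 3 channels x 112x112 pixels
videos = np.zeros((8, 16, 3, 112, 112))

clip = videos[0]       # one clip: (16, 3, 112, 112)
frame = videos[0, 5]   # frame 5 of clip 0: (3, 112, 112)
print(clip.shape, frame.shape)
```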
Tensor Terminology
A few terms you will encounter frequently:
- Axis (or dimension): One of the indices needed to locate an element. A 3D tensor has 3 axes.
- Shape: The tuple of sizes along each axis, e.g., (32, 3, 224, 224).
- Rank: The total number of axes. Also called order or ndim (number of dimensions).
- Element: A single number within the tensor, accessed by specifying one index per axis.
- Slice: A sub-tensor obtained by fixing one or more axes to specific values.
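The terms above map directly onto indexing syntax. A minimal NumPy sketch on a rank-3 tensor:

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)  # rank-3 tensor, shape (2, 3, 4)

t.ndim            # rank: 3 axes
t.shape           # shape: (2, 3, 4)
element = t[1, 2, 3]  # one index per axis -> a single number (23)
sub = t[0]            # fix the first axis -> a slice of shape (3, 4)
print(element, sub.shape)
```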
Why AI Needs Higher Dimensions
Neural networks do not process one data sample at a time. They process batches for efficiency. This immediately adds one dimension. Then the data itself may be multi-dimensional (images have height, width, and channels). Combine these and you quickly reach 4D or 5D tensors.
- Batching: Processing 32 or 64 samples at once for faster GPU utilization
- Channels: Color images have RGB channels; feature maps in CNNs have dozens or hundreds of channels
- Sequences: Text data has a sequence length dimension (one position per token)
- Attention heads: Transformers split representations across multiple attention heads, adding yet another dimension
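The last two bullets can be sketched with hypothetical but typical sizes (batch of 2, sequence length 10, model width 64, 8 heads; these numbers are illustrative, not from any specific model). Splitting the embedding axis across heads adds a dimension:

```python
import numpy as np

batch, seq_len, d_model, n_heads = 2, 10, 64, 8

# Token embeddings: one vector of width d_model per token position
x = np.zeros((batch, seq_len, d_model))

# Split the embedding axis into (heads, per-head width): one extra axis
heads = x.reshape(batch, seq_len, n_heads, d_model // n_heads)
print(heads.shape)  # (2, 10, 8, 8)
```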
AI Frameworks and Tensors
All major AI frameworks are built around tensor operations.
| Framework | Tensor Type | Example Creation |
|---|---|---|
| NumPy | ndarray | np.zeros((3, 4, 5)) |
| PyTorch | Tensor | torch.zeros(3, 4, 5) |
| TensorFlow | Tensor | tf.zeros((3, 4, 5)) |
These libraries provide optimized operations on tensors that run on CPUs and GPUs. When you hear "tensor" in the context of TensorFlow or PyTorch, it means exactly the multi-dimensional array we have been describing -- the generalization of vectors and matrices to any number of dimensions.
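The three creation calls from the table produce equivalent tensors. A NumPy sketch, with the PyTorch and TensorFlow counterparts shown as comments (they require those libraries to be installed):

```python
import numpy as np

t = np.zeros((3, 4, 5))      # NumPy ndarray of zeros
print(t.shape)               # (3, 4, 5)

# Equivalent in the other frameworks:
#   torch.zeros(3, 4, 5)     # PyTorch takes sizes as separate arguments
#   tf.zeros((3, 4, 5))      # TensorFlow takes a shape tuple, like NumPy
```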
Summary
- Scalars, vectors, and matrices are tensors of rank 0, 1, and 2 respectively
- A tensor is a multi-dimensional array of numbers with a defined shape and rank
- 3D tensors commonly represent batches of data (e.g., a batch of grayscale images)
- 4D tensors represent batches of multi-channel data (e.g., color image batches)
- 5D tensors represent batches of sequential multi-channel data (e.g., video)
- AI needs higher-dimensional tensors because of batching, channels, sequences, and attention heads
- PyTorch, TensorFlow, and NumPy are all built around efficient tensor operations
Now that you understand tensor structure, the next lesson covers how to manipulate tensors -- reshaping, broadcasting, slicing, and the operations that make deep learning possible.

