Functions of Several Variables
Every machine learning model is a function with multiple inputs. A neural network might have millions of parameters — weights and biases — that together determine its predictions. To understand how training works, you need to work with functions that take many inputs instead of just one.
From One Input to Many
In the previous module, we worked with functions like f(x) = x², which take a single number and return a single number. But real ML models look more like this:
prediction = w₁x₁ + w₂x₂ + w₃x₃ + b
This function has seven inputs: three weights (w₁, w₂, w₃), three features (x₁, x₂, x₃), and one bias (b). During training, the features are fixed (they come from your data), while the weights and bias are adjusted to reduce loss.
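To make this concrete, here is a minimal sketch of evaluating the model above, with made-up weight, feature, and bias values chosen purely for illustration:

```python
# Evaluating prediction = w1*x1 + w2*x2 + w3*x3 + b with illustrative values.
weights = [0.5, -1.0, 2.0]   # w1, w2, w3 (adjusted during training)
features = [1.0, 2.0, 3.0]   # x1, x2, x3 (fixed, from the data)
bias = 0.5                   # b

prediction = sum(w * x for w, x in zip(weights, features)) + bias
print(prediction)  # 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 + 0.5 = 5.0
```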
Notation for Multivariable Functions
A function of several variables is written as:
f(x₁, x₂, ..., xₙ)
or more compactly using vector notation:
f(x) where x = [x₁, x₂, ..., xₙ]
Example: Two-Variable Loss Function
Consider a simple model with two parameters, w₁ and w₂:
L(w₁, w₂) = (w₁ · x₁ + w₂ · x₂ - target)²
This loss function takes two weights as input and returns a single number: how wrong the model is. The goal of training is to find the values of w₁ and w₂ that minimize L.
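A quick numerical sketch of this loss, using assumed values x₁ = 1, x₂ = 2, and target = 3 (these particular numbers are illustrative, not from the text):

```python
def loss(w1, w2, x1=1.0, x2=2.0, target=3.0):
    # Squared error of the linear prediction w1*x1 + w2*x2.
    return (w1 * x1 + w2 * x2 - target) ** 2

print(loss(1.0, 1.0))  # (1 + 2 - 3)^2 = 0.0, a perfect fit
print(loss(2.0, 1.0))  # (2 + 2 - 3)^2 = 1.0
```

Trying a few (w₁, w₂) pairs like this is exactly sampling points on the loss surface described below.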
Visualizing Functions of Two Variables
A function of one variable produces a curve in 2D. A function of two variables produces a surface in 3D.
```
      L (loss)
      ^
      |    .   .   .   .
      |   .   mountain   .
      |  .      peak      .
      | .                  .
      |  .       ___        .
      |   .     /valley\     .
      |    .   /    *   \   .     * = minimum
      +──────────────────────> w₁
     /
   w₂
```
Each point (w₁, w₂) on the horizontal plane corresponds to a specific pair of parameter values, and the height above that point is the loss. Training means finding the lowest point on this surface.
The Loss Surface
In real ML, the loss function depends on all model parameters simultaneously. For a neural network with 1 million parameters, the loss surface lives in 1,000,001-dimensional space (1 million parameter dimensions plus 1 loss dimension). You cannot visualize it, but the math works identically to the two-variable case.
Key properties of loss surfaces:
| Property | Meaning | Implication |
|---|---|---|
| Global minimum | The absolute lowest point on the surface | The best possible parameter values |
| Local minimum | A low point surrounded by higher values | Good but possibly not the best |
| Saddle point | Low in some directions, high in others | Looks like a minimum in some views but not all |
| Plateau | A flat region where loss barely changes | Gradients near zero, slow training |
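The saddle point row is the least intuitive, so here is a sketch using the textbook saddle f(w₁, w₂) = w₁² − w₂² (this specific function is an assumption for illustration; it is not from the text). The origin looks like a minimum along the w₁ axis but like a maximum along the w₂ axis:

```python
def f(w1, w2):
    # Classic saddle: curves upward along w1, downward along w2.
    return w1 ** 2 - w2 ** 2

# Moving along w1 increases f, so (0, 0) looks like a minimum there...
print(f(0.0, 0.0) < f(0.5, 0.0))  # True
# ...but moving along w2 decreases f, so it is not a minimum overall.
print(f(0.0, 0.0) > f(0.0, 0.5))  # True
```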
Level Curves (Contour Lines)
When we cannot visualize 3D surfaces, we use contour plots — the same idea as elevation lines on a topographic map. Each line connects points with the same loss value.
```
 w₂
 ^
 |      ╭──────╮
 |   ╭──┤      ├──╮
 |  ╭┤  ╭────╮  ├╮
 |  │ ╭─┤  * ├─╮ │    * = minimum
 |  ╰┤ ╰─────╯ ├╯     lines = equal loss
 |   ╰──┤    ├──╯
 |      ╰──────╯
 +──────────────────> w₁
```
If the contour lines form concentric circles, the loss surface is "bowl-shaped" and gradient descent works well. If they form elongated ellipses, the surface is like a narrow valley, and training can be slower.
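This effect can be sketched with a toy quadratic loss L(w₁, w₂) = a·w₁² + b·w₂², where a = b gives circular contours and b ≫ a gives elongated ellipses. The function and step sizes below are illustrative assumptions, not a real training setup:

```python
def final_loss(a, b, lr, steps=20):
    # Gradient descent on L(w1, w2) = a*w1^2 + b*w2^2 from (1, 1).
    w1, w2 = 1.0, 1.0
    for _ in range(steps):
        w1 -= lr * 2 * a * w1   # slope of L along w1 is 2*a*w1
        w2 -= lr * 2 * b * w2   # slope of L along w2 is 2*b*w2
    return a * w1 ** 2 + b * w2 ** 2

round_bowl = final_loss(a=1.0, b=1.0, lr=0.4)       # circular contours
# With b = 10, any lr >= 0.1 diverges along w2, so the step must shrink,
# and the shallow w1 direction then improves slowly.
narrow_valley = final_loss(a=1.0, b=10.0, lr=0.09)  # elongated contours
print(round_bowl < narrow_valley)  # True: same step budget, more loss left
```

The circular bowl tolerates a large step and converges quickly; the narrow valley forces a small step and lags behind after the same number of updates.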
How ML Frameworks Handle Multivariable Functions
In practice, you define a model and a loss function, and the framework handles the multivariable calculus:
```python
# Conceptual example (plain Python; PyTorch models follow the same shape)
def model(x, w1, w2, b):
    return w1 * x[0] + w2 * x[1] + b

def loss(prediction, target):
    return (prediction - target) ** 2
```
The framework then computes how the loss changes with respect to each parameter independently. This is the subject of the next lesson: partial derivatives.
Why Multiple Inputs Complicate Things
With one input, there is only one direction to move: increase x or decrease x. With multiple inputs, there are many directions:
- Change w₁ only
- Change w₂ only
- Change both simultaneously
- Change in any combination of directions
The question becomes: which direction reduces the loss most efficiently? To answer this, we need to understand how the loss responds to each parameter individually — which is exactly what partial derivatives provide.
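As a numerical preview, each direction can be probed separately with a tiny step h, using the two-weight loss from earlier (with assumed values x₁ = 1, x₂ = 2, target = 3, chosen for illustration). This finite-difference probe approximates what partial derivatives compute exactly:

```python
def loss(w1, w2, x1=1.0, x2=2.0, target=3.0):
    return (w1 * x1 + w2 * x2 - target) ** 2

h = 1e-6          # small step for the finite-difference probe
w1, w2 = 2.0, 1.0  # arbitrary current parameter values

# Nudge one parameter at a time and measure how the loss responds.
dL_dw1 = (loss(w1 + h, w2) - loss(w1, w2)) / h
dL_dw2 = (loss(w1, w2 + h) - loss(w1, w2)) / h
print(round(dL_dw1, 3), round(dL_dw2, 3))  # 2.0 4.0
```

Here the loss is twice as sensitive to w₂ as to w₁, so a step that weights the w₂ direction more heavily reduces the loss faster.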
Summary
- ML models are functions of many variables (weights, biases, and inputs)
- A loss function maps all model parameters to a single number (the error)
- The loss surface is a high-dimensional landscape where the height represents the error
- Loss surfaces have global minima, local minima, saddle points, and plateaus
- Contour plots (level curves) visualize 2D slices of the loss surface
- With multiple inputs, we need to determine how the loss changes with respect to each parameter independently
- Partial derivatives, covered in the next lesson, provide exactly this capability
Next, you will learn how to compute the derivative with respect to one variable while holding all others constant.

