# What Is a Derivative?
The derivative is the single most important concept in calculus for machine learning. It answers a simple question: if I change the input by a tiny amount, how much does the output change? This is exactly what a model needs to know during training — how changing a weight affects the loss.
## Rate of Change
Consider the function f(x) = x². When x = 3, f(3) = 9. What happens if we increase x slightly?
| x | f(x) = x² | Change in x | Change in f(x) |
|---|---|---|---|
| 3.0 | 9.00 | — | — |
| 3.1 | 9.61 | 0.1 | 0.61 |
| 3.01 | 9.0601 | 0.01 | 0.0601 |
| 3.001 | 9.006001 | 0.001 | 0.006001 |
As the change in x gets smaller, the ratio (change in f) / (change in x) approaches a specific value:
- 0.61 / 0.1 = 6.1
- 0.0601 / 0.01 = 6.01
- 0.006001 / 0.001 = 6.001
The ratio approaches 6. This is the derivative of f(x) = x² at x = 3.
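The table above is easy to reproduce directly. A minimal sketch, using the same function and step sizes:

```python
def f(x):
    return x ** 2

x = 3.0
for h in [0.1, 0.01, 0.001]:
    # Ratio of the change in f(x) to the change in x
    ratio = (f(x + h) - f(x)) / h
    print(f"h = {h}: ratio = {ratio:.4f}")
```

The printed ratios are 6.1000, 6.0100, and 6.0010, closing in on 6 as h shrinks.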
## The Derivative as a Slope
Geometrically, the derivative at a point is the slope of the tangent line at that point.
```
f(x)
 ^
 |                /
 |               /   tangent line (slope = 6)
 |             ./
 |           .·
 |        .·      f(x) = x²
 |    .·
 +------------------> x
              3
```
- A positive derivative means the function is increasing (slopes upward)
- A negative derivative means the function is decreasing (slopes downward)
- A zero derivative means the function is flat at that point (a peak, valley, or plateau)
## Formal Notation
The derivative of f(x) with respect to x is written as:
```
f'(x) = lim  f(x + h) - f(x)
        h→0  ───────────────
                    h
```
Other common notations:
| Notation | Read As |
|---|---|
| f'(x) | "f prime of x" |
| df/dx | "the derivative of f with respect to x" |
| dy/dx | "the derivative of y with respect to x" (when y = f(x)) |
In machine learning papers, you will see df/dx most often. The notation makes it clear what is changing (x) and what we are measuring the change of (f).
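The limit definition translates directly into a numerical approximation: instead of taking the limit, fix a small h and evaluate the difference quotient. A sketch (the helper name `numerical_derivative` is my own):

```python
def numerical_derivative(f, x, h=1e-6):
    # Difference quotient from the limit definition, with a small fixed h
    return (f(x + h) - f(x)) / h

approx = numerical_derivative(lambda x: x ** 2, 3.0)
print(approx)  # close to 6
```

This kind of finite-difference check is a standard way to verify hand-derived or automatically computed gradients.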
## Computing a Simple Derivative
Let us compute the derivative of f(x) = x² from scratch:
```
f'(x) = lim  (x + h)² - x²
        h→0  ─────────────
                   h

      = lim  x² + 2xh + h² - x²
        h→0  ──────────────────
                     h

      = lim  2xh + h²
        h→0  ────────
                 h

      = lim  (2x + h)
        h→0

      = 2x
```
So f'(x) = 2x. At x = 3, the derivative is 2(3) = 6, confirming our numerical experiment.
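The result f'(x) = 2x can be spot-checked numerically at several points. A sketch; the step h = 1e-6 and the sample points are arbitrary choices:

```python
def f(x):
    return x ** 2

h = 1e-6
for x in [-2.0, 0.0, 3.0, 10.0]:
    numeric = (f(x + h) - f(x)) / h  # approximates f'(x)
    exact = 2 * x                    # the formula derived above
    assert abs(numeric - exact) < 1e-3
```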
## Why This Matters for ML
In machine learning, the function f represents the loss (how wrong the model is), and x represents a model parameter (a weight or bias). The derivative f'(x) = df/dx tells the model:
- Direction: Should I increase or decrease this parameter? (sign of the derivative)
- Magnitude: How sensitive is the loss to this parameter? (absolute value of the derivative)
### Example: A Simple Linear Model
Suppose your model predicts house prices using a single weight w:
```
prediction = w × square_footage
```
If the true price is $400,000 and your model predicts $350,000, the squared error loss is:
```
L(w) = (prediction - true_price)²
     = (w × square_footage - 400000)²
```
The derivative dL/dw tells you: if you increase w by a tiny amount, does the loss go up or down? This is the information gradient descent uses to update w.
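This can be checked numerically without any derivative rules. A sketch, with an assumed square footage of 2,000 so that w = 175 reproduces the $350,000 prediction from the text:

```python
square_footage = 2000.0  # assumed example value
true_price = 400000.0

def loss(w):
    return (w * square_footage - true_price) ** 2

w = 175.0  # predicts 175 * 2000 = 350,000, as in the text
h = 1e-6
dL_dw = (loss(w + h) - loss(w)) / h  # difference quotient
print(dL_dw)  # negative: nudging w upward reduces the loss
```

Because the derivative is negative, gradient descent would increase w, pushing the prediction toward the true price.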
## Instantaneous vs. Average Rate of Change
The average rate of change between two points is the slope of the line connecting them:
```
               f(b) - f(a)
average rate = ───────────
                  b - a
```
The derivative is the instantaneous rate of change — what happens as b approaches a. It is the slope at a single point, not between two points.
In ML, we care about the instantaneous rate because we want to know the exact direction to move a parameter right now, at its current value.
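The distinction is easy to see numerically for f(x) = x². A sketch; the points a = 3 and b = 4 are arbitrary:

```python
def f(x):
    return x ** 2

a, b = 3.0, 4.0
average = (f(b) - f(a)) / (b - a)      # secant slope between a and b
h = 1e-6
instantaneous = (f(a + h) - f(a)) / h  # approximates f'(a)
print(average, instantaneous)  # 7.0 vs roughly 6
```

The average rate over [3, 4] is 7, but the slope right at x = 3 is 6; the two agree only as b approaches a.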
## Differentiability
A function is differentiable at a point if its derivative exists there. Most functions used in ML are differentiable almost everywhere — this is by design. Researchers choose smooth loss functions and activation functions specifically so that derivatives can be computed.
Notable exceptions:
- The ReLU activation function (f(x) = max(0, x)) has a corner at x = 0 where the derivative is technically undefined. In practice, frameworks assign f'(0) = 0 and it works fine.
- The absolute value function |x| also has a corner at 0.
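The framework convention described above can be sketched in a few lines; the gradient-at-zero choice below mirrors the f'(0) = 0 convention mentioned in the text:

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # The corner at x = 0 has no true derivative;
    # frameworks conventionally return 0 there
    return 1.0 if x > 0 else 0.0

print(relu(-2.0), relu_grad(-2.0))  # 0.0 0.0
print(relu(2.0), relu_grad(2.0))    # 2.0 1.0
print(relu_grad(0.0))               # 0.0
```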
## Summary
- The derivative measures how fast a function's output changes as its input changes
- It is the slope of the tangent line at a specific point
- Positive derivative means the function is increasing; negative means decreasing; zero means flat
- For f(x) = x², the derivative is f'(x) = 2x
- In ML, the derivative tells a model which direction to adjust a parameter to reduce loss
- The derivative provides both direction (sign) and sensitivity (magnitude)
- ML uses differentiable functions by design so that derivatives always exist
The next lesson covers the specific derivative rules you need for machine learning — no more, no less.

