# What Is a Derivative?
The derivative is the single most important concept in calculus for machine learning. It answers a simple question: if I change the input by a tiny amount, how much does the output change? This is exactly what a model needs to know during training — how changing a weight affects the loss.
## Rate of Change
Consider the function f(x) = x². When x = 3, f(3) = 9. What happens if we increase x slightly?
| x | f(x) = x² | Change in x | Change in f(x) |
|---|---|---|---|
| 3.0 | 9.00 | — | — |
| 3.1 | 9.61 | 0.1 | 0.61 |
| 3.01 | 9.0601 | 0.01 | 0.0601 |
| 3.001 | 9.006001 | 0.001 | 0.006001 |
As the change in x gets smaller, the ratio (change in f) / (change in x) approaches a specific value:
- 0.61 / 0.1 = 6.1
- 0.0601 / 0.01 = 6.01
- 0.006001 / 0.001 = 6.001
The ratio approaches 6. This is the derivative of f(x) = x² at x = 3.
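The table above is easy to reproduce directly. A minimal sketch, using the same function and step sizes:

```python
def f(x):
    return x ** 2

x = 3.0
for h in [0.1, 0.01, 0.001]:
    # Ratio of the change in f(x) to the change in x
    ratio = (f(x + h) - f(x)) / h
    print(f"h = {h}: ratio = {ratio:.4f}")
```

The printed ratios are 6.1000, 6.0100, and 6.0010, closing in on 6 as h shrinks.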
## The Derivative as a Slope
Geometrically, the derivative at a point is the slope of the tangent line at that point.
```
f(x)
 ^
 |                /
 |               /   tangent line (slope = 6)
 |             ./
 |           .·
 |        .·      f(x) = x²
 |    .·
 +------------------> x
              3
```
- A positive derivative means the function is increasing (slopes upward)
- A negative derivative means the function is decreasing (slopes downward)
- A zero derivative means the function is flat at that point (a peak, valley, or plateau)
## Formal Notation
The derivative of f(x) with respect to x is written as:
```
f'(x) = lim  f(x + h) - f(x)
        h→0  ───────────────
                    h
```
Other common notations:
| Notation | Read As |
|---|---|
| f'(x) | "f prime of x" |
| df/dx | "the derivative of f with respect to x" |
| dy/dx | "the derivative of y with respect to x" (when y = f(x)) |
In machine learning papers, you will see df/dx most often. The notation makes it clear what is changing (x) and what we are measuring the change of (f).
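The limit definition translates directly into a numerical approximation: instead of taking the limit, fix a small h and evaluate the difference quotient. A sketch (the helper name `numerical_derivative` is my own):

```python
def numerical_derivative(f, x, h=1e-6):
    # Difference quotient from the limit definition, with a small fixed h
    return (f(x + h) - f(x)) / h

approx = numerical_derivative(lambda x: x ** 2, 3.0)
print(approx)  # close to 6
```

This kind of finite-difference check is a standard way to verify hand-derived or automatically computed gradients.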
## Computing a Simple Derivative
Let us compute the derivative of f(x) = x² from scratch:
```
f'(x) = lim  (x + h)² - x²
        h→0  ─────────────
                   h

      = lim  x² + 2xh + h² - x²
        h→0  ──────────────────
                     h

      = lim  2xh + h²
        h→0  ────────
                 h

      = lim  (2x + h)
        h→0

      = 2x
```
So f'(x) = 2x. At x = 3, the derivative is 2(3) = 6, confirming our numerical experiment.
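The result f'(x) = 2x can be spot-checked numerically at several points. A sketch; the step h = 1e-6 and the sample points are arbitrary choices:

```python
def f(x):
    return x ** 2

h = 1e-6
for x in [-2.0, 0.0, 3.0, 10.0]:
    numeric = (f(x + h) - f(x)) / h  # approximates f'(x)
    exact = 2 * x                    # the formula derived above
    assert abs(numeric - exact) < 1e-3
```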
## Why This Matters for ML
In machine learning, the function f represents the loss (how wrong the model is), and x represents a model parameter (a weight or bias). The derivative f'(x) = df/dx tells the model:
- Direction: Should I increase or decrease this parameter? (sign of the derivative)
- Magnitude: How sensitive is the loss to this parameter? (absolute value of the derivative)
### Example: A Simple Linear Model
Suppose your model predicts house prices using a single weight w:
```
prediction = w × square_footage
```
If the true price is $400,000 and your model predicts $350,000, the squared error loss is:
```
L(w) = (prediction - true_price)²
     = (w × square_footage - 400000)²
```
The derivative dL/dw tells you: if you increase w by a tiny amount, does the loss go up or down? This is the information gradient descent uses to update w.
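This can be checked numerically without any derivative rules. A sketch, with an assumed square footage of 2,000 so that w = 175 reproduces the $350,000 prediction from the text:

```python
square_footage = 2000.0  # assumed example value
true_price = 400000.0

def loss(w):
    return (w * square_footage - true_price) ** 2

w = 175.0  # predicts 175 * 2000 = 350,000, as in the text
h = 1e-6
dL_dw = (loss(w + h) - loss(w)) / h  # difference quotient
print(dL_dw)  # negative: nudging w upward reduces the loss
```

Because the derivative is negative, gradient descent would increase w, pushing the prediction toward the true price.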
## Instantaneous vs. Average Rate of Change
The average rate of change between two points is the slope of the line connecting them:
```
               f(b) - f(a)
average rate = ───────────
                  b - a
```
The derivative is the instantaneous rate of change — what happens as b approaches a. It is the slope at a single point, not between two points.
In ML, we care about the instantaneous rate because we want to know the exact direction to move a parameter right now, at its current value.
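The distinction is easy to see numerically for f(x) = x². A sketch; the points a = 3 and b = 4 are arbitrary:

```python
def f(x):
    return x ** 2

a, b = 3.0, 4.0
average = (f(b) - f(a)) / (b - a)      # secant slope between a and b
h = 1e-6
instantaneous = (f(a + h) - f(a)) / h  # approximates f'(a)
print(average, instantaneous)  # 7.0 vs roughly 6
```

The average rate over [3, 4] is 7, but the slope right at x = 3 is 6; the two agree only as b approaches a.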
## Differentiability
A function is differentiable at a point if its derivative exists there. Most functions used in ML are differentiable almost everywhere — this is by design. Researchers choose smooth loss functions and activation functions specifically so that derivatives can be computed.
Notable exceptions:
- The ReLU activation function (f(x) = max(0, x)) has a corner at x = 0 where the derivative is technically undefined. In practice, frameworks assign f'(0) = 0 and it works fine.
- The absolute value function |x| also has a corner at 0.
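The framework convention described above can be sketched in a few lines; the gradient-at-zero choice below mirrors the f'(0) = 0 convention mentioned in the text:

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # The corner at x = 0 has no true derivative;
    # frameworks conventionally return 0 there
    return 1.0 if x > 0 else 0.0

print(relu(-2.0), relu_grad(-2.0))  # 0.0 0.0
print(relu(2.0), relu_grad(2.0))    # 2.0 1.0
print(relu_grad(0.0))               # 0.0
```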
## Summary
- The derivative measures how fast a function's output changes as its input changes
- It is the slope of the tangent line at a specific point
- Positive derivative means the function is increasing; negative means decreasing; zero means flat
- For f(x) = x², the derivative is f'(x) = 2x
- In ML, the derivative tells a model which direction to adjust a parameter to reduce loss
- The derivative provides both direction (sign) and sensitivity (magnitude)
- ML uses differentiable functions by design so that derivatives always exist
The next lesson covers the specific derivative rules you need for machine learning — no more, no less.

