NumPy Fundamentals with AI Help

NumPy is the library every other data science tool stands on. Pandas DataFrames are built on NumPy arrays. Scikit-learn models accept NumPy inputs. Even PyTorch and TensorFlow tensors borrow much of NumPy's syntax. If you understand NumPy, the rest of the ecosystem clicks.

This lesson gives you the working vocabulary you need to read and write NumPy in real data science code, with prompts you can use when you forget something.

What You'll Learn

What a NumPy array is and why it is faster than a Python list
The four NumPy operations you will use every day
Indexing, slicing, and boolean masks
AI prompts that help you remember NumPy syntax instantly

Why NumPy Is Faster Than Lists

A Python list can hold any mix of types — strings, numbers, other lists. That flexibility costs speed. NumPy arrays hold one type of number per array, packed tightly in memory, so operations run in optimized C code under the hood.

For a list of one million numbers, multiplying every element by two typically takes tens to hundreds of milliseconds in pure Python. The same operation in NumPy typically takes a millisecond or two, though exact numbers depend on your machine. That speed difference is the main reason NumPy exists.
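The timing claim above is easy to check yourself. The following is a minimal sketch using the standard-library timeit module; the variable names are illustrative, and the numbers it prints will vary with your hardware and NumPy version.

```python
import timeit

import numpy as np

n = 1_000_000
numbers_list = list(range(n))
numbers_array = np.arange(n)

# Pure Python: a list comprehension visits each element one at a time.
list_seconds = timeit.timeit(lambda: [x * 2 for x in numbers_list], number=5) / 5

# NumPy: a single vectorized multiplication over the whole array.
array_seconds = timeit.timeit(lambda: numbers_array * 2, number=5) / 5

print(f"list comprehension: {list_seconds * 1000:.1f} ms per run")
print(f"NumPy array:        {array_seconds * 1000:.1f} ms per run")

# Both approaches produce the same values.
doubled_list = [x * 2 for x in numbers_list]
doubled_array = numbers_array * 2
```

Averaging over a few runs smooths out one-off delays, which matters when one of the measurements is only a millisecond or so.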

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a * 2)      # [ 2  4  6  8 10]  -- multiplies every element
print(a + 10)     # [11 12 13 14 15]  -- adds 10 to every element
print(a ** 2)     # [ 1  4  9 16 25]  -- squares every element

That a * 2 is called vectorization. You write a single operation, and NumPy applies it to every element. No loops needed.

Creating Arrays

The five most common ways to make an array:

np.array([1, 2, 3, 4])             # from a list
np.zeros(5)                        # [0. 0. 0. 0. 0.]
np.ones((2, 3))                    # 2x3 array of ones
np.arange(0, 10, 2)                # [0 2 4 6 8] -- like range()
np.linspace(0, 1, 5)               # [0.   0.25 0.5  0.75 1.  ]

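np.arange is best for integer steps; it also accepts float steps, but rounding can make the number of elements hard to predict. np.linspace is for "give me N evenly spaced values between A and B", which is useful when plotting curves.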
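To make the difference concrete, here is a small sketch comparing the two on the same interval. The variable names are illustrative only.

```python
import numpy as np

# arange takes a step and excludes the stop value.
by_step = np.arange(0, 1, 0.25)
print(by_step)        # [0.   0.25 0.5  0.75]

# linspace takes a count and includes the stop value by default.
by_count = np.linspace(0, 1, 5)
print(by_count)       # [0.   0.25 0.5  0.75 1.  ]

# With a float step, the number of elements depends on rounding,
# so asking for an exact count is usually the safer choice.
print(len(by_step), len(by_count))   # 4 5
```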

The Shape of an Array

Every array has a shape — its dimensions. A 1D array of five numbers has shape (5,). A 2D array (a matrix) has shape (rows, columns).

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])
print(matrix.shape)   # (2, 3) -- 2 rows, 3 columns
print(matrix.ndim)    # 2 -- number of dimensions
print(matrix.dtype)   # int64 on most platforms -- the type of each element

Always check .shape when you load real data. Shape mismatches are one of the most common sources of beginner ML errors: you fed (100, 1) into something that wanted (100,), and instead of raising an error the code silently produced the wrong result.
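Here is one way the (100, 1) versus (100,) mismatch can go wrong without any error message. This is a sketch with made-up arrays named predictions and targets; they stand in for whatever your model and dataset produce.

```python
import numpy as np

predictions = np.zeros((100, 1))   # column vector, shape (100, 1)
targets = np.ones(100)             # flat vector, shape (100,)

# Broadcasting stretches both operands instead of raising an error,
# so the result is a 100x100 grid rather than 100 differences.
wrong = predictions - targets
print(wrong.shape)                 # (100, 100)

# Flattening the column first gives the intended element-wise result.
right = predictions.ravel() - targets
print(right.shape)                 # (100,)
```

Printing .shape right after each step is usually the fastest way to spot this kind of problem.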

Indexing and Slicing

Indexing works like Python lists.

a = np.array([10, 20, 30, 40, 50])
a[0]      # 10
a[-1]     # 50
a[1:4]    # [20 30 40]

For 2D arrays, you index [row, column]:

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])
matrix[0, 1]      # 2
matrix[:, 0]      # [1 4]   -- all rows, column 0
matrix[1, :]      # [4 5 6] -- row 1, all columns

That : means "everything." It comes up constantly in pandas and ML code.
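One difference from Python lists is worth knowing early: a basic NumPy slice is a view onto the same memory, not a copy, so writing to the slice changes the original array. A short sketch (the names view and independent are illustrative):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

view = a[1:4]          # basic slice: shares memory with a
view[0] = 99
print(a)               # [10 99 30 40 50]

b = np.array([10, 20, 30, 40, 50])
independent = b[1:4].copy()   # explicit copy: separate memory
independent[0] = 99
print(b)               # [10 20 30 40 50]
```

Call .copy() whenever you want to modify a slice without touching the source array.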

Boolean Masks (the killer feature)

This is the operation you will use most:

ages = np.array([22, 17, 35, 19, 41, 16])
mask = ages >= 18
print(mask)         # [ True False  True  True  True False]
print(ages[mask])   # [22 35 19 41]

ages >= 18 does not return one True or False — it returns an array of booleans, one per element. Then ages[mask] keeps only the elements where the mask is True. That is how filtering works in NumPy and in pandas.

Combine masks with & (and) and | (or):

adults_under_30 = ages[(ages >= 18) & (ages < 30)]

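The parentheses around each condition are not optional. & binds more tightly than >= and <, so without them Python evaluates 18 & ages first and the expression fails with a confusing ValueError about the truth value of an array being ambiguous.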
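The sketch below runs the combined masks on the same ages array, adds ~ (not) for completeness, and triggers the missing-parentheses failure inside a try block so you can see the message safely. The variable names other than ages are illustrative.

```python
import numpy as np

ages = np.array([22, 17, 35, 19, 41, 16])

adults_under_30 = ages[(ages >= 18) & (ages < 30)]
print(adults_under_30)        # [22 19]

minors_or_over_40 = ages[(ages < 18) | (ages > 40)]
print(minors_or_over_40)      # [17 41 16]

not_adults = ages[~(ages >= 18)]   # ~ inverts a mask
print(not_adults)             # [17 16]

# Without parentheses, & is applied before the comparisons, and the
# resulting chained comparison cannot be reduced to a single True/False.
try:
    ages[ages >= 18 & ages < 30]
    precedence_error = None
except ValueError as error:
    precedence_error = error
    print("ValueError:", error)
```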

The Four Operations You Use Every Day

1. Aggregations: .sum(), .mean(), .max(), .min(), .std() — return one number.

prices = np.array([10.0, 12.5, 8.0, 15.5])
prices.mean()   # 11.5
prices.std()    # ~2.81
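A detail that often causes confusion: by default .std() computes the population standard deviation (dividing by n), while pandas defaults to the sample version (dividing by n - 1). A small sketch on the same prices array shows both; the names population_std and sample_std are illustrative.

```python
import numpy as np

prices = np.array([10.0, 12.5, 8.0, 15.5])

population_std = prices.std()        # divides by n (ddof=0, NumPy's default)
sample_std = prices.std(ddof=1)      # divides by n - 1

print(round(population_std, 2))      # 2.81
print(round(sample_std, 2))          # 3.24
```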

2. Element-wise math: addition, multiplication, division between arrays.

revenue = np.array([100, 200, 150])
costs   = np.array([60, 130, 90])
profit  = revenue - costs   # [40 70 60]

3. Reshape: turn a 1D array into a 2D array (or vice versa).

flat = np.arange(12)            # 0..11
grid = flat.reshape(3, 4)       # 3 rows, 4 columns
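Two reshape habits are worth seeing in code: passing -1 so NumPy infers a dimension, and the error you get when the sizes do not match. This is a sketch building on the flat and grid arrays above; tall, back, and reshape_error are illustrative names.

```python
import numpy as np

flat = np.arange(12)             # 0..11
grid = flat.reshape(3, 4)        # 3 rows, 4 columns

# -1 asks NumPy to infer that dimension from the total size.
tall = flat.reshape(-1, 2)
print(tall.shape)                # (6, 2)

# reshape(-1) flattens back to 1D.
back = grid.reshape(-1)
print(back.shape)                # (12,)

# The total number of elements must match, otherwise reshape raises.
try:
    flat.reshape(5, 3)
    reshape_error = None
except ValueError as error:
    reshape_error = error
    print("ValueError:", error)
```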

4. Random numbers: for ML, simulation, sampling.

np.random.seed(42)              # reproducibility
np.random.rand(5)               # 5 uniform samples in [0, 1)
np.random.randn(5)              # 5 normal samples (mean 0, std 1)
np.random.randint(0, 10, 5)     # 5 random ints in [0, 10)

The seed(42) line is important. Without it, you get different random numbers every run, which makes bugs nearly impossible to reproduce.
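The functions above are NumPy's older global random interface. NumPy's documentation recommends the Generator interface for new code, so you will likely meet it in recent notebooks. A rough equivalent of the four lines above, with illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(42)            # seeded Generator

uniform = rng.random(5)                    # 5 uniform samples in [0, 1)
normal = rng.standard_normal(5)            # 5 normal samples (mean 0, std 1)
integers = rng.integers(0, 10, size=5)     # 5 random ints in [0, 10)

# A second generator with the same seed reproduces the same sequence.
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(5)))   # True
```

The two interfaces use different algorithms, so the same seed does not give the same numbers across them.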

Asking AI for NumPy Help

NumPy has hundreds of functions, and you will not memorize them all. Use this prompt instead:

I am a beginner using NumPy. I have an array arr with shape (100, 5) of floats. I want to: [describe what you want]. Write the NumPy code, and explain what each function does. Add a tiny test with a smaller array so I can verify it works.

For example: "I want to find the row with the highest value in column 2." The AI will write arr[arr[:, 2].argmax()] and explain that argmax() returns the index of the maximum, and that arr[index] then pulls that row.
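The "tiny test" the prompt asks for might look like the sketch below. The 3x3 array named small is made up so that the answer can be checked by eye.

```python
import numpy as np

small = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 9.0],
    [7.0, 8.0, 6.0],
])

row_index = small[:, 2].argmax()   # column 2 is [3. 9. 6.], max at index 1
best_row = small[row_index]

print(row_index)   # 1
print(best_row)    # [4. 5. 9.]
```

If the small case matches what you worked out by hand, the same expression is far more likely to be right on the full (100, 5) array.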

A Practical Drill

Predict each result, then run this in Colab to check:

import numpy as np
np.random.seed(0)

scores = np.random.randint(50, 100, size=(20, 3))
print("Shape:", scores.shape)
print("Average per student:", scores.mean(axis=1))
print("Average per exam:", scores.mean(axis=0))
print("Top student score:", scores.mean(axis=1).max())

passed = scores.mean(axis=1) >= 75
print("How many passed?", passed.sum())
print("Top student row:", scores[scores.mean(axis=1).argmax()])

The axis= parameter is where most beginners trip up. axis=0 collapses rows (gives one number per column). axis=1 collapses columns (gives one number per row). Drill it until it sticks.
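If the 20x3 drill is too large to reason about, a two-student version can be verified by hand. This is a sketch with made-up scores; the array name tiny is illustrative.

```python
import numpy as np

tiny = np.array([
    [80, 90, 100],   # student 0
    [60, 70, 80],    # student 1
])

per_exam = tiny.mean(axis=0)      # collapses the 2 rows -> 3 values
per_student = tiny.mean(axis=1)   # collapses the 3 columns -> 2 values

print(per_exam)       # [70. 80. 90.]
print(per_student)    # [90. 70.]
```

A quick check is that the output length always equals the size of the axis you did not collapse.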

Key Takeaways

NumPy arrays are typed and packed in memory, so they are far faster than Python lists
.shape, .dtype, and .ndim tell you most of what you need to know about an array's structure, so check them often
Boolean masks are the foundation of filtering in both NumPy and pandas
The axis=0 (collapse rows) vs axis=1 (collapse columns) distinction is the most common confusion — practice it
Use AI to write NumPy code on demand, and always include a tiny test you can verify by hand
