NumPy Fundamentals with AI Help
NumPy is the library every other data science tool stands on. Pandas DataFrames are built on NumPy arrays. Scikit-learn models accept NumPy inputs. Even PyTorch and TensorFlow tensors borrow much of NumPy's syntax. If you understand NumPy, the rest of the ecosystem clicks.
This lesson gives you the working vocabulary you need to read and write NumPy in real data science code, with prompts you can use when you forget something.
What You'll Learn
- What a NumPy array is and why it is faster than a Python list
- The four NumPy operations you will use every day
- Indexing, slicing, and boolean masks
- AI prompts that help you remember NumPy syntax instantly
Why NumPy Is Faster Than Lists
A Python list can hold any mix of types — strings, numbers, other lists. That flexibility costs speed. NumPy arrays hold one type of number per array, packed tightly in memory, so operations run in optimized C code under the hood.
For a list of one million numbers, multiplying every element by two in pure Python typically takes tens to hundreds of milliseconds, depending on the machine. The same operation in NumPy usually takes a millisecond or two. That speed gap is the main reason NumPy exists.
import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(a * 2) # [ 2 4 6 8 10] -- multiplies every element
print(a + 10) # [11 12 13 14 15] -- adds 10 to every element
print(a ** 2) # [ 1 4 9 16 25] -- squares every element
That a * 2 is called vectorization. You write a single operation, and NumPy applies it to every element. No loops needed.
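A rough way to see the speed gap for yourself is to time both versions. This is a minimal sketch using the standard-library timeit module; the helper names double_list and double_array are made up for illustration, and the timings depend on your machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))   # plain Python list of ints
np_arr = np.arange(n)      # NumPy array holding the same values


def double_list():
    # Pure Python: one multiplication per loop iteration
    return [x * 2 for x in py_list]


def double_array():
    # NumPy: one vectorized multiplication over the whole array
    return np_arr * 2


# Average over a few runs to smooth out noise
list_seconds = timeit.timeit(double_list, number=5) / 5
array_seconds = timeit.timeit(double_array, number=5) / 5

print(f"list:  {list_seconds * 1000:.1f} ms per run")
print(f"array: {array_seconds * 1000:.1f} ms per run")
```

Both functions produce the same values; only the time taken differs.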
Creating Arrays
The five most common ways to make an array:
np.array([1, 2, 3, 4]) # from a list
np.zeros(5) # [0. 0. 0. 0. 0.]
np.ones((2, 3)) # 2x3 array of ones
np.arange(0, 10, 2) # [0 2 4 6 8] -- like range()
np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
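np.arange is best kept to integer steps; with a float step, rounding can make the number of elements hard to predict. np.linspace is for "give me N evenly spaced values between A and B, endpoints included", which is useful when plotting curves.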
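To make the difference concrete, here is a small side-by-side sketch. The variable names stepped and spaced are just for illustration. The step 0.25 is used because it is exactly representable in binary, so the result is predictable here.

```python
import numpy as np

# arange takes a step and stops before the end value
stepped = np.arange(0, 1, 0.25)
print(stepped.tolist())    # [0.0, 0.25, 0.5, 0.75]

# linspace takes a count and includes both endpoints
spaced = np.linspace(0, 1, 5)
print(spaced.tolist())     # [0.0, 0.25, 0.5, 0.75, 1.0]

print(len(stepped), len(spaced))   # 4 5
```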
The Shape of an Array
Every array has a shape — its dimensions. A 1D array of five numbers has shape (5,). A 2D array (a matrix) has shape (rows, columns).
matrix = np.array([
[1, 2, 3],
[4, 5, 6],
])
print(matrix.shape) # (2, 3) -- 2 rows, 3 columns
print(matrix.ndim) # 2 -- number of dimensions
print(matrix.dtype) # int64 on most platforms -- the type of each element
Always check .shape when you load real data. Shape mismatches are among the most common beginner ML errors: you fed (100, 1) into something that wanted (100,), and the model silently misbehaved.
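Here is a small sketch of how that kind of mismatch goes wrong without raising an error. It uses 4 values instead of 100 so the shapes are easy to read, and the names predictions and targets are hypothetical.

```python
import numpy as np

predictions = np.array([1.0, 2.0, 3.0, 4.0])        # shape (4,)
targets = np.array([[1.0], [2.0], [3.0], [5.0]])    # shape (4, 1)

# Broadcasting pairs every target with every prediction: no error is raised
errors = targets - predictions
print(errors.shape)       # (4, 4), not the (4,) we wanted

# Flatten the column so both arrays have shape (4,)
fixed = targets.reshape(-1) - predictions
print(fixed.shape)        # (4,)
print(fixed.tolist())     # [0.0, 0.0, 0.0, 1.0]
```

Printing .shape on both inputs before the subtraction would have caught the problem immediately.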
Indexing and Slicing
Indexing works like Python lists.
a = np.array([10, 20, 30, 40, 50])
a[0] # 10
a[-1] # 50
a[1:4] # [20 30 40]
For 2D arrays, you index [row, column]:
matrix = np.array([
[1, 2, 3],
[4, 5, 6],
])
matrix[0, 1] # 2
matrix[:, 0] # [1 4] -- all rows, column 0
matrix[1, :] # [4 5 6] -- row 1, all columns
That : means "everything along this axis." It comes up constantly in pandas and ML code.
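A few more slices on the same matrix, as a quick sketch. One detail worth knowing early: a basic slice is a view onto the original data, not a copy, so writing through it changes the original array. The names first_col and safe_col are just for illustration.

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(matrix[:, 1:].tolist())    # [[2, 3], [5, 6]] -- all rows, columns 1 onward
print(matrix[-1, -1])            # 6 -- last row, last column

# A basic slice is a view: writing through it changes the original array
first_col = matrix[:, 0]
first_col[0] = 99
print(matrix[0, 0])              # 99

# Call .copy() when you want an independent array
safe_col = matrix[:, 0].copy()
safe_col[0] = -1
print(matrix[0, 0])              # still 99
```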
Boolean Masks (the killer feature)
This is the operation you will use most:
ages = np.array([22, 17, 35, 19, 41, 16])
mask = ages >= 18
print(mask) # [ True False True True True False]
print(ages[mask]) # [22 35 19 41]
ages >= 18 does not return one True or False — it returns an array of booleans, one per element. Then ages[mask] keeps only the elements where the mask is True. That is how filtering works in NumPy and in pandas.
Combine masks with & (and) and | (or):
adults_under_30 = ages[(ages >= 18) & (ages < 30)]
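The parentheses around each condition are not optional. & binds more tightly than >= and <, so without them Python evaluates 18 & ages first, and the expression fails with a confusing ValueError about an ambiguous truth value.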
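The sketch below shows the working forms next to the failing one, so you can recognize the error when it appears. It also uses ~ for "not", which follows the same pattern as & and |. The variable names are illustrative.

```python
import numpy as np

ages = np.array([22, 17, 35, 19, 41, 16])

# Correct: each comparison is wrapped in its own parentheses
adults_under_30 = ages[(ages >= 18) & (ages < 30)]
print(adults_under_30.tolist())      # [22, 19]

# | is "or", ~ is "not"
minors_or_over_40 = ages[(ages < 18) | (ages > 40)]
print(minors_or_over_40.tolist())    # [17, 41, 16]
not_adults = ages[~(ages >= 18)]
print(not_adults.tolist())           # [17, 16]

# Without parentheses, Python evaluates 18 & ages first, and the
# chained comparison cannot be reduced to a single True/False
try:
    ages[ages >= 18 & ages < 30]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```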
The Four Operations You Use Every Day
1. Aggregations: .sum(), .mean(), .max(), .min(), .std() — return one number.
prices = np.array([10.0, 12.5, 8.0, 15.5])
prices.mean() # 11.5
prices.std() # ~2.81 (population standard deviation, ddof=0)
2. Element-wise math: addition, multiplication, division between arrays.
revenue = np.array([100, 200, 150])
costs = np.array([60, 130, 90])
profit = revenue - costs # [40 70 60]
3. Reshape: turn a 1D array into a 2D array (or vice versa).
flat = np.arange(12) # 0..11
grid = flat.reshape(3, 4) # 3 rows, 4 columns
4. Random numbers: for ML, simulation, sampling.
np.random.seed(42) # reproducibility
np.random.rand(5) # 5 uniform samples in [0, 1)
np.random.randn(5) # 5 normal samples (mean 0, std 1)
np.random.randint(0, 10, 5) # 5 random ints in [0, 10)
The seed(42) line is important. Without it, you get different random numbers every run, which makes bugs nearly impossible to reproduce.
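NumPy's documentation recommends the newer Generator interface for new code, where the seed is passed to np.random.default_rng instead of being set globally. Here is a minimal sketch of the same calls in that style, assuming a reasonably recent NumPy; the name rng is just a convention.

```python
import numpy as np

rng = np.random.default_rng(42)        # seeded generator object

uniform = rng.random(5)                # 5 uniform samples in [0, 1)
normal = rng.standard_normal(5)        # 5 normal samples (mean 0, std 1)
ints = rng.integers(0, 10, size=5)     # 5 random ints in [0, 10)

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(5)))   # True
```

Because the seed lives in the rng object, two parts of a program can each hold their own generator without disturbing each other.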
Asking AI for NumPy Help
NumPy has thousands of functions, and you will not memorize them all. Use this prompt instead:
I am a beginner using NumPy. I have an array arr with shape (100, 5) of floats. I want to: [describe what you want]. Write the NumPy code, and explain what each function does. Add a tiny test with a smaller array so I can verify it works.
For example: "I want to find the row with the highest value in column 2." The AI will write arr[arr[:, 2].argmax()] and explain that argmax() returns the index of the maximum, and that arr[index] then pulls that row.
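The "tiny test" part of the prompt might look something like this sketch. The 3x3 array is a made-up stand-in for the (100, 5) one, small enough to check by eye, and the names best_index and best_row are illustrative.

```python
import numpy as np

# Tiny stand-in for the (100, 5) array, small enough to check by hand
arr = np.array([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.9],
    [0.7, 0.8, 0.6],
])

best_index = arr[:, 2].argmax()    # position of the largest value in column 2
best_row = arr[best_index]         # the full row at that position

print(best_index)                  # 1
print(best_row.tolist())           # [0.4, 0.5, 0.9]
```

Column 2 holds 0.3, 0.9, and 0.6, so the expected answer is row 1. If the AI's code disagrees with your hand check, trust the hand check and ask again.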
A Practical Drill
Predict each result, then run this in Colab to check:
import numpy as np
np.random.seed(0)
scores = np.random.randint(50, 100, size=(20, 3))
print("Shape:", scores.shape)
print("Average per student:", scores.mean(axis=1))
print("Average per exam:", scores.mean(axis=0))
print("Top student score:", scores.mean(axis=1).max())
passed = scores.mean(axis=1) >= 75
print("How many passed?", passed.sum())
print("Top student row:", scores[scores.mean(axis=1).argmax()])
The axis= parameter is where most beginners trip up. axis=0 collapses rows (gives one number per column). axis=1 collapses columns (gives one number per row). Drill it until it sticks.
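A smaller version of the drill makes the axis rule easy to check by hand. This sketch uses a hypothetical array of 2 students and 3 exams; compare the output shapes with the input shape (2, 3).

```python
import numpy as np

# 2 students (rows) x 3 exams (columns)
small = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

per_exam = small.mean(axis=0)      # collapse rows -> one value per column
per_student = small.mean(axis=1)   # collapse columns -> one value per row

print(per_exam.shape, per_exam.tolist())         # (3,) [2.5, 3.5, 4.5]
print(per_student.shape, per_student.tolist())   # (2,) [2.0, 5.0]
```

One way to remember it: the axis you name is the one that disappears from the shape.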
Key Takeaways
- NumPy arrays are typed and packed in memory, so they are far faster than Python lists
- .shape, .dtype, and .ndim tell you everything about an array, so check them often
- Boolean masks are the foundation of filtering in both NumPy and pandas
- The axis=0 (collapse rows) vs axis=1 (collapse columns) distinction is the most common confusion, so practice it
- Use AI to write NumPy code on demand, and always include a tiny test you can verify by hand

