7 NumPy Tricks to Vectorize Your Code
Introduction
You’ve written Python that processes data in a loop. It’s clear, it’s correct, and it’s unusably slow on real-world data sizes. The problem isn’t your algorithm; it’s that for loops in Python execute at interpreter speed, which means every iteration pays the overhead cost of Python’s dynamic type checking and memory management.
NumPy helps solve this bottleneck. It wraps highly optimized C and Fortran libraries that can process entire arrays in single operations, bypassing Python’s overhead completely. But you need to write your code differently — and express it as vectorized operations — to access that speed. The shift requires a different way of thinking. Instead of “loop through and check each value,” you think “select elements matching a condition.” Instead of nested iteration, you think in array dimensions and broadcasting.
This article walks through 7 vectorization techniques that eliminate loops from numerical code. Each one addresses a specific pattern where developers typically reach for iteration, showing you how to reformulate the problem in array operations instead. The result is code that runs much (much) faster and often reads more clearly than the loop-based version.
1. Boolean Indexing Instead of Conditional Loops
You need to filter or modify array elements based on conditions. The instinct is to loop through and check each one.
```python
import numpy as np

# Slow: loop-based filtering
data = np.random.randn(1000000)
result = []
for x in data:
    if x > 0:
        result.append(x * 2)
    else:
        result.append(x)
result = np.array(result)
```
Here’s the vectorized approach:
```python
# Fast: boolean indexing
data = np.random.randn(1000000)
result = data.copy()
result[data > 0] *= 2
```
Here, data > 0 creates a boolean array — True where the condition holds, False elsewhere. Using this as an index selects only those elements.
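Boolean masks also compose: you can combine several conditions with `&` (and) and `|` (or). A minimal sketch with a small made-up array:

```python
import numpy as np

data = np.array([-3.0, 1.5, -0.5, 4.0, 2.0])

# Parentheses are required around each comparison because & and |
# bind more tightly than > and < in Python.
mask = (data > 0) & (data < 3)
selected = data[mask]

# Count matches without materializing the subset
count = int(np.count_nonzero(mask))
```

Here `selected` contains only `1.5` and `2.0`, the elements that satisfy both conditions.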
2. Broadcasting for Implicit Loops
Sometimes you want to combine arrays of different shapes, maybe adding a row vector to every row of a matrix. The loop-based approach requires explicit iteration.
```python
# Slow: explicit loops
matrix = np.random.rand(1000, 500)
row_means = np.mean(matrix, axis=1)
centered = np.zeros_like(matrix)
for i in range(matrix.shape[0]):
    centered[i] = matrix[i] - row_means[i]
```
Here’s the vectorized approach:
```python
# Fast: broadcasting
matrix = np.random.rand(1000, 500)
row_means = np.mean(matrix, axis=1, keepdims=True)
centered = matrix - row_means
```
In this code, setting keepdims=True keeps row_means as shape (1000, 1), not (1000,). When you subtract, NumPy automatically stretches this column vector across all columns of the matrix. The shapes don’t match, but NumPy makes them compatible by repeating values along singleton dimensions.
🔖 Note: Broadcasting works when dimensions are compatible: either equal, or one of them is 1. The smaller array gets virtually repeated to match the larger one’s shape, no memory copying needed.
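To make the compatibility rule concrete, here is a small sketch (toy shapes of my own choosing) where a column vector and a row vector broadcast into a full table:

```python
import numpy as np

# Shapes (4, 1) and (3,) broadcast to (4, 3): the length-1 axis of `col`
# is stretched across 3 columns, and `row` is stretched across 4 rows.
col = np.array([[0], [10], [20], [30]])   # shape (4, 1)
row = np.array([1, 2, 3])                 # shape (3,)
table = col + row                         # shape (4, 3)

# By contrast, shapes (4,) and (3,) would NOT broadcast:
# 4 != 3 and neither dimension is 1.
```

Each entry of `table` is `col[i] + row[j]`, so its second row is `[11, 12, 13]`.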
3. np.where() for Vectorized If-Else
When you need different calculations for different elements based on conditions, the instinct is to write branching logic inside loops.
```python
# Slow: conditional logic in loops
temps = np.random.uniform(-10, 40, 100000)
classifications = []
for t in temps:
    if t < 0:
        classifications.append('freezing')
    elif t < 20:
        classifications.append('cool')
    else:
        classifications.append('warm')
```
Here’s the vectorized approach:
```python
# Fast: np.where() and np.select()
temps = np.random.uniform(-10, 40, 100000)
classifications = np.select(
    [temps < 0, temps < 20, temps >= 20],
    ['freezing', 'cool', 'warm'],
    default='unknown'
)

# For simple two-way splits, np.where() is cleaner:
scores = np.random.randint(0, 100, 10000)
results = np.where(scores >= 60, 'pass', 'fail')
```
np.where(condition, x, y) returns elements from x where condition is True, from y elsewhere. np.select() extends this to multiple conditions. It checks each condition in order and returns the corresponding value from the second list.
🔖 Note: The conditions in np.select() don’t have to be mutually exclusive. If multiple conditions are True for an element, the first match wins.
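You can see the first-match-wins behavior with a tiny made-up example: the element `-5.0` satisfies both conditions, but gets the label of the first one.

```python
import numpy as np

temps = np.array([-5.0, 10.0, 25.0])

# temps < 20 is also True for -5.0, but np.select() returns the choice
# of the FIRST matching condition for each element.
labels = np.select([temps < 0, temps < 20],
                   ['freezing', 'cool'],
                   default='warm')
```

`labels` comes out as `['freezing', 'cool', 'warm']`.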
4. Fancy Indexing for Lookup Operations
Suppose you have indices and need to gather elements from multiple positions. You’ll often reach for dictionary lookups in loops, or worse, nested searches.
```python
# Slow: loop-based gathering
lookup_table = np.array([10, 20, 30, 40, 50])
indices = np.random.randint(0, 5, 100000)
results = []
for idx in indices:
    results.append(lookup_table[idx])
results = np.array(results)
```
Here’s the vectorized approach:
```python
# Fast: fancy indexing
lookup_table = np.array([10, 20, 30, 40, 50])
indices = np.random.randint(0, 5, 100000)
results = lookup_table[indices]
```
When you index an array with another array of integers, NumPy pulls out elements at those positions. This works in multiple dimensions too:
```python
matrix = np.arange(20).reshape(4, 5)
row_indices = np.array([0, 2, 3])
col_indices = np.array([1, 3, 4])
values = matrix[row_indices, col_indices]  # Gets matrix[0,1], matrix[2,3], matrix[3,4]
```
🔖 Note: This is especially useful when implementing categorical encodings, building histograms, or any operation where you’re mapping indices to values.
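As one hypothetical illustration of the categorical-encoding use case, a label array can decode integer category codes with a single gather:

```python
import numpy as np

# Hypothetical example: map integer category codes back to
# human-readable labels with one fancy-indexing lookup.
labels = np.array(['cat', 'dog', 'bird'])
codes = np.array([2, 0, 0, 1, 2])

decoded = labels[codes]  # no Python loop, no dictionary
```

`decoded` is `['bird', 'cat', 'cat', 'dog', 'bird']` — each code replaced by the label at that position.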
5. np.vectorize() for Custom Functions
You have a function that works on scalars, but you need to apply it to arrays. Writing loops everywhere clutters your code.
```python
# Slow: manual looping
def complex_transform(x):
    if x < 0:
        return np.sqrt(abs(x)) * -1
    else:
        return x ** 2

data = np.random.randn(10000)
results = np.array([complex_transform(x) for x in data])
```
Here’s the vectorized approach:
```python
# Cleaner: np.vectorize()
def complex_transform(x):
    if x < 0:
        return np.sqrt(abs(x)) * -1
    else:
        return x ** 2

vec_transform = np.vectorize(complex_transform)
data = np.random.randn(10000)
results = vec_transform(data)
```
Here, np.vectorize() wraps your function so it can handle arrays. It automatically applies the function element-wise and handles the output array creation.
🔖 Note: This doesn’t magically make your function faster. Under the hood, it’s still looping in Python. The advantage here is code clarity, not speed. For real performance gains, rewrite the function using NumPy operations directly:
```python
# Actually fast
data = np.random.randn(10000)
results = np.where(data < 0, -np.sqrt(np.abs(data)), data ** 2)
```
6. np.einsum() for Complex Array Operations
Matrix multiplications, transposes, traces, and tensor contractions pile up into unreadable chains of operations.
```python
# Matrix multiplication the standard way
A = np.random.rand(100, 50)
B = np.random.rand(50, 80)
C = np.dot(A, B)

# Batch matrix multiply - gets messy
batch_A = np.random.rand(32, 10, 20)
batch_B = np.random.rand(32, 20, 15)
results = np.zeros((32, 10, 15))
for i in range(32):
    results[i] = np.dot(batch_A[i], batch_B[i])
```
Here’s the vectorized approach:
```python
# Clean: einsum
A = np.random.rand(100, 50)
B = np.random.rand(50, 80)
C = np.einsum('ij,jk->ik', A, B)

# Batch matrix multiply - single line
batch_A = np.random.rand(32, 10, 20)
batch_B = np.random.rand(32, 20, 15)
results = np.einsum('bij,bjk->bik', batch_A, batch_B)
```
In this example, einsum() uses Einstein summation notation. The string 'ij,jk->ik' says: “take indices i,j from the first array, j,k from the second, sum over shared index j, output has indices i,k.”
Let’s take a few more examples:
```python
# Trace (sum of diagonal)
matrix = np.random.rand(100, 100)
trace = np.einsum('ii->', matrix)

# Transpose
transposed = np.einsum('ij->ji', matrix)

# Element-wise multiply then sum
A = np.random.rand(50, 50)
B = np.random.rand(50, 50)
result = np.einsum('ij,ij->', A, B)  # Same as np.sum(A * B)
```
Using this approach takes time to internalize, but pays off for complex tensor operations.
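One way to build that intuition is to check einsum expressions against the familiar equivalents on small arrays. A quick sketch (shapes and seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5))
B = rng.random((5, 3))

# 'ij,jk->ik' is ordinary matrix multiplication (same as A @ B)
matmul_einsum = np.einsum('ij,jk->ik', A, B)

# 'ij->j' sums down the rows (same as A.sum(axis=0))
col_sums = np.einsum('ij->j', A)

# 'i,j->ij' is the outer product (same as np.outer(x, y))
x, y = rng.random(3), rng.random(4)
outer = np.einsum('i,j->ij', x, y)
```

Verifying each result with np.allclose against the standard call is a cheap sanity check while you learn the notation.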
7. np.apply_along_axis() for Row/Column Operations
When you need to apply a function to each row or column of a matrix, looping through slices works but feels clunky.
```python
# Slow: manual row iteration
data = np.random.rand(1000, 50)
row_stats = []
for i in range(data.shape[0]):
    row = data[i]
    # Custom statistic not in NumPy
    stat = (np.max(row) - np.min(row)) / np.median(row)
    row_stats.append(stat)
row_stats = np.array(row_stats)
```
And here’s the vectorized approach:
```python
# Cleaner: apply_along_axis
data = np.random.rand(1000, 50)

def custom_stat(row):
    return (np.max(row) - np.min(row)) / np.median(row)

row_stats = np.apply_along_axis(custom_stat, axis=1, arr=data)
```
In the above code snippet, axis=1 means the function is applied along axis 1, so each 1D slice handed to the function is a row. The function receives 1D arrays and returns scalars or arrays, which get stacked into the result.
Column-wise operations: Use axis=0 to apply functions down columns instead:
```python
# Apply to each column
col_stats = np.apply_along_axis(custom_stat, axis=0, arr=data)
```
🔖 Note: Like np.vectorize(), this is primarily for code clarity: under the hood it still calls your Python function once per slice, so don’t expect a big speedup over the manual loop. If your function can be written in pure NumPy operations, do that instead.
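For this section’s custom_stat, the pure-NumPy rewrite is short: each reduction takes an axis argument, so every call operates on all rows at once and no per-row Python function is invoked. A sketch:

```python
import numpy as np

data = np.random.rand(1000, 50)

# Same statistic as custom_stat, but expressed as whole-array
# reductions along axis=1 (one value per row).
row_stats = (data.max(axis=1) - data.min(axis=1)) / np.median(data, axis=1)
```

This produces one statistic per row, identical to the apply_along_axis version.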
Wrapping Up
Every technique in this article follows the same shift in thinking: describe what transformation you want applied to your data, not how to iterate through it.
I suggest working through the examples in this article and timing them yourself, to see how substantial the performance gains of the vectorized approaches are compared to the loop-based alternatives.
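A minimal timing sketch for the first trick, using the standard library’s timeit (the array size and repeat count here are arbitrary choices of mine):

```python
import timeit
import numpy as np

data = np.random.randn(100_000)

def loop_version(arr):
    # Pure-Python loop: pays interpreter overhead per element
    out = []
    for x in arr:
        out.append(x * 2 if x > 0 else x)
    return np.array(out)

def vectorized_version(arr):
    # Boolean indexing: one pass in compiled code
    out = arr.copy()
    out[arr > 0] *= 2
    return out

t_loop = timeit.timeit(lambda: loop_version(data), number=5)
t_vec = timeit.timeit(lambda: vectorized_version(data), number=5)
print(f'loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s')
```

The same pattern — wrap each variant in a function, time both, confirm the outputs match — works for every technique above.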
This isn’t just about speed. Vectorized code typically ends up shorter and more readable than its loop-based equivalent. The loop version, on the other hand, requires readers to mentally execute the iteration to understand what’s happening. So yeah, happy coding!
