7 NumPy Tricks to Vectorize Your Code
Introduction
You’ve written Python that processes data in a loop. It’s clear, it’s correct, and it’s unusably slow on real-world data sizes. The problem isn’t your algorithm; it’s that for loops in Python execute at interpreter speed, which means every iteration pays the overhead cost of Python’s dynamic type checking and memory management.
NumPy helps solve this bottleneck. It wraps highly optimized C and Fortran libraries that can process entire arrays in single operations, bypassing Python’s overhead completely. But you need to write your code differently — and express it as vectorized operations — to access that speed. The shift requires a different way of thinking. Instead of “loop through and check each value,” you think “select elements matching a condition.” Instead of nested iteration, you think in array dimensions and broadcasting.
This article walks through 7 vectorization techniques that eliminate loops from numerical code. Each one addresses a specific pattern where developers typically reach for iteration, showing you how to reformulate the problem in array operations instead. The result is code that runs much (much) faster and often reads more clearly than the loop-based version.
1. Boolean Indexing Instead of Conditional Loops
You need to filter or modify array elements based on conditions. The instinct is to loop through and check each one.
```python
import numpy as np

# Slow: loop-based filtering
data = np.random.randn(1000000)
result = []
for x in data:
    if x > 0:
        result.append(x * 2)
    else:
        result.append(x)
result = np.array(result)
```
Here’s the vectorized approach:
```python
# Fast: boolean indexing
data = np.random.randn(1000000)
result = data.copy()
result[data > 0] *= 2
```
Here, data > 0 creates a boolean array — True where the condition holds, False elsewhere. Using this as an index selects only those elements.
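Boolean masks also compose: you can combine several conditions with `&` (and) and `|` (or). A minimal sketch with a small made-up array:

```python
import numpy as np

data = np.array([-3.0, 1.5, -0.5, 4.0, 2.0])

# Parentheses are required around each comparison because & and |
# bind more tightly than > and < in Python.
mask = (data > 0) & (data < 3)
selected = data[mask]

# Count matches without materializing the subset
count = int(np.count_nonzero(mask))
```

Here `selected` contains only `1.5` and `2.0`, the elements that satisfy both conditions.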
2. Broadcasting for Implicit Loops
Sometimes you want to combine arrays of different shapes, maybe adding a row vector to every row of a matrix. The loop-based approach requires explicit iteration.
```python
# Slow: explicit loops
matrix = np.random.rand(1000, 500)
row_means = np.mean(matrix, axis=1)
centered = np.zeros_like(matrix)
for i in range(matrix.shape[0]):
    centered[i] = matrix[i] - row_means[i]
```
Here’s the vectorized approach:
```python
# Fast: broadcasting
matrix = np.random.rand(1000, 500)
row_means = np.mean(matrix, axis=1, keepdims=True)
centered = matrix - row_means
```
In this code, setting keepdims=True keeps row_means as shape (1000, 1), not (1000,). When you subtract, NumPy automatically stretches this column vector across all columns of the matrix. The shapes don’t match, but NumPy makes them compatible by repeating values along singleton dimensions.
🔖 Note: Broadcasting works when dimensions are compatible: either equal, or one of them is 1. The smaller array gets virtually repeated to match the larger one’s shape, no memory copying needed.
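To make the compatibility rule concrete, here is a small sketch (toy shapes of my own choosing) where a column vector and a row vector broadcast into a full table:

```python
import numpy as np

# Shapes (4, 1) and (3,) broadcast to (4, 3): the length-1 axis of `col`
# is stretched across 3 columns, and `row` is stretched across 4 rows.
col = np.array([[0], [10], [20], [30]])   # shape (4, 1)
row = np.array([1, 2, 3])                 # shape (3,)
table = col + row                         # shape (4, 3)

# By contrast, shapes (4,) and (3,) would NOT broadcast:
# 4 != 3 and neither dimension is 1.
```

Each entry of `table` is `col[i] + row[j]`, so its second row is `[11, 12, 13]`.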
3. np.where() for Vectorized If-Else
When you need different calculations for different elements based on conditions, the instinct is to write branching logic inside loops.
```python
# Slow: conditional logic in loops
temps = np.random.uniform(-10, 40, 100000)
classifications = []
for t in temps:
    if t < 0:
        classifications.append('freezing')
    elif t < 20:
        classifications.append('cool')
    else:
        classifications.append('warm')
```
Here’s the vectorized approach:
```python
# Fast: np.where() and np.select()
temps = np.random.uniform(-10, 40, 100000)
classifications = np.select(
    [temps < 0, temps < 20, temps >= 20],
    ['freezing', 'cool', 'warm'],
    default='unknown'
)

# For simple two-way splits, np.where() is cleaner:
scores = np.random.randint(0, 100, 10000)
results = np.where(scores >= 60, 'pass', 'fail')
```
np.where(condition, x, y) returns elements from x where condition is True, from y elsewhere. np.select() extends this to multiple conditions. It checks each condition in order and returns the corresponding value from the second list.
🔖 Note: The conditions in np.select() don’t have to be mutually exclusive. If multiple conditions are True for an element, the first match wins.
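You can see the first-match-wins behavior with a tiny made-up example: the element `-5.0` satisfies both conditions, but gets the label of the first one.

```python
import numpy as np

temps = np.array([-5.0, 10.0, 25.0])

# temps < 20 is also True for -5.0, but np.select() returns the choice
# of the FIRST matching condition for each element.
labels = np.select([temps < 0, temps < 20],
                   ['freezing', 'cool'],
                   default='warm')
```

`labels` comes out as `['freezing', 'cool', 'warm']`.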
4. Fancy Indexing for Lookup Operations
Suppose you have indices and need to gather elements from multiple positions. You’ll often reach for dictionary lookups in loops, or worse, nested searches.
```python
# Slow: loop-based gathering
lookup_table = np.array([10, 20, 30, 40, 50])
indices = np.random.randint(0, 5, 100000)
results = []
for idx in indices:
    results.append(lookup_table[idx])
results = np.array(results)
```
Here’s the vectorized approach:
```python
# Fast: fancy indexing
lookup_table = np.array([10, 20, 30, 40, 50])
indices = np.random.randint(0, 5, 100000)
results = lookup_table[indices]
```
When you index an array with another array of integers, NumPy pulls out elements at those positions. This works in multiple dimensions too:
```python
matrix = np.arange(20).reshape(4, 5)
row_indices = np.array([0, 2, 3])
col_indices = np.array([1, 3, 4])
values = matrix[row_indices, col_indices]  # Gets matrix[0,1], matrix[2,3], matrix[3,4]
```
🔖 Note: This is especially useful when implementing categorical encodings, building histograms, or any operation where you’re mapping indices to values.
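As one hypothetical illustration of the categorical-encoding use case, a label array can decode integer category codes with a single gather:

```python
import numpy as np

# Hypothetical example: map integer category codes back to
# human-readable labels with one fancy-indexing lookup.
labels = np.array(['cat', 'dog', 'bird'])
codes = np.array([2, 0, 0, 1, 2])

decoded = labels[codes]  # no Python loop, no dictionary
```

`decoded` is `['bird', 'cat', 'cat', 'dog', 'bird']` — each code replaced by the label at that position.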
5. np.vectorize() for Custom Functions
You have a function that works on scalars, but you need to apply it to arrays. Writing loops everywhere clutters your code.
```python
# Slow: manual looping
def complex_transform(x):
    if x < 0:
        return np.sqrt(abs(x)) * -1
    else:
        return x ** 2

data = np.random.randn(10000)
results = np.array([complex_transform(x) for x in data])
```
Here’s the vectorized approach:
```python
# Cleaner: np.vectorize()
def complex_transform(x):
    if x < 0:
        return np.sqrt(abs(x)) * -1
    else:
        return x ** 2

vec_transform = np.vectorize(complex_transform)
data = np.random.randn(10000)
results = vec_transform(data)
```
Here, np.vectorize() wraps your function so it can handle arrays. It automatically applies the function element-wise and handles the output array creation.
🔖 Note: This doesn’t magically make your function faster. Under the hood, it’s still looping in Python. The advantage here is code clarity, not speed. For real performance gains, rewrite the function using NumPy operations directly:
```python
# Actually fast
data = np.random.randn(10000)
results = np.where(data < 0, -np.sqrt(np.abs(data)), data ** 2)
```
6. np.einsum() for Complex Array Operations
Matrix multiplications, transposes, traces, and tensor contractions pile up into unreadable chains of operations.
```python
# Matrix multiplication the standard way
A = np.random.rand(100, 50)
B = np.random.rand(50, 80)
C = np.dot(A, B)

# Batch matrix multiply - gets messy
batch_A = np.random.rand(32, 10, 20)
batch_B = np.random.rand(32, 20, 15)
results = np.zeros((32, 10, 15))
for i in range(32):
    results[i] = np.dot(batch_A[i], batch_B[i])
```
Here’s the vectorized approach:
```python
# Clean: einsum
A = np.random.rand(100, 50)
B = np.random.rand(50, 80)
C = np.einsum('ij,jk->ik', A, B)

# Batch matrix multiply - single line
batch_A = np.random.rand(32, 10, 20)
batch_B = np.random.rand(32, 20, 15)
results = np.einsum('bij,bjk->bik', batch_A, batch_B)
```
In this example, einsum() uses Einstein summation notation. The string 'ij,jk->ik' says: “take indices i,j from the first array, j,k from the second, sum over shared index j, output has indices i,k.”
Let’s take a few more examples:
```python
# Trace (sum of diagonal)
matrix = np.random.rand(100, 100)
trace = np.einsum('ii->', matrix)

# Transpose
transposed = np.einsum('ij->ji', matrix)

# Element-wise multiply then sum
A = np.random.rand(50, 50)
B = np.random.rand(50, 50)
result = np.einsum('ij,ij->', A, B)  # Same as np.sum(A * B)
```
Using this approach takes time to internalize, but pays off for complex tensor operations.
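One way to build that intuition is to check einsum expressions against the familiar equivalents on small arrays. A quick sketch (shapes and seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5))
B = rng.random((5, 3))

# 'ij,jk->ik' is ordinary matrix multiplication (same as A @ B)
matmul_einsum = np.einsum('ij,jk->ik', A, B)

# 'ij->j' sums down the rows (same as A.sum(axis=0))
col_sums = np.einsum('ij->j', A)

# 'i,j->ij' is the outer product (same as np.outer(x, y))
x, y = rng.random(3), rng.random(4)
outer = np.einsum('i,j->ij', x, y)
```

Verifying each result with np.allclose against the standard call is a cheap sanity check while you learn the notation.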
7. np.apply_along_axis() for Row/Column Operations
When you need to apply a function to each row or column of a matrix, looping through slices works but feels clunky.
```python
# Slow: manual row iteration
data = np.random.rand(1000, 50)
row_stats = []
for i in range(data.shape[0]):
    row = data[i]
    # Custom statistic not in NumPy
    stat = (np.max(row) - np.min(row)) / np.median(row)
    row_stats.append(stat)
row_stats = np.array(row_stats)
```
And here’s the vectorized approach:
```python
# Cleaner: apply_along_axis
data = np.random.rand(1000, 50)

def custom_stat(row):
    return (np.max(row) - np.min(row)) / np.median(row)

row_stats = np.apply_along_axis(custom_stat, axis=1, arr=data)
```
In the above code snippet, axis=1 means the function is applied along axis 1, so each 1D slice handed to the function is a row. The function receives 1D arrays and returns scalars or arrays, which get stacked into the result.
Column-wise operations: Use axis=0 to apply functions down columns instead:
```python
# Apply to each column
col_stats = np.apply_along_axis(custom_stat, axis=0, arr=data)
```
🔖 Note: Like np.vectorize(), this is primarily for code clarity: under the hood it still calls your Python function once per slice, so don’t expect a big speedup over the manual loop. If your function can be written in pure NumPy operations, do that instead.
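For this section’s custom_stat, the pure-NumPy rewrite is short: each reduction takes an axis argument, so every call operates on all rows at once and no per-row Python function is invoked. A sketch:

```python
import numpy as np

data = np.random.rand(1000, 50)

# Same statistic as custom_stat, but expressed as whole-array
# reductions along axis=1 (one value per row).
row_stats = (data.max(axis=1) - data.min(axis=1)) / np.median(data, axis=1)
```

This produces one statistic per row, identical to the apply_along_axis version.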
Wrapping Up
Every technique in this article follows the same shift in thinking: describe what transformation you want applied to your data, not how to iterate through it.
I suggest working through the examples in this article and timing them yourself, to see how substantial the performance gains of the vectorized approaches are compared to the loop-based alternatives.
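A minimal timing sketch for the first trick, using the standard library’s timeit (the array size and repeat count here are arbitrary choices of mine):

```python
import timeit
import numpy as np

data = np.random.randn(100_000)

def loop_version(arr):
    # Pure-Python loop: pays interpreter overhead per element
    out = []
    for x in arr:
        out.append(x * 2 if x > 0 else x)
    return np.array(out)

def vectorized_version(arr):
    # Boolean indexing: one pass in compiled code
    out = arr.copy()
    out[arr > 0] *= 2
    return out

t_loop = timeit.timeit(lambda: loop_version(data), number=5)
t_vec = timeit.timeit(lambda: vectorized_version(data), number=5)
print(f'loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s')
```

The same pattern — wrap each variant in a function, time both, confirm the outputs match — works for every technique above.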
This isn’t just about speed. Vectorized code typically ends up shorter and more readable than its loop-based equivalent. The loop version, on the other hand, requires readers to mentally execute the iteration to understand what’s happening. So yeah, happy coding!
