Lesson 4 - Vector Operations and Calculations
Performing Calculations at Scale
You can now create arrays and select data from them. This lesson teaches you how to perform calculations on arrays efficiently using vectorized operations, which apply one operation to every element without an explicit Python loop.
By the end of this lesson, you will be able to:
- Perform arithmetic operations on entire arrays without loops
- Understand and apply broadcasting to simplify calculations
- Calculate statistical summaries (mean, min, max, sum, standard deviation)
- Use the axis parameter to calculate statistics for rows or columns
- Find the positions of minimum and maximum values
- Apply these techniques to real data analysis problems
These operations are the foundation of data analytics. Let’s begin.
Vector Arithmetic Operations
Element-Wise Operations
NumPy performs arithmetic operations element by element automatically. No loops required.
import numpy as np
# Create two arrays
a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])
print("Array a:", a)
print("Array b:", b)
Now perform operations:
# Addition
result = a + b
print("a + b =", result)
# Output: [11 22 33 44]
# Subtraction
result = a - b
print("a - b =", result)
# Output: [ 9 18 27 36]
# Multiplication
result = a * b
print("a * b =", result)
# Output: [ 10 40 90 160]
# Division
result = a / b
print("a / b =", result)
# Output: [10. 10. 10. 10.]
Visual representation:
a + b operation:
[10, 20, 30, 40]
+
[ 1, 2, 3, 4]
=
[11, 22, 33, 44]
Each position adds independently:
10+1=11, 20+2=22, 30+3=33, 40+4=44
More operations:
# Exponentiation (power)
result = a ** 2
print("a squared:", result)
# Output: [ 100 400 900 1600]
# Modulo (remainder)
result = a % 3
print("a mod 3:", result)
# Output: [1 2 0 1]
Real-World Example: Calculate Trip Speeds
Let’s calculate average speeds for multiple trips:
# Trip data
distances = np.array([15.5, 8.2, 22.3, 5.8, 12.0]) # miles
times_minutes = np.array([45, 20, 60, 15, 35]) # minutes
# Convert minutes to hours (vectorized division)
times_hours = times_minutes / 60
# Calculate speed in mph (vectorized division)
speeds = distances / times_hours
print("Distances (miles):", distances)
print("Times (hours):", times_hours)
print("Speeds (mph):", speeds)
Output:
Distances (miles): [15.5 8.2 22.3 5.8 12. ]
Times (hours): [0.75 0.33333333 1. 0.25 0.58333333]
Speeds (mph): [20.66666667 24.6 22.3 23.2 20.57142857]
Without NumPy, you would need a loop to process each trip individually. With NumPy, one line handles all trips simultaneously.
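For comparison, the loop-based version mentioned above might look like the sketch below. The helper name speeds_with_loop is made up for this illustration and is not part of NumPy. The final line checks that both approaches produce the same speeds.

```python
import numpy as np

distances = np.array([15.5, 8.2, 22.3, 5.8, 12.0])   # miles
times_minutes = np.array([45, 20, 60, 15, 35])       # minutes


def speeds_with_loop(distances, times_minutes):
    """Hypothetical helper: compute mph one trip at a time with a plain loop."""
    speeds = []
    for distance, minutes in zip(distances, times_minutes):
        hours = minutes / 60
        speeds.append(distance / hours)
    return speeds


loop_speeds = speeds_with_loop(distances, times_minutes)

# Vectorized version: the same arithmetic applied to every trip at once
vectorized_speeds = distances / (times_minutes / 60)

# Both approaches agree (up to floating-point rounding)
print(np.allclose(loop_speeds, vectorized_speeds))
```

The loop needs a helper, a temporary list, and five lines of logic. The vectorized form expresses the same calculation in a single expression.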
Operations on 2D Arrays
Element-wise operations work the same way with 2D arrays:
# Student scores before and after extra credit
original_scores = np.array([
[85, 90, 88],
[78, 82, 80],
[92, 85, 91]
])
extra_credit = np.array([
[5, 3, 2],
[4, 5, 3],
[2, 4, 3]
])
# Add extra credit to original scores
final_scores = original_scores + extra_credit
print("Original scores:")
print(original_scores)
print("\nFinal scores (with extra credit):")
print(final_scores)
Output:
Original scores:
[[85 90 88]
[78 82 80]
[92 85 91]]
Final scores (with extra credit):
[[90 93 90]
[82 87 83]
[94 89 94]]
Each element is added to the element in the same position of the other array automatically.
Broadcasting: Operating with Scalars
What is Broadcasting?
Broadcasting is NumPy’s ability to perform operations between arrays of different shapes. The most common case is operating on an entire array with a single number (called a scalar).
NumPy automatically “broadcasts” the scalar to match the array’s shape.
Scalar Operations on Arrays
prices = np.array([100, 200, 150, 300])
# Add tax (9%) to all prices
with_tax = prices * 1.09
print("Original prices:", prices)
print("With 9% tax:", with_tax)
Output:
Original prices: [100 200 150 300]
With 9% tax: [109. 218. 163.5 327. ]
Visual representation:
prices * 1.09 broadcasts to:
[100, 200, 150, 300]
×
[1.09, 1.09, 1.09, 1.09] ← 1.09 is broadcast to match
=
[109., 218., 163.5, 327.]
More examples:
scores = np.array([75, 82, 68, 91, 88])
# Add 5 bonus points to everyone
adjusted = scores + 5
print("Original scores:", scores)
print("With bonus:", adjusted)
# Output: [80 87 73 96 93]
# Express each score as a fraction of 100 total points (0.75 means 75%)
fractions = scores / 100
print("As fractions of 100:", fractions)
# Output: [0.75 0.82 0.68 0.91 0.88]
Broadcasting with 2D Arrays
Broadcasting works with 2D arrays too:
# Sales data (4 weeks × 3 products)
sales = np.array([
[120, 135, 98],
[135, 142, 105],
[150, 138, 112],
[145, 155, 108]
])
# Increase all sales by 10%
increased = sales * 1.10
print("Original sales:")
print(sales)
print("\nWith 10% increase:")
print(increased)
Output:
Original sales:
[[120 135 98]
[135 142 105]
[150 138 112]
[145 155 108]]
With 10% increase:
[[132. 148.5 107.8]
[148.5 156.2 115.5]
[165. 151.8 123.2]
[159.5 170.5 118.8]]
Advanced Broadcasting: Array with 1D Array
You can also broadcast a 1D array along one dimension of a 2D array:
# Prices for 3 products
unit_prices = np.array([10, 15, 20])
# Calculate revenue (sales × price per unit)
# Broadcasting multiplies each column by its corresponding price
revenue = sales * unit_prices
print("Unit prices:", unit_prices)
print("\nRevenue per week:")
print(revenue)
Output:
Unit prices: [10 15 20]
Revenue per week:
[[1200 2025 1960]
[1350 2130 2100]
[1500 2070 2240]
[1450 2325 2160]]
Here, NumPy broadcasts the 1D array [10, 15, 20] across all rows, multiplying column 0 by 10, column 1 by 15, and column 2 by 20.
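The same mechanism works along the other dimension if the 1D array is first reshaped into a column. The sketch below assumes a made-up set of per-week multipliers (weekly_factor is hypothetical, not part of the lesson data). It also shows what happens when the shapes do not line up.

```python
import numpy as np

# Sales data from the lesson (4 weeks x 3 products)
sales = np.array([
    [120, 135,  98],
    [135, 142, 105],
    [150, 138, 112],
    [145, 155, 108],
])

# Hypothetical per-week multipliers, one value per row
weekly_factor = np.array([1.0, 1.1, 1.2, 1.3])

# Shape (4,) does not line up with the last dimension (3), so NumPy refuses
try:
    sales * weekly_factor
except ValueError as error:
    print("Cannot broadcast:", error)

# Reshaping to a column of shape (4, 1) lets each row use its own factor
column = weekly_factor.reshape(-1, 1)
adjusted = sales * column

print(column.shape)    # (4, 1)
print(adjusted.shape)  # (4, 3)
print(adjusted[0])     # week 1 is multiplied by 1.0, so its values are unchanged
```

NumPy compares shapes starting from the last dimension. A 1D array of length 3 matches the columns of a (4, 3) array, while a length-4 array has to become a (4, 1) column before it can match the rows.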
Practical Example: Temperature Conversion
# Temperatures in Celsius
celsius = np.array([0, 10, 20, 25, 30, 35, 40])
# Convert to Fahrenheit: F = C × 9/5 + 32
fahrenheit = celsius * 9/5 + 32
print("Celsius: ", celsius)
print("Fahrenheit:", fahrenheit)
Output:
Celsius: [ 0 10 20 25 30 35 40]
Fahrenheit: [ 32. 50. 68. 77. 86. 95. 104.]
One line of code converts all temperatures. Broadcasting makes this simple and efficient.
Statistical Methods for 1D Arrays
Basic Statistical Functions
NumPy provides methods to calculate common statistics:
scores = np.array([85, 92, 78, 88, 95, 72, 89, 91])
print("Scores:", scores)
print(f"\nMinimum: {scores.min()}")
print(f"Maximum: {scores.max()}")
print(f"Mean: {scores.mean():.2f}")
print(f"Sum: {scores.sum()}")
print(f"Std Dev: {scores.std():.2f}")
Output:
Scores: [85 92 78 88 95 72 89 91]
Minimum: 72
Maximum: 95
Mean: 86.25
Sum: 690
Std Dev: 7.21
These methods provide instant insights into your data.
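One detail worth knowing: by default, .std() computes the population standard deviation (it divides by N). If the scores are a sample drawn from a larger group, passing ddof=1 divides by N - 1 instead. A brief sketch, which also recomputes the default value from its definition as a sanity check:

```python
import numpy as np

scores = np.array([85, 92, 78, 88, 95, 72, 89, 91])

# Default: population standard deviation (divide by N)
population_std = scores.std()

# ddof=1: sample standard deviation (divide by N - 1)
sample_std = scores.std(ddof=1)

# The population value computed step by step from the definition
manual_std = np.sqrt(((scores - scores.mean()) ** 2).mean())

print(f"Population std: {population_std:.2f}")
print(f"Sample std:     {sample_std:.2f}")
print(f"Manual check:   {manual_std:.2f}")
```

The sample version is always slightly larger. Which one is appropriate depends on whether the array holds the whole population or only a sample of it.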
Additional Statistical Methods
data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
print("Data:", data)
print(f"\nMedian: {np.median(data)}")
print(f"25th percentile: {np.percentile(data, 25)}")
print(f"75th percentile: {np.percentile(data, 75)}")
Output:
Data: [10 20 30 40 50 60 70 80 90]
Median: 50.0
25th percentile: 30.0
75th percentile: 70.0
Finding Positions of Min and Max
Sometimes you need to know where the minimum or maximum occurs:
scores = np.array([85, 92, 78, 88, 95, 72, 89, 91])
# Find indices of max and min
max_index = scores.argmax()
min_index = scores.argmin()
print("Scores:", scores)
print(f"\nHighest score: {scores[max_index]} at index {max_index}")
print(f"Lowest score: {scores[min_index]} at index {min_index}")
Output:
Scores: [85 92 78 88 95 72 89 91]
Highest score: 95 at index 4
Lowest score: 72 at index 5
The argmax() and argmin() methods return the index of the maximum and minimum values, not the values themselves.
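When the maximum appears more than once, argmax() returns only the first position. To list every position, one option is to compare the array against its maximum and pass the result to np.flatnonzero. The tied scores below are invented for illustration.

```python
import numpy as np

# Made-up scores in which the top value (95) appears twice
scores = np.array([85, 95, 78, 95, 72])

# argmax() reports the first occurrence only
first_max_index = scores.argmax()

# Comparing against the maximum finds every occurrence
all_max_indices = np.flatnonzero(scores == scores.max())

print("First index of max:", first_max_index)
print("All indices of max:", all_max_indices)
```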
Real-World Example: Sales Analysis
# Monthly sales data
monthly_sales = np.array([120, 135, 150, 145, 160, 175, 190, 185, 170, 155, 140, 165])
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
print("Monthly Sales Analysis:")
print(f"Total annual sales: ${monthly_sales.sum():,}")
print(f"Average monthly sales: ${monthly_sales.mean():.2f}")
print(f"Best month: {months[monthly_sales.argmax()]} (${monthly_sales.max()})")
print(f"Worst month: {months[monthly_sales.argmin()]} (${monthly_sales.min()})")
print(f"Sales range: ${monthly_sales.max() - monthly_sales.min()}")
Output:
Monthly Sales Analysis:
Total annual sales: $1,890
Average monthly sales: $157.50
Best month: Jul ($190)
Worst month: Jan ($120)
Sales range: $70
This analysis provides actionable business insights with just a few lines of code.
Statistical Methods with the Axis Parameter
Understanding the Axis Parameter
When working with 2D arrays, you often want statistics per row or per column, not for the entire array. The axis parameter controls this:
- No axis: Calculate across the entire array (one result)
- axis=0: Calculate down the rows (one result per column)
- axis=1: Calculate across the columns (one result per row)
Visual representation:
Array shape (4, 3):
Col0 Col1 Col2
Row0: 85 90 88
Row1: 92 85 91
Row2: 78 82 80
Row3: 88 87 92
axis=0 (down rows):
Calculate vertically ↓
Results: one value per column [3 values]
axis=1 (across columns):
Calculate horizontally →
Results: one value per row [4 values]
Sample Dataset
# Student scores: 4 students × 3 subjects
scores = np.array([
[85, 90, 88], # Student 1
[92, 85, 91], # Student 2
[78, 82, 80], # Student 3
[88, 87, 92] # Student 4
])
print("Scores (4 students × 3 subjects):")
print(scores)
Overall Statistics (No Axis)
Without specifying an axis, you get a single value for the entire array:
print("Overall statistics:")
print(f"Highest score anywhere: {scores.max()}")
print(f"Lowest score anywhere: {scores.min()}")
print(f"Average of all scores: {scores.mean():.2f}")
print(f"Total of all scores: {scores.sum()}")
Output:
Overall statistics:
Highest score anywhere: 92
Lowest score anywhere: 78
Average of all scores: 86.50
Total of all scores: 1038
Column-Wise Statistics (axis=0)
axis=0 calculates down the rows, giving you one result per column:
# Statistics for each subject (column)
subject_avgs = scores.mean(axis=0)
subject_max = scores.max(axis=0)
subject_min = scores.min(axis=0)
print("Statistics by subject (axis=0):")
print(f"Subject averages: {subject_avgs}")
print(f"Subject maximums: {subject_max}")
print(f"Subject minimums: {subject_min}")
Output:
Statistics by subject (axis=0):
Subject averages: [85.75 86. 87.75]
Subject maximums: [92 90 92]
Subject minimums: [78 82 80]
Interpretation:
- Subject 0 (Math): Average 85.75, Max 92, Min 78
- Subject 1 (Physics): Average 86.00, Max 90, Min 82
- Subject 2 (Chemistry): Average 87.75, Max 92, Min 80
Row-Wise Statistics (axis=1)
axis=1 calculates across the columns, giving you one result per row:
# Statistics for each student (row)
student_avgs = scores.mean(axis=1)
student_max = scores.max(axis=1)
student_min = scores.min(axis=1)
print("Statistics by student (axis=1):")
print(f"Student averages: {student_avgs}")
print(f"Student best scores: {student_max}")
print(f"Student worst scores: {student_min}")
Output:
Statistics by student (axis=1):
Student averages: [87.66666667 89.33333333 80. 89. ]
Student best scores: [90 92 82 92]
Student worst scores: [85 85 78 87]
Interpretation:
- Student 1 (row 0): Average 87.67, Best 90, Worst 85
- Student 2 (row 1): Average 89.33, Best 92, Worst 85
- Student 3 (row 2): Average 80.00, Best 82, Worst 78
- Student 4 (row 3): Average 89.00, Best 92, Worst 87
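The position-finding methods accept the axis parameter as well. As a sketch, argmax(axis=1) gives the column index of each row's highest score, and argmax(axis=0) gives the row index of each column's highest score. The subjects list simply attaches the subject names used in the interpretation above to columns 0, 1, and 2.

```python
import numpy as np

scores = np.array([
    [85, 90, 88],
    [92, 85, 91],
    [78, 82, 80],
    [88, 87, 92],
])
subjects = ["Math", "Physics", "Chemistry"]  # labels for columns 0, 1, 2

# Column index of the highest score in each row
best_subject_index = scores.argmax(axis=1)

# Row index of the highest score in each column
top_row_index = scores.argmax(axis=0)

for row, col in enumerate(best_subject_index):
    print(f"Row {row}: best subject is {subjects[col]} ({scores[row, col]})")

print("Top row per subject:", top_row_index)
```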
Visual Summary: Axis Behavior
print("Original array:")
print(scores)
print(f"Shape: {scores.shape}\n")
print("axis=0 (down rows → column results):")
print(scores.mean(axis=0))
print(f"Shape: {scores.mean(axis=0).shape}\n")
print("axis=1 (across columns → row results):")
print(scores.mean(axis=1))
print(f"Shape: {scores.mean(axis=1).shape}")
Output:
Original array:
[[85 90 88]
[92 85 91]
[78 82 80]
[88 87 92]]
Shape: (4, 3)
axis=0 (down rows → column results):
[85.75 86. 87.75]
Shape: (3,)
axis=1 (across columns → row results):
[87.66666667 89.33333333 80. 89. ]
Shape: (4,)
Practical Example: Sales Dashboard
Let’s create a complete sales analysis:
# Sales data: 4 weeks × 5 products
sales = np.array([
[120, 135, 98, 110, 105], # Week 1
[135, 142, 105, 118, 112], # Week 2
[150, 138, 112, 125, 120], # Week 3
[145, 155, 108, 130, 115] # Week 4
])
print("Sales Data (4 weeks × 5 products):")
print(sales)
Analyze product performance:
# Average sales per product (across all weeks)
product_avg = sales.mean(axis=0)
print("\nAverage sales per product (across all weeks):")
for i, avg in enumerate(product_avg, 1):
    print(f"Product {i}: {avg:.1f} units/week")
Output:
Average sales per product (across all weeks):
Product 1: 137.5 units/week
Product 2: 142.5 units/week
Product 3: 105.8 units/week
Product 4: 120.8 units/week
Product 5: 113.0 units/week
Analyze weekly performance:
# Total sales per week (all products combined)
weekly_totals = sales.sum(axis=1)
print("\nTotal sales per week (all products combined):")
for i, total in enumerate(weekly_totals, 1):
    print(f"Week {i}: {total} units")
Output:
Total sales per week (all products combined):
Week 1: 568 units
Week 2: 612 units
Week 3: 645 units
Week 4: 653 units
Complete analysis report:
print("\n" + "="*50)
print("SALES ANALYSIS REPORT")
print("="*50)
print(f"Total units sold (all time): {sales.sum()}")
print(f"Best week: Week {weekly_totals.argmax() + 1} ({weekly_totals.max()} units)")
print(f"Best product: Product {product_avg.argmax() + 1} ({product_avg.max():.1f} avg)")
print(f"Overall average per product/week: {sales.mean():.1f} units")
print("="*50)
Output:
==================================================
SALES ANALYSIS REPORT
==================================================
Total units sold (all time): 2478
Best week: Week 4 (653 units)
Best product: Product 2 (142.5 avg)
Overall average per product/week: 123.9 units
==================================================
This demonstrates the power of axis-based calculations for real business analysis.
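Axis-based results can also feed straight back into broadcasting. The sketch below computes each product's share of its week's total. Passing keepdims=True keeps the row totals as a (4, 1) column rather than a flat (4,) array, so the division lines up row by row. The name weekly_totals_col is chosen here only to avoid clashing with the 1D weekly_totals above.

```python
import numpy as np

sales = np.array([
    [120, 135,  98, 110, 105],   # Week 1
    [135, 142, 105, 118, 112],   # Week 2
    [150, 138, 112, 125, 120],   # Week 3
    [145, 155, 108, 130, 115],   # Week 4
])

# Row totals kept as a column: shape (4, 1) instead of (4,)
weekly_totals_col = sales.sum(axis=1, keepdims=True)

# Each row is divided by its own total
share = sales / weekly_totals_col

print(weekly_totals_col.shape)     # (4, 1)
print(np.round(share * 100, 1))    # each product's percentage of that week's sales
print(share.sum(axis=1))           # every row sums to 1 (up to rounding)
```

Without keepdims=True, the totals would have shape (4,), which cannot be broadcast against the five columns of a (4, 5) array.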
Practice Exercises
Apply what you have learned with these exercises.
Exercise 1: Vector Operations
Calculate total price (price plus shipping) for each item:
prices = np.array([50, 75, 120, 30, 90])
shipping = np.array([5, 8, 10, 3, 7])
# Your code here:
# 1. Calculate total (price + shipping)
# 2. Calculate total with 9% tax
Exercise 2: Broadcasting
Apply a 15% discount to all prices:
prices = np.array([100, 200, 150, 300, 250])
# Your code here:
# Calculate discounted prices (15% off)
Hint
To apply a 15% discount, multiply by 0.85 (which is 1 - 0.15).
Exercise 3: Statistics with Axis
Calculate average score per student and per subject:
exam_scores = np.array([
[88, 92, 85],
[76, 80, 78],
[95, 89, 93],
[82, 88, 86]
])
# Your code here:
# 1. Calculate average per student (each row)
# 2. Calculate average per subject (each column)
# 3. Find highest score overall
Summary
You now understand how to perform calculations on NumPy arrays efficiently. Let’s review the key concepts.
Key Concepts
Vector Arithmetic
- Element-wise operations: +, -, *, /, **, %
- Automatic for arrays of the same shape
- Much faster than loops for large datasets
Broadcasting
- Scalar operations: array * 1.09 applies to all elements
- NumPy expands smaller arrays to match larger ones automatically
- Simplifies code and improves performance
Statistical Methods (1D)
- .min() and .max() find minimum and maximum values
- .mean() calculates average
- .sum() calculates total
- .std() calculates standard deviation
- .argmin() and .argmax() find indices of min/max
- np.median(), np.percentile() for additional statistics
Axis Parameter for 2D Arrays
- No axis: Single value across entire array
- axis=0: Down rows, one result per column
- axis=1: Across columns, one result per row
Key Methods Reference
# Arithmetic operations
array1 + array2 # Element-wise addition
array * scalar # Broadcasting
# Statistical methods
array.min() # Minimum value
array.max() # Maximum value
array.mean() # Average
array.sum() # Total
array.std() # Standard deviation
array.argmin() # Index of minimum
array.argmax() # Index of maximum
np.median(array) # Median value
np.percentile(array, p) # Percentile
# With axis parameter
array.mean(axis=0) # Mean of each column
array.mean(axis=1) # Mean of each row
Axis Quick Reference
For array with shape (4, 3):
array.mean() → Single value
array.mean(axis=0) → Shape (3,) - one per column
array.mean(axis=1) → Shape (4,) - one per row
Why This Matters
These operations form the core of data analysis:
- Calculate totals, averages, and ranges
- Compare performance across time periods or categories
- Identify outliers and trends
- Transform data for further analysis
Vectorization makes these calculations fast enough to work with datasets containing millions of values. This is why NumPy is essential for data science.
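A rough way to see the difference on your own machine is to time a plain Python loop against the vectorized equivalent. The sketch below uses one million made-up values. Actual timings depend on hardware and NumPy version, so treat the numbers as indicative only.

```python
import time
import numpy as np

values = np.arange(1_000_000, dtype=np.float64)   # made-up test data

# Plain Python loop: square each value and accumulate the total
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# Vectorized: one expression over the whole array
start = time.perf_counter()
vector_total = (values * values).sum()
vector_seconds = time.perf_counter() - start

print(f"Loop:       {loop_seconds:.4f} s")
print(f"Vectorized: {vector_seconds:.4f} s")
print("Same result:", np.isclose(loop_total, vector_total))
```

A single run like this is noisy. For more careful measurements, the standard library's timeit module repeats the code several times.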
Next Steps
You can now perform calculations on entire arrays efficiently. In the next lesson, you will learn Boolean indexing—a powerful technique for filtering data based on conditions.
Unlock the Power of Calculations
You now possess the skills to analyze datasets efficiently. Vector operations and broadcasting eliminate the need for slow loops, while statistical methods provide instant insights into your data.
Combined with the axis parameter, you can analyze data from multiple perspectives—by product, by time period, by category—all with simple, readable code. These are the tools professional data analysts use every day!