Lesson 4 - Vector Operations and Calculations

Performing Calculations at Scale

You can now create arrays and select data from them. This lesson teaches you how to perform calculations on arrays efficiently using vectorized operations—the true power of NumPy.

By the end of this lesson, you will be able to:

  • Perform arithmetic operations on entire arrays without loops
  • Understand and apply broadcasting to simplify calculations
  • Calculate statistical summaries (mean, min, max, sum, standard deviation)
  • Use the axis parameter to calculate statistics for rows or columns
  • Find the positions of minimum and maximum values
  • Apply these techniques to real data analysis problems

These operations are the foundation of data analytics. Let’s begin.


Vector Arithmetic Operations

Element-Wise Operations

NumPy performs arithmetic operations element by element automatically. No loops required.

import numpy as np

# Create two arrays
a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])

print("Array a:", a)
print("Array b:", b)

Now perform operations:

# Addition
result = a + b
print("a + b =", result)
# Output: [11 22 33 44]

# Subtraction
result = a - b
print("a - b =", result)
# Output: [ 9 18 27 36]

# Multiplication
result = a * b
print("a * b =", result)
# Output: [ 10  40  90 160]

# Division
result = a / b
print("a / b =", result)
# Output: [10. 10. 10. 10.]

Visual representation:

a + b operation:

[10, 20, 30, 40]
+
[ 1,  2,  3,  4]
=
[11, 22, 33, 44]

Each position adds independently:
10+1=11, 20+2=22, 30+3=33, 40+4=44

More operations:

# Exponentiation (power)
result = a ** 2
print("a squared:", result)
# Output: [ 100  400  900 1600]

# Modulo (remainder)
result = a % 3
print("a mod 3:", result)
# Output: [1 2 0 1]
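
One detail worth checking for yourself is the data type of each result. The short sketch below reuses the arrays a and b from above and suggests that true division (/) returns floats even when the inputs are integers, while floor division (//) keeps integers. The exact integer type printed depends on your platform.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])

# Addition of two integer arrays stays an integer array
print((a + b).dtype)

# True division returns floating-point values
print((a / b).dtype)

# Floor division keeps integer results
print(a // b)
```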

Real-World Example: Calculate Trip Speeds

Let’s calculate average speeds for multiple trips:

# Trip data
distances = np.array([15.5, 8.2, 22.3, 5.8, 12.0])  # miles
times_minutes = np.array([45, 20, 60, 15, 35])      # minutes

# Convert minutes to hours (vectorized division)
times_hours = times_minutes / 60

# Calculate speed in mph (vectorized division)
speeds = distances / times_hours

print("Distances (miles):", distances)
print("Times (hours):", times_hours)
print("Speeds (mph):", speeds)

Output:

Distances (miles): [15.5  8.2 22.3  5.8 12. ]
Times (hours): [0.75       0.33333333 1.         0.25       0.58333333]
Speeds (mph): [20.66666667 24.6        22.3        23.2        20.57142857]

Without NumPy, you would need a loop to process each trip individually. With NumPy, one line handles all trips simultaneously.
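
To make the comparison concrete, here is a minimal sketch that computes the same speeds twice: once with a plain Python loop and once with a vectorized expression. The variable names speeds_loop and speeds_vectorized are chosen for this illustration only.

```python
import numpy as np

distances = np.array([15.5, 8.2, 22.3, 5.8, 12.0])  # miles
times_minutes = np.array([45, 20, 60, 15, 35])      # minutes

# Loop version: process one trip at a time
speeds_loop = []
for distance, minutes in zip(distances, times_minutes):
    speeds_loop.append(distance / (minutes / 60))

# Vectorized version: one expression covers every trip
speeds_vectorized = distances / (times_minutes / 60)

# Both approaches should agree (up to floating-point rounding)
print(np.allclose(speeds_loop, speeds_vectorized))
```

The results match, but the vectorized version is shorter and pushes the per-element work into NumPy's compiled code.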

Operations on 2D Arrays

Element-wise operations work the same way with 2D arrays:

# Student scores before and after extra credit
original_scores = np.array([
    [85, 90, 88],
    [78, 82, 80],
    [92, 85, 91]
])

extra_credit = np.array([
    [5, 3, 2],
    [4, 5, 3],
    [2, 4, 3]
])

# Add extra credit to original scores
final_scores = original_scores + extra_credit

print("Original scores:")
print(original_scores)
print("\nFinal scores (with extra credit):")
print(final_scores)

Output:

Original scores:
[[85 90 88]
 [78 82 80]
 [92 85 91]]

Final scores (with extra credit):
[[90 93 90]
 [82 87 83]
 [94 89 94]]

Each element adds to its corresponding position automatically.


Broadcasting: Operating with Scalars

What is Broadcasting?

Broadcasting is NumPy’s ability to perform operations between arrays of different, but compatible, shapes. The most common case is operating on an entire array with a single number (called a scalar).

NumPy automatically “broadcasts” the scalar to match the array’s shape.

Scalar Operations on Arrays

prices = np.array([100, 200, 150, 300])

# Add tax (9%) to all prices
with_tax = prices * 1.09

print("Original prices:", prices)
print("With 9% tax:", with_tax)

Output:

Original prices: [100 200 150 300]
With 9% tax: [109.  218.  163.5 327. ]

Visual representation:

prices * 1.09 broadcasts to:

[100, 200, 150, 300]
×
[1.09, 1.09, 1.09, 1.09]  ← 1.09 is broadcast to match
=
[109., 218., 163.5, 327.]
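
The picture above can be checked in code. As a rough sketch, the example below builds the "stretched" array explicitly with np.full and confirms it gives the same result as multiplying by the scalar. NumPy does not need to build this temporary array when it broadcasts; the name stretched is used here only to illustrate the idea.

```python
import numpy as np

prices = np.array([100, 200, 150, 300])

# Conceptual view of broadcasting: repeat 1.09 to match the shape of prices
stretched = np.full(prices.shape, 1.09)

explicit = prices * stretched   # array times array of the same shape
broadcast = prices * 1.09       # array times scalar

print(np.allclose(explicit, broadcast))
```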

More examples:

scores = np.array([75, 82, 68, 91, 88])

# Add 5 bonus points to everyone
adjusted = scores + 5
print("Original scores:", scores)
print("With bonus:", adjusted)
# Output: [80 87 73 96 93]

# Convert to a fraction of the total (assuming 100 total points)
fractions = scores / 100
print("As fractions:", fractions)
# Output: [0.75 0.82 0.68 0.91 0.88]

Broadcasting with 2D Arrays

Broadcasting works with 2D arrays too:

# Sales data (4 weeks × 3 products)
sales = np.array([
    [120, 135, 98],
    [135, 142, 105],
    [150, 138, 112],
    [145, 155, 108]
])

# Increase all sales by 10%
increased = sales * 1.10

print("Original sales:")
print(sales)
print("\nWith 10% increase:")
print(increased)

Output:

Original sales:
[[120 135  98]
 [135 142 105]
 [150 138 112]
 [145 155 108]]

With 10% increase:
[[132.  148.5 107.8]
 [148.5 156.2 115.5]
 [165.  151.8 123.2]
 [159.5 170.5 118.8]]

Advanced Broadcasting: Array with 1D Array

You can also broadcast a 1D array along one dimension of a 2D array:

# Prices for 3 products
unit_prices = np.array([10, 15, 20])

# Calculate revenue (sales × price per unit)
# Broadcasting multiplies each column by its corresponding price
revenue = sales * unit_prices

print("Unit prices:", unit_prices)
print("\nRevenue per week:")
print(revenue)

Output:

Unit prices: [10 15 20]

Revenue per week:
[[1200 2025 1960]
 [1350 2130 2100]
 [1500 2070 2240]
 [1450 2325 2160]]

Here, NumPy broadcasts the 1D array [10, 15, 20] across all rows, multiplying column 0 by 10, column 1 by 15, and column 2 by 20.
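
This works because the length of the 1D array (3) matches the last dimension of the 2D array (3 columns). The sketch below illustrates what happens when the lengths do not line up, and one possible way to apply a value per row instead. The weekly_factor array is hypothetical data invented for this example, and np.newaxis is a NumPy feature not otherwise covered in this lesson.

```python
import numpy as np

sales = np.array([
    [120, 135, 98],
    [135, 142, 105],
    [150, 138, 112],
    [145, 155, 108]
])                                               # shape (4, 3)

unit_prices = np.array([10, 15, 20])             # shape (3,): one value per column
weekly_factor = np.array([1.0, 1.1, 1.2, 1.3])   # hypothetical, shape (4,): one value per row

# Trailing sizes match (3 and 3), so this broadcasts
print((sales * unit_prices).shape)

# Trailing sizes differ (3 and 4), so NumPy raises an error
mismatch_error = None
try:
    sales * weekly_factor
except ValueError as error:
    mismatch_error = error
    print("Cannot broadcast:", error)

# Reshaping to (4, 1) lets each row be scaled by its own factor
scaled = sales * weekly_factor[:, np.newaxis]
print(scaled.shape)
```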

Practical Example: Temperature Conversion

# Temperatures in Celsius
celsius = np.array([0, 10, 20, 25, 30, 35, 40])

# Convert to Fahrenheit: F = C × 9/5 + 32
fahrenheit = celsius * 9/5 + 32

print("Celsius:   ", celsius)
print("Fahrenheit:", fahrenheit)

Output:

Celsius:    [ 0 10 20 25 30 35 40]
Fahrenheit: [ 32.  50.  68.  77.  86.  95. 104.]

One line of code converts all temperatures. Broadcasting makes this simple and efficient.


Statistical Methods for 1D Arrays

Basic Statistical Functions

NumPy provides methods to calculate common statistics:

scores = np.array([85, 92, 78, 88, 95, 72, 89, 91])

print("Scores:", scores)
print(f"\nMinimum:  {scores.min()}")
print(f"Maximum:  {scores.max()}")
print(f"Mean:     {scores.mean():.2f}")
print(f"Sum:      {scores.sum()}")
print(f"Std Dev:  {scores.std():.2f}")

Output:

Scores: [85 92 78 88 95 72 89 91]

Minimum:  72
Maximum:  95
Mean:     86.25
Sum:      690
Std Dev:  7.21

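These methods summarize the whole array in a single call. By default, .std() returns the population standard deviation (it divides by the number of values, n).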
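
If your scores are a sample drawn from a larger group, you may prefer the sample standard deviation, which divides by n - 1 instead of n. As a minimal sketch, the ddof argument of .std() switches between the two. The variable names below are chosen for illustration.

```python
import numpy as np

scores = np.array([85, 92, 78, 88, 95, 72, 89, 91])

# Default (ddof=0): population standard deviation, divides by n
population_std = scores.std()

# ddof=1: sample standard deviation, divides by n - 1
sample_std = scores.std(ddof=1)

print(f"Population std: {population_std:.2f}")
print(f"Sample std:     {sample_std:.2f}")
```

The sample value is always slightly larger, and the gap shrinks as the array grows.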

Additional Statistical Methods

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])

print("Data:", data)
print(f"\nMedian:        {np.median(data)}")
print(f"25th percentile: {np.percentile(data, 25)}")
print(f"75th percentile: {np.percentile(data, 75)}")

Output:

Data: [10 20 30 40 50 60 70 80 90]

Median:        50.0
25th percentile: 30.0
75th percentile: 70.0
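
The percentiles above happen to land exactly on values in the array. When a percentile falls between two values, np.percentile interpolates between them by default. The following sketch shows this for the 10th percentile, and also checks that the median is simply the 50th percentile.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])

# The 10th percentile falls between 10 and 20, so NumPy interpolates
tenth = np.percentile(data, 10)
print(tenth)

# The median is the same as the 50th percentile
print(np.median(data) == np.percentile(data, 50))
```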

Finding Positions of Min and Max

Sometimes you need to know where the minimum or maximum occurs:

scores = np.array([85, 92, 78, 88, 95, 72, 89, 91])

# Find indices of max and min
max_index = scores.argmax()
min_index = scores.argmin()

print("Scores:", scores)
print(f"\nHighest score: {scores[max_index]} at index {max_index}")
print(f"Lowest score:  {scores[min_index]} at index {min_index}")

Output:

Scores: [85 92 78 88 95 72 89 91]

Highest score: 95 at index 4
Lowest score:  72 at index 5

The argmax() and argmin() methods return the index of the maximum and minimum values, not the values themselves.
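
When the maximum (or minimum) appears more than once, argmax() and argmin() report only the first position. The sketch below uses hypothetical scores with ties to show this, along with one possible way to list every matching position using np.flatnonzero.

```python
import numpy as np

# Hypothetical data: 95 and 72 each appear twice
scores = np.array([88, 95, 72, 95, 72])

first_max = scores.argmax()   # index of the first 95
first_min = scores.argmin()   # index of the first 72

# Every position where the maximum occurs
all_max = np.flatnonzero(scores == scores.max())

print(first_max, first_min)
print(all_max)
```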

Real-World Example: Sales Analysis

# Monthly sales data
monthly_sales = np.array([120, 135, 150, 145, 160, 175, 190, 185, 170, 155, 140, 165])
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

print("Monthly Sales Analysis:")
print(f"Total annual sales:    ${monthly_sales.sum():,}")
print(f"Average monthly sales: ${monthly_sales.mean():.2f}")
print(f"Best month:            {months[monthly_sales.argmax()]} (${monthly_sales.max()})")
print(f"Worst month:           {months[monthly_sales.argmin()]} (${monthly_sales.min()})")
print(f"Sales range:           ${monthly_sales.max() - monthly_sales.min()}")

Output:

Monthly Sales Analysis:
Total annual sales:    $1,890
Average monthly sales: $157.50
Best month:            Jul ($190)
Worst month:           Jan ($120)
Sales range:           $70

This analysis provides actionable business insights with just a few lines of code.


Statistical Methods with the Axis Parameter

Understanding the Axis Parameter

When working with 2D arrays, you often want statistics per row or per column, not for the entire array. The axis parameter controls this:

  • No axis: Calculate across the entire array (one result)
  • axis=0: Calculate down the rows (one result per column)
  • axis=1: Calculate across the columns (one result per row)

Visual representation:

Array shape (4, 3):
        Col0  Col1  Col2
Row0:    85    90    88
Row1:    92    85    91
Row2:    78    82    80
Row3:    88    87    92

axis=0 (down rows):
Calculate vertically ↓
Results: one value per column [3 values]

axis=1 (across columns):
Calculate horizontally →
Results: one value per row [4 values]
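
One way to read the axis parameter: the axis you name is the one that gets collapsed. As a rough sketch, the example below computes column and row averages by hand with explicit loops and compares them with the axis-based calls. It uses the same values as the table above.

```python
import numpy as np

scores = np.array([
    [85, 90, 88],
    [92, 85, 91],
    [78, 82, 80],
    [88, 87, 92]
])

# axis=0 collapses the rows: average each column separately
column_means = [scores[:, col].mean() for col in range(scores.shape[1])]

# axis=1 collapses the columns: average each row separately
row_means = [scores[row, :].mean() for row in range(scores.shape[0])]

print(np.allclose(column_means, scores.mean(axis=0)))
print(np.allclose(row_means, scores.mean(axis=1)))
```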

Sample Dataset

# Student scores: 4 students × 3 subjects
scores = np.array([
    [85, 90, 88],  # Student 1
    [92, 85, 91],  # Student 2
    [78, 82, 80],  # Student 3
    [88, 87, 92]   # Student 4
])

print("Scores (4 students × 3 subjects):")
print(scores)

Overall Statistics (No Axis)

Without specifying an axis, you get a single value for the entire array:

print("Overall statistics:")
print(f"Highest score anywhere: {scores.max()}")
print(f"Lowest score anywhere:  {scores.min()}")
print(f"Average of all scores:  {scores.mean():.2f}")
print(f"Total of all scores:    {scores.sum()}")

Output:

Overall statistics:
Highest score anywhere: 92
Lowest score anywhere:  78
Average of all scores:  86.50
Total of all scores:    1038

Column-Wise Statistics (axis=0)

axis=0 calculates down the rows, giving you one result per column:

# Statistics for each subject (column)
subject_avgs = scores.mean(axis=0)
subject_max = scores.max(axis=0)
subject_min = scores.min(axis=0)

print("Statistics by subject (axis=0):")
print(f"Subject averages: {subject_avgs}")
print(f"Subject maximums: {subject_max}")
print(f"Subject minimums: {subject_min}")

Output:

Statistics by subject (axis=0):
Subject averages: [85.75 86.   87.75]
Subject maximums: [92 90 92]
Subject minimums: [78 82 80]

Interpretation (assuming the columns are Math, Physics, and Chemistry, in that order):

  • Subject 0 (Math): Average 85.75, Max 92, Min 78
  • Subject 1 (Physics): Average 86.00, Max 90, Min 82
  • Subject 2 (Chemistry): Average 87.75, Max 92, Min 80

Row-Wise Statistics (axis=1)

axis=1 calculates across the columns, giving you one result per row:

# Statistics for each student (row)
student_avgs = scores.mean(axis=1)
student_max = scores.max(axis=1)
student_min = scores.min(axis=1)

print("Statistics by student (axis=1):")
print(f"Student averages: {student_avgs}")
print(f"Student best scores: {student_max}")
print(f"Student worst scores: {student_min}")

Output:

Statistics by student (axis=1):
Student averages: [87.66666667 89.33333333 80.         89.        ]
Student best scores: [90 92 82 92]
Student worst scores: [85 85 78 87]

Interpretation:

  • Student 1 (row 0): Average 87.67, Best 90, Worst 85
  • Student 2 (row 1): Average 89.33, Best 92, Worst 85
  • Student 3 (row 2): Average 80.00, Best 82, Worst 78
  • Student 4 (row 3): Average 89.00, Best 92, Worst 87

Visual Summary: Axis Behavior

print("Original array:")
print(scores)
print(f"Shape: {scores.shape}\n")

print("axis=0 (down rows → column results):")
print(scores.mean(axis=0))
print(f"Shape: {scores.mean(axis=0).shape}\n")

print("axis=1 (across columns → row results):")
print(scores.mean(axis=1))
print(f"Shape: {scores.mean(axis=1).shape}")

Output:

Original array:
[[85 90 88]
 [92 85 91]
 [78 82 80]
 [88 87 92]]
Shape: (4, 3)

axis=0 (down rows → column results):
[85.75 86.   87.75]
Shape: (3,)

axis=1 (across columns → row results):
[87.66666667 89.33333333 80.         89.        ]
Shape: (4,)
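
Because the axis results are ordinary arrays, they can be combined with broadcasting. The sketch below shows one possible use: measuring how far each score sits from its subject average and from its student average. The keepdims=True argument is a standard option of .mean() that is not covered elsewhere in this lesson; it keeps the result as shape (4, 1) so it lines up with the rows.

```python
import numpy as np

scores = np.array([
    [85, 90, 88],
    [92, 85, 91],
    [78, 82, 80],
    [88, 87, 92]
])

# Subject averages have shape (3,), which broadcasts across the 4 rows
diff_from_subject_avg = scores - scores.mean(axis=0)

# Student averages would have shape (4,); keepdims=True keeps them as (4, 1)
# so that they broadcast across the 3 columns
diff_from_student_avg = scores - scores.mean(axis=1, keepdims=True)

print(diff_from_subject_avg)
print(diff_from_student_avg)
```

After this adjustment, each column of the first result and each row of the second result averages to zero.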

Practical Example: Sales Dashboard

Let’s create a complete sales analysis:

# Sales data: 4 weeks × 5 products
sales = np.array([
    [120, 135, 98, 110, 105],   # Week 1
    [135, 142, 105, 118, 112],  # Week 2
    [150, 138, 112, 125, 120],  # Week 3
    [145, 155, 108, 130, 115]   # Week 4
])

print("Sales Data (4 weeks × 5 products):")
print(sales)

Analyze product performance:

# Average sales per product (across all weeks)
product_avg = sales.mean(axis=0)

print("\nAverage sales per product (across all weeks):")
for i, avg in enumerate(product_avg, 1):
    print(f"Product {i}: {avg:.1f} units/week")

Output:

Average sales per product (across all weeks):
Product 1: 137.5 units/week
Product 2: 142.5 units/week
Product 3: 105.8 units/week
Product 4: 120.8 units/week
Product 5: 113.0 units/week

Analyze weekly performance:

# Total sales per week (all products combined)
weekly_totals = sales.sum(axis=1)

print("\nTotal sales per week (all products combined):")
for i, total in enumerate(weekly_totals, 1):
    print(f"Week {i}: {total} units")

Output:

Total sales per week (all products combined):
Week 1: 568 units
Week 2: 612 units
Week 3: 645 units
Week 4: 653 units

Complete analysis report:

print("\n" + "="*50)
print("SALES ANALYSIS REPORT")
print("="*50)
print(f"Total units sold (all time):      {sales.sum()}")
print(f"Best week:                         Week {weekly_totals.argmax() + 1} ({weekly_totals.max()} units)")
print(f"Best product:                      Product {product_avg.argmax() + 1} ({product_avg.max():.1f} avg)")
print(f"Overall average per product/week:  {sales.mean():.1f} units")
print("="*50)

Output:

==================================================
SALES ANALYSIS REPORT
==================================================
Total units sold (all time):      2478
Best week:                         Week 4 (653 units)
Best product:                      Product 2 (142.5 avg)
Overall average per product/week:  123.9 units
==================================================

This demonstrates the power of axis-based calculations for real business analysis.
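
The argmax() method also accepts the axis parameter, which can extend the report above. As a minimal sketch, the example below finds the best-selling product in each week and the best week for each product. The variable names are chosen for this illustration.

```python
import numpy as np

sales = np.array([
    [120, 135, 98, 110, 105],   # Week 1
    [135, 142, 105, 118, 112],  # Week 2
    [150, 138, 112, 125, 120],  # Week 3
    [145, 155, 108, 130, 115]   # Week 4
])

# axis=1: look across the columns of each row (one index per week)
best_product_per_week = sales.argmax(axis=1)

# axis=0: look down the rows of each column (one index per product)
best_week_per_product = sales.argmax(axis=0)

for week, product in enumerate(best_product_per_week, 1):
    print(f"Week {week}: best seller is Product {product + 1}")

for product, week in enumerate(best_week_per_product, 1):
    print(f"Product {product}: best week is Week {week + 1}")
```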


Practice Exercises

Apply what you have learned with these exercises.

Exercise 1: Vector Operations

Calculate total price (price plus shipping) for each item:

prices = np.array([50, 75, 120, 30, 90])
shipping = np.array([5, 8, 10, 3, 7])

# Your code here:
# 1. Calculate total (price + shipping)
# 2. Calculate total with 9% tax

Exercise 2: Broadcasting

Apply a 15% discount to all prices:

prices = np.array([100, 200, 150, 300, 250])

# Your code here:
# Calculate discounted prices (15% off)

Hint

To apply a 15% discount, multiply by 0.85 (which is 1 - 0.15).

Exercise 3: Statistics with Axis

Calculate average score per student and per subject:

exam_scores = np.array([
    [88, 92, 85],
    [76, 80, 78],
    [95, 89, 93],
    [82, 88, 86]
])

# Your code here:
# 1. Calculate average per student (each row)
# 2. Calculate average per subject (each column)
# 3. Find highest score overall

Summary

You now understand how to perform calculations on NumPy arrays efficiently. Let’s review the key concepts.

Key Concepts

Vector Arithmetic

  • Element-wise operations: +, -, *, /, **, %
  • Automatic for arrays of the same shape
  • Much faster than loops for large datasets

Broadcasting

  • Scalar operations: array * 1.09 applies to all elements
  • NumPy expands smaller arrays to match larger ones automatically when their shapes are compatible
  • Simplifies code and improves performance

Statistical Methods (1D)

  • .min() and .max() find minimum and maximum values
  • .mean() calculates average
  • .sum() calculates total
  • .std() calculates standard deviation
  • .argmin() and .argmax() find indices of min/max
  • np.median(), np.percentile() for additional statistics

Axis Parameter for 2D Arrays

  • No axis: Single value across entire array
  • axis=0: Down rows, one result per column
  • axis=1: Across columns, one result per row

Key Methods Reference

# Arithmetic operations
array1 + array2         # Element-wise addition
array * scalar          # Broadcasting

# Statistical methods
array.min()             # Minimum value
array.max()             # Maximum value
array.mean()            # Average
array.sum()             # Total
array.std()             # Standard deviation
array.argmin()          # Index of minimum
array.argmax()          # Index of maximum
np.median(array)        # Median value
np.percentile(array, p) # Percentile

# With axis parameter
array.mean(axis=0)      # Mean of each column
array.mean(axis=1)      # Mean of each row

Axis Quick Reference

For array with shape (4, 3):

array.mean()        → Single value
array.mean(axis=0)  → Shape (3,) - one per column
array.mean(axis=1)  → Shape (4,) - one per row

Why This Matters

These operations form the core of data analysis:

  • Calculate totals, averages, and ranges
  • Compare performance across time periods or categories
  • Identify outliers and trends
  • Transform data for further analysis

Vectorization makes these calculations fast enough to work with datasets containing millions of values. This is why NumPy is essential for data science.


Next Steps

You can now perform calculations on entire arrays efficiently. In the next lesson, you will learn Boolean indexing—a powerful technique for filtering data based on conditions.


Unlock the Power of Calculations

You now possess the skills to analyze datasets efficiently. Vector operations and broadcasting eliminate the need for slow loops, while statistical methods provide instant insights into your data.

Combined with the axis parameter, you can analyze data from multiple perspectives—by product, by time period, by category—all with simple, readable code. These are the tools professional data analysts use every day!