Lesson 6 - Modifying Data and Assignment
Transforming Your Data
You can now create, analyze, and filter NumPy arrays. This final lesson teaches you how to modify arrays—updating values, cleaning data, and transforming datasets for analysis.
By the end of this lesson, you will be able to:
- Assign new values to specific array elements
- Update entire rows or columns at once
- Modify data using Boolean indexing for targeted updates
- Clean datasets by fixing invalid values
- Add new calculated columns to arrays
- Understand the difference between views and copies
- Apply these techniques to real data preparation tasks
Data modification is essential for cleaning and preparing datasets before analysis. Let’s master these techniques.
Basic Assignment Operations
Assigning to Single Elements
You can change individual array elements by assigning new values:
import numpy as np
scores = np.array([85, 92, 78, 88, 95])
print("Original scores:")
print(scores)
# Output: [85 92 78 88 95]
Change specific elements:
# Change first score
scores[0] = 90
print("After changing first score to 90:")
print(scores)
# Output: [90 92 78 88 95]
# Change last score
scores[-1] = 100
print("After changing last score to 100:")
print(scores)
# Output: [ 90  92  78  88 100]
Assigning to Multiple Elements
Update multiple elements at once using slicing:
prices = np.array([10, 15, 20, 25, 30])
print("Original prices:")
print(prices)
# Output: [10 15 20 25 30]
Set multiple elements to the same value:
# Update first 3 prices to same value
prices[0:3] = 12
print("After setting first 3 to 12:")
print(prices)
# Output: [12 12 12 25 30]
Set multiple elements to different values:
# Update first 3 prices to different values
prices[0:3] = [8, 10, 12]
print("After setting to different values:")
print(prices)
# Output: [ 8 10 12 25 30]
Assigning in 2D Arrays
Assignment works with 2D arrays too:
# Student scores: Math, Physics, Chemistry
students = np.array([
[85, 90, 88],
[92, 85, 91],
[78, 82, 80]
])
print("Original student scores:")
print(students)
Update individual elements:
# Change single element (student 0, subject 0)
students[0, 0] = 87
print("After changing [0, 0] to 87:")
print(students)
# Output:
# [[87 90 88]
# [92 85 91]
# [78 82 80]]
Update entire rows:
# Update entire first row (all scores for student 0)
students[0] = [90, 92, 89]
print("After updating first student's scores:")
print(students)
# Output:
# [[90 92 89]
# [92 85 91]
# [78 82 80]]
Update entire columns:
# Update entire Math column (column 0)
students[:, 0] = [95, 93, 85]
print("After updating Math column:")
print(students)
# Output:
# [[95 92 89]
# [93 85 91]
# [85 82 80]]
Views vs Copies: Important Difference
Critical Concept
NumPy slices create views, not copies. When you modify a slice, you modify the original array. This is different from Python lists!
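The contrast with Python lists can be checked directly. The sketch below is illustrative only (the variable names are not part of the lesson); it uses `np.shares_memory` to report whether two arrays overlap in memory.

```python
import numpy as np

# Python list: slicing builds a new, independent list
py_list = [1, 2, 3, 4, 5]
list_slice = py_list[1:4]
list_slice[0] = 99
print(py_list)  # [1, 2, 3, 4, 5] (unchanged)

# NumPy array: slicing returns a view onto the same data
arr = np.array([1, 2, 3, 4, 5])
arr_slice = arr[1:4]
arr_copy = arr[1:4].copy()

print(np.shares_memory(arr, arr_slice))  # True: the slice is a view
print(np.shares_memory(arr, arr_copy))   # False: .copy() owns its own data
```

The walkthrough that follows shows the same behavior step by step.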
# NumPy arrays: slices are VIEWS (modify original)
original = np.array([1, 2, 3, 4, 5])
slice_view = original[1:4]
print("Original array:", original)
print("Slice:", slice_view)
# Output: Original array: [1 2 3 4 5]
# Output: Slice: [2 3 4]
Modify the slice:
# Modify slice
slice_view[0] = 99
print("\nAfter modifying slice:")
print("Slice:", slice_view)
print("Original (also changed!):", original)
# Output: Slice: [99 3 4]
# Output: Original (also changed!): [ 1 99  3  4  5]
The original array changed! To prevent this, use .copy():
# Use .copy() to create independent copy
original = np.array([1, 2, 3, 4, 5])
slice_copy = original[1:4].copy()
slice_copy[0] = 99
print("\nWith .copy():")
print("Slice copy:", slice_copy)
print("Original (unchanged):", original)
# Output: Slice copy: [99 3 4]
# Output: Original (unchanged): [1 2 3 4 5]
Assignment with Boolean Indexing
Replacing Values That Meet Conditions
Boolean indexing enables targeted updates—change only values that meet specific criteria:
ages = np.array([25, 30, -5, 45, 18, -2, 35, 150, 28])
print("Original ages (with errors):")
print(ages)
# Output: [ 25  30  -5  45  18  -2  35 150  28]
Fix invalid negative ages:
# Fix negative ages (set to 0)
ages[ages < 0] = 0
print("After fixing negatives:")
print(ages)
# Output: [ 25  30   0  45  18   0  35 150  28]
Cap unrealistic ages:
# Cap unrealistic ages (> 100 → 100)
ages[ages > 100] = 100
print("After capping at 100:")
print(ages)
# Output: [ 25  30   0  45  18   0  35 100  28]
Practical Example: Data Cleaning
Sensor data often contains error codes that need cleaning:
# Temperature data with sensor errors
temps = np.array([25, 28, -999, 32, 30, -999, 27, 31, 29, -999])
print("Temperature readings (-999 = sensor error):")
print(temps)
# Output: [  25   28 -999   32   30 -999   27   31   29 -999]
Replace errors with NaN:
# Convert to float to support NaN
temps = temps.astype(float)
# Replace errors with NaN
temps[temps == -999] = np.nan
print("\nAfter replacing -999 with NaN:")
print(temps)
# Output: [25. 28. nan 32. 30. nan 27. 31. 29. nan]
Calculate statistics ignoring NaN:
# Calculate mean (ignoring NaN)
mean_temp = np.nanmean(temps)
print(f"\nMean temperature (ignoring NaN): {mean_temp:.1f}°C")
# Output: Mean temperature (ignoring NaN): 28.9°C
# Replace NaN with mean
temps[np.isnan(temps)] = mean_temp
print("\nAfter filling NaN with mean:")
# Round for display only; the array itself keeps the full-precision mean (28.857...)
print(np.round(temps, 1))
# Output: [25.  28.  28.9 32.  30.  28.9 27.  31.  29.  28.9]
This is a common data cleaning pattern: identify invalid values, mark them as NaN, calculate statistics without them, then fill missing values.
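The four steps of this pattern can be wrapped in a small helper. The function below is a minimal sketch, not part of NumPy: the name `fill_sentinel_with_mean` and the default sentinel of -999 are assumptions made for illustration.

```python
import numpy as np

def fill_sentinel_with_mean(values, sentinel=-999):
    """Return a float copy with sentinel entries replaced by the mean of the valid entries.

    Hypothetical helper for illustration. Any NaN already present in the
    input is treated as missing and filled as well.
    """
    # astype(float) returns a new array, so the caller's data is untouched
    cleaned = np.asarray(values).astype(float)
    # Step 1-2: identify invalid values and mark them as NaN
    cleaned[cleaned == sentinel] = np.nan
    # Step 3-4: compute the mean without NaN, then fill the gaps with it
    cleaned[np.isnan(cleaned)] = np.nanmean(cleaned)
    return cleaned

readings = np.array([25, 28, -999, 32, 30, -999, 27, 31, 29, -999])
cleaned = fill_sentinel_with_mean(readings)

print(np.round(cleaned, 1))  # [25.  28.  28.9 32.  30.  28.9 27.  31.  29.  28.9]
print(readings[2])           # -999 (the input array is not modified)
```

Returning a new array instead of editing in place keeps the raw readings available for auditing, at the cost of one extra copy.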
Applying Different Updates Based on Ranges
Sometimes you need to apply different updates to different subsets:
# Sales performance bonuses
sales = np.array([120, 135, 98, 145, 110, 175, 190, 88])
print("Sales amounts:")
print(sales)
# Output: [120 135  98 145 110 175 190  88]
Calculate tiered bonuses:
# Create a separate bonus array of zeros, one entry per sale (float by default, so it can hold fractional bonuses)
bonuses = np.zeros(len(sales))
# Low sales (< 100): 5% bonus
low_mask = sales < 100
bonuses[low_mask] = sales[low_mask] * 0.05
# Medium sales (100-150): 10% bonus
medium_mask = (sales >= 100) & (sales <= 150)
bonuses[medium_mask] = sales[medium_mask] * 0.10
# High sales (> 150): 15% bonus
high_mask = sales > 150
bonuses[high_mask] = sales[high_mask] * 0.15
print("\nSales and bonuses:")
for sale, bonus in zip(sales, bonuses):
    print(f"Sales: ${sale:3d} → Bonus: ${bonus:5.2f}")
Output:
Sales and bonuses:
Sales: $120 → Bonus: $12.00
Sales: $135 → Bonus: $13.50
Sales: $ 98 → Bonus: $ 4.90
Sales: $145 → Bonus: $14.50
Sales: $110 → Bonus: $11.00
Sales: $175 → Bonus: $26.25
Sales: $190 → Bonus: $28.50
Sales: $ 88 → Bonus: $ 4.40
Conditional Assignment in 2D Arrays
Update Rows Based on Conditions
Apply updates to specific rows based on column values:
# Student data: ID, Math, Physics, Chemistry
students = np.array([
[101, 85, 90, 88],
[102, 92, 85, 91],
[103, 55, 58, 52], # Struggling student
[104, 88, 87, 92],
[105, 48, 45, 50] # Struggling student
])
print("Original student data:")
print(students)
Add bonus points to struggling students:
# Add 5 bonus points to Math for students who scored < 60
low_math = students[:, 1] < 60
students[low_math, 1] = students[low_math, 1] + 5
print("After adding bonus to low Math scores:")
print(students)
# Output:
# [[101 85 90 88]
# [102 92 85 91]
# [103 60 58 52] ← Math increased from 55 to 60
# [104 88 87 92]
# [105 53 45 50]] ← Math increased from 48 to 53
Update Specific Columns for All Rows
Apply updates to entire columns:
# Sales data: Product, Units, Price
products = np.array([
[1, 120, 25],
[2, 85, 40],
[3, 200, 15],
[4, 55, 60]
])
print("Original product data:")
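print(products)
Increase all prices:
# The array holds integers, so convert it to float first; otherwise the
# fractional part is dropped when the result is assigned back
products = products.astype(float)
# Increase all prices by 10%
products[:, 2] = products[:, 2] * 1.1
print("After 10% price increase:")
print(products)
# Output:
# [[  1.  120.   27.5]
#  [  2.   85.   44. ]
#  [  3.  200.   16.5]
#  [  4.   55.   66. ]]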
# Round prices to nearest integer
products[:, 2] = np.round(products[:, 2])
print("After rounding prices:")
print(products.astype(int))
# Output:
# [[ 1 120 28]
# [ 2 85 44]
# [ 3 200 16]
# [ 4 55 66]]
Cap Values in Specific Columns
Prevent unrealistic values:
# Trip data: Distance (km), Time (min), Speed (km/h)
trips = np.array([
[5.2, 12, 26],
[15.8, 8, 118], # Unrealistic speed
[8.5, 15, 34],
[22.4, 10, 134], # Unrealistic speed
[12.1, 18, 40]
])
print("Trip data (with unrealistic speeds):")
print(trips)
Cap speeds at reasonable maximum:
# Cap speeds at 100 km/h
trips[trips[:, 2] > 100, 2] = 100
print("\nAfter capping speeds at 100:")
print(trips)
# Output:
# [[ 5.2 12. 26. ]
# [ 15.8 8. 100. ] ← Capped from 118
# [ 8.5 15. 34. ]
# [ 22.4 10. 100. ] ← Capped from 134
# [ 12.1 18. 40. ]]
Adding New Columns
Using np.column_stack()
Add new columns to existing arrays:
# Student scores: Math, Physics
scores = np.array([
[85, 90],
[92, 85],
[78, 82]
])
print("Original scores (Math, Physics):")
print(scores)
print(f"Shape: {scores.shape}")
# Output: Shape: (3, 2)
Add a Chemistry column:
# Add Chemistry scores
chemistry = np.array([88, 91, 80])
scores_with_chem = np.column_stack((scores, chemistry))
print("\nAfter adding Chemistry:")
print(scores_with_chem)
print(f"Shape: {scores_with_chem.shape}")
# Output:
# [[85 90 88]
# [92 85 91]
# [78 82 80]]
# Shape: (3, 3)
Using np.concatenate()
Alternative method using concatenate:
# Convert 1D array to column
chemistry_col = chemistry.reshape(-1, 1)
scores_concat = np.concatenate((scores, chemistry_col), axis=1)
print("Using np.concatenate:")
print(scores_concat)
print(f"Shape: {scores_concat.shape}")
Both methods produce the same result. Use whichever feels more natural.
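The equivalence can be confirmed with `np.array_equal`. The sketch below rebuilds the same small arrays so it runs on its own, and also shows why the reshape matters: `np.concatenate` needs both inputs to have the same number of dimensions, while `np.column_stack` accepts the 1D array directly.

```python
import numpy as np

scores = np.array([
    [85, 90],
    [92, 85],
    [78, 82]
])
chemistry = np.array([88, 91, 80])

stacked = np.column_stack((scores, chemistry))
joined = np.concatenate((scores, chemistry.reshape(-1, 1)), axis=1)

print(np.array_equal(stacked, joined))  # True

# Without the reshape, concatenate refuses to mix a 2D and a 1D array
try:
    np.concatenate((scores, chemistry), axis=1)
    reshape_needed = False
except ValueError as err:
    reshape_needed = True
    print("ValueError:", err)
```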
Adding Calculated Columns
Create new columns from calculations:
# Sales data: Units, Price
sales = np.array([
[120, 25],
[85, 40],
[200, 15]
])
print("Sales data (Units, Price):")
print(sales)
Calculate revenue:
# Calculate Revenue = Units × Price
revenue = sales[:, 0] * sales[:, 1]
print("\nRevenue:")
print(revenue)
# Output: [3000 3400 3000]
# Add Revenue as new column
sales_with_revenue = np.column_stack((sales, revenue))
print("\nSales with Revenue (Units, Price, Revenue):")
print(sales_with_revenue)
# Output:
# [[ 120 25 3000]
# [ 85 40 3400]
# [ 200 15 3000]]
Complete Data Enhancement Example
Build a comprehensive dataset with multiple calculated columns:
# Product data: ID, Units_Sold, Unit_Price
products = np.array([
[1, 120, 25],
[2, 85, 40],
[3, 200, 15],
[4, 55, 60]
])
print("Original product data:")
print(products)
Add multiple calculated columns:
# Calculate Revenue
revenue = products[:, 1] * products[:, 2]
# Calculate Tax (9%)
tax = revenue * 0.09
# Calculate Total (Revenue + Tax)
total = revenue + tax
print("\nCalculated columns:")
print(f"Revenue: {revenue}")
print(f"Tax: {tax}")
print(f"Total: {total}")
Combine everything:
# Add all new columns
enhanced_products = np.column_stack((products, revenue, tax, total))
print("\nEnhanced product data:")
print("ID | Units | Price | Revenue | Tax | Total")
print("=" * 55)
for row in enhanced_products:
    print(f"{int(row[0]):2d} | {int(row[1]):5d} | {int(row[2]):5d} | {row[3]:7.0f} | {row[4]:6.2f} | {row[5]:7.2f}")
Output:
Enhanced product data:
ID | Units | Price | Revenue | Tax | Total
=======================================================
1 | 120 | 25 | 3000 | 270.00 | 3270.00
2 | 85 | 40 | 3400 | 306.00 | 3706.00
3 | 200 | 15 | 3000 | 270.00 | 3270.00
4 | 55 | 60 | 3300 | 297.00 | 3597.00
Practice Exercises
Apply data modification techniques to these exercises.
Exercise 1: Clean Invalid Data
Replace all negative values with 0:
measurements = np.array([25, -5, 32, 30, -8, 27, 31, -2, 29])
# Your code here:
# 1. Replace negative values with 0
# 2. Print cleaned array
Exercise 2: Apply Conditional Discount
Reduce prices greater than 50 by 10%:
prices = np.array([45, 65, 30, 75, 40, 80, 55])
# Your code here:
# 1. Apply 10% discount to prices > 50
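# 2. Print updated prices
Hint
Use Boolean indexing: prices[prices > 50] = prices[prices > 50] * 0.9. Because prices holds integers, convert it first with prices = prices.astype(float) so the discounted values keep their decimals.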
Exercise 3: Add Calculated Column
Calculate and add total score for each student:
# Student scores: Math, Physics, Chemistry
students = np.array([
[85, 90, 88],
[92, 85, 91],
[78, 82, 80]
])
# Your code here:
# 1. Calculate total for each student (sum across row)
# 2. Add as new column using np.column_stack()
# 3. Print result
Summary
You now know how to modify NumPy arrays effectively. Let’s review the key concepts.
Key Concepts
Basic Assignment
- Single element: array[0] = 100
- Multiple elements: array[0:3] = [1, 2, 3]
- 2D single: array[0, 1] = 50
- 2D row: array[0] = [1, 2, 3]
- 2D column: array[:, 0] = [10, 20, 30]
Boolean Assignment
- Replace matching values: array[array < 0] = 0
- Cap values: array[array > 100] = 100
- Clean data: replace errors with NaN or mean
- Apply tiered updates based on value ranges
2D Conditional Assignment
- Update column conditionally: array[mask, col] = new_value
- Modify all rows in column: array[:, col] = values
- Cap specific columns: array[array[:, col] > 100, col] = 100
Adding Columns
- np.column_stack((array, new_col)) adds columns
- np.concatenate((array, new_col), axis=1) is an alternative method (new_col must be 2D, for example after reshape(-1, 1))
- Add calculated columns (revenue, totals, averages)
Key Patterns Reference
# Basic assignment
array[index] = value
array[start:end] = values
array[:, col] = values
# Boolean assignment
array[array < 0] = 0 # Fix negatives
array[array > 100] = 100 # Cap values
array[np.isnan(array)] = mean_value # Fill NaN
# 2D assignment
array[:, col] = array[:, col] * 1.1 # Update column (use a float array to keep decimals)
array[mask, col] = value # Conditional update
# Add columns
new_array = np.column_stack((array, new_col))
Important Reminders
- Views vs Copies: Slices are views. Use .copy() for independent copies
- Selecting with a Boolean mask (filtered = array[mask]) returns a copy, but assigning through a mask (array[mask] = value) changes the original in place
- Use .astype(float) before assigning NaN values
- Reshape 1D to column: array.reshape(-1, 1)
- Always verify shapes when combining arrays
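The Boolean-indexing and NaN reminders are easy to mix up, so here is a short sketch that checks both. The array and variable names are illustrative only.

```python
import numpy as np

data = np.array([5, -3, 8, -1])

# Selecting with a mask returns a copy, so editing it leaves data alone
negatives = data[data < 0]
negatives[:] = 0
print(data)  # [ 5 -3  8 -1]

# Assigning through the mask writes into data itself
data[data < 0] = 0
print(data)  # [5 0 8 0]

# An integer array cannot store NaN; the assignment raises ValueError
try:
    data[0] = np.nan
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)  # True

# Converting to float first makes the NaN assignment valid
as_float = data.astype(float)
as_float[0] = np.nan
print(as_float)  # [nan  0.  8.  0.]
```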
Why This Matters
Data modification enables you to:
- Clean invalid or missing values
- Transform data for analysis
- Calculate derived metrics
- Prepare datasets for visualization
- Fix data quality issues
- Apply business rules to data
These skills are essential for real-world data analytics where raw data rarely arrives in perfect condition.
Conclusion: NumPy Fundamentals Complete
Congratulations! You have completed the NumPy Fundamentals module. You now possess a comprehensive set of skills for numerical computing and data manipulation:
What You Mastered:
- Creating and understanding NumPy arrays
- Loading real data from CSV files
- Selecting specific rows, columns, and subsets
- Performing vectorized calculations efficiently
- Filtering data with Boolean indexing
- Modifying and transforming datasets
Your Next Steps:
Continue to Pandas Data Analysis
Build on NumPy to work with labeled, tabular data using pandas DataFrames
Back to Module Overview
Review the complete NumPy Fundamentals module
You Are Now a NumPy Practitioner
The skills you learned in this module form the foundation of the entire Python data science ecosystem. NumPy arrays underpin pandas, scikit-learn, TensorFlow, and virtually every data science library you will encounter.
You can now:
- Work with numerical data efficiently
- Perform calculations on entire datasets without loops
- Query and filter data using Boolean conditions
- Clean and transform real-world datasets
- Prepare data for analysis and visualization
These capabilities make you ready for advanced data analytics. Continue your journey with pandas, where you will apply these NumPy concepts to labeled, tabular data with even more powerful features!