Lesson 3 - Selecting and Slicing Data
Extracting What You Need
Now that you can create and load 2D arrays, you need to know how to extract specific pieces of data from them. This lesson teaches you how to select rows, columns, and subsets—essential skills for data analysis.
By the end of this lesson, you will be able to:
- Select specific rows from 2D arrays
- Select specific columns from 2D arrays
- Extract individual elements using row and column coordinates
- Combine row and column selections to create 2D slices
- Apply common selection patterns used in data analysis
- Extract exactly the data you need for calculations
These selection techniques are fundamental. You will use them constantly when analyzing real datasets.
Understanding Our Sample Dataset
Let’s create a dataset to practice with. This array contains student scores across multiple subjects:
import numpy as np
# Create sample dataset for practice
# Rows: 5 students
# Columns: student_id, math, physics, chemistry, biology
students = np.array([
[101, 85, 90, 88, 92], # Student 1
[102, 92, 85, 91, 87], # Student 2
[103, 78, 82, 80, 85], # Student 3
[104, 88, 87, 92, 89], # Student 4
[105, 95, 89, 93, 91] # Student 5
])
print("Student Scores Dataset:")
print(students)
print(f"\nShape: {students.shape} (5 students, 5 columns)")
Output:
Student Scores Dataset:
[[101 85 90 88 92]
[102 92 85 91 87]
[103 78 82 80 85]
[104 88 87 92 89]
[105 95 89 93 91]]
Shape: (5, 5) (5 students, 5 columns)
Visual structure:
ID Math Physics Chem Bio
Row 0: 101 85 90 88 92
Row 1: 102 92 85 91 87
Row 2: 103 78 82 80 85
Row 3: 104 88 87 92 89
Row 4: 105 95 89 93 91
Understanding this structure helps you predict what data each selection will extract.
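Because the columns are identified only by position, it can help to keep a small lookup from column names to indices. The sketch below is one way to do that. The COLUMNS dictionary and the n_rows and n_cols names are hypothetical conveniences for this lesson, not NumPy features; students.shape is standard NumPy.

```python
import numpy as np

# Same sample dataset as above: one row per student
students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# Hypothetical lookup table: column name -> column index
COLUMNS = {"id": 0, "math": 1, "physics": 2, "chemistry": 3, "biology": 4}

# .shape unpacks into (number of rows, number of columns)
n_rows, n_cols = students.shape
print(f"{n_rows} students, {n_cols} columns")  # 5 students, 5 columns

# Every column index should have exactly one name
print(len(COLUMNS) == n_cols)  # True
print(f"Chemistry is column {COLUMNS['chemistry']}")  # Chemistry is column 3
```

With a table like this, later selections can say COLUMNS["math"] instead of a bare 1, which makes the intent easier to read.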
Selecting Rows
Single Row Selection
You select a single row using its index:
# Select first row (index 0)
first_student = students[0]
print("First student:")
print(first_student)
# Output: [101 85 90 88 92]
# Select third row (index 2)
third_student = students[2]
print("\nThird student:")
print(third_student)
# Output: [103 78 82 80 85]
# Select last row
last_student = students[-1]
print("\nLast student:")
print(last_student)
# Output: [105 95 89 93 91]
When you select a single row, you get a 1D array containing every column for that row.
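To confirm that selecting a single row drops one dimension, you can inspect the standard .shape and .ndim attributes. This sketch rebuilds the students array from above and also checks that index -1 refers to the same row as the last positive index.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

first_student = students[0]
print(first_student.shape)  # (5,)
print(first_student.ndim)   # 1
print(students.ndim)        # 2

# -1 counts back from the end, so it matches the last positive index
last_index = len(students) - 1  # len() of a 2D array is its number of rows
print(np.array_equal(students[-1], students[last_index]))  # True
```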
Visual representation:
students[0] selects:
Row 0: [101, 85, 90, 88, 92] ← This entire row
Row 1: [102, 92, 85, 91, 87]
Row 2: [103, 78, 82, 80, 85]
Row 3: [104, 88, 87, 92, 89]
Row 4: [105, 95, 89, 93, 91]
Multiple Consecutive Rows
To select multiple rows, use slicing:
# First 3 students (rows 0, 1, 2)
first_three = students[0:3]
print("First 3 students:")
print(first_three)
# Output:
# [[101 85 90 88 92]
# [102 92 85 91 87]
# [103 78 82 80 85]]
# Alternative: shortcut for "from start"
first_three = students[:3]
print("\nSame result (shortcut):")
print(first_three)
Remember that the end index is not included: 0:3 means indices 0, 1, and 2.
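A quick way to internalize the exclusive end index is that a slice start:stop returns stop - start rows when both bounds are in range. The sketch below checks that, confirms that omitting the start means "from index 0", and shows that a stop past the end is clipped instead of raising an error, which is ordinary Python slice behavior that NumPy follows.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# stop - start = 4 - 1 = 3 rows: indices 1, 2 and 3
middle = students[1:4]
print(middle.shape)  # (3, 5)

# Omitting start is the same as starting at 0
print(np.array_equal(students[:3], students[0:3]))  # True

# A stop past the end is clipped rather than raising an IndexError
tail = students[3:100]
print(tail.shape)  # (2, 5): only rows 3 and 4 exist
```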
# Rows 1-3 (indices 1, 2, 3)
middle_students = students[1:4]
print("Students 2-4:")
print(middle_students)
# Output:
# [[102 92 85 91 87]
# [103 78 82 80 85]
# [104 88 87 92 89]]
# Last 2 students
last_two = students[-2:]
print("\nLast 2 students:")
print(last_two)
# Output:
# [[104 88 87 92 89]
# [105 95 89 93 91]]
Visual representation:
students[1:4] selects:
Row 0: [101, 85, 90, 88, 92]
Row 1: [102, 92, 85, 91, 87] ← Start here
Row 2: [103, 78, 82, 80, 85] ← Include
Row 3: [104, 88, 87, 92, 89] ← Last row included (index 3)
Row 4: [105, 95, 89, 93, 91] ← Not included (stop index 4)
Selecting a Single Element
To get a specific element from a 2D array, provide both row and column indices:
# Syntax: array[row, column]
# Student 0, Math score (column 1)
math_score = students[0, 1]
print(f"First student's Math score: {math_score}")
# Output: 85
# Student 2, Chemistry score (column 3)
chem_score = students[2, 3]
print(f"Third student's Chemistry score: {chem_score}")
# Output: 80
# Last student, last subject
bio_score = students[-1, -1]
print(f"Last student's Biology score: {bio_score}")
# Output: 91
Visual representation:
students[2, 3] selects:
Col0 Col1 Col2 Col3 Col4
Row 0: 101 85 90 88 92
Row 1: 102 92 85 91 87
Row 2: 103 78 82 [80] 85 ← Row 2, Column 3
Row 3: 104 88 87 92 89
Row 4: 105 95 89 93 91
Selecting Columns
Single Column Selection
To select a column, you need all rows but only one column position. The syntax is array[:, column_index]:
# All student IDs (column 0)
student_ids = students[:, 0]
print("All student IDs:")
print(student_ids)
# Output: [101 102 103 104 105]
# All Math scores (column 1)
math_scores = students[:, 1]
print("\nAll Math scores:")
print(math_scores)
# Output: [85 92 78 88 95]
# All Chemistry scores (column 3)
chemistry_scores = students[:, 3]
print("\nAll Chemistry scores:")
print(chemistry_scores)
# Output: [88 91 80 92 93]
The : means “all rows.” So students[:, 1] means “all rows, column 1.”
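A column selection also returns a 1D array, with one value per row. The sketch below checks that, and shows that students[2, 3] reaches the same value as first taking the column or first taking the row. The single [row, col] form is the idiomatic one because it does the lookup in one step instead of building an intermediate array.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

chemistry_scores = students[:, 3]
print(chemistry_scores.shape)  # (5,): one value per student

# Three routes to the same element (row 2, column 3)
a = students[2, 3]     # row and column in one step
b = students[:, 3][2]  # take column 3, then its value at position 2
c = students[2][3]     # take row 2, then its value at position 3
print(a, b, c)         # 80 80 80
```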
Visual representation:
students[:, 1] selects:
ID [Math] Physics Chem Bio
Row 0: 101 85 90 88 92
Row 1: 102 92 85 91 87
Row 2: 103 78 82 80 85
Row 3: 104 88 87 92 89
Row 4: 105 95 89 93 91
↑ This entire column (Math, column 1)
Multiple Consecutive Columns
Select multiple columns using slicing:
# First 3 columns (ID, Math, Physics)
first_three_cols = students[:, 0:3]
print("ID, Math, Physics:")
print(first_three_cols)
# Output:
# [[101 85 90]
# [102 92 85]
# [103 78 82]
# [104 88 87]
# [105 95 89]]
# Science scores only (Physics, Chemistry, Biology = columns 2, 3, 4)
science_scores = students[:, 2:5]
print("\nScience scores (Physics, Chemistry, Biology):")
print(science_scores)
# Output:
# [[90 88 92]
# [85 91 87]
# [82 80 85]
# [87 92 89]
# [89 93 91]]
# Last 2 columns
last_two_cols = students[:, -2:]
print("\nLast 2 subjects (Chemistry, Biology):")
print(last_two_cols)
# Output:
# [[88 92]
# [91 87]
# [80 85]
# [92 89]
# [93 91]]
Non-Consecutive Columns
What if you want columns that are not next to each other? Use a list of column indices:
# Math and Chemistry only (columns 1 and 3)
math_and_chem = students[:, [1, 3]]
print("Math and Chemistry scores:")
print(math_and_chem)
# Output:
# [[85 88]
# [92 91]
# [78 80]
# [88 92]
# [95 93]]
# ID and Biology (columns 0 and 4)
id_and_bio = students[:, [0, 4]]
print("\nID and Biology:")
print(id_and_bio)
# Output:
# [[101 92]
# [102 87]
# [103 85]
# [104 89]
# [105 91]]
# Multiple non-consecutive columns
custom_cols = students[:, [0, 2, 4]] # ID, Physics, Biology
print("\nID, Physics, Biology:")
print(custom_cols)
# Output:
# [[101 90 92]
# [102 85 87]
# [103 82 85]
# [104 87 89]
# [105 89 91]]
The list [1, 3] tells NumPy: “Give me columns 1 and 3, in that order.”
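Because the list controls the order, you can also use it to rearrange columns. The same list syntax works on the row axis as well. The sketch below reverses the Math/Chemistry order and picks three non-consecutive students; the variable names are illustrative only.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# The list order controls the output column order: Chemistry first, then Math
chem_then_math = students[:, [3, 1]]
print(chem_then_math[0])  # [88 85]

# A list also works on the row axis: rows 0, 2 and 4
selected_rows = students[[0, 2, 4]]
print(selected_rows[:, 0])  # [101 103 105]
```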
Visual Guide: Row vs Column Selection
Understanding the difference between row and column selection is crucial:
Original Array (students):
Col0 Col1 Col2 Col3 Col4
Row0: 101 85 90 88 92
Row1: 102 92 85 91 87
Row2: 103 78 82 80 85
Row3: 104 88 87 92 89
Row4: 105 95 89 93 91
students[2] → Row 2 (entire row)
[103, 78, 82, 80, 85]
students[:, 1] → Column 1 (all rows, column 1)
[85, 92, 78, 88, 95]
students[2, 1] → Single value (row 2, column 1)
78
Two-Dimensional Slices
Combining Row and Column Selection
You can specify both row and column ranges to extract a rectangular subset:
# Syntax: array[row_start:row_end, col_start:col_end]
# First 3 students, first 3 columns
subset_1 = students[0:3, 0:3]
print("First 3 students, first 3 columns:")
print(subset_1)
# Output:
# [[101 85 90]
# [102 92 85]
# [103 78 82]]
Visual representation:
students[0:3, 0:3] extracts:
[Col0 Col1 Col2] Col3 Col4
[Row0: 101 85 90] 88 92
[Row1: 102 92 85] 91 87
[Row2: 103 78 82] 80 85
Row3: 104 88 87 92 89
Row4: 105 95 89 93 91
The bracketed area is extracted.
More examples:
# Students 2-4 (indices 1, 2, 3), Science scores (columns 2, 3, 4)
science_subset = students[1:4, 2:5]
print("Students 2-4, Science scores:")
print(science_subset)
# Output:
# [[85 91 87]
# [82 80 85]
# [87 92 89]]
# Last 2 students, last 3 subjects
last_corner = students[-2:, -3:]
print("Last 2 students, last 3 subjects:")
print(last_corner)
# Output:
# [[87 92 89]
# [89 93 91]]
Mixed Selection: Specific Rows with Column Slice
You can combine a single row with a column slice. Use a one-row slice to keep the result 2D, or an integer index to get a 1D result:
# First student only, all science subjects
student_science = students[0:1, 2:5]
print("First student's science scores:")
print(student_science)
# Output: [[90 88 92]]
# Note: This is still a 2D array (1 row, 3 columns)
# Alternative: single row (becomes 1D array)
student_science_1d = students[0, 2:5]
print("\nFirst student's science scores (1D):")
print(student_science_1d)
# Output: [90 88 92]
The difference:
- students[0:1, 2:5] returns a 2D array with shape (1, 3)
- students[0, 2:5] returns a 1D array with shape (3,)
Usually, you want the 1D version when selecting from a single row; keep the one-row slice only when later code expects a 2D array.
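The shape of any 2D slice follows directly from its bounds: it keeps row_stop - row_start rows and col_stop - col_start columns. The sketch below checks this for the examples in this section, including the "last corner" slice built from negative bounds, and contrasts the integer-index and one-row-slice forms.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# Rows 1-3 and columns 2-4: (4 - 1) x (5 - 2)
science_subset = students[1:4, 2:5]
print(science_subset.shape)  # (3, 3)

# Negative bounds work the same way: last 2 rows, last 3 columns
last_corner = students[-2:, -3:]
print(last_corner)
# [[87 92 89]
#  [89 93 91]]

# An integer index drops the row axis; a one-row slice keeps it
print(students[0, 2:5].shape)    # (3,)
print(students[0:1, 2:5].shape)  # (1, 3)
```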
Practical Selection Patterns
Let’s apply these techniques to a realistic sales dataset:
# Create sales data: 7 days × 4 products
sales = np.array([
[120, 135, 98, 110], # Monday
[135, 142, 105, 118], # Tuesday
[150, 138, 112, 125], # Wednesday
[145, 155, 108, 130], # Thursday
[160, 148, 118, 135], # Friday
[175, 162, 125, 142], # Saturday
[190, 170, 132, 150] # Sunday
])
print("Weekly Sales Data:")
print(sales)
print(f"Shape: {sales.shape} (7 days, 4 products)")
Pattern 1: Extracting Weekday vs Weekend Data
# Weekday sales only (Monday-Friday, rows 0-4)
weekday_sales = sales[0:5]
print("Weekday sales (Mon-Fri):")
print(weekday_sales)
# Output:
# [[120 135 98 110]
# [135 142 105 118]
# [150 138 112 125]
# [145 155 108 130]
# [160 148 118 135]]
# Weekend sales (Saturday-Sunday, rows 5-6)
weekend_sales = sales[5:7]
print("\nWeekend sales (Sat-Sun):")
print(weekend_sales)
# Output:
# [[175 162 125 142]
# [190 170 132 150]]
Pattern 2: Extracting Product-Specific Data
# Product A sales all week (column 0)
product_a = sales[:, 0]
print("Product A (all days):")
print(product_a)
# Output: [120 135 150 145 160 175 190]
# Products C and D only (columns 2 and 3)
products_cd = sales[:, 2:4]
print("\nProducts C and D (all days):")
print(products_cd)
# Output:
# [[ 98 110]
# [105 118]
# [112 125]
# [108 130]
# [118 135]
# [125 142]
# [132 150]]
Pattern 3: Combining Row and Column Filters
# Weekend sales for Products A and B only
weekend_ab = sales[5:7, 0:2]
print("Weekend sales (Products A & B):")
print(weekend_ab)
# Output:
# [[175 162]
# [190 170]]
This extracts exactly the data you need: weekend days for two specific products.
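Two quick checks can catch off-by-one mistakes in selections like these: the weekday and weekend slices together should cover every row exactly once, and the weekend block can equally be written with negative bounds. The sketch also uses a hypothetical DAYS list so that row positions can be looked up by name instead of remembered; the list is an illustration, not part of the dataset.

```python
import numpy as np

# Same weekly sales data: 7 days x 4 products
sales = np.array([
    [120, 135, 98, 110],   # Monday
    [135, 142, 105, 118],  # Tuesday
    [150, 138, 112, 125],  # Wednesday
    [145, 155, 108, 130],  # Thursday
    [160, 148, 118, 135],  # Friday
    [175, 162, 125, 142],  # Saturday
    [190, 170, 132, 150],  # Sunday
])

weekday_sales = sales[0:5]
weekend_sales = sales[5:7]

# The two slices split the week with no overlap and no gap
print(len(weekday_sales) + len(weekend_sales) == len(sales))  # True

# Counting from the end gives the same weekend block for Products A and B
print(np.array_equal(sales[5:7, 0:2], sales[-2:, :2]))  # True

# Hypothetical day labels, used only to look up row positions by name
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
sat = DAYS.index("Sat")  # 5
print(sales[sat:, :2])
# [[175 162]
#  [190 170]]
```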
Common Selection Patterns
Here are patterns you will use repeatedly in data analysis:
Pattern 1: First or Last N Rows
# First 3 rows, all columns
first_3_rows = students[:3]
print("First 3 rows:")
print(first_3_rows)
# Last 2 rows, all columns
last_2_rows = students[-2:]
print("\nLast 2 rows:")
print(last_2_rows)
Pattern 2: All Rows, Specific Columns
# All rows, skip ID column (get just scores)
just_scores = students[:, 1:]
print("Just scores (no IDs):")
print(just_scores)
# Output:
# [[85 90 88 92]
# [92 85 91 87]
# [78 82 80 85]
# [88 87 92 89]
# [95 89 93 91]]
Pattern 3: Excluding First or Last Rows/Columns
# Skip first row
without_first_row = students[1:]
print("Without first row:")
print(without_first_row)
# Skip last column
without_last_col = students[:, :-1]
print("\nWithout last column:")
print(without_last_col)
# Output:
# [[101 85 90 88]
# [102 92 85 91]
# [103 78 82 80]
# [104 88 87 92]
# [105 95 89 93]]
Pattern 4: Every Nth Row or Column
# Every 2nd row (rows 0, 2, 4)
every_2nd_row = students[::2]
print("Every 2nd row:")
print(every_2nd_row)
# Output:
# [[101 85 90 88 92]
# [103 78 82 80 85]
# [105 95 89 93 91]]
# Every 2nd column (columns 0, 2, 4)
every_2nd_col = students[:, ::2]
print("\nEvery 2nd column:")
print(every_2nd_col)
# Output:
# [[101 90 92]
# [102 85 87]
# [103 82 85]
# [104 87 89]
# [105 89 91]]
Real-World Application: Temperature Data
# Temperature data: 12 months × 4 cities
temps = np.array([
[10, 12, 15, 18], # Jan
[12, 14, 17, 20], # Feb
[15, 17, 20, 23], # Mar
[18, 20, 23, 26], # Apr
[22, 24, 27, 30], # May
[27, 29, 32, 35], # Jun
[30, 32, 35, 38], # Jul
[29, 31, 34, 37], # Aug
[25, 27, 30, 33], # Sep
[20, 22, 25, 28], # Oct
[15, 17, 20, 23], # Nov
[11, 13, 16, 19] # Dec
])
print("Temperature Data (12 months × 4 cities):")
print(temps)
Extract quarterly data:
# Q1 temperatures (Jan-Mar)
q1_temps = temps[0:3]
print("Q1 Temperatures (Jan-Mar):")
print(q1_temps)
# Summer temperatures (Jun-Aug = indices 5-7)
summer_temps = temps[5:8]
print("\nSummer Temperatures (Jun-Aug):")
print(summer_temps)
Extract city-specific data:
# City 1 all year (column 0)
city1_temps = temps[:, 0]
print("City 1 (all months):")
print(city1_temps)
# Output: [10 12 15 18 22 27 30 29 25 20 15 11]
# Cities 2 and 3 in summer
summer_cities_23 = temps[5:8, 1:3]
print("\nCities 2 & 3 in summer:")
print(summer_cities_23)
# Output:
# [[29 32]
# [32 35]
# [31 34]]
Practice Exercises
Apply what you have learned with these exercises.
Exercise 1: Row Selection
Using the students array, select:
- First 2 students
- Students 3-5
- Last student
# Your code here
Exercise 2: Column Selection
Using the students array, select:
- All Physics scores (column 2)
- Math and Chemistry scores (columns 1 and 3)
- All scores except ID (skip column 0)
# Your code here
Hint
For non-consecutive columns, use a list: array[:, [1, 3]]
Exercise 3: 2D Slices
Using the sales array, extract:
- First 3 days, first 2 products
- Last 2 days, all products
- All days, Products B and C (columns 1 and 2)
# Your code here
Summary
You now know the core techniques for extracting data from NumPy arrays. Let’s review the essential concepts.
Key Concepts
Row Selection
- array[0] selects a single row
- array[0:3] selects multiple consecutive rows
- array[-1] selects the last row
- array[:3] selects the first 3 rows
Column Selection
- array[:, 0] selects a single column (all rows)
- array[:, 1:4] selects multiple consecutive columns
- array[:, [0, 2, 4]] selects non-consecutive columns
- : means “all rows” or “all columns”
Single Element Selection
- array[row, col] selects a specific element
- array[2, 3] selects row 2, column 3
Two-Dimensional Slices
- array[1:4, 2:5] selects rows 1-3 and columns 2-4
- Combine any row and column selections
- Creates rectangular subsets
Common Patterns
- array[:3]: first 3 rows
- array[-2:]: last 2 rows
- array[:, 1:]: skip the first column
- array[::2]: every 2nd row
- array[:, ::2]: every 2nd column
Selection Syntax Reference
array[row] # Single row (all columns)
array[:, col] # Single column (all rows)
array[row, col] # Single element
array[r1:r2] # Row slice
array[:, c1:c2] # Column slice
array[r1:r2, c1:c2] # 2D slice
array[:, [0, 2, 4]] # Non-consecutive columns
Important Reminders
- : means “all” (all rows or all columns)
- The end index is NOT included in slices: 0:3 means 0, 1, 2
- Negative indices count from the end: -1 is the last, -2 is second to last
- A single integer index gives you one dimension fewer: array[0] is 1D, array[0:1] is 2D
Why This Matters
Data analysis is all about extracting the right subset of data to answer specific questions:
- Compare weekday vs weekend performance
- Analyze specific product lines
- Extract quarterly or seasonal data
- Focus on specific features or measurements
The selection techniques you learned here enable all of these analyses. You can now pinpoint exactly the data you need.
Next Steps
You can now create arrays, load data, and extract specific subsets. In the next lesson, you will learn to perform calculations on these arrays using vectorized operations—the true power of NumPy.
Continue to Lesson 4 - Vector Operations
Learn arithmetic operations, broadcasting, and statistical calculations on arrays
Back to Lesson 2 - 2D Arrays and CSV Data
Review two-dimensional arrays and loading CSV files
Master Data Extraction
Selection and slicing are fundamental skills you will use in every data analysis project. With these techniques mastered, you are ready to move on to performing calculations and deriving insights from your data.
The ability to extract exactly the data you need is what separates beginners from proficient data analysts. You now have that ability!