Lesson 3 - Selecting and Slicing Data

Extracting What You Need

Now that you can create and load 2D arrays, you need to know how to extract specific pieces of data from them. This lesson teaches you how to select rows, columns, and subsets—essential skills for data analysis.

By the end of this lesson, you will be able to:

Select specific rows from 2D arrays
Select specific columns from 2D arrays
Extract individual elements using row and column coordinates
Combine row and column selections to create 2D slices
Apply common selection patterns used in data analysis
Extract exactly the data you need for calculations

These selection techniques are fundamental. You will use them constantly when analyzing real datasets.

Understanding Our Sample Dataset

Let’s create a dataset to practice with. This array contains student scores across multiple subjects:

import numpy as np

# Create sample dataset for practice
# Rows: 5 students
# Columns: student_id, math, physics, chemistry, biology
students = np.array([
    [101, 85, 90, 88, 92],  # Student 1
    [102, 92, 85, 91, 87],  # Student 2
    [103, 78, 82, 80, 85],  # Student 3
    [104, 88, 87, 92, 89],  # Student 4
    [105, 95, 89, 93, 91]   # Student 5
])

print("Student Scores Dataset:")
print(students)
print(f"\nShape: {students.shape} (5 students, 5 columns)")

Output:

Student Scores Dataset:
[[101  85  90  88  92]
 [102  92  85  91  87]
 [103  78  82  80  85]
 [104  88  87  92  89]
 [105  95  89  93  91]]

Shape: (5, 5) (5 students, 5 columns)

Visual structure:

        ID   Math  Physics  Chem  Bio
Row 0:  101   85     90      88    92
Row 1:  102   92     85      91    87
Row 2:  103   78     82      80    85
Row 3:  104   88     87      92    89
Row 4:  105   95     89      93    91

Understanding this structure helps you know what data each selection will extract.

Selecting Rows

Single Row Selection

You select a single row using its index:

# Select first row (index 0)
first_student = students[0]
print("First student:")
print(first_student)
# Output: [101  85  90  88  92]

# Select third row (index 2)
third_student = students[2]
print("\nThird student:")
print(third_student)
# Output: [103  78  82  80  85]

# Select last row
last_student = students[-1]
print("\nLast student:")
print(last_student)
# Output: [105  95  89  93  91]

When you select a single row, you get a 1D array containing all columns for that row.

Visual representation:

students[0] selects:

Row 0: [101, 85, 90, 88, 92]  ← This entire row
Row 1: [102, 92, 85, 91, 87]
Row 2: [103, 78, 82, 80, 85]
Row 3: [104, 88, 87, 92, 89]
Row 4: [105, 95, 89, 93, 91]
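One way to check that a single row comes back as 1D is to inspect its shape and compare it with `students[0, :]`, which spells out "row 0, all columns" explicitly. A minimal sketch; the array is repeated so the block runs on its own:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

first_student = students[0]

# A single row index leaves one dimension: the 5 columns of that row
print(first_student.ndim)   # 1
print(first_student.shape)  # (5,)

# students[0] is shorthand for "row 0, all columns"
print(np.array_equal(first_student, students[0, :]))  # True
```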

Multiple Consecutive Rows

To select multiple rows, use slicing:

# First 3 students (rows 0, 1, 2)
first_three = students[0:3]
print("First 3 students:")
print(first_three)

# Output:
# [[101  85  90  88  92]
#  [102  92  85  91  87]
#  [103  78  82  80  85]]

# Alternative: shortcut for "from start"
first_three = students[:3]
print("\nSame result (shortcut):")
print(first_three)

Remember that the end index is not included: 0:3 means indices 0, 1, and 2.

# Rows 1-3 (indices 1, 2, 3)
middle_students = students[1:4]
print("Students 2-4:")
print(middle_students)

# Output:
# [[102  92  85  91  87]
#  [103  78  82  80  85]
#  [104  88  87  92  89]]

# Last 2 students
last_two = students[-2:]
print("\nLast 2 students:")
print(last_two)

# Output:
# [[104  88  87  92  89]
#  [105  95  89  93  91]]

Visual representation:

students[1:4] selects:

Row 0: [101, 85, 90, 88, 92]
Row 1: [102, 92, 85, 91, 87]  ← Start here
Row 2: [103, 78, 82, 80, 85]  ← Include
Row 3: [104, 88, 87, 92, 89]  ← End here (included)
Row 4: [105, 95, 89, 93, 91]  ← Not included
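Because the stop index is excluded, a slice start:stop contains stop - start rows (when both indices fall inside the array). A short sketch that confirms this for the two slices above:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

first_three = students[0:3]
middle = students[1:4]

# Number of rows = stop - start
print(len(first_three))  # 3  (0:3 -> rows 0, 1, 2)
print(len(middle))       # 3  (1:4 -> rows 1, 2, 3)

# The last row of the slice is row 3; row 4 (the stop index) is not included
print(middle[-1])        # [104  88  87  92  89]
```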

Selecting a Single Element

To get a specific element from a 2D array, provide both row and column indices:

# Syntax: array[row, column]

# Student 0, Math score (column 1)
math_score = students[0, 1]
print(f"First student's Math score: {math_score}")
# Output: 85

# Student 2, Chemistry score (column 3)
chem_score = students[2, 3]
print(f"Third student's Chemistry score: {chem_score}")
# Output: 80

# Last student, last subject
bio_score = students[-1, -1]
print(f"Last student's Biology score: {bio_score}")
# Output: 91

Visual representation:

students[2, 3] selects:

        Col0  Col1  Col2  Col3  Col4
Row 0:  101   85    90    88    92
Row 1:  102   92    85    91    87
Row 2:  103   78    82   [80]   85   ← Row 2, Column 3
Row 3:  104   88    87    92    89
Row 4:  105   95    89    93    91
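A single element comes back as a NumPy scalar rather than an array. The sketch below also shows that chained indexing, students[2][3], reaches the same value in two steps (first the row, then the position inside it), while the comma form does it in one step. The array is repeated so the block runs on its own:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

chem_score = students[2, 3]

print(chem_score)           # 80
print(np.ndim(chem_score))  # 0 (a scalar, not an array)

# Chained indexing: students[2] is row 2, then [3] picks its 4th value
print(students[2][3] == chem_score)  # True

# Convert to a plain Python int if you need one
print(int(chem_score) + 1)  # 81
```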

Selecting Columns

Single Column Selection

To select a column, you need all rows but only one column position. The syntax is array[:, column_index]:

# All student IDs (column 0)
student_ids = students[:, 0]
print("All student IDs:")
print(student_ids)
# Output: [101 102 103 104 105]

# All Math scores (column 1)
math_scores = students[:, 1]
print("\nAll Math scores:")
print(math_scores)
# Output: [85 92 78 88 95]

# All Chemistry scores (column 3)
chemistry_scores = students[:, 3]
print("\nAll Chemistry scores:")
print(chemistry_scores)
# Output: [88 91 80 92 93]

The : means “all rows.” So students[:, 1] means “all rows, column 1.”

Visual representation:

students[:, 1] selects:

        ID  [Math] Physics  Chem  Bio
Row 0:  101   85     90      88    92
Row 1:  102   92     85      91    87
Row 2:  103   78     82      80    85
Row 3:  104   88     87      92    89
Row 4:  105   95     89      93    91
              ↑
        This entire column
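A single column comes back as a flat 1D array, not as a vertical 5 × 1 column. If later code needs the column shape, a one-column slice such as 1:2 keeps the second dimension. A short sketch comparing the two:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

math_1d = students[:, 1]    # integer column index -> 1D
math_2d = students[:, 1:2]  # one-column slice -> still 2D

print(math_1d.shape)  # (5,)
print(math_2d.shape)  # (5, 1)
print(math_2d)
# [[85]
#  [92]
#  [78]
#  [88]
#  [95]]
```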

Multiple Consecutive Columns

Select multiple columns using slicing:

# First 3 columns (ID, Math, Physics)
first_three_cols = students[:, 0:3]
print("ID, Math, Physics:")
print(first_three_cols)

# Output:
# [[101  85  90]
#  [102  92  85]
#  [103  78  82]
#  [104  88  87]
#  [105  95  89]]

# Science scores only (Physics, Chemistry, Biology = columns 2, 3, 4)
science_scores = students[:, 2:5]
print("\nScience scores (Physics, Chemistry, Biology):")
print(science_scores)

# Output:
# [[90 88 92]
#  [85 91 87]
#  [82 80 85]
#  [87 92 89]
#  [89 93 91]]

# Last 2 columns
last_two_cols = students[:, -2:]
print("\nLast 2 subjects (Chemistry, Biology):")
print(last_two_cols)

# Output:
# [[88 92]
#  [91 87]
#  [80 85]
#  [92 89]
#  [93 91]]

Non-Consecutive Columns

What if you want columns that are not next to each other? Use a list of column indices:

# Math and Chemistry only (columns 1 and 3)
math_and_chem = students[:, [1, 3]]
print("Math and Chemistry scores:")
print(math_and_chem)

# Output:
# [[85 88]
#  [92 91]
#  [78 80]
#  [88 92]
#  [95 93]]

# ID and Biology (columns 0 and 4)
id_and_bio = students[:, [0, 4]]
print("\nID and Biology:")
print(id_and_bio)

# Output:
# [[101  92]
#  [102  87]
#  [103  85]
#  [104  89]
#  [105  91]]

# Multiple non-consecutive columns
custom_cols = students[:, [0, 2, 4]]  # ID, Physics, Biology
print("\nID, Physics, Biology:")
print(custom_cols)

# Output:
# [[101  90  92]
#  [102  85  87]
#  [103  82  85]
#  [104  87  89]
#  [105  89  91]]

The list [1, 3] tells NumPy: “Give me columns 1 and 3, in that order.”
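Because the list sets the order, you can also rearrange columns. The sketch below puts Chemistry before Math by listing column 3 first. The constants MATH and CHEM are hypothetical names added only to make the indices easier to read; they are not a NumPy feature:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# Hypothetical names for column positions (plain Python ints)
MATH, CHEM = 1, 3

chem_then_math = students[:, [CHEM, MATH]]  # same as students[:, [3, 1]]
print(chem_then_math)
# [[88 85]
#  [91 92]
#  [80 78]
#  [92 88]
#  [93 95]]
```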

Visual Guide: Row vs Column Selection

Understanding the difference between row and column selection is crucial:

Original Array (students):
       Col0  Col1  Col2  Col3  Col4
Row0:  101    85    90    88    92
Row1:  102    92    85    91    87
Row2:  103    78    82    80    85
Row3:  104    88    87    92    89
Row4:  105    95    89    93    91

students[2]        → Row 2 (entire row)
                     [103, 78, 82, 80, 85]

students[:, 1]     → Column 1 (all rows, column 1)
                     [85, 92, 78, 88, 95]

students[2, 1]     → Single value (row 2, column 1)
                     78

Two-Dimensional Slices

Combining Row and Column Selection

You can specify both row and column ranges to extract a rectangular subset:

# Syntax: array[row_start:row_end, col_start:col_end]

# First 3 students, first 3 columns
subset_1 = students[0:3, 0:3]
print("First 3 students, first 3 columns:")
print(subset_1)

# Output:
# [[101  85  90]
#  [102  92  85]
#  [103  78  82]]

Visual representation:

students[0:3, 0:3] extracts:

       [Col0  Col1  Col2] Col3  Col4
[Row0:  101    85    90]   88    92
[Row1:  102    92    85]   91    87
[Row2:  103    78    82]   80    85
 Row3:  104    88    87    92    89
 Row4:  105    95    89    93    91

The bracketed area is extracted.
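The size of a 2D slice follows from its two ranges: (row_end - row_start) rows by (col_end - col_start) columns. A quick check, with the array repeated so the sketch runs on its own:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

subset_1 = students[0:3, 0:3]
print(subset_1.shape)     # (3, 3): 3 - 0 rows, 3 - 0 columns

two_by_four = students[0:2, 1:5]
print(two_by_four.shape)  # (2, 4): 2 - 0 rows, 5 - 1 columns
print(two_by_four)
# [[85 90 88 92]
#  [92 85 91 87]]
```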

More examples:

# Students 2-4 (indices 1, 2, 3), Science scores (columns 2, 3, 4)
science_subset = students[1:4, 2:5]
print("Students 2-4, Science scores:")
print(science_subset)

# Output:
# [[85 91 87]
#  [82 80 85]
#  [87 92 89]]

# Last 2 students, last 3 subjects
last_corner = students[-2:, -3:]
print("Last 2 students, last 3 subjects:")
print(last_corner)

# Output:
# [[87 92 89]
#  [89 93 91]]
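Negative slice bounds count from the end. With 5 rows and 5 columns, -2: means rows 3-4 and -3: means columns 2-4. This sketch checks that the corner slice matches its positive-index equivalent:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

last_corner = students[-2:, -3:]

# 5 rows: -2 -> index 3;  5 columns: -3 -> index 2
same_corner = students[3:5, 2:5]

print(np.array_equal(last_corner, same_corner))  # True
print(last_corner)
# [[87 92 89]
#  [89 93 91]]
```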

Mixed Selection: Specific Rows with Column Slice

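To get one row's values for a range of columns, combine a row selection with a column slice. Selecting the row with a slice (0:1) or with a single index (0) gives results of different shapes: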

# First student only, all science subjects
student_science = students[0:1, 2:5]
print("First student's science scores:")
print(student_science)
# Output: [[90 88 92]]
# Note: This is still a 2D array (1 row, 3 columns)

# Alternative: single row (becomes 1D array)
student_science_1d = students[0, 2:5]
print("\nFirst student's science scores (1D):")
print(student_science_1d)
# Output: [90 88 92]

The difference:

students[0:1, 2:5] returns a 2D array with shape (1, 3)
students[0, 2:5] returns a 1D array with shape (3,)

The 1D version is usually more convenient when you just need one row's values. Keep the 2D version when later code expects a table with rows and columns.
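If you start from the 2D form and need the 1D version, index its only row with [0]. A short sketch, with the array repeated so it runs on its own:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

student_science = students[0:1, 2:5]     # shape (1, 3)
student_science_1d = student_science[0]  # take the only row -> shape (3,)

print(student_science.shape)     # (1, 3)
print(student_science_1d.shape)  # (3,)
print(np.array_equal(student_science_1d, students[0, 2:5]))  # True
```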

Practical Selection Patterns

Let’s apply these techniques to a realistic sales dataset:

# Create sales data: 7 days × 4 products
sales = np.array([
    [120, 135, 98, 110],   # Monday
    [135, 142, 105, 118],  # Tuesday
    [150, 138, 112, 125],  # Wednesday
    [145, 155, 108, 130],  # Thursday
    [160, 148, 118, 135],  # Friday
    [175, 162, 125, 142],  # Saturday
    [190, 170, 132, 150]   # Sunday
])

print("Weekly Sales Data:")
print(sales)
print(f"Shape: {sales.shape} (7 days, 4 products)")

Pattern 1: Extracting Weekday vs Weekend Data

# Weekday sales only (Monday-Friday, rows 0-4)
weekday_sales = sales[0:5]
print("Weekday sales (Mon-Fri):")
print(weekday_sales)

# Output:
# [[120 135  98 110]
#  [135 142 105 118]
#  [150 138 112 125]
#  [145 155 108 130]
#  [160 148 118 135]]

# Weekend sales (Saturday-Sunday, rows 5-6)
weekend_sales = sales[5:7]
print("\nWeekend sales (Sat-Sun):")
print(weekend_sales)

# Output:
# [[175 162 125 142]
#  [190 170 132 150]]

Pattern 2: Extracting Product-Specific Data

# Product A sales all week (column 0)
product_a = sales[:, 0]
print("Product A (all days):")
print(product_a)
# Output: [120 135 150 145 160 175 190]

# Products C and D only (columns 2 and 3)
products_cd = sales[:, 2:4]
print("\nProducts C and D (all days):")
print(products_cd)

# Output:
# [[ 98 110]
#  [105 118]
#  [112 125]
#  [108 130]
#  [118 135]
#  [125 142]
#  [132 150]]
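Since Products C and D are the last two columns, the same data can be selected in several ways. A sketch comparing three of them (the sales array is repeated so the block runs on its own):

```python
import numpy as np

sales = np.array([
    [120, 135, 98, 110],   # Monday
    [135, 142, 105, 118],  # Tuesday
    [150, 138, 112, 125],  # Wednesday
    [145, 155, 108, 130],  # Thursday
    [160, 148, 118, 135],  # Friday
    [175, 162, 125, 142],  # Saturday
    [190, 170, 132, 150],  # Sunday
])

by_slice = sales[:, 2:4]     # columns 2 and 3 by range
by_negative = sales[:, -2:]  # last two columns
by_list = sales[:, [2, 3]]   # explicit list of columns

print(np.array_equal(by_slice, by_negative))  # True
print(np.array_equal(by_slice, by_list))      # True
```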

Pattern 3: Combining Row and Column Filters

# Weekend sales for Products A and B only
weekend_ab = sales[5:7, 0:2]
print("Weekend sales (Products A & B):")
print(weekend_ab)

# Output:
# [[175 162]
#  [190 170]]

This extracts exactly the data you need—weekend days for two specific products.
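When the same row or column ranges come up repeatedly, Python's built-in slice object lets you name a range once and reuse it. A minimal sketch; WEEKEND and PRODUCTS_AB are hypothetical names chosen for this example:

```python
import numpy as np

sales = np.array([
    [120, 135, 98, 110],   # Monday
    [135, 142, 105, 118],  # Tuesday
    [150, 138, 112, 125],  # Wednesday
    [145, 155, 108, 130],  # Thursday
    [160, 148, 118, 135],  # Friday
    [175, 162, 125, 142],  # Saturday
    [190, 170, 132, 150],  # Sunday
])

# slice(5, 7) behaves like writing 5:7 inside the brackets
WEEKEND = slice(5, 7)      # Saturday, Sunday (hypothetical name)
PRODUCTS_AB = slice(0, 2)  # Products A and B (hypothetical name)

weekend_ab = sales[WEEKEND, PRODUCTS_AB]
print(weekend_ab)
# [[175 162]
#  [190 170]]
print(np.array_equal(weekend_ab, sales[5:7, 0:2]))  # True
```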

Common Selection Patterns

Here are patterns you will use repeatedly in data analysis:

Pattern 1: First or Last N Rows

# First 3 rows, all columns
first_3_rows = students[:3]
print("First 3 rows:")
print(first_3_rows)

# Last 2 rows, all columns
last_2_rows = students[-2:]
print("\nLast 2 rows:")
print(last_2_rows)

Pattern 2: All Rows, Specific Columns

# All rows, skip ID column (get just scores)
just_scores = students[:, 1:]
print("Just scores (no IDs):")
print(just_scores)

# Output:
# [[85 90 88 92]
#  [92 85 91 87]
#  [78 82 80 85]
#  [88 87 92 89]
#  [95 89 93 91]]
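A common follow-up is to split the ID column from the scores so each can be used separately while rows stay aligned. A sketch, with the array repeated so it runs on its own:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

ids = students[:, 0]      # 1D: one ID per student
scores = students[:, 1:]  # 2D: 4 subject scores per student

print(ids)           # [101 102 103 104 105]
print(scores.shape)  # (5, 4)

# Row i of scores still belongs to ids[i]
print(ids[2], scores[2])  # 103 [78 82 80 85]
```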

Pattern 3: Excluding First or Last Rows/Columns

# Skip first row
without_first_row = students[1:]
print("Without first row:")
print(without_first_row)

# Skip last column
without_last_col = students[:, :-1]
print("\nWithout last column:")
print(without_last_col)

# Output:
# [[101  85  90  88]
#  [102  92  85  91]
#  [103  78  82  80]
#  [104  88  87  92]
#  [105  95  89  93]]
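The two exclusions can be combined in a single slice: 1:-1 drops both the first and the last position. A short sketch applying it to rows and to columns:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

inner_rows = students[1:-1]  # drop first and last row
print(inner_rows.shape)      # (3, 5)
print(inner_rows[:, 0])      # [102 103 104]

# Drop the ID column and the last column at the same time
middle_cols = students[:, 1:-1]
print(middle_cols.shape)     # (5, 3)
print(middle_cols[0])        # [85 90 88]
```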

Pattern 4: Every Nth Row or Column

# Every 2nd row (rows 0, 2, 4)
every_2nd_row = students[::2]
print("Every 2nd row:")
print(every_2nd_row)

# Output:
# [[101  85  90  88  92]
#  [103  78  82  80  85]
#  [105  95  89  93  91]]

# Every 2nd column (columns 0, 2, 4)
every_2nd_col = students[:, ::2]
print("\nEvery 2nd column:")
print(every_2nd_col)

# Output:
# [[101  90  92]
#  [102  85  87]
#  [103  82  85]
#  [104  87  89]
#  [105  89  91]]
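The step can be combined with a start index. For example, 1::2 starts at row 1 and takes every second row after it, which picks up exactly the rows that ::2 skipped. A sketch:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

odd_rows = students[1::2]  # rows 1 and 3
print(odd_rows[:, 0])      # [102 104]

# Start at column 1 (Math) and take every 2nd column: Math, Chemistry
alt_subjects = students[:, 1::2]
print(alt_subjects[0])     # [85 88]
```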

Real-World Application: Temperature Data

# Temperature data: 12 months × 4 cities
temps = np.array([
    [10, 12, 15, 18],  # Jan
    [12, 14, 17, 20],  # Feb
    [15, 17, 20, 23],  # Mar
    [18, 20, 23, 26],  # Apr
    [22, 24, 27, 30],  # May
    [27, 29, 32, 35],  # Jun
    [30, 32, 35, 38],  # Jul
    [29, 31, 34, 37],  # Aug
    [25, 27, 30, 33],  # Sep
    [20, 22, 25, 28],  # Oct
    [15, 17, 20, 23],  # Nov
    [11, 13, 16, 19]   # Dec
])

print("Temperature Data (12 months × 4 cities):")
print(temps)

Extract quarterly and seasonal data:

# Q1 temperatures (Jan-Mar)
q1_temps = temps[0:3]
print("Q1 Temperatures (Jan-Mar):")
print(q1_temps)

# Summer temperatures (Jun-Aug = indices 5-7)
summer_temps = temps[5:8]
print("\nSummer Temperatures (Jun-Aug):")
print(summer_temps)

Extract city-specific data:

# City 1 all year (column 0)
city1_temps = temps[:, 0]
print("City 1 (all months):")
print(city1_temps)
# Output: [10 12 15 18 22 27 30 29 25 20 15 11]

# Cities 2 and 3 in summer
summer_cities_23 = temps[5:8, 1:3]
print("\nCities 2 & 3 in summer:")
print(summer_cities_23)

# Output:
# [[29 32]
#  [32 35]
#  [31 34]]
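The quarterly idea extends to all four quarters. Because each quarter is three consecutive months, quarter q (counting from 0) covers rows 3*q up to, but not including, 3*q + 3. A sketch using a loop; the temps array is repeated so the block runs on its own:

```python
import numpy as np

temps = np.array([
    [10, 12, 15, 18],  # Jan
    [12, 14, 17, 20],  # Feb
    [15, 17, 20, 23],  # Mar
    [18, 20, 23, 26],  # Apr
    [22, 24, 27, 30],  # May
    [27, 29, 32, 35],  # Jun
    [30, 32, 35, 38],  # Jul
    [29, 31, 34, 37],  # Aug
    [25, 27, 30, 33],  # Sep
    [20, 22, 25, 28],  # Oct
    [15, 17, 20, 23],  # Nov
    [11, 13, 16, 19],  # Dec
])

quarters = []
for q in range(4):
    quarter = temps[3 * q : 3 * q + 3]  # 3 months x 4 cities
    quarters.append(quarter)
    print(f"Q{q + 1} shape: {quarter.shape}")  # (3, 4) each time

print(quarters[2][:, 0])  # Q3 (Jul-Sep), City 1: [30 29 25]
```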

Practice Exercises

Apply what you have learned with these exercises.

Exercise 1: Row Selection

Using the students array, select:

First 2 students
Students 3-5 (the third through fifth students, indices 2-4)
Last student

# Your code here

Exercise 2: Column Selection

Using the students array, select:

All Physics scores (column 2)
Math and Chemistry scores (columns 1 and 3)
All scores except ID (skip column 0)

# Your code here

Hint

For non-consecutive columns, use a list: array[:, [1, 3]]

Exercise 3: 2D Slices

Using the sales array, extract:

First 3 days, first 2 products
Last 2 days, all products
All days, Products B and C (columns 1 and 2)

# Your code here

Summary

You can now select rows, columns, individual elements, and rectangular subsets from 2D NumPy arrays. Let’s review the essential concepts.

Key Concepts

Row Selection

array[0] selects a single row
array[0:3] selects multiple consecutive rows
array[-1] selects the last row
array[:3] selects first 3 rows

Column Selection

array[:, 0] selects a single column (all rows)
array[:, 1:4] selects multiple consecutive columns
array[:, [0, 2, 4]] selects non-consecutive columns
: means “all rows” or “all columns”

Single Element Selection

array[row, col] selects a specific element
array[2, 3] selects row 2, column 3

Two-Dimensional Slices

array[1:4, 2:5] selects rows 1-3 and columns 2-4
Combine any row and column selections
Creates rectangular subsets

Common Patterns

array[:3] - first 3 rows
array[-2:] - last 2 rows
array[:, 1:] - skip first column
array[::2] - every 2nd row
array[:, ::2] - every 2nd column

Selection Syntax Reference

array[row]              # Single row (all columns)
array[:, col]           # Single column (all rows)
array[row, col]         # Single element
array[r1:r2]            # Row slice
array[:, c1:c2]         # Column slice
array[r1:r2, c1:c2]     # 2D slice
array[:, [0, 2, 4]]     # Non-consecutive columns

Important Reminders

: means “all” (all rows or all columns)
End index is NOT included in slices: 0:3 means 0, 1, 2
Negative indices count from the end: -1 is last, -2 is second to last
A single integer index removes that dimension, while a slice keeps it: for a 2D array, array[0] is 1D and array[0:1] is still 2D

Why This Matters

Data analysis is all about extracting the right subset of data to answer specific questions:

Compare weekday vs weekend performance
Analyze specific product lines
Extract quarterly or seasonal data
Focus on specific features or measurements

The selection techniques you learned here enable all of these analyses. You can now pinpoint exactly the data you need.

Next Steps

You can now create arrays, load data, and extract specific subsets. In the next lesson, you will learn to perform calculations on these arrays using vectorized operations—the true power of NumPy.

Continue to Lesson 4 - Vector Operations

Learn arithmetic operations, broadcasting, and statistical calculations on arrays

Back to Lesson 2 - 2D Arrays and CSV Data

Review two-dimensional arrays and loading CSV files

Master Data Extraction

Selection and slicing are fundamental skills you will use in every data analysis project. With these techniques mastered, you are ready to move on to performing calculations and deriving insights from your data.

The ability to extract exactly the data you need is what separates beginners from proficient data analysts. You now have that ability!
