Lesson 3 - Selecting and Slicing Data

Extracting What You Need

Now that you can create and load 2D arrays, you need to know how to extract specific pieces of data from them. This lesson teaches you how to select rows, columns, and subsets—essential skills for data analysis.

By the end of this lesson, you will be able to:

  • Select specific rows from 2D arrays
  • Select specific columns from 2D arrays
  • Extract individual elements using row and column coordinates
  • Combine row and column selections to create 2D slices
  • Apply common selection patterns used in data analysis
  • Extract exactly the data you need for calculations

These selection techniques are fundamental. You will use them constantly when analyzing real datasets.


Understanding Our Sample Dataset

Let’s create a dataset to practice with. This array contains student scores across multiple subjects:

import numpy as np

# Create sample dataset for practice
# Rows: 5 students
# Columns: student_id, math, physics, chemistry, biology
students = np.array([
    [101, 85, 90, 88, 92],  # Student 1
    [102, 92, 85, 91, 87],  # Student 2
    [103, 78, 82, 80, 85],  # Student 3
    [104, 88, 87, 92, 89],  # Student 4
    [105, 95, 89, 93, 91]   # Student 5
])

print("Student Scores Dataset:")
print(students)
print(f"\nShape: {students.shape} (5 students, 5 columns)")

Output:

Student Scores Dataset:
[[101  85  90  88  92]
 [102  92  85  91  87]
 [103  78  82  80  85]
 [104  88  87  92  89]
 [105  95  89  93  91]]

Shape: (5, 5) (5 students, 5 columns)

Visual structure:

        ID   Math  Physics  Chem  Bio
Row 0:  101   85     90      88    92
Row 1:  102   92     85      91    87
Row 2:  103   78     82      80    85
Row 3:  104   88     87      92    89
Row 4:  105   95     89      93    91

Understanding this structure helps you know what data each selection will extract.
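
One way to keep this layout at hand is to read the dimensions from the shape and record the column names in a list. The sketch below assumes the same students array as above. The column_names list is an illustrative helper introduced here, not something NumPy provides.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# Illustrative helper (an assumption, not a NumPy feature):
# column names listed in the same order as the array's columns
column_names = ["student_id", "math", "physics", "chemistry", "biology"]

# shape is (rows, columns), so it unpacks into two integers
n_rows, n_cols = students.shape
print(f"{n_rows} rows, {n_cols} columns")  # 5 rows, 5 columns

# Print each column index next to its name
for index, name in enumerate(column_names):
    print(f"Column {index}: {name}")

# Look up a column position by name
chem_index = column_names.index("chemistry")
print(chem_index)  # 3
```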


Selecting Rows

Single Row Selection

You select a single row using its index:

# Select first row (index 0)
first_student = students[0]
print("First student:")
print(first_student)
# Output: [101  85  90  88  92]

# Select third row (index 2)
third_student = students[2]
print("\nThird student:")
print(third_student)
# Output: [103  78  82  80  85]

# Select last row
last_student = students[-1]
print("\nLast student:")
print(last_student)
# Output: [105  95  89  93  91]

When you select a single row, you get a 1D array containing all columns for that row.

Visual representation:

students[0] selects:

Row 0: [101, 85, 90, 88, 92]  ← This entire row
Row 1: [102, 92, 85, 91, 87]
Row 2: [103, 78, 82, 80, 85]
Row 3: [104, 88, 87, 92, 89]
Row 4: [105, 95, 89, 93, 91]

Multiple Consecutive Rows

To select multiple rows, use slicing:

# First 3 students (rows 0, 1, 2)
first_three = students[0:3]
print("First 3 students:")
print(first_three)

# Output:
# [[101  85  90  88  92]
#  [102  92  85  91  87]
#  [103  78  82  80  85]]

# Alternative: shortcut for "from start"
first_three = students[:3]
print("\nSame result (shortcut):")
print(first_three)

Remember that the end index is not included: 0:3 means indices 0, 1, and 2.

# Rows 1-3 (indices 1, 2, 3)
middle_students = students[1:4]
print("Students 2-4:")
print(middle_students)

# Output:
# [[102  92  85  91  87]
#  [103  78  82  80  85]
#  [104  88  87  92  89]]

# Last 2 students
last_two = students[-2:]
print("\nLast 2 students:")
print(last_two)

# Output:
# [[104  88  87  92  89]
#  [105  95  89  93  91]]

Visual representation:

students[1:4] selects:

Row 0: [101, 85, 90, 88, 92]
Row 1: [102, 92, 85, 91, 87]  ← Start here (index 1)
Row 2: [103, 78, 82, 80, 85]  ← Included
Row 3: [104, 88, 87, 92, 89]  ← Last row included (index 3)
Row 4: [105, 95, 89, 93, 91]  ← Not included (4 is the end index)
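
A quick way to check a slice is to count the rows it returns: when both indices fall inside the array, start:stop yields stop - start rows. The sketch below assumes the same students array. The names start and stop are chosen here for illustration.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

start, stop = 1, 4  # illustrative names for the slice bounds
middle_students = students[start:stop]

# The slice holds stop - start rows
print(len(middle_students))  # 3

# First and last rows of the slice are indices 1 and 3 of the original
print(middle_students[0])    # [102  92  85  91  87]
print(middle_students[-1])   # [104  88  87  92  89]
```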

Selecting a Single Element

To get a specific element from a 2D array, provide both row and column indices:

# Syntax: array[row, column]

# First student (row 0), Math score (column 1)
math_score = students[0, 1]
print(f"First student's Math score: {math_score}")
# Output: 85

# Third student (row 2), Chemistry score (column 3)
chem_score = students[2, 3]
print(f"Third student's Chemistry score: {chem_score}")
# Output: 80

# Last student, last subject
bio_score = students[-1, -1]
print(f"Last student's Biology score: {bio_score}")
# Output: 91

Visual representation:

students[2, 3] selects:

        Col0  Col1  Col2  Col3  Col4
Row 0:  101   85    90    88    92
Row 1:  102   92    85    91    87
Row 2:  103   78    82   [80]   85   ← Row 2, Column 3
Row 3:  104   88    87    92    89
Row 4:  105   95    89    93    91
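
As a cross-check, the comma form returns the same value as selecting the row first and then indexing into that row. A minimal sketch, assuming the same students array:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# Two steps: take row 2 as a 1D array, then pick position 3 from it
row = students[2]
print(row[3])          # 80

# The same two steps written in one expression
print(students[2][3])  # 80

# One step: row and column given together
print(students[2, 3])  # 80
```

The comma form is the usual choice because it selects the element in a single indexing step.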

Selecting Columns

Single Column Selection

To select a column, you need all rows but only one column position. The syntax is array[:, column_index]:

# All student IDs (column 0)
student_ids = students[:, 0]
print("All student IDs:")
print(student_ids)
# Output: [101 102 103 104 105]

# All Math scores (column 1)
math_scores = students[:, 1]
print("\nAll Math scores:")
print(math_scores)
# Output: [85 92 78 88 95]

# All Chemistry scores (column 3)
chemistry_scores = students[:, 3]
print("\nAll Chemistry scores:")
print(chemistry_scores)
# Output: [88 91 80 92 93]

The : means “all rows.” So students[:, 1] means “all rows, column 1.”

Visual representation:

students[:, 1] selects:

        ID  [Math] Physics  Chem  Bio
Row 0:  101   85     90      88    92
Row 1:  102   92     85      91    87
Row 2:  103   78     82      80    85
Row 3:  104   88     87      92    89
Row 4:  105   95     89      93    91
              ↑
        This entire column

Multiple Consecutive Columns

Select multiple columns using slicing:

# First 3 columns (ID, Math, Physics)
first_three_cols = students[:, 0:3]
print("ID, Math, Physics:")
print(first_three_cols)

# Output:
# [[101  85  90]
#  [102  92  85]
#  [103  78  82]
#  [104  88  87]
#  [105  95  89]]

# Science scores only (Physics, Chemistry, Biology = columns 2, 3, 4)
science_scores = students[:, 2:5]
print("\nScience scores (Physics, Chemistry, Biology):")
print(science_scores)

# Output:
# [[90 88 92]
#  [85 91 87]
#  [82 80 85]
#  [87 92 89]
#  [89 93 91]]

# Last 2 columns
last_two_cols = students[:, -2:]
print("\nLast 2 subjects (Chemistry, Biology):")
print(last_two_cols)

# Output:
# [[88 92]
#  [91 87]
#  [80 85]
#  [92 89]
#  [93 91]]

Non-Consecutive Columns

What if you want columns that are not next to each other? Use a list of column indices:

# Math and Chemistry only (columns 1 and 3)
math_and_chem = students[:, [1, 3]]
print("Math and Chemistry scores:")
print(math_and_chem)

# Output:
# [[85 88]
#  [92 91]
#  [78 80]
#  [88 92]
#  [95 93]]

# ID and Biology (columns 0 and 4)
id_and_bio = students[:, [0, 4]]
print("\nID and Biology:")
print(id_and_bio)

# Output:
# [[101  92]
#  [102  87]
#  [103  85]
#  [104  89]
#  [105  91]]

# Multiple non-consecutive columns
custom_cols = students[:, [0, 2, 4]]  # ID, Physics, Biology
print("\nID, Physics, Biology:")
print(custom_cols)

# Output:
# [[101  90  92]
#  [102  85  87]
#  [103  82  85]
#  [104  87  89]
#  [105  89  91]]

The list [1, 3] tells NumPy: “Give me columns 1 and 3, in that order.”
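
Because the order of the list is respected, reversing it reverses the columns in the result. A minimal sketch, assuming the same students array:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# Chemistry (column 3) first, then Math (column 1)
chem_then_math = students[:, [3, 1]]
print(chem_then_math)

# Output:
# [[88 85]
#  [91 92]
#  [80 78]
#  [92 88]
#  [93 95]]
```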

Visual Guide: Row vs Column Selection

Understanding the difference between row and column selection is crucial:

Original Array (students):
       Col0  Col1  Col2  Col3  Col4
Row0:  101    85    90    88    92
Row1:  102    92    85    91    87
Row2:  103    78    82    80    85
Row3:  104    88    87    92    89
Row4:  105    95    89    93    91

students[2]        → Row 2 (entire row)
                     [103, 78, 82, 80, 85]

students[:, 1]     → Column 1 (all rows, column 1)
                     [85, 92, 78, 88, 95]

students[2, 1]     → Single value (row 2, column 1)
                     78

Two-Dimensional Slices

Combining Row and Column Selection

You can specify both row and column ranges to extract a rectangular subset:

# Syntax: array[row_start:row_end, col_start:col_end]

# First 3 students, first 3 columns
subset_1 = students[0:3, 0:3]
print("First 3 students, first 3 columns:")
print(subset_1)

# Output:
# [[101  85  90]
#  [102  92  85]
#  [103  78  82]]

Visual representation:

students[0:3, 0:3] extracts:

       [Col0  Col1  Col2] Col3  Col4
[Row0:  101    85    90]   88    92
[Row1:  102    92    85]   91    87
[Row2:  103    78    82]   80    85
 Row3:  104    88    87    92    89
 Row4:  105    95    89    93    91

The bracketed area is extracted.

More examples:

# Students 2-4 (indices 1, 2, 3), Science scores (columns 2, 3, 4)
science_subset = students[1:4, 2:5]
print("Students 2-4, Science scores:")
print(science_subset)

# Output:
# [[85 91 87]
#  [82 80 85]
#  [87 92 89]]

# Last 2 students, last 3 subjects
last_corner = students[-2:, -3:]
print("Last 2 students, last 3 subjects:")
print(last_corner)

# Output:
# [[87 92 89]
#  [89 93 91]]

Mixed Selection: Specific Rows with Column Slice

You can combine single row selection with column slicing:

# First student only, all science subjects
student_science = students[0:1, 2:5]
print("First student's science scores:")
print(student_science)
# Output: [[90 88 92]]
# Note: This is still a 2D array (1 row, 3 columns)

# Alternative: single row (becomes 1D array)
student_science_1d = students[0, 2:5]
print("\nFirst student's science scores (1D):")
print(student_science_1d)
# Output: [90 88 92]

The difference:

  • students[0:1, 2:5] returns a 2D array with shape (1, 3)
  • students[0, 2:5] returns a 1D array with shape (3,)

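The 1D version is usually more convenient for a single row, because you can loop over the values or index them directly. Keep the 2D version when later code expects rows and columns.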
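
The practical difference shows up when you loop over the result. The sketch below assumes the same students array. The subjects list is an illustrative set of labels for columns 2 to 4, not part of the array.

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

subjects = ["physics", "chemistry", "biology"]  # illustrative labels

scores_1d = students[0, 2:5]    # single index for the row
scores_2d = students[0:1, 2:5]  # slice for the row

print(scores_1d.shape, scores_2d.shape)  # (3,) (1, 3)

# Looping over the 1D result yields one score at a time
for subject, score in zip(subjects, scores_1d):
    print(f"{subject}: {score}")

# Looping over the 2D result would yield whole rows; it has just one
print(len(scores_2d))  # 1
print(scores_2d[0])    # [90 88 92]
```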


Practical Selection Patterns

Let’s apply these techniques to a small sample sales dataset. Each row is a day (Monday to Sunday) and each column is a product (A to D):

# Create sales data: 7 days × 4 products
sales = np.array([
    [120, 135, 98, 110],   # Monday
    [135, 142, 105, 118],  # Tuesday
    [150, 138, 112, 125],  # Wednesday
    [145, 155, 108, 130],  # Thursday
    [160, 148, 118, 135],  # Friday
    [175, 162, 125, 142],  # Saturday
    [190, 170, 132, 150]   # Sunday
])

print("Weekly Sales Data:")
print(sales)
print(f"Shape: {sales.shape} (7 days, 4 products)")

Pattern 1: Extracting Weekday vs Weekend Data

# Weekday sales only (Monday-Friday, rows 0-4)
weekday_sales = sales[0:5]
print("Weekday sales (Mon-Fri):")
print(weekday_sales)

# Output:
# [[120 135  98 110]
#  [135 142 105 118]
#  [150 138 112 125]
#  [145 155 108 130]
#  [160 148 118 135]]

# Weekend sales (Saturday-Sunday, rows 5-6)
weekend_sales = sales[5:7]
print("\nWeekend sales (Sat-Sun):")
print(weekend_sales)

# Output:
# [[175 162 125 142]
#  [190 170 132 150]]

Pattern 2: Extracting Product-Specific Data

# Product A sales all week (column 0)
product_a = sales[:, 0]
print("Product A (all days):")
print(product_a)
# Output: [120 135 150 145 160 175 190]

# Products C and D only (columns 2 and 3)
products_cd = sales[:, 2:4]
print("\nProducts C and D (all days):")
print(products_cd)

# Output:
# [[ 98 110]
#  [105 118]
#  [112 125]
#  [108 130]
#  [118 135]
#  [125 142]
#  [132 150]]

Pattern 3: Combining Row and Column Filters

# Weekend sales for Products A and B only
weekend_ab = sales[5:7, 0:2]
print("Weekend sales (Products A & B):")
print(weekend_ab)

# Output:
# [[175 162]
#  [190 170]]

This extracts exactly the data you need—weekend days for two specific products.
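
Once a subset is isolated, it can be passed straight to a calculation. The sketch below totals the weekend figures with the array's sum() method. Calculations are covered in the next lesson, so treat this only as a preview. It assumes the same sales array as above.

```python
import numpy as np

sales = np.array([
    [120, 135, 98, 110],   # Monday
    [135, 142, 105, 118],  # Tuesday
    [150, 138, 112, 125],  # Wednesday
    [145, 155, 108, 130],  # Thursday
    [160, 148, 118, 135],  # Friday
    [175, 162, 125, 142],  # Saturday
    [190, 170, 132, 150],  # Sunday
])

# Weekend rows (5 and 6), Products A and B (columns 0 and 1)
weekend_ab = sales[5:7, 0:2]
print(weekend_ab.sum())  # 697

# Weekend rows, Product A only (1D result)
weekend_a = sales[5:7, 0]
print(weekend_a.sum())   # 365
```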


Common Selection Patterns

Here are patterns you will use repeatedly in data analysis:

Pattern 1: First or Last N Rows

# First 3 rows, all columns
first_3_rows = students[:3]
print("First 3 rows:")
print(first_3_rows)

# Last 2 rows, all columns
last_2_rows = students[-2:]
print("\nLast 2 rows:")
print(last_2_rows)

Pattern 2: All Rows, Specific Columns

# All rows, skip ID column (get just scores)
just_scores = students[:, 1:]
print("Just scores (no IDs):")
print(just_scores)

# Output:
# [[85 90 88 92]
#  [92 85 91 87]
#  [78 82 80 85]
#  [88 87 92 89]
#  [95 89 93 91]]

Pattern 3: Excluding First or Last Rows/Columns

# Skip first row
without_first_row = students[1:]
print("Without first row:")
print(without_first_row)

# Skip last column
without_last_col = students[:, :-1]
print("\nWithout last column:")
print(without_last_col)

# Output:
# [[101  85  90  88]
#  [102  92  85  91]
#  [103  78  82  80]
#  [104  88  87  92]
#  [105  95  89  93]]

Pattern 4: Every Nth Row or Column

# Every 2nd row (rows 0, 2, 4)
every_2nd_row = students[::2]
print("Every 2nd row:")
print(every_2nd_row)

# Output:
# [[101  85  90  88  92]
#  [103  78  82  80  85]
#  [105  95  89  93  91]]

# Every 2nd column (columns 0, 2, 4)
every_2nd_col = students[:, ::2]
print("\nEvery 2nd column:")
print(every_2nd_col)

# Output:
# [[101  90  92]
#  [102  85  87]
#  [103  82  85]
#  [104  87  89]
#  [105  89  91]]
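
The step form generalizes to start:stop:step, so the pattern can begin at any index. A minimal sketch, assuming the same students array, that starts at index 1 instead of 0:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

# Every 2nd row starting at row 1 (rows 1, 3)
every_2nd_from_1 = students[1::2]
print(every_2nd_from_1)

# Output:
# [[102  92  85  91  87]
#  [104  88  87  92  89]]

# Every 2nd column starting at column 1 (columns 1, 3 = Math, Chemistry)
odd_cols = students[:, 1::2]
print(odd_cols)

# Output:
# [[85 88]
#  [92 91]
#  [78 80]
#  [88 92]
#  [95 93]]
```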

Real-World Application: Temperature Data

# Temperature data: 12 months × 4 cities
temps = np.array([
    [10, 12, 15, 18],  # Jan
    [12, 14, 17, 20],  # Feb
    [15, 17, 20, 23],  # Mar
    [18, 20, 23, 26],  # Apr
    [22, 24, 27, 30],  # May
    [27, 29, 32, 35],  # Jun
    [30, 32, 35, 38],  # Jul
    [29, 31, 34, 37],  # Aug
    [25, 27, 30, 33],  # Sep
    [20, 22, 25, 28],  # Oct
    [15, 17, 20, 23],  # Nov
    [11, 13, 16, 19]   # Dec
])

print("Temperature Data (12 months × 4 cities):")
print(temps)

Extract quarterly data:

# Q1 temperatures (Jan-Mar)
q1_temps = temps[0:3]
print("Q1 Temperatures (Jan-Mar):")
print(q1_temps)

# Summer temperatures (Jun-Aug = indices 5-7)
summer_temps = temps[5:8]
print("\nSummer Temperatures (Jun-Aug):")
print(summer_temps)

Extract city-specific data:

# City 1 all year (column 0)
city1_temps = temps[:, 0]
print("City 1 (all months):")
print(city1_temps)
# Output: [10 12 15 18 22 27 30 29 25 20 15 11]

# Cities 2 and 3 in summer
summer_cities_23 = temps[5:8, 1:3]
print("\nCities 2 & 3 in summer:")
print(summer_cities_23)

# Output:
# [[29 32]
#  [32 35]
#  [31 34]]
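
The same row slice can be produced for every quarter by deriving the start index from the quarter number. This is a sketch under the assumption that rows run January to December as above. The names quarters and first_row are illustrative.

```python
import numpy as np

temps = np.array([
    [10, 12, 15, 18],  # Jan
    [12, 14, 17, 20],  # Feb
    [15, 17, 20, 23],  # Mar
    [18, 20, 23, 26],  # Apr
    [22, 24, 27, 30],  # May
    [27, 29, 32, 35],  # Jun
    [30, 32, 35, 38],  # Jul
    [29, 31, 34, 37],  # Aug
    [25, 27, 30, 33],  # Sep
    [20, 22, 25, 28],  # Oct
    [15, 17, 20, 23],  # Nov
    [11, 13, 16, 19],  # Dec
])

# Quarter q (0-based) covers rows 3*q, 3*q + 1, 3*q + 2
quarters = []
for q in range(4):
    first_row = 3 * q
    quarters.append(temps[first_row:first_row + 3])

# City 1 (column 0) for each quarter
for number, quarter_temps in enumerate(quarters, start=1):
    print(f"Q{number} City 1: {quarter_temps[:, 0]}")

# Output:
# Q1 City 1: [10 12 15]
# Q2 City 1: [18 22 27]
# Q3 City 1: [30 29 25]
# Q4 City 1: [20 15 11]
```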

Practice Exercises

Apply what you have learned with these exercises.

Exercise 1: Row Selection

Using the students array, select:

  • First 2 students
  • Students 3-5
  • Last student

# Your code here

Exercise 2: Column Selection

Using the students array, select:

  • All Physics scores (column 2)
  • Math and Chemistry scores (columns 1 and 3)
  • All scores except ID (skip column 0)

# Your code here

Hint

For non-consecutive columns, use a list: array[:, [1, 3]]

Exercise 3: 2D Slices

Using the sales array, extract:

  • First 3 days, first 2 products
  • Last 2 days, all products
  • All days, Products B and C (columns 1 and 2)

# Your code here

Summary

You can now extract rows, columns, single elements, and rectangular subsets from 2D NumPy arrays. Let’s review the essential concepts.

Key Concepts

Row Selection

  • array[0] selects a single row
  • array[0:3] selects multiple consecutive rows
  • array[-1] selects the last row
  • array[:3] selects first 3 rows

Column Selection

  • array[:, 0] selects a single column (all rows)
  • array[:, 1:4] selects multiple consecutive columns
  • array[:, [0, 2, 4]] selects non-consecutive columns
  • : means “all rows” or “all columns”

Single Element Selection

  • array[row, col] selects a specific element
  • array[2, 3] selects row 2, column 3

Two-Dimensional Slices

  • array[1:4, 2:5] selects rows 1-3 and columns 2-4
  • Combine any row and column selections
  • Creates rectangular subsets

Common Patterns

  • array[:3] - first 3 rows
  • array[-2:] - last 2 rows
  • array[:, 1:] - skip first column
  • array[::2] - every 2nd row
  • array[:, ::2] - every 2nd column

Selection Syntax Reference

array[row]              # Single row (all columns)
array[:, col]           # Single column (all rows)
array[row, col]         # Single element
array[r1:r2]            # Row slice
array[:, c1:c2]         # Column slice
array[r1:r2, c1:c2]     # 2D slice
array[:, [0, 2, 4]]     # Non-consecutive columns
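
To see what each form returns, the sketch below applies every line of the reference to a small 4 × 5 array and prints the resulting shape. The array a is a stand-in chosen here, built with np.arange and reshape so that it holds the values 0 to 19 in row order.

```python
import numpy as np

# Stand-in array: 4 rows, 5 columns, values 0..19
a = np.arange(20).reshape(4, 5)

print(a[1].shape)             # (5,)   single row -> 1D
print(a[:, 2].shape)          # (4,)   single column -> 1D
print(a[1, 2])                # 7      single element
print(a[1:3].shape)           # (2, 5) row slice -> 2D
print(a[:, 1:4].shape)        # (4, 3) column slice -> 2D
print(a[1:3, 1:4].shape)      # (2, 3) 2D slice
print(a[:, [0, 2, 4]].shape)  # (4, 3) non-consecutive columns
```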

Important Reminders

  • : means “all” (all rows or all columns)
  • End index is NOT included in slices: 0:3 means 0, 1, 2
  • Negative indices count from the end: -1 is last, -2 is second to last
  • For a 2D array, a single index drops one dimension: array[0] is 1D, while the slice array[0:1] stays 2D
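
The negative-index reminder can be verified directly: an index of -k refers to the same position as len(array) - k. A minimal sketch, assuming the same students array and using np.array_equal to compare results:

```python
import numpy as np

students = np.array([
    [101, 85, 90, 88, 92],
    [102, 92, 85, 91, 87],
    [103, 78, 82, 80, 85],
    [104, 88, 87, 92, 89],
    [105, 95, 89, 93, 91],
])

n = len(students)  # number of rows: 5

# -1 is the last row, -2 is the second to last
print(np.array_equal(students[-1], students[n - 1]))    # True
print(np.array_equal(students[-2], students[n - 2]))    # True

# The same rule applies to slice bounds
print(np.array_equal(students[-2:], students[n - 2:]))  # True
```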

Why This Matters

Data analysis is all about extracting the right subset of data to answer specific questions:

  • Compare weekday vs weekend performance
  • Analyze specific product lines
  • Extract quarterly or seasonal data
  • Focus on specific features or measurements

The selection techniques you learned here enable all of these analyses. You can now pinpoint exactly the data you need.


Next Steps

You can now create arrays, load data, and extract specific subsets. In the next lesson, Lesson 4 - Vector Operations, you will perform calculations on these arrays using vectorized operations: arithmetic, broadcasting, and statistical calculations.

To review two-dimensional arrays and loading CSV files, go back to Lesson 2 - 2D Arrays and CSV Data.


Master Data Extraction

Selection and slicing come up in nearly every data analysis project. With these techniques in hand, you are ready to move on to performing calculations and deriving insights from your data.