Lesson 3 - Selecting Data with .loc[]

Precise Label-Based Selection

You have learned to load data into DataFrames. Now you need to access specific parts of that data. This lesson teaches you .loc[]—pandas’ powerful label-based selection tool.

By the end of this lesson, you will be able to:

  • Select single values using row and column labels
  • Select entire rows by label
  • Select multiple rows and columns
  • Understand label-based slicing (inclusive on both ends)
  • Combine row and column selections
  • Write readable, self-documenting selection code

Label-based selection makes your code clear and maintainable. Let’s master it.


Understanding .loc[]

The .loc[] accessor uses labels (row names and column names) to select data. This is different from NumPy, which uses numeric positions.

Syntax:

df.loc[row_label, column_label]

Why use labels instead of positions?

  • Self-documenting: df.loc['TechCorp Global', 'revenues'] is clear
  • Robust: Works even if row order changes
  • Readable: Anyone can understand what you’re selecting

Let’s create a sample dataset to practice:

import pandas as pd
import numpy as np

# Create a dataset of companies
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK']
})

# Set company names as the index (row labels)
companies = companies.set_index('company')

print("Companies DataFrame:")
print(companies.head())

Notice that company is now the index (row labels), not a regular column.


Selecting Single Elements

Select a specific value using both row and column labels:

# Select revenues of 'TechCorp Global'
ali_revenue = companies.loc['TechCorp Global', 'revenues']

print(f"TechCorp Global revenues: ${ali_revenue:,}")
print(f"Type: {type(ali_revenue)}")

Output:

TechCorp Global revenues: $125,000
Type: <class 'numpy.int64'>

Compare with NumPy:

# NumPy way: arr[0, 1]  <- What is column 1?
# Pandas way: df.loc['TechCorp Global', 'revenues']  <- Clear and readable!

More examples:

# Select profit of 'FirstBank Holdings'
maryam_profit = companies.loc['FirstBank Holdings', 'profits']
print(f"FirstBank Holdings profits: ${maryam_profit:,}")
# Output: FirstBank Holdings profits: $45,000

# Select country of 'PowerGen Energy'
hassan_country = companies.loc['PowerGen Energy', 'country']
print(f"PowerGen Energy country: {hassan_country}")
# Output: PowerGen Energy country: Germany

Selecting Rows

Single Row Selection

Select one row by its label—returns a Series:

# Select all data for 'TechCorp Global'
ali_data = companies.loc['TechCorp Global']

print(f"Type: {type(ali_data)}")
print(ali_data)

Output:

Type: <class 'pandas.core.series.Series'>
sector                Technology
revenues                  125000
profits                    12000
employees                   1200
country                     USA
Name: TechCorp Global, dtype: object

The Series contains all columns for that row, with the company name as the Series name.

Multiple Row Selection

Select multiple rows using a list—returns a DataFrame:

# Select data for 3 specific companies
selected_companies = companies.loc[['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors']]

print(f"Type: {type(selected_companies)}")
print(f"Shape: {selected_companies.shape}")
print(selected_companies)

Output:

Type: <class 'pandas.core.frame.DataFrame'>
Shape: (3, 5)
              sector  revenues  profits  employees country
company
TechCorp Global  Technology    125000    12000       1200    USA
FreshMart Inc      Food     89000     8500        890    USA
AutoDrive Motors Automotive   156000    -3000       2300    USA

Slicing Rows

Important Difference

Slicing with .loc[] is inclusive on both ends. This is different from Python slicing!

# Slice from 'TechCorp Global' to 'PowerGen Energy' (INCLUSIVE!)
slice_companies = companies.loc['TechCorp Global':'PowerGen Energy']

print(f"Shape: {slice_companies.shape}")
print(slice_companies)

This includes both ‘TechCorp Global’ and ‘PowerGen Energy’ plus everything in between.

Comparison:

# Python list slicing (EXCLUSIVE end)
my_list = ['a', 'b', 'c', 'd', 'e']
print(f"my_list[1:3] = {my_list[1:3]}")
# Output: ['b', 'c'] - 'd' NOT included

# .loc[] slicing (INCLUSIVE end)
# companies.loc['FreshMart Inc':'FirstBank Holdings']
# 'FirstBank Holdings' IS included!

Visual representation:

Row Labels:
TechCorp Global       ← Start (included)
FreshMart Inc
AutoDrive Motors
FirstBank Holdings
PowerGen Energy  ← End (included)
MediPharm Solutions
...

companies.loc['TechCorp Global':'PowerGen Energy'] gets all 5 rows

Selecting Columns

Use : for “all rows” and specify column names:

Single Column Selection

# All rows, single column (returns Series)
all_revenues = companies.loc[:, 'revenues']

print(f"Type: {type(all_revenues)}")
print(f"Shape: {all_revenues.shape}")
print(all_revenues)

Output:

Type: <class 'pandas.core.series.Series'>
Shape: (10,)
company
TechCorp Global        125000
FreshMart Inc       89000
AutoDrive Motors     156000
...

Shorthand:

# These are equivalent:
all_revenues_v1 = companies.loc[:, 'revenues']
all_revenues_v2 = companies['revenues']

# Both give the same result
print(all_revenues_v1.equals(all_revenues_v2))
# Output: True

Multiple Column Selection

# All rows, multiple columns (returns DataFrame)
financials = companies.loc[:, ['revenues', 'profits']]

print(f"Type: {type(financials)}")
print(f"Shape: {financials.shape}")
print(financials.head())

Output:

Type: <class 'pandas.core.frame.DataFrame'>
Shape: (10, 2)
             revenues  profits
company
TechCorp Global       125000    12000
FreshMart Inc      89000     8500
AutoDrive Motors    156000    -3000
...

Column Slicing

Slice columns (inclusive on both ends):

# Slice columns from 'sector' to 'profits' (INCLUSIVE!)
slice_cols = companies.loc[:, 'sector':'profits']

print(f"Shape: {slice_cols.shape}")
print(slice_cols.head())

Output:

Shape: (10, 3)
              sector  revenues  profits
company
TechCorp Global  Technology    125000    12000
FreshMart Inc      Food     89000     8500
...

Non-Consecutive Columns

# Select specific columns (not adjacent)
selected_cols = companies.loc[:, ['sector', 'profits', 'country']]
print(selected_cols.head())

Output:

              sector  profits country
company
TechCorp Global  Technology    12000    USA
FreshMart Inc      Food     8500    USA
...

Combining Row and Column Selections

The real power of .loc[] comes from combining row and column selections:

Specific Rows and Columns

# Select specific rows and specific columns
subset = companies.loc[
    ['TechCorp Global', 'FreshMart Inc', 'PowerGen Energy'],  # These rows
    ['revenues', 'profits', 'employees']           # These columns
]

print(f"Shape: {subset.shape}")
print(subset)

Output:

Shape: (3, 3)
               revenues  profits  employees
company
TechCorp Global         125000    12000       1200
FreshMart Inc        89000     8500        890
PowerGen Energy    178000    23000       3400

Row Slice with Column Slice

# Row slice + column slice
subset = companies.loc[
    'TechCorp Global':'AutoDrive Motors',  # First 3 companies
    'sector':'profits'          # First 3 columns
]

print(subset)

Output:

              sector  revenues  profits
company
TechCorp Global  Technology    125000    12000
FreshMart Inc      Food     89000     8500
AutoDrive Motors Automotive   156000    -3000

Row List with Column Slice

# Specific rows + column slice
subset = companies.loc[
    ['FirstBank Holdings', 'PowerGen Energy', 'NetLink Telecom'],
    'revenues':'employees'
]

print(subset)

Output:

               revenues  profits  employees
company
FirstBank Holdings      234000    45000       5600
PowerGen Energy    178000    23000       3400
NetLink Telecom    165000    28000       6700

Selection Syntax Summary

Here is a complete reference for .loc[] syntax:

# Single element
df.loc[row_label, col_label]

# Single row, all columns
df.loc[row_label]

# Multiple rows, all columns
df.loc[[row1, row2, row3]]

# Row slice, all columns (INCLUSIVE!)
df.loc[start_row:end_row]

# All rows, single column
df.loc[:, col_label]

# All rows, multiple columns
df.loc[:, [col1, col2, col3]]

# All rows, column slice (INCLUSIVE!)
df.loc[:, start_col:end_col]

# Specific rows and columns
df.loc[[row1, row2], [col1, col2]]

# Row slice and column slice
df.loc[start_row:end_row, start_col:end_col]

Visual guide:

DataFrame:
         Col1  Col2  Col3  Col4
Row1      a     b     c     d
Row2      e     f     g     h
Row3      i     j     k     l

df.loc['Row1', 'Col2']          → b (single value)
df.loc['Row1']                  → [a, b, c, d] (Series)
df.loc[['Row1', 'Row3']]        → Rows 1 and 3 (DataFrame)
df.loc['Row1':'Row2']           → Rows 1 and 2 (inclusive!)
df.loc[:, 'Col2']               → [b, f, j] (Series)
df.loc[:, ['Col1', 'Col3']]     → Columns 1 and 3 (DataFrame)
df.loc['Row1':'Row2', 'Col2':'Col3'] → 2x2 subset (DataFrame)

Practice Exercises

Apply label-based selection with these exercises.

Exercise 1: Single Selections

  1. Get the number of employees at ‘MediPharm Solutions’
  2. Get the sector of ‘NetLink Telecom’
  3. Get all information for ‘SteelCore Industries’
# Your code here

Exercise 2: Multiple Row Selection

  1. Get data for both technology companies: ‘TechCorp Global’ and ‘NetLink Telecom’
  2. Get data for companies from ‘PowerGen Energy’ to ‘RetailHub Ltd’ (using slice)
  3. How many companies are in the slice?
# Your code here

Hint

Remember that slicing with .loc[] includes both endpoints!

Exercise 3: Column Selection

  1. Get the ‘profits’ column for all companies
  2. Get ‘sector’ and ‘country’ columns for all companies
  3. Get columns from ‘revenues’ to ’employees’ (slice)
# Your code here

Exercise 4: Combined Selection

  1. Get ‘revenues’ and ‘profits’ for ‘FreshMart Inc’ and ‘FirstBank Holdings’
  2. Get all financial columns (revenues, profits, employees) for UK companies
  3. Challenge: Get ‘sector’ and ‘country’ for companies from ‘MediPharm Solutions’ to ‘NetLink Telecom’
# Your code here

Summary

You now master label-based selection with .loc[]. Let’s review the key concepts.

Key Concepts

.loc[] Uses Labels

  • Row labels (index) and column names
  • Not numeric positions
  • Makes code self-documenting and readable

Slicing is Inclusive

  • df.loc['start':'end'] includes both start AND end
  • Different from Python slicing (which excludes end)
  • Different from NumPy indexing

Return Types

  • Single element → scalar value
  • Single row or column → Series
  • Multiple rows or columns → DataFrame

Flexibility

  • Select rows only, columns only, or both
  • Use lists for specific items
  • Use slices for ranges
  • Combine any row selection with any column selection

Syntax Reference

# Single element
df.loc[row_label, col_label]

# Rows only
df.loc[row_label]                    # Single row
df.loc[[row1, row2]]                 # Multiple rows
df.loc[start:end]                    # Row slice (INCLUSIVE!)

# Columns only
df.loc[:, col_label]                 # Single column
df.loc[:, [col1, col2]]              # Multiple columns
df.loc[:, start:end]                 # Column slice (INCLUSIVE!)

# Rows AND columns
df.loc[[row1, row2], [col1, col2]]   # Specific rows and columns
df.loc[row_start:row_end, col_start:col_end]  # Slices

NumPy vs Pandas Comparison

FeatureNumPyPandas .loc[]
IndexingPosition-basedLabel-based
Examplearr[0, 5]df.loc['Ali', 'revenue']
ReadableNo (what is column 5?)Yes (clear meaning)
RobustNo (breaks if order changes)Yes (uses names)
SlicingExclusive endInclusive end

Important Reminders

  • Labels, not positions: Use meaningful names
  • Inclusive slicing: Both start and end included
  • Use lists for multiple: [row1, row2] or [col1, col2]
  • : means all: All rows or all columns
  • Returns vary: Scalar, Series, or DataFrame depending on selection

Next Steps

You can now select data precisely using labels. In the next lesson, you will learn .iloc[] for position-based selection—useful when you need to select by location rather than name.

Continue to Lesson 4 - Selecting with .iloc[]

Learn position-based selection using integer indices

Back to Lesson 2 - DataFrames and Reading Data

Review loading data from CSV, JSON, and Excel files


Master Label-Based Selection

Label-based selection with .loc[] is the pandas way of accessing data. It makes your code readable, maintainable, and robust to changes in data order.

Use .loc[] as your default selection method. In the next lesson, you will learn .iloc[] for those situations where position-based selection makes more sense!