Lesson 3 - Selecting Data with .loc[]
Precise Label-Based Selection
You have learned to load data into DataFrames. Now you need to access specific parts of that data. This lesson teaches you .loc[]—pandas’ powerful label-based selection tool.
By the end of this lesson, you will be able to:
- Select single values using row and column labels
- Select entire rows by label
- Select multiple rows and columns
- Understand label-based slicing (inclusive on both ends)
- Combine row and column selections
- Write readable, self-documenting selection code
Label-based selection makes your code clear and maintainable. Let’s master it.
Understanding .loc[]
The .loc[] accessor uses labels (row names and column names) to select data. This is different from NumPy, which uses numeric positions.
Syntax:
df.loc[row_label, column_label]Why use labels instead of positions?
- Self-documenting:
df.loc['TechCorp Global', 'revenues']is clear - Robust: Works even if row order changes
- Readable: Anyone can understand what you’re selecting
Let’s create a sample dataset to practice:
import pandas as pd
import numpy as np
# Create a dataset of companies
companies = pd.DataFrame({
'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
'revenues': [125000, 89000, 156000, 234000, 178000,
98000, 112000, 187000, 145000, 165000],
'profits': [12000, 8500, -3000, 45000, 23000,
15000, 9800, 21000, 18000, 28000],
'employees': [1200, 890, 2300, 5600, 3400,
2100, 4500, 8900, 3200, 6700],
'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
'USA', 'UK', 'Germany', 'USA', 'UK']
})
# Set company names as the index (row labels)
companies = companies.set_index('company')
print("Companies DataFrame:")
print(companies.head())Notice that company is now the index (row labels), not a regular column.
Selecting Single Elements
Select a specific value using both row and column labels:
# Select revenues of 'TechCorp Global'
ali_revenue = companies.loc['TechCorp Global', 'revenues']
print(f"TechCorp Global revenues: ${ali_revenue:,}")
print(f"Type: {type(ali_revenue)}")Output:
TechCorp Global revenues: $125,000
Type: <class 'numpy.int64'>Compare with NumPy:
# NumPy way: arr[0, 1] <- What is column 1?
# Pandas way: df.loc['TechCorp Global', 'revenues'] <- Clear and readable!More examples:
# Select profit of 'FirstBank Holdings'
maryam_profit = companies.loc['FirstBank Holdings', 'profits']
print(f"FirstBank Holdings profits: ${maryam_profit:,}")
# Output: FirstBank Holdings profits: $45,000
# Select country of 'PowerGen Energy'
hassan_country = companies.loc['PowerGen Energy', 'country']
print(f"PowerGen Energy country: {hassan_country}")
# Output: PowerGen Energy country: GermanySelecting Rows
Single Row Selection
Select one row by its label—returns a Series:
# Select all data for 'TechCorp Global'
ali_data = companies.loc['TechCorp Global']
print(f"Type: {type(ali_data)}")
print(ali_data)Output:
Type: <class 'pandas.core.series.Series'>
sector Technology
revenues 125000
profits 12000
employees 1200
country USA
Name: TechCorp Global, dtype: objectThe Series contains all columns for that row, with the company name as the Series name.
Multiple Row Selection
Select multiple rows using a list—returns a DataFrame:
# Select data for 3 specific companies
selected_companies = companies.loc[['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors']]
print(f"Type: {type(selected_companies)}")
print(f"Shape: {selected_companies.shape}")
print(selected_companies)Output:
Type: <class 'pandas.core.frame.DataFrame'>
Shape: (3, 5)
sector revenues profits employees country
company
TechCorp Global Technology 125000 12000 1200 USA
FreshMart Inc Food 89000 8500 890 USA
AutoDrive Motors Automotive 156000 -3000 2300 USASlicing Rows
Important Difference
Slicing with .loc[] is inclusive on both ends. This is different from Python slicing!
# Slice from 'TechCorp Global' to 'PowerGen Energy' (INCLUSIVE!)
slice_companies = companies.loc['TechCorp Global':'PowerGen Energy']
print(f"Shape: {slice_companies.shape}")
print(slice_companies)This includes both ‘TechCorp Global’ and ‘PowerGen Energy’ plus everything in between.
Comparison:
# Python list slicing (EXCLUSIVE end)
my_list = ['a', 'b', 'c', 'd', 'e']
print(f"my_list[1:3] = {my_list[1:3]}")
# Output: ['b', 'c'] - 'd' NOT included
# .loc[] slicing (INCLUSIVE end)
# companies.loc['FreshMart Inc':'FirstBank Holdings']
# 'FirstBank Holdings' IS included!Visual representation:
Row Labels:
TechCorp Global ← Start (included)
FreshMart Inc
AutoDrive Motors
FirstBank Holdings
PowerGen Energy ← End (included)
MediPharm Solutions
...
companies.loc['TechCorp Global':'PowerGen Energy'] gets all 5 rowsSelecting Columns
Use : for “all rows” and specify column names:
Single Column Selection
# All rows, single column (returns Series)
all_revenues = companies.loc[:, 'revenues']
print(f"Type: {type(all_revenues)}")
print(f"Shape: {all_revenues.shape}")
print(all_revenues)Output:
Type: <class 'pandas.core.series.Series'>
Shape: (10,)
company
TechCorp Global 125000
FreshMart Inc 89000
AutoDrive Motors 156000
...Shorthand:
# These are equivalent:
all_revenues_v1 = companies.loc[:, 'revenues']
all_revenues_v2 = companies['revenues']
# Both give the same result
print(all_revenues_v1.equals(all_revenues_v2))
# Output: TrueMultiple Column Selection
# All rows, multiple columns (returns DataFrame)
financials = companies.loc[:, ['revenues', 'profits']]
print(f"Type: {type(financials)}")
print(f"Shape: {financials.shape}")
print(financials.head())Output:
Type: <class 'pandas.core.frame.DataFrame'>
Shape: (10, 2)
revenues profits
company
TechCorp Global 125000 12000
FreshMart Inc 89000 8500
AutoDrive Motors 156000 -3000
...Column Slicing
Slice columns (inclusive on both ends):
# Slice columns from 'sector' to 'profits' (INCLUSIVE!)
slice_cols = companies.loc[:, 'sector':'profits']
print(f"Shape: {slice_cols.shape}")
print(slice_cols.head())Output:
Shape: (10, 3)
sector revenues profits
company
TechCorp Global Technology 125000 12000
FreshMart Inc Food 89000 8500
...Non-Consecutive Columns
# Select specific columns (not adjacent)
selected_cols = companies.loc[:, ['sector', 'profits', 'country']]
print(selected_cols.head())Output:
sector profits country
company
TechCorp Global Technology 12000 USA
FreshMart Inc Food 8500 USA
...Combining Row and Column Selections
The real power of .loc[] comes from combining row and column selections:
Specific Rows and Columns
# Select specific rows and specific columns
subset = companies.loc[
['TechCorp Global', 'FreshMart Inc', 'PowerGen Energy'], # These rows
['revenues', 'profits', 'employees'] # These columns
]
print(f"Shape: {subset.shape}")
print(subset)Output:
Shape: (3, 3)
revenues profits employees
company
TechCorp Global 125000 12000 1200
FreshMart Inc 89000 8500 890
PowerGen Energy 178000 23000 3400Row Slice with Column Slice
# Row slice + column slice
subset = companies.loc[
'TechCorp Global':'AutoDrive Motors', # First 3 companies
'sector':'profits' # First 3 columns
]
print(subset)Output:
sector revenues profits
company
TechCorp Global Technology 125000 12000
FreshMart Inc Food 89000 8500
AutoDrive Motors Automotive 156000 -3000Row List with Column Slice
# Specific rows + column slice
subset = companies.loc[
['FirstBank Holdings', 'PowerGen Energy', 'NetLink Telecom'],
'revenues':'employees'
]
print(subset)Output:
revenues profits employees
company
FirstBank Holdings 234000 45000 5600
PowerGen Energy 178000 23000 3400
NetLink Telecom 165000 28000 6700Selection Syntax Summary
Here is a complete reference for .loc[] syntax:
# Single element
df.loc[row_label, col_label]
# Single row, all columns
df.loc[row_label]
# Multiple rows, all columns
df.loc[[row1, row2, row3]]
# Row slice, all columns (INCLUSIVE!)
df.loc[start_row:end_row]
# All rows, single column
df.loc[:, col_label]
# All rows, multiple columns
df.loc[:, [col1, col2, col3]]
# All rows, column slice (INCLUSIVE!)
df.loc[:, start_col:end_col]
# Specific rows and columns
df.loc[[row1, row2], [col1, col2]]
# Row slice and column slice
df.loc[start_row:end_row, start_col:end_col]Visual guide:
DataFrame:
Col1 Col2 Col3 Col4
Row1 a b c d
Row2 e f g h
Row3 i j k l
df.loc['Row1', 'Col2'] → b (single value)
df.loc['Row1'] → [a, b, c, d] (Series)
df.loc[['Row1', 'Row3']] → Rows 1 and 3 (DataFrame)
df.loc['Row1':'Row2'] → Rows 1 and 2 (inclusive!)
df.loc[:, 'Col2'] → [b, f, j] (Series)
df.loc[:, ['Col1', 'Col3']] → Columns 1 and 3 (DataFrame)
df.loc['Row1':'Row2', 'Col2':'Col3'] → 2x2 subset (DataFrame)Practice Exercises
Apply label-based selection with these exercises.
Exercise 1: Single Selections
- Get the number of employees at ‘MediPharm Solutions’
- Get the sector of ‘NetLink Telecom’
- Get all information for ‘SteelCore Industries’
# Your code hereExercise 2: Multiple Row Selection
- Get data for both technology companies: ‘TechCorp Global’ and ‘NetLink Telecom’
- Get data for companies from ‘PowerGen Energy’ to ‘RetailHub Ltd’ (using slice)
- How many companies are in the slice?
# Your code hereHint
Remember that slicing with .loc[] includes both endpoints!
Exercise 3: Column Selection
- Get the ‘profits’ column for all companies
- Get ‘sector’ and ‘country’ columns for all companies
- Get columns from ‘revenues’ to ’employees’ (slice)
# Your code hereExercise 4: Combined Selection
- Get ‘revenues’ and ‘profits’ for ‘FreshMart Inc’ and ‘FirstBank Holdings’
- Get all financial columns (revenues, profits, employees) for UK companies
- Challenge: Get ‘sector’ and ‘country’ for companies from ‘MediPharm Solutions’ to ‘NetLink Telecom’
# Your code hereSummary
You now master label-based selection with .loc[]. Let’s review the key concepts.
Key Concepts
.loc[] Uses Labels
- Row labels (index) and column names
- Not numeric positions
- Makes code self-documenting and readable
Slicing is Inclusive
df.loc['start':'end']includes both start AND end- Different from Python slicing (which excludes end)
- Different from NumPy indexing
Return Types
- Single element → scalar value
- Single row or column → Series
- Multiple rows or columns → DataFrame
Flexibility
- Select rows only, columns only, or both
- Use lists for specific items
- Use slices for ranges
- Combine any row selection with any column selection
Syntax Reference
# Single element
df.loc[row_label, col_label]
# Rows only
df.loc[row_label] # Single row
df.loc[[row1, row2]] # Multiple rows
df.loc[start:end] # Row slice (INCLUSIVE!)
# Columns only
df.loc[:, col_label] # Single column
df.loc[:, [col1, col2]] # Multiple columns
df.loc[:, start:end] # Column slice (INCLUSIVE!)
# Rows AND columns
df.loc[[row1, row2], [col1, col2]] # Specific rows and columns
df.loc[row_start:row_end, col_start:col_end] # SlicesNumPy vs Pandas Comparison
| Feature | NumPy | Pandas .loc[] |
|---|---|---|
| Indexing | Position-based | Label-based |
| Example | arr[0, 5] | df.loc['Ali', 'revenue'] |
| Readable | No (what is column 5?) | Yes (clear meaning) |
| Robust | No (breaks if order changes) | Yes (uses names) |
| Slicing | Exclusive end | Inclusive end |
Important Reminders
- Labels, not positions: Use meaningful names
- Inclusive slicing: Both start and end included
- Use lists for multiple:
[row1, row2]or[col1, col2] :means all: All rows or all columns- Returns vary: Scalar, Series, or DataFrame depending on selection
Next Steps
You can now select data precisely using labels. In the next lesson, you will learn .iloc[] for position-based selection—useful when you need to select by location rather than name.
Continue to Lesson 4 - Selecting with .iloc[]
Learn position-based selection using integer indices
Back to Lesson 2 - DataFrames and Reading Data
Review loading data from CSV, JSON, and Excel files
Master Label-Based Selection
Label-based selection with .loc[] is the pandas way of accessing data. It makes your code readable, maintainable, and robust to changes in data order.
Use .loc[] as your default selection method. In the next lesson, you will learn .iloc[] for those situations where position-based selection makes more sense!