Lesson 4 - Selecting Data with .iloc[]
Position-Based Selection
You learned .loc[] for label-based selection. Now you will learn .iloc[]—position-based selection using integer indices, just like NumPy arrays.
By the end of this lesson, you will be able to:
- Select data by integer position using .iloc[]
- Use negative indices to count from the end
- Understand the difference between .loc[] and .iloc[]
- Choose the right selection method for your task
- Apply NumPy-style indexing to DataFrames
Position-based selection is useful when you need to work with data by location rather than by name. Let’s explore when and how to use it.
Understanding .iloc[]
The .iloc[] accessor uses integer positions (0-based indexing) to select data. Single positions and slices behave the same way as they do in NumPy arrays and Python lists.
Syntax:
df.iloc[row_position, column_position]
Key Comparison: .loc[] vs .iloc[]
| Feature | .loc[] | .iloc[] |
|---|---|---|
| Uses | Labels (names) | Positions (0, 1, 2, …) |
| Slicing | Inclusive both ends | Exclusive end (like Python) |
| Example | df.loc['TechCorp Global', 'revenues'] | df.iloc[0, 1] |
| Negative Index | Not positional (treated as a label) | Supported (-1 = last) |
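The rows of this table can be checked on a small frame. The sketch below uses a three-row DataFrame made up for illustration; the name `demo` and its values are assumptions and not part of the lesson's dataset.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the comparison table
demo = pd.DataFrame(
    {"revenues": [100, 200, 300], "profits": [10, 20, 30]},
    index=["A", "B", "C"],
)

# .loc[] slices by label and includes both endpoints
by_label = demo.loc["A":"B"]

# .iloc[] slices by position and excludes the end position
by_position = demo.iloc[0:1]

print(len(by_label))     # 2 rows: "A" and "B"
print(len(by_position))  # 1 row: position 0 only

# -1 is a position for .iloc[], but .loc[] looks it up as a label
print(demo.iloc[-1, 0])  # 300
try:
    demo.loc[-1]
except KeyError:
    print("KeyError: -1 is not a label in this index")
```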
Let’s create our sample dataset:
import pandas as pd
import numpy as np
# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK']
})
# Set company as index
companies = companies.set_index('company')
print(f"Shape: {companies.shape}")
companies.head()
This creates a DataFrame with 10 companies and 5 columns.
Selecting Rows by Position
Select single or multiple rows using integer positions:
Single Row Selection
# Select first row (position 0)
first_row = companies.iloc[0]
print("First row (position 0):")
print(first_row)
print(f"Type: {type(first_row)}")
Output:
First row (position 0):
sector       Technology
revenues         125000
profits           12000
employees          1200
country             USA
Name: TechCorp Global, dtype: object
Type: <class 'pandas.core.series.Series'>
The first row is at position 0, and it returns a Series.
Negative Indexing
# Select last row (position -1, like Python lists)
last_row = companies.iloc[-1]
print("Last row:")
print(last_row)
Output:
Last row:
sector       Technology
revenues         165000
profits           28000
employees          6700
country              UK
Name: NetLink Telecom, dtype: object
Negative indices count from the end: -1 is the last row, -2 is second-to-last, etc.
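As a small sketch of how negative positions map onto ordinary ones, and what happens outside the valid range, consider the following. The four-row frame `demo` is a made-up example, not the lesson's dataset.

```python
import pandas as pd

# Hypothetical four-row frame for illustration
demo = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Position -2 refers to the same row as position len(demo) - 2
second_to_last = demo.iloc[-2]
print(second_to_last.name)  # c

# A single position outside the valid range (-4 to 3 here) raises IndexError
try:
    demo.iloc[-5]
except IndexError as err:
    print(f"IndexError: {err}")
```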
Row Slicing
Important Difference
Slicing with .iloc[] is exclusive on the end, just like Python slicing. This is different from .loc[]!
# Select first 3 rows (positions 0, 1, 2)
first_three = companies.iloc[0:3] # EXCLUSIVE end!
print(f"Shape: {first_three.shape}")
first_three
This gets rows at positions 0, 1, and 2 (3 rows total).
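One consequence of Python-style slicing is worth seeing in code: a slice that runs past the end of the data does not raise an error. The sketch below assumes a made-up ten-row frame named `demo`.

```python
import pandas as pd

# Hypothetical ten-row frame for illustration
demo = pd.DataFrame({"value": range(10)})

# A slice end beyond the last row returns whatever rows exist
tail_slice = demo.iloc[8:20]
print(tail_slice.shape)  # (2, 1)

# When both ends are in range, iloc[start:stop] has stop - start rows
middle = demo.iloc[3:7]
print(len(middle))  # 4
```

This differs from a single out-of-range position such as `demo.iloc[20]`, which raises an IndexError.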
Comparison of Slicing:
# .iloc[] slicing (EXCLUSIVE end)
print("iloc[0:3] gives rows at positions 0, 1, 2")
print(f"Result: {companies.iloc[0:3].shape[0]} rows")
# .loc[] slicing (INCLUSIVE end)
print("\nloc['TechCorp Global':'AutoDrive Motors'] includes both endpoints")
print(f"Result: {companies.loc['TechCorp Global':'AutoDrive Motors'].shape[0]} rows")
Output:
iloc[0:3] gives rows at positions 0, 1, 2
Result: 3 rows

loc['TechCorp Global':'AutoDrive Motors'] includes both endpoints
Result: 3 rows
Visual representation:
Position  Row Label
0         TechCorp Global      ← iloc[0:3] starts here
1         FreshMart Inc
2         AutoDrive Motors     ← iloc[0:3] stops BEFORE position 3
3         FirstBank Holdings   ← Not included!
4         PowerGen Energy
...
Selecting Specific Rows
# Select rows at positions 0, 3, 7
selected_rows = companies.iloc[[0, 3, 7]]
print("Rows at positions 0, 3, 7:")
selected_rows
Pass a list of positions to select non-consecutive rows. The outer brackets belong to .iloc[]; the inner brackets are the Python list.
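The brackets also decide what type comes back. A minimal sketch, using a made-up three-row frame named `demo`:

```python
import pandas as pd

# Hypothetical three-row frame for illustration
demo = pd.DataFrame(
    {"sector": ["Tech", "Food", "Auto"], "revenues": [100, 200, 300]},
    index=["A", "B", "C"],
)

as_series = demo.iloc[0]    # single position -> Series
as_frame = demo.iloc[[0]]   # one-element list -> DataFrame with one row

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 2)

# The list also sets the order of the rows in the result
reordered = demo.iloc[[2, 0]]
print(list(reordered.index))     # ['C', 'A']
```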
Selecting Rows and Columns Together
Just like NumPy: df.iloc[rows, columns]
Single Element
# Select row 0, column 2
value = companies.iloc[0, 2]
print(f"Value at position [0, 2]: ${value:,}")
print(f"This is '{companies.columns[2]}' for '{companies.index[0]}'")
Output:
Value at position [0, 2]: $12,000
This is 'profits' for 'TechCorp Global'
You need to know that column 2 is 'profits'. The code is not self-documenting!
Row and Column Slices
# First 3 rows, first 2 columns
subset = companies.iloc[0:3, 0:2]
print(f"Shape: {subset.shape}")
subset
Output:
Shape: (3, 2)
                      sector  revenues
company
TechCorp Global   Technology    125000
FreshMart Inc           Food     89000
AutoDrive Motors  Automotive    156000
Selecting Specific Columns
# All rows, columns at positions 1 and 2
revenues_profits = companies.iloc[:, [1, 2]]
print(f"Columns at positions 1 and 2: {list(companies.columns[[1, 2]])}")
revenues_profits.head()
The : means “all rows”, and [1, 2] selects columns at positions 1 and 2.
Last N Rows
# Last 5 rows, first 3 columns
last_five = companies.iloc[-5:, :3]
print(f"Shape: {last_five.shape}")
last_five
Negative indices work great for “last N” selections.
Step Slicing
# Every other row
every_other = companies.iloc[::2, :]
print(f"Every other row (step=2): {every_other.shape[0]} rows")
every_other
Output shows rows at positions 0, 2, 4, 6, 8 (every other row).
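The step can be combined with a start, a stop, or a negative value, following the usual start:stop:step rules of Python slices. The sketch below assumes a made-up ten-row frame named `demo` with the default 0 to 9 index, so the printed labels equal the positions.

```python
import pandas as pd

# Hypothetical ten-row frame; index labels 0..9 match the positions
demo = pd.DataFrame({"value": range(10)})

# Start at position 1, then take every third row
print(list(demo.iloc[1::3].index))    # [1, 4, 7]

# A negative step walks backwards; ::-1 reverses the row order
print(list(demo.iloc[::-1].index)[:3])  # [9, 8, 7]

# From position 8 down to (but not including) position 2, in steps of 2
print(list(demo.iloc[8:2:-2].index))  # [8, 6, 4]
```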
Specific Rows and Columns
# Rows 1, 3, 5 and columns 0, 2, 4
subset2 = companies.iloc[[1, 3, 5], [0, 2, 4]]
print("Selected rows: 1, 3, 5")
print(f"Selected columns: {list(companies.columns[[0, 2, 4]])}")
subset2
Combine lists of positions for both dimensions.
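Here .iloc[] and NumPy part ways, which matters if you bring NumPy habits to pandas. With two lists, .iloc[] returns every listed row crossed with every listed column, while a plain NumPy array pairs the lists element by element. A small sketch on a made-up 3x4 array (the names `arr` and `demo` are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 array holding the values 0..11
arr = np.arange(12).reshape(3, 4)
demo = pd.DataFrame(arr, columns=["c0", "c1", "c2", "c3"])

# pandas: rows 0 and 2 crossed with columns 1 and 3 -> 2x2 DataFrame
block = demo.iloc[[0, 2], [1, 3]]
print(block.shape)  # (2, 2)

# NumPy: the lists are paired, giving arr[0, 1] and arr[2, 3]
paired = arr[[0, 2], [1, 3]]
print(paired)  # [ 1 11]

# np.ix_ builds the NumPy equivalent of the pandas selection
print(arr[np.ix_([0, 2], [1, 3])].shape)  # (2, 2)
```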
When to Use .loc[] vs .iloc[]
Choose the right tool for the job:
Use .loc[] When:
- You know the label (row name, column name)
- You want readable, maintainable code
- Working with business logic (“get data for TechCorp Global”)
- The selection has semantic meaning
Example:
# Get data for specific company (label-based makes sense)
techcorp_data = companies.loc['TechCorp Global']
print("TechCorp Global data (using .loc):")
print(techcorp_data)
This is self-documenting and clear.
Use .iloc[] When:
- You know the position (row 5, column 2)
- You need first/last N rows
- You’re iterating with indices
- Working with generic operations (sampling, every Nth row)
- Position-based logic is more natural
Example:
# Get first 5 companies (position-based makes sense)
first_5 = companies.iloc[:5]
print("First 5 companies:")
print(first_5)
For “first 5 rows”, position-based selection is natural.
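For the common "first N" and "last N" cases, pandas also offers head() and tail(), which select the same rows as the matching .iloc[] slices. A quick check on a made-up ten-row frame named `demo`:

```python
import pandas as pd

# Hypothetical ten-row frame for illustration
demo = pd.DataFrame({"value": range(10)})

first_rows = demo.iloc[:5]
last_rows = demo.iloc[-3:]

# head(n) and tail(n) return the same rows as the slices above
print(first_rows.equals(demo.head(5)))  # True
print(last_rows.equals(demo.tail(3)))   # True
```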
Side-by-Side Comparison
# Both can achieve the same result!
# Method 1: .iloc (position-based)
revenue_iloc = companies.iloc[0, 1] # Must know column position!
# Method 2: .loc (label-based) - MORE READABLE!
revenue_loc = companies.loc['TechCorp Global', 'revenues']
print(f"Using .iloc[0, 1]: ${revenue_iloc:,}")
print(f"Using .loc['TechCorp Global', 'revenues']: ${revenue_loc:,}")
print(f"\nSame result? {revenue_iloc == revenue_loc}")
print("\nBUT .loc is more readable and maintainable!")
Both work, but .loc[] is clearer.
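When position-based code is unavoidable, the positions can be looked up from the labels instead of counted by hand. The sketch below uses `Index.get_loc` on a made-up two-row frame named `demo`. It assumes the labels are unique; with duplicate labels, `get_loc` can return a slice or a boolean mask instead of an integer.

```python
import pandas as pd

# Hypothetical two-row frame for illustration
demo = pd.DataFrame(
    {"sector": ["Tech", "Food"], "revenues": [100, 200], "profits": [10, 20]},
    index=["A", "B"],
)

# Translate labels into integer positions
row_pos = demo.index.get_loc("B")
col_pos = demo.columns.get_loc("revenues")
print(row_pos, col_pos)  # 1 1

# The position-based lookup now matches the label-based one
print(demo.iloc[row_pos, col_pos])  # 200
print(demo.loc["B", "revenues"])    # 200
```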
Real-World Scenario
Imagine your boss asks: “What are the revenues for PowerGen Energy?”
Bad approach (requires manual counting):
# You count manually: PowerGen Energy is row 4, revenues is column 1
answer = companies.iloc[4, 1] # What if data order changes?
Good approach (clear and robust):
answer = companies.loc['PowerGen Energy', 'revenues'] # Self-documenting!
The .loc[] version is:
- Readable: Anyone knows what you’re selecting
- Robust: Works even if row order changes
- Maintainable: Easy to understand months later
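The robustness point can be demonstrated by sorting. The sketch below uses a made-up three-company frame with shortened names (`demo` and its values are illustrative, not the lesson's dataset).

```python
import pandas as pd

# Hypothetical three-company frame with shortened names
demo = pd.DataFrame(
    {"revenues": [125, 89, 178]},
    index=["TechCorp", "FreshMart", "PowerGen"],
)

# Sorting changes which row sits at each position
by_revenue = demo.sort_values("revenues")

print(demo.iloc[0, 0])        # 125 (TechCorp is first)
print(by_revenue.iloc[0, 0])  # 89  (FreshMart is first after sorting)

# The label lookup returns the same value before and after sorting
print(demo.loc["TechCorp", "revenues"])        # 125
print(by_revenue.loc["TechCorp", "revenues"])  # 125
```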
When .iloc[] Makes Sense
# Sample 3 random companies (position-based makes sense)
sample_indices = np.random.choice(companies.shape[0], size=3, replace=False)
random_sample = companies.iloc[sample_indices]
print(f"Random sample of 3 companies (positions {sample_indices}):")
random_sample
For random sampling or algorithmic selection, .iloc[] is appropriate.
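Two variations on the sampling idea may be useful: seeding the random generator so the sample is reproducible, and using pandas' built-in `DataFrame.sample`. The sketch assumes a made-up ten-row frame named `demo`; the seed value 42 is arbitrary. The two approaches are not expected to pick the same rows as each other.

```python
import numpy as np
import pandas as pd

# Hypothetical ten-row frame for illustration
demo = pd.DataFrame({"value": range(10)})

# A seeded generator draws the same positions on every run
rng = np.random.default_rng(seed=42)
positions = rng.choice(len(demo), size=3, replace=False)
sample_iloc = demo.iloc[positions]

# DataFrame.sample does the same job without explicit positions
sample_builtin = demo.sample(n=3, random_state=42)

print(sample_iloc.shape, sample_builtin.shape)  # (3, 1) (3, 1)
```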
Selection Syntax Summary
Complete reference for .iloc[]:
# Single row
df.iloc[0] # First row
df.iloc[-1] # Last row
# Multiple rows
df.iloc[[0, 2, 4]] # Specific rows
df.iloc[0:5] # First 5 rows (0,1,2,3,4)
df.iloc[-3:] # Last 3 rows
df.iloc[::2] # Every other row
# Single element
df.iloc[0, 2] # Row 0, column 2
# Rows and columns together
df.iloc[:3, :2] # First 3 rows, first 2 columns
df.iloc[[0,1], [2,3]] # Specific rows and columns
df.iloc[-5:, -3:] # Last 5 rows, last 3 columns
df.iloc[::2, 1:4] # Every other row, columns 1-3
# All rows or all columns
df.iloc[:, 0] # All rows, first column
df.iloc[0, :] # First row, all columns
Visual guide:
DataFrame positions:
Col0 Col1 Col2 Col3
Row0 a b c d
Row1 e f g h
Row2 i j k l
df.iloc[0, 1] → b (single value)
df.iloc[0] → [a, b, c, d] (Series)
df.iloc[[0, 2]] → Rows 0 and 2 (DataFrame)
df.iloc[0:2] → Rows 0 and 1 (exclusive!)
df.iloc[:, 1] → [b, f, j] (Series)
df.iloc[:, [0, 2]] → Columns 0 and 2 (DataFrame)
df.iloc[0:2, 1:3] → 2x2 subset (DataFrame)
Practice Exercises
Apply position-based selection with these exercises.
Exercise 1: Basic .iloc[] Operations
- Get the second row
- Get rows 3, 4, 5 using a slice
- Get the last 3 rows
# Your code here
Hint
Remember: .iloc[] slicing is exclusive on the end, just like Python!
Exercise 2: Rows and Columns Together
- Get the value at row 2, column 3
- Get first 4 rows and first 2 columns
- Get all rows, columns 1, 3, and 4
# Your code here
Exercise 3: .loc[] vs .iloc[] Comparison
For the company ‘FirstBank Holdings’:
- Get its profits using .loc[]
- Get its profits using .iloc[] (find its position first)
- Which method is easier and why?
# Your code here
Exercise 4: Advanced Selection
- Get every 3rd row (rows 0, 3, 6, 9)
- Get the middle 4 rows
- Get last 2 rows and last 2 columns
# Your code here
Summary
You now understand position-based selection with .iloc[]. Let’s review the key concepts.
Key Concepts
.iloc[] Uses Integer Positions
- 0-based indexing like NumPy and Python
- Position 0 is first row/column
- Position -1 is last row/column
Slicing is Exclusive on End
- df.iloc[0:3] gets positions 0, 1, 2 (NOT 3)
- Different from .loc[], which is inclusive
- Same as Python list slicing
Negative Indices Work
- -1 is last row/column
- -2 is second-to-last
- df.iloc[-5:] gets last 5 rows
Choose the Right Tool
- .loc[] for labels → readable, maintainable
- .iloc[] for positions → generic operations
Syntax Reference
# Rows
df.iloc[0] # First row
df.iloc[-1] # Last row
df.iloc[0:5] # First 5 rows (0-4)
df.iloc[[0,2,4]] # Specific rows
df.iloc[::2] # Every other row
# Rows and columns
df.iloc[0, 2] # Single value
df.iloc[:3, :2] # First 3 rows, first 2 cols
df.iloc[[0,1], [2,3]] # Specific rows and cols
df.iloc[-5:, -3:] # Last 5 rows, last 3 cols
.loc[] vs .iloc[] Decision Guide
| Situation | Use .loc[] | Use .iloc[] |
|---|---|---|
| Know the name | ✅ Yes | ❌ No |
| Know the position | ❌ No | ✅ Yes |
| First/last N rows | ❌ No | ✅ Yes |
| Specific company/ID | ✅ Yes | ❌ No |
| Random sampling | ❌ No | ✅ Yes |
| Business logic | ✅ Yes | ❌ No |
| Generic operations | ❌ No | ✅ Yes |
Important Reminders
- Positions, not labels: Use integers (0, 1, 2, …)
- Exclusive slicing: End position not included in slice
- Negative indexing: Count from the end with -1, -2, etc.
- Use lists for multiple: [0, 2, 4] for specific positions
- Prefer .loc[]: More readable for production code
- Use .iloc[]: For truly position-based operations
Next Steps
You can now select data using both labels and positions. In the next lesson, you will learn Series operations and value counting—essential for analyzing categorical data.
Master Both Selection Methods
Label-based selection with .loc[] makes code readable. Position-based selection with .iloc[] handles generic operations. Knowing both makes you a versatile pandas user.
Prefer .loc[] when possible—it’s more maintainable. Use .iloc[] when position-based logic makes sense!