Lesson 4 - Selecting Data with .iloc[]

Position-Based Selection

You learned .loc[] for label-based selection. Now you will learn .iloc[]—position-based selection using integer indices, just like NumPy arrays.

By the end of this lesson, you will be able to:

  • Select data by integer position using .iloc[]
  • Use negative indices to count from the end
  • Understand the difference between .loc[] and .iloc[]
  • Choose the right selection method for your task
  • Apply NumPy-style indexing to DataFrames

Position-based selection is useful when you need to work with data by location rather than by name. Let’s explore when and how to use it.


Understanding .iloc[]

The .iloc[] accessor uses integer positions (0-based indexing) to select data. Single integers and slices behave the same way as they do for NumPy arrays and Python lists.

Syntax:

df.iloc[row_position, column_position]

Key Comparison: .loc[] vs .iloc[]

Feature         .loc[]                                   .iloc[]
Uses            Labels (names)                           Positions (0, 1, 2, …)
Slicing         Inclusive on both ends                   Exclusive end (like Python)
Example         df.loc['TechCorp Global', 'revenues']    df.iloc[0, 1]
Negative index  Treated as a label, not a position       Supported (-1 = last)

Let’s create our sample dataset:

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK']
})

# Set company as index
companies = companies.set_index('company')

print(f"Shape: {companies.shape}")
companies.head()

This creates a DataFrame with 10 companies and 5 columns.


Selecting Rows by Position

Select single or multiple rows using integer positions:

Single Row Selection

# Select first row (position 0)
first_row = companies.iloc[0]

print("First row (position 0):")
print(first_row)
print(f"Type: {type(first_row)}")

Output:

First row (position 0):
sector       Technology
revenues         125000
profits           12000
employees          1200
country             USA
Name: TechCorp Global, dtype: object
Type: <class 'pandas.core.series.Series'>

The first row is at position 0, and it returns a Series.
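
The return type depends on how the position is written. The sketch below rebuilds a three-row stand-in for the companies table (an assumption, so the snippet runs on its own) and compares a bare integer with a one-element list.

```python
import pandas as pd

# Three-row stand-in for the lesson's companies table (illustrative only)
companies = pd.DataFrame(
    {
        "sector": ["Technology", "Food", "Automotive"],
        "revenues": [125000, 89000, 156000],
    },
    index=pd.Index(
        ["TechCorp Global", "FreshMart Inc", "AutoDrive Motors"], name="company"
    ),
)

as_series = companies.iloc[0]    # bare integer -> Series (column names become the index)
as_frame = companies.iloc[[0]]   # one-element list -> DataFrame with a single row

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 2)
```

Use the list form when later code expects a DataFrame even for a single row.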

Negative Indexing

# Select last row (position -1, like Python lists)
last_row = companies.iloc[-1]

print("Last row:")
print(last_row)

Output:

Last row:
sector       Technology
revenues         165000
profits           28000
employees          6700
country              UK
Name: NetLink Telecom, dtype: object

Negative indices count from the end: -1 is the last row, -2 is second-to-last, etc.
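
Positions outside the table behave differently for single positions and slices. This is a minimal sketch on a made-up three-row DataFrame: a single out-of-range position raises IndexError, while a slice that runs past the end is simply truncated.

```python
import pandas as pd

# Made-up three-row table, used only to show boundary behaviour
df = pd.DataFrame({"revenues": [125000, 89000, 156000]})

second_to_last = df.iloc[-2]           # row at position 1
print(second_to_last["revenues"])      # 89000

# A single position outside the valid range raises IndexError
try:
    df.iloc[10]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
print(out_of_bounds_raised)            # True

# A slice that runs past the end is cut off at the last row instead
clipped = df.iloc[1:10]
print(len(clipped))                    # 2
```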

Row Slicing

Important Difference

Slicing with .iloc[] is exclusive on the end, just like Python slicing. This is different from .loc[]!

# Select first 3 rows (positions 0, 1, 2)
first_three = companies.iloc[0:3]  # EXCLUSIVE end!

print(f"Shape: {first_three.shape}")
first_three

This gets rows at positions 0, 1, and 2 (3 rows total).

Comparison of Slicing:

# .iloc[] slicing (EXCLUSIVE end)
print("iloc[0:3] gives rows at positions 0, 1, 2")
print(f"Result: {companies.iloc[0:3].shape[0]} rows")

# .loc[] slicing (INCLUSIVE end)
print("\nloc['TechCorp Global':'AutoDrive Motors'] includes both endpoints")
print(f"Result: {companies.loc['TechCorp Global':'AutoDrive Motors'].shape[0]} rows")

Output:

iloc[0:3] gives rows at positions 0, 1, 2
Result: 3 rows

loc['TechCorp Global':'AutoDrive Motors'] includes both endpoints
Result: 3 rows

Visual representation:

Position  Row Label
   0      TechCorp Global       ← iloc[0:3] starts here
   1      FreshMart Inc
   2      AutoDrive Motors      ← last row included by iloc[0:3]
   3      FirstBank Holdings    ← not included (the slice stops before position 3)
   4      PowerGen Energy
   ...
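
The label/position distinction matters most when the index labels are themselves integers. The sketch below uses a hypothetical orders table whose labels (10, 20, 30) do not match the positions (0, 1, 2).

```python
import pandas as pd

# Hypothetical table: integer labels that differ from the row positions
orders = pd.DataFrame({"amount": [250, 400, 130]}, index=[10, 20, 30])

by_label = orders.loc[10, "amount"]     # label 10
by_position = orders.iloc[0, 0]         # position 0, the same cell
print(by_label, by_position)            # 250 250

# Label slices include the end label; position slices exclude the end position
label_slice = orders.loc[10:20]
position_slice = orders.iloc[0:1]
print(len(label_slice))                 # 2 (labels 10 and 20)
print(len(position_slice))              # 1 (position 0 only)
```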

Selecting Specific Rows

# Select rows at positions 0, 3, 7
selected_rows = companies.iloc[[0, 3, 7]]

print("Rows at positions 0, 3, 7:")
selected_rows

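Pass a list of positions to select non-consecutive rows. The outer brackets belong to .iloc[] and the inner brackets are the list, which is why the call ends up with double brackets.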
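
A list of positions does not have to be sorted or unique. The sketch below, on a four-row stand-in table (an assumption for illustration), shows that rows come back in the order listed and that a position may be repeated.

```python
import pandas as pd

# Four-row stand-in for the companies table (illustrative only)
df = pd.DataFrame(
    {"revenues": [125000, 89000, 156000, 234000]},
    index=["TechCorp Global", "FreshMart Inc", "AutoDrive Motors", "FirstBank Holdings"],
)

reordered = df.iloc[[3, 0]]          # rows are returned in the order listed
print(reordered.index.tolist())      # ['FirstBank Holdings', 'TechCorp Global']

repeated = df.iloc[[1, 1]]           # the same position can appear twice
print(len(repeated))                 # 2
```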


Selecting Rows and Columns Together

As with a 2-D NumPy array, the row selection comes first and the column selection second: df.iloc[rows, columns]

Single Element

# Select row 0, column 2
value = companies.iloc[0, 2]

print(f"Value at position [0, 2]: ${value:,}")
print(f"This is '{companies.columns[2]}' for '{companies.index[0]}'")

Output:

Value at position [0, 2]: $12,000
This is 'profits' for 'TechCorp Global'

You have to know that column position 2 is ‘profits’; the code itself does not say so.
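
When a position is needed but only the name is known, the position can be looked up instead of counted by hand. This sketch uses Index.get_loc on a two-row stand-in table (an assumption for illustration); it returns an integer position as long as the label is unique.

```python
import pandas as pd

# Two-row stand-in for the companies table (illustrative only)
companies = pd.DataFrame(
    {
        "sector": ["Technology", "Food"],
        "revenues": [125000, 89000],
        "profits": [12000, 8500],
    },
    index=["TechCorp Global", "FreshMart Inc"],
)

profits_pos = companies.columns.get_loc("profits")    # column position by name
row_pos = companies.index.get_loc("FreshMart Inc")    # row position by label
value = companies.iloc[row_pos, profits_pos]

print(profits_pos, row_pos, value)                    # 2 1 8500
```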

Row and Column Slices

# First 3 rows, first 2 columns
subset = companies.iloc[0:3, 0:2]

print(f"Shape: {subset.shape}")
subset

Output:

Shape: (3, 2)
                      sector  revenues
company
TechCorp Global   Technology    125000
FreshMart Inc           Food     89000
AutoDrive Motors  Automotive    156000

Selecting Specific Columns

# All rows, columns at positions 1 and 2
revenues_profits = companies.iloc[:, [1, 2]]

print(f"Columns at positions 1 and 2: {list(companies.columns[[1, 2]])}")
revenues_profits.head()

The : means “all rows”, and [1, 2] selects columns at positions 1 and 2.

Last N Rows

# Last 5 rows, first 3 columns
last_five = companies.iloc[-5:, :3]

print(f"Shape: {last_five.shape}")
last_five

Negative indices work great for “last N” selections.

Step Slicing

# Every other row
every_other = companies.iloc[::2, :]

print(f"Every other row (step=2): {every_other.shape[0]} rows")
every_other

Output shows rows at positions 0, 2, 4, 6, 8 (every other row).
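
The slice follows Python's start:stop:step form, so a start offset or a negative step also works. A small sketch on a made-up ten-row table whose single column holds its own row positions:

```python
import pandas as pd

# Made-up table: column "n" stores each row's position, so results are easy to read
df = pd.DataFrame({"n": range(10)})

odd_rows = df.iloc[1::2]["n"].tolist()       # start at position 1, step 2
every_third = df.iloc[::3]["n"].tolist()     # step 3 from the start
reversed_rows = df.iloc[::-1]["n"].tolist()  # negative step reverses the row order

print(odd_rows)           # [1, 3, 5, 7, 9]
print(every_third)        # [0, 3, 6, 9]
print(reversed_rows[:3])  # [9, 8, 7]
```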

Specific Rows and Columns

# Rows 1, 3, 5 and columns 0, 2, 4
subset2 = companies.iloc[[1, 3, 5], [0, 2, 4]]

print("Selected rows: 1, 3, 5")
print(f"Selected columns: {list(companies.columns[[0, 2, 4]])}")
subset2

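Combine lists of positions for both dimensions. Unlike NumPy fancy indexing, which pairs the two lists element by element, .iloc[] returns every listed row crossed with every listed column, so the result here is a 3×3 DataFrame.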
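
A quick check of how two lists behave in .iloc[] compared with the same lists applied to a plain NumPy array. The 6×5 table of consecutive integers is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up 6x5 table holding the integers 0..29
df = pd.DataFrame(np.arange(30).reshape(6, 5), columns=list("abcde"))

# pandas: every listed row combined with every listed column
block = df.iloc[[1, 3, 5], [0, 2, 4]]
print(block.shape)            # (3, 3)

# NumPy: the lists are paired element by element -> cells (1,0), (3,2), (5,4)
pointwise = df.to_numpy()[[1, 3, 5], [0, 2, 4]]
print(pointwise.shape)        # (3,)
print(pointwise.tolist())     # [5, 17, 29]
```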


When to Use .loc[] vs .iloc[]

Choose the right tool for the job:

Use .loc[] When:

  • You know the label (row name, column name)
  • You want readable, maintainable code
  • Working with business logic (“get data for TechCorp Global”)
  • The selection has semantic meaning

Example:

# Get data for specific company (label-based makes sense)
techcorp_data = companies.loc['TechCorp Global']
print("TechCorp Global data (using .loc):")
print(techcorp_data)

This is self-documenting and clear.

Use .iloc[] When:

  • You know the position (row 5, column 2)
  • You need first/last N rows
  • You’re iterating with indices
  • Working with generic operations (sampling, every Nth row)
  • Position-based logic is more natural

Example:

# Get first 5 companies in row order (position-based makes sense)
first_five = companies.iloc[:5]
print("First 5 companies:")
print(first_five)

For “first 5 rows”, position-based selection is natural.
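
For the common first-N and last-N cases, pandas also provides head() and tail(). The sketch below, on a made-up six-row table, checks that they return the same rows as the equivalent .iloc[] slices.

```python
import pandas as pd

# Made-up six-row table (illustrative only)
df = pd.DataFrame({"revenues": [125000, 89000, 156000, 234000, 178000, 98000]})

same_head = df.iloc[:5].equals(df.head(5))    # first 5 rows
same_tail = df.iloc[-2:].equals(df.tail(2))   # last 2 rows

print(same_head, same_tail)                   # True True
```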

Side-by-Side Comparison

# Both can achieve the same result!

# Method 1: .iloc (position-based)
revenue_iloc = companies.iloc[0, 1]  # Must know column position!

# Method 2: .loc (label-based) - MORE READABLE!
revenue_loc = companies.loc['TechCorp Global', 'revenues']

print(f"Using .iloc[0, 1]: ${revenue_iloc:,}")
print(f"Using .loc['TechCorp Global', 'revenues']: ${revenue_loc:,}")
print(f"\nSame result? {revenue_iloc == revenue_loc}")
print("\nBUT .loc is more readable and maintainable!")

Both work, but .loc[] is clearer.

Real-World Scenario

Imagine your boss asks: “What are the revenues for PowerGen Energy?”

Bad approach (requires manual counting):

# You count manually: PowerGen Energy is row 4, revenues is column 1
answer = companies.iloc[4, 1]  # What if data order changes?

Good approach (clear and robust):

answer = companies.loc['PowerGen Energy', 'revenues']  # Self-documenting!

The .loc[] version is:

  • Readable: Anyone knows what you’re selecting
  • Robust: Works even if row order changes
  • Maintainable: Easy to understand months later
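
The robustness point can be seen by reordering the rows. This sketch uses a five-row stand-in with only the revenues column (an assumption for illustration), sorts it, and repeats both lookups.

```python
import pandas as pd

# Five-row stand-in for the companies table, revenues only (illustrative)
companies = pd.DataFrame(
    {"revenues": [125000, 89000, 156000, 234000, 178000]},
    index=[
        "TechCorp Global", "FreshMart Inc", "AutoDrive Motors",
        "FirstBank Holdings", "PowerGen Energy",
    ],
)

by_revenue = companies.sort_values("revenues", ascending=False)

before = companies.iloc[4, 0]                               # PowerGen Energy
after = by_revenue.iloc[4, 0]                               # now FreshMart Inc
by_label = by_revenue.loc["PowerGen Energy", "revenues"]    # unaffected by the sort

print(before, after, by_label)                              # 178000 89000 178000
```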

When .iloc[] Makes Sense

# Sample 3 random companies (position-based makes sense)
sample_indices = np.random.choice(companies.shape[0], size=3, replace=False)
random_sample = companies.iloc[sample_indices]

print(f"Random sample of 3 companies (positions {sample_indices}):")
random_sample

For random sampling or algorithmic selection, .iloc[] is appropriate.
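
For reproducible samples, the positions can be drawn from a seeded generator, or the sampling can be left to DataFrame.sample(). The sketch below shows both on a stand-in table with only the revenues column; the seed value 42 is arbitrary, and the two approaches are not expected to pick the same rows.

```python
import numpy as np
import pandas as pd

# Stand-in for the ten-company table, revenues only (illustrative)
companies = pd.DataFrame(
    {"revenues": [125000, 89000, 156000, 234000, 178000,
                  98000, 112000, 187000, 145000, 165000]},
    index=[f"company_{i}" for i in range(10)],  # placeholder labels
)

# Option 1: draw positions with a seeded generator, then select with .iloc[]
rng = np.random.default_rng(seed=42)
positions = rng.choice(len(companies), size=3, replace=False)
sample_a = companies.iloc[positions]

# Option 2: let pandas do the sampling
sample_b = companies.sample(n=3, random_state=42)

print(len(sample_a), len(sample_b))   # 3 3
```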


Selection Syntax Summary

Complete reference for .iloc[]:

# Single row
df.iloc[0]                 # First row
df.iloc[-1]                # Last row

# Multiple rows
df.iloc[[0, 2, 4]]         # Specific rows
df.iloc[0:5]               # First 5 rows (0,1,2,3,4)
df.iloc[-3:]               # Last 3 rows
df.iloc[::2]               # Every other row

# Single element
df.iloc[0, 2]              # Row 0, column 2

# Rows and columns together
df.iloc[:3, :2]            # First 3 rows, first 2 columns
df.iloc[[0,1], [2,3]]      # Specific rows and columns
df.iloc[-5:, -3:]          # Last 5 rows, last 3 columns
df.iloc[::2, 1:4]          # Every other row, columns 1-3

# All rows or all columns
df.iloc[:, 0]              # All rows, first column
df.iloc[0, :]              # First row, all columns

Visual guide:

DataFrame positions:
         Col0  Col1  Col2  Col3
Row0      a     b     c     d
Row1      e     f     g     h
Row2      i     j     k     l

df.iloc[0, 1]           → b (single value)
df.iloc[0]              → [a, b, c, d] (Series)
df.iloc[[0, 2]]         → Rows 0 and 2 (DataFrame)
df.iloc[0:2]            → Rows 0 and 1 (exclusive!)
df.iloc[:, 1]           → [b, f, j] (Series)
df.iloc[:, [0, 2]]      → Columns 0 and 2 (DataFrame)
df.iloc[0:2, 1:3]       → 2x2 subset (DataFrame)
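
The visual guide can be reproduced by building the same 3×4 grid of letters and running a few of the selections:

```python
import pandas as pd

# The a..l grid from the visual guide
df = pd.DataFrame(
    [list("abcd"), list("efgh"), list("ijkl")],
    index=["Row0", "Row1", "Row2"],
    columns=["Col0", "Col1", "Col2", "Col3"],
)

single = df.iloc[0, 1]
first_row = df.iloc[0].tolist()
second_col = df.iloc[:, 1].tolist()
two_rows = df.iloc[0:2].index.tolist()
block = df.iloc[0:2, 1:3].to_numpy().tolist()

print(single)       # b
print(first_row)    # ['a', 'b', 'c', 'd']
print(second_col)   # ['b', 'f', 'j']
print(two_rows)     # ['Row0', 'Row1']
print(block)        # [['b', 'c'], ['f', 'g']]
```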

Practice Exercises

Apply position-based selection with these exercises.

Exercise 1: Basic .iloc[] Operations

  1. Get the second row
  2. Get the rows at positions 3, 4, and 5 using a slice
  3. Get the last 3 rows

# Your code here

Hint

Remember: .iloc[] slicing is exclusive on the end, just like Python!

Exercise 2: Rows and Columns Together

  1. Get the value at row position 2, column position 3
  2. Get the first 4 rows and first 2 columns
  3. Get all rows and the columns at positions 1, 3, and 4

# Your code here

Exercise 3: .loc[] vs .iloc[] Comparison

For the company ‘FirstBank Holdings’:

  1. Get its profits using .loc[]
  2. Get its profits using .iloc[] (find its position first)
  3. Which method is easier and why?

# Your code here

Exercise 4: Advanced Selection

  1. Get every 3rd row (positions 0, 3, 6, 9)
  2. Get the middle 4 rows (positions 3 through 6)
  3. Get the last 2 rows and last 2 columns

# Your code here

Summary

You now understand position-based selection with .iloc[]. Let’s review the key concepts.

Key Concepts

.iloc[] Uses Integer Positions

  • 0-based indexing like NumPy and Python
  • Position 0 is first row/column
  • Position -1 is last row/column

Slicing is Exclusive on End

  • df.iloc[0:3] gets positions 0, 1, 2 (NOT 3)
  • Different from .loc[] which is inclusive
  • Same as Python list slicing

Negative Indices Work

  • -1 is last row/column
  • -2 is second-to-last
  • df.iloc[-5:] gets last 5 rows

Choose the Right Tool

  • .loc[] for labels → readable, maintainable
  • .iloc[] for positions → generic operations

Syntax Reference

# Rows
df.iloc[0]              # First row
df.iloc[-1]             # Last row
df.iloc[0:5]            # First 5 rows (0-4)
df.iloc[[0,2,4]]        # Specific rows
df.iloc[::2]            # Every other row

# Rows and columns
df.iloc[0, 2]           # Single value
df.iloc[:3, :2]         # First 3 rows, first 2 cols
df.iloc[[0,1], [2,3]]   # Specific rows and cols
df.iloc[-5:, -3:]       # Last 5 rows, last 3 cols

.loc[] vs .iloc[] Decision Guide

Situation             Use .loc[]   Use .iloc[]
Know the name         ✅ Yes       ❌ No
Know the position     ❌ No        ✅ Yes
First/last N rows     ❌ No        ✅ Yes
Specific company/ID   ✅ Yes       ❌ No
Random sampling       ❌ No        ✅ Yes
Business logic        ✅ Yes       ❌ No
Generic operations    ❌ No        ✅ Yes

Important Reminders

  • Positions, not labels: Use integers (0, 1, 2, …)
  • Exclusive slicing: End position not included in slice
  • Negative indexing: Count from the end with -1, -2, etc.
  • Use lists for multiple: [0, 2, 4] for specific positions
  • Prefer .loc[]: More readable for production code
  • Use .iloc[]: For truly position-based operations

Next Steps

You can now select data using both labels and positions. In the next lesson, you will learn Series operations and value counting—essential for analyzing categorical data.



Master Both Selection Methods

Label-based selection with .loc[] makes code readable. Position-based selection with .iloc[] handles generic operations. Knowing both makes you a versatile pandas user.

Prefer .loc[] when possible—it’s more maintainable. Use .iloc[] when position-based logic makes sense!