Lesson 8 - Sorting and Ranking
Ordering Data for Analysis
You can filter data to get specific rows. Now you will learn to sort data—ordering rows to identify top performers, create rankings, and organize results for analysis.
By the end of this lesson, you will be able to:
- Sort DataFrame by single or multiple columns
- Sort in ascending or descending order
- Find top N and bottom N values
- Understand the
inplaceparameter - Create rankings and leaderboards
- Sort by index
Sorting reveals patterns in data—who are the top sellers, which products cost the most, what are the worst-performing investments. Let’s master these essential skills.
Understanding Sorting
Sorting reorders DataFrame rows based on column values. This helps you:
- Identify top performers: Highest revenue, most employees
- Find outliers: Lowest profits, smallest companies
- Create rankings: First place, second place, third place
- Organize data: Alphabetical order, chronological order
Let’s create sample data:
import pandas as pd
import numpy as np
# Create companies dataset
companies = pd.DataFrame({
'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
'revenues': [125000, 89000, 156000, 234000, 178000,
98000, 112000, 187000, 145000, 165000],
'profits': [12000, 8500, -3000, 45000, 23000,
15000, 9800, 21000, 18000, 28000],
'employees': [1200, 890, 2300, 5600, 3400,
2100, 4500, 8900, 3200, 6700],
'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
'USA', 'UK', 'Germany', 'USA', 'UK']
})
print(f"Original data (unsorted):")
companies.head()Sort by Single Column
Use .sort_values() to sort by one column.
Ascending Order (Default)
# Sort by revenues (ascending - smallest to largest)
sorted_asc = companies.sort_values('revenues')
print("Sorted by revenues (ascending):")
sorted_asc[['company', 'revenues']]Output:
company revenues
1 FreshMart Inc 89000
5 MediPharm Solutions 98000
6 RetailHub Ltd 112000
0 TechCorp Global 125000
8 SteelCore Industries 145000
2 AutoDrive Motors 156000
9 NetLink Telecom 165000
4 PowerGen Energy 178000
7 SkyWings Airlines 187000
3 FirstBank Holdings 234000Smallest revenues first, largest last.
Descending Order
# Sort by revenues (descending - largest to smallest)
sorted_desc = companies.sort_values('revenues', ascending=False)
print("Sorted by revenues (descending):")
sorted_desc[['company', 'revenues']]Output:
company revenues
3 FirstBank Holdings 234000
7 SkyWings Airlines 187000
4 PowerGen Energy 178000
9 NetLink Telecom 165000
2 AutoDrive Motors 156000
8 SteelCore Industries 145000
0 TechCorp Global 125000
6 RetailHub Ltd 112000
5 MediPharm Solutions 98000
1 FreshMart Inc 89000Largest revenues first, smallest last.
Top N Values
# Top 5 companies by revenue
top_5_revenue = companies.sort_values('revenues', ascending=False).head(5)
print("Top 5 by revenue:")
top_5_revenue[['company', 'revenues']]Output shows the 5 highest-revenue companies.
Using .nlargest() and .nsmallest()
# Alternative: Use .nlargest() for top N
print("Top 5 by revenue (using .nlargest):")
print(companies.nlargest(5, 'revenues')[['company', 'revenues']])
print("\nBottom 3 by revenue (using .nsmallest):")
print(companies.nsmallest(3, 'revenues')[['company', 'revenues']])Output:
Top 5 by revenue (using .nlargest):
company revenues
3 FirstBank Holdings 234000
7 SkyWings Airlines 187000
4 PowerGen Energy 178000
9 NetLink Telecom 165000
2 AutoDrive Motors 156000
Bottom 3 by revenue (using .nsmallest):
company revenues
1 FreshMart Inc 89000
5 MediPharm Solutions 98000
6 RetailHub Ltd 112000.nlargest() and .nsmallest() are shortcuts for finding top/bottom values.
Sort by Different Columns
# Sort by profits to see most/least profitable
most_profitable = companies.sort_values('profits', ascending=False)
print("Companies by profitability:")
most_profitable[['company', 'profits']]Notice companies with negative profits appear at the bottom.
Sort by Multiple Columns
Sort by first column, then by second column for ties.
Two-Column Sort
# Sort by country, then by revenues within each country
sorted_multi = companies.sort_values(['country', 'revenues'])
print("Sorted by country, then revenues:")
sorted_multi[['company', 'country', 'revenues']]Output:
company country revenues
1 FreshMart Inc USA 89000
6 RetailHub Ltd UK 112000
0 TechCorp Global USA 125000
8 SteelCore Industries USA 145000
2 AutoDrive Motors USA 156000
5 MediPharm Solutions USA 98000
9 NetLink Telecom UK 165000
4 PowerGen Energy Germany 178000
7 SkyWings Airlines Germany 187000
3 FirstBank Holdings UK 234000First sorted by country (USA, Germany, UK), then by revenues within each country.
Mixed Sort Directions
# Sort by country (ascending), then revenues (descending)
sorted_mixed = companies.sort_values(
['country', 'revenues'],
ascending=[True, False] # Country A→Z, Revenues high→low
)
print("Country A→Z, Revenues high→low within each country:")
sorted_mixed[['company', 'country', 'revenues']]Output:
company country revenues
2 AutoDrive Motors USA 156000
8 SteelCore Industries USA 145000
0 TechCorp Global USA 125000
1 FreshMart Inc USA 89000
5 MediPharm Solutions USA 98000
7 SkyWings Airlines Germany 187000
4 PowerGen Energy Germany 178000
3 FirstBank Holdings UK 234000
9 NetLink Telecom UK 165000
6 RetailHub Ltd UK 112000Within each country, companies are sorted by revenues from high to low.
Finding Leaders in Each Group
# Sort by sector, then by profits (find best performer per sector)
sector_leaders = companies.sort_values(
['sector', 'profits'],
ascending=[True, False]
)
print("Companies by sector and profitability:")
sector_leaders[['company', 'sector', 'profits']]This shows the most profitable company in each sector first.
The inplace Parameter
By default, .sort_values() returns a new DataFrame and leaves the original unchanged.
Default Behavior (Returns New DataFrame)
# Default: Creates new DataFrame
companies_sorted = companies.sort_values('revenues')
print("Original DataFrame (still unsorted):")
print(companies[['company', 'revenues']].head(3))
print("\nNew sorted DataFrame:")
print(companies_sorted[['company', 'revenues']].head(3))Output shows original is unchanged, new DataFrame is sorted.
Using inplace=True
# Using inplace=True modifies the original
# Let's create a copy first to demonstrate
companies_copy = companies.copy()
print("Before inplace sort:")
print(companies_copy[['company', 'revenues']].head(3))
# Sort in place
companies_copy.sort_values('revenues', inplace=True)
print("\nAfter inplace sort:")
print(companies_copy[['company', 'revenues']].head(3))The original DataFrame is modified directly.
Best Practice
# PREFERRED: Create new DataFrame (safer, clearer)
df_sorted = df.sort_values('col')
# USE CAREFULLY: Modify original (can't undo easily!)
df.sort_values('col', inplace=True)When to use inplace=True:
- Working with very large DataFrames (save memory)
- You’re certain you don’t need the original order
- Doing many sequential operations
Most of the time, prefer creating new DataFrames—safer and more flexible.
Resetting Index After Sorting
# Reset index after sorting (optional)
sorted_reset = companies.sort_values('revenues').reset_index(drop=True)
print("Sorted with reset index:")
sorted_reset[['company', 'revenues']].head()Output:
company revenues
0 FreshMart Inc 89000
1 MediPharm Solutions 98000
2 RetailHub Ltd 112000
3 TechCorp Global 125000
4 SteelCore Industries 145000Index now goes 0, 1, 2, 3… instead of original positions.
Practice Exercises
Apply sorting techniques with these exercises.
Exercise 1: Single Column Sorting
- Sort companies by number of employees (ascending)
- Find the 3 most profitable companies
- Find the 3 least profitable companies
# Your code hereHint
Negative profits are less than positive profits—they’ll appear at the bottom when sorting ascending or at the top when sorting descending!
Exercise 2: Multiple Column Sorting
- Sort by sector (A→Z), then by revenues (high→low) within each sector
- Sort by country, then by number of employees
- Find the highest revenue company in each country
# Your code hereExercise 3: Practical Scenarios
- Create a “leaderboard” showing top 5 companies by profits with rank numbers
- Find the smallest company (by employees) in each sector
- Show all companies ranked by revenue with their rank number
# Your code hereSummary
You now master sorting and ranking in pandas. Let’s review the key concepts.
Key Concepts
Sort by Single Column
- Use
.sort_values('column') - Default is ascending (small to large)
- Use
ascending=Falsefor descending
Sort by Multiple Columns
- Pass list of column names
- Use list for
ascendingto mix directions - First column is primary sort, second breaks ties
Top/Bottom N Values
.nlargest(n, 'column')for top N.nsmallest(n, 'column')for bottom N- Faster than sort + head
inplace Parameter
- Default creates new DataFrame
inplace=Truemodifies original- Prefer creating new DataFrames (safer)
Syntax Reference
# Single column sort
df.sort_values('col') # Ascending
df.sort_values('col', ascending=False) # Descending
# Multiple columns
df.sort_values(['col1', 'col2']) # Both ascending
df.sort_values(['col1', 'col2'],
ascending=[True, False]) # Mixed directions
# Top/bottom N
df.nlargest(n, 'col') # Top N
df.nsmallest(n, 'col') # Bottom N
# Inplace modification
df.sort_values('col', inplace=True) # Modifies original
# Reset index after sorting
df.sort_values('col').reset_index(drop=True)
# Sort by index
df.sort_index() # Sort by row indexCommon Patterns
# Top 10 by revenue
df.nlargest(10, 'revenue')
# Worst performers
df.nsmallest(5, 'profit')
# Alphabetical by name
df.sort_values('name')
# Group leaders (best in each category)
df.sort_values(['category', 'score'], ascending=[True, False])
.drop_duplicates('category')
# Create ranking
df_sorted = df.sort_values('score', ascending=False)
df_sorted['rank'] = range(1, len(df_sorted) + 1)Important Reminders
- Returns new DataFrame: Unless
inplace=True - Multiple columns: Sort priority follows list order
- Negative values: Included in sorting (lowest values)
- NaN values: Placed at the end by default
- Reset index: Use
.reset_index(drop=True)for clean index - Top N shortcut: Use
.nlargest()and.nsmallest() - Stable sort: Equal values maintain original relative order
Next Steps
You can now sort data to reveal patterns and create rankings. In the next lesson, you will learn to add and modify columns—creating new calculated fields and transforming existing data.
Continue to Lesson 9 - Adding and Modifying Columns
Learn to create new columns and transform data
Back to Lesson 7 - Boolean Filtering
Review filtering rows based on conditions
Master Data Organization
Sorting is fundamental for understanding data—finding extremes, creating rankings, and organizing results. Combined with filtering and selection, sorting gives you complete control over data organization.
Use sorting to answer “who’s first?” and “what’s best?” questions in your data!