Lesson 8 - Sorting and Ranking

Ordering Data for Analysis

You can filter data to get specific rows. Now you will learn to sort data—ordering rows to identify top performers, create rankings, and organize results for analysis.

By the end of this lesson, you will be able to:

Sort DataFrame by single or multiple columns
Sort in ascending or descending order
Find top N and bottom N values
Understand the inplace parameter
Create rankings and leaderboards
Sort by index

Sorting reveals patterns in data—who are the top sellers, which products cost the most, what are the worst-performing investments. Let’s master these essential skills.

Understanding Sorting

Sorting reorders DataFrame rows based on column values. This helps you:

Identify top performers: Highest revenue, most employees
Find outliers: Lowest profits, smallest companies
Create rankings: First place, second place, third place
Organize data: Alphabetical order, chronological order

Let’s create sample data:

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK']
})

print(f"Original data (unsorted):")
companies.head()

Sort by Single Column

Use .sort_values() to sort by one column.

Ascending Order (Default)

# Sort by revenues (ascending - smallest to largest)
sorted_asc = companies.sort_values('revenues')

print("Sorted by revenues (ascending):")
sorted_asc[['company', 'revenues']]

Output:

             company  revenues
1         FreshMart Inc     89000
5       MediPharm Solutions     98000
6         RetailHub Ltd    112000
0           TechCorp Global    125000
8     SteelCore Industries    145000
2        AutoDrive Motors    156000
9      NetLink Telecom    165000
4      PowerGen Energy    178000
7   SkyWings Airlines     187000
3        FirstBank Holdings    234000

Smallest revenues first, largest last.

Descending Order

# Sort by revenues (descending - largest to smallest)
sorted_desc = companies.sort_values('revenues', ascending=False)

print("Sorted by revenues (descending):")
sorted_desc[['company', 'revenues']]

Output:

             company  revenues
3        FirstBank Holdings    234000
7   SkyWings Airlines     187000
4      PowerGen Energy    178000
9      NetLink Telecom    165000
2        AutoDrive Motors    156000
8     SteelCore Industries    145000
0           TechCorp Global    125000
6         RetailHub Ltd    112000
5       MediPharm Solutions     98000
1         FreshMart Inc     89000

Largest revenues first, smallest last.

Top N Values

# Top 5 companies by revenue
top_5_revenue = companies.sort_values('revenues', ascending=False).head(5)

print("Top 5 by revenue:")
top_5_revenue[['company', 'revenues']]

Output shows the 5 highest-revenue companies.

Using .nlargest() and .nsmallest()

# Alternative: Use .nlargest() for top N
print("Top 5 by revenue (using .nlargest):")
print(companies.nlargest(5, 'revenues')[['company', 'revenues']])

print("\nBottom 3 by revenue (using .nsmallest):")
print(companies.nsmallest(3, 'revenues')[['company', 'revenues']])

Output:

Top 5 by revenue (using .nlargest):
             company  revenues
3        FirstBank Holdings    234000
7   SkyWings Airlines     187000
4      PowerGen Energy    178000
9      NetLink Telecom    165000
2        AutoDrive Motors    156000

Bottom 3 by revenue (using .nsmallest):
          company  revenues
1      FreshMart Inc     89000
5    MediPharm Solutions     98000
6      RetailHub Ltd    112000

.nlargest() and .nsmallest() are shortcuts for finding top/bottom values.

Sort by Different Columns

# Sort by profits to see most/least profitable
most_profitable = companies.sort_values('profits', ascending=False)

print("Companies by profitability:")
most_profitable[['company', 'profits']]

Notice companies with negative profits appear at the bottom.

Sort by Multiple Columns

Sort by first column, then by second column for ties.

Two-Column Sort

# Sort by country, then by revenues within each country
sorted_multi = companies.sort_values(['country', 'revenues'])

print("Sorted by country, then revenues:")
sorted_multi[['company', 'country', 'revenues']]

Output:

             company country  revenues
1         FreshMart Inc    USA     89000
6         RetailHub Ltd     UK    112000
0           TechCorp Global    USA    125000
8     SteelCore Industries    USA    145000
2        AutoDrive Motors    USA    156000
5       MediPharm Solutions    USA     98000
9      NetLink Telecom     UK    165000
4      PowerGen Energy   Germany    178000
7   SkyWings Airlines     Germany    187000
3        FirstBank Holdings     UK    234000

First sorted by country (USA, Germany, UK), then by revenues within each country.

Mixed Sort Directions

# Sort by country (ascending), then revenues (descending)
sorted_mixed = companies.sort_values(
    ['country', 'revenues'],
    ascending=[True, False]  # Country A→Z, Revenues high→low
)

print("Country A→Z, Revenues high→low within each country:")
sorted_mixed[['company', 'country', 'revenues']]

Output:

             company country  revenues
2        AutoDrive Motors    USA    156000
8     SteelCore Industries    USA    145000
0           TechCorp Global    USA    125000
1         FreshMart Inc    USA     89000
5       MediPharm Solutions    USA     98000
7   SkyWings Airlines     Germany    187000
4      PowerGen Energy   Germany    178000
3        FirstBank Holdings     UK    234000
9      NetLink Telecom     UK    165000
6         RetailHub Ltd     UK    112000

Within each country, companies are sorted by revenues from high to low.

Finding Leaders in Each Group

# Sort by sector, then by profits (find best performer per sector)
sector_leaders = companies.sort_values(
    ['sector', 'profits'],
    ascending=[True, False]
)

print("Companies by sector and profitability:")
sector_leaders[['company', 'sector', 'profits']]

This shows the most profitable company in each sector first.

The inplace Parameter

By default, .sort_values() returns a new DataFrame and leaves the original unchanged.

Default Behavior (Returns New DataFrame)

# Default: Creates new DataFrame
companies_sorted = companies.sort_values('revenues')

print("Original DataFrame (still unsorted):")
print(companies[['company', 'revenues']].head(3))

print("\nNew sorted DataFrame:")
print(companies_sorted[['company', 'revenues']].head(3))

Output shows original is unchanged, new DataFrame is sorted.

Using inplace=True

# Using inplace=True modifies the original
# Let's create a copy first to demonstrate
companies_copy = companies.copy()

print("Before inplace sort:")
print(companies_copy[['company', 'revenues']].head(3))

# Sort in place
companies_copy.sort_values('revenues', inplace=True)

print("\nAfter inplace sort:")
print(companies_copy[['company', 'revenues']].head(3))

The original DataFrame is modified directly.

Best Practice

# PREFERRED: Create new DataFrame (safer, clearer)
df_sorted = df.sort_values('col')

# USE CAREFULLY: Modify original (can't undo easily!)
df.sort_values('col', inplace=True)

When to use inplace=True:

Working with very large DataFrames (save memory)
You’re certain you don’t need the original order
Doing many sequential operations

Most of the time, prefer creating new DataFrames—safer and more flexible.

Resetting Index After Sorting

# Reset index after sorting (optional)
sorted_reset = companies.sort_values('revenues').reset_index(drop=True)

print("Sorted with reset index:")
sorted_reset[['company', 'revenues']].head()

Output:

        company  revenues
0    FreshMart Inc     89000
1  MediPharm Solutions     98000
2    RetailHub Ltd    112000
3      TechCorp Global    125000
4 SteelCore Industries   145000

Index now goes 0, 1, 2, 3… instead of original positions.

Practice Exercises

Apply sorting techniques with these exercises.

Exercise 1: Single Column Sorting

Sort companies by number of employees (ascending)
Find the 3 most profitable companies
Find the 3 least profitable companies

# Your code here

Hint

Negative profits are less than positive profits—they’ll appear at the bottom when sorting ascending or at the top when sorting descending!

Exercise 2: Multiple Column Sorting

Sort by sector (A→Z), then by revenues (high→low) within each sector
Sort by country, then by number of employees
Find the highest revenue company in each country

# Your code here

Exercise 3: Practical Scenarios

Create a “leaderboard” showing top 5 companies by profits with rank numbers
Find the smallest company (by employees) in each sector
Show all companies ranked by revenue with their rank number

# Your code here

Summary

You now master sorting and ranking in pandas. Let’s review the key concepts.

Key Concepts

Sort by Single Column

Use .sort_values('column')
Default is ascending (small to large)
Use ascending=False for descending

Sort by Multiple Columns

Pass list of column names
Use list for ascending to mix directions
First column is primary sort, second breaks ties

Top/Bottom N Values

.nlargest(n, 'column') for top N
.nsmallest(n, 'column') for bottom N
Faster than sort + head

inplace Parameter

Default creates new DataFrame
inplace=True modifies original
Prefer creating new DataFrames (safer)

Syntax Reference

# Single column sort
df.sort_values('col')                    # Ascending
df.sort_values('col', ascending=False)   # Descending

# Multiple columns
df.sort_values(['col1', 'col2'])         # Both ascending
df.sort_values(['col1', 'col2'],
               ascending=[True, False])   # Mixed directions

# Top/bottom N
df.nlargest(n, 'col')                    # Top N
df.nsmallest(n, 'col')                   # Bottom N

# Inplace modification
df.sort_values('col', inplace=True)      # Modifies original

# Reset index after sorting
df.sort_values('col').reset_index(drop=True)

# Sort by index
df.sort_index()                          # Sort by row index

Common Patterns

# Top 10 by revenue
df.nlargest(10, 'revenue')

# Worst performers
df.nsmallest(5, 'profit')

# Alphabetical by name
df.sort_values('name')

# Group leaders (best in each category)
df.sort_values(['category', 'score'], ascending=[True, False])
   .drop_duplicates('category')

# Create ranking
df_sorted = df.sort_values('score', ascending=False)
df_sorted['rank'] = range(1, len(df_sorted) + 1)

Important Reminders

Returns new DataFrame: Unless inplace=True
Multiple columns: Sort priority follows list order
Negative values: Included in sorting (lowest values)
NaN values: Placed at the end by default
Reset index: Use .reset_index(drop=True) for clean index
Top N shortcut: Use .nlargest() and .nsmallest()
Stable sort: Equal values maintain original relative order

Next Steps

You can now sort data to reveal patterns and create rankings. In the next lesson, you will learn to add and modify columns—creating new calculated fields and transforming existing data.

Continue to Lesson 9 - Adding and Modifying Columns

Learn to create new columns and transform data

Back to Lesson 7 - Boolean Filtering

Review filtering rows based on conditions

Master Data Organization

Sorting is fundamental for understanding data—finding extremes, creating rankings, and organizing results. Combined with filtering and selection, sorting gives you complete control over data organization.

Use sorting to answer “who’s first?” and “what’s best?” questions in your data!

Lesson 7 - Boolean Filtering in Pandas

Lesson 9 - Adding and Modifying Columns

Courses

DATATWEETS

Title here

Lesson 8 - Sorting and Ranking

Ordering Data for Analysis

Understanding Sorting

Sort by Single Column

Ascending Order (Default)

Descending Order

Top N Values

Using .nlargest() and .nsmallest()

Sort by Different Columns

Sort by Multiple Columns

Two-Column Sort

Mixed Sort Directions

Finding Leaders in Each Group

The inplace Parameter

Default Behavior (Returns New DataFrame)

Using inplace=True

Best Practice

Resetting Index After Sorting

Practice Exercises

Exercise 1: Single Column Sorting

Exercise 2: Multiple Column Sorting

Exercise 3: Practical Scenarios

Summary

Key Concepts

Syntax Reference

Common Patterns

Important Reminders

Next Steps

Continue to Lesson 9 - Adding and Modifying Columns

Back to Lesson 7 - Boolean Filtering

Master Data Organization

Lesson 8 - Sorting and Ranking

Ordering Data for Analysis#

Understanding Sorting#

Sort by Single Column#

Ascending Order (Default)#

Descending Order#

Top N Values#

Using .nlargest() and .nsmallest()#

Sort by Different Columns#

Sort by Multiple Columns#

Two-Column Sort#

Mixed Sort Directions#

Finding Leaders in Each Group#

The inplace Parameter#

Default Behavior (Returns New DataFrame)#

Using inplace=True#

Best Practice#

Resetting Index After Sorting#

Practice Exercises#

Exercise 1: Single Column Sorting#

Exercise 2: Multiple Column Sorting#

Exercise 3: Practical Scenarios#

Summary#

Key Concepts#

Syntax Reference#

Common Patterns#

Important Reminders#

Next Steps#

Continue to Lesson 9 - Adding and Modifying Columns

Back to Lesson 7 - Boolean Filtering

Master Data Organization#

Ordering Data for Analysis

Understanding Sorting

Sort by Single Column

Ascending Order (Default)

Descending Order

Top N Values

Using .nlargest() and .nsmallest()

Sort by Different Columns

Sort by Multiple Columns

Two-Column Sort

Mixed Sort Directions

Finding Leaders in Each Group

The inplace Parameter

Default Behavior (Returns New DataFrame)

Using inplace=True

Best Practice

Resetting Index After Sorting

Practice Exercises

Exercise 1: Single Column Sorting

Exercise 2: Multiple Column Sorting

Exercise 3: Practical Scenarios

Summary

Key Concepts

Syntax Reference

Common Patterns

Important Reminders

Next Steps

Master Data Organization