Lesson 8 - Sorting and Ranking

Ordering Data for Analysis

You can filter data to get specific rows. Now you will learn to sort data—ordering rows to identify top performers, create rankings, and organize results for analysis.

By the end of this lesson, you will be able to:

  • Sort DataFrame by single or multiple columns
  • Sort in ascending or descending order
  • Find top N and bottom N values
  • Understand the inplace parameter
  • Create rankings and leaderboards
  • Sort by index

Sorting reveals patterns in data—who are the top sellers, which products cost the most, what are the worst-performing investments. Let’s master these essential skills.


Understanding Sorting

Sorting reorders DataFrame rows based on column values. This helps you:

  • Identify top performers: Highest revenue, most employees
  • Find outliers: Lowest profits, smallest companies
  • Create rankings: First place, second place, third place
  • Organize data: Alphabetical order, chronological order

Let’s create sample data:

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK']
})

print(f"Original data (unsorted):")
companies.head()

Sort by Single Column

Use .sort_values() to sort by one column.

Ascending Order (Default)

# Sort by revenues (ascending - smallest to largest)
sorted_asc = companies.sort_values('revenues')

print("Sorted by revenues (ascending):")
sorted_asc[['company', 'revenues']]

Output:

             company  revenues
1         FreshMart Inc     89000
5       MediPharm Solutions     98000
6         RetailHub Ltd    112000
0           TechCorp Global    125000
8     SteelCore Industries    145000
2        AutoDrive Motors    156000
9      NetLink Telecom    165000
4      PowerGen Energy    178000
7   SkyWings Airlines     187000
3        FirstBank Holdings    234000

Smallest revenues first, largest last.

Descending Order

# Sort by revenues (descending - largest to smallest)
sorted_desc = companies.sort_values('revenues', ascending=False)

print("Sorted by revenues (descending):")
sorted_desc[['company', 'revenues']]

Output:

             company  revenues
3        FirstBank Holdings    234000
7   SkyWings Airlines     187000
4      PowerGen Energy    178000
9      NetLink Telecom    165000
2        AutoDrive Motors    156000
8     SteelCore Industries    145000
0           TechCorp Global    125000
6         RetailHub Ltd    112000
5       MediPharm Solutions     98000
1         FreshMart Inc     89000

Largest revenues first, smallest last.

Top N Values

# Top 5 companies by revenue
top_5_revenue = companies.sort_values('revenues', ascending=False).head(5)

print("Top 5 by revenue:")
top_5_revenue[['company', 'revenues']]

Output shows the 5 highest-revenue companies.

Using .nlargest() and .nsmallest()

# Alternative: Use .nlargest() for top N
print("Top 5 by revenue (using .nlargest):")
print(companies.nlargest(5, 'revenues')[['company', 'revenues']])

print("\nBottom 3 by revenue (using .nsmallest):")
print(companies.nsmallest(3, 'revenues')[['company', 'revenues']])

Output:

Top 5 by revenue (using .nlargest):
             company  revenues
3        FirstBank Holdings    234000
7   SkyWings Airlines     187000
4      PowerGen Energy    178000
9      NetLink Telecom    165000
2        AutoDrive Motors    156000

Bottom 3 by revenue (using .nsmallest):
          company  revenues
1      FreshMart Inc     89000
5    MediPharm Solutions     98000
6      RetailHub Ltd    112000

.nlargest() and .nsmallest() are shortcuts for finding top/bottom values.

Sort by Different Columns

# Sort by profits to see most/least profitable
most_profitable = companies.sort_values('profits', ascending=False)

print("Companies by profitability:")
most_profitable[['company', 'profits']]

Notice companies with negative profits appear at the bottom.


Sort by Multiple Columns

Sort by first column, then by second column for ties.

Two-Column Sort

# Sort by country, then by revenues within each country
sorted_multi = companies.sort_values(['country', 'revenues'])

print("Sorted by country, then revenues:")
sorted_multi[['company', 'country', 'revenues']]

Output:

             company country  revenues
1         FreshMart Inc    USA     89000
6         RetailHub Ltd     UK    112000
0           TechCorp Global    USA    125000
8     SteelCore Industries    USA    145000
2        AutoDrive Motors    USA    156000
5       MediPharm Solutions    USA     98000
9      NetLink Telecom     UK    165000
4      PowerGen Energy   Germany    178000
7   SkyWings Airlines     Germany    187000
3        FirstBank Holdings     UK    234000

First sorted by country (USA, Germany, UK), then by revenues within each country.

Mixed Sort Directions

# Sort by country (ascending), then revenues (descending)
sorted_mixed = companies.sort_values(
    ['country', 'revenues'],
    ascending=[True, False]  # Country A→Z, Revenues high→low
)

print("Country A→Z, Revenues high→low within each country:")
sorted_mixed[['company', 'country', 'revenues']]

Output:

             company country  revenues
2        AutoDrive Motors    USA    156000
8     SteelCore Industries    USA    145000
0           TechCorp Global    USA    125000
1         FreshMart Inc    USA     89000
5       MediPharm Solutions    USA     98000
7   SkyWings Airlines     Germany    187000
4      PowerGen Energy   Germany    178000
3        FirstBank Holdings     UK    234000
9      NetLink Telecom     UK    165000
6         RetailHub Ltd     UK    112000

Within each country, companies are sorted by revenues from high to low.

Finding Leaders in Each Group

# Sort by sector, then by profits (find best performer per sector)
sector_leaders = companies.sort_values(
    ['sector', 'profits'],
    ascending=[True, False]
)

print("Companies by sector and profitability:")
sector_leaders[['company', 'sector', 'profits']]

This shows the most profitable company in each sector first.


The inplace Parameter

By default, .sort_values() returns a new DataFrame and leaves the original unchanged.

Default Behavior (Returns New DataFrame)

# Default: Creates new DataFrame
companies_sorted = companies.sort_values('revenues')

print("Original DataFrame (still unsorted):")
print(companies[['company', 'revenues']].head(3))

print("\nNew sorted DataFrame:")
print(companies_sorted[['company', 'revenues']].head(3))

Output shows original is unchanged, new DataFrame is sorted.

Using inplace=True

# Using inplace=True modifies the original
# Let's create a copy first to demonstrate
companies_copy = companies.copy()

print("Before inplace sort:")
print(companies_copy[['company', 'revenues']].head(3))

# Sort in place
companies_copy.sort_values('revenues', inplace=True)

print("\nAfter inplace sort:")
print(companies_copy[['company', 'revenues']].head(3))

The original DataFrame is modified directly.

Best Practice

# PREFERRED: Create new DataFrame (safer, clearer)
df_sorted = df.sort_values('col')

# USE CAREFULLY: Modify original (can't undo easily!)
df.sort_values('col', inplace=True)

When to use inplace=True:

  • Working with very large DataFrames (save memory)
  • You’re certain you don’t need the original order
  • Doing many sequential operations

Most of the time, prefer creating new DataFrames—safer and more flexible.

Resetting Index After Sorting

# Reset index after sorting (optional)
sorted_reset = companies.sort_values('revenues').reset_index(drop=True)

print("Sorted with reset index:")
sorted_reset[['company', 'revenues']].head()

Output:

        company  revenues
0    FreshMart Inc     89000
1  MediPharm Solutions     98000
2    RetailHub Ltd    112000
3      TechCorp Global    125000
4 SteelCore Industries   145000

Index now goes 0, 1, 2, 3… instead of original positions.


Practice Exercises

Apply sorting techniques with these exercises.

Exercise 1: Single Column Sorting

  1. Sort companies by number of employees (ascending)
  2. Find the 3 most profitable companies
  3. Find the 3 least profitable companies
# Your code here

Hint

Negative profits are less than positive profits—they’ll appear at the bottom when sorting ascending or at the top when sorting descending!

Exercise 2: Multiple Column Sorting

  1. Sort by sector (A→Z), then by revenues (high→low) within each sector
  2. Sort by country, then by number of employees
  3. Find the highest revenue company in each country
# Your code here

Exercise 3: Practical Scenarios

  1. Create a “leaderboard” showing top 5 companies by profits with rank numbers
  2. Find the smallest company (by employees) in each sector
  3. Show all companies ranked by revenue with their rank number
# Your code here

Summary

You now master sorting and ranking in pandas. Let’s review the key concepts.

Key Concepts

Sort by Single Column

  • Use .sort_values('column')
  • Default is ascending (small to large)
  • Use ascending=False for descending

Sort by Multiple Columns

  • Pass list of column names
  • Use list for ascending to mix directions
  • First column is primary sort, second breaks ties

Top/Bottom N Values

  • .nlargest(n, 'column') for top N
  • .nsmallest(n, 'column') for bottom N
  • Faster than sort + head

inplace Parameter

  • Default creates new DataFrame
  • inplace=True modifies original
  • Prefer creating new DataFrames (safer)

Syntax Reference

# Single column sort
df.sort_values('col')                    # Ascending
df.sort_values('col', ascending=False)   # Descending

# Multiple columns
df.sort_values(['col1', 'col2'])         # Both ascending
df.sort_values(['col1', 'col2'],
               ascending=[True, False])   # Mixed directions

# Top/bottom N
df.nlargest(n, 'col')                    # Top N
df.nsmallest(n, 'col')                   # Bottom N

# Inplace modification
df.sort_values('col', inplace=True)      # Modifies original

# Reset index after sorting
df.sort_values('col').reset_index(drop=True)

# Sort by index
df.sort_index()                          # Sort by row index

Common Patterns

# Top 10 by revenue
df.nlargest(10, 'revenue')

# Worst performers
df.nsmallest(5, 'profit')

# Alphabetical by name
df.sort_values('name')

# Group leaders (best in each category)
df.sort_values(['category', 'score'], ascending=[True, False])
   .drop_duplicates('category')

# Create ranking
df_sorted = df.sort_values('score', ascending=False)
df_sorted['rank'] = range(1, len(df_sorted) + 1)

Important Reminders

  • Returns new DataFrame: Unless inplace=True
  • Multiple columns: Sort priority follows list order
  • Negative values: Included in sorting (lowest values)
  • NaN values: Placed at the end by default
  • Reset index: Use .reset_index(drop=True) for clean index
  • Top N shortcut: Use .nlargest() and .nsmallest()
  • Stable sort: Equal values maintain original relative order

Next Steps

You can now sort data to reveal patterns and create rankings. In the next lesson, you will learn to add and modify columns—creating new calculated fields and transforming existing data.

Continue to Lesson 9 - Adding and Modifying Columns

Learn to create new columns and transform data

Back to Lesson 7 - Boolean Filtering

Review filtering rows based on conditions


Master Data Organization

Sorting is fundamental for understanding data—finding extremes, creating rankings, and organizing results. Combined with filtering and selection, sorting gives you complete control over data organization.

Use sorting to answer “who’s first?” and “what’s best?” questions in your data!