Lesson 4 - Ranking Functions: ROW_NUMBER, RANK, and NTILE
Welcome to Ranking Functions
Some of the most common analytical questions are about order and position: Which observation is best, second best, or worst? What are the top three products in each category? Which customers fall in the top 25% by spending? These are hard to answer with ordinary SQL, but ranking window functions make them straightforward.
In this lesson you will meet four ranking functions. The first three — ROW_NUMBER(), RANK(), and DENSE_RANK() — assign a sequential number to each row. The fourth, NTILE(), splits rows into a fixed number of buckets. The differences between them are subtle but important, and choosing the right one is the whole game.
By the end of this lesson, you will be able to:
- Explain the purpose of ranking window functions
- Use
ROW_NUMBER(),RANK(), andDENSE_RANK()and describe how each handles ties - Use
NTILE()to divide rows into equal-sized buckets - Combine ranking functions with CTEs to solve top-N and “nth place” problems
This lesson assumes you are comfortable with OVER(), PARTITION BY, and ORDER BY. Let’s dive in.
Data for this lesson
Engine: PostgreSQL — the examples print small sample tables inline.
Tables used: trips, employees
Want to run them yourself? Download window_practice.db and use any SQLite client (SQLite supports window functions too), or recreate the tables in PostgreSQL with psql.
How Ranking Functions Use the Window
All four ranking functions share the same windowing rules:
PARTITION BYis optional. Without it, the whole result set is one partition.ORDER BYis technically optional but strongly required in practice — it determines the rank. Without an ordering, the ranks are meaningless and unpredictable.- Framing does not apply. Ranking is always relative to the entire partition.
The simplest form is:
RANKING_FUNCTION() OVER (
[PARTITION BY column]
ORDER BY column
)ROW_NUMBER(), RANK(), and DENSE_RANK() take no arguments, but the empty parentheses are required. NTILE(n) takes the number of buckets.
For the examples, we will use a trips table of twelve bike trips. Each row has the usual trip details plus a rider_rating column from 2 to 5 indicating how satisfied the rider was.
The ROW_NUMBER() Function
ROW_NUMBER() assigns a unique integer to every row in the partition, starting at 1, in the order set by ORDER BY. Even rows with identical values get distinct numbers.
SELECT start_date, bike_number, member_type, rider_rating,
ROW_NUMBER() OVER(ORDER BY rider_rating)
FROM trips;start_date | bike_number | member_type | rider_rating | row_number
--------------------+-------------+-------------+--------------+-----------
2017-10-01 05:01:00 | W23254 | Member | 2 | 1
2017-10-01 05:01:00 | W00143 | Member | 3 | 2
2017-10-01 03:08:00 | W23272 | Member | 3 | 3
2017-10-04 05:21:00 | W22051 | Casual | 3 | 4
2017-10-05 08:08:00 | W20184 | Member | 3 | 5
2017-10-05 08:08:00 | W00895 | Casual | 4 | 6
... | | | |
2017-10-04 08:30:00 | W22517 | Casual | 5 | 12Notice that the four rows with rider_rating = 3 still get distinct numbers (2, 3, 4, 5). Add PARTITION BY to restart the numbering within each group:
SELECT start_date, bike_number, member_type, rider_rating,
ROW_NUMBER() OVER(PARTITION BY member_type
ORDER BY rider_rating)
FROM trips;Now the Casual partition runs 1 through 5 and the Member partition restarts at 1.
ROW_NUMBER() is nondeterministic on ties
When the ORDER BY columns are not unique, there is no guarantee ROW_NUMBER() assigns the same number to tied rows on every run. For reproducible results, order by something unique (or add a tiebreaker column to the ORDER BY).
The RANK() Function
RANK() looks like ROW_NUMBER() but treats ties differently: rows with equal ORDER BY values get the same rank, and the next distinct value skips ahead, leaving gaps.
To find the highest-rated trips, order by rider_rating descending:
SELECT start_date, bike_number, member_type, rider_rating,
RANK() OVER(ORDER BY rider_rating DESC)
FROM trips;start_date | bike_number | member_type | rider_rating | rank
--------------------+-------------+-------------+--------------+-----
2017-10-04 08:30:00 | W22517 | Casual | 5 | 1
2017-10-04 04:58:00 | W23052 | Casual | 5 | 1
2017-10-03 12:00:00 | W22965 | Casual | 5 | 1
2017-10-02 03:30:00 | W21096 | Member | 4 | 4
2017-10-03 09:34:00 | W20095 | Member | 4 | 4
... | | | |Three trips share rating 5 and all rank 1. The next rating (4) ranks 4, not 2, because three rows already occupy positions 1 through 3. RANK() answers “how many rows are ahead of this one?” — so the ranks can skip numbers.
PARTITION BY resets the ranking per group:
SELECT start_date, bike_number, member_type, rider_rating,
RANK() OVER(PARTITION BY member_type
ORDER BY rider_rating DESC)
FROM trips;The Casual and Member partitions each restart at rank 1.
The DENSE_RANK() Function
DENSE_RANK() also gives tied rows the same rank, but it does not skip numbers. After a tie, the next distinct value gets the very next integer.
SELECT start_date, bike_number, member_type, rider_rating,
DENSE_RANK() OVER(ORDER BY rider_rating DESC)
FROM trips;start_date | bike_number | member_type | rider_rating | dense_rank
--------------------+-------------+-------------+--------------+-----------
2017-10-04 08:30:00 | W22517 | Casual | 5 | 1
2017-10-04 04:58:00 | W23052 | Casual | 5 | 1
2017-10-03 12:00:00 | W22965 | Casual | 5 | 1
2017-10-02 03:30:00 | W21096 | Member | 4 | 2
2017-10-03 09:34:00 | W20095 | Member | 4 | 2
2017-10-04 06:07:00 | W23268 | Member | 4 | 2
2017-10-05 08:08:00 | W00895 | Casual | 4 | 2
2017-10-04 05:21:00 | W22051 | Casual | 3 | 3
... | | | |
2017-10-01 05:01:00 | W23254 | Member | 2 | 4The three top trips rank 1; the rating-4 trips rank 2 (no gap). DENSE_RANK() answers “how many distinct values are ahead of this one?”
Comparing the Three
Running all three side by side over rider_rating DESC makes the difference unmistakable:
SELECT start_date, bike_number, member_type, rider_rating,
ROW_NUMBER() OVER(ORDER BY rider_rating DESC),
RANK() OVER(ORDER BY rider_rating DESC),
DENSE_RANK() OVER(ORDER BY rider_rating DESC)
FROM trips;rider_rating | row_number | rank | dense_rank
-------------+------------+------+-----------
5 | 1 | 1 | 1
5 | 2 | 1 | 1
5 | 3 | 1 | 1
4 | 4 | 4 | 2
4 | 5 | 4 | 2
4 | 6 | 4 | 2
4 | 7 | 4 | 2
3 | 8 | 8 | 3
3 | 9 | 8 | 3
3 | 10 | 8 | 3
3 | 11 | 8 | 3
2 | 12 | 12 | 4ROW_NUMBERis always unique (1–12).RANKrepeats on ties and skips numbers (1, 1, 1, 4, …).DENSE_RANKrepeats on ties but never skips (1, 1, 1, 2, …).
Pick ROW_NUMBER() when you need a strict, unique ordering (such as keeping exactly one row per group). Pick RANK() when ties should share a position and you want gaps to reflect how many rows tied. Pick DENSE_RANK() when you want consecutive rank numbers regardless of ties (such as “second highest distinct value”).
The NTILE() Function
NTILE(n) is different from the other three: instead of numbering rows, it divides the ordered rows into n roughly equal buckets and labels each row with its bucket number, from 1 to n. It is the go-to tool for segmenting data into quartiles, deciles, or any equal-sized groups.
SELECT start_date, bike_number, duration, rider_rating,
NTILE(5) OVER(ORDER BY duration DESC)
FROM trips;With twelve rows split into five buckets, the longest trips land in bucket 1 and the shortest in bucket 5. Because 12 does not divide evenly by 5, some buckets get one extra row.
How NTILE distributes uneven rows
When the row count is not divisible by the number of buckets, NTILE() makes the earlier buckets larger. With 12 rows and 5 buckets, buckets 1 and 2 get 3 rows each and buckets 3, 4, and 5 get 2 rows each. The bigger buckets always come first.
A common use is segmentation. Dividing each day’s trips into two buckets by rating — best-rated bikes in bucket 1, worst in bucket 2 — helps the company prioritize maintenance:
SELECT start_date, bike_number, rider_rating,
NTILE(2) OVER(PARTITION BY EXTRACT(DAY FROM start_date)
ORDER BY rider_rating DESC)
FROM trips;Solving Real Problems with Ranking and CTEs
Ranking functions truly shine when you combine them with a common table expression (CTE) and then filter on the rank. Remember from Lesson 3 that you cannot put a window function in WHERE, so you compute the rank in a CTE and filter in the outer query.
The Longest Trip Each Day
Partition by day, rank by duration, and keep rank 1:
WITH longtrips AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY EXTRACT(DAY FROM start_date)
ORDER BY duration DESC) AS row_number
FROM trips
)
SELECT *
FROM longtrips
WHERE row_number = 1;The CTE numbers trips within each day, longest first. The outer query keeps only the top trip per day. Using ROW_NUMBER() guarantees exactly one row per day even if two trips tie on duration.
The Second-Highest-Rated Trip Each Day
Here DENSE_RANK() is the right choice, because “second highest distinct rating” means rank 2 with no gaps:
WITH rating AS (
SELECT start_date, duration, bike_number, rider_rating,
DENSE_RANK() OVER(PARTITION BY EXTRACT(DAY FROM start_date)
ORDER BY rider_rating DESC) AS rank
FROM trips
)
SELECT *
FROM rating
WHERE rank = 2;This returns each day’s second-best rating, even when several trips tie for first.
Practice Exercises
Use the trips table.
Exercise 1: Number Trips by Start Date
Return start_date, bike_number, member_type, rider_rating, and a row number assigned in order of start_date.
-- Your code hereHint
Use ROW_NUMBER() OVER(ORDER BY start_date). No PARTITION BY is needed for a single sequence across all rows.
Exercise 2: Longest Trips per Day
Return all columns plus a dense rank of each day’s trips by duration (longest = rank 1), aliased trip_dense_rank, then keep only the rows where that rank equals 1.
Hint
Put DENSE_RANK() OVER(PARTITION BY EXTRACT(DAY FROM start_date) ORDER BY duration DESC) AS trip_dense_rank in a CTE, then filter WHERE trip_dense_rank = 1 in the outer query.
Exercise 3: Bonus Buckets
Split riders into four groups by duration using NTILE(4), then add 1 to the bucket number to compute a bonus — the first group earns USD 5, the last earns USD 2. Return start_date, bike_number, member_type, duration, and the bonus, sorted by bonus descending.
Hint
Use NTILE(4) OVER(ORDER BY duration) + 1 AS bonus. The shortest trips fall in bucket 4 (bonus 5) and the longest in bucket 1 — adjust the ordering to match the reward you want, then ORDER BY bonus DESC.
Summary
You learned four ranking window functions and, crucially, how each treats ties. ROW_NUMBER() numbers every row uniquely; RANK() shares a rank on ties and leaves gaps; DENSE_RANK() shares a rank but never skips; and NTILE(n) splits rows into equal buckets. Combined with a CTE, these functions answer top-N and nth-place questions cleanly, since you compute the rank first and filter on it afterward.
Key Concepts
ROW_NUMBER()— a unique sequential integer per row; ideal for “exactly one row per group.”RANK()— equal values share a rank and the next value skips ahead, so ranks can have gaps.DENSE_RANK()— equal values share a rank but the next value is consecutive, with no gaps.NTILE(n)— divides ordered rows into n roughly equal buckets, filling earlier buckets first when rows do not divide evenly.- CTE + filter — compute a rank in a CTE, then filter on it in the outer query to extract top-N or nth-place rows.
Why This Matters
Ranking is one of the most requested analytics patterns: leaderboards, top sellers per region, percentile segments for marketing, and “second-best” comparisons all rely on these functions. Knowing the difference between RANK() and DENSE_RANK() prevents subtle off-by-one bugs, and the CTE-plus-filter pattern is the standard professional way to surface top-N rows. You will use these constantly in reporting and data science work.
Next Steps
Continue to Lesson 5 - Offset Functions: LAG, LEAD, and FIRST_VALUE
Compare each row to its neighbors with LAG() and LEAD(), and pull the first, last, or nth value in a window.
Back to Module Overview
Return to the Window Functions module overview
Continue Building Your Skills
Practice predicting which ranking function a question calls for. “How many rows beat this one?” points to RANK(). “What is the third distinct value?” points to DENSE_RANK(). “Give me exactly one winner per group” points to ROW_NUMBER(). “Split customers into quartiles” points to NTILE(). The faster you map a business question to the right function, the more naturally these queries will come.