Lesson 1 - Introduction to Window Functions
Welcome to Window Functions
So far in SQL, whenever you needed a total, a count, or an average across groups of rows, you reached for GROUP BY. That works well until you hit questions it simply cannot answer: What is the 7-day rolling average of daily sales? How does each employee’s salary compare to their department’s average? What is the running total of revenue over time?
These questions have something in common. They need an aggregate and the individual rows at the same time. GROUP BY cannot give you both — it collapses your rows into summary lines. Window functions can. They are one of SQL’s most powerful analytical tools, and this lesson introduces the idea and the single clause that powers it: OVER().
By the end of this lesson, you will be able to:
- Explain what window functions are and how they differ from
GROUP BY - Name the four categories of window functions
- Define a window with the
OVER()clause - Split a window into groups using
PARTITION BY - Order rows within a window using
ORDER BY
No prior window-function experience is needed — just comfort with SELECT, WHERE, and aggregate functions like SUM() and AVG(). Let’s begin.
Data for this lesson
Engine: PostgreSQL — the examples print small sample tables inline.
Tables used: tbl_bikeshare, trips, employee, apple_sales_quantity_by_month
Want to run them yourself? Download window_practice.db and use any SQLite client (SQLite supports window functions too), or recreate the tables in PostgreSQL with psql.
Window Functions vs. GROUP BY
The fastest way to understand window functions is to compare them with something you already know: the GROUP BY clause.
How GROUP BY Collapses Rows
Adding GROUP BY to a query lets you apply an aggregate function such as SUM, COUNT, or AVG to groups of rows. The catch is that GROUP BY collapses the individual rows that share a group value into a single summary row. You keep the grouping columns and the aggregate, and you lose everything else.
Consider a table of bike-share trips called tbl_bikeshare. Each row is one trip, and the member_type column is either Member (a monthly membership) or Casual (a 3-day pass). Here are a few rows:
start_date | duration | bike_number | member_type
----------------------+----------+-------------+------------
2017-10-01 01:00:00 | 285 | W22821 | Member
2017-10-01 01:01:00 | 816 | W00940 | Member
2017-10-01 01:01:00 | 2000 | W00201 | Casual
2017-10-01 01:02:00 | 798 | W23360 | Member
2017-10-01 01:02:00 | 3978 | W22595 | CasualTo get the average trip duration for each member type, you would write:
SELECT member_type, AVG(duration) AS avg_trip_duration
FROM tbl_bikeshare
GROUP BY member_type;member_type | avg_trip_duration
------------+-------------------
Casual | 2745.67
Member | 633.00The result has just two rows — one summary per group. You can no longer see any individual trip, its bike_number, or its exact duration. That detail is gone.
How Window Functions Keep Every Row
A window function performs a calculation across a set of rows that are related to the current row, and it returns a value for every row without collapsing anything. The aggregate appears as just another column alongside all your detail columns.
Here is the same average, computed as a window function:
SELECT bike_number, member_type, duration,
AVG(duration) OVER(PARTITION BY member_type) AS avg_trip_duration
FROM tbl_bikeshare;bike_number | member_type | duration | avg_trip_duration
------------+-------------+----------+------------------
W00201 | Casual | 2000 | 2745.67
W22595 | Casual | 3978 | 2745.67
W20604 | Casual | 2259 | 2745.67
W22821 | Member | 285 | 633.00
W00940 | Member | 816 | 633.00
W23360 | Member | 798 | 633.00Every trip is still here. Each row carries its own duration and the average for its member type. That is the superpower: you can now compare a single trip against the group it belongs to, all in one query. The keyword that makes this happen is OVER(), and the rest of this lesson is about what goes inside it.
The Four Types of Window Functions
Window functions come in four families. You will meet each one in detail in later lessons, but it helps to see the map now.
- Aggregate functions — the familiar
SUM(),AVG(),COUNT(),MIN(), andMAX(), but applied over a window instead of aGROUP BY. Covered in Lesson 3. - Ranking functions —
ROW_NUMBER(),RANK(),DENSE_RANK(), andNTILE(), which number and rank rows within a window. Covered in Lesson 4. - Offset functions —
LAG(),LEAD(),FIRST_VALUE(),LAST_VALUE(), andNTH_VALUE(), which pull a value from another row relative to or independent of the current row. Covered in Lesson 5. - Distribution functions —
CUME_DIST()andPERCENT_RANK(), used for statistical analysis of where a value falls in a distribution. Covered in Lesson 6.
What unites all four is that they are defined by an OVER() clause. Let’s look at that clause now.
The Window Function Syntax
A window function appears in the SELECT list and always pairs a function name with the OVER() clause:
function_name([expression]) OVER (
[partition_definition]
[order_definition]
[frame_definition]
) [AS column_name]The OVER() clause defines the window — the set of rows the function looks at when it computes a value for each row. It supports three building blocks:
PARTITION BYdivides the result set into partitions (groups) based on one or more columns. The function is applied separately to each partition.ORDER BYsets the order in which rows are processed inside the window, ascending (ASC) or descending (DESC).- The frame definition narrows the window to a specific band of rows around the current row. Framing is an advanced topic you will meet in Lesson 2.
Not every element is required for every function. In fact, the OVER() clause can even be empty — and that turns out to be a useful starting point.
The Empty OVER() Clause
When you write OVER() with nothing inside it, you define a window that covers the entire result set. The function is computed once across all rows and that single value is attached to every row.
Consider an employee table:
employee_id | last_name | first_name | department | title | hire_date | salary
------------+-----------+------------+------------+---------------------+------------+----------
1 | Adams | Andrew | Management | General Manager | 2002-08-13 | 108000.00
2 | Edwards | Nancy | Sales | Sales Manager | 2002-04-30 | 98900.00
3 | Peacock | Jane | Sales | Sales Support Agent | 2002-03-31 | 87000.00
4 | Park | Margaret | Sales | Sales Support Agent | 2003-05-02 | 69800.00
5 | Johnson | Steve | Sales | Sales Support Agent | 2003-10-16 | 76500.00
6 | Mitchell | Michael | IT | IT Manager | 2003-10-16 | 89900.00
7 | King | Robert | IT | IT Staff | 2004-01-01 | 67800.00
8 | Callahan | Laura | IT | IT Staff | 2004-03-03 | 78000.00
9 | John | Edward | IT | IT Staff | 2004-09-18 | 75900.00Suppose you want each employee’s details next to the company-wide average salary. With an empty OVER():
SELECT first_name, last_name, department, title, salary,
AVG(salary) OVER() AS average_salary
FROM employee;first_name | last_name | department | title | salary | average_salary
-----------+-----------+------------+---------------------+-----------+---------------
Andrew | Adams | Management | General Manager | 108000.00 | 83533.33
Nancy | Edwards | Sales | Sales Manager | 98900.00 | 83533.33
Jane | Peacock | Sales | Sales Support Agent | 87000.00 | 83533.33
Margaret | Park | Sales | Sales Support Agent | 69800.00 | 83533.33
Steve | Johnson | Sales | Sales Support Agent | 76500.00 | 83533.33
Michael | Mitchell | IT | IT Manager | 89900.00 | 83533.33
Robert | King | IT | IT Staff | 67800.00 | 83533.33
Laura | Callahan | IT | IT Staff | 78000.00 | 83533.33
Edward | John | IT | IT Staff | 75900.00 | 83533.33Every row gets the same average_salary of 83533.33, the average across all nine employees. Because the rows are not collapsed, you can now do something GROUP BY could never do in one step — compare each salary directly to the average:
SELECT first_name, last_name, salary,
salary - AVG(salary) OVER() AS difference
FROM employee;The difference column tells you how far above or below the company average each person sits. Andrew Adams is about above the mean.
Partitioning a Window with PARTITION BY
A company-wide average is useful, but most analysis happens within groups. The PARTITION BY clause splits the window into smaller windows based on one or more columns. Each partition contains only the rows that share the same value as the current row, and the function is computed separately for each.
To get the average salary per department:
SELECT first_name, last_name, department, title, salary,
AVG(salary) OVER(PARTITION BY department) AS average_salary
FROM employee;first_name | last_name | department | title | salary | average_salary
-----------+-----------+------------+---------------------+-----------+---------------
Andrew | Adams | Management | General Manager | 108000.00 | 108000.00
Michael | Mitchell | IT | IT Manager | 89900.00 | 77900.00
Robert | King | IT | IT Staff | 67800.00 | 77900.00
Laura | Callahan | IT | IT Staff | 78000.00 | 77900.00
Edward | John | IT | IT Staff | 75900.00 | 77900.00
Nancy | Edwards | Sales | Sales Manager | 98900.00 | 83050.00
Jane | Peacock | Sales | Sales Support Agent | 87000.00 | 83050.00
Margaret | Park | Sales | Sales Support Agent | 69800.00 | 83050.00
Steve | Johnson | Sales | Sales Support Agent | 76500.00 | 83050.00Now the average_salary differs by department. Because the data is partitioned by department, a salary from one department never leaks into another department’s average. This is the workhorse pattern you will use constantly: an aggregate computed per group, shown next to every detail row.
PARTITION BY is not limited to AVG(). To count how many employees share each department:
SELECT first_name, last_name, department, title,
COUNT(employee_id) OVER(PARTITION BY department) AS num_employees_department
FROM employee;Each IT employee sees 4, each Sales employee sees 4, and Andrew in Management sees 1.
Ordering a Window with ORDER BY
The third element inside OVER() is ORDER BY. It defines the logical order of rows within the whole result set or within each partition. For some functions this ordering changes what the function computes — most notably, it lets you build a running total.
Consider a small table, apple_sales_quantity_by_month:
sales_date | brand | quantity
-----------+-------+---------
2022-01-31 | Apple | 50
2022-02-28 | Apple | 40
2022-03-31 | Apple | 25
2022-04-30 | Apple | 30
2022-05-31 | Apple | 47Adding ORDER BY sales_date inside OVER() makes SUM() accumulate as it walks down the ordered rows:
SELECT sales_date, brand, quantity,
SUM(quantity) OVER(ORDER BY sales_date) AS running_total_quantity
FROM apple_sales_quantity_by_month;sales_date | brand | quantity | running_total_quantity
-----------+-------+----------+-----------------------
2022-01-31 | Apple | 50 | 50
2022-02-28 | Apple | 40 | 90
2022-03-31 | Apple | 25 | 115
2022-04-30 | Apple | 30 | 145
2022-05-31 | Apple | 47 | 192The first row’s running total is just its own quantity. The second adds the first and second. By the last row, the running total is the sum of everything. The ORDER BY inside OVER() is what makes this accumulation possible.
ORDER BY plays different roles for different window functions, which makes it one of the more nuanced parts of the OVER() clause. You will see exactly how it shapes running totals and rankings in the lessons ahead. For now, the key idea is that ordering rows inside the window is separate from ordering your final output with a regular ORDER BY at the end of the query.
Practice Exercises
Use the employee table and the apple_sales_quantity_by_month table from this lesson.
Exercise 1: Salary Difference from the Company Average
Write a query returning each employee’s first_name, last_name, salary, and the difference between their salary and the average salary of all employees, aliased as difference. Make sure all employee rows stay visible.
-- Your code hereHint
Use an empty OVER() clause to build a window over the whole table, then apply AVG(salary). Subtract that windowed average from each row’s salary: salary - AVG(salary) OVER() AS difference.
Exercise 2: Employees per Department
Write a query returning first_name, last_name, department, title, and the number of employees in the same department as the current row, aliased as num_employees_department.
Hint
Use COUNT(employee_id) OVER(PARTITION BY department). The PARTITION BY restricts the count to rows in the same department.
Exercise 3: Running Total of Monthly Sales
Using apple_sales_quantity_by_month, return sales_date, brand, quantity, and a running total of monthly sales aliased as running_total_quantity.
-- Your code hereHint
Apply SUM(quantity) and add ORDER BY sales_date inside the OVER() clause. The ordering is what turns a plain sum into an accumulating running total.
Summary
You met window functions — calculations that run across a set of related rows while keeping every individual row in your result. Unlike GROUP BY, which collapses rows into summaries, window functions add the aggregate as another column, so you can compare each row to the group it belongs to. The key is the OVER() clause, which defines the window using PARTITION BY, ORDER BY, and an optional frame.
Key Concepts
- Window function — a function that computes a value across a window of rows related to the current row, returning one value per row without collapsing the result set.
OVER()clause — defines the window; an emptyOVER()spans the entire result set.PARTITION BY— splits the window into groups so the function is computed separately within each group.ORDER BY(insideOVER()) — sets the logical order of rows in the window; it is what enables running totals and rankings.- Four families — aggregate, ranking, offset, and distribution functions, all defined by
OVER().
Why This Matters
Window functions are everywhere in real analytics work. Running totals, period-over-period comparisons, rankings, and “how does this row compare to its group” questions all flow naturally from OVER(), and they are awkward or impossible with GROUP BY alone. Analysts and data engineers reach for these patterns daily to build dashboards, detect trends, and answer business questions in a single query. Mastering the OVER() clause now sets up everything that follows in this module.
Next Steps
Continue to Lesson 2 - Window Frames: ROWS and RANGE
Control exactly which rows a window function sees using ROWS BETWEEN and RANGE BETWEEN.
Back to Module Overview
Return to the Window Functions module overview
Continue Building Your Skills
You have written your first window functions and seen why they matter. The mental model — a function computed over a window of rows, one result per row — is the foundation for everything else in this module. Keep experimenting: swap AVG() for MIN() or MAX(), change the PARTITION BY column, and watch how the windowed value moves. The more fluently you read an OVER() clause, the more powerful your analysis becomes.