Lesson 1 - Introduction to Window Functions

Welcome to Window Functions

So far in SQL, whenever you needed a total, a count, or an average across groups of rows, you reached for GROUP BY. That works well until you hit questions it simply cannot answer: What is the 7-day rolling average of daily sales? How does each employee’s salary compare to their department’s average? What is the running total of revenue over time?

These questions have something in common. They need an aggregate and the individual rows at the same time. GROUP BY cannot give you both — it collapses your rows into summary lines. Window functions can. They are one of SQL’s most powerful analytical tools, and this lesson introduces the idea and the single clause that powers it: OVER().

By the end of this lesson, you will be able to:

  • Explain what window functions are and how they differ from GROUP BY
  • Name the four categories of window functions
  • Define a window with the OVER() clause
  • Split a window into groups using PARTITION BY
  • Order rows within a window using ORDER BY

No prior window-function experience is needed — just comfort with SELECT, WHERE, and aggregate functions like SUM() and AVG(). Let’s begin.

Data for this lesson

Engine: PostgreSQL — the examples print small sample tables inline.

Tables used: tbl_bikeshare, trips, employee, apple_sales_quantity_by_month

Want to run them yourself? Download window_practice.db and use any SQLite client (SQLite supports window functions too), or recreate the tables in PostgreSQL with psql.


Window Functions vs. GROUP BY

The fastest way to understand window functions is to compare them with something you already know: the GROUP BY clause.

How GROUP BY Collapses Rows

Adding GROUP BY to a query lets you apply an aggregate function such as SUM, COUNT, or AVG to groups of rows. The catch is that GROUP BY collapses the individual rows that share a group value into a single summary row. You keep the grouping columns and the aggregate, and you lose everything else.

Consider a table of bike-share trips called tbl_bikeshare. Each row is one trip, and the member_type column is either Member (a monthly membership) or Casual (a 3-day pass). Here are a few rows:

start_date            | duration | bike_number | member_type
----------------------+----------+-------------+------------
2017-10-01 01:00:00   | 285      | W22821      | Member
2017-10-01 01:01:00   | 816      | W00940      | Member
2017-10-01 01:01:00   | 2000     | W00201      | Casual
2017-10-01 01:02:00   | 798      | W23360      | Member
2017-10-01 01:02:00   | 3978     | W22595      | Casual

To get the average trip duration for each member type, you would write:

SELECT member_type, AVG(duration) AS avg_trip_duration
  FROM tbl_bikeshare
 GROUP BY member_type;
member_type | avg_trip_duration
------------+-------------------
Casual      | 2745.67
Member      | 633.00

The result has just two rows — one summary per group. You can no longer see any individual trip, its bike_number, or its exact duration. That detail is gone.

How Window Functions Keep Every Row

A window function performs a calculation across a set of rows that are related to the current row, and it returns a value for every row without collapsing anything. The aggregate appears as just another column alongside all your detail columns.

Here is the same average, computed as a window function:

SELECT bike_number, member_type, duration,
       AVG(duration) OVER(PARTITION BY member_type) AS avg_trip_duration
  FROM tbl_bikeshare;
bike_number | member_type | duration | avg_trip_duration
------------+-------------+----------+------------------
W00201      | Casual      | 2000     | 2745.67
W22595      | Casual      | 3978     | 2745.67
W20604      | Casual      | 2259     | 2745.67
W22821      | Member      | 285      | 633.00
W00940      | Member      | 816      | 633.00
W23360      | Member      | 798      | 633.00

Every trip is still here. Each row carries its own duration and the average for its member type. That is the superpower: you can now compare a single trip against the group it belongs to, all in one query. The keyword that makes this happen is OVER(), and the rest of this lesson is about what goes inside it.


The Four Types of Window Functions

Window functions come in four families. You will meet each one in detail in later lessons, but it helps to see the map now.

  • Aggregate functions — the familiar SUM(), AVG(), COUNT(), MIN(), and MAX(), but applied over a window instead of a GROUP BY. Covered in Lesson 3.
  • Ranking functionsROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE(), which number and rank rows within a window. Covered in Lesson 4.
  • Offset functionsLAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE(), which pull a value from another row relative to or independent of the current row. Covered in Lesson 5.
  • Distribution functionsCUME_DIST() and PERCENT_RANK(), used for statistical analysis of where a value falls in a distribution. Covered in Lesson 6.

What unites all four is that they are defined by an OVER() clause. Let’s look at that clause now.


The Window Function Syntax

A window function appears in the SELECT list and always pairs a function name with the OVER() clause:

function_name([expression]) OVER (
    [partition_definition]
    [order_definition]
    [frame_definition]
) [AS column_name]

The OVER() clause defines the window — the set of rows the function looks at when it computes a value for each row. It supports three building blocks:

  • PARTITION BY divides the result set into partitions (groups) based on one or more columns. The function is applied separately to each partition.
  • ORDER BY sets the order in which rows are processed inside the window, ascending (ASC) or descending (DESC).
  • The frame definition narrows the window to a specific band of rows around the current row. Framing is an advanced topic you will meet in Lesson 2.

Not every element is required for every function. In fact, the OVER() clause can even be empty — and that turns out to be a useful starting point.


The Empty OVER() Clause

When you write OVER() with nothing inside it, you define a window that covers the entire result set. The function is computed once across all rows and that single value is attached to every row.

Consider an employee table:

employee_id | last_name | first_name | department | title               | hire_date  | salary
------------+-----------+------------+------------+---------------------+------------+----------
1           | Adams     | Andrew     | Management | General Manager     | 2002-08-13 | 108000.00
2           | Edwards   | Nancy      | Sales      | Sales Manager       | 2002-04-30 | 98900.00
3           | Peacock   | Jane       | Sales      | Sales Support Agent | 2002-03-31 | 87000.00
4           | Park      | Margaret   | Sales      | Sales Support Agent | 2003-05-02 | 69800.00
5           | Johnson   | Steve      | Sales      | Sales Support Agent | 2003-10-16 | 76500.00
6           | Mitchell  | Michael    | IT         | IT Manager          | 2003-10-16 | 89900.00
7           | King      | Robert     | IT         | IT Staff            | 2004-01-01 | 67800.00
8           | Callahan  | Laura      | IT         | IT Staff            | 2004-03-03 | 78000.00
9           | John      | Edward     | IT         | IT Staff            | 2004-09-18 | 75900.00

Suppose you want each employee’s details next to the company-wide average salary. With an empty OVER():

SELECT first_name, last_name, department, title, salary,
       AVG(salary) OVER() AS average_salary
  FROM employee;
first_name | last_name | department | title               | salary    | average_salary
-----------+-----------+------------+---------------------+-----------+---------------
Andrew     | Adams     | Management | General Manager     | 108000.00 | 83533.33
Nancy      | Edwards   | Sales      | Sales Manager       | 98900.00  | 83533.33
Jane       | Peacock   | Sales      | Sales Support Agent | 87000.00  | 83533.33
Margaret   | Park      | Sales      | Sales Support Agent | 69800.00  | 83533.33
Steve      | Johnson   | Sales      | Sales Support Agent | 76500.00  | 83533.33
Michael    | Mitchell  | IT         | IT Manager          | 89900.00  | 83533.33
Robert     | King      | IT         | IT Staff            | 67800.00  | 83533.33
Laura      | Callahan  | IT         | IT Staff            | 78000.00  | 83533.33
Edward     | John      | IT         | IT Staff            | 75900.00  | 83533.33

Every row gets the same average_salary of 83533.33, the average across all nine employees. Because the rows are not collapsed, you can now do something GROUP BY could never do in one step — compare each salary directly to the average:

SELECT first_name, last_name, salary,
       salary - AVG(salary) OVER() AS difference
  FROM employee;

The difference column tells you how far above or below the company average each person sits. Andrew Adams is about 10800083533.33=24466.67 108000 - 83533.33 = 24466.67 above the mean.


Partitioning a Window with PARTITION BY

A company-wide average is useful, but most analysis happens within groups. The PARTITION BY clause splits the window into smaller windows based on one or more columns. Each partition contains only the rows that share the same value as the current row, and the function is computed separately for each.

To get the average salary per department:

SELECT first_name, last_name, department, title, salary,
       AVG(salary) OVER(PARTITION BY department) AS average_salary
  FROM employee;
first_name | last_name | department | title               | salary    | average_salary
-----------+-----------+------------+---------------------+-----------+---------------
Andrew     | Adams     | Management | General Manager     | 108000.00 | 108000.00
Michael    | Mitchell  | IT         | IT Manager          | 89900.00  | 77900.00
Robert     | King      | IT         | IT Staff            | 67800.00  | 77900.00
Laura      | Callahan  | IT         | IT Staff            | 78000.00  | 77900.00
Edward     | John      | IT         | IT Staff            | 75900.00  | 77900.00
Nancy      | Edwards   | Sales      | Sales Manager       | 98900.00  | 83050.00
Jane       | Peacock   | Sales      | Sales Support Agent | 87000.00  | 83050.00
Margaret   | Park      | Sales      | Sales Support Agent | 69800.00  | 83050.00
Steve      | Johnson   | Sales      | Sales Support Agent | 76500.00  | 83050.00

Now the average_salary differs by department. Because the data is partitioned by department, a salary from one department never leaks into another department’s average. This is the workhorse pattern you will use constantly: an aggregate computed per group, shown next to every detail row.

PARTITION BY is not limited to AVG(). To count how many employees share each department:

SELECT first_name, last_name, department, title,
       COUNT(employee_id) OVER(PARTITION BY department) AS num_employees_department
  FROM employee;

Each IT employee sees 4, each Sales employee sees 4, and Andrew in Management sees 1.


Ordering a Window with ORDER BY

The third element inside OVER() is ORDER BY. It defines the logical order of rows within the whole result set or within each partition. For some functions this ordering changes what the function computes — most notably, it lets you build a running total.

Consider a small table, apple_sales_quantity_by_month:

sales_date | brand | quantity
-----------+-------+---------
2022-01-31 | Apple | 50
2022-02-28 | Apple | 40
2022-03-31 | Apple | 25
2022-04-30 | Apple | 30
2022-05-31 | Apple | 47

Adding ORDER BY sales_date inside OVER() makes SUM() accumulate as it walks down the ordered rows:

SELECT sales_date, brand, quantity,
       SUM(quantity) OVER(ORDER BY sales_date) AS running_total_quantity
  FROM apple_sales_quantity_by_month;
sales_date | brand | quantity | running_total_quantity
-----------+-------+----------+-----------------------
2022-01-31 | Apple | 50       | 50
2022-02-28 | Apple | 40       | 90
2022-03-31 | Apple | 25       | 115
2022-04-30 | Apple | 30       | 145
2022-05-31 | Apple | 47       | 192

The first row’s running total is just its own quantity. The second adds the first and second. By the last row, the running total is the sum of everything. The ORDER BY inside OVER() is what makes this accumulation possible.

ORDER BY plays different roles for different window functions, which makes it one of the more nuanced parts of the OVER() clause. You will see exactly how it shapes running totals and rankings in the lessons ahead. For now, the key idea is that ordering rows inside the window is separate from ordering your final output with a regular ORDER BY at the end of the query.


Practice Exercises

Use the employee table and the apple_sales_quantity_by_month table from this lesson.

Exercise 1: Salary Difference from the Company Average

Write a query returning each employee’s first_name, last_name, salary, and the difference between their salary and the average salary of all employees, aliased as difference. Make sure all employee rows stay visible.

-- Your code here

Hint

Use an empty OVER() clause to build a window over the whole table, then apply AVG(salary). Subtract that windowed average from each row’s salary: salary - AVG(salary) OVER() AS difference.

Exercise 2: Employees per Department

Write a query returning first_name, last_name, department, title, and the number of employees in the same department as the current row, aliased as num_employees_department.

Hint

Use COUNT(employee_id) OVER(PARTITION BY department). The PARTITION BY restricts the count to rows in the same department.

Exercise 3: Running Total of Monthly Sales

Using apple_sales_quantity_by_month, return sales_date, brand, quantity, and a running total of monthly sales aliased as running_total_quantity.

-- Your code here

Hint

Apply SUM(quantity) and add ORDER BY sales_date inside the OVER() clause. The ordering is what turns a plain sum into an accumulating running total.


Summary

You met window functions — calculations that run across a set of related rows while keeping every individual row in your result. Unlike GROUP BY, which collapses rows into summaries, window functions add the aggregate as another column, so you can compare each row to the group it belongs to. The key is the OVER() clause, which defines the window using PARTITION BY, ORDER BY, and an optional frame.

Key Concepts

  • Window function — a function that computes a value across a window of rows related to the current row, returning one value per row without collapsing the result set.
  • OVER() clause — defines the window; an empty OVER() spans the entire result set.
  • PARTITION BY — splits the window into groups so the function is computed separately within each group.
  • ORDER BY (inside OVER()) — sets the logical order of rows in the window; it is what enables running totals and rankings.
  • Four families — aggregate, ranking, offset, and distribution functions, all defined by OVER().

Why This Matters

Window functions are everywhere in real analytics work. Running totals, period-over-period comparisons, rankings, and “how does this row compare to its group” questions all flow naturally from OVER(), and they are awkward or impossible with GROUP BY alone. Analysts and data engineers reach for these patterns daily to build dashboards, detect trends, and answer business questions in a single query. Mastering the OVER() clause now sets up everything that follows in this module.


Next Steps

Continue to Lesson 2 - Window Frames: ROWS and RANGE

Control exactly which rows a window function sees using ROWS BETWEEN and RANGE BETWEEN.

Back to Module Overview

Return to the Window Functions module overview


Continue Building Your Skills

You have written your first window functions and seen why they matter. The mental model — a function computed over a window of rows, one result per row — is the foundation for everything else in this module. Keep experimenting: swap AVG() for MIN() or MAX(), change the PARTITION BY column, and watch how the windowed value moves. The more fluently you read an OVER() clause, the more powerful your analysis becomes.