Lesson 5 - Offset Functions: LAG, LEAD, and FIRST_VALUE

Welcome to Offset Functions

Many analytical questions are about neighbors. How does this month compare to last month? What was the change from the previous quarter? What was the first value in this group, or the third-highest? Offset window functions answer these by reaching into other rows of the window — without the self-joins these tasks would otherwise require.

There are two flavors. Relative offsets (LAG() and LEAD()) point to a row a fixed number of positions before or after the current row. Absolute offsets (FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()) point to a fixed position in the window — its first, last, or nth row — regardless of where the current row sits.

A simple analogy: ask a stranger for directions and they can answer relatively (“two stores north of here”) or absolutely (“at the corner of 6th and Spring”). Offset functions support both.

By the end of this lesson, you will be able to:

  • Use LAG() and LEAD() to compare each row to a prior or following row
  • Use FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE() to pull fixed positions from a window
  • Apply the correct frame so LAST_VALUE() and NTH_VALUE() return the right result
  • Solve period-over-period and anomaly-detection problems with offset functions

This lesson builds on framing (Lesson 2) and CTEs. Let’s get started.

Data for this lesson

Engine: PostgreSQL — the examples print small sample tables inline.

Tables used: employees, phone_sales_quantity_by_month, phone_sales_revenue_by_month

Want to run them yourself? Download window_practice.db and use any SQLite client (SQLite supports window functions too), or recreate the tables in PostgreSQL with psql.


The LEAD() Function

LEAD() returns a value from a row below the current row — a specified number of rows forward. The default offset is 1 (the very next row).

LEAD(expression, offset, default_value) OVER (
    [PARTITION BY ...]
    ORDER BY ...
)
  • expression — the column to read from the forward row.
  • offset (optional) — how many rows forward; defaults to 1.
  • default_value (optional) — what to return when no such row exists; defaults to NULL.
  • ORDER BY is required — it defines what “next” means. PARTITION BY is optional.

LAG() and LEAD() do not support framing, because the offset is already relative to the current row.

Using phone_sales_quantity_by_month (six months, two brands), this computes next month’s sales and the change versus the current month, per brand:

SELECT *,
       LEAD(quantity) OVER(PARTITION BY brand ORDER BY sales_date) AS next_month_sales,
       LEAD(quantity) OVER(PARTITION BY brand ORDER BY sales_date) - quantity AS sales_diff
  FROM phone_sales_quantity_by_month;
sales_date | brand   | quantity | next_month_sales | sales_diff
-----------+---------+----------+------------------+-----------
2022-01-31 | Apple   | 110      | 60               | -50
2022-02-28 | Apple   | 60       | 85               | 25
2022-03-31 | Apple   | 85       | 134              | 49
2022-04-30 | Apple   | 134      | 90               | -44
2022-05-31 | Apple   | 90       | 100              | 10
2022-06-30 | Apple   | 100      | (null)           | (null)
2022-01-31 | Samsung | 117      | 75               | -42
...        |         |          |                  |
2022-06-30 | Samsung | 89       | (null)           | (null)

LEAD(quantity) pulls the next month’s quantity into the current row. Subtracting the current quantity gives the month-to-month change. The last row of each partition has no following row, so next_month_sales and sales_diff are NULL.


The LAG() Function

LAG() is the mirror image of LEAD(): it reads from a row above the current row — a specified number of rows back. The syntax is identical.

The phone_sales_revenue_by_month table holds monthly revenue per brand. To compare each month with the same month one quarter earlier, use an offset of 3 and a default of 0 for the first quarter:

SELECT sales_date, brand, revenue,
       LAG(revenue, 3, 0.0) OVER(PARTITION BY brand ORDER BY sales_date) AS prev_quarter_revenue,
       revenue - LAG(revenue, 3, 0.0) OVER(PARTITION BY brand ORDER BY sales_date) AS difference
  FROM phone_sales_revenue_by_month;
sales_date | brand   | revenue  | prev_quarter_revenue | difference
-----------+---------+----------+----------------------+-----------
2022-01-31 | Apple   | 49950.00 | 0                    | 49950
2022-02-28 | Apple   | 36960.00 | 0                    | 36960
2022-03-31 | Apple   | 24975.00 | 0                    | 24975
2022-04-30 | Apple   | 17970.00 | 49950                | -31980
2022-05-31 | Apple   | 28753.00 | 36960                | -8207
2022-06-30 | Apple   | 23960.00 | 24975                | -1015
2022-01-31 | Samsung | 48921.00 | 0                    | 48921
...        |         |          |                      |
2022-06-30 | Samsung | 59821.00 | 25790                | 34031

The offset of 3 reaches back three rows (three months), so April compares to January. The default_value of 0.0 keeps the first three months from showing NULL. This pattern — comparing a row to a fixed distance in the past — is the foundation of period-over-period analysis.

LAG() and LEAD() pair beautifully with CASE to label changes. For example, classifying month-over-month revenue movement:

SELECT sales_date, brand, revenue,
       CASE
           WHEN revenue - LAG(revenue) OVER(PARTITION BY brand ORDER BY sales_date) > 0 THEN 'Increase'
           WHEN revenue - LAG(revenue) OVER(PARTITION BY brand ORDER BY sales_date) < 0 THEN 'Decrease'
           ELSE 'No Change'
       END AS revenue_change
  FROM phone_sales_revenue_by_month;

The FIRST_VALUE() Function

The remaining three functions are absolute offsets: they point to a fixed position in the window. Because that position is defined relative to the window’s start and end, framing becomes relevant for them (unlike LAG/LEAD).

FIRST_VALUE() returns the value from the first row of the window:

FIRST_VALUE(expression) OVER (
    [PARTITION BY ...]
    ORDER BY ...
    [frame clause]
)

Using the employees table, to show the first-hired employee in each department next to every employee:

SELECT department, first_name || ' ' || last_name AS full_name, hire_date,
       FIRST_VALUE(first_name || ' ' || last_name) OVER (
           PARTITION BY department
           ORDER BY hire_date
       ) AS hired_first
  FROM employees
 WHERE department IN ('Sales', 'IT');
department | full_name        | hire_date  | hired_first
-----------+------------------+------------+-----------------
IT         | Michael Mitchell | 2003-10-16 | Michael Mitchell
IT         | Robert King      | 2004-01-01 | Michael Mitchell
IT         | Laura Callahan   | 2004-03-03 | Michael Mitchell
IT         | Edward John      | 2004-09-18 | Michael Mitchell
Sales      | Jane Peacock     | 2002-03-31 | Jane Peacock
Sales      | Nancy Edwards    | 2002-04-30 | Jane Peacock
...        |                  |            |

Every IT row shows Michael Mitchell (hired first in IT) and every Sales row shows Jane Peacock. FIRST_VALUE() works correctly with the default frame, because the default frame always includes the first row of the partition.


The LAST_VALUE() Function

LAST_VALUE() returns the value from the last row of the window — but it comes with a trap. The default frame ends at the current row, not the end of the partition. So with the default frame, LAST_VALUE() returns the current row’s value, which is almost never what you want.

Watch what the default frame produces:

SELECT department, first_name || ' ' || last_name AS full_name, hire_date,
       LAST_VALUE(hire_date) OVER (
           PARTITION BY department
           ORDER BY hire_date
       ) AS last_hire_date
  FROM employees
 WHERE department IN ('Sales', 'IT');
department | full_name        | hire_date  | last_hire_date
-----------+------------------+------------+---------------
IT         | Michael Mitchell | 2003-10-16 | 2003-10-16
IT         | Robert King      | 2004-01-01 | 2004-01-01
...        |                  |            |

The last_hire_date just repeats each row’s own hire_date — useless. The fix is to extend the frame to the end of the partition with ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING:

SELECT department, first_name || ' ' || last_name AS full_name, hire_date,
       LAST_VALUE(hire_date) OVER (
           PARTITION BY department
           ORDER BY hire_date
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS last_hire_date
  FROM employees
 WHERE department IN ('Sales', 'IT');
department | full_name        | hire_date  | last_hire_date
-----------+------------------+------------+---------------
IT         | Michael Mitchell | 2003-10-16 | 2004-09-18
IT         | Robert King      | 2004-01-01 | 2004-09-18
IT         | Laura Callahan   | 2004-03-03 | 2004-09-18
IT         | Edward John      | 2004-09-18 | 2004-09-18
Sales      | Jane Peacock     | 2002-03-31 | 2003-10-16
...        |                  |            |

Now every row shows the actual last hire date in its department.

Always frame LAST_VALUE()

FIRST_VALUE() is safe with the default frame, but LAST_VALUE() is not — the default frame stops at the current row. Whenever you use LAST_VALUE(), add ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING (or UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) to reach the true last row.


The NTH_VALUE() Function

NTH_VALUE(expression, n) returns the value from the nth row of the window frame, counting from the first row.

NTH_VALUE(expression, offset) OVER (
    [PARTITION BY ...]
    ORDER BY ...
    [frame clause]
)

Both arguments are required, and like LAST_VALUE(), you should specify a full frame so the function can see the whole partition. To find each department’s third-highest salary, order by salary descending and frame across all rows:

WITH third_highest_salary AS (
    SELECT department, first_name || ' ' || last_name AS full_name, hire_date, salary,
           NTH_VALUE(salary, 3) OVER (
               PARTITION BY department
               ORDER BY salary DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS third_highest_salary
      FROM employees
     WHERE department IN ('IT', 'Sales')
)
SELECT full_name, department, hire_date, salary
  FROM third_highest_salary
 WHERE salary = third_highest_salary;
full_name     | department | hire_date  | salary
--------------+------------+------------+---------
Edward John   | IT         | 2004-09-18 | 75900.00
Steve Johnson | Sales      | 2003-10-16 | 76500.00

The frame ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING makes the entire department visible, so NTH_VALUE(salary, 3) reliably returns the third-largest salary. The outer query then keeps only employees who earn exactly that amount.


Detecting Anomalies with LAG and LEAD

Offset functions excel at time-series analysis. By comparing each row to both its previous and next rows, you can flag unusual spikes or drops. This query labels a month as an anomaly if its revenue exceeds 1.4x either neighbor:

SELECT sales_date, brand, revenue,
       CASE
           WHEN revenue > LAG(revenue) OVER(PARTITION BY brand ORDER BY sales_date) * 1.4
             OR revenue > LEAD(revenue) OVER(PARTITION BY brand ORDER BY sales_date) * 1.4
           THEN 'Anomaly'
           ELSE 'Normal'
       END AS sales_status
  FROM phone_sales_revenue_by_month
 ORDER BY brand, sales_date;
sales_date | brand   | revenue  | sales_status
-----------+---------+----------+-------------
2022-01-31 | Apple   | 49950.00 | Normal
2022-02-28 | Apple   | 36960.00 | Anomaly
2022-03-31 | Apple   | 24975.00 | Normal
2022-04-30 | Apple   | 17970.00 | Normal
2022-05-31 | Apple   | 28753.00 | Anomaly
2022-06-30 | Apple   | 23960.00 | Normal
2022-01-31 | Samsung | 48921.00 | Normal
...        |         |          |

Each month is compared against the one before and after it, all without a single join.


Practice Exercises

Use phone_sales_quantity_by_month and phone_sales_revenue_by_month.

Exercise 1: Month-over-Month Percentage Change

Return brand, sales_date, quantity, next month’s quantity (next_month_sales), and the percentage change to next month (sales_percentage_change), sorted by brand and sales_date.

-- Your code here

Hint

Use LEAD(quantity) OVER(PARTITION BY brand ORDER BY sales_date) for next month. For the percentage, compute (LEAD(quantity) OVER(...) - quantity) / quantity::numeric * 100. The ::numeric cast keeps the division from truncating.

Exercise 2: First and Last Month Comparison

Against phone_sales_revenue_by_month, return sales_date, brand, revenue, the percentage change from each brand’s first month (first_month_pct_change), and the percentage change from each brand’s last month (last_month_pct_change), rounded to two decimals.

Hint

Use FIRST_VALUE(revenue) OVER(PARTITION BY brand ORDER BY sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) for the first month, and LAST_VALUE(revenue) OVER(... ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) for the last. Wrap each (revenue - value) / value * 100 in ROUND(..., 2).

Exercise 3: Second-Best Month per Brand

Find the second-highest-revenue month for each brand. Return brand, sales_date, and second_highest_revenue.

Hint

In a CTE, compute NTH_VALUE(revenue, 2) OVER(PARTITION BY brand ORDER BY revenue DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_highest_revenue, then filter the outer query with WHERE revenue = second_highest_revenue.


Summary

You learned to reach into other rows of a window. LAG() and LEAD() pull values from a fixed number of rows back or forward, making period-over-period comparisons and anomaly detection simple. FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE() pull values from fixed positions in the window — but LAST_VALUE() and NTH_VALUE() need an explicit frame to see the whole partition. All of this happens without a single self-join.

Key Concepts

  • LAG() / LEAD() — relative offsets that read from a row a fixed distance before or after the current row; ORDER BY is required and framing is not supported.
  • Default and offset arguments — both accept an offset (distance) and a default_value for when the target row does not exist.
  • FIRST_VALUE() — returns the window’s first value; safe with the default frame.
  • LAST_VALUE() — returns the window’s last value; needs ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING to be correct.
  • NTH_VALUE(expr, n) — returns the nth value; frame across the whole partition with UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Why This Matters

Period-over-period change, growth rates, and “compare to first/last/nth” are core to almost every business dashboard and financial report. Before window functions, these required clumsy self-joins; now they are a single clause. The LAST_VALUE() framing gotcha is a classic interview question and a frequent real-world bug — knowing it marks you as someone who truly understands window frames.


Next Steps

Continue to Lesson 6 - Distribution and Percentile Functions

Measure where a value falls with CUME_DIST() and PERCENT_RANK(), find percentiles, and simplify queries with the WINDOW clause.

Back to Module Overview

Return to the Window Functions module overview


Continue Building Your Skills

Try building a small monthly metrics table and computing month-over-month change, quarter-over-quarter change, and a “compared to first month” growth column all in one query. Pay special attention to the LAST_VALUE() frame — run it once with the default frame and once with the explicit frame so the difference sticks. These offset patterns will serve you in every time-series analysis you ever write.