Lesson 4 - Common Table Expressions (CTEs)

Welcome to Common Table Expressions

Nested subqueries are powerful, but reading them means working inside-out, mentally unwrapping layer after layer of parentheses. There is a cleaner way. A common table expression (CTE) lets you give a subquery a name and define it up front, then refer to that name in your main query — so your logic reads top to bottom, like a story.

In this lesson you will write single and multiple CTEs with the WITH clause, and then meet recursive CTEs, which can traverse tree-structured data such as a company’s reporting hierarchy.

By the end of this lesson, you will be able to:

  • Replace a subquery with a named CTE using the WITH clause
  • Chain multiple CTEs together in one query
  • Apply CTE style conventions for clean, readable SQL
  • Explain the idea of recursion
  • Write a recursive CTE to traverse a hierarchy

You should be comfortable with multi-column subqueries from Lesson 2. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: invoice, customer, invoice_line, track, genre, employee

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


Your First CTE

A common table expression gives a name to a subquery, much like an alias. You define it with the WITH clause, and then use that name as a table in the query that follows.

From Subquery to CTE

Here is a query from Lesson 2 that averages the per-city sales counts, written with a subquery in the FROM clause:

SELECT AVG(billing_city_tally) AS billing_city_tally_avg
  FROM (SELECT billing_city, COUNT(*) AS billing_city_tally
          FROM invoice
         GROUP BY billing_city);

Rewritten as a CTE, the inner query moves to the top, gets a name, and the main query reads cleanly:

WITH
city_sales_table AS (
SELECT billing_city, COUNT(*) AS billing_city_tally
  FROM invoice
 GROUP BY billing_city
)

SELECT AVG(billing_city_tally) AS billing_city_tally_avg
  FROM city_sales_table;

The result is identical, but the structure is obvious: first define city_sales_table, then query it. The two benefits are clear — readability (you read top to bottom) and reusability (you can reference the named table more than once without rewriting it).

Note that the CTE’s name is the link: change city_sales_table to something else in the WITH clause, and you must change it in the FROM clause too, or the query errors.

A CTE in Place of a Joined Subquery

Any subquery-as-table can become a CTE. Here is the customer-average join from Lesson 2, rewritten:

WITH
customer_avg_table AS (
SELECT customer_id, AVG(total) AS total_avg
  FROM invoice
 GROUP BY customer_id
)

SELECT c.last_name, c.first_name, i.total_avg
  FROM customer AS c
  JOIN customer_avg_table AS i
    ON c.customer_id = i.customer_id;

The join now references the named CTE instead of a parenthesized subquery — easier to read and easier to change.


Multiple CTEs

You are not limited to one CTE. Define several by separating them with commas, and the main query can use them all.

The Syntax

WITH
table_name_1 AS (
  SELECT ...
),
table_name_2 AS (
  SELECT ...
)

SELECT ...
  FROM table_name_1 t1
  JOIN table_name_2 t2
    ON t1.some_id = t2.some_id;

Replacing Two Joined Subqueries

Recall the “average sales per country per customer” query that joined two subqueries. With two CTEs it becomes far more legible:

WITH
country_invoice_total_table AS (
  SELECT billing_country, SUM(total) AS invoice_total
    FROM invoice
   GROUP BY billing_country
),
country_total_table AS (
  SELECT country, COUNT(*) AS customer_tally
    FROM customer
   GROUP BY country
)

SELECT ct.country,
       ROUND(i.invoice_total / ct.customer_tally, 2) AS sale_avg
  FROM country_invoice_total_table AS i
  JOIN country_total_table AS ct
    ON i.billing_country = ct.country
 ORDER BY sale_avg DESC
 LIMIT 5;

Each CTE has a clear, descriptive name. The join at the bottom reads almost like plain English: join country totals to customer counts and compute the average.


CTE Style and Constraints

The name says it all — a Common Table Expression behaves like a table, so you use it wherever a table is allowed. A few style conventions keep multi-CTE queries readable:

  1. Write WITH on its own line.
  2. Start each CTE on a new line: cte_name AS (.
  3. Put the closing parenthesis on its own line, aligned with the start of the CTE’s name.
  4. Separate CTEs with a comma after the closing parenthesis.
  5. Align the CTEs and the final query to the same river.

CTEs can even build on earlier CTEs in the same WITH block. This lets you express a long chain of steps as a readable pipeline — each CTE refining the one before it:

WITH
usa_invoice_table AS (
SELECT invoice_id
  FROM invoice
 WHERE billing_country = 'USA'
),
invoice_line_track_table AS (
SELECT il.*, t.milliseconds, t.genre_id
  FROM invoice_line il
  JOIN track t
    ON il.track_id = t.track_id
),
usa_track_table AS (
SELECT i.*
  FROM invoice_line_track_table i
  JOIN usa_invoice_table u
    ON i.invoice_id = u.invoice_id
),
metal_genre_table AS (
SELECT genre_id
  FROM genre
 WHERE name LIKE '%Metal%'
),
metal_usa_track_table AS (
SELECT *
  FROM usa_track_table u
  JOIN metal_genre_table m
    ON u.genre_id = m.genre_id
)

SELECT invoice_id,
       SUM(quantity * unit_price) AS total,
       SUM(milliseconds) / 1000.0 / 60 AS minute
  FROM metal_usa_track_table
 GROUP BY invoice_id;

This is the same “Metal tracks sold in the USA” problem from Lesson 3 — but instead of nesting subqueries inside parentheses, each step is a named CTE that the next step references. Compare the two and you will feel why teams reach for CTEs on anything non-trivial.


Recursion and Recursive CTEs

CTEs do more than tidy up subqueries. A recursive CTE can repeatedly reference itself, which is the only practical way to traverse a tree-structured table in SQL — for example, a company org chart.

What Recursion Is

Recursion means a process that repeats by referring to itself, working on new data each time until it can go no further. Picture counting people in a line: the first says “I am 1,” the next says “I am 1 + 1, so 2,” and so on until someone has no one behind them. Every recursive process has three parts:

  1. Initialization — a starting value (the first person says “I am 1”).
  2. Recursive step — the action that calls itself (each person adds 1 to the previous count).
  3. Stop condition — what ends the recursion (no one left behind).

A Self-Referencing Table

The Chinook employee table has a reports_to column that points back to another row in the same table — that is what makes it a tree. The CEO reports to no one (reports_to IS NULL); everyone else reports to a manager above them.

Writing a Recursive CTE

A recursive CTE has four ingredients: the RECURSIVE keyword, an anchor member (the tree root), a compound operator like UNION ALL, and a recursive member that references the CTE itself. Here we walk the hierarchy below employee 1, tracking each person’s depth:

WITH RECURSIVE
under_adams_table(employee_id, last_name, first_name, level) AS (
SELECT employee_id, last_name, first_name, 0 AS level
  FROM employee
 WHERE employee_id = 1

 UNION ALL

SELECT e.employee_id,
       e.last_name,
       e.first_name,
       u.level + 1 AS level
  FROM employee e
  JOIN under_adams_table u
    ON e.reports_to = u.employee_id
)

SELECT SUBSTR('>>>', 1, level) || ' ' || last_name || ' ' || first_name AS hierarchy
  FROM under_adams_table;

Output:

hierarchy
Adams Andrew
> Edwards Nancy
> Mitchell Michael
» Peacock Jane
» Park Margaret
» Johnson Steve
» King Robert
» Callahan Laura

Here is how it runs:

  1. The anchor member selects the root — employee 1, Andrew Adams, at level 0.
  2. UNION ALL is the compound operator that stacks new rows onto the result.
  3. The recursive member joins employee back to the CTE, finding everyone whose reports_to matches an employee already in the table, and increments level by 1.
  4. SQL repeats the recursive step until no new matches are found.

The final SELECT uses SUBSTR('>>>', 1, level) to print a > for each level of depth, drawing the hierarchy visually.

Building a Path

A small change to the recursive member produces a full path string from the root to each employee:

WITH RECURSIVE
managers_chain(employee_id, path) AS (

SELECT employee_id, last_name || ' ' || first_name AS path
  FROM employee
 WHERE reports_to IS NULL

 UNION ALL

SELECT e.employee_id,
       m.path || '<--' || e.last_name || ' ' || e.first_name AS path
  FROM employee e
  JOIN managers_chain m
    ON e.reports_to = m.employee_id
)

SELECT path
  FROM managers_chain;

Output:

path
Adams Andrew
Adams Andrew<–Edwards Nancy
Adams Andrew<–Mitchell Michael
Adams Andrew<–Edwards Nancy<–Peacock Jane
Adams Andrew<–Edwards Nancy<–Park Margaret
Adams Andrew<–Edwards Nancy<–Johnson Steve
Adams Andrew<–Mitchell Michael<–King Robert
Adams Andrew<–Mitchell Michael<–Callahan Laura

The anchor starts each path at the root, and the recursive member appends <-- and the next employee’s name at every level. Recursive CTEs are how you handle org charts, category trees, bills of materials, and any other parent-child hierarchy in SQL.


Practice Exercises

Use the Chinook database for each exercise.

Exercise 1: Rewrite a Subquery as a CTE

Take the average-of-per-country-totals query from Lesson 2 and rewrite it using a single CTE named country_totals.

Hint

Define country_totals AS (SELECT billing_country, SUM(total) AS country_total FROM invoice GROUP BY billing_country), then SELECT AVG(country_total) FROM country_totals.

Exercise 2: Two CTEs

Using two CTEs — one counting invoices per country, one counting customers per country — return each country’s invoices-per-customer ratio, ordered from highest to lowest.

Hint

Name the CTEs invoices_by_country and customers_by_country, join them on the country name, and divide the two counts in the final SELECT.

Exercise 3: Traverse the Hierarchy

Modify the under_adams_table recursive CTE so it lists every employee who reports (directly or indirectly) to employee 2 instead of employee 1.

Hint

Change the anchor member’s filter to WHERE employee_id = 2. The recursive member stays the same.


Summary

You learned to name subqueries with the WITH clause, turning hard-to-read nested queries into top-to-bottom pipelines. You chained multiple CTEs, applied clean style conventions, and used recursive CTEs to traverse a self-referencing table — the standard tool for hierarchies.

Key Concepts

  • Common table expression (CTE) — a named subquery defined with WITH and used like a table.
  • WITH clause — introduces one or more CTEs before the main query.
  • Multiple CTEs — separated by commas; later CTEs can reference earlier ones.
  • Recursive CTE — a CTE that references itself, built from an anchor member, a compound operator (UNION ALL), and a recursive member.
  • Anchor member — the non-recursive starting query; the root of the traversal.

Why This Matters

CTEs are how professionals keep complex SQL maintainable. A query built from named steps is far easier to read, debug, and hand off than a deeply nested subquery. And recursive CTEs solve a whole class of problems — org charts, category trees, dependency graphs — that are otherwise impossible to express in standard SQL.


Next Steps

CTEs live only for the duration of a single query. What if you want to save a query and reuse it across many queries, as if it were a permanent table? That is exactly what views do — and they are next.

Continue to Lesson 5 - Creating and Using Views

Save queries as reusable views, build views on top of views, and drop them when you're done

Back to Module Overview

Return to the Subqueries, CTEs, and Views module overview


Continue Building Your Skills

CTEs will quickly become your default way to structure any query longer than a few lines. They cost nothing in performance on most databases and pay you back every time you or a teammate reads the query later. Reach for WITH whenever a subquery starts to feel cramped.