Lesson 3 - Nested and Correlated Subqueries
Welcome to Nested and Correlated Subqueries
So far your subqueries have run on their own — you could copy the inner query out, run it, and get a result independent of the outer query. In this lesson you will meet two more powerful patterns. A correlated subquery depends on the outer query and runs once per outer row. A nested subquery lives inside another subquery, letting you reach across three or more tables.
These are not new types of subquery — a correlated or nested subquery can still be scalar, multi-row, or multi-column. They are new ways of connecting subqueries to the queries around them.
By the end of this lesson, you will be able to:
- Recognize and write a correlated subquery that references the outer query
- Use the
EXISTSandNOT EXISTSoperators - Nest a subquery inside another subquery, several levels deep
- Apply a systematic approach to building complex subquery solutions
You should be comfortable with scalar, multi-row, and multi-column subqueries from Lessons 1 and 2. Let’s begin.
Data for this lesson
Database: chinook.db — a digital music store (SQLite).
Tables used: customer, invoice, track, invoice_line, playlist, playlist_track, employee, genre
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
Correlated Subqueries
A correlated subquery is an inner query that uses values from the outer query. Because it references the outer query, it cannot run on its own — and it is re-evaluated for each row the outer query processes.
A First Example
Here we attach each customer’s average sale to their name:
SELECT last_name,
first_name,
(SELECT AVG(total)
FROM invoice i
WHERE c.customer_id = i.customer_id) AS total_avg
FROM customer c;Look at the WHERE clause inside the subquery: c.customer_id = i.customer_id. The c refers to the outer query’s customer table. That reference is what makes this subquery correlated — for each customer row in the outer query, the subquery computes the average of just that customer’s invoices. Try to run the inner query by itself and it fails, because c.customer_id has no meaning without the outer query.
When to Prefer a Correlated Subquery
You could get the same result with a join to a grouped subquery:
SELECT last_name, first_name, total_avg
FROM customer c
JOIN (SELECT customer_id, AVG(total) AS total_avg
FROM invoice
GROUP BY customer_id) i
ON c.customer_id = i.customer_id;Both work. A correlated subquery often reads more naturally when the inner and outer queries refer to the same table. For example, to display every invoice whose total is greater than the average total in its own country:
SELECT *
FROM invoice i1
WHERE total > (SELECT AVG(total)
FROM invoice i2
WHERE i1.billing_country = i2.billing_country);For each invoice i1, the subquery computes the average total for that invoice’s country, then the outer query keeps the invoice only if it beats its own country’s average. Expressing “compare each row to a group it belongs to” is exactly what correlated subqueries do best.
The EXISTS and NOT EXISTS Operators
Correlated subqueries pair naturally with the EXISTS operator. EXISTS is a logical test that returns true if the subquery returns any row at all — it does not care about the values, only whether rows exist.
Finding Rows That Have a Match
To list customers who have purchased something:
SELECT first_name, last_name
FROM customer c
WHERE EXISTS (SELECT *
FROM invoice i
WHERE c.customer_id = i.customer_id);For each customer, the correlated subquery looks for at least one matching invoice. If one exists, EXISTS is true and the customer is kept.
| first_name | last_name |
|---|---|
| Luís | Gonçalves |
| Leonie | Köhler |
| François | Tremblay |
Finding Rows With No Match
Negate it with NOT EXISTS to find rows that have no match. To find customers who never purchased anything:
SELECT first_name, last_name
FROM customer c
WHERE NOT EXISTS (SELECT *
FROM invoice i
WHERE c.customer_id = i.customer_id);In Chinook this returns an empty table — every customer has bought at least once. The same pattern finds tracks that have never been sold by checking the invoice_line table:
SELECT *
FROM track t
WHERE NOT EXISTS (SELECT *
FROM invoice_line il
WHERE t.track_id = il.track_id);For each track, the subquery looks for any invoice line referencing it. If none exists, the track has never been sold and is kept. NOT EXISTS is often the clearest way to express “things with no related records.”
Nested Subqueries
A nested subquery is a subquery inside another subquery. When information is spread across more than three tables, nesting lets you step from one table to the next without writing a single giant join.
Stepping Across Three Tables
Suppose you want the names of playlists that contain at least one track lasting 15 minutes or more (900000 milliseconds). That spans three tables: playlist, playlist_track, and track. Nested subqueries walk the chain:
SELECT name
FROM playlist
WHERE playlist_id IN (SELECT playlist_id
FROM playlist_track
WHERE track_id IN (SELECT track_id
FROM track
WHERE milliseconds >= 900000));Read from the innermost query outward. The deepest subquery finds long tracks. The middle subquery finds the playlists containing those tracks. The outer query returns the names of those playlists. Each subquery feeds a list to the one above it.
Another Nested Example
The same approach finds employees who supported customers whose total spend exceeded 40 dollars:
SELECT last_name, first_name
FROM employee
WHERE employee_id IN (SELECT support_rep_id
FROM customer
WHERE customer_id IN (SELECT customer_id
FROM invoice
GROUP BY customer_id
HAVING SUM(total) > 40));The innermost query finds big-spending customers, the middle query finds their support reps, and the outer query returns those employees’ names. Notice that nesting and the other subquery features combine freely — here the innermost subquery uses GROUP BY and HAVING.
A Systematic Approach to Complex Subqueries
When a question needs several tables, writing the query in one shot is overwhelming. Use a repeatable process instead:
- Identify the tables that hold the information you need.
- Break the question into sub-problems.
- Write a small query for each sub-problem and confirm it works.
- Combine and modify those queries into a single statement.
- Optimize if needed — there is usually more than one valid approach.
Worked Example
Say you want, for each invoice in the United States, its total purchase amount and the total minutes of Metal-genre tracks it contains. That needs invoice, invoice_line, track, and genre.
First, the US customers’ invoices:
SELECT invoice_id
FROM invoice
WHERE billing_country = 'USA';Next, the Metal genre ids:
SELECT genre_id
FROM genre
WHERE name LIKE '%Metal%';Then join invoice_line and track so price, quantity, and duration sit together, filtered to US invoices:
SELECT il.*, t.milliseconds, t.genre_id
FROM invoice_line il
JOIN track t
ON il.track_id = t.track_id
WHERE invoice_id IN (SELECT invoice_id
FROM invoice
WHERE billing_country = 'USA');Finally, wrap that joined result as a subquery in the FROM clause, filter it to Metal tracks with the genre subquery, and aggregate per invoice:
SELECT tr.invoice_id AS invoice_id,
SUM(tr.quantity * tr.unit_price) AS total,
SUM(tr.milliseconds) / 1000.0 / 60 AS minute
FROM (SELECT il.*, t.milliseconds, t.genre_id
FROM invoice_line il
JOIN track t
ON il.track_id = t.track_id
WHERE invoice_id IN (SELECT invoice_id
FROM invoice
WHERE billing_country = 'USA')) tr
WHERE tr.genre_id IN (SELECT genre_id
FROM genre
WHERE name LIKE '%Metal%')
GROUP BY tr.invoice_id;Output (sample):
| invoice_id | total | minute |
|---|---|---|
| 5 | 4.95 | 22.606333 |
| 38 | 2.97 | 19.749417 |
| 39 | 4.95 | 25.604750 |
| 59 | 5.94 | 41.471950 |
| 60 | 1.98 | 9.148067 |
The query looks long, but you built it one sub-problem at a time. That is the only way to write complex SQL reliably.
Practice Exercises
Use the Chinook database for each exercise.
Exercise 1: Above the Customer’s Own Average
Write a correlated subquery that returns every invoice whose total is greater than the average total for that same customer. Select at least invoice_id and customer_id.
Hint
In the outer query alias the table i1; in the subquery alias it i2 and add WHERE i1.customer_id = i2.customer_id.
Exercise 2: Albums With No Sales
Use NOT EXISTS to find albums that have never had any of their tracks sold. You will correlate across album, track, and invoice_line.
Hint
For each album, check that no track belonging to it appears in invoice_line. A nested NOT EXISTS or a correlated subquery joining track to invoice_line both work.
Exercise 3: Playlists With Classical Tracks
Use nested subqueries to return the names of playlists that contain at least one track whose genre name is Classical.
Hint
Start from the innermost query: SELECT genre_id FROM genre WHERE name = 'Classical'. Use it to find track_id values, then playlist_id values, then playlist names.
Summary
You learned that a correlated subquery references the outer query and runs once per outer row, making it ideal for comparing each row to a group it belongs to. You used EXISTS and NOT EXISTS to test for the presence or absence of related rows, and you nested subqueries to walk across several tables. Most importantly, you saw a systematic way to build complex queries from small, verified pieces.
Key Concepts
- Correlated subquery — an inner query that uses values from the outer query and runs per outer row; it cannot execute on its own.
EXISTS— true if the subquery returns at least one row; pairs naturally with correlated subqueries.NOT EXISTS— true if the subquery returns no rows; the clearest way to find records with no match.- Nested subquery — a subquery inside another subquery, used to span more than three tables.
- Systematic approach — identify tables, split into sub-problems, solve each, combine, then optimize.
Why This Matters
Correlated subqueries and EXISTS answer some of the most common real questions: “rows above their group’s average,” “customers with no recent orders,” “products that never sold.” Nesting lets you reach data scattered across many tables. And the step-by-step build process is the difference between staring at a blank editor and confidently assembling a query that works the first time.
Next Steps
Nested and correlated subqueries are powerful but can grow hard to read. Next you will learn common table expressions — the WITH clause — which let you name each subquery and lay your logic out top to bottom.
Continue to Lesson 4 - Common Table Expressions (CTEs)
Make complex queries readable with WITH, and traverse hierarchies using recursive CTEs
Back to Module Overview
Return to the Subqueries, CTEs, and Views module overview
Continue Building Your Skills
You now command every shape of subquery and every way of connecting them. The systematic build process you practiced here will serve you far beyond SQL — breaking a hard problem into verifiable pieces is the core skill of every data professional. Keep using it.