Lesson 3 - Nested and Correlated Subqueries

Welcome to Nested and Correlated Subqueries

So far your subqueries have run on their own — you could copy the inner query out, run it, and get a result independent of the outer query. In this lesson you will meet two more powerful patterns. A correlated subquery depends on the outer query and runs once per outer row. A nested subquery lives inside another subquery, letting you reach across three or more tables.

These are not new types of subquery — a correlated or nested subquery can still be scalar, multi-row, or multi-column. They are new ways of connecting subqueries to the queries around them.

By the end of this lesson, you will be able to:

  • Recognize and write a correlated subquery that references the outer query
  • Use the EXISTS and NOT EXISTS operators
  • Nest a subquery inside another subquery, several levels deep
  • Apply a systematic approach to building complex subquery solutions

You should be comfortable with scalar, multi-row, and multi-column subqueries from Lessons 1 and 2. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: customer, invoice, track, invoice_line, playlist, playlist_track, employee, genre

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


Correlated Subqueries

A correlated subquery is an inner query that uses values from the outer query. Because it references the outer query, it cannot run on its own — and it is re-evaluated for each row the outer query processes.

A First Example

Here we attach each customer’s average sale to their name:

SELECT last_name,
       first_name,
       (SELECT AVG(total)
          FROM invoice i
         WHERE c.customer_id = i.customer_id) AS total_avg
  FROM customer c;

Look at the WHERE clause inside the subquery: c.customer_id = i.customer_id. The c refers to the outer query’s customer table. That reference is what makes this subquery correlated — for each customer row in the outer query, the subquery computes the average of just that customer’s invoices. Try to run the inner query by itself and it fails, because c.customer_id has no meaning without the outer query.

When to Prefer a Correlated Subquery

You could get the same result with a join to a grouped subquery:

SELECT last_name, first_name, total_avg
  FROM customer c
  JOIN (SELECT customer_id, AVG(total) AS total_avg
          FROM invoice
         GROUP BY customer_id) i
    ON c.customer_id = i.customer_id;

Both work. A correlated subquery often reads more naturally when the inner and outer queries refer to the same table. For example, to display every invoice whose total is greater than the average total in its own country:

SELECT *
  FROM invoice i1
 WHERE total > (SELECT AVG(total)
                  FROM invoice i2
                 WHERE i1.billing_country = i2.billing_country);

For each invoice i1, the subquery computes the average total for that invoice’s country, then the outer query keeps the invoice only if it beats its own country’s average. Expressing “compare each row to a group it belongs to” is exactly what correlated subqueries do best.


The EXISTS and NOT EXISTS Operators

Correlated subqueries pair naturally with the EXISTS operator. EXISTS is a logical test that returns true if the subquery returns any row at all — it does not care about the values, only whether rows exist.

Finding Rows That Have a Match

To list customers who have purchased something:

SELECT first_name, last_name
  FROM customer c
 WHERE EXISTS (SELECT *
                 FROM invoice i
                WHERE c.customer_id = i.customer_id);

For each customer, the correlated subquery looks for at least one matching invoice. If one exists, EXISTS is true and the customer is kept.

first_namelast_name
LuísGonçalves
LeonieKöhler
FrançoisTremblay

Finding Rows With No Match

Negate it with NOT EXISTS to find rows that have no match. To find customers who never purchased anything:

SELECT first_name, last_name
  FROM customer c
 WHERE NOT EXISTS (SELECT *
                     FROM invoice i
                    WHERE c.customer_id = i.customer_id);

In Chinook this returns an empty table — every customer has bought at least once. The same pattern finds tracks that have never been sold by checking the invoice_line table:

SELECT *
  FROM track t
 WHERE NOT EXISTS (SELECT *
                     FROM invoice_line il
                    WHERE t.track_id = il.track_id);

For each track, the subquery looks for any invoice line referencing it. If none exists, the track has never been sold and is kept. NOT EXISTS is often the clearest way to express “things with no related records.”


Nested Subqueries

A nested subquery is a subquery inside another subquery. When information is spread across more than three tables, nesting lets you step from one table to the next without writing a single giant join.

Stepping Across Three Tables

Suppose you want the names of playlists that contain at least one track lasting 15 minutes or more (900000 milliseconds). That spans three tables: playlist, playlist_track, and track. Nested subqueries walk the chain:

SELECT name
  FROM playlist
 WHERE playlist_id IN (SELECT playlist_id
                         FROM playlist_track
                        WHERE track_id IN (SELECT track_id
                                             FROM track
                                            WHERE milliseconds >= 900000));

Read from the innermost query outward. The deepest subquery finds long tracks. The middle subquery finds the playlists containing those tracks. The outer query returns the names of those playlists. Each subquery feeds a list to the one above it.

Another Nested Example

The same approach finds employees who supported customers whose total spend exceeded 40 dollars:

SELECT last_name, first_name
  FROM employee
 WHERE employee_id IN (SELECT support_rep_id
                         FROM customer
                        WHERE customer_id IN (SELECT customer_id
                                                FROM invoice
                                               GROUP BY customer_id
                                              HAVING SUM(total) > 40));

The innermost query finds big-spending customers, the middle query finds their support reps, and the outer query returns those employees’ names. Notice that nesting and the other subquery features combine freely — here the innermost subquery uses GROUP BY and HAVING.


A Systematic Approach to Complex Subqueries

When a question needs several tables, writing the query in one shot is overwhelming. Use a repeatable process instead:

  1. Identify the tables that hold the information you need.
  2. Break the question into sub-problems.
  3. Write a small query for each sub-problem and confirm it works.
  4. Combine and modify those queries into a single statement.
  5. Optimize if needed — there is usually more than one valid approach.

Worked Example

Say you want, for each invoice in the United States, its total purchase amount and the total minutes of Metal-genre tracks it contains. That needs invoice, invoice_line, track, and genre.

First, the US customers’ invoices:

SELECT invoice_id
  FROM invoice
 WHERE billing_country = 'USA';

Next, the Metal genre ids:

SELECT genre_id
  FROM genre
 WHERE name LIKE '%Metal%';

Then join invoice_line and track so price, quantity, and duration sit together, filtered to US invoices:

SELECT il.*, t.milliseconds, t.genre_id
  FROM invoice_line il
  JOIN track t
    ON il.track_id = t.track_id
 WHERE invoice_id IN (SELECT invoice_id
                        FROM invoice
                       WHERE billing_country = 'USA');

Finally, wrap that joined result as a subquery in the FROM clause, filter it to Metal tracks with the genre subquery, and aggregate per invoice:

SELECT tr.invoice_id AS invoice_id,
       SUM(tr.quantity * tr.unit_price) AS total,
       SUM(tr.milliseconds) / 1000.0 / 60 AS minute
  FROM (SELECT il.*, t.milliseconds, t.genre_id
          FROM invoice_line il
          JOIN track t
            ON il.track_id = t.track_id
         WHERE invoice_id IN (SELECT invoice_id
                                FROM invoice
                               WHERE billing_country = 'USA')) tr
 WHERE tr.genre_id IN (SELECT genre_id
                         FROM genre
                        WHERE name LIKE '%Metal%')
 GROUP BY tr.invoice_id;

Output (sample):

invoice_idtotalminute
54.9522.606333
382.9719.749417
394.9525.604750
595.9441.471950
601.989.148067

The query looks long, but you built it one sub-problem at a time. That is the only way to write complex SQL reliably.


Practice Exercises

Use the Chinook database for each exercise.

Exercise 1: Above the Customer’s Own Average

Write a correlated subquery that returns every invoice whose total is greater than the average total for that same customer. Select at least invoice_id and customer_id.

Hint

In the outer query alias the table i1; in the subquery alias it i2 and add WHERE i1.customer_id = i2.customer_id.

Exercise 2: Albums With No Sales

Use NOT EXISTS to find albums that have never had any of their tracks sold. You will correlate across album, track, and invoice_line.

Hint

For each album, check that no track belonging to it appears in invoice_line. A nested NOT EXISTS or a correlated subquery joining track to invoice_line both work.

Exercise 3: Playlists With Classical Tracks

Use nested subqueries to return the names of playlists that contain at least one track whose genre name is Classical.

Hint

Start from the innermost query: SELECT genre_id FROM genre WHERE name = 'Classical'. Use it to find track_id values, then playlist_id values, then playlist names.


Summary

You learned that a correlated subquery references the outer query and runs once per outer row, making it ideal for comparing each row to a group it belongs to. You used EXISTS and NOT EXISTS to test for the presence or absence of related rows, and you nested subqueries to walk across several tables. Most importantly, you saw a systematic way to build complex queries from small, verified pieces.

Key Concepts

  • Correlated subquery — an inner query that uses values from the outer query and runs per outer row; it cannot execute on its own.
  • EXISTS — true if the subquery returns at least one row; pairs naturally with correlated subqueries.
  • NOT EXISTS — true if the subquery returns no rows; the clearest way to find records with no match.
  • Nested subquery — a subquery inside another subquery, used to span more than three tables.
  • Systematic approach — identify tables, split into sub-problems, solve each, combine, then optimize.

Why This Matters

Correlated subqueries and EXISTS answer some of the most common real questions: “rows above their group’s average,” “customers with no recent orders,” “products that never sold.” Nesting lets you reach data scattered across many tables. And the step-by-step build process is the difference between staring at a blank editor and confidently assembling a query that works the first time.


Next Steps

Nested and correlated subqueries are powerful but can grow hard to read. Next you will learn common table expressions — the WITH clause — which let you name each subquery and lay your logic out top to bottom.

Continue to Lesson 4 - Common Table Expressions (CTEs)

Make complex queries readable with WITH, and traverse hierarchies using recursive CTEs

Back to Module Overview

Return to the Subqueries, CTEs, and Views module overview


Continue Building Your Skills

You now command every shape of subquery and every way of connecting them. The systematic build process you practiced here will serve you far beyond SQL — breaking a hard problem into verifiable pieces is the core skill of every data professional. Keep using it.