Lesson 2 - Multi-Row and Multi-Column Subqueries

Welcome to Multi-Row and Multi-Column Subqueries

A scalar subquery hands you one value. But many questions need a set of values — “which tracks belong to any of these media types?” — or even a whole table — “what is the average of each country’s largest sale?” For those, you need subqueries that return multiple rows or multiple columns.

In this lesson you will write multi-row subqueries that feed list operators like IN and NOT IN, and multi-column subqueries that stand in for tables in the FROM clause and in joins.

By the end of this lesson, you will be able to:

  • Write a multi-row subquery that returns several values in one column
  • Filter rows with IN and NOT IN against a subquery
  • Write a multi-column subquery that returns a table
  • Use a subquery as a table in the FROM clause
  • Join a subquery to a table or to another subquery

You should already understand scalar subqueries from Lesson 1. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: track, media_type, customer, invoice

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


Multi-Row Subqueries: Returning a List

A multi-row subquery returns several values in a single column — effectively a list. You cannot compare a column to a list with =, so you use the IN operator instead.

From a Hardcoded List to a Subquery

Look at the media_type table in Chinook:

media_type_idname
1MPEG audio file
2Protected AAC audio file
3Protected MPEG-4 video file
4Purchased AAC audio file
5AAC audio file

Media types 1 and 3 are the MPEG formats. To count tracks in those formats, you might hardcode the ids:

SELECT COUNT(*) AS tracks_tally
  FROM track
 WHERE media_type_id IN (1, 3);

But hardcoding (1, 3) means you must update the query every time the catalog changes. Instead, let a subquery generate those ids:

SELECT COUNT(*) AS tracks_tally
  FROM track
 WHERE media_type_id IN (SELECT media_type_id
                           FROM media_type
                          WHERE name LIKE '%MPEG%');

The inner query returns the ids of every MPEG media type — here, the list (1, 3). The outer query then counts tracks whose media_type_id appears in that list. If a new MPEG format is added later, this query picks it up automatically.

Filtering with a Subquery List

The same pattern filters customers. To return all invoices belonging to customers whose first name starts with the letter A:

SELECT *
  FROM invoice
 WHERE customer_id IN (SELECT customer_id
                         FROM customer
                        WHERE first_name LIKE 'A%');

The subquery produces the list of matching customer_id values; the outer query keeps only invoices whose customer_id is in that list. The key difference from a scalar subquery is the operator: you use IN, not =, because the subquery returns many values.


The NOT IN Operator

Sometimes the easier question is the negative one: not in a set. The NOT IN operator keeps rows whose value is absent from the subquery’s list.

Consider counting tracks that are not protected. The protected formats share the word “Protected”, but the unprotected ones share no pattern. You have two clean options.

You can describe the unprotected formats directly with NOT LIKE:

SELECT COUNT(*) AS tracks_tally
  FROM track
 WHERE media_type_id IN (SELECT media_type_id
                           FROM media_type
                          WHERE name NOT LIKE 'Protected%');

Or you can build the list of protected formats and exclude tracks that match it with NOT IN:

SELECT COUNT(*) AS tracks_tally
  FROM track
 WHERE media_type_id NOT IN (SELECT media_type_id
                               FROM media_type
                              WHERE name LIKE 'Protected%');

Both return the same count. The second reads especially well: “count tracks whose media type is not in the protected list.”

Combining NOT IN with HAVING

A subquery can itself contain a GROUP BY and HAVING. To list the names of customers who spent at least 100 dollars, you first build the list of customers who spent less than 100 and exclude them:

SELECT first_name, last_name
  FROM customer
 WHERE customer_id NOT IN (SELECT customer_id
                             FROM invoice
                            GROUP BY customer_id
                           HAVING SUM(total) < 100);

The subquery groups invoices by customer and keeps the low spenders. The outer query then returns the names of everyone not in that list — the customers who spent 100 dollars or more.


Multi-Column Subqueries: Returning a Table

A multi-column subquery returns several columns and rows — a whole table. You use it anywhere a table is allowed, most commonly in the FROM clause.

Why You Need This: Aggregating an Aggregate

Suppose you want the average of each country’s largest sale. You cannot do this in one flat aggregation — you would be averaging maxima, which requires two passes. A subquery as a table solves it:

SELECT AVG(billing_country_max) AS billing_country_max_avg
  FROM (SELECT billing_country, MAX(total) AS billing_country_max
          FROM invoice
         GROUP BY billing_country);

Read it inside-out. The inner query produces a small table — one row per country with that country’s maximum sale:

billing_countrybilling_country_max
Canada13.86
France16.86
USA23.86

The outer query treats that result as a table and averages the billing_country_max column, giving a single number such as 16.07. You have aggregated an already-aggregated table — something scalar and multi-row subqueries cannot do.

Practice the Pattern

The same shape answers “what is the average number of sales per billing city?”:

SELECT AVG(billing_city_tally) AS billing_city_tally_avg
  FROM (SELECT billing_city, COUNT(*) AS billing_city_tally
          FROM invoice
         GROUP BY billing_city);

The inner query counts sales per city; the outer query averages those counts.


Joining a Subquery

Because a multi-column subquery behaves like a table, you can join it to a real table or to another subquery. Give it an alias, just as you would any table.

Joining a Subquery to a Table

Let’s attach each customer’s average purchase amount to their name. The subquery computes the averages; the join brings in the names:

SELECT c.last_name, c.first_name, i.total_avg
  FROM customer AS c
  JOIN (SELECT customer_id, AVG(total) AS total_avg
          FROM invoice
         GROUP BY customer_id) AS i
    ON c.customer_id = i.customer_id;

Output (sample):

last_namefirst_nametotal_avg
GonçalvesLuís5.660000
KöhlerLeonie5.374286
TremblayFrançois5.660000

Notice the subquery has the alias i and includes customer_id, which is required so the join has a key to match on.

Joining Two Subqueries Together

You can even join two subqueries to each other. To compute the average number of sales per customer in each country — sales per country divided by customers per country — you build both aggregates separately and join them:

SELECT ct.country,
       i.invoice_tally / ct.customer_tally AS sale_avg_tally
  FROM (SELECT billing_country, COUNT(*) AS invoice_tally
          FROM invoice
         GROUP BY billing_country) AS i
  JOIN (SELECT country, COUNT(*) AS customer_tally
          FROM customer
         GROUP BY country) AS ct
    ON i.billing_country = ct.country
 ORDER BY sale_avg_tally DESC;

Subquery i counts invoices per country; subquery ct counts customers per country. The join links them on the country name, and the outer query divides one count by the other. This is the standard recipe for combining two independent aggregations.


Practice Exercises

Use the Chinook database for each exercise.

Exercise 1: Tracks by Genre List

Count the tracks whose genre name contains the word Rock. Use a subquery against the genre table to generate the matching genre_id values, and filter the track table with IN.

Hint

The subquery is (SELECT genre_id FROM genre WHERE name LIKE '%Rock%'). Filter track with WHERE genre_id IN (...).

Exercise 2: Customers With No Invoices

Return the first and last names of customers who have no invoices at all, using NOT IN and a subquery against the invoice table.

Hint

Select from customer where customer_id NOT IN (SELECT customer_id FROM invoice).

Exercise 3: Average of Per-Country Totals

Write a query that returns the average of each country’s total revenue. Use a multi-column subquery in the FROM clause that computes SUM(total) grouped by billing_country.

Hint

Inner query: SELECT billing_country, SUM(total) AS country_total FROM invoice GROUP BY billing_country. Outer query: SELECT AVG(country_total) FROM (...).


Summary

You expanded your subquery toolkit beyond single values. Multi-row subqueries return a list and pair with IN and NOT IN. Multi-column subqueries return a table and can sit in the FROM clause or be joined like any other table — which lets you aggregate already-aggregated data and combine independent aggregations.

Key Concepts

  • Multi-row subquery — returns several values in one column; use with IN and NOT IN.
  • IN / NOT IN — operators that test whether a value is present in (or absent from) a subquery’s list.
  • Multi-column subquery — returns a table; use it wherever a table is allowed (FROM, joins).
  • Subquery as a table — a subquery in FROM, given an alias, that the outer query reads like a real table.
  • Aggregating an aggregate — averaging or summing the rows of a grouped subquery, impossible in one flat query.

Why This Matters

Real analysis constantly combines results: “compare against the items in this category,” “exclude customers who churned,” “average the per-region totals.” Multi-row subqueries express the first two cleanly, while multi-column subqueries are how you stack one aggregation on top of another. Master both and you can answer layered business questions without exporting intermediate results to a spreadsheet.


Next Steps

You can now return lists and tables from subqueries. Next, you will nest subqueries inside other subqueries and write correlated subqueries that depend on the row currently being processed.

Continue to Lesson 3 - Nested and Correlated Subqueries

Nest subqueries within subqueries and write correlated subqueries that depend on the outer query

Back to Module Overview

Return to the Subqueries, CTEs, and Views module overview


Continue Building Your Skills

You now have three subquery shapes at your disposal: scalar values, lists, and tables. Each unlocks a different class of question. Keep an eye out for moments when you would otherwise run two queries and combine the results by hand — that is almost always a chance to use a multi-row or multi-column subquery instead.