Lesson 2 - Joins with Filtering, Grouping, and Sorting

Welcome to Joins with Other Clauses

In the last lesson you combined two tables. But a join is rarely the whole query. You usually want to filter the combined rows, count them by group, or sort them — the same WHERE, GROUP BY, and ORDER BY clauses you already know. The good news is that they work on a joined result exactly as they do on a single table.

Say you have noticed monthly sales slipping and want to find out why. Answering that means combining several tables, filtering to a period, and aggregating the result. This lesson shows you how the pieces fit together.

By the end of this lesson, you will be able to:

  • Treat the result of a join as a temporary table you can query further
  • Combine joins with LIMIT, WHERE, GROUP BY, and ORDER BY
  • Join three or more tables in a single query
  • State the order in which SQL executes its clauses

You will keep working with the Chinook music-store database. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: invoice_line, track, media_type, genre

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


The Key Idea: A Join Is a Temporary Table

The single most useful idea in this lesson is this: FROM, JOIN, and ON run first and run together, producing a temporary result you can then treat like any ordinary table.

In other words, SQL sees this query:

SELECT *
  FROM a_table
  JOIN another_table
    ON a_table.some_column = another_table.some_column;

as essentially the same as:

SELECT *
  FROM some_table;

…where some_table is the combined output of the join. Once you accept that, every clause you already know — WHERE, GROUP BY, HAVING, ORDER BY, LIMIT — slots onto the join with no new rules. The only extra care: when you name a column, qualify it with the table it came from.

Joins and Database Schemas

The links you join on are not random. They are defined by the database schema — the plan that describes how data is organized into tables and how those tables relate to one another. A schema is usually defined in code and drawn as a diagram, where lines connect a foreign key in one table to a primary key in another.

The Chinook schema connects, among many others:

  • invoice_line to track through track_id
  • track to media_type through media_type_id
  • track to genre through genre_id
  • invoice to customer through customer_id
  • customer to employee through support_rep_id (pointing at employee_id)

When you join tables, you follow these documented relationships. In a real job, the schema diagram for your database will usually be available from your team — keep it handy.


Joins with LIMIT

LIMIT trims the result to a number of rows, and it runs last of all. Adding it to a join is straightforward:

SELECT *
  FROM invoice_line AS il
  JOIN track AS t
    ON il.track_id = t.track_id
 LIMIT 3;

Because the join produces a temporary table, LIMIT 3 simply returns its first three rows. LIMIT is invaluable while you are developing a query: it lets you check the shape of the output without waiting for thousands of rows.


Joins with WHERE

WHERE filters rows. On a joined result, it filters the combined rows. Picture the temporary table from the join, then add a condition:

SELECT *
  FROM invoice_line AS il
  JOIN track AS t
    ON il.track_id = t.track_id
 WHERE il.invoice_id = 19;

This first joins invoice_line and track, then keeps only the rows belonging to invoice 19. Notice il.invoice_id is qualified — invoice_id could be ambiguous, so you state which table it comes from.

A helpful way to read this: the join builds a wide temporary table containing every invoice line and its track details, and WHERE then narrows it to one invoice. The join and the filter are independent steps you can reason about separately.


Joins with GROUP BY

GROUP BY collapses rows into groups so you can aggregate them with functions like COUNT(), SUM(), and AVG(). Combined with a join, you can summarize data that spans tables.

Suppose you want the number of tracks in each genre. The track count lives in track, but the genre names live in genre. Join them, then group:

SELECT g.name AS genre,
       COUNT(*) AS num_of_tracks
  FROM track AS t
  JOIN genre AS g
    ON t.genre_id = g.genre_id
 GROUP BY g.name
 ORDER BY num_of_tracks DESC;

The result gives one row per genre with its track count (largest first):

genre               | num_of_tracks
--------------------+--------------
Rock                | 1297
Latin               | 579
Metal               | 374
Alternative & Punk  | 332
Jazz                | 130
...                 | ...

Think it through with the temporary-table idea: the join produces a table where every track sits next to its genre name, and GROUP BY g.name then buckets those rows by genre so COUNT(*) can tally each bucket. It is exactly the grouping you already know, applied to a joined result.


Joining More Than Two Tables

Sometimes two tables are not enough. Suppose you want, for each track sold, its track_id, its name, its media type name, and the price paid. That information lives in three tables:

  • invoice_line and track relate through track_id
  • track and media_type relate through media_type_id

Because a join produces a temporary table, and JOIN is just another clause, you can join that temporary table to a third table. Build it up in stages. First join invoice_line and track:

SELECT il.track_id, il.unit_price,
       t.name, t.media_type_id
  FROM invoice_line AS il
  JOIN track AS t
    ON t.track_id = il.track_id;

The first three rows:

track_id | unit_price | name                  | media_type_id
---------+------------+-----------------------+--------------
2        | 0.99       | Balls to the Wall     | 2
4        | 0.99       | Restless and Wild     | 2
6        | 0.99       | Put The Finger On You | 1

Treat that as a temporary table and join it to media_type on media_type_id, selecting the columns you actually want and renaming media_type.name to media_type:

SELECT il.track_id, il.unit_price,
       t.name,
       mt.name AS media_type
  FROM invoice_line AS il
  JOIN track AS t
    ON t.track_id = il.track_id
  JOIN media_type AS mt
    ON t.media_type_id = mt.media_type_id
 LIMIT 3;

The result:

track_id | unit_price | name                  | media_type
---------+------------+-----------------------+--------------------------
2        | 0.99       | Balls to the Wall     | Protected AAC audio file
4        | 0.99       | Restless and Wild     | Protected AAC audio file
6        | 0.99       | Put The Finger On You | MPEG audio file

Two things worth noting:

  • Each new JOIN brings its own ON clause describing how it links to the tables already in play.
  • You join on media_type_id but never select it. You do not have to select the columns you join on — they are there to make the match, not necessarily to appear in the output.

You can chain as many joins as you need. A query that walks invoice → customer → employee lets you, for instance, attach the support representative’s name to each invoice, even though invoice has no direct link to employee.


The Order of Execution

You have now met all the major clauses. Here is the general shape of a query that uses them together:

SELECT columns, AGGREGATING_FUNCTION(expression)
  FROM a_table
  JOIN another_table
    ON a_table.some_column = another_table.some_column
 WHERE a_condition
 GROUP BY columns
HAVING some_condition
 ORDER BY some_columns
 LIMIT N;

SQL does not run these clauses top to bottom in the order you write them. It runs them in this order:

  1. FROM
  2. JOIN
  3. ON
  4. WHERE
  5. GROUP BY
  6. HAVING
  7. SELECT
  8. ORDER BY
  9. LIMIT

The crucial addition from this lesson is that JOIN and ON execute right after FROM, before everything else. That is exactly why the temporary-table model works: by the time WHERE, GROUP BY, or SELECT runs, the join has already produced the combined table they operate on.

This also explains a detail you may have wondered about: SELECT runs near the end, after GROUP BY. That is why a WHERE clause cannot reference a column alias you created in SELECT — at the moment WHERE runs, that alias does not exist yet.


Practice Exercises

Work these against the Chinook database. Keep the schema relationships from earlier in mind.

Exercise 1: Limit a Two-Table Join

Join invoice_line and track on track_id, and return only the first three rows.

-- Your code here

Hint

Build the join as usual, then add LIMIT 3 as the final clause. Aliasing the tables (il and t) keeps it short.

Exercise 2: Filter a Join

Join invoice_line and track on track_id, then keep only the rows whose invoice_id is 19.

-- Your code here

Hint

Add WHERE il.invoice_id = 19 after the ON clause. Qualify invoice_id with the invoice_line table since it appears there.

Exercise 3: Tracks per Genre

Produce a result with two columns: genre (the genre name) and num_of_tracks (the count of tracks in that genre). Join genre and track, then group by genre name.

-- Your code here

Hint

Join on genre_id. Select g.name AS genre and COUNT(*) AS num_of_tracks, then add GROUP BY g.name.

Exercise 4: Join Three Tables

For each invoice, return all columns from invoice plus the first name of the employee who handled it. The link runs invoice → customer → employee: invoice.customer_id = customer.customer_id, then customer.support_rep_id = employee.employee_id.

-- Your code here

Hint

You need three tables but only select columns from two of them. Use SELECT i.*, e.first_name and join customer purely to bridge invoice to employee. You do not have to select any column from customer.


Summary

You learned to treat a join as a temporary table and then layer the rest of SQL on top of it. LIMIT, WHERE, GROUP BY, and ORDER BY all work on a joined result the same way they work on a single table — you just qualify your columns. You also learned to join three or more tables by chaining JOIN ... ON clauses, and you saw the order in which SQL actually executes its clauses, with JOIN and ON running right after FROM.

Key Concepts

  • Join as a temporary tableFROM, JOIN, and ON run first and produce a result you can query like any table.
  • Database schema — the plan defining how data is split across tables and how the tables relate.
  • Chained joins — add more JOIN ... ON clauses to combine three or more tables in one query.
  • Order of executionFROM, JOIN, ON, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT.
  • Joining without selecting — columns used in ON do not need to appear in SELECT.

Why This Matters

Real analytical questions almost always combine “connect these tables” with “now filter, group, and sort the result.” Knowing that a join simply produces a table you can keep querying means you never have to learn special rules for joined data — your whole SQL toolkit just works. And understanding execution order is what lets you debug the puzzling errors, like why an alias is not available in WHERE, that trip up almost everyone at first.


Next Steps

So far every join has matched rows on equality and discarded the unmatched ones. Next, you will keep unmatched rows with outer joins, match a table to itself with self joins, and generate every combination with cross joins.

Continue to Lesson 3 - Self Joins, Outer Joins, and Cross Joins

Match a table to itself, keep unmatched rows, and generate all combinations of rows

Back to Module Overview

Return to the Combining Tables with Joins module overview


Continue Building Your Skills

You can now build reports that span several tables and shape them with filtering, grouping, and sorting — the bread and butter of analytical SQL. Keep the temporary-table model in mind: it is the single idea that makes every clause behave predictably no matter how many tables you combine.