Lesson 1 - Introduction to Joins
Welcome to Joins
Up to now, every query you have written has pulled from a single table. Real databases do not work that way. They spread information across many tables on purpose, and the most useful questions almost always need data from more than one of them. The tool that brings those tables back together is the join.
Imagine you need a list of sales by customer that includes each customer’s email address. The invoice table records sales, but it only stores a customer_id — not the email. The email lives in the customer table. A join lets you connect the two so you can see them side by side in a single result.
By the end of this lesson, you will be able to:
- Combine data from two tables using an inner join
- Explain, step by step, how SQL processes a join
- Select specific columns from different tables and resolve ambiguous column names
- Alias tables (and columns) to keep your queries short and readable
You will work with the Chinook database, a realistic model of a digital music store. No prior join experience is needed. Let’s begin.
Data for this lesson
Database: chinook.db — a digital music store (SQLite).
Tables used: customer, invoice
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
Fetching Data from Different Tables
Some tables are linked through a shared column. Look at small samples of the customer and invoice tables. The customer table holds one row per customer:
customer_id | first_name | last_name | email
------------+------------+-----------+----------------------
2 | Leonie | Köhler | [email protected]
4 | Bjørn | Hansen | [email protected]
8 | Daan | Peeters | [email protected]The invoice table holds one row per sale, and each sale records which customer it belongs to through a customer_id:
invoice_id | customer_id | invoice_date | total
-----------+-------------+---------------------+-------
1 | 2 | 2021-01-01 00:00:00 | 1.98
2 | 4 | 2021-01-02 00:00:00 | 3.96
3 | 8 | 2021-01-03 00:00:00 | 5.94
12 | 2 | 2021-02-11 00:00:00 | 13.86Notice that customer_id appears in both tables. That shared column is the link between them. The column that uniquely identifies a row in its own table (customer_id in customer) is the primary key; the column that points back to another table (customer_id in invoice) is a foreign key.
Following the Link by Hand
Suppose you want to know who placed invoices 1 and 12. You could work it out manually:
- Find invoices
1and12in theinvoicetable. - Read their
customer_idvalue — both are2. - Look up the row in
customerwhosecustomer_idis2. That is Leonie Köhler. - Record the combined information in one table.
The result of that manual lookup would be:
customer_id | first_name | email | invoice_id | invoice_date | total
------------+------------+-----------------------+------------+---------------------+-------
2 | Leonie | [email protected] | 1 | 2021-01-01 00:00:00 | 1.98
2 | Leonie | [email protected] | 12 | 2021-02-11 00:00:00 | 13.86That works for two invoices. It would be hopeless for thousands. A join tells SQL to do exactly this matching for the entire table at once.
Your First Inner Join
The most common join is the inner join: it keeps only the rows where the link matches in both tables. Here is how you combine customer and invoice:
SELECT *
FROM customer
INNER JOIN invoice
ON customer.customer_id = invoice.customer_id;This selects every column from both tables. Showing just a few columns for the earliest invoices (sorted by invoice_id), the result looks like this:
customer_id | first_name | email | invoice_id | invoice_date | total
------------+------------+-----------------------+------------+---------------------+-------
2 | Leonie | [email protected] | 1 | 2021-01-01 00:00:00 | 1.98
4 | Bjørn | [email protected] | 2 | 2021-01-02 00:00:00 | 3.96
8 | Daan | [email protected] | 3 | 2021-01-03 00:00:00 | 5.94
2 | Leonie | [email protected] | 12 | 2021-02-11 00:00:00 | 13.86Each sale now sits next to the customer who made it. Leonie (customer 2) appears twice because she has two invoices in this slice.
Reading the Syntax
The query introduced two new clauses: INNER JOIN and ON. Read it piece by piece:
SELECT *selects all columns.FROM customernames the first table.INNER JOIN invoicenames the second table to combine with it.ON customer.customer_id = invoice.customer_idsays how to match rows.
The ON clause is the heart of the join. It tells SQL to pair a customer row with an invoice row only when their customer_id values are equal. Notice the pattern table_name.column_name: because customer_id exists in both tables, you must say which one you mean.
When you write only JOIN without a type, SQL defaults to INNER JOIN. So the query above is identical to:
SELECT *
FROM customer
JOIN invoice
ON customer.customer_id = invoice.customer_id;You will see plain JOIN used most often in practice.
How a Join Actually Works
To see the matching process clearly, it helps to use two tiny made-up tables, hue and palette. They are not part of Chinook — they exist only to make the mechanics obvious.
hue palette
----------- ---------------
color | no colour | number
------+---- -------+-------
Red | 1 Purple | 2
Green | 2 Green | 2
Green | 3 Green | 4
Blue | 1 Blue | 5
Orange | 4Consider this join, matching hue.color to palette.colour:
SELECT *
FROM hue
JOIN palette
ON hue.color = palette.colour;Conceptually, SQL walks through the rows like this:
- Take the first row of
hue(Red). - Compare it against every row of
palette, checking whetherhue.color = palette.colour. - For every match, add a combined row to the result.
- Move to the next row of
hueand repeat.
Working through each hue row:
Redmatches nothing inpalette, so it contributes no rows.Green(no 2) matches the twoGreenrows inpalette, producing two result rows.Green(no 3) also matches bothGreenrows, producing two more.Bluematches the singleBluerow, producing one row.
The final result:
color | no | colour | number
------+----+--------+-------
Green | 2 | Green | 2
Green | 2 | Green | 4
Green | 3 | Green | 2
Green | 3 | Green | 4
Blue | 1 | Blue | 5Red disappears because an inner join keeps only matched rows.
Note
This row-by-row description is a useful mental model, but it is not literally how a database engine runs the query. Modern engines use sophisticated algorithms to find matches efficiently. The result is the same; the path to it is faster.
Table Order Does Not Matter for Inner Joins
Swapping the two tables in an inner join returns the same data (the column and row order may shift, but the values are identical):
SELECT *
FROM palette
JOIN hue
ON hue.color = palette.colour;This symmetry is specific to inner joins. As you will see in a later lesson, the outer joins do care about which table comes first.
Selecting Specific Columns
Selecting * returns every column from both tables, which is rarely what you want. Usually you need just a handful. Try fetching only invoice_id, invoice_date, total, customer_id, and email:
SELECT invoice_id, invoice_date, total, customer_id, email
FROM invoice
JOIN customer
ON customer.customer_id = invoice.customer_id;This raises an error:
Error: ambiguous column name: customer_idBecause customer_id exists in both tables, SQL cannot tell which one you mean. Even though the join condition guarantees they hold the same value, most SQL dialects will not guess. You resolve the ambiguity by qualifying the column with its table name:
SELECT invoice_id, invoice_date, total, customer.customer_id, email
FROM invoice
JOIN customer
ON customer.customer_id = invoice.customer_id;Writing customer.customer_id instead of bare customer_id tells SQL exactly where to pull that column from, and the query runs.
Selecting All Columns from One Table
You can mix a table-wide * with individual columns. To attach the customer’s email to every column of invoice:
SELECT invoice.*, customer.email
FROM invoice
JOIN customer
ON invoice.customer_id = customer.customer_id;invoice.* pulls every column from invoice, and customer.email adds the one column you need from customer.
A Best Practice for Readability
Even when there is no ambiguity, it is good practice to qualify every column with its table name in a join. It makes the origin of each column obvious to anyone reading the query later. It also helps to place columns from different tables on separate lines:
SELECT invoice.invoice_id, invoice.invoice_date, invoice.total,
customer.customer_id, customer.email
FROM invoice
JOIN customer
ON customer.customer_id = invoice.customer_id;These are guidelines, not rules, but following them keeps multi-table queries legible.
Aliasing Tables
Qualifying every column with its full table name quickly gets verbose. Look at how often invoice and customer repeat above. SQL lets you give each table a short alias using AS, just as you alias columns.
Alias invoice as i and customer as c:
SELECT i.invoice_id, i.invoice_date, i.total,
c.customer_id, c.email
FROM invoice AS i
JOIN customer AS c
ON c.customer_id = i.customer_id;Once a table has an alias, you use that alias everywhere — in SELECT, in ON, anywhere you would otherwise type the full name. The common convention is to use the table’s first letter (i for invoice, c for customer), which makes it easy to see at a glance where each column comes from.
You can alias columns at the same time. Here total is renamed to invoice_total:
SELECT i.invoice_id, i.invoice_date, i.total AS invoice_total,
c.customer_id, c.email
FROM invoice AS i
JOIN customer AS c
ON c.customer_id = i.customer_id;Three rows of the result (for the earliest invoices):
invoice_id | invoice_date | invoice_total | customer_id | email
-----------+---------------------+---------------+-------------+-----------------------
1 | 2021-01-01 00:00:00 | 1.98 | 2 | [email protected]
2 | 2021-01-02 00:00:00 | 3.96 | 4 | [email protected]
3 | 2021-01-03 00:00:00 | 5.94 | 8 | [email protected]This is the readable, professional form of a join: short table aliases, qualified columns, and one logical group per line.
Practice Exercises
Try these against the Chinook database before checking the hints.
Exercise 1: Join Invoice Lines to Tracks
The invoice_line table records each track sold on an invoice, but it stores only a track_id. The track’s name lives in the track table. Write a query that inner-joins invoice_line and track on track_id so each result row shows the invoice line together with the track it refers to.
-- Your code hereHint
Both tables share a track_id column. Join on it and qualify the column with its table name, like invoice_line.track_id = track.track_id.
Exercise 2: Customer and Invoice Details
Join customer and invoice so the result header reads, in order: customer_id, first_name, last_name, email, invoice_id, invoice_date, total.
-- Your code hereHint
Join on the common customer_id column. Qualify customer_id with customer. so it is not ambiguous, and list the columns in the order requested.
Exercise 3: Tracks and Genres with Aliases
Join track to genre on genre_id. Select track_id, the track’s name (aliased as track_name), composer, and the genre’s name (aliased as genre). Use table aliases and qualify every column.
-- Your code hereHint
Alias track as t and genre as g. Both tables have a name column, so you must qualify them: t.name AS track_name and g.name AS genre. Join on genre_id.
Summary
You learned how SQL combines tables. A join matches rows from two tables on a shared column, and the inner join — written INNER JOIN or simply JOIN — keeps only the rows that match in both. The ON clause specifies the matching condition, and you qualify columns with table_name.column_name to remove ambiguity. Table aliases keep multi-table queries short and readable.
Key Concepts
- Join — a SQL operation that combines rows from two or more tables based on a related column.
- Inner join — keeps only rows that have a matching value in both tables;
JOINdefaults to this. ONclause — defines how rows are matched, for exampleON a.id = b.id.- Primary key / foreign key — the column that uniquely identifies a row, and the column in another table that points back to it.
- Qualified column — a column written as
table.columnto say exactly which table it comes from. - Table alias — a short name for a table set with
AS, such ascustomer AS c.
Why This Matters
Almost no real database keeps everything in one table. Customers, orders, products, and employees each get their own table, connected by keys. Joins are how you reassemble that data into the reports people actually ask for: sales by customer, orders by product, tickets by support agent. Master the inner join and you unlock the single most common operation in day-to-day SQL work.
Next Steps
You can now combine two tables and pull exactly the columns you need. Next, you will use joins alongside the filtering, grouping, and sorting clauses you already know — and join more than two tables at once.
Continue to Lesson 2 - Joins with Filtering, Grouping, and Sorting
Join more than two tables and combine joins with WHERE, GROUP BY, and ORDER BY
Back to Module Overview
Return to the Combining Tables with Joins module overview
Continue Building Your Skills
You just took the step that turns single-table SQL into real database work. Every report you build from here on will lean on joins. Practice writing them with aliases and qualified columns until the pattern feels automatic — your future, more complex queries will be far easier to read because of it.