Lesson 1 - Introduction to Joins

Welcome to Joins

Up to now, every query you have written has pulled from a single table. Real databases do not work that way. They spread information across many tables on purpose, and the most useful questions almost always need data from more than one of them. The tool that brings those tables back together is the join.

Imagine you need a list of sales by customer that includes each customer’s email address. The invoice table records sales, but it only stores a customer_id — not the email. The email lives in the customer table. A join lets you connect the two so you can see them side by side in a single result.

By the end of this lesson, you will be able to:

  • Combine data from two tables using an inner join
  • Explain, step by step, how SQL processes a join
  • Select specific columns from different tables and resolve ambiguous column names
  • Alias tables (and columns) to keep your queries short and readable

You will work with the Chinook database, a realistic model of a digital music store. No prior join experience is needed. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: customer, invoice

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


Fetching Data from Different Tables

Some tables are linked through a shared column. Look at small samples of the customer and invoice tables. The customer table holds one row per customer:

customer_id | first_name | last_name | email
------------+------------+-----------+----------------------
2           | Leonie     | Köhler    | [email protected]
4           | Bjørn      | Hansen    | [email protected]
8           | Daan       | Peeters   | [email protected]

The invoice table holds one row per sale, and each sale records which customer it belongs to through a customer_id:

invoice_id | customer_id | invoice_date        | total
-----------+-------------+---------------------+-------
1          | 2           | 2021-01-01 00:00:00 | 1.98
2          | 4           | 2021-01-02 00:00:00 | 3.96
3          | 8           | 2021-01-03 00:00:00 | 5.94
12         | 2           | 2021-02-11 00:00:00 | 13.86

Notice that customer_id appears in both tables. That shared column is the link between them. The column that uniquely identifies a row in its own table (customer_id in customer) is the primary key; the column that points back to another table (customer_id in invoice) is a foreign key.

Suppose you want to know who placed invoices 1 and 12. You could work it out manually:

  1. Find invoices 1 and 12 in the invoice table.
  2. Read their customer_id value — both are 2.
  3. Look up the row in customer whose customer_id is 2. That is Leonie Köhler.
  4. Record the combined information in one table.

The result of that manual lookup would be:

customer_id | first_name | email                 | invoice_id | invoice_date        | total
------------+------------+-----------------------+------------+---------------------+-------
2           | Leonie     | [email protected] | 1          | 2021-01-01 00:00:00 | 1.98
2           | Leonie     | [email protected] | 12         | 2021-02-11 00:00:00 | 13.86

That works for two invoices. It would be hopeless for thousands. A join tells SQL to do exactly this matching for the entire table at once.


Your First Inner Join

The most common join is the inner join: it keeps only the rows where the link matches in both tables. Here is how you combine customer and invoice:

SELECT *
  FROM customer
 INNER JOIN invoice
    ON customer.customer_id = invoice.customer_id;

This selects every column from both tables. Showing just a few columns for the earliest invoices (sorted by invoice_id), the result looks like this:

customer_id | first_name | email                 | invoice_id | invoice_date        | total
------------+------------+-----------------------+------------+---------------------+-------
2           | Leonie     | [email protected] | 1          | 2021-01-01 00:00:00 | 1.98
4           | Bjørn      | [email protected] | 2          | 2021-01-02 00:00:00 | 3.96
8           | Daan       | [email protected] | 3          | 2021-01-03 00:00:00 | 5.94
2           | Leonie     | [email protected] | 12         | 2021-02-11 00:00:00 | 13.86

Each sale now sits next to the customer who made it. Leonie (customer 2) appears twice because she has two invoices in this slice.

Reading the Syntax

The query introduced two new clauses: INNER JOIN and ON. Read it piece by piece:

  1. SELECT * selects all columns.
  2. FROM customer names the first table.
  3. INNER JOIN invoice names the second table to combine with it.
  4. ON customer.customer_id = invoice.customer_id says how to match rows.

The ON clause is the heart of the join. It tells SQL to pair a customer row with an invoice row only when their customer_id values are equal. Notice the pattern table_name.column_name: because customer_id exists in both tables, you must say which one you mean.

When you write only JOIN without a type, SQL defaults to INNER JOIN. So the query above is identical to:

SELECT *
  FROM customer
  JOIN invoice
    ON customer.customer_id = invoice.customer_id;

You will see plain JOIN used most often in practice.


How a Join Actually Works

To see the matching process clearly, it helps to use two tiny made-up tables, hue and palette. They are not part of Chinook — they exist only to make the mechanics obvious.

hue                  palette
-----------          ---------------
color | no           colour | number
------+----          -------+-------
Red   | 1            Purple | 2
Green | 2            Green  | 2
Green | 3            Green  | 4
Blue  | 1            Blue   | 5
                     Orange | 4

Consider this join, matching hue.color to palette.colour:

SELECT *
  FROM hue
  JOIN palette
    ON hue.color = palette.colour;

Conceptually, SQL walks through the rows like this:

  1. Take the first row of hue (Red).
  2. Compare it against every row of palette, checking whether hue.color = palette.colour.
  3. For every match, add a combined row to the result.
  4. Move to the next row of hue and repeat.

Working through each hue row:

  • Red matches nothing in palette, so it contributes no rows.
  • Green (no 2) matches the two Green rows in palette, producing two result rows.
  • Green (no 3) also matches both Green rows, producing two more.
  • Blue matches the single Blue row, producing one row.

The final result:

color | no | colour | number
------+----+--------+-------
Green | 2  | Green  | 2
Green | 2  | Green  | 4
Green | 3  | Green  | 2
Green | 3  | Green  | 4
Blue  | 1  | Blue   | 5

Red disappears because an inner join keeps only matched rows.

Note

This row-by-row description is a useful mental model, but it is not literally how a database engine runs the query. Modern engines use sophisticated algorithms to find matches efficiently. The result is the same; the path to it is faster.

Table Order Does Not Matter for Inner Joins

Swapping the two tables in an inner join returns the same data (the column and row order may shift, but the values are identical):

SELECT *
  FROM palette
  JOIN hue
    ON hue.color = palette.colour;

This symmetry is specific to inner joins. As you will see in a later lesson, the outer joins do care about which table comes first.


Selecting Specific Columns

Selecting * returns every column from both tables, which is rarely what you want. Usually you need just a handful. Try fetching only invoice_id, invoice_date, total, customer_id, and email:

SELECT invoice_id, invoice_date, total, customer_id, email
  FROM invoice
  JOIN customer
    ON customer.customer_id = invoice.customer_id;

This raises an error:

Error: ambiguous column name: customer_id

Because customer_id exists in both tables, SQL cannot tell which one you mean. Even though the join condition guarantees they hold the same value, most SQL dialects will not guess. You resolve the ambiguity by qualifying the column with its table name:

SELECT invoice_id, invoice_date, total, customer.customer_id, email
  FROM invoice
  JOIN customer
    ON customer.customer_id = invoice.customer_id;

Writing customer.customer_id instead of bare customer_id tells SQL exactly where to pull that column from, and the query runs.

Selecting All Columns from One Table

You can mix a table-wide * with individual columns. To attach the customer’s email to every column of invoice:

SELECT invoice.*, customer.email
  FROM invoice
  JOIN customer
    ON invoice.customer_id = customer.customer_id;

invoice.* pulls every column from invoice, and customer.email adds the one column you need from customer.

A Best Practice for Readability

Even when there is no ambiguity, it is good practice to qualify every column with its table name in a join. It makes the origin of each column obvious to anyone reading the query later. It also helps to place columns from different tables on separate lines:

SELECT invoice.invoice_id, invoice.invoice_date, invoice.total,
       customer.customer_id, customer.email
  FROM invoice
  JOIN customer
    ON customer.customer_id = invoice.customer_id;

These are guidelines, not rules, but following them keeps multi-table queries legible.


Aliasing Tables

Qualifying every column with its full table name quickly gets verbose. Look at how often invoice and customer repeat above. SQL lets you give each table a short alias using AS, just as you alias columns.

Alias invoice as i and customer as c:

SELECT i.invoice_id, i.invoice_date, i.total,
       c.customer_id, c.email
  FROM invoice AS i
  JOIN customer AS c
    ON c.customer_id = i.customer_id;

Once a table has an alias, you use that alias everywhere — in SELECT, in ON, anywhere you would otherwise type the full name. The common convention is to use the table’s first letter (i for invoice, c for customer), which makes it easy to see at a glance where each column comes from.

You can alias columns at the same time. Here total is renamed to invoice_total:

SELECT i.invoice_id, i.invoice_date, i.total AS invoice_total,
       c.customer_id, c.email
  FROM invoice AS i
  JOIN customer AS c
    ON c.customer_id = i.customer_id;

Three rows of the result (for the earliest invoices):

invoice_id | invoice_date        | invoice_total | customer_id | email
-----------+---------------------+---------------+-------------+-----------------------
1          | 2021-01-01 00:00:00 | 1.98          | 2           | [email protected]
2          | 2021-01-02 00:00:00 | 3.96          | 4           | [email protected]
3          | 2021-01-03 00:00:00 | 5.94          | 8           | [email protected]

This is the readable, professional form of a join: short table aliases, qualified columns, and one logical group per line.


Practice Exercises

Try these against the Chinook database before checking the hints.

Exercise 1: Join Invoice Lines to Tracks

The invoice_line table records each track sold on an invoice, but it stores only a track_id. The track’s name lives in the track table. Write a query that inner-joins invoice_line and track on track_id so each result row shows the invoice line together with the track it refers to.

-- Your code here

Hint

Both tables share a track_id column. Join on it and qualify the column with its table name, like invoice_line.track_id = track.track_id.

Exercise 2: Customer and Invoice Details

Join customer and invoice so the result header reads, in order: customer_id, first_name, last_name, email, invoice_id, invoice_date, total.

-- Your code here

Hint

Join on the common customer_id column. Qualify customer_id with customer. so it is not ambiguous, and list the columns in the order requested.

Exercise 3: Tracks and Genres with Aliases

Join track to genre on genre_id. Select track_id, the track’s name (aliased as track_name), composer, and the genre’s name (aliased as genre). Use table aliases and qualify every column.

-- Your code here

Hint

Alias track as t and genre as g. Both tables have a name column, so you must qualify them: t.name AS track_name and g.name AS genre. Join on genre_id.


Summary

You learned how SQL combines tables. A join matches rows from two tables on a shared column, and the inner join — written INNER JOIN or simply JOIN — keeps only the rows that match in both. The ON clause specifies the matching condition, and you qualify columns with table_name.column_name to remove ambiguity. Table aliases keep multi-table queries short and readable.

Key Concepts

  • Join — a SQL operation that combines rows from two or more tables based on a related column.
  • Inner join — keeps only rows that have a matching value in both tables; JOIN defaults to this.
  • ON clause — defines how rows are matched, for example ON a.id = b.id.
  • Primary key / foreign key — the column that uniquely identifies a row, and the column in another table that points back to it.
  • Qualified column — a column written as table.column to say exactly which table it comes from.
  • Table alias — a short name for a table set with AS, such as customer AS c.

Why This Matters

Almost no real database keeps everything in one table. Customers, orders, products, and employees each get their own table, connected by keys. Joins are how you reassemble that data into the reports people actually ask for: sales by customer, orders by product, tickets by support agent. Master the inner join and you unlock the single most common operation in day-to-day SQL work.


Next Steps

You can now combine two tables and pull exactly the columns you need. Next, you will use joins alongside the filtering, grouping, and sorting clauses you already know — and join more than two tables at once.

Continue to Lesson 2 - Joins with Filtering, Grouping, and Sorting

Join more than two tables and combine joins with WHERE, GROUP BY, and ORDER BY

Back to Module Overview

Return to the Combining Tables with Joins module overview


Continue Building Your Skills

You just took the step that turns single-table SQL into real database work. Every report you build from here on will lean on joins. Practice writing them with aliases and qualified columns until the pattern feels automatic — your future, more complex queries will be far easier to read because of it.