Lesson 3 - Self Joins, Outer Joins, and Cross Joins
Welcome to the Less Common Joins
The inner join handles most of your work, but some real questions need more. What if the rows you want to match live in the same table? What if you must keep rows that have no match? What if you need every possible pairing of two tables? This lesson covers the joins that answer those questions.
By the end of this lesson, you will be able to:
- Join a table to itself with a self join
- Keep unmatched rows using
LEFT JOIN, and understandRIGHT JOINandFULL JOIN - Generate every combination of rows with
CROSS JOIN - Join on conditions that are not simple equalities (non-equi joins)
You will keep working with the Chinook database, plus the small hue and palette example tables from the previous lesson. Let’s begin.
Data for this lesson
Database: chinook.db — a digital music store (SQLite).
Tables used: employee, customer, track, invoice, invoice_line
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
Self Joins
A self join joins a table to itself. It sounds odd until you meet a table that relates to its own rows.
The Chinook employee table has a column called reports_to, which holds the employee_id of each employee’s manager — and that manager is another row in the same employee table. To list each employee next to their manager, you join employee to employee:
SELECT e1.employee_id,
e2.employee_id AS manager_id
FROM employee AS e1
JOIN employee AS e2
ON e1.reports_to = e2.employee_id;Two details make this work:
- You give the table two different aliases,
e1ande2. Without distinct aliases, SQL cannot tell the “employee” copy from the “manager” copy. Ine2.employee_id AS manager_id, usinge1by mistake would just repeat the first column. - You join on different columns:
e1.reports_tomatchese2.employee_id. That is the relationship the schema describes.
The result:
employee_id | manager_id
------------+-----------
2 | 1
3 | 2
4 | 2
5 | 2
6 | 1
7 | 6
8 | 6The employee table has eight rows, but the result has only seven. The general manager (employee 1) reports to no one, so their reports_to value is missing (NULL). The condition e1.reports_to = e2.employee_id is never true for that row, so an inner join drops it. Hold on to that observation — it is exactly the problem the next join solves.
LEFT JOIN: Keeping Unmatched Rows
An inner join discards rows that have no match. Often you want to keep them. The LEFT JOIN does exactly that: it keeps every row from the left table (the one after FROM), and fills the right table’s columns with NULL when there is no match.
Return to the hue and palette tables:
hue palette
----------- ---------------
color | no colour | number
------+---- -------+-------
Red | 1 Purple | 2
Green | 2 Green | 2
Green | 3 Green | 4
Blue | 1 Blue | 5
Orange | 4A left join from hue to palette:
SELECT *
FROM hue
LEFT JOIN palette
ON hue.color = palette.colour;The process mirrors the inner join, with one addition. For each hue row, SQL checks every palette row:
- If it matches, include the combined row (just like an inner join).
- If nothing matches, still include the
huerow, withNULLfor everypalettecolumn.
The result:
color | no | colour | number
------+----+--------+-------
Red | 1 | |
Green | 2 | Green | 2
Green | 2 | Green | 4
Green | 3 | Green | 2
Green | 3 | Green | 4
Blue | 1 | Blue | 5Red had no match in palette, so under an inner join it would vanish. The left join keeps it, leaving colour and number empty.
Table Order Now Matters
With inner joins, swapping the two tables changed nothing. With a LEFT JOIN, order is everything. “Left” means the table after FROM — all of its rows are kept. Swap the tables and you keep a different set of rows. The same applies to RIGHT JOIN and FULL JOIN below.
Solving the Missing Manager
Recall that the inner-join self join dropped the general manager. Switch it to a LEFT JOIN and that employee comes back, with no manager:
SELECT e1.first_name || ' ' || e1.last_name AS employee,
e2.first_name || ' ' || e2.last_name AS manager
FROM employee AS e1
LEFT JOIN employee AS e2
ON e1.reports_to = e2.employee_id;Here || concatenates strings, so e1.first_name || ' ' || e1.last_name builds a full name with a space between. Because it is a left join, every employee from e1 is kept; the general manager appears with a NULL manager. The result now has all eight employees.
RIGHT JOIN and FULL JOIN
LEFT JOIN keeps unmatched rows from the left table. Two relatives extend the idea:
RIGHT JOINkeeps every row from the right table (the one afterJOIN), filling the left table’s columns withNULLwhen there is no match. It is the mirror image ofLEFT JOIN.FULL JOINkeeps every unmatched row from both tables.
A RIGHT JOIN on hue and palette:
SELECT *
FROM hue AS h
RIGHT JOIN palette AS p
ON h.color = p.colour;color | no | colour | number
------+----+--------+-------
| | Purple | 2
Green | 2 | Green | 2
Green | 3 | Green | 2
Green | 2 | Green | 4
Green | 3 | Green | 4
Blue | 1 | Blue | 5
| | Orange | 4The unmatched palette rows (Purple and Orange) are kept, with empty hue columns.
A FULL JOIN keeps unmatched rows from both sides at once:
SELECT *
FROM hue AS h
FULL JOIN palette AS p
ON h.color = p.colour;color | no | colour | number
------+----+--------+-------
Red | 1 | |
Green | 2 | Green | 2
Green | 2 | Green | 4
Green | 3 | Green | 2
Green | 3 | Green | 4
Blue | 1 | Blue | 5
| | Orange | 4
| | Purple | 2Read it as the inner-join rows in the middle, plus the row a LEFT JOIN would add (Red), plus the rows a RIGHT JOIN would add (Orange, Purple).
Caution
RIGHT JOIN and FULL JOIN arrived in SQLite 3.39 (2022), so the version bundled with the Chinook download (3.51) runs both queries above. Older SQLite builds — and some embedded environments — support only LEFT JOIN. When that is all you have, you can reproduce a RIGHT JOIN by swapping the table order, and a FULL JOIN by combining a left and a right join with the set operators you will meet in the next lesson.
CROSS JOIN: Every Combination
A CROSS JOIN pairs every row of one table with every row of the other — the Cartesian product. It takes no ON clause, because there is no condition; SQL treats every pairing as a match.
SELECT *
FROM hue
CROSS JOIN palette;With 4 rows in hue and 5 in palette, the result has 4 × 5 = 20 rows. The first several:
color | no | colour | number
------+----+--------+-------
Red | 1 | Purple | 2
Red | 1 | Green | 2
Red | 1 | Green | 4
Red | 1 | Blue | 5
Red | 1 | Orange | 4
Green | 2 | Purple | 2
... | | ... |Cross Join Plus WHERE Equals Inner Join
Add a WHERE clause that matches the colors:
SELECT *
FROM hue AS h
CROSS JOIN palette AS p
WHERE h.color = p.colour;color | no | colour | number
------+----+--------+-------
Green | 2 | Green | 2
Green | 2 | Green | 4
Green | 3 | Green | 2
Green | 3 | Green | 4
Blue | 1 | Blue | 5That is identical to the inner join of hue and palette. This is no accident: an inner join is conceptually a cross join followed by a filter. Understanding this connection demystifies what a join really does — it considers all combinations and keeps the ones that satisfy the condition.
A practical use: imagine the music store wants to test a feature that pairs each customer with every other customer for shared listening. A cross join of customer with itself produces every pairing; a WHERE clause then removes the rows where a customer is paired with themselves:
SELECT c1.first_name, c1.last_name, c1.email,
c2.first_name AS first_name_2,
c2.last_name AS last_name_2,
c2.email AS email_2
FROM customer AS c1
CROSS JOIN customer AS c2
WHERE c1.customer_id <> c2.customer_id;Warning
Cross joins grow fast. Two tables of 1,000 rows each produce a million rows. Always make sure a cross join is really what you want before running it on large tables.
Joining on Conditions Other Than Equality
Every join so far matched rows with column = column. Joins of that form are equi-joins. But the ON clause accepts any condition WHERE accepts, including inequalities and compound expressions. These are non-equi joins.
Why You Cannot Just Filter Afterward
Suppose you left-join hue and palette and then want to exclude rows where number is 2. The obvious attempt:
SELECT *
FROM hue AS h
LEFT JOIN palette AS p
ON h.color = p.colour
WHERE p.number <> 2;gives:
color | no | colour | number
------+----+--------+-------
Green | 2 | Green | 4
Green | 3 | Green | 4
Blue | 1 | Blue | 5The Red row vanished — even though it was the whole point of using a left join. The reason: Red has a NULL for number, and NULL <> 2 is not true, so WHERE filters it out. The fix is to move the extra condition into the ON clause, so it is applied during matching rather than after:
SELECT *
FROM hue AS h
LEFT JOIN palette AS p
ON h.color = p.colour AND p.number <> 2;Now the join keeps the unmatched Red row as a left join should. The lesson: a condition in ON shapes how rows are matched, while a condition in WHERE filters the finished result — and with outer joins, that difference matters.
A Real Use: Counting Purchases per Track
This pattern shines when counting events that may be zero. To list every track and how many times it was purchased in 2021 — including tracks never purchased — use left joins and push the year filter into ON:
SELECT t.track_id, t.name,
COUNT(i.invoice_id) AS no_of_purchases
FROM track AS t
LEFT JOIN invoice_line AS il
ON t.track_id = il.track_id
LEFT JOIN invoice AS i
ON il.invoice_id = i.invoice_id AND i.invoice_date LIKE '2021%'
GROUP BY t.track_id, t.name;The LEFT JOINs keep every track even if it was never sold, and i.invoice_date LIKE '2021%' lives in ON so it does not throw away the unsold tracks. COUNT(i.invoice_id) counts only the non-NULL matches, so unsold tracks get a count of 0.
A Real Use: Running Totals with >=
Non-equi joins can compute a running total — a cumulative sum where each row adds the totals of all rows up to and including it. The trick is to join invoice to itself on >= so that each invoice is matched with itself and every earlier invoice:
SELECT i1.invoice_id, i1.invoice_date, i1.total,
ROUND(SUM(i2.total), 2) AS running_total
FROM invoice AS i1
JOIN invoice AS i2
ON i1.invoice_id >= i2.invoice_id
GROUP BY i1.invoice_id, i1.invoice_date, i1.total;For each invoice i1, the join collects every invoice i2 with an id at or below it. Grouping by i1 and summing i2.total adds up all those earlier totals. The first rows:
invoice_id | invoice_date | total | running_total
-----------+---------------------+-------+--------------
1 | 2021-01-01 00:00:00 | 1.98 | 1.98
2 | 2021-01-02 00:00:00 | 3.96 | 5.94
3 | 2021-01-03 00:00:00 | 5.94 | 11.88
4 | 2021-01-06 00:00:00 | 8.91 | 20.79
5 | 2021-01-11 00:00:00 | 13.86 | 34.65Each running_total is the sum of all invoice totals up to that point: 1.98, then 1.98 + 3.96 = 5.94, then + 5.94 = 11.88, and so on.
Note
This self-join trick is the classic way to compute running totals before window functions existed. Modern SQL offers SUM(...) OVER (ORDER BY ...), which is far more efficient — a topic for a later course. Seeing the self-join version first makes window functions click when you reach them.
Practice Exercises
Work these against the Chinook database.
Exercise 1: Names of Employees and Managers
Using a self join, list each employee’s full name and their manager’s full name. Alias the first as employee and the second as manager, building each name by concatenating first and last name with a space.
-- Your code hereHint
Join employee AS e1 to employee AS e2 on e1.reports_to = e2.employee_id. Use e1.first_name || ' ' || e1.last_name AS employee for each side.
Exercise 2: Include the Manager-less Employee
Rewrite Exercise 1 so the general manager — who reports to no one — is included, with NULL for the manager columns.
-- Your code hereHint
Change the JOIN to a LEFT JOIN so every e1 row is kept even when reports_to is NULL.
Exercise 3: Pair Every Customer with Every Other
Produce every pairing of two different customers, showing each one’s first name, last name, and email. Exclude rows where a customer is paired with themselves.
-- Your code hereHint
CROSS JOIN customer AS c1 with customer AS c2, then add WHERE c1.customer_id <> c2.customer_id.
Exercise 4: Running Total of Invoices
Display one row per invoice with its invoice_id, invoice_date, total, and a running_total rounded to two decimals.
-- Your code hereHint
Self-join invoice on i1.invoice_id >= i2.invoice_id, group by the i1 columns, and select ROUND(SUM(i2.total), 2) AS running_total.
Summary
You moved beyond the basic inner join. A self join joins a table to itself using two aliases, perfect for hierarchies like employees and managers. Outer joins keep unmatched rows: LEFT JOIN keeps the left table’s rows, RIGHT JOIN keeps the right table’s, and FULL JOIN keeps both. A CROSS JOIN produces every combination of rows, and an inner join is really a cross join plus a filter. Finally, join conditions need not be equalities — pushing extra conditions into ON (rather than WHERE) preserves unmatched rows and unlocks patterns like running totals.
Key Concepts
- Self join — joining a table to itself with two distinct aliases.
LEFT JOIN— keeps all rows from the left (FROM) table; unmatched right columns becomeNULL.RIGHT JOIN/FULL JOIN— keep unmatched rows from the right table, or from both tables.CROSS JOIN— every combination of rows from two tables, with noONclause.- Equi-join vs. non-equi join — matching on equality versus matching on any condition, such as
>=. ONvs.WHERE— a condition inONshapes matching and preserves unmatched rows in outer joins; a condition inWHEREfilters the finished result.
Why This Matters
Hierarchies, optional relationships, and “include the zeros” reports appear constantly in real data — an org chart, customers who have never ordered, products with no sales. Inner joins silently hide exactly the rows those questions are about. Knowing self joins, outer joins, and the ON-versus-WHERE distinction is what separates a query that looks right from one that is right.
Next Steps
You have combined tables side by side in every way joins allow. Next, you will combine query results a different way — stacking and comparing whole result sets with the set operators UNION, INTERSECT, and EXCEPT.
Continue to Lesson 4 - Combining Results with Set Operators
Stack and compare query results using UNION, INTERSECT, and EXCEPT
Back to Module Overview
Return to the Combining Tables with Joins module overview
Continue Building Your Skills
The joins in this lesson cover the cases that trip up most analysts. When a result has too few rows, ask yourself whether an inner join quietly dropped the unmatched ones — and whether a LEFT JOIN or a condition moved into ON is what you really need. That instinct will save you again and again.