Lesson 3 - Self Joins, Outer Joins, and Cross Joins

Welcome to the Less Common Joins

The inner join handles most of your work, but some real questions need more. What if the rows you want to match live in the same table? What if you must keep rows that have no match? What if you need every possible pairing of two tables? This lesson covers the joins that answer those questions.

By the end of this lesson, you will be able to:

  • Join a table to itself with a self join
  • Keep unmatched rows using LEFT JOIN, and understand RIGHT JOIN and FULL JOIN
  • Generate every combination of rows with CROSS JOIN
  • Join on conditions that are not simple equalities (non-equi joins)

You will keep working with the Chinook database, plus the small hue and palette example tables from the previous lesson. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: employee, customer, track, invoice, invoice_line

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


Self Joins

A self join joins a table to itself. It sounds odd until you meet a table that relates to its own rows.

The Chinook employee table has a column called reports_to, which holds the employee_id of each employee’s manager — and that manager is another row in the same employee table. To list each employee next to their manager, you join employee to employee:

SELECT e1.employee_id,
       e2.employee_id AS manager_id
  FROM employee AS e1
  JOIN employee AS e2
    ON e1.reports_to = e2.employee_id;

Two details make this work:

  • You give the table two different aliases, e1 and e2. Without distinct aliases, SQL cannot tell the “employee” copy from the “manager” copy. In e2.employee_id AS manager_id, using e1 by mistake would just repeat the first column.
  • You join on different columns: e1.reports_to matches e2.employee_id. That is the relationship the schema describes.

The result:

employee_id | manager_id
------------+-----------
2           | 1
3           | 2
4           | 2
5           | 2
6           | 1
7           | 6
8           | 6

The employee table has eight rows, but the result has only seven. The general manager (employee 1) reports to no one, so their reports_to value is missing (NULL). The condition e1.reports_to = e2.employee_id is never true for that row, so an inner join drops it. Hold on to that observation — it is exactly the problem the next join solves.


LEFT JOIN: Keeping Unmatched Rows

An inner join discards rows that have no match. Often you want to keep them. The LEFT JOIN does exactly that: it keeps every row from the left table (the one after FROM), and fills the right table’s columns with NULL when there is no match.

Return to the hue and palette tables:

hue                  palette
-----------          ---------------
color | no           colour | number
------+----          -------+-------
Red   | 1            Purple | 2
Green | 2            Green  | 2
Green | 3            Green  | 4
Blue  | 1            Blue   | 5
                     Orange | 4

A left join from hue to palette:

SELECT *
  FROM hue
  LEFT JOIN palette
    ON hue.color = palette.colour;

The process mirrors the inner join, with one addition. For each hue row, SQL checks every palette row:

  • If it matches, include the combined row (just like an inner join).
  • If nothing matches, still include the hue row, with NULL for every palette column.

The result:

color | no | colour | number
------+----+--------+-------
Red   | 1  |        |
Green | 2  | Green  | 2
Green | 2  | Green  | 4
Green | 3  | Green  | 2
Green | 3  | Green  | 4
Blue  | 1  | Blue   | 5

Red had no match in palette, so under an inner join it would vanish. The left join keeps it, leaving colour and number empty.

Table Order Now Matters

With inner joins, swapping the two tables changed nothing. With a LEFT JOIN, order is everything. “Left” means the table after FROMall of its rows are kept. Swap the tables and you keep a different set of rows. The same applies to RIGHT JOIN and FULL JOIN below.

Solving the Missing Manager

Recall that the inner-join self join dropped the general manager. Switch it to a LEFT JOIN and that employee comes back, with no manager:

SELECT e1.first_name || ' ' || e1.last_name AS employee,
       e2.first_name || ' ' || e2.last_name AS manager
  FROM employee AS e1
  LEFT JOIN employee AS e2
    ON e1.reports_to = e2.employee_id;

Here || concatenates strings, so e1.first_name || ' ' || e1.last_name builds a full name with a space between. Because it is a left join, every employee from e1 is kept; the general manager appears with a NULL manager. The result now has all eight employees.


RIGHT JOIN and FULL JOIN

LEFT JOIN keeps unmatched rows from the left table. Two relatives extend the idea:

  • RIGHT JOIN keeps every row from the right table (the one after JOIN), filling the left table’s columns with NULL when there is no match. It is the mirror image of LEFT JOIN.
  • FULL JOIN keeps every unmatched row from both tables.

A RIGHT JOIN on hue and palette:

SELECT *
  FROM hue AS h
 RIGHT JOIN palette AS p
    ON h.color = p.colour;
color | no | colour | number
------+----+--------+-------
      |    | Purple | 2
Green | 2  | Green  | 2
Green | 3  | Green  | 2
Green | 2  | Green  | 4
Green | 3  | Green  | 4
Blue  | 1  | Blue   | 5
      |    | Orange | 4

The unmatched palette rows (Purple and Orange) are kept, with empty hue columns.

A FULL JOIN keeps unmatched rows from both sides at once:

SELECT *
  FROM hue AS h
 FULL JOIN palette AS p
    ON h.color = p.colour;
color | no | colour | number
------+----+--------+-------
Red   | 1  |        |
Green | 2  | Green  | 2
Green | 2  | Green  | 4
Green | 3  | Green  | 2
Green | 3  | Green  | 4
Blue  | 1  | Blue   | 5
      |    | Orange | 4
      |    | Purple | 2

Read it as the inner-join rows in the middle, plus the row a LEFT JOIN would add (Red), plus the rows a RIGHT JOIN would add (Orange, Purple).

Caution

RIGHT JOIN and FULL JOIN arrived in SQLite 3.39 (2022), so the version bundled with the Chinook download (3.51) runs both queries above. Older SQLite builds — and some embedded environments — support only LEFT JOIN. When that is all you have, you can reproduce a RIGHT JOIN by swapping the table order, and a FULL JOIN by combining a left and a right join with the set operators you will meet in the next lesson.


CROSS JOIN: Every Combination

A CROSS JOIN pairs every row of one table with every row of the other — the Cartesian product. It takes no ON clause, because there is no condition; SQL treats every pairing as a match.

SELECT *
  FROM hue
 CROSS JOIN palette;

With 4 rows in hue and 5 in palette, the result has 4 × 5 = 20 rows. The first several:

color | no | colour | number
------+----+--------+-------
Red   | 1  | Purple | 2
Red   | 1  | Green  | 2
Red   | 1  | Green  | 4
Red   | 1  | Blue   | 5
Red   | 1  | Orange | 4
Green | 2  | Purple | 2
...   |    | ...    |

Cross Join Plus WHERE Equals Inner Join

Add a WHERE clause that matches the colors:

SELECT *
  FROM hue AS h
 CROSS JOIN palette AS p
 WHERE h.color = p.colour;
color | no | colour | number
------+----+--------+-------
Green | 2  | Green  | 2
Green | 2  | Green  | 4
Green | 3  | Green  | 2
Green | 3  | Green  | 4
Blue  | 1  | Blue   | 5

That is identical to the inner join of hue and palette. This is no accident: an inner join is conceptually a cross join followed by a filter. Understanding this connection demystifies what a join really does — it considers all combinations and keeps the ones that satisfy the condition.

A practical use: imagine the music store wants to test a feature that pairs each customer with every other customer for shared listening. A cross join of customer with itself produces every pairing; a WHERE clause then removes the rows where a customer is paired with themselves:

SELECT c1.first_name, c1.last_name, c1.email,
       c2.first_name AS first_name_2,
       c2.last_name AS last_name_2,
       c2.email AS email_2
  FROM customer AS c1
 CROSS JOIN customer AS c2
 WHERE c1.customer_id <> c2.customer_id;

Warning

Cross joins grow fast. Two tables of 1,000 rows each produce a million rows. Always make sure a cross join is really what you want before running it on large tables.


Joining on Conditions Other Than Equality

Every join so far matched rows with column = column. Joins of that form are equi-joins. But the ON clause accepts any condition WHERE accepts, including inequalities and compound expressions. These are non-equi joins.

Why You Cannot Just Filter Afterward

Suppose you left-join hue and palette and then want to exclude rows where number is 2. The obvious attempt:

SELECT *
  FROM hue AS h
  LEFT JOIN palette AS p
    ON h.color = p.colour
 WHERE p.number <> 2;

gives:

color | no | colour | number
------+----+--------+-------
Green | 2  | Green  | 4
Green | 3  | Green  | 4
Blue  | 1  | Blue   | 5

The Red row vanished — even though it was the whole point of using a left join. The reason: Red has a NULL for number, and NULL <> 2 is not true, so WHERE filters it out. The fix is to move the extra condition into the ON clause, so it is applied during matching rather than after:

SELECT *
  FROM hue AS h
  LEFT JOIN palette AS p
    ON h.color = p.colour AND p.number <> 2;

Now the join keeps the unmatched Red row as a left join should. The lesson: a condition in ON shapes how rows are matched, while a condition in WHERE filters the finished result — and with outer joins, that difference matters.

A Real Use: Counting Purchases per Track

This pattern shines when counting events that may be zero. To list every track and how many times it was purchased in 2021 — including tracks never purchased — use left joins and push the year filter into ON:

SELECT t.track_id, t.name,
       COUNT(i.invoice_id) AS no_of_purchases
  FROM track AS t
  LEFT JOIN invoice_line AS il
    ON t.track_id = il.track_id
  LEFT JOIN invoice AS i
    ON il.invoice_id = i.invoice_id AND i.invoice_date LIKE '2021%'
 GROUP BY t.track_id, t.name;

The LEFT JOINs keep every track even if it was never sold, and i.invoice_date LIKE '2021%' lives in ON so it does not throw away the unsold tracks. COUNT(i.invoice_id) counts only the non-NULL matches, so unsold tracks get a count of 0.

A Real Use: Running Totals with >=

Non-equi joins can compute a running total — a cumulative sum where each row adds the totals of all rows up to and including it. The trick is to join invoice to itself on >= so that each invoice is matched with itself and every earlier invoice:

SELECT i1.invoice_id, i1.invoice_date, i1.total,
       ROUND(SUM(i2.total), 2) AS running_total
  FROM invoice AS i1
  JOIN invoice AS i2
    ON i1.invoice_id >= i2.invoice_id
 GROUP BY i1.invoice_id, i1.invoice_date, i1.total;

For each invoice i1, the join collects every invoice i2 with an id at or below it. Grouping by i1 and summing i2.total adds up all those earlier totals. The first rows:

invoice_id | invoice_date        | total | running_total
-----------+---------------------+-------+--------------
1          | 2021-01-01 00:00:00 | 1.98  | 1.98
2          | 2021-01-02 00:00:00 | 3.96  | 5.94
3          | 2021-01-03 00:00:00 | 5.94  | 11.88
4          | 2021-01-06 00:00:00 | 8.91  | 20.79
5          | 2021-01-11 00:00:00 | 13.86 | 34.65

Each running_total is the sum of all invoice totals up to that point: 1.98, then 1.98 + 3.96 = 5.94, then + 5.94 = 11.88, and so on.

Note

This self-join trick is the classic way to compute running totals before window functions existed. Modern SQL offers SUM(...) OVER (ORDER BY ...), which is far more efficient — a topic for a later course. Seeing the self-join version first makes window functions click when you reach them.


Practice Exercises

Work these against the Chinook database.

Exercise 1: Names of Employees and Managers

Using a self join, list each employee’s full name and their manager’s full name. Alias the first as employee and the second as manager, building each name by concatenating first and last name with a space.

-- Your code here

Hint

Join employee AS e1 to employee AS e2 on e1.reports_to = e2.employee_id. Use e1.first_name || ' ' || e1.last_name AS employee for each side.

Exercise 2: Include the Manager-less Employee

Rewrite Exercise 1 so the general manager — who reports to no one — is included, with NULL for the manager columns.

-- Your code here

Hint

Change the JOIN to a LEFT JOIN so every e1 row is kept even when reports_to is NULL.

Exercise 3: Pair Every Customer with Every Other

Produce every pairing of two different customers, showing each one’s first name, last name, and email. Exclude rows where a customer is paired with themselves.

-- Your code here

Hint

CROSS JOIN customer AS c1 with customer AS c2, then add WHERE c1.customer_id <> c2.customer_id.

Exercise 4: Running Total of Invoices

Display one row per invoice with its invoice_id, invoice_date, total, and a running_total rounded to two decimals.

-- Your code here

Hint

Self-join invoice on i1.invoice_id >= i2.invoice_id, group by the i1 columns, and select ROUND(SUM(i2.total), 2) AS running_total.


Summary

You moved beyond the basic inner join. A self join joins a table to itself using two aliases, perfect for hierarchies like employees and managers. Outer joins keep unmatched rows: LEFT JOIN keeps the left table’s rows, RIGHT JOIN keeps the right table’s, and FULL JOIN keeps both. A CROSS JOIN produces every combination of rows, and an inner join is really a cross join plus a filter. Finally, join conditions need not be equalities — pushing extra conditions into ON (rather than WHERE) preserves unmatched rows and unlocks patterns like running totals.

Key Concepts

  • Self join — joining a table to itself with two distinct aliases.
  • LEFT JOIN — keeps all rows from the left (FROM) table; unmatched right columns become NULL.
  • RIGHT JOIN / FULL JOIN — keep unmatched rows from the right table, or from both tables.
  • CROSS JOIN — every combination of rows from two tables, with no ON clause.
  • Equi-join vs. non-equi join — matching on equality versus matching on any condition, such as >=.
  • ON vs. WHERE — a condition in ON shapes matching and preserves unmatched rows in outer joins; a condition in WHERE filters the finished result.

Why This Matters

Hierarchies, optional relationships, and “include the zeros” reports appear constantly in real data — an org chart, customers who have never ordered, products with no sales. Inner joins silently hide exactly the rows those questions are about. Knowing self joins, outer joins, and the ON-versus-WHERE distinction is what separates a query that looks right from one that is right.


Next Steps

You have combined tables side by side in every way joins allow. Next, you will combine query results a different way — stacking and comparing whole result sets with the set operators UNION, INTERSECT, and EXCEPT.

Continue to Lesson 4 - Combining Results with Set Operators

Stack and compare query results using UNION, INTERSECT, and EXCEPT

Back to Module Overview

Return to the Combining Tables with Joins module overview


Continue Building Your Skills

The joins in this lesson cover the cases that trip up most analysts. When a result has too few rows, ask yourself whether an inner join quietly dropped the unmatched ones — and whether a LEFT JOIN or a condition moved into ON is what you really need. That instinct will save you again and again.