Lesson 4 - Combining Results with Set Operators

Welcome to Set Operators

Joins combine tables side by side, matching rows column-by-column. Set operators combine query results a completely different way: they stack rows vertically, lining up columns. When the data you need is split across tables that share the same shape — say, sales for 2021 in one table and sales for 2022 in another — set operators are the tool that brings them back into a single result.

Imagine you are asked for total sales across 2021 and 2022, only to discover each year lives in its own table. Older data often sits in separate, cheaper-to-maintain tables precisely because it is queried less often. To total it, you first have to stack the tables. That is what set operators do.

By the end of this lesson, you will be able to:

  • Stack query results with UNION and UNION ALL
  • Keep only rows common to two results with INTERSECT
  • Subtract one result from another with EXCEPT
  • Write valid compound select statements and respect their rules

You will work with versions of the Chinook invoice table split by year. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: track

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


Joins vs. Set Operators

It helps to picture the difference:

  • A join aligns tables horizontally. It adds columns, matching rows on a condition. The result is wider.
  • A set operator aligns results vertically. It adds rows, lining up columns by position. The result is taller.

The name comes from mathematics: the union, intersection, and difference of sets, the same ideas you may have drawn as Venn diagrams. SQL gives you one operator for each.

For these examples, the Chinook database has been split into year-based invoice tables:

  • invoice_2021q1_2022q2 — invoices from the start of 2021 through the second quarter of 2022
  • invoice_2022q1_2022q4 — invoices from the year 2022
  • invoice_2023q1_2023q4 — invoices from the year 2023
  • invoice_2024q1_2024q4 — invoices from the year 2024

Splitting data this way is common in practice, so learning to recombine it is a real skill, not just an exercise.


UNION: Stacking Results

UNION stacks the rows of two results into one. Take two small tables:

table1                 table2
------------           ------------
number | name          num | name
-------+--------       ----+--------
1      | Zito          3   | Arthur
2      | Miguel        4   | Therese
3      | Arthur        5   | Francisca
4      | Therese       6   | Bruno

Combine them with UNION:

SELECT *
  FROM table1

 UNION

SELECT *
  FROM table2;

The result contains every row from both — but each unique row only once:

number | name
-------+----------
1      | Zito
2      | Miguel
3      | Arthur
4      | Therese
5      | Francisca
6      | Bruno

Arthur and Therese appear in both tables, yet show up once. In Venn-diagram terms, UNION is everything in either circle.

So, to fetch every invoice from either 2021 or 2022, you stack the two relevant tables:

SELECT *
  FROM invoice_2021q1_2022q2

 UNION

SELECT *
  FROM invoice_2022q1_2022q4;

Compound Select Statements

That query is your first compound select statement: two or more SELECT statements joined by a set operator. It is more precise to say set operators combine the results of SELECT statements than to say they combine tables. The general shape:

SELECT statement

SET_OPERATOR

SELECT statement;

Compound select statements come with strict rules:

  • Same number of columns. Every SELECT must return the same count of columns.
  • Aligned data types. The first column of each SELECT must share a type, the second columns must share a type, and so on. Columns line up by position, not by name.
  • ORDER BY and LIMIT come last. They may appear only after the final SELECT, and they apply to the whole combined result.

The first SELECT also sets the column headers for the result. If table1’s first column is number and table2’s is num, the combined result uses number.

Two illegal examples make the rules concrete. This one breaks the type-alignment rule, lining up a name against a number:

SELECT name, number
  FROM table1

 UNION

SELECT num, name
  FROM table2;

This one breaks the ORDER BY rule by sorting inside the first SELECT:

SELECT *
  FROM table1
 ORDER BY name

 UNION

SELECT *
  FROM table1;

Repeating table1 is perfectly fine; the ORDER BY in the first SELECT is what makes it invalid.


UNION ALL: Keeping Duplicates

UNION quietly removes duplicate rows, because mathematical sets do not track repetition. When you want to keep every row — including duplicates — use UNION ALL.

With the same table1 and table2, a plain UNION collapses the shared Arthur and Therese to one copy each:

number | name
-------+----------
1      | Zito
2      | Miguel
3      | Arthur
4      | Therese
5      | Francisca
6      | Bruno

UNION ALL keeps both copies:

SELECT *
  FROM table1

 UNION ALL

SELECT *
  FROM table2;
number | name
-------+----------
1      | Zito
2      | Miguel
3      | Arthur
4      | Therese
3      | Arthur
4      | Therese
5      | Francisca
6      | Bruno

Two practical notes:

  • UNION ALL is usually faster than UNION, because removing duplicates takes work. If you know there are no duplicates, or you want to keep them, prefer UNION ALL.
  • When tables genuinely overlap — like two invoice tables that share the first half of 2022 — UNION ALL will produce those shared invoices twice. Choose the operator that matches what you actually want.

INTERSECT: Rows in Both

INTERSECT keeps only the rows that appear in both results — the overlap of the two sets. The syntax matches UNION:

   SELECT *
     FROM table1

INTERSECT

   SELECT *
     FROM table2;

(The SELECT statements are indented so the operator forms a “river” down the middle — a common style choice for readability.)

Because invoice_2021q1_2022q2 runs through mid-2022 and invoice_2022q1_2022q4 covers all of 2022, their overlap is the first half of 2022. INTERSECT extracts exactly those shared invoices:

   SELECT *
     FROM invoice_2021q1_2022q2

INTERSECT

   SELECT *
     FROM invoice_2022q1_2022q4;

Like UNION, INTERSECT returns distinct rows.


EXCEPT: Rows in One but Not the Other

EXCEPT returns the rows in the first result that are not in the second — the set difference, “the left set minus the right set.”

SELECT *
  FROM table1

EXCEPT

SELECT *
  FROM table2;

Since invoice_2021q1_2022q2 covers 2021 through mid-2022, and invoice_2022q1_2022q4 covers all of 2022, subtracting the second from the first leaves only the 2021 invoices. Add ORDER BY after the last SELECT to sort newest-first:

SELECT *
  FROM invoice_2021q1_2022q2

EXCEPT

SELECT *
  FROM invoice_2022q1_2022q4
 ORDER BY invoice_date DESC;

The ORDER BY sits after the final SELECT, where the rules require it, and sorts the whole combined result.


Rewriting WHERE Conditions with Set Operators

Set operators connect back to logic you already use. In many cases an OR condition can be rewritten with UNION, and an AND condition with INTERSECT.

An OR query like this:

SELECT *
  FROM some_table
 WHERE condition1 OR condition2;

can be rewritten as:

SELECT *
  FROM some_table
 WHERE condition1

UNION

SELECT *
  FROM some_table
 WHERE condition2;

UNION removes duplicates, which matches OR: a row satisfying both conditions still appears only once. If the two conditions are mutually exclusive (no row satisfies both), UNION ALL produces the same result and runs faster.

Similarly, an AND query:

SELECT *
  FROM some_table
 WHERE condition1 AND condition2;

is equivalent, in set terms, to:

SELECT *
  FROM some_table
 WHERE condition1

INTERSECT

SELECT *
  FROM some_table
 WHERE condition2;

For example, finding tracks between 60,000 and 120,000 milliseconds long can be written either with AND or as two SELECTs joined by INTERSECT:

   SELECT *
     FROM track
    WHERE milliseconds > 60000

INTERSECT

   SELECT *
     FROM track
    WHERE milliseconds < 120000;

Caution

These rewrites are not always perfectly equivalent. INTERSECT and UNION return distinct rows, while AND and OR preserve duplicates — so they match only when duplicates do not matter. The rewrite can sometimes be faster (a query planner may use a different index for each branch), but whether it actually helps depends on the database and the data. Treat it as a tool to know about, applied carefully.


Practice Exercises

Use the year-based invoice tables described earlier.

Exercise 1: Invoices from 2021 or 2022

Write a compound select statement returning every invoice from either the 2021–mid-2022 table or the 2022 table, without duplicates.

-- Your code here

Hint

SELECT * FROM invoice_2021q1_2022q2 UNION SELECT * FROM invoice_2022q1_2022q4;UNION removes any duplicate rows automatically.

Exercise 2: Keep the Duplicates

Return the same 2021-or-2022 invoices, but keep duplicate rows, and select only invoice_id, customer_id, invoice_date, and total.

-- Your code here

Hint

Use UNION ALL, and list the same four columns in the same order in both SELECT statements so they align by position.

Exercise 3: The First Half of 2022

Write a compound select statement that returns only the invoices belonging to the first half of 2022 — the overlap of the two tables.

-- Your code here

Hint

The overlap of invoice_2021q1_2022q2 and invoice_2022q1_2022q4 is the rows in both. Use INTERSECT.

Exercise 4: Only 2021, Newest First

Return the invoices that belong to 2021 only, ordered from most recent to oldest.

-- Your code here

Hint

Subtract the 2022 table from the 2021–mid-2022 table with EXCEPT, then add ORDER BY invoice_date DESC after the final SELECT.


Summary

You learned to combine query results vertically. UNION stacks two results and removes duplicates; UNION ALL keeps them and runs faster. INTERSECT returns the rows common to both results, and EXCEPT returns the rows in the first result that are absent from the second. All of these build compound select statements, which must have matching column counts and aligned data types, with ORDER BY and LIMIT only after the final SELECT. You also saw how OR and AND conditions can sometimes be rewritten with UNION and INTERSECT.

Key Concepts

  • Set operator — combines query results by stacking rows vertically and aligning columns by position.
  • UNION / UNION ALL — stack two results; UNION removes duplicates, UNION ALL keeps them.
  • INTERSECT — keep only rows present in both results.
  • EXCEPT — keep rows in the first result that are not in the second.
  • Compound select statement — two or more SELECTs joined by a set operator, with matching column counts and aligned types.
  • Placement ruleORDER BY and LIMIT apply to the whole result and must follow the last SELECT.

Why This Matters

Data is constantly split across tables of the same shape — by year, by region, by archive tier — and stacking it back together is everyday work for analysts and engineers. Set operators also give you a clean, set-based way to answer “what is in both,” “what is in only one,” and “what is in either,” questions that are awkward to phrase with joins alone. Knowing both joins and set operators means you can always pick the right tool for combining data, whichever direction it needs to go.


Next Steps

You can now combine tables in every direction — horizontally with joins and vertically with set operators. The next module turns to writing queries inside queries: subqueries that compute a value or a set on the fly, common table expressions that name and reuse those results, and views that save a query for later.

Continue to Subqueries, CTEs, and Views

Nest queries inside queries, name intermediate results with CTEs, and save queries as reusable views

Back to Module Overview

Return to the Combining Tables with Joins module overview


Continue Building Your Skills

You have completed the Combining Tables with Joins module. You can now reassemble data however a real database scatters it — matching rows across tables, keeping the unmatched ones, generating combinations, and stacking results that share a shape. These are the operations behind nearly every report and dashboard you will ever build. Keep practicing on real schemas, and the right way to combine any two pieces of data will become second nature.