Lesson 4 - Combining Results with Set Operators
On this page
Welcome to Set Operators
Joins combine tables side by side, matching rows column-by-column. Set operators combine query results a completely different way: they stack rows vertically, lining up columns. When the data you need is split across tables that share the same shape — say, sales for 2021 in one table and sales for 2022 in another — set operators are the tool that brings them back into a single result.
Imagine you are asked for total sales across 2021 and 2022, only to discover each year lives in its own table. Older data often sits in separate, cheaper-to-maintain tables precisely because it is queried less often. To total it, you first have to stack the tables. That is what set operators do.
By the end of this lesson, you will be able to:
- Stack query results with
UNIONandUNION ALL - Keep only rows common to two results with
INTERSECT - Subtract one result from another with
EXCEPT - Write valid compound select statements and respect their rules
You will work with versions of the Chinook invoice table split by year. Let’s begin.
Data for this lesson
Database: chinook.db — a digital music store (SQLite).
Tables used: track
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
Joins vs. Set Operators
It helps to picture the difference:
- A join aligns tables horizontally. It adds columns, matching rows on a condition. The result is wider.
- A set operator aligns results vertically. It adds rows, lining up columns by position. The result is taller.
The name comes from mathematics: the union, intersection, and difference of sets, the same ideas you may have drawn as Venn diagrams. SQL gives you one operator for each.
For these examples, the Chinook database has been split into year-based invoice tables:
invoice_2021q1_2022q2— invoices from the start of 2021 through the second quarter of 2022invoice_2022q1_2022q4— invoices from the year 2022invoice_2023q1_2023q4— invoices from the year 2023invoice_2024q1_2024q4— invoices from the year 2024
Splitting data this way is common in practice, so learning to recombine it is a real skill, not just an exercise.
UNION: Stacking Results
UNION stacks the rows of two results into one. Take two small tables:
table1 table2
------------ ------------
number | name num | name
-------+-------- ----+--------
1 | Zito 3 | Arthur
2 | Miguel 4 | Therese
3 | Arthur 5 | Francisca
4 | Therese 6 | BrunoCombine them with UNION:
SELECT *
FROM table1
UNION
SELECT *
FROM table2;The result contains every row from both — but each unique row only once:
number | name
-------+----------
1 | Zito
2 | Miguel
3 | Arthur
4 | Therese
5 | Francisca
6 | BrunoArthur and Therese appear in both tables, yet show up once. In Venn-diagram terms, UNION is everything in either circle.
So, to fetch every invoice from either 2021 or 2022, you stack the two relevant tables:
SELECT *
FROM invoice_2021q1_2022q2
UNION
SELECT *
FROM invoice_2022q1_2022q4;Compound Select Statements
That query is your first compound select statement: two or more SELECT statements joined by a set operator. It is more precise to say set operators combine the results of SELECT statements than to say they combine tables. The general shape:
SELECT statement
SET_OPERATOR
SELECT statement;Compound select statements come with strict rules:
- Same number of columns. Every
SELECTmust return the same count of columns. - Aligned data types. The first column of each
SELECTmust share a type, the second columns must share a type, and so on. Columns line up by position, not by name. ORDER BYandLIMITcome last. They may appear only after the finalSELECT, and they apply to the whole combined result.
The first SELECT also sets the column headers for the result. If table1’s first column is number and table2’s is num, the combined result uses number.
Two illegal examples make the rules concrete. This one breaks the type-alignment rule, lining up a name against a number:
SELECT name, number
FROM table1
UNION
SELECT num, name
FROM table2;This one breaks the ORDER BY rule by sorting inside the first SELECT:
SELECT *
FROM table1
ORDER BY name
UNION
SELECT *
FROM table1;Repeating table1 is perfectly fine; the ORDER BY in the first SELECT is what makes it invalid.
UNION ALL: Keeping Duplicates
UNION quietly removes duplicate rows, because mathematical sets do not track repetition. When you want to keep every row — including duplicates — use UNION ALL.
With the same table1 and table2, a plain UNION collapses the shared Arthur and Therese to one copy each:
number | name
-------+----------
1 | Zito
2 | Miguel
3 | Arthur
4 | Therese
5 | Francisca
6 | BrunoUNION ALL keeps both copies:
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table2;number | name
-------+----------
1 | Zito
2 | Miguel
3 | Arthur
4 | Therese
3 | Arthur
4 | Therese
5 | Francisca
6 | BrunoTwo practical notes:
UNION ALLis usually faster thanUNION, because removing duplicates takes work. If you know there are no duplicates, or you want to keep them, preferUNION ALL.- When tables genuinely overlap — like two invoice tables that share the first half of 2022 —
UNION ALLwill produce those shared invoices twice. Choose the operator that matches what you actually want.
INTERSECT: Rows in Both
INTERSECT keeps only the rows that appear in both results — the overlap of the two sets. The syntax matches UNION:
SELECT *
FROM table1
INTERSECT
SELECT *
FROM table2;(The SELECT statements are indented so the operator forms a “river” down the middle — a common style choice for readability.)
Because invoice_2021q1_2022q2 runs through mid-2022 and invoice_2022q1_2022q4 covers all of 2022, their overlap is the first half of 2022. INTERSECT extracts exactly those shared invoices:
SELECT *
FROM invoice_2021q1_2022q2
INTERSECT
SELECT *
FROM invoice_2022q1_2022q4;Like UNION, INTERSECT returns distinct rows.
EXCEPT: Rows in One but Not the Other
EXCEPT returns the rows in the first result that are not in the second — the set difference, “the left set minus the right set.”
SELECT *
FROM table1
EXCEPT
SELECT *
FROM table2;Since invoice_2021q1_2022q2 covers 2021 through mid-2022, and invoice_2022q1_2022q4 covers all of 2022, subtracting the second from the first leaves only the 2021 invoices. Add ORDER BY after the last SELECT to sort newest-first:
SELECT *
FROM invoice_2021q1_2022q2
EXCEPT
SELECT *
FROM invoice_2022q1_2022q4
ORDER BY invoice_date DESC;The ORDER BY sits after the final SELECT, where the rules require it, and sorts the whole combined result.
Rewriting WHERE Conditions with Set Operators
Set operators connect back to logic you already use. In many cases an OR condition can be rewritten with UNION, and an AND condition with INTERSECT.
An OR query like this:
SELECT *
FROM some_table
WHERE condition1 OR condition2;can be rewritten as:
SELECT *
FROM some_table
WHERE condition1
UNION
SELECT *
FROM some_table
WHERE condition2;UNION removes duplicates, which matches OR: a row satisfying both conditions still appears only once. If the two conditions are mutually exclusive (no row satisfies both), UNION ALL produces the same result and runs faster.
Similarly, an AND query:
SELECT *
FROM some_table
WHERE condition1 AND condition2;is equivalent, in set terms, to:
SELECT *
FROM some_table
WHERE condition1
INTERSECT
SELECT *
FROM some_table
WHERE condition2;For example, finding tracks between 60,000 and 120,000 milliseconds long can be written either with AND or as two SELECTs joined by INTERSECT:
SELECT *
FROM track
WHERE milliseconds > 60000
INTERSECT
SELECT *
FROM track
WHERE milliseconds < 120000;Caution
These rewrites are not always perfectly equivalent. INTERSECT and UNION return distinct rows, while AND and OR preserve duplicates — so they match only when duplicates do not matter. The rewrite can sometimes be faster (a query planner may use a different index for each branch), but whether it actually helps depends on the database and the data. Treat it as a tool to know about, applied carefully.
Practice Exercises
Use the year-based invoice tables described earlier.
Exercise 1: Invoices from 2021 or 2022
Write a compound select statement returning every invoice from either the 2021–mid-2022 table or the 2022 table, without duplicates.
-- Your code hereHint
SELECT * FROM invoice_2021q1_2022q2 UNION SELECT * FROM invoice_2022q1_2022q4; — UNION removes any duplicate rows automatically.
Exercise 2: Keep the Duplicates
Return the same 2021-or-2022 invoices, but keep duplicate rows, and select only invoice_id, customer_id, invoice_date, and total.
-- Your code hereHint
Use UNION ALL, and list the same four columns in the same order in both SELECT statements so they align by position.
Exercise 3: The First Half of 2022
Write a compound select statement that returns only the invoices belonging to the first half of 2022 — the overlap of the two tables.
-- Your code hereHint
The overlap of invoice_2021q1_2022q2 and invoice_2022q1_2022q4 is the rows in both. Use INTERSECT.
Exercise 4: Only 2021, Newest First
Return the invoices that belong to 2021 only, ordered from most recent to oldest.
-- Your code hereHint
Subtract the 2022 table from the 2021–mid-2022 table with EXCEPT, then add ORDER BY invoice_date DESC after the final SELECT.
Summary
You learned to combine query results vertically. UNION stacks two results and removes duplicates; UNION ALL keeps them and runs faster. INTERSECT returns the rows common to both results, and EXCEPT returns the rows in the first result that are absent from the second. All of these build compound select statements, which must have matching column counts and aligned data types, with ORDER BY and LIMIT only after the final SELECT. You also saw how OR and AND conditions can sometimes be rewritten with UNION and INTERSECT.
Key Concepts
- Set operator — combines query results by stacking rows vertically and aligning columns by position.
UNION/UNION ALL— stack two results;UNIONremoves duplicates,UNION ALLkeeps them.INTERSECT— keep only rows present in both results.EXCEPT— keep rows in the first result that are not in the second.- Compound select statement — two or more
SELECTs joined by a set operator, with matching column counts and aligned types. - Placement rule —
ORDER BYandLIMITapply to the whole result and must follow the lastSELECT.
Why This Matters
Data is constantly split across tables of the same shape — by year, by region, by archive tier — and stacking it back together is everyday work for analysts and engineers. Set operators also give you a clean, set-based way to answer “what is in both,” “what is in only one,” and “what is in either,” questions that are awkward to phrase with joins alone. Knowing both joins and set operators means you can always pick the right tool for combining data, whichever direction it needs to go.
Next Steps
You can now combine tables in every direction — horizontally with joins and vertically with set operators. The next module turns to writing queries inside queries: subqueries that compute a value or a set on the fly, common table expressions that name and reuse those results, and views that save a query for later.
Continue to Subqueries, CTEs, and Views
Nest queries inside queries, name intermediate results with CTEs, and save queries as reusable views
Back to Module Overview
Return to the Combining Tables with Joins module overview
Continue Building Your Skills
You have completed the Combining Tables with Joins module. You can now reassemble data however a real database scatters it — matching rows across tables, keeping the unmatched ones, generating combinations, and stacking results that share a shape. These are the operations behind nearly every report and dashboard you will ever build. Keep practicing on real schemas, and the right way to combine any two pieces of data will become second nature.