Lesson 3 - Grouping Data with GROUP BY
Welcome to GROUP BY
So far, every aggregate query you have written treated the whole table — or one filtered slice of it — as a single group, producing one summary row. But a question like “how many invoices come from each country?” needs a separate answer per country. You could write one filtered query per country, but with 24 billing countries that is 24 queries. There is a far better way.
The GROUP BY clause splits your rows into groups and computes the aggregate once for each group, all in a single query. It is the single most important clause in this module.
By the end of this lesson, you will be able to:
- Explain what the
GROUP BYclause does and when to use it - Compute aggregate statistics for each group in one query
- Combine
GROUP BYwithWHEREto summarize filtered groups - Order grouped results with
ORDER BY, including by aggregated columns - Place
GROUP BYcorrectly in SQL’s order of clauses and order of execution
You will continue with the Chinook database, using the invoice and invoice_line tables. Let’s begin.
Data for this lesson
Database: chinook.db — a digital music store (SQLite).
Tables used: invoice, invoice_line
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
The Problem GROUP BY Solves
Here are the first five rows of the invoice table, with more of its columns shown:
| invoice_id | customer_id | billing_city | billing_state | billing_country | total |
|---|---|---|---|---|---|
| 1 | 2 | Stuttgart | Germany | 1.98 | |
| 2 | 4 | Oslo | Norway | 3.96 | |
| 3 | 8 | Brussels | Belgium | 5.94 | |
| 4 | 14 | Edmonton | AB | Canada | 8.91 |
| 5 | 23 | Boston | MA | USA | 13.86 |
To count the U.S. invoices, you already know how to use WHERE:
SELECT COUNT(*) AS num_row
FROM invoice
WHERE billing_country = 'USA';num_row
-------
91To also count Canada’s and France’s invoices, you would change the condition and run the query again — once per country. With 24 countries in the table, that approach simply does not scale. GROUP BY lets you get all 24 counts at once.
Counting Rows by Group
The GROUP BY clause tells SQL which column to split the rows on. Imagine a small illustrative example table holding five invoices — three billed to USA, one to Canada, and one to France. Grouping by billing_country and counting gives:
SELECT COUNT(*) AS num_row
FROM example
GROUP BY billing_country;num_row
-------
1
1
3There are three counts — one per country — but you cannot tell which is which. The fix is to add the grouping column to the SELECT list so each row is labeled:
SELECT billing_country, COUNT(*) AS num_row
FROM example
GROUP BY billing_country;billing_country | num_row
--------------- | -------
Canada | 1
France | 1
USA | 3Now run the same idea on the full invoice table to count invoices for every country at once:
SELECT billing_country, COUNT(*) AS num_row
FROM invoice
GROUP BY billing_country;One query, one row per country. That is the whole point of GROUP BY.
Always SELECT the grouping column
If you GROUP BY a column but forget to put it in the SELECT list, you will get the right counts but no labels. As a rule, the column you group by belongs in your SELECT so each result row identifies its group.
How Grouping Works
Without GROUP BY, an aggregate treats the entire table as one big group. With GROUP BY, SQL works in three steps:
- Split the rows into groups based on the distinct values of the grouping column.
- Apply the aggregate function to each group separately.
- Combine the per-group results into one output table — one row per group.
Summary statistics per group
Any aggregate works with GROUP BY, not just COUNT. Consider the invoice_line table, which lists every individual track sold. Here are its first rows:
| invoice_line_id | invoice_id | track_id | unit_price | quantity |
|---|---|---|---|---|
| 1 | 1 | 2 | 0.99 | 1 |
| 2 | 1 | 4 | 0.99 | 1 |
| 3 | 2 | 6 | 0.99 | 1 |
| 4 | 2 | 8 | 0.99 | 1 |
| 5 | 2 | 10 | 0.99 | 1 |
Each row is one line on a sale. If you add up unit_price * quantity for every line of a given invoice, you should recover that invoice’s total from the invoice table. Group by invoice_id and sum the line totals:
SELECT invoice_id, SUM(unit_price * quantity) AS total
FROM invoice_line
GROUP BY invoice_id
LIMIT 5;invoice_id | total
---------- | -----
1 | 1.98
2 | 3.96
3 | 5.94
4 | 8.91
5 | 13.86Compare these to the total column of the invoice table — they match exactly. Grouping the line items reproduces each invoice’s stored total. Notice you can aggregate a computed column (unit_price * quantity) per group, just as you did over a whole table in Lesson 2.
Where GROUP BY Fits
Adding a new clause means knowing where it goes. SQL is strict about clause order, and getting it right prevents errors.
Order of clauses (how you write them)
Before this lesson you knew:
SELECT > FROM > WHERE > ORDER BY > LIMITGROUP BY slots in after WHERE and before ORDER BY:
SELECT > FROM > WHERE > GROUP BY > ORDER BY > LIMITYou must write the clauses in this order or SQL will report a syntax error.
Order of execution (how SQL runs them)
The order in which SQL executes clauses differs from the order you write them. In particular, although SELECT is written first, SQL runs it after grouping:
FROM > WHERE > GROUP BY > SELECT > ORDER BY > LIMITTwo facts follow from this and are worth memorizing:
- SQL filters rows with
WHEREbefore grouping them. - SQL groups rows before ordering the result and limiting how many rows appear.
This is why WHERE conditions apply to raw rows, while ORDER BY can sort by an aggregated column — by the time ordering runs, the aggregates already exist.
Grouping with a Condition
Because WHERE runs before GROUP BY, you can filter rows first and then group only what survives. Recall the per-country counts:
SELECT billing_country, COUNT(*) AS num_row
FROM invoice
GROUP BY billing_country;Now restrict to invoices of dollars or less by adding WHERE before GROUP BY:
SELECT billing_country, COUNT(*) AS num_row
FROM invoice
WHERE total <= 10
GROUP BY billing_country;For the five-row example, the USA’s count drops from 3 to 1, because two of its invoices exceed dollars and were filtered out before grouping.
A practical example: U.S. sales by state
Let’s compute, for U.S. invoices only, the number of invoices and average sale per state:
SELECT billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
WHERE billing_country = 'USA'
GROUP BY billing_state;billing_state | num_row | avg_sale
------------- | ------- | --------
AZ | 7 | 5.374286
CA | 21 | 5.517143
FL | 7 | 5.660000
IL | 7 | 6.231429
MA | 7 | 5.374286
NV | 7 | 5.374286
NY | 7 | 5.374286
TX | 7 | 6.802857
UT | 7 | 6.231429
WA | 7 | 5.660000
WI | 7 | 6.088571The WHERE clause kept only U.S. rows; GROUP BY then produced one summary line per state.
Ordering Grouped Results
A grouped result is still a table, so you can sort it with ORDER BY. The interesting part is that you can sort by an aggregated column — because ordering runs after grouping.
To find the states with the most customers, sort the previous query by num_row descending:
SELECT billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
WHERE billing_country = 'USA'
GROUP BY billing_state
ORDER BY num_row DESC;billing_state | num_row | avg_sale
------------- | ------- | --------
CA | 21 | 5.517143
WI | 7 | 6.088571
WA | 7 | 5.660000
UT | 7 | 6.231429
TX | 7 | 6.802857
NY | 7 | 5.374286
... | ... | ...California leads clearly with 21 invoices. But the remaining states all tie at 7, so the order among them is arbitrary. You can break the tie with a second sort key — here, average sale descending — and keep just the top three with LIMIT:
SELECT billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
WHERE billing_country = 'USA'
GROUP BY billing_state
ORDER BY num_row DESC, avg_sale DESC
LIMIT 3;billing_state | num_row | avg_sale
------------- | ------- | --------
CA | 21 | 5.517143
TX | 7 | 6.802857
IL | 7 | 6.231429ORDER BY num_row DESC, avg_sale DESC sorts first by count, then by average within ties. This works precisely because SQL executes ORDER BY after GROUP BY, so the aggregated columns already exist when sorting happens.
A Complete Grouped Query
You can now combine WHERE, GROUP BY, several aggregates, ORDER BY, and LIMIT in one query. Suppose you want the top three sales cities across France and Canada, with a full set of statistics for each:
SELECT billing_city,
COUNT(*) AS num_row,
SUM(total) AS overall_sale,
MIN(total) AS min_sale,
AVG(total) AS avg_sale,
MAX(total) AS max_sale
FROM invoice
WHERE billing_country = 'Canada'
OR billing_country = 'France'
GROUP BY billing_city
ORDER BY overall_sale DESC, num_row DESC
LIMIT 3;billing_city | num_row | overall_sale | min_sale | avg_sale | max_sale
------------ | ------- | ------------ | -------- | -------- | --------
Paris | 14 | 77.24 | 0.99 | 5.517143 | 13.86
Dijon | 7 | 40.62 | 0.99 | 5.802857 | 16.86
Bordeaux | 7 | 39.62 | 0.99 | 5.660000 | 13.86Read top to bottom, this query: filters to two countries, groups by city, computes five statistics per city, sorts by total sales (then by count to break ties), and keeps the top three. Paris leads with the most sales and the highest overall purchase amount; in Dijon, the average customer spends the most of the three.
Practice Exercises
Work these against the Chinook database.
Exercise 1: Invoices per country
Write a query on the invoice table that returns each billing_country and the number of invoices for it as num_invoices, sorted from most invoices to fewest.
-- Your code hereHint
GROUP BY billing_country, select billing_country and COUNT(*) AS num_invoices, then add ORDER BY num_invoices DESC.
Exercise 2: Best-selling tracks
Using the invoice_line table, write a query that returns each track_id, the number of line items for it as num_row, and the total revenue as overall_sale (computed as SUM(unit_price * quantity)). Order by overall_sale descending, then num_row descending, and show only the top five.
-- Your code hereHint
Group by track_id. Use SUM(unit_price * quantity) AS overall_sale, then ORDER BY overall_sale DESC, num_row DESC LIMIT 5.
Exercise 3: Average sale per U.S. state
Write a query that, for U.S. invoices only, returns each billing_state and its average sale as avg_sale, sorted from highest average to lowest.
-- Your code hereHint
Filter with WHERE billing_country = 'USA' before the GROUP BY billing_state, then ORDER BY avg_sale DESC.
Summary
You learned the GROUP BY clause — the tool that turns a single summary into one summary per group. You saw how grouping works in three steps, how to label groups by selecting the grouping column, and how WHERE and ORDER BY cooperate with grouping thanks to SQL’s order of execution.
Key Concepts
- GROUP BY — splits rows into groups by the distinct values of a column, then applies the aggregate to each group.
- Three-step grouping — SQL splits rows into groups, applies the aggregate per group, then combines the results.
- Labeling groups — include the grouping column in
SELECTso each result row identifies its group. - Order of clauses —
SELECT > FROM > WHERE > GROUP BY > ORDER BY > LIMIT. - Order of execution —
FROM > WHERE > GROUP BY > SELECT > ORDER BY > LIMIT;WHEREfilters before grouping,ORDER BYsorts after grouping (so you can sort by aggregates).
Why This Matters
GROUP BY is where SQL becomes an analytics language. “Revenue by month,” “users by plan,” “errors by service” — every one of those phrases maps directly to a GROUP BY query. Once you can group, filter, sort, and limit in a single statement, you can answer the comparative questions that drive almost every real decision.
Next Steps
You can now group by one column. Next you will group by several columns at once and learn the HAVING clause — the right way to filter on an aggregated value.
Continue to Lesson 4 - Grouping by Multiple Columns and Filtering Groups
Group by several criteria at once and filter aggregated results with HAVING
Back to Module Overview
Return to the Summarizing Data with SQL module overview
Continue Building Your Skills
Grouping is a turning point. You have gone from describing a whole table to comparing every slice of it side by side, all in one query. Keep practicing on the Chinook tables — try grouping invoices by year, customer, or country and see what stories the numbers tell.