Lesson 4 - Grouping by Multiple Columns and Filtering Groups
Welcome to Multi-Column Grouping and HAVING
You can already group by a single column and compute a summary for each group. In this final lesson of the module you push grouping further in two ways. First, you will group by several columns at once, creating finer breakdowns like “sales by country and state.” Second, you will meet the HAVING clause — the correct way to filter groups based on an aggregated value, something WHERE cannot do.
By the end of this lesson, you will be able to:
- Group rows by two or more columns to create combined groups
- Explain why non-aggregate, non-grouping columns should be avoided in aggregate queries
- Filter aggregated results with the
HAVINGclause - Distinguish
HAVINGfromWHEREand place each correctly - Combine
WHEREandHAVINGin the same query
You will keep working with the Chinook invoice table. Let’s begin.
Data for this lesson
Database: chinook.db — a digital music store (SQLite).
Tables used: invoice
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
Grouping by Several Columns
When you group by one column, you get one group per distinct value. When you group by several columns, you get one group per distinct combination of those columns’ values.
Consider this tiny illustration table:
| col1 | col2 |
|---|---|
| A | 1 |
| A | 1 |
| A | 1 |
| A | 2 |
| B | 2 |
| B | 2 |
| C | 1 |
col1 has distinct values A, B, C; col2 has 1 and 2. Grouping by both columns means each group is a unique pair: the possible pairs are A1, A2, B1, B2, C1, C2 — but only the combinations that actually occur appear in the result. Here B1 and C2 never appear in the data, so they are not in the output. You would get groups for A1, A2, B2, and C1.
Country and city in the invoice table
To count invoices and average the sale per country and city, list both columns after GROUP BY, separated by a comma:
SELECT billing_country, billing_city, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
GROUP BY billing_country, billing_city;billing_country | billing_city | num_row | avg_sale
--------------- | ------------ | ------- | --------
Argentina | Buenos Aires | 7 | 5.374286
Australia | Sidney | 7 | 5.374286
Austria | Vienne | 7 | 6.088571
Belgium | Brussels | 7 | 5.374286
... | ... | ... | ...Each row is now one country-city combination. You can group by billing_country and billing_state the same way — just swap the second column:
SELECT billing_country, billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
GROUP BY billing_country, billing_state;A Caution: Non-Aggregate Columns
There is a subtle trap worth knowing about. Look back at what you can safely put in SELECT:
Aggregate functions over the whole table return a single summary row:
SELECT COUNT(*) AS num_row, AVG(total) AS avg_sale FROM invoice;Aggregate functions with
GROUP BYreturn one row per group, and theSELECTlists the grouping columns plus the aggregates:SELECT billing_country, billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale FROM invoice GROUP BY billing_country, billing_state;
Both are standard and behave identically in every database. But what happens if you add a column that is neither aggregated nor in the GROUP BY?
SELECT billing_country, billing_state, billing_city, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
GROUP BY billing_country, billing_state
LIMIT 3;billing_country | billing_state | billing_city | num_row | avg_sale
--------------- | ------------- | ------------ | ------- | --------
Argentina | | Buenos Aires | 7 | 5.374
Australia | NSW | Sidney | 7 | 5.374
Austria | | Vienne | 7 | 6.089Each group may contain several cities, yet only one city appears. SQLite simply picked one value arbitrarily. (Countries with no state recorded show a blank billing_state.)
Avoid non-aggregate columns in grouped queries
Picking an arbitrary value for a non-grouped column is a flexibility specific to SQLite. Other database systems (PostgreSQL, MySQL in strict mode, SQL Server) will raise an error instead. To write queries that work everywhere, only SELECT columns that are either in the GROUP BY or wrapped in an aggregate function.
Refining Multi-Column Groups
All the clauses you learned in Lesson 3 still work when grouping by several columns. You can sort the result with ORDER BY and filter input rows with WHERE.
Order by num_row descending to find the biggest country-state groups:
SELECT billing_country, billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
GROUP BY billing_country, billing_state
ORDER BY num_row DESC
LIMIT 3;billing_country | billing_state | num_row | avg_sale
--------------- | ------------- | ------- | --------
France | | 35 | 5.574286
Germany | | 28 | 5.588571
Brazil | SP | 21 | 5.469524Add a WHERE condition to focus on U.S. states:
SELECT billing_country, billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
WHERE billing_country = 'USA'
GROUP BY billing_country, billing_state
ORDER BY num_row DESC
LIMIT 3;billing_country | billing_state | num_row | avg_sale
--------------- | ------------- | ------- | --------
USA | CA | 21 | 5.517143
USA | AZ | 7 | 5.374286
USA | FL | 7 | 5.660000This works because WHERE filters raw rows before grouping. But that timing creates a problem when your condition is about an aggregated value.
Why WHERE Cannot Filter Aggregates
Suppose you want only the groups with more than 20 invoices. It is tempting to write:
SELECT billing_country, billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
WHERE COUNT(*) > 20
GROUP BY billing_country, billing_state;Running it produces an error:
OperationalError: misuse of aggregate: COUNT()The reason is execution order. Recall:
FROM > WHERE > GROUP BY > SELECT > ORDER BY > LIMITWHERE runs before GROUP BY. At that point the rows have not been grouped yet, so COUNT(*) for each group does not exist. You cannot filter on a number that has not been computed. WHERE is for raw-row conditions only.
Filtering Groups with HAVING
SQL solves this with the HAVING clause. HAVING is like WHERE, except it runs after grouping, so it can filter on aggregated values. To keep only groups with more than 20 invoices, replace WHERE with HAVING and move it after GROUP BY:
SELECT billing_country, billing_state, COUNT(*) AS num_row, AVG(total) AS avg_sale
FROM invoice
GROUP BY billing_country, billing_state
HAVING COUNT(*) > 20;billing_country | billing_state | num_row | avg_sale
--------------- | ------------- | ------- | --------
Brazil | SP | 21 | 5.469524
France | | 35 | 5.574286
Germany | | 28 | 5.588571
USA | CA | 21 | 5.517143
United Kingdom | | 21 | 5.374286No error this time — by the time HAVING runs, each group’s COUNT(*) already exists.
Where HAVING fits
HAVING slots in right after GROUP BY. The full order of clauses is now:
SELECT > FROM > WHERE > GROUP BY > HAVING > ORDER BY > LIMITAnd the order of execution:
FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMITTwo things to remember:
WHEREfilters rows before grouping;HAVINGfilters groups after grouping.HAVINGis always used together withGROUP BYand is placed immediately after it.
Filtering on an aggregate you do not display
A nice property of HAVING is that the aggregate it filters on does not need to appear in SELECT — because SQL computes the statistics before deciding what to display. This query shows the same five groups as before but drops the num_row column, filtering on a count it never shows:
SELECT billing_country, billing_state, AVG(total) AS avg_sale
FROM invoice
GROUP BY billing_country, billing_state
HAVING COUNT(*) > 20;You can also filter on a different aggregate. To keep only the country-state groups whose average sale is below dollars, and report their min and max:
SELECT billing_country, billing_state, MIN(total) AS min_sale, MAX(total) AS max_sale
FROM invoice
GROUP BY billing_country, billing_state
HAVING AVG(total) < 6;billing_country | billing_state | min_sale | max_sale
--------------- | ------------- | -------- | --------
Argentina | | 0.99 | 13.86
Australia | NSW | 0.99 | 13.86
Austria | | 0.99 | 18.86
Belgium | | 0.99 | 13.86
... | ... | ... | ...Combining WHERE and HAVING
WHERE and HAVING are not rivals — they do different jobs and often appear together. WHERE filters raw rows before grouping; HAVING filters groups after.
In the result above, some billing_state values are blank — those are countries with no state recorded. Since billing_state is a plain (non-aggregated) column, you filter it with WHERE, while still filtering the group’s average with HAVING:
SELECT billing_country, billing_state, MIN(total) AS min_sale, MAX(total) AS max_sale
FROM invoice
WHERE billing_state <> ''
GROUP BY billing_country, billing_state
HAVING AVG(total) < 6;The <> operator means not equal to. Read in execution order, this query: drops rows with no state (WHERE), groups the rest by country and state (GROUP BY), keeps only groups whose average sale is under dollars (HAVING), and reports each group’s minimum and maximum sale.
WHERE vs. HAVING — a simple rule
If your condition is about a single row’s value (like billing_state <> ''), use WHERE. If your condition is about a group’s summary (like AVG(total) < 6 or COUNT(*) > 20), use HAVING.
Practice Exercises
Work these against the Chinook invoice table.
Exercise 1: Sales by country and city
Write a query that returns each billing_country, billing_city, the number of invoices as num_row, and the average sale as avg_sale, grouped by both country and city.
-- Your code hereHint
List both columns after GROUP BY, separated by a comma, and include both in your SELECT alongside COUNT(*) and AVG(total).
Exercise 2: Busy groups only
Write a query that groups invoices by billing_country and billing_state and returns only the groups with more than 30 invoices, showing the country, state, and the count as num_row.
-- Your code hereHint
Filter the count with HAVING COUNT(*) > 30, placed after GROUP BY. WHERE cannot filter an aggregate.
Exercise 3: Low-average groups, states only
Write a query that, for groups of country and state where the state is not blank, returns the country, state, and average sale as avg_sale, keeping only the groups whose average sale is greater than dollars.
-- Your code hereHint
Use WHERE billing_state <> '' for the row-level condition and HAVING AVG(total) > 6 for the group-level condition. WHERE comes before GROUP BY; HAVING comes after it.
Summary
You completed the module by grouping on several columns at once and by learning HAVING, the clause that filters groups on their aggregated values. You also saw why mixing non-aggregate, non-grouping columns into a grouped query is unsafe, and how WHERE and HAVING divide the work of filtering.
Key Concepts
- Multi-column GROUP BY — grouping by several columns creates one group per distinct combination of their values; only combinations that exist in the data appear.
- Non-aggregate columns — only
SELECTcolumns that are in theGROUP BYor wrapped in an aggregate; other columns behave unpredictably across databases. - HAVING — filters groups after aggregation, so it can test aggregated values like
COUNT(*) > 20. - WHERE vs. HAVING —
WHEREfilters raw rows before grouping;HAVINGfilters groups after grouping. - Final clause order —
SELECT > FROM > WHERE > GROUP BY > HAVING > ORDER BY > LIMIT. - Final execution order —
FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMIT.
Why This Matters
Multi-column grouping and HAVING are what separate a casual query writer from a confident analyst. “Revenue by region and product line, but only for categories that cleared a sales threshold” is a single statement once you know these tools. The WHERE-versus-HAVING distinction in particular trips up many beginners, and understanding it through execution order means you will reason about queries correctly rather than guessing.
Next Steps
You can now summarize data from a single table in almost any way a report could ask for. The natural next step is to bring multiple tables together — connecting invoices to customers, tracks to albums, and more — using joins.
Continue to Combining Tables with Joins
Connect related tables together so you can summarize and analyze across your whole database
Back to Module Overview
Return to the Summarizing Data with SQL module overview
Continue Building Your Skills
You have completed Summarizing Data with SQL. You can total a column, build full summary tables, group by one or many columns, and filter groups with HAVING. These are the everyday tools of data analysis, and you now own all of them. Carry that momentum into joins, where your summaries will start spanning the whole database.