Lesson 2 - Building Summary Statistics
Welcome to Building Summary Statistics
In the last lesson you learned the five core aggregate functions and used them one at a time. That is a great start, but real analysis rarely stops at a single number. You usually want several statistics together — a total and an average and a count — and you often want them computed only over the rows that matter, or expressed in friendlier units.
In this lesson you will combine aggregates with the tools you already know: the WHERE clause, scalar functions like ROUND, arithmetic operators, and computed columns. Put together, these let you build the kind of compact summary tables that real reports are made of.
By the end of this lesson, you will be able to:
- Compute several summary statistics in a single query
- Apply aggregate functions to computed (calculated) columns
- Combine aggregate functions with scalar functions like
ROUND - Combine aggregate functions with arithmetic operators
- Summarize only the rows that satisfy a
WHEREcondition
You will keep working with the Chinook digital music store database, focusing on the track and invoice tables. Let’s begin.
Data for this lesson
Database: chinook.db — a digital music store (SQLite).
Tables used: track, invoice
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
Several Statistics in One Query
You already know that you can SELECT several columns at once by separating them with commas. The same is true for aggregate functions — you can compute several summaries side by side in one query.
The track table stores each song’s runtime in a milliseconds column. Suppose you want the total runtime, the average runtime, and the number of tracks, all at once:
SELECT SUM(milliseconds) AS total_runtime,
AVG(milliseconds) AS avg_runtime,
COUNT(*) AS num_row
FROM track;total_runtime | avg_runtime | num_row
------------- | -------------- | -------
1378778040 | 393599.212104 | 3503That one query tells a small story: it would take 1378778040 milliseconds — more than 15 days of nonstop listening — to play every track in the store, because the average track is about 393599 milliseconds (roughly 6.5 minutes) long, and there are 3503 tracks.
The pattern generalizes to any mix of functions. For example, the lowest and highest track runtimes together:
SELECT MIN(milliseconds) AS min_runtime,
MAX(milliseconds) AS max_runtime
FROM track;This returns a single row with two columns — the shortest and longest track runtimes in the store.
Aggregating Computed Columns
You can do arithmetic on columns inside a query. You can multiply a column by a number (column * 3.14) or combine two columns (column_1 + column_2). These are computed columns — columns that do not exist in the table but are calculated as the query runs.
Convert first, then look
Milliseconds are awkward to read. Convert the milliseconds column to seconds by dividing by 1000.0:
SELECT milliseconds / 1000.0 AS seconds
FROM track
LIMIT 3;seconds
-------
343.719
342.562
230.619Why 1000.0 and not 1000
Dividing by 1000.0 (with a decimal) forces SQL to do floating-point division, keeping the fractional part. Dividing by 1000 (a whole number) can trigger integer division and throw away the decimals. The .0 is your safeguard.
Aggregating the expression
Here is the powerful part: you can put a computed column inside an aggregate function. SQL first evaluates the expression for every row, then applies the aggregate to the results.
So instead of converting to seconds and then averaging in two steps, you do it in one:
SELECT AVG(milliseconds / 1000.0) AS avg_runtime_seconds
FROM track;avg_runtime_seconds
-------------------
393.599212You can chain more arithmetic to get other units. Average runtime in minutes, and average file size in megabytes (the track table stores file size in a bytes column):
SELECT AVG(milliseconds / 1000.0 / 60) AS avg_runtime_minutes,
AVG(bytes / 1024.0 / 1024) AS avg_size_megabyte
FROM track;Dividing by 1000.0 then 60 turns milliseconds into minutes; dividing by 1024.0 twice turns bytes into megabytes. The aggregate runs on the converted values.
Combining Aggregate and Scalar Functions
Remember the average runtime in minutes:
SELECT AVG(milliseconds / 1000.0 / 60) AS avg_runtime_minutes
FROM track;avg_runtime_minutes
-------------------
6.559987A value like 6.559987 has more decimal places than anyone wants to read. This is where a scalar function helps: you can wrap an aggregate inside ROUND to clean up the result. ROUND(value, n) keeps n digits after the decimal point.
To round the average to one decimal place:
SELECT ROUND(AVG(milliseconds / 1000.0 / 60), 1) AS avg_runtime_minutes_rounded
FROM track;avg_runtime_minutes_rounded
---------------------------
6.6You can show the raw and rounded values together to compare them. Here the second column is rounded to two decimal places:
SELECT AVG(milliseconds / 1000.0 / 60) AS avg_runtime_minutes,
ROUND(AVG(milliseconds / 1000.0 / 60), 2) AS avg_runtime_minutes_rounded
FROM track;The general idea: a scalar function like ROUND wraps around an aggregate function to refine its output. The aggregate computes the value; the scalar function formats it.
Combining Aggregates with Arithmetic Operators
An average is, by definition, a sum divided by a count. That means AVG is really a shortcut. You could compute the same thing with SUM and COUNT and a division operator.
For the three-row example table from Lesson 1, the average of the total column is (15.84 + 9.90 + 1.98) / 3 = 9.24. In SQL, that is SUM(total) / COUNT(total):
SELECT SUM(total) / COUNT(total) AS avg_total
FROM example;You can verify this equivalence on the track table by computing the average runtime two ways in one query:
SELECT AVG(milliseconds) AS avg_runtime,
SUM(milliseconds) * 1.0 / COUNT(milliseconds) AS another_avg_runtime
FROM track;Both columns return the same value. The * 1.0 forces floating-point division so the result keeps its decimals, just like the .0 trick earlier. The takeaway: aggregate functions can be combined with arithmetic operators, which gives you the freedom to build any metric you need, even when no single built-in function does exactly what you want.
Summary Statistics Under a Condition
Everything so far summarized the whole table. But often you want statistics for only part of it — and that is exactly what the WHERE clause is for. SQL applies WHERE to filter rows before the aggregate runs.
First, notice that tracks come in only two prices. The DISTINCT keyword lists the unique values in a column:
SELECT DISTINCT(unit_price) AS prices
FROM track;prices
------
0.99
1.99Now suppose you want to know how many minutes of music you would get from only the premium 1.99 tracks. Add a WHERE clause so the SUM runs over just those rows:
SELECT SUM(milliseconds / 1000.0 / 60) AS total_runtime_minutes
FROM track
WHERE unit_price = 1.99;total_runtime_minutes
---------------------
8351.582617About 8,352 minutes — the aggregate counted only the rows that passed the filter.
Several statistics under several conditions
You can combine many aggregates with several conditions at once. Suppose you want a summary of all U.S. invoices over ten dollars: how many there are, the smallest, the largest, and the rounded average. The WHERE clause filters on two conditions with AND:
SELECT COUNT(*) AS num_row,
MIN(total) AS min_total,
MAX(total) AS max_total,
ROUND(AVG(total), 2) AS avg_total_rounded
FROM invoice
WHERE total > 10
AND billing_country = 'USA';num_row | min_total | max_total | avg_total_rounded
------- | --------- | --------- | -----------------
15 | 10.91 | 23.86 | 14.67In one query you produced a complete summary: among U.S. invoices over dollars, there are 15, ranging from 10.91 to 23.86, averaging 14.67. This is exactly the kind of compact result a business report needs — and it combines aggregates, a scalar function (ROUND), and a multi-condition WHERE, all the pieces from this lesson working together.
Practice Exercises
Work these against the Chinook database.
Exercise 1: A track runtime summary
Write one query against the track table that returns the number of tracks as num_tracks, the total runtime in minutes as total_minutes, and the average runtime in minutes (rounded to one decimal place) as avg_minutes. Use the conversion milliseconds / 1000.0 / 60.
-- Your code hereHint
List three aggregates separated by commas. Wrap the average in ROUND(..., 1), and apply SUM/AVG directly to the computed expression milliseconds / 1000.0 / 60.
Exercise 2: Premium tracks only
Write a query that, for only the tracks priced at 1.99, returns the count of those tracks as num_premium and their average runtime in minutes (rounded to two decimals) as avg_minutes.
-- Your code hereHint
Add WHERE unit_price = 1.99. The WHERE clause filters the rows before the aggregates run.
Exercise 3: Average two ways
Compute the average total of the invoice table twice in one query: once with AVG(total) named avg_total, and once with SUM(total) * 1.0 / COUNT(total) named manual_avg. Confirm they match.
-- Your code hereHint
The * 1.0 forces floating-point division so you keep the decimals. Without it, integer division could lose precision.
Summary
You moved from single aggregates to full summary statistics. You learned to compute several aggregates at once, to run them over computed columns, to refine them with scalar functions like ROUND, to build them from arithmetic, and to restrict them to the rows that matter with WHERE.
Key Concepts
- Multiple aggregates — list several aggregate functions in one
SELECT, separated by commas, to get many statistics in one row. - Computed column — an expression like
milliseconds / 1000.0 / 60calculated as the query runs; you can put it inside an aggregate. - Aggregate + scalar — wrap an aggregate in a scalar function, e.g.
ROUND(AVG(column), 2), to format the result. - Aggregate + arithmetic — combine aggregates with operators, e.g.
SUM(column) / COUNT(column), to build custom metrics. - Conditional summaries —
WHEREfilters rows before aggregation, so your statistics describe only the matching rows.
Why This Matters
Real reports almost never ask for one number. A sales dashboard wants count, total, average, min, and max together — and usually only for a region, a time window, or a product line. The ability to stack aggregates, convert units, round for readability, and filter with WHERE is what turns SQL into a genuine analysis tool. These are the moves you will repeat in nearly every report you build.
Next Steps
So far every summary has covered one slice of data at a time. Next you will learn GROUP BY, which computes a separate summary for every group in a single query — letting you compare countries, states, or products side by side.
Continue to Lesson 3 - Grouping Data with GROUP BY
Split data into groups and compute a summary for each using GROUP BY with aggregate functions
Back to Module Overview
Return to the Summarizing Data with SQL module overview
Continue Building Your Skills
You can now assemble a complete summary in a single query — counts, totals, averages, extremes, all filtered and formatted exactly how you want them. That is a real analyst’s skill. Keep experimenting: try swapping in different conditions and units on the Chinook tables and watch how each summary shifts.