Lesson 1 - Aggregate Functions: COUNT, SUM, AVG, and More

Welcome to Aggregate Functions

In earlier lessons you learned the basics of querying. You can display columns with the SELECT clause, filter rows with the WHERE clause, and refine values with functions like ROUND. Those tools all work one row at a time. In this lesson you take a different step: instead of looking at individual rows, you summarize a whole column into a single number.

That is what aggregate functions do. They let you answer questions about your entire dataset at once — how many invoices are there?, what did we sell in total?, what is the average order size? By the end of this lesson you will be reaching for them naturally.

By the end of this lesson, you will be able to:

  • Explain the difference between scalar and aggregate functions
  • Use SUM and AVG to total and average a numeric column
  • Use MIN and MAX to find extreme values in numeric and text columns
  • Use COUNT to count rows and non-empty values
  • Rename a result column with the AS clause

No prior experience with aggregates is needed — only the SELECT, FROM, WHERE, and LIMIT clauses you already know. Let’s begin.

Data for this lesson

Database: chinook.db — a digital music store (SQLite).

Tables used: invoice, track

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


Scalar vs. Aggregate Functions

You have already met a scalar function: ROUND. A scalar function operates on each row of a column independently and returns one output per row. If a column has 600 rows, ROUND gives you 600 rounded values.

An aggregate function is different. It operates on a whole collection of rows at once and returns a single value. If a column has 600 rows, SUM gives you one total.

Scalar functionsAggregate functions
Applies toeach row or value of a columna collection of a column’s rows
Outputone output per rowa single output

Aggregate functions let you summarize a table with statistics like the sum, mean, minimum, and maximum. Those statistics turn a long table into insight.

The dataset for this lesson

Every example here runs against the Chinook database, which models a fictional digital music store — think of an online store like iTunes. One of its core tables is the invoice table, which stores the total purchase amount of each customer’s invoice. Here are the first three rows for a few columns:

invoice_idcustomer_idinvoice_datebilling_countrytotal
122021-01-01 00:00:00Germany1.98
242021-01-02 00:00:00Norway3.96
382021-01-03 00:00:00Belgium5.94

As a quick review, here is the query that displays those rows. It uses SELECT, FROM, and LIMIT — nothing new yet:

SELECT invoice_id, customer_id, invoice_date, billing_country, total
  FROM invoice
 LIMIT 3;

Throughout this lesson we will sometimes refer to a tiny three-row illustrative table called example, holding three made-up invoices with the totals 15.84, 9.90, and 1.98 and the billing countries USA, Canada, and France. Using a small table with round numbers makes it easy to check the math by hand before running the same query on the full invoice table.


Summing a Column with SUM

The first aggregate function is SUM. It adds up every value in a numeric column and returns the total as a single number.

A worked example

Using the three-row example table above, you can find the overall purchase amount by summing the total column:

SELECT SUM(total)
  FROM example;

SQL computes 15.84 + 9.90 + 1.98 and returns one value:

SUM(total)
----------
27.72

Notice the column heading is SUM(total) — SQL names the result after the operation you performed. That is rarely what you want in a report. To give the result a clear name, use the AS clause:

SELECT SUM(total) AS overall_sale
  FROM example;
overall_sale
------------
27.72

Summing the real table

The same query works on the full invoice table — you just point it at invoice instead of example:

SELECT SUM(total) AS overall_sale
  FROM invoice;
overall_sale
------------
2328.6

This adds up the total column across all 412 invoices in the store and returns a single grand total of 2328.60. The point to remember: SUM collapses many rows into one number.


Averaging a Column with AVG

The AVG aggregate function computes the average (also called the mean) of a numeric column. It works just like SUM: you can apply it to any numeric column.

Under the hood, an average is the sum of the values divided by how many there are. For the example table, that is (15.84 + 9.90 + 1.98) / 3, which equals 9.24.

To compute the average purchase amount across the whole invoice table:

SELECT AVG(total) AS avg_sale
  FROM invoice;
avg_sale
--------
5.651...

SUM and AVG are close siblings — both reduce a numeric column to one value. The difference is simply whether you want the total or the typical value.


Finding Extremes with MIN and MAX

The MIN and MAX aggregate functions return the smallest and largest values in a column.

Numeric columns

Looking at the three invoices in the example table, the lowest total purchase amount is 1.98. You can find it with MIN:

SELECT MIN(total) AS min_sale
  FROM example;
min_sale
--------
1.98

MAX works the same way and would return the highest value, 15.84. To find the largest invoice total across the entire store:

SELECT MAX(total) AS max_sale
  FROM invoice;

This returns the single biggest purchase amount in the total column.

Text columns

MIN and MAX are not limited to numbers — they also work on text columns, where “smallest” and “largest” mean alphabetical order.

This is useful when you want the first or last value alphabetically. In the example table, the billing countries are USA, Canada, and France. Sorted alphabetically, the first is Canada — even though it is not the first row in the table. MIN finds it:

SELECT MIN(billing_country) AS first_billing_country
  FROM example;
first_billing_country
---------------------
Canada

MAX returns the last country alphabetically. Running it on the full invoice table:

SELECT MAX(billing_country) AS last_billing_country
  FROM invoice;

This returns the billing country that comes last in alphabetical order across every invoice in the store.

MIN and MAX on text

For text, MIN and MAX compare strings alphabetically. The “minimum” is the value closest to the start of the alphabet (A), and the “maximum” is the value closest to the end (Z). This has nothing to do with the order the rows appear in the table.


Counting Rows with COUNT

COUNT is the odd one out. The other four functions operate on a column’s values; COUNT is most often used to count rows.

Counting all rows

To count every row in a table, use COUNT(*). The * is a wildcard meaning “all rows,” so you do not name a column:

SELECT COUNT(*) AS num_row
  FROM example;
num_row
-------
3

Run the same idea on the real invoice table and you learn how many invoices the store has:

SELECT COUNT(*) AS num_rows
  FROM invoice;
num_rows
--------
412

Counting non-empty values

Sometimes a column has missing data. SQL stores a missing value as NULL, which means empty. Consider the track table, which lists every song available in the store:

track_idnamealbum_idcomposermillisecondsunit_price
1For Those About To Rock (We Salute You)1Angus Young, Malcolm Young, Brian Johnson3437190.99
63Desafinado81853380.99
64Garota De Ipanema82850480.99

Notice the composer value is empty in the second and third rows — that is a NULL. Counting all rows in this table gives the full track count:

SELECT COUNT(*) AS num_track
  FROM track;
num_track
---------
3503

But what if you want to count only the rows where a particular column actually has a value? Then you put the column name inside COUNT instead of *. Counting the name column returns the same number:

SELECT COUNT(name) AS num_track_name
  FROM track;
num_track_name
--------------
3503

Because name is never empty, the count matches COUNT(*). The composer column is a different story — many tracks have no listed composer:

SELECT COUNT(composer) AS num_composers
  FROM track;

This returns a number smaller than 3503, because COUNT(composer) skips the rows where composer is NULL.

COUNT(*) vs. COUNT(column)

COUNT(*) counts every row in the table. COUNT(column) counts only the rows where that specific column is not NULL. When the two numbers differ, you have found missing values in that column.


The General Pattern

Every query in this lesson follows the same simple template:

SELECT aggregate_function(column) AS new_name
  FROM table;

You pick a function (SUM, AVG, MIN, MAX, or COUNT), apply it to a column, and optionally rename the result with AS. That one pattern unlocks a huge range of questions about your data.


Practice Exercises

Try these on the Chinook database before checking the hints. Each uses a single aggregate function.

Exercise 1: Total runtime of every track

The track table has a milliseconds column holding each track’s runtime. Write a query that sums that column and names the result total_runtime.

-- Your code here

Hint

Use SUM(milliseconds) and rename it with AS total_runtime. The table is track.

Exercise 2: Cheapest and most expensive track

Using the unit_price column of the track table, write a single query that returns both the minimum price as min_price and the maximum price as max_price.

-- Your code here

Hint

You can list two aggregates in one SELECT, separated by a comma: SELECT MIN(unit_price) AS min_price, MAX(unit_price) AS max_price FROM track;

Exercise 3: How many tracks have a composer?

Write a query that counts how many rows in the track table have a non-empty composer value, naming the result num_composers. Then write a second query counting all rows. Are the numbers the same? Why or why not?

-- Your code here

Hint

Use COUNT(composer) for non-empty values and COUNT(*) for all rows. The difference is the number of tracks missing a composer.


Summary

In this lesson you learned to summarize an entire column into a single value using SQL’s five core aggregate functions. You saw how they differ from scalar functions, how to rename results with AS, and how COUNT behaves differently from the rest.

Key Concepts

  • Aggregate function — a function that operates on a whole collection of rows and returns a single value.
  • Scalar function — a function that operates on each row independently and returns one output per row (e.g., ROUND).
  • SUM — adds all values in a numeric column.
  • AVG — computes the average (mean) of a numeric column.
  • MIN / MAX — return the smallest/largest value; for text columns this means first/last alphabetically.
  • COUNT(*) — counts every row in a table.
  • COUNT(column) — counts only the rows where that column is not NULL.
  • AS — renames a result column so reports are readable.

Why This Matters

Aggregate functions are the bridge from raw data to insight. A table of 600 invoices is just a list until you ask “what did we sell in total?” or “what is the average order?” — and those answers come from a single aggregate query. Almost every dashboard, report, and KPI you will ever build starts with COUNT, SUM, or AVG. Mastering them now sets you up for everything that follows.


Next Steps

You can now compute one summary value at a time. Next you will combine several aggregates in one query and mix them with WHERE, scalar functions, and arithmetic to build richer statistics.

Continue to Lesson 2 - Building Summary Statistics

Combine aggregates with WHERE, scalar functions, and arithmetic to compute richer summary metrics

Back to Module Overview

Return to the Summarizing Data with SQL module overview


Continue Building Your Skills

You have taken your first step from reading rows to reading insight. The five functions you learned here — SUM, AVG, MIN, MAX, and COUNT — are among the most-used tools in all of SQL. Keep practicing them on the Chinook tables, and notice how naturally a business question turns into a one-line query.