Lesson 2 - Reading Query Plans with EXPLAIN

Welcome to Query Plans

A query that returns the right answer is only half the battle. When a database grows to tens of thousands of rows, the how of a query starts to matter as much as the what. The same result can be reached through fast paths and slow paths, and Postgres chooses one for you every time you run a query.

In this lesson you learn to see that choice. The EXPLAIN command takes any SQL query and, instead of running it, shows you the step-by-step plan Postgres intends to follow. Its companion option ANALYZE goes further and reports the real time each step actually took. Together they are the foundation of every performance investigation you will do in this module.

By the end of this lesson, you will be able to:

  • Run EXPLAIN to view the plan Postgres builds for a query
  • Read a query plan as a tree of nodes and understand their execution order
  • Interpret startup cost, total cost, and the units Postgres measures them in
  • Use EXPLAIN ANALYZE to capture real runtimes, and roll back destructive queries
  • Recognize why joins are expensive and what their plans look like

You will work with the homeless_by_coc table from Lesson 1. Let’s run our first plan.

Data for this lesson

Engine: PostgreSQL. Dataset: homeless-by-CoC U.S. homelessness counts.

Tables used: homeless_by_coc, state_info, state_household_incomes

Real-world source: HUD — PIT and HIC Data Since 2007 (the HUD homelessness point-in-time counts this homeless_by_coc data is based on). Load it into your own PostgreSQL instance and run the queries in psql or a GUI client.


The EXPLAIN Command

EXPLAIN examines a query without executing it. You place the keyword in front of any statement:

EXPLAIN
SELECT * FROM example_table;

Like other SQL, it can fit on one line, but multiple lines are easier to read. Run it against homeless_by_coc from Python:

import psycopg2

conn = psycopg2.connect(dbname='hud', user='hud_admin', password='hud123')
cur = conn.cursor()

cur.execute("""
    EXPLAIN
    SELECT * FROM homeless_by_coc;
""")

print(cur.fetchall())

The output is a single line that, cleaned up, reads:

[('Seq Scan on homeless_by_coc  (cost=0.00..2145.29 rows=86529 width=480)',)]

This is confusing at first sight. To make sense of it, you need to know where in a query’s life EXPLAIN is looking.


The Path of a Query

Every query in Postgres travels through four steps:

  1. Parse. The query is checked for correct syntax. If it is error-free, it becomes a query tree.
  2. Rewrite. A rewrite system checks the query tree against the system catalogs for any special rules and applies them.
  3. Plan / optimize. The planner (also called the optimizer) turns the rewritten tree into a query plan, choosing what it believes is the fastest route. For any query there are many possible routes, and the number grows as the query gets more complex.
  4. Execute. The executor runs each step of the plan and returns the rows it finds.

EXPLAIN shows you the result of step three — the plan the optimizer produced. In the simple query above there was nothing to optimize: no WHERE filter, no join, no ORDER BY. So the plan is a basic Seq Scan (sequential scan), meaning the executor will loop through every row one at a time and return all of them.

A More Complex Plan

Add a filter and an aggregate, and the plan grows:

import psycopg2

conn = psycopg2.connect(dbname='hud', user='hud_admin', password='hud123')
cur = conn.cursor()

cur.execute("""
    EXPLAIN
    SELECT COUNT(*) FROM homeless_by_coc
    WHERE year > '2012-01-01';
""")

query_plan = cur.fetchall()
for row in query_plan:
    print(row[0])

This produces a plan with two indented lines:

Aggregate  (cost=2433.72..2433.73 rows=1 width=0)
  ->  Seq Scan on homeless_by_coc  (cost=0.00..2361.61 rows=28843 width=0)
        Filter: (year > '2012-01-01'::date)

The indentation matters. The plan starts with an Aggregate node (the COUNT), which depends on a Seq Scan node, which applies a Filter. But the execution order is the reverse of how you read it: Postgres first scans the table, then filters by year, then aggregates the survivors. More nested steps run first.


Output Formats and the Plan Tree

The default text output is compact but awkward to parse in code. EXPLAIN accepts a FORMAT option — text, xml, json, or yaml — wrapped in parentheses:

EXPLAIN (FORMAT json)
SELECT * FROM example_table;

JSON is the most convenient to work with from Python, and it comes back as a single row, so you fetch it with fetchone():

import psycopg2
import json

conn = psycopg2.connect(dbname='hud', user='hud_admin', password='hud123')
cur = conn.cursor()

cur.execute("""
    EXPLAIN (FORMAT json)
    SELECT COUNT(*) FROM homeless_by_coc
    WHERE year > '2012-01-01';
""")

query_plan = cur.fetchone()
print(json.dumps(query_plan[0], indent=2))

The JSON makes the nesting explicit:

[
  {
    "Plan": {
      "Node Type": "Aggregate",
      "Strategy": "Plain",
      "Startup Cost": 2433.72,
      "Total Cost": 2433.73,
      "Plan Rows": 1,
      "Plan Width": 8,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Relation Name": "homeless_by_coc",
          "Startup Cost": 0.0,
          "Total Cost": 2361.61,
          "Plan Rows": 28843,
          "Plan Width": 0,
          "Filter": "(year > '2012-01-01'::date)"
        }
      ]
    }
  }
]

Nodes and the Query Plan Tree

Each step is a node. Nesting expresses dependency: the Aggregate node contains the Seq Scan node in its Plans list, which means the aggregate depends on the scan. Drawn as a tree, the Aggregate sits above the Seq Scan with an arrow pointing up — to count the rows, Postgres first needs the rows from the scan. Postgres always executes a query plan tree from the bottom up, so that every node’s dependencies are resolved before the node itself runs.


Understanding Cost Estimates

The two numbers that matter most in each node are Startup Cost and Total Cost. In the text output they appear together as cost=<startup>..<total>.

These costs are estimates, measured in an arbitrary unit, not in milliseconds. You cannot read a cost of 2500 as “2.5 seconds.” What you can say is relative: a node with cost 2500 will likely run about twice as fast as a node with cost 5000.

  • Startup Cost is the estimated work before the first row can be returned (for example, sorting or aggregating).
  • Total Cost includes the startup cost and represents the work to run the node to completion.

For the Seq Scan above, the startup cost is 0.0 — unsurprising, since a sequential scan just starts iterating immediately with no preparation.

The Sequential Scan Cost Formula

Postgres estimates the run cost of a sequential scan with this formula:

(cpu_tuple_cost+cpu_operator_cost)×Ntuple+seq_page_cost×Npage (\mathtt{cpu\_tuple\_cost} + \mathtt{cpu\_operator\_cost}) \times N_{tuple} + \mathtt{seq\_page\_cost} \times N_{page}

where the default planner parameters are:

  • cpu_tuple_cost=0.01 \mathtt{cpu\_tuple\_cost} = 0.01 — cost of processing one row.
  • cpu_operator_cost=0.0025 \mathtt{cpu\_operator\_cost} = 0.0025 — cost of processing each operator or function.
  • seq_page_cost=1.0 \mathtt{seq\_page\_cost} = 1.0 — cost of fetching one disk page.
  • Ntuple N_{tuple} — number of rows (tuples) in the table.
  • Npage N_{page} — number of disk pages the table occupies.

The values of Ntuple N_{tuple} and Npage N_{page} live in the reltuples and relpages columns of the pg_class catalog table. You can reproduce the estimate by hand:

import psycopg2

cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025
seq_page_cost = 1.0

conn = psycopg2.connect(dbname='hud', user='hud_admin', password='hud123')
cur = conn.cursor()

cur.execute("""
    SELECT reltuples, relpages
    FROM pg_class
    WHERE relname = 'homeless_by_coc';
""")

n_tuple, n_page = cur.fetchone()
total_cost = (cpu_tuple_cost + cpu_operator_cost) * n_tuple + seq_page_cost * n_page
print(total_cost)

This prints 2363.6125, which matches the 2361.61 from the plan to within rounding — confirming the formula really is what the planner uses.


Getting Real Runtimes with ANALYZE

Plain EXPLAIN has two limits. First, costs are arbitrary units, not time. Second, the row counts are estimates. Look back at the filtered query: the plan estimated 28843 rows, but actually running it returns 50589 rows. The discrepancy exists because EXPLAIN reads estimated statistics from pg_class rather than executing the query. Those statistics are updated periodically and can drift out of date.

To get real numbers, add the ANALYZE option. With ANALYZE, EXPLAIN actually executes the query and reports the recorded times in milliseconds:

EXPLAIN ANALYZE
SELECT * FROM example_table;

Combine ANALYZE and FORMAT by separating the options with a comma:

import psycopg2
import json

conn = psycopg2.connect(dbname='hud', user='hud_admin', password='hud123')
cur = conn.cursor()

cur.execute("""
    EXPLAIN (ANALYZE, FORMAT json)
    SELECT COUNT(*)
    FROM homeless_by_coc
    WHERE year > '2012-01-01';
""")

query_plan = cur.fetchone()
print(json.dumps(query_plan[0], indent=2))

The plan now carries actual measurements alongside the estimates:

[
  {
    "Plan": {
      "Node Type": "Aggregate",
      "Startup Cost": 2433.72,
      "Total Cost": 2433.73,
      "Actual Startup Time": 13.569,
      "Actual Total Time": 13.57,
      "Actual Rows": 1,
      "Actual Loops": 1,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Relation Name": "homeless_by_coc",
          "Plan Rows": 28843,
          "Actual Startup Time": 3.566,
          "Actual Total Time": 10.539,
          "Actual Rows": 50589,
          "Filter": "(year > '2012-01-01'::date)",
          "Rows Removed by Filter": 35940
        }
      ]
    },
    "Planning Time": 0.233,
    "Execution Time": 13.644
  }
]

Now you can see the truth: the scan actually returned 50589 rows (not the estimated 28843), removed 35940 rows with the filter, and the whole query took 13.644 ms to execute.


Safely Analyzing Destructive Queries

Because ANALYZE executes the query, it will change your data if the query is destructive. Consider timing how long it takes to add a column:

EXPLAIN ANALYZE
ALTER TABLE state_info
ADD COLUMN notes text;

That ALTER really happens. The same is true for DELETE, UPDATE, and INSERT. Since you usually run EXPLAIN ANALYZE only to measure, you do not want the side effects.

The fix is to roll back the transaction afterward with conn.rollback(), which reverts every change made in the current transaction block:

import psycopg2
import json

conn = psycopg2.connect(dbname='hud', user='hud_admin', password='hud123')
cur = conn.cursor()

cur.execute("""
    EXPLAIN (ANALYZE, FORMAT json)
    DELETE FROM state_household_incomes;
""")

conn.rollback()
query_plan = cur.fetchone()
print(json.dumps(query_plan[0], indent=2))

You get a complete timing report for the DELETE, but because of conn.rollback(), the state_household_incomes table is left exactly as it was.

Always Roll Back Destructive Analyses

EXPLAIN ANALYZE on a DELETE, UPDATE, INSERT, or ALTER will modify the database. If you are only benchmarking, call conn.rollback() immediately after cur.execute() and before you commit anything else. Forgetting this can destroy real data.


Analyzing a Join

The single most expensive common operation is the JOIN. When a query is slow, a join is very often the culprit. Here is an inner join that attaches state names to the homelessness data:

SELECT homeless_by_coc.state, homeless_by_coc.coc_number,
       homeless_by_coc.coc_name, state_info.name
FROM homeless_by_coc
INNER JOIN state_info
ON homeless_by_coc.state = state_info.postal;

Postgres also lets you write the same join by selecting from both tables and matching them in a WHERE clause:

SELECT homeless_by_coc.state, homeless_by_coc.coc_number,
       homeless_by_coc.coc_name, state_info.name
FROM homeless_by_coc, state_info
WHERE homeless_by_coc.state = state_info.postal;

Analyze it and inspect the plan:

import psycopg2
import json

conn = psycopg2.connect(dbname='hud', user='hud_admin', password='hud123')
cur = conn.cursor()

cur.execute("""
    EXPLAIN (ANALYZE, FORMAT json)
    SELECT homeless_by_coc.state, homeless_by_coc.coc_number,
           homeless_by_coc.coc_name, state_info.name
    FROM homeless_by_coc, state_info
    WHERE homeless_by_coc.state = state_info.postal;
""")
query_plan = cur.fetchone()
print(json.dumps(query_plan[0], indent=2))

The plan is much taller than anything you have seen so far:

[
  {
    "Plan": {
      "Node Type": "Hash Join",
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Relation Name": "homeless_by_coc"
        },
        {
          "Node Type": "Hash",
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Relation Name": "state_info"
            }
          ]
        }
      ]
    }
  }
]

The top node is a Hash Join that depends on two children: a Seq Scan over homeless_by_coc and a Hash built from a Seq Scan over state_info. A join requires a full sequential scan of each table before the rows can be matched. You will learn what Hash Join and Hash mean precisely in later lessons, but the lesson for now is clear: joins demand more work, and that work grows with table size. The next lesson shows how an index can change those Seq Scan nodes into something far cheaper.


Practice Exercises

Exercise 1: Explain a Simple Filter

Run EXPLAIN (without ANALYZE) on a query that selects all rows from homeless_by_coc where state = 'CA'. Identify the node type and read off the estimated total cost.

Hint

With no index on state, expect a Seq Scan. The cost appears as cost=<startup>..<total> in the text output, or as Total Cost in JSON format.

Exercise 2: Compare Estimated and Actual Rows

Run EXPLAIN (ANALYZE, FORMAT json) on SELECT COUNT(*) FROM homeless_by_coc WHERE year > '2012-01-01'. Compare Plan Rows against Actual Rows in the Seq Scan node. By how much did the estimate miss?

Hint

The plan estimates around 28843 rows but the actual count is 50589. The gap exists because the estimate comes from pg_class statistics, not from running the query.

Exercise 3: Time a Reversible Delete

Use EXPLAIN (ANALYZE, FORMAT json) to measure how long it takes to delete all rows from state_household_incomes, then call conn.rollback(). Confirm afterward, with a SELECT COUNT(*), that the rows are still there.

Hint

Call conn.rollback() immediately after the cur.execute() for the delete. The Execution Time field in the plan tells you how long the delete would have taken.


Summary

You can now look inside any query and see how Postgres plans to run it. EXPLAIN reveals the plan; EXPLAIN ANALYZE reveals reality. You learned to read the plan as a tree of nodes executed bottom-up, to interpret startup and total costs as relative arbitrary units, and to safely time even destructive queries by rolling them back.

Key Concepts

  • EXPLAIN — shows the query plan the optimizer produced, without running the query.
  • Seq Scan — a sequential scan that inspects every row in a table; the default plan when no index helps.
  • Query plan tree — nodes nested by dependency; Postgres executes from the bottom up so children resolve first.
  • Startup cost / Total cost — estimated work in arbitrary units (not time); total includes startup, and the two let you compare nodes relatively.
  • EXPLAIN ANALYZE — actually executes the query and reports real runtimes and row counts; pair destructive queries with conn.rollback().
  • Hash Join — a join strategy whose plan requires a full scan of each joined table, making joins the most common source of slowness.

Why This Matters

Optimization without measurement is guesswork. EXPLAIN ANALYZE is the instrument every database professional reaches for first when a query is slow — it turns “I think this is slow because of the join” into “the Seq Scan on this 86,000-row table took 10 ms and removed 36,000 rows with a filter.” With that evidence in hand, you can fix the right problem. In the next lesson you will use these same plans to prove that an index makes a query dramatically faster.


Next Steps

Continue to Lesson 3 - Speeding Up Queries with Indexes

Build your first index, measure the speedup, and understand the trade-offs an index introduces.

Back to Module Overview

Return to the Optimizing PostgreSQL module overview


Continue Building Your Skills

Reading query plans is a skill that compounds. The more plans you read, the faster you will spot the expensive node in a slow query and know exactly where to intervene. You now have the diagnostic half of optimization. Next you will learn the most powerful cure: the index.