Lesson 2 - From SQL Results to pandas DataFrames
On this page
Welcome to SQL Results in pandas
In the last lesson you queried a SQLite database and got back a list of tuples. That works, but tuples make for clumsy analysis — there are no column names, and every calculation means looping and indexing by position. In this lesson you fix that by routing your query results straight into a pandas DataFrame, the table-shaped data structure at the heart of data analysis in Python.
A DataFrame gives you named columns, vectorized operations, and one-line plotting. Once your SQL results land in a DataFrame, the full power of pandas — filtering, grouping, sorting, charting — is available to you. SQLite is excellent at storing and querying data, but it cannot draw a chart; pandas and matplotlib can, so the natural move is to query in SQL and analyze in Python.
By the end of this lesson, you will be able to:
- Connect to a SQLite database and run a query with
pd.read_sql_query() - Receive query results directly as a labeled pandas DataFrame
- Write readable multi-line SQL queries inside Python
- Build a DataFrame manually from cursor results with
pd.DataFrame() - Join tables in SQL and visualize the results with matplotlib
You should be comfortable with the connect-cursor-execute-fetch-close workflow from Lesson 1. Let’s begin.
Data for this lesson
Database: world_population.db — country population data (SQLite).
Tables used: population
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
The Dataset
In this lesson you will work with the World Population Database, a public dataset describing the population of countries (and World Bank country groupings) around the world. It is stored in a SQLite file named world_population.db. The table, population, has one row per country per year, with three columns:
CountryName— the country’s full nameYear— the year of the measurementPopulation— the country’s population that year
The data covers the years 1960 through 2024. Alongside individual countries, the table also contains a few World Bank aggregates such as World, Arab World, and income groups — keep that in mind when you sort by population.
Your goal across the lesson is concrete: pull population-by-year data into Python and chart it. Because SQLite has no plotting of its own, you will query the data, hand it to pandas, and draw the chart there.
Querying Straight into a DataFrame
In Lesson 1 you fetched rows manually. pandas offers a more direct route with pd.read_sql_query(), which runs a SQL query and returns the results as a DataFrame in a single call. It takes two arguments: the query string and an open connection.
A first query
import pandas as pd
import sqlite3
# Open the connection
conn = sqlite3.connect("world_population.db")
# Run the query and read the results into a DataFrame
query = "SELECT CountryName, Population FROM population WHERE Year = 2020 LIMIT 10;"
results = pd.read_sql_query(query, conn)
print(results)
# Always close the connection when finished
conn.close()Output:
CountryName Population
0 Aruba 108587
1 Africa Eastern and Southern 694446100
2 Afghanistan 39068979
3 Africa Western and Central 474569351
4 Angola 33451132
5 Albania 2528480
6 Andorra 77380
7 Arab World 453723239
8 United Arab Emirates 9401038
9 Argentina 45191965Look at what you got back. Instead of a list of bare tuples, results is a DataFrame with named columns (CountryName, Population) and a numeric index down the left side. Every pandas tool now applies. For example:
print(type(results))
print(results.shape)
print(results["Population"].max())Output:
<class 'pandas.core.frame.DataFrame'>
(10, 2)
694446100That largest value belongs to Africa Eastern and Southern, a World Bank regional aggregate rather than a single country — a reminder that this dataset mixes individual countries with grouped totals.
Why this is better than tuples
With a list of tuples you would write row[1] and have to remember that index 1 meant population. With a DataFrame you write results["Population"] and the code explains itself. You also get free access to .head(), .describe(), .sort_values(), .groupby(), and hundreds of other methods — none of which exist on a plain list.
Note
The same pd.read_sql_query() call works against many databases, not just SQLite — you simply pass a connection created by a different library (for example, a PostgreSQL connection). The pandas side of your code stays the same.
Writing Readable Queries with Multi-line Strings
Short queries fit on one line, but real queries have several clauses and quickly become hard to read. Python’s multi-line strings — text wrapped in triple quotes (""" or ''') — let you spread a query across lines with indentation, so each clause stands on its own:
query = """
SELECT CountryName, Year, Population
FROM population
WHERE CountryName = 'United States';
"""This is exactly the same string as a one-line version with \n characters in it; SQLite ignores the extra whitespace. The payoff is readability — you can scan the SELECT, FROM, and WHERE clauses at a glance.
Putting it to work:
import pandas as pd
import sqlite3
conn = sqlite3.connect("world_population.db")
query = """
SELECT CountryName, Year, Population
FROM population
WHERE CountryName = 'United States';
"""
results = pd.read_sql_query(query, conn)
print(results.head(10))
conn.close()Output:
CountryName Year Population
0 United States 1960 180671000
1 United States 1961 183691000
2 United States 1962 186538000
3 United States 1963 189242000
4 United States 1964 191889000
5 United States 1965 194303000
6 United States 1966 196560000
7 United States 1967 198712000
8 United States 1968 200706000
9 United States 1969 202677000Notice results.head(10) shows just the first ten rows — a quick way to peek at a DataFrame without printing the whole thing.
Tip
Although SQLite does not require a semicolon at the end of a query, ending with one is good practice and keeps your queries portable to databases that do expect it.
Building a DataFrame from Cursor Results
pd.read_sql_query() is the easiest path, but it is worth knowing how to build a DataFrame yourself from the raw cursor results you learned to fetch in Lesson 1. This is useful when you have already fetched rows, or when you want full control over column names.
The constructor pd.DataFrame() accepts a list of tuples as rows; you supply the column names separately with the columns argument:
import pandas as pd
import sqlite3
conn = sqlite3.connect("world_population.db")
cursor = conn.cursor()
# Fetch rows the Lesson 1 way: a list of tuples
cursor.execute("SELECT CountryName, Population FROM population WHERE Year = 2020 LIMIT 5;")
rows = cursor.fetchall()
print("Raw rows:", rows)
# Turn the tuples into a labeled DataFrame
df = pd.DataFrame(rows, columns=["CountryName", "Population"])
print(df)
conn.close()Output:
Raw rows: [('Aruba', 108587), ('Africa Eastern and Southern', 694446100), ('Afghanistan', 39068979), ('Africa Western and Central', 474569351), ('Angola', 33451132)]
CountryName Population
0 Aruba 108587
1 Africa Eastern and Southern 694446100
2 Afghanistan 39068979
3 Africa Western and Central 474569351
4 Angola 33451132The same five tuples that were awkward to work with on their own become a tidy, labeled table.
Pulling column names from the cursor
You can even ask the cursor for the column names instead of typing them by hand. After running a query, the cursor’s description attribute holds metadata about each column, with the name in the first position:
cursor.execute("SELECT CountryName, Year, Population FROM population LIMIT 5;")
rows = cursor.fetchall()
# Extract column names from cursor.description
column_names = [col[0] for col in cursor.description]
print("Columns:", column_names)
df = pd.DataFrame(rows, columns=column_names)
print(df)Output:
Columns: ['CountryName', 'Year', 'Population']
CountryName Year Population
0 Aruba 1960 54922
1 Aruba 1961 55578
2 Aruba 1962 56320
3 Aruba 1963 57002
4 Aruba 1964 57619This is essentially what pd.read_sql_query() does for you under the hood — it runs the query, fetches the rows, and reads the column names from cursor.description.
Visualizing the Results
Once your data is in a DataFrame, plotting it is straightforward with matplotlib. Since SQLite cannot draw charts, this is the real payoff of bringing data into Python.
A bar chart of the largest countries
Let’s query the ten most populous countries in 2022 and chart them. Because the table also stores World Bank aggregates like World and Arab World, we list the individual countries we care about in an IN (...) filter so only real countries reach the chart:
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
conn = sqlite3.connect("world_population.db")
query = """
SELECT CountryName, Population
FROM population
WHERE Year = 2022
AND CountryName IN ('India', 'China', 'United States', 'Indonesia',
'Pakistan', 'Nigeria', 'Brazil', 'Bangladesh',
'Russian Federation', 'Mexico', 'Japan', 'Ethiopia')
ORDER BY Population DESC
LIMIT 10;
"""
data = pd.read_sql_query(query, conn)
conn.close()
print(data)
# Build a bar chart from the DataFrame columns
plt.bar(data["CountryName"], data["Population"])
plt.xticks(rotation=45, ha="right")
plt.xlabel("Country")
plt.ylabel("Population")
plt.title("Top 10 Countries by Population in 2022")
plt.show()Output:
CountryName Population
0 India 1425423212
1 China 1412175000
2 United States 334017321
3 Indonesia 278830529
4 Pakistan 243700667
5 Nigeria 223150896
6 Brazil 210306415
7 Bangladesh 169384897
8 Russian Federation 144236933
9 Mexico 128613117The query sorts countries by population in descending order and keeps the top ten, so data arrives ready to plot. plt.bar() draws the bars, plt.xticks(rotation=45, ha="right") angles the country labels so they do not overlap, the xlabel/ylabel/title calls label the chart, and plt.show() displays it.
Charting one country over time
You can reproduce the lesson’s original task — population by year for a single country — the same way. Query the rows, then plot Year against Population:
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
conn = sqlite3.connect("world_population.db")
query = """
SELECT Year, Population
FROM population
WHERE CountryName = 'United States';
"""
data = pd.read_sql_query(query, conn)
conn.close()
plt.bar(data["Year"], data["Population"])
plt.xlabel("Year")
plt.ylabel("Population")
plt.title("Population of the United States by Year")
plt.show()Because the data is already a DataFrame, data["Year"] and data["Population"] plug directly into plt.bar(). The result is a column chart showing steady population growth across the decades.
Putting It All Together: Joining Tables
Real questions often compare rows that live in different parts of a table. A self-join treats one table as if it were two — here you join population to itself so that each country’s 2010 row sits next to its 2020 row, letting you compute the population change over the decade in a single query. Because the SQL JOIN happens inside the database, pandas receives a single, already-combined result set.
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
conn = sqlite3.connect("world_population.db")
query = """
SELECT old.CountryName,
new.Population - old.Population AS PopChange
FROM population old
JOIN population new ON old.CountryName = new.CountryName
WHERE old.Year = 2010 AND new.Year = 2020
AND old.CountryName IN ('India', 'China', 'United States', 'Indonesia',
'Pakistan', 'Nigeria', 'Brazil', 'Bangladesh',
'Russian Federation', 'Mexico', 'Japan', 'Ethiopia',
'Philippines', 'Egypt, Arab Rep.')
ORDER BY PopChange DESC
LIMIT 10;
"""
df = pd.read_sql_query(query, conn)
conn.close()
print(df.head())
# Horizontal bar chart: CountryName on the y-axis, population change on the x-axis
plt.barh(df["CountryName"], df["PopChange"])
plt.title("Top 10 Countries by Population Change from 2010 to 2020")
plt.xlabel("Population Change")
plt.ylabel("Country")
plt.show()Output:
CountryName PopChange
0 India 159136131
1 China 73395000
2 Nigeria 47353295
3 Pakistan 35762699
4 Indonesia 28509544A few things to notice in the query:
- The
JOIN ... ONclause links the table to a second copy of itself, matching rows on the sharedCountryName. The aliasesoldandnewkeep the two copies straight. WHERE old.Year = 2010 AND new.Year = 2020pins each side of the join to one year, so every matched pair spans exactly one decade.new.Population - old.Populationcomputes the change, and the aliasPopChangenames that computed column, so it appears asdf["PopChange"]in pandas.- The
IN (...)filter keeps the chart to individual countries, andORDER BY ... DESC LIMIT 10keeps the ten largest.
Caution
SQL keywords and column names are case-insensitive inside the database, but pandas column labels are case-sensitive. If your query aliases a column PopChange, you must write df["PopChange"] — df["popchange"] would raise a KeyError.
This single workflow — join in SQL, load with pandas, chart with matplotlib — is the everyday rhythm of data analysis that draws on a database.
Practice Exercises
Try these against world_population.db. Assume the population table described above (columns CountryName, Year, Population).
Exercise 1: Query into a DataFrame
Connect to world_population.db and use pd.read_sql_query() to retrieve the CountryName and Population of all countries in the year 2020, limited to 10 rows. Print the DataFrame and close the connection.
# Your code hereHint
Filter with WHERE Year = 2020 and trim with LIMIT 10. Pass the query string and conn to pd.read_sql_query(), then call conn.close() at the end.
Exercise 2: A Readable Multi-line Query
Write a SELECT query as a multi-line triple-quoted string that returns CountryName, Year, and Population for your own country across all years. Read it into a DataFrame and print the first ten rows with .head(10).
# Your code hereHint
Wrap the query in """ so each clause sits on its own line. Use WHERE CountryName = '...' to filter, matching the exact name in the database.
Exercise 3: Build a DataFrame by Hand
Use a cursor to fetch the CountryName and Population for the year 2022 (limit 5) as a list of tuples, then turn that list into a DataFrame with pd.DataFrame(), giving it the column names CountryName and Population.
# Your code hereHint
After cursor.execute(...), call fetchall() to get the tuples, then pd.DataFrame(rows, columns=["CountryName", "Population"]).
Exercise 4: Chart Population by Year
Query the population of your country for every year, read it into a DataFrame, and use plt.bar() to draw a column chart with Year on the x-axis and Population on the y-axis. Add axis labels and a title.
# Your code hereHint
After loading the DataFrame, pass data["Year"] and data["Population"] to plt.bar(), then call plt.xlabel(), plt.ylabel(), plt.title(), and finish with plt.show().
Summary
You have moved from raw tuples to fully labeled pandas DataFrames. The fastest route is pd.read_sql_query(query, conn), which runs a SQL query and hands back a DataFrame in one step. When you have already fetched rows yourself, pd.DataFrame(rows, columns=[...]) builds the same structure, and cursor.description can supply the column names automatically. Once the data is in a DataFrame, you used matplotlib to turn database queries into charts, and you used a self-join in SQL so that pandas received a single combined result.
Key Concepts
- pandas DataFrame — a labeled, table-shaped data structure that makes filtering, grouping, and plotting easy.
pd.read_sql_query(query, conn)— runs a SQL query and returns the results directly as a DataFrame.- Multi-line strings — triple-quoted strings that let you write readable, multi-clause SQL inside Python.
pd.DataFrame(rows, columns=[...])— builds a DataFrame from a list of tuples with column names you specify.cursor.description— metadata about a query’s columns; the column name is the first item of each entry.JOIN ... ON— combines rows from two tables (or a self-join of one table) in the database before pandas ever sees them.- matplotlib — the plotting library that turns DataFrame columns into bar and column charts.
Why This Matters
Databases store the data; pandas and matplotlib make sense of it. By querying in SQL and loading straight into a DataFrame, you let each tool do what it does best — the database filters and joins efficiently over large tables, and pandas handles the analysis and visualization. This combination is the daily workflow of analysts, data scientists, and engineers everywhere, and it scales from a tiny SQLite file on your laptop to a production database serving an entire company.
Next Steps
You can now query a database from Python and analyze the results with pandas. The natural next step is to work with a full production-grade database. PostgreSQL is the open-source database that powers countless real systems, and learning it opens the door to data engineering.
Continue to PostgreSQL for Data Engineers
Step up to a production-grade database and start building data engineering skills with PostgreSQL
Back to Module Overview
Return to the Querying Databases from Python module overview
Continue Building Your Skills
You have closed the loop between SQL and Python: query in the database, analyze in pandas, visualize with matplotlib. Practice this workflow on your own database files and datasets — pick a table, ask a question, write the query, and chart the answer. The more you repeat the cycle, the more naturally you will reach for a database whenever real data work begins.