Lesson 3 - Prepared Statements and Preventing SQL Injection

Welcome to Safe Queries

Every query that takes input from a user is a potential security hole. In this lesson you will see, step by step, how a careless query lets an attacker read an entire table — and then you will learn the two reliable ways to slam that door shut: parameterized queries and prepared statements.

This is one of the most important lessons you will study as a data engineer. The attack you will explore here, called SQL injection, has caused some of the largest data breaches in history.

By the end of this lesson, you will be able to:

  • Explain what a SQL injection attack is and how it works
  • Recognize the unsafe query patterns that make injection possible
  • Defend against injection using psycopg2 parameterized queries
  • Create and run prepared statements with PREPARE and EXECUTE
  • Describe why prepared statements can also speed up repeated queries

You will keep working with the user accounts data. Let’s make your queries safe.

Data for this lesson

Engine: PostgreSQL. Dataset: a small users table.

Tables used: users

Download: user_accounts.csv — a 10,000-row generated (not real) sample with the schema above. Load it into your PostgreSQL instance (the lesson walks through the import) and query it in psql or a GUI client. Prefer your own? Regenerate with any fake-data tool such as Python’s Faker.


Placeholders Recap

In Lesson 1 you learned to pass values to cursor.execute() as a second argument instead of formatting strings by hand. There are two placeholder styles.

Positional placeholders use %s, filled in order:

cur.execute("INSERT INTO users VALUES (%s, %s, %s, %s);", (
    1,
    '[email protected]',
    'John',
    '124, Fake Street'
))

Named placeholders use %(key)s and a dictionary, so order no longer matters:

cur.execute(
    "INSERT INTO users VALUES (%(id)s, %(email)s, %(name)s, %(address)s);",
    {
        'id': 1000,
        'email': '[email protected]',
        'name': 'John',
        'address': '124, Fake Street',
    }
)

Named placeholders shine in long queries with many values, where counting %s markers gets error-prone. The dictionary keys do not have to match column names — they just have to match the placeholders.

Now let’s see why this matters so much.


How SQL Injection Works

Imagine a website with a text box where someone types a user’s name to look up that user’s email. A naive implementation builds the query by gluing the input directly into the string:

def get_email(name):
    import psycopg2
    conn = psycopg2.connect("dbname=datatweets user=datatweets")
    cur = conn.cursor()
    # UNSAFE: the name is concatenated straight into the query
    query_string = "SELECT email FROM users WHERE name = '" + name + "';"
    cur.execute(query_string)
    res = cur.fetchall()
    conn.close()
    return res

For an honest user named Joseph Kirby, this builds a perfectly reasonable query:

SELECT email FROM users WHERE name = 'Joseph Kirby';

But the input is dropped into the query exactly as typed. Watch what happens if someone types this as their “name”:

Joseph Kirby' OR 1 = 1; --

Substituting it in produces:

SELECT email FROM users WHERE name = 'Joseph Kirby' OR 1 = 1; --';

In SQL, -- starts a comment, so everything after it is ignored. The query is now effectively:

SELECT email FROM users WHERE name = 'Joseph Kirby' OR 1 = 1;

Since 1 = 1 is always true, the WHERE clause is always true. The query collapses to:

SELECT email FROM users;

The attacker just dumped every email in the table. This trick — replacing a value with SQL code that the server then executes — is a SQL injection.

Security Warning

Never build SQL queries by concatenating or formatting untrusted input into the query string. Doing so exposes your database to SQL injection, where an attacker can read, modify, or delete data they should never be able to touch. Always pass values as the second argument to cursor.execute() (or use prepared statements). This single habit prevents an entire class of devastating attacks.

It gets worse

Reading emails is bad, but an attacker is not limited to the column you queried. Using a UNION, they can pull data from other columns too. Recall that UNION combines the results of two queries that return the same number of columns. By crafting the input below, an attacker turns the email lookup into an address lookup:

name = "Joseph Kirby' UNION SELECT address FROM users WHERE name = 'Joseph Kirby"
address_and_email = get_email(name)
print(address_and_email)
[('3594 Fox Ford Apt. 192 West Kristen GA 22838-8977',), ('[email protected]',)]

The function meant only to return emails just leaked a home address. From here it is a short step to extracting passwords or worse. This is why injection is treated as a critical vulnerability.


Defense #1: Parameterized Queries

The fix is the habit you have already been practicing. Instead of concatenating the value into the string, pass it as the second argument to cursor.execute() and let psycopg2 handle it:

cur.execute("SELECT email FROM users WHERE name = %s;", (name,))

Now psycopg2 treats the entire input as a value, never as SQL code. If an attacker passes the malicious string, Postgres simply looks for a user literally named:

Joseph Kirby' UNION SELECT address FROM users WHERE name = 'Joseph Kirby

No such user exists, so the function safely returns an empty list. Every character of the input is interpreted as data, with the quoting added automatically.

Here is the safe version of the lookup function:

def get_email_fixed(name):
    import psycopg2
    conn = psycopg2.connect("dbname=datatweets user=datatweets")
    cur = conn.cursor()
    # SAFE: value passed as the second argument
    cur.execute("SELECT email FROM users WHERE name = %s;", (name,))
    res = cur.fetchall()
    conn.close()
    return res

print(get_email_fixed("Joseph Kirby"))

Note the trailing comma in (name,). The second argument must be a sequence; (name,) is a one-element tuple, while (name) is just name in parentheses and would be wrong.

This works for any command, not just INSERTSELECT, UPDATE, and DELETE all accept parameters the same way.


Defense #2: Prepared Statements

Postgres also offers a SQL-native defense: the prepared statement, created with the PREPARE command.

Think of a prepared statement like a function definition. You define it once with named placeholders, then call it many times with different arguments:

PREPARE insert_user(integer, text, text, text) AS
    INSERT INTO users VALUES ($1, $2, $3, $4);

This creates a statement named insert_user that takes four typed arguments. The placeholders use $1, $2, $3, $4 — numbered, and filled in order. (Note this is different from psycopg2’s %s style.) You run it with EXECUTE:

EXECUTE insert_user(10002, '[email protected]', 'Bob', '101 Fake Street');

Do not confuse the SQL EXECUTE command with the cursor.execute() Python method. The Python method sends any SQL to the server; the SQL EXECUTE command specifically runs a prepared statement. From psycopg2, you send both through cursor.execute():

import psycopg2
conn = psycopg2.connect("dbname=datatweets user=datatweets")
cur = conn.cursor()
cur.execute("""
    PREPARE insert_user(integer, text, text, text) AS
        INSERT INTO users VALUES ($1, $2, $3, $4);
""")
cur.execute("""
    EXECUTE insert_user(%s, %s, %s, %s);
""", (10002, '[email protected]', 'Bob', '101 Fake Street'))

Prepared statements are worth learning because they are not Postgres-specific — MySQL and other engines support them too, so the skill transfers widely.

Where prepared statements live

When you create a prepared statement, Postgres records it in an internal table named pg_prepared_statements. These statements are local to the connection that created them: they vanish when the connection closes, and other connections cannot see or use them. You can inspect them:

cur.execute("""
    PREPARE get_email(text) AS
        SELECT email FROM users WHERE name = $1;
""")
cur.execute("EXECUTE get_email(%s);", ('Anna Carter',))
anna_email = cur.fetchone()

cur.execute("SELECT * FROM pg_prepared_statements;")
rows = cur.fetchall()
print(rows)

Each row records the statement’s name, its SQL text, when it was created, its argument types, and whether it came from a PREPARE command.


A Bonus: Prepared Statements Can Be Faster

Although the main purpose of a prepared statement is to prevent injection, it can also speed up queries you run many times. Every SQL query normally goes through four steps:

  1. The query is parsed for correct syntax.
  2. It is transformed into something the engine can execute.
  3. A query plan is built to find the most efficient execution path.
  4. The plan is executed on the database.

When you PREPARE a statement, steps 1, 2, and 3 happen once, at preparation time. Every later EXECUTE jumps straight to step 4. This has two consequences: repeated executions skip the planning overhead, and — because no parsing happens at execution time — no value can ever be reinterpreted as SQL code, which is exactly why injection is impossible.

You can measure the difference with Python’s timeit module. The pattern is:

import timeit
runtime = timeit.timeit(function_name, number=1)

Comparing a function that prepares once and executes in a loop against one that issues a fresh INSERT each time, the prepared version is typically faster when inserting many rows — though for one-off queries the difference is negligible.

Even so, the standard advice is to pass values through cursor.execute() as your everyday defense, because it sidesteps quoting problems (like that comma in an address) more gracefully than hand-writing EXECUTE calls.


Practice Exercises

Exercise 1: Spot the vulnerability

Look at this function. Explain in one sentence why it is unsafe, and rewrite the cur.execute line to make it safe.

def find_user(username):
    cur.execute("SELECT * FROM users WHERE name = '" + username + "';")
    return cur.fetchall()

Hint

Replace the concatenation with a %s placeholder and pass (username,) as the second argument — note the trailing comma.

Exercise 2: Trace an injection

If a user passes the input x' OR 1=1; -- to the unsafe find_user function above, write out the final SQL string that Postgres would receive and explain what it returns.

Hint

Remember that -- begins a comment and 1=1 is always true.

Exercise 3: Prepare and execute

Write the SQL to prepare a statement named find_by_email that takes one text argument and selects all columns from users where the email matches. Then write the EXECUTE call for the email [email protected].

-- Your code here

Hint

Prepared statements use $1 for the first placeholder, not %s.


Summary

You saw how concatenating untrusted input into a query lets an attacker dump entire tables, and you learned two defenses: parameterized queries through psycopg2 and prepared statements in SQL. Both ensure that user input is always treated as data, never as code.

Key Concepts

  • SQL injection — an attack where user input is interpreted as SQL code, letting an attacker read or alter data they should not access.
  • Parameterized query — passing values as the second argument to cursor.execute() so psycopg2 safely escapes them; your everyday defense.
  • Prepared statement — a named, pre-planned query (PREPAREAS …) run with EXECUTE; safe and reusable across many calls.
  • $1 vs. %s — prepared statements use numbered $1 placeholders; psycopg2 uses %s.
  • pg_prepared_statements — the internal table listing the prepared statements on the current connection.

Why This Matters

SQL injection remains one of the most common and most damaging vulnerabilities in real software, and it is entirely preventable. As a data engineer, you will often build the interfaces and pipelines that accept outside input, and the habit you formed here — always pass values as parameters — is what keeps your organization’s data out of the wrong hands. It is a small discipline with enormous consequences.


Next Steps

Continue to Lesson 4 - Loading and Copying Data

Bulk-load data into Postgres and copy data efficiently between tables

Back to Module Overview

Return to the PostgreSQL for Data Engineers module overview


Continue Building Your Skills

You now write queries that an attacker cannot hijack — a skill that separates hobby scripts from production-grade systems. In the next lesson you will focus on speed and scale: the fastest, most robust ways to move large amounts of data into and out of Postgres tables. Onward!