Lesson 1 - Introduction to PostgreSQL

Welcome to PostgreSQL

You already know how to write SQL against SQLite. In this lesson you take a big step up: you meet PostgreSQL, the open-source database engine that powers a huge share of the world’s data platforms, and you connect to it from Python.

The shift from SQLite to Postgres is the shift from analyst to engineer. Instead of a single file on your laptop, you will be talking to a database server that many people can use at the same time. That changes how you connect, how you save your work, and how you think about safety.

By the end of this lesson, you will be able to:

Explain how PostgreSQL differs from SQLite and why data engineers choose it
Connect to a Postgres database from Python using the psycopg2 library
Create a table and insert data with parameterized queries
Explain what a SQL transaction is and why you must commit your changes
Bulk-load a CSV file into a Postgres table

A little SQL and a little Python are all you need. Let’s begin.

Data for this lesson

Engine: PostgreSQL. Dataset: users — a 10,000-row generated sample (user_accounts.csv).

Tables used: users, accounts

Download: user_accounts.csv — a 10,000-row generated (not real) sample with the schema above. Load it into your PostgreSQL instance (the lesson walks through the import) and query it in psql or a GUI client. Prefer your own? Regenerate with any fake-data tool such as Python’s Faker.

From SQLite to PostgreSQL

SQLite is one of the most common database engines in the world. All of your data lives in a single file, which makes it portable and incredibly easy to use. Because SQLite supports most of the SQL commands you will find in larger engines, learning SQLite means you already know the basics of every other SQL database.

But SQLite has a hard limit: only one process can write to the database at a time. That makes it difficult to share with many people and services. The job of a data engineer is to open an organization’s data to a wide group of analysts, data scientists, and applications. You could hand the SQLite file to every interested user — but what happens when 10,000 people want the data at once? SQLite does not scale to that.

The client-server model

To serve that many users, you need a different kind of engine. PostgreSQL (often just called Postgres) is an open-source relational database management system (RDBMS) built as a server rather than a single file.

As a server, Postgres accepts connections from many clients at once. Each client can run queries like SELECT, INSERT, or any other SQL statement, which makes the data accessible to many people simultaneously. Imagine two users, Rose and Bruno: both can connect to the same Postgres server, and both can query the same databases without stepping on each other.

This client-server model is exactly what solves the central data-engineering challenge of making data available to everyone who needs it.

What this means for you

Because Postgres is a server, connecting to it is a little different from opening a SQLite file. You will specify which database you want and which user you are. Postgres uses multiple users and databases to improve both security and the organization of your data — without those values, the server has no idea who is connecting or where, so it refuses.

Most of the SQL commands you already know — SELECT, CREATE, INSERT — work the same in Postgres. When a command behaves differently, the official PostgreSQL documentation is your friend.

Connecting with psycopg2

To talk to a Postgres server from Python, you use the open-source psycopg2 library. If you have used Python’s built-in sqlite3 library before, psycopg2 will feel familiar — it plays the same role for Postgres.

Opening a connection

You connect with the psycopg2.connect() function, passing it a connection string that names the database and the user:

import psycopg2
conn = psycopg2.connect("dbname=datatweets user=datatweets")

Here the connection string "dbname=datatweets user=datatweets" says: connect to the database named datatweets as the user named datatweets. The function returns a connection object that lets the client interact with the server until the connection is closed.

When you are finished, you close the connection with the connection.close() method:

conn.close()

Once a connection is closed, you can no longer run commands through it.

Cursors execute your queries

The connection object does not run queries directly. Instead, you create a cursor object by calling connection.cursor(), and you run SQL through the cursor’s execute() method:

import psycopg2
conn = psycopg2.connect("dbname=datatweets user=datatweets")
cur = conn.cursor()
cur.execute("SELECT * FROM users;")

The string you pass to execute() is called a query string.

Fetching results

cursor.execute() does not return the results directly — it returns None on success, or raises an error. To read the rows, you either loop over the cursor or call one of two fetch methods:

cur.execute("SELECT * FROM users;")
for row in cur:
    print(row)

cursor.fetchone() returns the first result, or None if there are no rows.
cursor.fetchall() returns a list of every row, or an empty list [] if there are none.

cur.execute("SELECT * FROM users;")
one_result = cur.fetchone()
all_results = cur.fetchall()

The data you will load in this module is a set of randomly generated (not real) user accounts from a file named user_accounts.csv. The first few rows look like this:

id | email                    | name           | address
---+--------------------------+----------------+-----------------------------------------------------
0  | [email protected]    | Anna Carter    | 27183 Craig Shore Suite 886 New Benjamin TN 92858
1  | [email protected]   | Joseph Kirby   | 3594 Fox Ford Apt. 192 West Kristen GA 22838-8977
2  | [email protected]  | Larry Cain     | 58208 Cook Bypass West Benjaminfurt OH 25179

Creating Your First Table

Creating a table in Postgres uses the same CREATE TABLE statement you know from SQLite. For each column you specify its name and its data type:

CREATE TABLE employees (
    id integer PRIMARY KEY,
    first_name text,
    last_name text
);

The PRIMARY KEY keyword marks id as the table’s primary key. In Postgres, every table needs at least one set of columns that holds a unique value per row, and the primary key fills that role.

From Python, you wrap the statement in cursor.execute(). Using a triple-quoted string keeps the query readable across multiple lines:

import psycopg2
conn = psycopg2.connect("dbname=datatweets user=datatweets")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE users (
        id integer PRIMARY KEY,
        email text,
        name text,
        address text
    );
""")

The users table has four columns and two data types: an integer for the IDs and text for everything else. Postgres, like every relational database, is type-sensitive — you must declare a type for every column.

Why the table might not be there

Here is a surprise. If you create that table, close the connection, then reconnect and run SELECT * FROM users;, you get an error:

ProgrammingError: relation "users" does not exist
LINE 1: SELECT * FROM users;
                      ^

This is not a bug. It is your first encounter with transactions.

Transactions and the Commit Method

Postgres groups your queries into transactions and only saves the changes when you explicitly ask it to. This is different from SQLite, where every data-modifying query took effect immediately.

Why transactions exist

Imagine a bank with an accounts table:

id | name | balance
---+------+--------
1  | Jim  | 100
2  | Sue  | 200

Sue gives Jim 100 dollars. You model that with two updates:

UPDATE accounts SET balance = 200 WHERE name = 'Jim';
UPDATE accounts SET balance = 100 WHERE name = 'Sue';

Now suppose the connection fails after the first update but before the second. Jim is credited his 100 dollars, but nothing is ever removed from Sue. The bank just lost money:

id | name | balance
---+------+--------
1  | Jim  | 200
2  | Sue  | 200

Transactions prevent exactly this. All the queries in a transaction block run together. If any query in the block fails, the whole block fails and no changes are made at all.

Committing your work

To tell Postgres “apply all the queries in this transaction,” you call connection.commit():

conn.commit()

Whenever you open a connection in psycopg2, a new transaction begins automatically. Every query you run is grouped together until you commit. Committing is like saving a document in a text editor: your edits are not really written until you save.

If you change your mind, connection.rollback() discards the transaction. If you call neither commit nor rollback before closing, the transaction stays pending and your changes simply vanish — which is exactly why the users table seemed to disappear earlier.

So the corrected version commits before closing:

import psycopg2
conn = psycopg2.connect("dbname=datatweets user=datatweets")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE users (
        id integer PRIMARY KEY,
        email text,
        name text,
        address text
    );
""")
conn.commit()
conn.close()

Different views of the same data

Until you commit, your connection sees your changes as if they were saved, but other users see the old data. Two connections can have completely different views of the database. The moment you commit, everyone sees the same thing. This is the price — and the safety — of a multi-user server.

Inserting and Bulk-Loading Data

With a table in place, you load rows with the INSERT command.

The safe way to insert

A tempting but wrong approach is to build the query string yourself with Python formatting:

insert_string = "INSERT INTO users VALUES ({0}, {1}, {2}, {3});".format(
    1, "[email protected]", "John", "123, Fake Street")

This breaks. The result looks like this:

INSERT INTO users VALUES (1, [email protected], John, 123, Fake Street);

Two things went wrong: the comma inside the address split it into two values (so Postgres sees five values, not four), and the text values are missing their single quotes.

The recommended way is to let psycopg2 substitute the values for you. Pass the query string with %s placeholders as the first argument, and a sequence of values as the second:

import psycopg2
conn = psycopg2.connect("dbname=datatweets user=datatweets")
cur = conn.cursor()
cur.execute(
    "INSERT INTO users VALUES (%s, %s, %s, %s);",
    (1, "[email protected]", "John", "123, Fake Street")
)
conn.commit()

The %s placeholders behave like {} in Python formatting, but psycopg2 handles the quoting and escaping safely. A couple of rules to remember:

The second argument must always be a sequence, even for a single value:

cur.execute("INSERT INTO foo VALUES (%s)", "bar")     # WRONG
cur.execute("INSERT INTO foo VALUES (%s)", ("bar",))  # correct
cur.execute("INSERT INTO foo VALUES (%s)", ["bar"])   # correct

Do not quote the placeholder yourself — psycopg2 does it:

cur.execute("INSERT INTO numbers VALUES ('%s')", (10,))  # WRONG
cur.execute("INSERT INTO numbers VALUES (%s)", (10,))    # correct

Loading a whole CSV file

The user_accounts.csv file holds 10,000 users — far too many to insert by hand. Combine psycopg2 with Python’s built-in csv module to loop over every row:

import psycopg2
import csv

conn = psycopg2.connect("dbname=datatweets user=datatweets")
cur = conn.cursor()

with open("user_accounts.csv", "r") as file:
    next(file)  # skip the header row
    reader = csv.reader(file)
    for row in reader:
        cur.execute("INSERT INTO users VALUES (%s, %s, %s, %s);", row)

conn.commit()
conn.close()

Because each row from the CSV reader is already a list, you can pass it straight in as the second argument — no need to break out the individual values. And as always, you commit at the end so the inserts are saved.

The General Postgres Workflow

Pulling it all together, here is the workflow you will use again and again with psycopg2:

Connect to a database with psycopg2.connect().
Create a cursor with connection.cursor().
Execute one or more SQL queries with cursor.execute().
Commit your changes with connection.commit().
Close the connection with connection.close().

You can run as many queries as you like between commits. Everything between two commits forms a single transaction block that succeeds or fails as a unit.

Practice Exercises

Exercise 1: Connect and disconnect

Write a script that imports psycopg2, connects to a database named datatweets as the user datatweets, stores the connection in a variable named conn, and then closes it.

# Your code here

Hint

Your connection string should be "dbname=datatweets user=datatweets". Remember to call conn.close() at the end.

Exercise 2: Create a table and commit

Connect to datatweets, create a cursor, and create a table named products with columns id (integer, primary key), name (text), and price (integer). Commit your changes and close the connection.

# Your code here

Hint

If you forget to call conn.commit(), the table will not actually be saved — even though no error appears.

Exercise 3: Insert safely

Insert one product into your products table using a parameterized query with %s placeholders. Then commit.

# Your code here

Hint

Pass the query string as the first argument to cursor.execute() and a tuple of values as the second — never build the string with .format().

Summary

You connected to a PostgreSQL server, created a table, inserted data safely, and learned why your changes are not saved until you commit. You also saw why Postgres exists at all: it is a multi-user server, and that single fact shapes connections, transactions, and safety.

Key Concepts

PostgreSQL (Postgres) — an open-source relational database server that handles many simultaneous client connections, unlike SQLite’s single-file, single-writer model.
psycopg2 — the Python library used to connect to and query a Postgres server.
Connection string — the text (e.g. "dbname=datatweets user=datatweets") that tells Postgres which database and user to connect as.
Cursor — the object, created from a connection, that executes SQL and fetches results.
Transaction — a group of queries that succeed or fail together; changes become permanent only when you call connection.commit().

Why This Matters

Almost every production data platform you will ever touch runs on a database server, and Postgres is one of the most popular choices. Understanding the connection-cursor-commit cycle is the foundation for everything a data engineer does: loading pipelines, serving analysts, and keeping data consistent when many people use it at once. The transaction safety you learned here is what stands between a working bank and a lost balance.

Next Steps

Continue to Lesson 2 - Designing Tables and Choosing Data Types

Pick the right data types for your data and modify column types as requirements change

Back to Module Overview

Return to the PostgreSQL for Data Engineers module overview

Continue Building Your Skills

You have made the leap from a single-file database to a real database server, and you now know the rhythm every data engineer follows: connect, execute, commit, close. In the next lesson you will sharpen the most engineering-focused skill of all — choosing data types that save space and protect your data from bad input. Keep going!

Lesson 10 - Apply, Map, and Transform Functions

Lesson 11 - Handling Missing Data

Courses

DATATWEETS

Title here

Lesson 1 - Introduction to PostgreSQL

Welcome to PostgreSQL

From SQLite to PostgreSQL

The client-server model

What this means for you

Connecting with psycopg2

Opening a connection

Cursors execute your queries

Fetching results

Creating Your First Table

Why the table might not be there

Transactions and the Commit Method

Why transactions exist

Committing your work

Different views of the same data

Inserting and Bulk-Loading Data

The safe way to insert

Loading a whole CSV file

The General Postgres Workflow

Practice Exercises

Exercise 1: Connect and disconnect

Exercise 2: Create a table and commit

Exercise 3: Insert safely

Summary

Key Concepts

Why This Matters

Next Steps

Continue to Lesson 2 - Designing Tables and Choosing Data Types

Back to Module Overview

Continue Building Your Skills

Lesson 1 - Introduction to PostgreSQL

Welcome to PostgreSQL#

From SQLite to PostgreSQL#

The client-server model#

What this means for you#

Connecting with psycopg2#

Opening a connection#

Cursors execute your queries#

Fetching results#

Creating Your First Table#

Why the table might not be there#

Transactions and the Commit Method#

Why transactions exist#

Committing your work#

Different views of the same data#

Inserting and Bulk-Loading Data#

The safe way to insert#

Loading a whole CSV file#

The General Postgres Workflow#

Practice Exercises#

Exercise 1: Connect and disconnect#

Exercise 2: Create a table and commit#

Exercise 3: Insert safely#

Summary#

Key Concepts#

Why This Matters#

Next Steps#

Continue to Lesson 2 - Designing Tables and Choosing Data Types

Back to Module Overview

Continue Building Your Skills#

Welcome to PostgreSQL

From SQLite to PostgreSQL

The client-server model

What this means for you

Connecting with psycopg2

Opening a connection

Cursors execute your queries

Fetching results

Creating Your First Table

Why the table might not be there

Transactions and the Commit Method

Why transactions exist

Committing your work

Different views of the same data

Inserting and Bulk-Loading Data

The safe way to insert

Loading a whole CSV file

The General Postgres Workflow

Practice Exercises

Exercise 1: Connect and disconnect

Exercise 2: Create a table and commit

Exercise 3: Insert safely

Summary

Key Concepts

Why This Matters

Next Steps

Continue Building Your Skills