Lesson 5 - Managing Users and Databases

Welcome to User and Database Management

A database server is shared by many people, and not everyone should be able to do everything. A data analyst usually just needs to read data; a data scientist may need to write it; almost no one should be able to drop a table. In this lesson you will learn to manage users, groups, privileges, and databases so that each person has exactly the access they need — and no more.

This is the heart of database security, and it rests on one simple idea: the least privilege principle.

By the end of this lesson, you will be able to:

  • Create password-protected users and superusers
  • Grant and revoke privileges on tables
  • Build reusable readonly and readwrite groups
  • Create new databases and control who can connect to them
  • Organize tables into schemas

You will use the psycopg2 workflow from earlier lessons. Let’s lock things down.

Data for this lesson

Engine: PostgreSQL. Dataset: a small users table plus roles and privileges.

Tables used: users

Download: user_accounts.csv — a 10,000-row generated (not real) sample with the schema above. Load it into your PostgreSQL instance (the lesson walks through the import) and query it in psql or a GUI client. Prefer your own? Regenerate with any fake-data tool such as Python’s Faker.


Users and the Connection String

You may have wondered why you always connect with a string like "dbname=datatweets user=datatweets". With SQLite you connected to a file; with Postgres you connect to a running server as a specific user. Because the server is shared, different users want different things: some only SELECT, some INSERT and UPDATE, and a trusted few do everything.

A superuser is the administrator of the server — full access to every database, table, and user. With that much power, a superuser must always be password-protected. You pass a password as a keyword argument:

import psycopg2
conn = psycopg2.connect(dbname='datatweets', user='postgres', password='abc123')
print(conn)

This keyword form is equivalent to the connection string. The default Postgres superuser is named postgres. Notice that the datatweets user you have used so far has no password — anyone who knows the username could connect and see everything. That is fine for learning, but a serious security flaw in production.


Creating Users and Inspecting Them

To create a user, connect as a superuser and run CREATE USER. You can attach options like PASSWORD and CREATEDB:

CREATE USER new_user WITH CREATEDB PASSWORD 'datatweets';

This makes a user who can create databases and has the password datatweets. Most options have a negative form prefixed with NO (for example, NOCREATEDB), and by default every negative option is set — so you must explicitly state what a user is allowed to do.

From Python:

import psycopg2
conn = psycopg2.connect(dbname='datatweets', user='postgres', password='abc123')
cur = conn.cursor()
cur.execute("CREATE USER data_viewer WITH SUPERUSER PASSWORD 'secret';")
conn.commit()
conn.close()

The pg_user table

Postgres keeps its users in an internal table named pg_user. The most useful columns are:

column       | meaning
-------------+-------------------------------------------------
usename      | the username
usecreatedb  | True if the user can create databases
usesuper     | True if the user is a superuser

You can list every user on the server:

cur.execute("SELECT * FROM pg_user;")
users = cur.fetchall()
for user in users:
    print(user)

As a rule, you want only one superuser — the default postgres — reserved for administration. Everyone else should get just the privileges their job requires.


Granting and Revoking Privileges

User privileges are the rules that decide which commands a user can run: SELECT, INSERT, DELETE, DROP TABLE, and so on. Most of the time you do not want users running destructive commands. The danger is not only malicious hackers — a well-meaning but inexperienced colleague who runs DROP TABLE instead of SELECT can wipe out vital data just as effectively.

You take away privileges with REVOKE and hand them out with GRANT:

REVOKE DELETE, DROP, UPDATE ON accounts FROM new_user;
GRANT  DELETE, DROP, UPDATE ON accounts TO   new_user;

The pattern is the same for both: the command name, a comma-separated list of privileges, the object they apply to, and the user. The keyword ALL is a shorthand for every privilege.

The recommended workflow is to revoke everything, then grant back only what is needed:

import psycopg2
conn = psycopg2.connect(dbname='datatweets', user='postgres', password='abc123')
cur = conn.cursor()
cur.execute("REVOKE ALL ON users FROM data_viewer;")
cur.execute("GRANT SELECT ON users TO data_viewer;")
conn.commit()
conn.close()

Checking privileges

To see what a user can do, query the internal information_schema.table_privileges table:

cur.execute("""
    SELECT grantor, grantee, privilege_type
    FROM information_schema.table_privileges
    WHERE table_name = 'users';
""")
privileges = cur.fetchall()
for row in privileges:
    print(row)
('datatweets', 'datatweets', 'INSERT')
('datatweets', 'datatweets', 'SELECT')
('datatweets', 'datatweets', 'UPDATE')
('datatweets', 'datatweets', 'DELETE')
('datatweets', 'datatweets', 'TRUNCATE')
('datatweets', 'datatweets', 'REFERENCES')
('datatweets', 'datatweets', 'TRIGGER')
('datatweets', 'data_viewer', 'SELECT')

The superuser trap

There is a catch. Even though data_viewer shows only SELECT, it can still insert data — because you created it as a SUPERUSER, and superusers ignore privileges entirely. A superuser could just create another user with full rights and connect as that one. So privileges are meaningless for superusers.

Fix it by stripping the superuser status with ALTER USER:

cur.execute("ALTER USER data_viewer WITH NOSUPERUSER;")

After this, data_viewer is held to its SELECT-only privileges, and an INSERT attempt fails. The lesson: when you set up privileges, make sure your users are not superusers.


Groups and the Least Privilege Principle

A user that can only read data is a readonly user. The guiding rule for all access is the least privilege principle: every user should have only the privileges strictly required for their job, and nothing more.

Now imagine deploying a database for 100 data analysts who all need read access. Setting privileges one user at a time would be miserable. Postgres solves this with groups: you configure the privileges once on a group, then drop users into it, and they inherit everything automatically.

A group is created just like a user, but you add NOLOGIN so that no one can log in as the group:

CREATE GROUP group_name NOLOGIN;

You assign a user to a group with GRANT:

GRANT group_name TO user_name;

A readonly group

The workflow is: create the group, revoke everything, grant SELECT, then add users:

import psycopg2
conn = psycopg2.connect(dbname='datatweets', user='postgres', password='abc123')
cur = conn.cursor()
cur.execute("CREATE GROUP readonly NOLOGIN;")
cur.execute("REVOKE ALL ON users FROM readonly;")
cur.execute("GRANT SELECT ON users TO readonly;")
cur.execute("GRANT readonly TO data_viewer;")
conn.commit()
conn.close()

This is the natural profile for a data analyst, who typically only needs to read data for analysis.

A readwrite group

A readwrite group can read, insert, update, and delete — but still cannot drop tables. This suits a data scientist, who collects, cleans, and loads data but should not be reshaping the schema (that is the data engineer’s job):

cur.execute("CREATE GROUP readwrite NOLOGIN;")
cur.execute("REVOKE ALL ON users FROM readwrite;")
cur.execute("GRANT SELECT, INSERT, DELETE, UPDATE ON users TO readwrite;")
conn.commit()

These role names are common conventions, not rigid rules — real teams blend responsibilities — but readonly and readwrite are an excellent starting point.


Creating Databases and Schemas

A single Postgres server can host many databases. If a company sells two applications, A and B, it can run both on one server but keep their data in separate databases to stay organized.

You create a database with CREATE DATABASE, optionally naming an OWNER:

CREATE DATABASE my_database OWNER postgres;

The owner is always a superuser, so for small setups it is wise to make postgres the owner rather than minting new superusers.

Autocommit for database creation

Creating a database cannot happen inside a transaction block. But recall that psycopg2 automatically wraps your queries in a transaction until you commit. To get around this, set connection.autocommit = True so the command runs immediately, then set it back:

import psycopg2
conn = psycopg2.connect(dbname='datatweets', user='datatweets')
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE DATABASE my_database OWNER postgres;")
conn.autocommit = False

Switch autocommit back to False afterward, because transaction blocks are what keep your data consistent for everything else.

Controlling who can connect

Users and groups are shared across all databases, so unless you say otherwise, any user can connect to any database. To revoke that:

REVOKE CONNECT ON DATABASE database_name FROM user_name;

But this alone is not enough. Every user belongs to a built-in group called public, and public is granted connect rights on new databases by default. Privileges are inherited, so the user can still connect through public. You must revoke from public too:

REVOKE CONNECT ON DATABASE database_name FROM public;
REVOKE ALL ON DATABASE database_name FROM public;

Revoking everything from public follows the least privilege principle and prevents surprise inherited access.

Schemas organize tables

Within a database, schemas act like folders for tables. There is a default schema named public where all your tables have lived so far. You create one with:

CREATE SCHEMA schema_name;

And you reference a table inside it with dot notation:

SELECT * FROM user_data.users;

This is exactly why you wrote information_schema.table_privileges earlier — table_privileges is a table inside the information_schema schema. Schemas also let you grant or revoke across many tables at once:

REVOKE DELETE ON ALL TABLES IN schema_name FROM user_name;
GRANT  SELECT ON ALL TABLES IN schema_name TO   user_name;

The full hierarchy: a server has users and databases; a database contains schemas; a schema contains tables.


Practice Exercises

Exercise 1: Create a secured user

Write the SQL to create a user named report_bot with the password r3p0rt that is not a superuser and cannot create databases.

-- Your code here

Hint

Negative options are set by default, so you mainly need CREATE USER report_bot WITH PASSWORD 'r3p0rt';.

Exercise 2: Apply least privilege

You want report_bot to read the sales table but nothing else. Write the two statements that enforce this.

-- Your code here

Hint

Revoke ALL first, then grant only SELECT.

Exercise 3: Create a database safely

Write the Python code that creates a database named analytics owned by postgres, remembering the autocommit requirement.

# Your code here

Hint

Set conn.autocommit = True before the CREATE DATABASE command, then set it back to False.


Summary

You learned to create password-protected users, grant and revoke privileges, build readonly and readwrite groups around the least privilege principle, create databases (with the autocommit trick), control connection rights through the public group, and organize tables into schemas.

Key Concepts

  • Superuser — an administrator with full access who ignores privileges; keep only one (postgres) and always password-protect it.
  • Least privilege principle — give every user only the access their job requires, and nothing more.
  • GRANT / REVOKE — hand out or take away privileges; the safe pattern is revoke all, then grant what is needed.
  • Group — a NOLOGIN role (like readonly or readwrite) whose privileges users inherit when added with GRANT.
  • Schema — a folder for tables inside a database, referenced as schema_name.table_name.

Why This Matters

Managing access is where data engineering meets security and trust. A single over-privileged account can lead to an accidental DROP TABLE or a deliberate breach, while a well-designed set of groups lets a hundred analysts work safely without a hundred manual setups. The least privilege habit you built here is one of the most respected disciplines in the field.


Next Steps

Continue to Lesson 6 - Project: Installing PostgreSQL Locally

Set up and connect to a local PostgreSQL database on your own machine

Back to Module Overview

Return to the PostgreSQL for Data Engineers module overview


Continue Building Your Skills

You can now run a Postgres server like an administrator — provisioning users, enforcing least privilege, and organizing databases and schemas. Next you will take everything off the training wheels and install Postgres on your own machine, so you can practice all of this for real. Let’s get it running locally.