Lesson 7 - Guided Project: Sales Analytics for Northwind Traders

Welcome to the Northwind Guided Project

You have learned the four families of window functions and the CTE patterns that make them practical. Now you will put it all together. In this guided project you step into the role of a Data Analyst at Northwind Traders, an international gourmet food distributor, and build a sales analytics report that management can act on.

The Northwind database is a classic, realistic sample database — it models customers, orders, products, categories, and employees, just like a real distributor’s system. Working with it gives you the closest thing to on-the-job experience: real joins, real aggregations, and real business questions.

By the end of this project, you will be able to:

  • Combine window functions with CTEs and joins to answer layered business questions
  • Build employee rankings, running totals, and month-over-month growth rates
  • Identify high-value customers and top-performing products
  • Interpret query results to support data-driven decisions

Unlike earlier lessons, this is a project: each section gives you a task, the approach, and hints, and you write the query. Treat it as a portfolio piece. Let’s begin.

Data for this lesson

Database: the classic Northwind trading company, loaded into your local PostgreSQL (set up in Lesson 6 of PostgreSQL for Data Engineers).

Tables used: orders, order_details, products, categories, employees, customers

These queries use PostgreSQL syntax — run them in psql or any Postgres client.


Setting Up and Exploring the Data

You can work two ways. The recommended route is a local PostgreSQL install loaded with the Northwind database, queried from a Jupyter notebook so your work is shareable. Alternatively, any PostgreSQL environment with Northwind loaded will do. To run SQL in a Jupyter notebook, prefix each query cell with %%sql on its own line.

Start by listing the tables so you know what you are working with:

SELECT table_name AS name,
       table_type AS type
  FROM information_schema.tables
 WHERE table_schema = 'public'
   AND table_type IN ('BASE TABLE', 'VIEW');

A note for Jupyter users

The employees table has a photo column whose data type cannot render with %%sql. You do not need it for this project, so drop it near the top of your notebook: ALTER TABLE employees DROP COLUMN photo;

The tables you will use most are orders, order_details, products, categories, customers, and employees. Spend a few minutes exploring them. Joining order_details to orders and products gives you line-item revenue (unit_price * quantity * (1 - discount)), which is the basis for nearly every metric below. Consider saving useful joins as views with CREATE VIEW.

Hint

Explore tables with SELECT * FROM table_name LIMIT 10;. To get full order detail, join orders, order_details, and products. To see who handled each order, join employees and orders on the employee id.


Task 1: Rank Employee Sales Performance

Management wants to recognize top performers and spot employees who may need support. Your first task is to rank employees by their total sales.

The approach: build a CTE that computes each employee’s total sales by joining employees, orders, and order_details and summing line-item revenue. Then, in the main query, apply RANK() over the total in descending order so the top salesperson gets rank 1.

WITH employee_sales AS (
    SELECT e.employee_id,
           e.first_name || ' ' || e.last_name AS employee_name,
           SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_sales
      FROM employees e
      JOIN orders o        ON e.employee_id = o.employee_id
      JOIN order_details od ON o.order_id   = od.order_id
     GROUP BY e.employee_id, employee_name
)
SELECT employee_name,
       total_sales,
       RANK() OVER(ORDER BY total_sales DESC) AS sales_rank
  FROM employee_sales
 ORDER BY sales_rank;

RANK() is a good choice here: if two employees tie on total sales, they share a rank. The result is a clean leaderboard management can use to direct rewards and coaching.

Hint

You need three tables joined for total sales. RANK() assigns the same rank to ties and leaves gaps afterward. Order by total sales DESC so rank 1 is the top performer.


Task 2: Running Total of Monthly Sales

Now management wants the bigger picture: how sales accumulate over time. Build a running total of sales by month so they can see the company’s trajectory and spot high- and low-activity periods.

The approach: join orders and order_details, truncate order_date to the month with DATE_TRUNC, sum sales per month, then use SUM() OVER(ORDER BY month) to accumulate.

WITH monthly_sales AS (
    SELECT DATE_TRUNC('month', o.order_date) AS month,
           SUM(od.unit_price * od.quantity * (1 - od.discount)) AS monthly_total
      FROM orders o
      JOIN order_details od ON o.order_id = od.order_id
     GROUP BY DATE_TRUNC('month', o.order_date)
)
SELECT month,
       monthly_total,
       SUM(monthly_total) OVER(ORDER BY month) AS running_total
  FROM monthly_sales
 ORDER BY month;

The running_total column grows month by month, exactly the cumulative line a sales dashboard would plot. This is the running-total pattern from Lesson 2, applied to a real ledger.

Hint

DATE_TRUNC('month', order_date) collapses every date to the first of its month, giving you clean monthly buckets. The OVER(ORDER BY month) clause then accumulates the monthly totals into a running sum.


Task 3: Month-Over-Month Sales Growth

A running total shows direction; management also wants the rate of change. Calculate the month-over-month sales growth as a percentage so they can see where the business accelerated or contracted.

The approach: reuse the monthly totals, use LAG() to fetch the previous month’s total, and compute the percentage change. The growth rate is:

growth=(current monthprevious month1)×100 \text{growth} = \left( \frac{\text{current month}}{\text{previous month}} - 1 \right) \times 100
WITH monthly_sales AS (
    SELECT DATE_TRUNC('month', o.order_date) AS month,
           SUM(od.unit_price * od.quantity * (1 - od.discount)) AS monthly_total
      FROM orders o
      JOIN order_details od ON o.order_id = od.order_id
     GROUP BY DATE_TRUNC('month', o.order_date)
),
with_previous AS (
    SELECT month,
           monthly_total,
           LAG(monthly_total) OVER(ORDER BY month) AS previous_total
      FROM monthly_sales
)
SELECT month,
       monthly_total,
       previous_total,
       ROUND(((monthly_total / previous_total) - 1) * 100, 2) AS growth_pct
  FROM with_previous
 ORDER BY month;

The first month has no predecessor, so its growth_pct is NULL. Every later month shows the percentage change from the month before — the LAG() pattern from Lesson 5 turned into a growth metric.

Hint

LAG(monthly_total) OVER(ORDER BY month) gives the previous month’s figure. Compute the growth as (current / previous - 1) * 100. Chain two CTEs — one for the monthly totals, one for the lagged value — to keep the logic readable.


Task 4: Identify High-Value Customers

Management wants to target high-value customers with promotions. Identify customers with above-average order values — the ones placing larger-than-typical orders.

The approach: in a CTE, compute the value of each order per customer. In the main query, compare each order to the overall average order value with AVG() OVER() and tag it using a CASE statement.

WITH order_values AS (
    SELECT o.customer_id,
           o.order_id,
           SUM(od.unit_price * od.quantity * (1 - od.discount)) AS order_value
      FROM orders o
      JOIN order_details od ON o.order_id = od.order_id
     GROUP BY o.customer_id, o.order_id
)
SELECT customer_id,
       order_id,
       order_value,
       CASE
           WHEN order_value > AVG(order_value) OVER() THEN 'Above Average'
           ELSE 'Average/Below Average'
       END AS value_category
  FROM order_values
 ORDER BY order_value DESC;

The AVG(order_value) OVER() computes the average across all orders without collapsing rows, so each order can be compared to it directly — the empty-OVER() pattern from Lesson 1. As an extension, count how many ‘Above Average’ orders each customer placed to find your most consistently valuable buyers.

Hint

AVG(order_value) OVER() with an empty OVER() gives the average over every order while keeping all rows. The CASE statement then compares each order to that average. Recall from Lesson 3 that you cannot put the window function in WHERE — categorize in SELECT instead.


Task 5: Percentage of Sales for Each Category

To guide inventory and marketing, management wants to know each product category’s share of total sales. This reveals which categories drive the business.

The approach: in a CTE, total sales per category by joining categories, products, order_details, and orders. In the main query, divide each category’s total by the company-wide total — computed with a window SUM() OVER() — and multiply by 100.

WITH category_sales AS (
    SELECT c.category_name,
           SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_sales
      FROM categories c
      JOIN products p       ON c.category_id = p.category_id
      JOIN order_details od ON p.product_id  = od.product_id
      JOIN orders o         ON od.order_id   = o.order_id
     GROUP BY c.category_name
)
SELECT category_name,
       total_sales,
       ROUND(total_sales / SUM(total_sales) OVER() * 100, 2) AS pct_of_total
  FROM category_sales
 ORDER BY pct_of_total DESC;

SUM(total_sales) OVER() totals every category’s sales into one grand total, and dividing each category by it yields its percentage share. This is the “subtotal over grand total” pattern, done in a single query.

Hint

Compute per-category totals in the CTE, then divide each by SUM(total_sales) OVER() (the company-wide total) and multiply by 100. The window SUM over no partition gives you the denominator without a second query.


Task 6: Top Products per Category

Finally, drill into each category for the top three products by sales. These star performers should stay in stock and feature prominently in marketing.

The approach: total sales per product in a CTE, then use ROW_NUMBER() partitioned by category and ordered by sales descending. Filter to row numbers 1 through 3.

WITH product_sales AS (
    SELECT c.category_name,
           p.product_name,
           SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_sales
      FROM categories c
      JOIN products p       ON c.category_id = p.category_id
      JOIN order_details od ON p.product_id  = od.product_id
     GROUP BY c.category_name, p.product_name
),
ranked AS (
    SELECT category_name,
           product_name,
           total_sales,
           ROW_NUMBER() OVER(PARTITION BY category_name
                             ORDER BY total_sales DESC) AS product_rank
      FROM product_sales
)
SELECT category_name, product_name, total_sales, product_rank
  FROM ranked
 WHERE product_rank <= 3
 ORDER BY category_name, product_rank;

PARTITION BY category_name restarts the numbering for each category, so product_rank <= 3 keeps the top three per category. This is the top-N-per-group pattern from Lesson 4, applied to a real product catalog. ROW_NUMBER() guarantees exactly three rows per category even if products tie on sales.

Hint

PARTITION BY category_name makes the row numbering restart for each category. Because window functions cannot go in WHERE, compute the row number in a CTE and filter WHERE product_rank <= 3 in the outer query.


Summary

You built an end-to-end sales analytics report on the Northwind database, weaving together everything from this module. You ranked employees with RANK(), accumulated monthly sales with a running total, measured growth with LAG(), flagged high-value customers against a windowed average, computed category share with SUM() OVER(), and surfaced the top products per category with ROW_NUMBER() — all organized with CTEs and real joins.

Key Concepts

  • Window functions + CTEs — compute a windowed value in a CTE, then filter or transform it in the outer query.
  • Running totalSUM() OVER(ORDER BY month) over monthly buckets built with DATE_TRUNC.
  • Growth rateLAG() to fetch the prior period, then a percentage-change formula.
  • Share of total — divide a per-group total by SUM(...) OVER() (the grand total).
  • Top-N per groupROW_NUMBER() OVER(PARTITION BY ...) filtered to the first N rows.

Why This Matters

This is what analytical SQL looks like on the job. Leaderboards, cumulative trend lines, growth rates, customer segmentation, category mix, and top-seller lists are the standard contents of a real sales dashboard — and you produced every one of them with window functions and CTEs. A project like this, run end to end on a realistic database, is exactly the kind of work that demonstrates job-ready SQL skill to an employer or in a portfolio.


Next Steps

Continue to Querying Databases from Python

Move from pure SQL into Python: connect to a database and run queries with sqlite3.

Back to Module Overview

Return to the Window Functions module overview


Continue Building Your Skills

You have completed the Window Functions module — congratulations. To keep growing, extend this project with your own questions: find the top 20% of customers by purchase volume, compare each employee to the company average, or compute cumulative sales per category over time. The Northwind schema is rich enough to support dozens of analyses. Every query you write here sharpens the skills that data teams rely on every day.