Lesson 6 - Guided Project: Customer and Product Analysis

Welcome to the Guided Project

You have spent five lessons building a complete toolkit: scalar, multi-row, nested, and correlated subqueries, common table expressions, and views. Now you will use all of it on a single, realistic dataset to answer the kinds of questions a real business actually asks.

You play the role of a data analyst for a store that sells scale model cars. Sales analysis turns raw transaction data into key performance indicators that guide smarter decisions — saving time, money, and inventory. Your job is to mine the sales database and produce recommendations leadership can act on.

By the end of this lesson, you will be able to:

  • Explore an unfamiliar relational database and understand how its tables link
  • Use correlated subqueries and CTEs to compute inventory and performance metrics
  • Rank customers by profitability to guide marketing strategy
  • Estimate customer lifetime value to inform an acquisition budget
  • Tell a clear data story from question to conclusion

Let’s meet the database and the three questions that drive the project.

Data for this lesson

Database: stores.db — a scale-model car retailer (SQLite).

Tables used: customers, products, employees, offices, orders, orderdetails, payments, productlines

Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.


The Project: Three Business Questions

Good analysis starts with good questions. Leadership wants answers to three:

  1. Which products should we order more of or less of? This is an inventory question about low stock and product performance — keeping best-sellers from going out of stock.
  2. How should we tailor marketing to customer behavior? This means identifying the most valuable (VIP) customers and the least engaged ones.
  3. How much can we spend on acquiring new customers? This requires estimating how much profit an average customer generates over their lifetime.

You will answer each in turn, building on the SQL patterns from earlier lessons. Create a file named project.sql to collect your queries, and add comments describing each table as you explore — that file becomes a portfolio piece.

The Scale Model Cars Database

The database has eight tables:

  • customers — customer details
  • employees — employee information
  • offices — sales office locations
  • orders — customers’ sales orders
  • orderdetails — line items for each order
  • payments — customer payment records
  • products — the catalog of scale model cars
  • productlines — product category groupings

You can preview any table with a simple query before diving in:

SELECT *
  FROM products
 LIMIT 5;

Profiling the Whole Database

A great first step is to summarize every table at once — its name, its number of columns, and its number of rows — using UNION ALL to stack one summary row per table:

SELECT 'Customers' AS table_name,
       13 AS number_of_attributes,
       COUNT(*) AS number_of_rows
  FROM customers
 UNION ALL
SELECT 'Products', 9, COUNT(*) FROM products
 UNION ALL
SELECT 'ProductLines', 4, COUNT(*) FROM productlines
 UNION ALL
SELECT 'Orders', 7, COUNT(*) FROM orders
 UNION ALL
SELECT 'OrderDetails', 5, COUNT(*) FROM orderdetails
 UNION ALL
SELECT 'Payments', 4, COUNT(*) FROM payments
 UNION ALL
SELECT 'Employees', 8, COUNT(*) FROM employees
 UNION ALL
SELECT 'Offices', 9, COUNT(*) FROM offices;

Output:

table_namenumber_of_attributesnumber_of_rows
Customers13122
Products9110
ProductLines47
Orders7326
OrderDetails52996
Payments4273
Employees823
Offices97

Now you know the shape of the data: 110 products, 122 customers, and nearly 3,000 order lines to analyze.


Question 1: Which Products to Restock?

This question is about inventory. Two metrics matter, and we define them precisely.

Low stock measures how quickly a product sells relative to how much is on hand — total quantity ordered divided by quantity in stock:

low stock=SUM(quantityOrdered)quantityInStock \text{low stock} = \frac{\text{SUM}(\text{quantityOrdered})}{\text{quantityInStock}}

Product performance measures how much revenue a product generates:

product performance=SUM(quantityOrdered×priceEach) \text{product performance} = \text{SUM}(\text{quantityOrdered} \times \text{priceEach})

Priority restocking products are those with high performance that are also nearly out of stock (high low-stock score).

Step 1: Low Stock With a Correlated Subquery

The low-stock metric needs quantityInStock from products and SUM(quantityOrdered) from orderdetails. A correlated subquery fetches the matching stock figure per product:

SELECT productCode,
       ROUND(SUM(quantityOrdered) * 1.0 /
             (SELECT quantityInStock
                FROM products p
               WHERE p.productCode = od.productCode), 2) AS low_stock
  FROM orderdetails od
 GROUP BY productCode
 ORDER BY low_stock DESC
 LIMIT 10;

For each product, the correlated subquery looks up that product’s stock, and the outer query divides total quantity ordered by it. The top ten are the products closest to selling out.

Step 2: Product Performance

Performance is a straightforward aggregation per product:

SELECT productCode,
       SUM(quantityOrdered * priceEach) AS product_performance
  FROM orderdetails
 GROUP BY productCode
 ORDER BY product_performance DESC
 LIMIT 10;

Step 3: Combine With a CTE

Now use two CTEs — one for low stock, one for performance — and the IN operator to keep only the high-performance products that also rank in the top ten for low stock:

WITH
low_stock_table AS (
SELECT productCode,
       ROUND(SUM(quantityOrdered) * 1.0 /
             (SELECT quantityInStock
                FROM products p
               WHERE p.productCode = od.productCode), 2) AS low_stock
  FROM orderdetails od
 GROUP BY productCode
 ORDER BY low_stock DESC
 LIMIT 10
)

SELECT productCode,
       SUM(quantityOrdered * priceEach) AS product_performance
  FROM orderdetails
 WHERE productCode IN (SELECT productCode FROM low_stock_table)
 GROUP BY productCode
 ORDER BY product_performance DESC
 LIMIT 10;

Joining these product codes back to products reveals the priority restock list:

productNameproductLine
1968 Ford MustangClassic Cars
1928 Mercedes-Benz SSKVintage Cars
1997 BMW F650 STMotorcycles
F/A 18 Hornet 1/72Planes
2002 Yamaha YZR M1Motorcycles
The MayflowerShips
1960 BSA Gold Star DBD34Motorcycles
1928 Ford Phaeton DeluxeVintage Cars
Pont YachtShips
1911 Ford Town CarVintage Cars

Answer: Vintage cars and motorcycles dominate the restock priorities — they sell frequently and generate strong revenue.


Question 2: VIP and Less-Engaged Customers

To tailor marketing, you first need each customer’s profitability. Profit per line item is quantityOrdered * (priceEach - buyPrice), summed per customer. That requires joining products, orders, and orderdetails.

Step 1: Profit Per Customer

SELECT o.customerNumber,
       SUM(quantityOrdered * (priceEach - buyPrice)) AS profit
  FROM products p
  JOIN orderdetails od
    ON p.productCode = od.productCode
  JOIN orders o
    ON o.orderNumber = od.orderNumber
 GROUP BY o.customerNumber;

Step 2: The Top Five VIP Customers

Wrap the profit query in a CTE, join it to customers for names and location, and take the top five:

WITH
profit_table AS (
SELECT o.customerNumber,
       SUM(quantityOrdered * (priceEach - buyPrice)) AS profit
  FROM products p
  JOIN orderdetails od
    ON p.productCode = od.productCode
  JOIN orders o
    ON o.orderNumber = od.orderNumber
 GROUP BY o.customerNumber
)

SELECT c.contactLastName, c.contactFirstName, c.city, c.country, pt.profit
  FROM customers c
  JOIN profit_table pt
    ON c.customerNumber = pt.customerNumber
 ORDER BY pt.profit DESC
 LIMIT 5;

Output:

contactLastNamecontactFirstNamecitycountryprofit
FreyreDiegoMadridSpain326519.66
NelsonSusanSan RafaelUSA236769.39
YoungJeffNYCUSA72370.09
FergusonPeterMelbourneAustralia70311.07
LabruneJanineNantesFrance60875.30

Step 3: The Five Least-Engaged Customers

The same query with ORDER BY pt.profit ASC surfaces the customers who bring in the least:

contactLastNamecontactFirstNamecitycountryprofit
YoungMaryGlendaleUSA2610.87
TaylorLeslieBrickhavenUSA6586.02
RicottiFrancoMilanItaly9532.93
SchmittCarineNantesFrance10063.80
SmithThomasLondonUK10868.04

Answer: Reward the VIPs (such as Diego Freyre, who alone contributes over 326,000 in profit) with loyalty events, and run re-engagement campaigns aimed at the least-engaged group.


Question 3: How Much to Spend Acquiring Customers?

Before deciding to spend on acquisition, it helps to confirm new customers have slowed — and then estimate what a customer is worth.

Are New Customers Slowing Down?

This query uses CTEs and correlated subqueries to compute, per month, the share of customers who are brand new and the share of revenue they bring:

WITH
payment_with_year_month_table AS (
SELECT *,
       CAST(SUBSTR(paymentDate, 1, 4) AS INTEGER) * 100
       + CAST(SUBSTR(paymentDate, 6, 7) AS INTEGER) AS year_month
  FROM payments
),

customers_by_month_table AS (
SELECT p1.year_month,
       COUNT(*) AS number_of_customers,
       SUM(p1.amount) AS total
  FROM payment_with_year_month_table p1
 GROUP BY p1.year_month
),

new_customers_by_month_table AS (
SELECT p1.year_month,
       COUNT(DISTINCT customerNumber) AS number_of_new_customers,
       SUM(p1.amount) AS new_customer_total,
       (SELECT number_of_customers
          FROM customers_by_month_table c
         WHERE c.year_month = p1.year_month) AS number_of_customers,
       (SELECT total
          FROM customers_by_month_table c
         WHERE c.year_month = p1.year_month) AS total
  FROM payment_with_year_month_table p1
 WHERE p1.customerNumber NOT IN (SELECT customerNumber
                                   FROM payment_with_year_month_table p2
                                  WHERE p2.year_month < p1.year_month)
 GROUP BY p1.year_month
)

SELECT year_month,
       ROUND(number_of_new_customers * 100 / number_of_customers, 1) AS number_of_new_customers_props,
       ROUND(new_customer_total * 100 / total, 1) AS new_customers_total_props
  FROM new_customers_by_month_table;

Output (sample):

year_monthnumber_of_new_customers_propsnew_customers_total_props
200301100.0100.0
20030775.068.3
20040512.017.3
20040710.06.5
20040933.056.4

The share of new customers falls steadily through 2003 and 2004, and 2005 brings no new customers at all. The store has stopped attracting newcomers — so spending on acquisition makes sense.

Customer Lifetime Value

To decide how much to spend, compute customer lifetime value (LTV) — the average profit a customer generates. Reuse the profit-per-customer logic as a CTE and average it:

WITH
profit_table AS (
SELECT o.customerNumber,
       SUM(quantityOrdered * (priceEach - buyPrice)) AS profit
  FROM products p
  JOIN orderdetails od
    ON p.productCode = od.productCode
  JOIN orders o
    ON o.orderNumber = od.orderNumber
 GROUP BY o.customerNumber
)

SELECT AVG(profit) AS ltv
  FROM profit_table;

Output:

ltv
39039.594388

Answer: An average customer is worth about 39,000 in profit. So ten new customers would generate roughly 390,000 — a clear basis for setting an acquisition budget.


Practice Exercises

Extend the analysis on your own.

Exercise 1: Restock by Product Line

Group the priority-restock products by productLine to see which category most needs attention. Join your product-code results to products and group by productLine.

Hint

Wrap the Question 1 final query as a CTE, join it to products on productCode, then GROUP BY productLine with a COUNT(*).

Exercise 2: Profit by Country

Adapt the profit-per-customer query to compute total profit per country instead of per customer, ordered from highest to lowest.

Hint

Join the profit CTE to customers, then GROUP BY c.country and SUM(pt.profit).

Exercise 3: Save an Analysis as a View

Turn the profit-per-customer query into a view named customer_profit, then query it to confirm it returns the same results. (Use PostgreSQL syntax.)

Hint

CREATE VIEW customer_profit AS SELECT o.customerNumber, SUM(...) AS profit FROM ... GROUP BY o.customerNumber; then SELECT * FROM customer_profit ORDER BY profit DESC;.


Summary

You analyzed a real sales database from question to conclusion. Correlated subqueries computed low-stock rates, CTEs combined inventory metrics and ranked customers by profit, and a final CTE produced customer lifetime value. Along the way you saw how subqueries, CTEs, and views work together on a genuine business problem.

Key Concepts

  • Database profiling — summarizing every table’s size with UNION ALL before analysis.
  • Correlated subquery for ratios — looking up a per-row value (like stock) to compute a metric.
  • CTEs for multi-step analysis — naming intermediate results to keep a long analysis readable.
  • Customer lifetime value (LTV) — average profit per customer, used to set an acquisition budget.
  • ProfitquantityOrdered * (priceEach - buyPrice), the core measure of customer and product value.

Why This Matters

This is what SQL work actually looks like on the job: a vague business question, an unfamiliar schema, and a toolkit of subqueries, CTEs, and views to bridge the two. The workflow you practiced — profile the data, break each question into sub-problems, build the query step by step, and translate the numbers into a recommendation — is exactly how data analysts earn their keep. A project like this, written up clearly, is also one of the strongest pieces you can put in a portfolio.


Next Steps

You have completed the Subqueries, CTEs, and Views module. The natural next step is window functions — a feature that lets you compute running totals, rankings, and moving averages across rows without collapsing them, complementing everything you have just learned.

Continue to Window Functions

Compute running totals, rankings, and moving averages across rows with SQL window functions

Back to Module Overview

Return to the Subqueries, CTEs, and Views module overview


Continue Building Your Skills

You did more than learn syntax in this module — you learned to compose SQL, layering queries into logic that answers real questions. Polish this project, write up the story behind your findings, and add it to your portfolio. The ability to turn a database into a decision is exactly what employers are looking for, and you now have it.