Lesson 6 - Guided Project: Customer and Product Analysis
Welcome to the Guided Project
You have spent five lessons building a complete toolkit: scalar, multi-row, nested, and correlated subqueries, common table expressions, and views. Now you will use all of it on a single, realistic dataset to answer the kinds of questions a real business actually asks.
You play the role of a data analyst for a store that sells scale model cars. Sales analysis turns raw transaction data into key performance indicators that guide smarter decisions — saving time, money, and inventory. Your job is to mine the sales database and produce recommendations leadership can act on.
By the end of this lesson, you will be able to:
- Explore an unfamiliar relational database and understand how its tables link
- Use correlated subqueries and CTEs to compute inventory and performance metrics
- Rank customers by profitability to guide marketing strategy
- Estimate customer lifetime value to inform an acquisition budget
- Tell a clear data story from question to conclusion
Let’s meet the database and the three questions that drive the project.
Data for this lesson
Database: stores.db — a scale-model car retailer (SQLite).
Tables used: customers, products, employees, offices, orders, orderdetails, payments, productlines
Open it in any SQLite client — DB Browser for SQLite, the sqlite3 CLI, or Python’s sqlite3 — to run every query yourself.
The Project: Three Business Questions
Good analysis starts with good questions. Leadership wants answers to three:
- Which products should we order more of or less of? This is an inventory question about low stock and product performance — keeping best-sellers from going out of stock.
- How should we tailor marketing to customer behavior? This means identifying the most valuable (VIP) customers and the least engaged ones.
- How much can we spend on acquiring new customers? This requires estimating how much profit an average customer generates over their lifetime.
You will answer each in turn, building on the SQL patterns from earlier lessons. Create a file named project.sql to collect your queries, and add comments describing each table as you explore — that file becomes a portfolio piece.
The Scale Model Cars Database
The database has eight tables:
- customers — customer details
- employees — employee information
- offices — sales office locations
- orders — customers’ sales orders
- orderdetails — line items for each order
- payments — customer payment records
- products — the catalog of scale model cars
- productlines — product category groupings
You can preview any table with a simple query before diving in:
SELECT *
FROM products
LIMIT 5;Profiling the Whole Database
A great first step is to summarize every table at once — its name, its number of columns, and its number of rows — using UNION ALL to stack one summary row per table:
SELECT 'Customers' AS table_name,
13 AS number_of_attributes,
COUNT(*) AS number_of_rows
FROM customers
UNION ALL
SELECT 'Products', 9, COUNT(*) FROM products
UNION ALL
SELECT 'ProductLines', 4, COUNT(*) FROM productlines
UNION ALL
SELECT 'Orders', 7, COUNT(*) FROM orders
UNION ALL
SELECT 'OrderDetails', 5, COUNT(*) FROM orderdetails
UNION ALL
SELECT 'Payments', 4, COUNT(*) FROM payments
UNION ALL
SELECT 'Employees', 8, COUNT(*) FROM employees
UNION ALL
SELECT 'Offices', 9, COUNT(*) FROM offices;Output:
| table_name | number_of_attributes | number_of_rows |
|---|---|---|
| Customers | 13 | 122 |
| Products | 9 | 110 |
| ProductLines | 4 | 7 |
| Orders | 7 | 326 |
| OrderDetails | 5 | 2996 |
| Payments | 4 | 273 |
| Employees | 8 | 23 |
| Offices | 9 | 7 |
Now you know the shape of the data: 110 products, 122 customers, and nearly 3,000 order lines to analyze.
Question 1: Which Products to Restock?
This question is about inventory. Two metrics matter, and we define them precisely.
Low stock measures how quickly a product sells relative to how much is on hand — total quantity ordered divided by quantity in stock:
Product performance measures how much revenue a product generates:
Priority restocking products are those with high performance that are also nearly out of stock (high low-stock score).
Step 1: Low Stock With a Correlated Subquery
The low-stock metric needs quantityInStock from products and SUM(quantityOrdered) from orderdetails. A correlated subquery fetches the matching stock figure per product:
SELECT productCode,
ROUND(SUM(quantityOrdered) * 1.0 /
(SELECT quantityInStock
FROM products p
WHERE p.productCode = od.productCode), 2) AS low_stock
FROM orderdetails od
GROUP BY productCode
ORDER BY low_stock DESC
LIMIT 10;For each product, the correlated subquery looks up that product’s stock, and the outer query divides total quantity ordered by it. The top ten are the products closest to selling out.
Step 2: Product Performance
Performance is a straightforward aggregation per product:
SELECT productCode,
SUM(quantityOrdered * priceEach) AS product_performance
FROM orderdetails
GROUP BY productCode
ORDER BY product_performance DESC
LIMIT 10;Step 3: Combine With a CTE
Now use two CTEs — one for low stock, one for performance — and the IN operator to keep only the high-performance products that also rank in the top ten for low stock:
WITH
low_stock_table AS (
SELECT productCode,
ROUND(SUM(quantityOrdered) * 1.0 /
(SELECT quantityInStock
FROM products p
WHERE p.productCode = od.productCode), 2) AS low_stock
FROM orderdetails od
GROUP BY productCode
ORDER BY low_stock DESC
LIMIT 10
)
SELECT productCode,
SUM(quantityOrdered * priceEach) AS product_performance
FROM orderdetails
WHERE productCode IN (SELECT productCode FROM low_stock_table)
GROUP BY productCode
ORDER BY product_performance DESC
LIMIT 10;Joining these product codes back to products reveals the priority restock list:
| productName | productLine |
|---|---|
| 1968 Ford Mustang | Classic Cars |
| 1928 Mercedes-Benz SSK | Vintage Cars |
| 1997 BMW F650 ST | Motorcycles |
| F/A 18 Hornet 1/72 | Planes |
| 2002 Yamaha YZR M1 | Motorcycles |
| The Mayflower | Ships |
| 1960 BSA Gold Star DBD34 | Motorcycles |
| 1928 Ford Phaeton Deluxe | Vintage Cars |
| Pont Yacht | Ships |
| 1911 Ford Town Car | Vintage Cars |
Answer: Vintage cars and motorcycles dominate the restock priorities — they sell frequently and generate strong revenue.
Question 2: VIP and Less-Engaged Customers
To tailor marketing, you first need each customer’s profitability. Profit per line item is quantityOrdered * (priceEach - buyPrice), summed per customer. That requires joining products, orders, and orderdetails.
Step 1: Profit Per Customer
SELECT o.customerNumber,
SUM(quantityOrdered * (priceEach - buyPrice)) AS profit
FROM products p
JOIN orderdetails od
ON p.productCode = od.productCode
JOIN orders o
ON o.orderNumber = od.orderNumber
GROUP BY o.customerNumber;Step 2: The Top Five VIP Customers
Wrap the profit query in a CTE, join it to customers for names and location, and take the top five:
WITH
profit_table AS (
SELECT o.customerNumber,
SUM(quantityOrdered * (priceEach - buyPrice)) AS profit
FROM products p
JOIN orderdetails od
ON p.productCode = od.productCode
JOIN orders o
ON o.orderNumber = od.orderNumber
GROUP BY o.customerNumber
)
SELECT c.contactLastName, c.contactFirstName, c.city, c.country, pt.profit
FROM customers c
JOIN profit_table pt
ON c.customerNumber = pt.customerNumber
ORDER BY pt.profit DESC
LIMIT 5;Output:
| contactLastName | contactFirstName | city | country | profit |
|---|---|---|---|---|
| Freyre | Diego | Madrid | Spain | 326519.66 |
| Nelson | Susan | San Rafael | USA | 236769.39 |
| Young | Jeff | NYC | USA | 72370.09 |
| Ferguson | Peter | Melbourne | Australia | 70311.07 |
| Labrune | Janine | Nantes | France | 60875.30 |
Step 3: The Five Least-Engaged Customers
The same query with ORDER BY pt.profit ASC surfaces the customers who bring in the least:
| contactLastName | contactFirstName | city | country | profit |
|---|---|---|---|---|
| Young | Mary | Glendale | USA | 2610.87 |
| Taylor | Leslie | Brickhaven | USA | 6586.02 |
| Ricotti | Franco | Milan | Italy | 9532.93 |
| Schmitt | Carine | Nantes | France | 10063.80 |
| Smith | Thomas | London | UK | 10868.04 |
Answer: Reward the VIPs (such as Diego Freyre, who alone contributes over 326,000 in profit) with loyalty events, and run re-engagement campaigns aimed at the least-engaged group.
Question 3: How Much to Spend Acquiring Customers?
Before deciding to spend on acquisition, it helps to confirm new customers have slowed — and then estimate what a customer is worth.
Are New Customers Slowing Down?
This query uses CTEs and correlated subqueries to compute, per month, the share of customers who are brand new and the share of revenue they bring:
WITH
payment_with_year_month_table AS (
SELECT *,
CAST(SUBSTR(paymentDate, 1, 4) AS INTEGER) * 100
+ CAST(SUBSTR(paymentDate, 6, 7) AS INTEGER) AS year_month
FROM payments
),
customers_by_month_table AS (
SELECT p1.year_month,
COUNT(*) AS number_of_customers,
SUM(p1.amount) AS total
FROM payment_with_year_month_table p1
GROUP BY p1.year_month
),
new_customers_by_month_table AS (
SELECT p1.year_month,
COUNT(DISTINCT customerNumber) AS number_of_new_customers,
SUM(p1.amount) AS new_customer_total,
(SELECT number_of_customers
FROM customers_by_month_table c
WHERE c.year_month = p1.year_month) AS number_of_customers,
(SELECT total
FROM customers_by_month_table c
WHERE c.year_month = p1.year_month) AS total
FROM payment_with_year_month_table p1
WHERE p1.customerNumber NOT IN (SELECT customerNumber
FROM payment_with_year_month_table p2
WHERE p2.year_month < p1.year_month)
GROUP BY p1.year_month
)
SELECT year_month,
ROUND(number_of_new_customers * 100 / number_of_customers, 1) AS number_of_new_customers_props,
ROUND(new_customer_total * 100 / total, 1) AS new_customers_total_props
FROM new_customers_by_month_table;Output (sample):
| year_month | number_of_new_customers_props | new_customers_total_props |
|---|---|---|
| 200301 | 100.0 | 100.0 |
| 200307 | 75.0 | 68.3 |
| 200405 | 12.0 | 17.3 |
| 200407 | 10.0 | 6.5 |
| 200409 | 33.0 | 56.4 |
The share of new customers falls steadily through 2003 and 2004, and 2005 brings no new customers at all. The store has stopped attracting newcomers — so spending on acquisition makes sense.
Customer Lifetime Value
To decide how much to spend, compute customer lifetime value (LTV) — the average profit a customer generates. Reuse the profit-per-customer logic as a CTE and average it:
WITH
profit_table AS (
SELECT o.customerNumber,
SUM(quantityOrdered * (priceEach - buyPrice)) AS profit
FROM products p
JOIN orderdetails od
ON p.productCode = od.productCode
JOIN orders o
ON o.orderNumber = od.orderNumber
GROUP BY o.customerNumber
)
SELECT AVG(profit) AS ltv
FROM profit_table;Output:
| ltv |
|---|
| 39039.594388 |
Answer: An average customer is worth about 39,000 in profit. So ten new customers would generate roughly 390,000 — a clear basis for setting an acquisition budget.
Practice Exercises
Extend the analysis on your own.
Exercise 1: Restock by Product Line
Group the priority-restock products by productLine to see which category most needs attention. Join your product-code results to products and group by productLine.
Hint
Wrap the Question 1 final query as a CTE, join it to products on productCode, then GROUP BY productLine with a COUNT(*).
Exercise 2: Profit by Country
Adapt the profit-per-customer query to compute total profit per country instead of per customer, ordered from highest to lowest.
Hint
Join the profit CTE to customers, then GROUP BY c.country and SUM(pt.profit).
Exercise 3: Save an Analysis as a View
Turn the profit-per-customer query into a view named customer_profit, then query it to confirm it returns the same results. (Use PostgreSQL syntax.)
Hint
CREATE VIEW customer_profit AS SELECT o.customerNumber, SUM(...) AS profit FROM ... GROUP BY o.customerNumber; then SELECT * FROM customer_profit ORDER BY profit DESC;.
Summary
You analyzed a real sales database from question to conclusion. Correlated subqueries computed low-stock rates, CTEs combined inventory metrics and ranked customers by profit, and a final CTE produced customer lifetime value. Along the way you saw how subqueries, CTEs, and views work together on a genuine business problem.
Key Concepts
- Database profiling — summarizing every table’s size with
UNION ALLbefore analysis. - Correlated subquery for ratios — looking up a per-row value (like stock) to compute a metric.
- CTEs for multi-step analysis — naming intermediate results to keep a long analysis readable.
- Customer lifetime value (LTV) — average profit per customer, used to set an acquisition budget.
- Profit —
quantityOrdered * (priceEach - buyPrice), the core measure of customer and product value.
Why This Matters
This is what SQL work actually looks like on the job: a vague business question, an unfamiliar schema, and a toolkit of subqueries, CTEs, and views to bridge the two. The workflow you practiced — profile the data, break each question into sub-problems, build the query step by step, and translate the numbers into a recommendation — is exactly how data analysts earn their keep. A project like this, written up clearly, is also one of the strongest pieces you can put in a portfolio.
Next Steps
You have completed the Subqueries, CTEs, and Views module. The natural next step is window functions — a feature that lets you compute running totals, rankings, and moving averages across rows without collapsing them, complementing everything you have just learned.
Continue to Window Functions
Compute running totals, rankings, and moving averages across rows with SQL window functions
Back to Module Overview
Return to the Subqueries, CTEs, and Views module overview
Continue Building Your Skills
You did more than learn syntax in this module — you learned to compose SQL, layering queries into logic that answers real questions. Polish this project, write up the story behind your findings, and add it to your portfolio. The ability to turn a database into a decision is exactly what employers are looking for, and you now have it.