Lesson 7 - Guided Project: Analyzing Kickstarter Campaigns
Welcome to the Guided Project
You have spent six lessons building real SQL skills — selecting, transforming, filtering, sorting, and labeling data. Now you will put them all to work in a single, connected analysis. This is a guided project: you will build one query step by step, layering on a new technique each time, until you have a polished analysis that answers a genuine business question.
In this project you are a data analyst at a startup. The product team is thinking about launching a Kickstarter campaign and wants to understand what makes campaigns succeed or fail before they commit. You will query a real dataset (a selection of fields from a Kaggle Kickstarter dataset) to help them learn from past campaigns.
By the end of this project, you will be able to:
- Explore an unfamiliar table’s schema with
PRAGMA table_info() - Select the columns relevant to a question
- Filter records by category and quantity with
WHERE,IN, andAND - Sort results by multiple fields, including a calculated field
- Apply
CASElogic to bucket campaigns by funding status
You should have completed Lessons 1 through 6. Let’s begin.
Data for this lesson
Dataset: Kickstarter campaigns — a selection of fields from the Kaggle Kickstarter Projects dataset.
Tables used: ksprojects
Import the CSV into any SQLite client (or query it with Python’s pandas/sqlite3) to follow along.
The Problem and the Data
The product team wants answers to two questions:
- What types of projects are most likely to be successful?
- Which projects fail, and why?
The database has a single table, ksprojects. Here are its columns:
ID— Kickstarter project IDname— name of the projectcategory— category of the projectmain_category— broad category of the projectgoal— fundraising goalpledged— amount actually pledgedstate— outcome (successful, failed, canceled, suspended, etc.)backers— number of people who backed the project
Step 1: Explore the Schema
The very first thing to do with any new table is find out what is inside it. As you learned in Lesson 2, PRAGMA table_info() reveals a table’s columns and their data types:
-- Get the name and data type of every field in the ksprojects table
PRAGMA table_info(ksprojects);| cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|
| 0 | ID | INTEGER | 0 | NULL | 0 |
| 1 | name | TEXT | 0 | NULL | 0 |
| 2 | category | TEXT | 0 | NULL | 0 |
| 3 | main_category | TEXT | 0 | NULL | 0 |
| 4 | goal | REAL | 0 | NULL | 0 |
| 5 | pledged | REAL | 0 | NULL | 0 |
| 6 | state | TEXT | 0 | NULL | 0 |
| 7 | backers | INTEGER | 0 | NULL | 0 |
Now you know the shape of the data: goal and pledged are REAL numbers, backers is an INTEGER, and state, category, and main_category are TEXT. Notice the comment at the top of the query — documenting your intent from the start is a habit worth keeping, especially in a multi-step analysis.
Selecting and Filtering the Data
With the schema understood, you can start narrowing the data toward the team’s question.
Step 2: Select the Relevant Columns
Even a one-table database has more data than you need. To assess a project’s result by its main category, goal, backers, and amount pledged, select just those four columns and preview the first ten rows:
SELECT main_category, goal, backers, pledged
FROM ksprojects
LIMIT 10;This trims away the noise (ID, name, narrow category) and leaves only the fields relevant to your analysis. Limiting to 10 rows keeps the preview fast and readable while you build the query.
Step 3: Filter to Unsuccessful Campaigns
Thomas Edison said, “I have not failed. I’ve just found 10,000 ways that won’t work.” In that spirit, the product team wants to learn from campaigns that did not succeed. The state field records outcomes, so use IN to keep only the unsuccessful ones:
SELECT main_category, backers, pledged, goal
FROM ksprojects
WHERE state IN ('failed', 'canceled', 'suspended')
LIMIT 10;The IN operator (Lesson 3 and 4) is the clean way to match several text values at once — far tidier than chaining OR conditions. Now you are looking only at campaigns that failed, were canceled, or were suspended.
Step 4: Filter by Size
The database contains many tiny projects that are not relevant to a serious product launch. The team only cares about campaigns of real scale, so add quantity filters with AND: at least 100 backers and at least $20,000 pledged.
SELECT main_category, backers, pledged, goal
FROM ksprojects
WHERE state IN ('failed', 'canceled', 'suspended')
AND backers >= 100
AND pledged >= 20000
LIMIT 10;Each AND makes the filter more selective (Lesson 3). Now you are studying only substantial campaigns that, despite attracting real money and real backers, still did not succeed — a much more interesting group than abandoned $50 projects.
Sorting and Labeling the Results
You have the right rows. Now make them tell a story by sorting and labeling.
Step 5: Sort by Funding Percentage
A raw count of backers is interesting, but the metric that truly defines a Kickstarter outcome is how much of the goal was funded. Create a calculated field pct_pledged (pledged divided by goal), and sort by main category ascending, then by funding percentage descending. Narrow the focus to projects that specifically failed:
SELECT main_category, backers, pledged, goal,
pledged / goal AS pct_pledged
FROM ksprojects
WHERE state IN ('failed')
AND backers >= 100
AND pledged >= 20000
ORDER BY main_category, pct_pledged DESC
LIMIT 10;Two things to notice. First, pct_pledged is created as an alias in SELECT and then reused in ORDER BY — which is allowed, because ORDER BY runs after SELECT (Lesson 5). Second, sorting by main_category then pct_pledged DESC groups the failures by type and surfaces, within each type, the ones that came closest to their goal.
Step 6: Bucket Campaigns with CASE
Sorting by percentage is useful, but human-readable labels are even better for the product team. Use a CASE expression (Lesson 6) to classify each campaign’s funding_status:
- 100% or more of the goal pledged: “Fully funded”
- Between 75% and 100%: “Nearly funded”
- Less than 75%: “Not nearly funded”
SELECT main_category, backers, pledged, goal,
pledged / goal AS pct_pledged,
CASE
WHEN pledged / goal >= 1 THEN 'Fully funded'
WHEN pledged / goal BETWEEN 0.75 AND 1 THEN 'Nearly funded'
ELSE 'Not nearly funded'
END AS funding_status
FROM ksprojects
WHERE state IN ('failed')
AND backers >= 100
AND pledged >= 20000
ORDER BY main_category, pct_pledged DESC
LIMIT 10;Remember the rule from Lesson 3: you cannot reference the pct_pledged alias inside the CASE expression, because CASE is evaluated as part of SELECT — so you repeat the raw pledged / goal calculation each time.
This final query is the whole module in one place: schema knowledge, column selection, calculated fields, multi-condition filtering, multi-field sorting, and conditional labeling. A useful observation from results like these: many “failed” campaigns are not failing for lack of interest — plenty attract 100+ backers and tens of thousands of dollars, yet still fall short of an ambitious goal. That insight alone could reshape how the product team sets its target.
Practice Exercises
Extend the analysis on your own using the ksprojects table.
Exercise 1: Successful Campaigns
Adapt the Step 5 query to study successful campaigns instead of failed ones, keeping the same size filters and sort order.
Hint
Change the WHERE clause to state IN ('successful'). Everything else can stay the same.
Exercise 2: Focus on One Category
Pick a main category you would personally back (for example, ‘Games’ or ‘Technology’) and write a query returning name, goal, pledged, and backers for failed campaigns in just that category, sorted by pledged descending.
Hint
Add AND main_category = 'Games' to the WHERE clause and finish with ORDER BY pledged DESC.
Exercise 3: Explore Freely
Write your own query to investigate something that interests you — perhaps which categories have the most backers, or how canceled campaigns differ from failed ones. Follow the style conventions from this module: capitalized keywords, one clause per line, and a clean river.
Hint
Combine the tools you know: SELECT your columns, WHERE to filter, ORDER BY to sort, and CASE if you want to label groups. Add a comment explaining what your query is for.
Summary
You completed a full, multi-step SQL analysis from scratch. Starting with nothing but an unfamiliar table, you explored its schema, selected relevant columns, filtered by category and size, derived and sorted by a funding-percentage field, and labeled each campaign with a CASE expression — producing a clean result the product team can act on.
Key Concepts
- PRAGMA table_info() — your first move on any new table, revealing columns and data types.
- Layered queries — real analysis is built incrementally, adding one clause at a time.
- IN and AND filtering — combine categorical and quantitative conditions to isolate the records that matter.
- Calculated fields in sorting — derive a field like
pledged / goaland sort by it. - CASE for labeling — turn a numeric ratio into business-friendly funding-status categories; repeat raw calculations inside
CASE. - Clean style — comments, capitalized keywords, and a river make a complex query readable and reusable.
Why This Matters
This project mirrors how analysts actually work. You rarely receive a perfectly shaped dataset and a single query to run. Instead, you start with a vague question and an unfamiliar table, and you build understanding one query at a time — exploring, filtering, calculating, sorting, and labeling until the data answers the question. Being able to do this end to end, with clean and well-documented SQL, is the core skill that makes you genuinely useful with data. You now have that skill, and a real analysis to prove it.
Next Steps
Continue to Summarizing Data with SQL
Move from individual rows to the big picture with aggregate functions like COUNT, SUM, and AVG
Back to Module Overview
Return to the Getting Started with SQL module overview
Continue Building Your Skills
Congratulations — you have finished Getting Started with SQL and completed a real analysis along the way. You can now explore, filter, sort, transform, and label data with confidence. The natural next step is to move from looking at individual rows to summarizing them: counting, totaling, and averaging across whole groups. That is exactly where the next module picks up. Keep querying, keep exploring, and keep your code clean.