Lesson 7 - Guided Project: Analyzing Kickstarter Campaigns

Welcome to the Guided Project

You have spent six lessons building real SQL skills — selecting, transforming, filtering, sorting, and labeling data. Now you will put them all to work in a single, connected analysis. This is a guided project: you will build one query step by step, layering on a new technique each time, until you have a polished analysis that answers a genuine business question.

In this project you are a data analyst at a startup. The product team is thinking about launching a Kickstarter campaign and wants to understand what makes campaigns succeed or fail before they commit. You will query a real dataset (a selection of fields from a Kaggle Kickstarter dataset) to help them learn from past campaigns.

By the end of this project, you will be able to:

  • Explore an unfamiliar table’s schema with PRAGMA table_info()
  • Select the columns relevant to a question
  • Filter records by category and quantity with WHERE, IN, and AND
  • Sort results by multiple fields, including a calculated field
  • Apply CASE logic to bucket campaigns by funding status

You should have completed Lessons 1 through 6. Let’s begin.

Data for this lesson

Dataset: Kickstarter campaigns — a selection of fields from the Kaggle Kickstarter Projects dataset.

Tables used: ksprojects

Import the CSV into any SQLite client (or query it with Python’s pandas/sqlite3) to follow along.


The Problem and the Data

The product team wants answers to two questions:

  • What types of projects are most likely to be successful?
  • Which projects fail, and why?

The database has a single table, ksprojects. Here are its columns:

  • ID — Kickstarter project ID
  • name — name of the project
  • category — category of the project
  • main_category — broad category of the project
  • goal — fundraising goal
  • pledged — amount actually pledged
  • state — outcome (successful, failed, canceled, suspended, etc.)
  • backers — number of people who backed the project

Step 1: Explore the Schema

The very first thing to do with any new table is find out what is inside it. As you learned in Lesson 2, PRAGMA table_info() reveals a table’s columns and their data types:

-- Get the name and data type of every field in the ksprojects table
PRAGMA table_info(ksprojects);
cidnametypenotnulldflt_valuepk
0IDINTEGER0NULL0
1nameTEXT0NULL0
2categoryTEXT0NULL0
3main_categoryTEXT0NULL0
4goalREAL0NULL0
5pledgedREAL0NULL0
6stateTEXT0NULL0
7backersINTEGER0NULL0

Now you know the shape of the data: goal and pledged are REAL numbers, backers is an INTEGER, and state, category, and main_category are TEXT. Notice the comment at the top of the query — documenting your intent from the start is a habit worth keeping, especially in a multi-step analysis.


Selecting and Filtering the Data

With the schema understood, you can start narrowing the data toward the team’s question.

Step 2: Select the Relevant Columns

Even a one-table database has more data than you need. To assess a project’s result by its main category, goal, backers, and amount pledged, select just those four columns and preview the first ten rows:

SELECT main_category, goal, backers, pledged
  FROM ksprojects
 LIMIT 10;

This trims away the noise (ID, name, narrow category) and leaves only the fields relevant to your analysis. Limiting to 10 rows keeps the preview fast and readable while you build the query.

Step 3: Filter to Unsuccessful Campaigns

Thomas Edison said, “I have not failed. I’ve just found 10,000 ways that won’t work.” In that spirit, the product team wants to learn from campaigns that did not succeed. The state field records outcomes, so use IN to keep only the unsuccessful ones:

SELECT main_category, backers, pledged, goal
  FROM ksprojects
 WHERE state IN ('failed', 'canceled', 'suspended')
 LIMIT 10;

The IN operator (Lesson 3 and 4) is the clean way to match several text values at once — far tidier than chaining OR conditions. Now you are looking only at campaigns that failed, were canceled, or were suspended.

Step 4: Filter by Size

The database contains many tiny projects that are not relevant to a serious product launch. The team only cares about campaigns of real scale, so add quantity filters with AND: at least 100 backers and at least $20,000 pledged.

SELECT main_category, backers, pledged, goal
  FROM ksprojects
 WHERE state IN ('failed', 'canceled', 'suspended')
   AND backers >= 100
   AND pledged >= 20000
 LIMIT 10;

Each AND makes the filter more selective (Lesson 3). Now you are studying only substantial campaigns that, despite attracting real money and real backers, still did not succeed — a much more interesting group than abandoned $50 projects.


Sorting and Labeling the Results

You have the right rows. Now make them tell a story by sorting and labeling.

Step 5: Sort by Funding Percentage

A raw count of backers is interesting, but the metric that truly defines a Kickstarter outcome is how much of the goal was funded. Create a calculated field pct_pledged (pledged divided by goal), and sort by main category ascending, then by funding percentage descending. Narrow the focus to projects that specifically failed:

SELECT main_category, backers, pledged, goal,
       pledged / goal AS pct_pledged
    FROM ksprojects
   WHERE state IN ('failed')
     AND backers >= 100
     AND pledged >= 20000
ORDER BY main_category, pct_pledged DESC
   LIMIT 10;

Two things to notice. First, pct_pledged is created as an alias in SELECT and then reused in ORDER BY — which is allowed, because ORDER BY runs after SELECT (Lesson 5). Second, sorting by main_category then pct_pledged DESC groups the failures by type and surfaces, within each type, the ones that came closest to their goal.

Step 6: Bucket Campaigns with CASE

Sorting by percentage is useful, but human-readable labels are even better for the product team. Use a CASE expression (Lesson 6) to classify each campaign’s funding_status:

  • 100% or more of the goal pledged: “Fully funded”
  • Between 75% and 100%: “Nearly funded”
  • Less than 75%: “Not nearly funded”
SELECT main_category, backers, pledged, goal,
       pledged / goal AS pct_pledged,
       CASE
           WHEN pledged / goal >= 1                THEN 'Fully funded'
           WHEN pledged / goal BETWEEN 0.75 AND 1  THEN 'Nearly funded'
           ELSE 'Not nearly funded'
       END AS funding_status
    FROM ksprojects
   WHERE state IN ('failed')
     AND backers >= 100
     AND pledged >= 20000
ORDER BY main_category, pct_pledged DESC
   LIMIT 10;

Remember the rule from Lesson 3: you cannot reference the pct_pledged alias inside the CASE expression, because CASE is evaluated as part of SELECT — so you repeat the raw pledged / goal calculation each time.

This final query is the whole module in one place: schema knowledge, column selection, calculated fields, multi-condition filtering, multi-field sorting, and conditional labeling. A useful observation from results like these: many “failed” campaigns are not failing for lack of interest — plenty attract 100+ backers and tens of thousands of dollars, yet still fall short of an ambitious goal. That insight alone could reshape how the product team sets its target.


Practice Exercises

Extend the analysis on your own using the ksprojects table.

Exercise 1: Successful Campaigns

Adapt the Step 5 query to study successful campaigns instead of failed ones, keeping the same size filters and sort order.

Hint

Change the WHERE clause to state IN ('successful'). Everything else can stay the same.

Exercise 2: Focus on One Category

Pick a main category you would personally back (for example, ‘Games’ or ‘Technology’) and write a query returning name, goal, pledged, and backers for failed campaigns in just that category, sorted by pledged descending.

Hint

Add AND main_category = 'Games' to the WHERE clause and finish with ORDER BY pledged DESC.

Exercise 3: Explore Freely

Write your own query to investigate something that interests you — perhaps which categories have the most backers, or how canceled campaigns differ from failed ones. Follow the style conventions from this module: capitalized keywords, one clause per line, and a clean river.

Hint

Combine the tools you know: SELECT your columns, WHERE to filter, ORDER BY to sort, and CASE if you want to label groups. Add a comment explaining what your query is for.


Summary

You completed a full, multi-step SQL analysis from scratch. Starting with nothing but an unfamiliar table, you explored its schema, selected relevant columns, filtered by category and size, derived and sorted by a funding-percentage field, and labeled each campaign with a CASE expression — producing a clean result the product team can act on.

Key Concepts

  • PRAGMA table_info() — your first move on any new table, revealing columns and data types.
  • Layered queries — real analysis is built incrementally, adding one clause at a time.
  • IN and AND filtering — combine categorical and quantitative conditions to isolate the records that matter.
  • Calculated fields in sorting — derive a field like pledged / goal and sort by it.
  • CASE for labeling — turn a numeric ratio into business-friendly funding-status categories; repeat raw calculations inside CASE.
  • Clean style — comments, capitalized keywords, and a river make a complex query readable and reusable.

Why This Matters

This project mirrors how analysts actually work. You rarely receive a perfectly shaped dataset and a single query to run. Instead, you start with a vague question and an unfamiliar table, and you build understanding one query at a time — exploring, filtering, calculating, sorting, and labeling until the data answers the question. Being able to do this end to end, with clean and well-documented SQL, is the core skill that makes you genuinely useful with data. You now have that skill, and a real analysis to prove it.


Next Steps

Continue to Summarizing Data with SQL

Move from individual rows to the big picture with aggregate functions like COUNT, SUM, and AVG

Back to Module Overview

Return to the Getting Started with SQL module overview


Continue Building Your Skills

Congratulations — you have finished Getting Started with SQL and completed a real analysis along the way. You can now explore, filter, sort, transform, and label data with confidence. The natural next step is to move from looking at individual rows to summarizing them: counting, totaling, and averaging across whole groups. That is exactly where the next module picks up. Keep querying, keep exploring, and keep your code clean.