Lesson 5 - Prompting for Data Tasks
Welcome to Prompting for Data Tasks
So far you’ve used prompts to produce text — a blurb, an answer, a rewrite. But the same skills turn a language model into something more useful: a data tool that reads messy, free-form text and gives you back clean, structured results you can store, count, and act on.
Three tasks cover most of this work: pulling fields out of text (extraction), sorting text into categories (classification), and shortening text under rules (summarization). In this lesson you’ll do all three on real model output, then loop one over a small batch — the shape of every real data pipeline.
By the end of this lesson, you will be able to:
- Extract structured fields from unstructured text using a tool/schema
- Classify text into a fixed set of labels with a controlled vocabulary
- Write constrained summaries that respect length, focus, and audience
- Run the same prompt over a batch of items and collect the results reliably
You’ll keep using the cheap claude-haiku-4-5 model. Let’s begin.
The Data We’ll Work With
Every example below runs against the same handful of short customer messages — the kind of free-form text a support inbox fills up with. We’ll write them inline so you can reproduce every result:
import anthropic
import json

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

reviews = [
    "The wireless earbuds keep disconnecting from my phone every few minutes. Really frustrating for the price.",
    "Love the new standing desk! Assembly took 20 minutes and it's rock solid. Worth every penny.",
    "Ordered the blender three weeks ago and it still hasn't shipped. No reply from support either.",
    "Coffee grinder works fine but it's much louder than I expected. Wakes up the whole house.",
]

Four messages, each a different shape: a defect, a happy customer, a shipping problem, and a mixed review. That variety is on purpose: it’s how you find out whether your prompt actually holds up.
Extraction: Pull Structured Fields from Messy Text
Extraction means turning a sentence a human wrote into fields a program can use. From the first review you’d want: which product, what issue, and the overall sentiment. You could ask for that in prose and parse the reply, but the reliable way — the one you met in Lesson 4 — is to define a tool with a schema and force the model to fill it in.
The schema is the contract. The model must return a product, an issue, and a sentiment that is one of three allowed values:
extract_tool = {
    "name": "record_review",
    "description": "Record the structured fields extracted from a single customer review.",
    "input_schema": {
        "type": "object",
        "properties": {
            "product": {"type": "string", "description": "The product the review is about."},
            "issue": {"type": "string", "description": "The main problem, or 'none' if positive."},
            "sentiment": {"type": "string", "enum": ["positive", "negative", "mixed"]},
        },
        "required": ["product", "issue", "sentiment"],
    },
}
def extract(review):
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=300,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "record_review"},  # force this tool
        messages=[{"role": "user", "content": f"Extract the fields from this review:\n\n{review}"}],
    )
    # The structured fields arrive as the input of the tool_use block.
    for block in resp.content:
        if block.type == "tool_use":
            return block.input
    return None  # no tool_use block in the response

print(json.dumps(extract(reviews[0]), indent=2))

{
  "product": "wireless earbuds",
  "issue": "keeps disconnecting from phone",
  "sentiment": "negative"
}

That’s a Python dictionary, not a paragraph you have to clean up. tool_choice={"type": "tool", "name": "record_review"} forces the model to call the tool, so you never get a chatty “Sure, here’s what I found…” reply; you get the fields. The enum on sentiment restricts the value to the three you expect, which matters the moment you try to count or filter on it.
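Because extract() as written returns nothing when the response has no tool_use block, it can help to check each record before storing it. The sketch below is one way to do that in plain Python. The helper name check_record and the two constants are hypothetical, introduced only for illustration; they mirror the rules in the record_review schema rather than any SDK feature.

```python
# Hypothetical helper, not part of the Anthropic SDK: it re-checks one
# extracted record against the rules stated in the record_review schema.
ALLOWED_SENTIMENTS = {"positive", "negative", "mixed"}
REQUIRED_FIELDS = ("product", "issue", "sentiment")

def check_record(fields):
    """Return a list of problems found in one record (empty list if clean)."""
    if not isinstance(fields, dict):
        return ["no tool_use block was returned"]
    problems = []
    for name in REQUIRED_FIELDS:
        value = fields.get(name)
        # Every required field must be a non-empty string.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {name}")
    sentiment = fields.get("sentiment")
    if isinstance(sentiment, str) and sentiment not in ALLOWED_SENTIMENTS:
        problems.append(f"unexpected sentiment: {sentiment!r}")
    return problems

good = {"product": "wireless earbuds", "issue": "keeps disconnecting from phone", "sentiment": "negative"}
bad = {"product": "blender", "sentiment": "angry"}  # made-up record for the demo

print(check_record(good))  # an empty list means the record is clean
print(check_record(bad))   # lists the missing issue and the off-set sentiment
```

Returning a list of problems instead of raising keeps the check easy to use inside a loop: an empty list means the row can be stored, anything else can be logged and skipped.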
The schema does the prompting
Notice how short the actual user message is — “Extract the fields from this review”. With a tool/schema, the field names and descriptions carry most of the instruction. A clear schema is worth more than a long prose prompt for extraction.
Classification: Force One Label from a Fixed Set
Classification sorts each message into a category. The trap is letting the model invent its own labels — one review comes back “connectivity issue”, the next “Bluetooth problem”, and now your data won’t group. The fix is a controlled vocabulary: you decide the labels up front and force the model to pick exactly one.
CATEGORIES = ["product_quality", "shipping_delay", "billing", "other"]

def classify(message):
    prompt = (
        "Classify the support message into exactly ONE of these categories:\n"
        f"{', '.join(CATEGORIES)}\n\n"
        "Reply with only the category name, nothing else. "
        "If none clearly fits, reply 'other'.\n\n"
        f"Message: {message}"
    )
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=10,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    label = resp.content[0].text.strip().lower()
    return label if label in CATEGORIES else "other"
for r in reviews:
    print(f"{classify(r):16} | {r[:55]}")

product_quality  | The wireless earbuds keep disconnecting from my phone e
other            | Love the new standing desk! Assembly took 20 minutes an
shipping_delay   | Ordered the blender three weeks ago and it still hasn't
product_quality  | Coffee grinder works fine but it's much louder than I e

Three deliberate choices make this reliable:
- temperature=0: for classification you want the same input to always produce the same label. A low temperature makes the model pick the most likely answer instead of varying its wording, so your results are consistent across runs.
- A small max_tokens (here 10): the answer is one word, so cap it. This also stops the model from drifting into an explanation.
- A validation line: return label if label in CATEGORIES else "other". Even with a tight prompt, never trust that the reply lands in your set. Check it in code and fall back to "other". This is your safety net.
Look at the happy standing-desk review: it landed in other. That’s correct given these labels — there’s no “praise” bucket. It’s a useful reminder that your label set defines what the model can say. If you need to catch positive feedback, that’s a category you have to add; the model can only choose from what you give it.
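The validation line in classify() only accepts an exact match after strip() and lower(). A slightly more forgiving variant is sketched below, on the assumption that a reply might occasionally arrive with quotes, a trailing period, or a space instead of an underscore. The function name normalize_label and the sample replies are hypothetical, made up for this illustration.

```python
CATEGORIES = ["product_quality", "shipping_delay", "billing", "other"]

def normalize_label(raw, categories=CATEGORIES, fallback="other"):
    """Map a raw model reply onto the fixed label set, or return the fallback."""
    # Lowercase, trim whitespace, then trim wrapping quotes and punctuation.
    cleaned = raw.strip().lower().strip("\"'`.,:;!")
    # Treat spaces and hyphens as underscores so "shipping delay" still matches.
    cleaned = cleaned.replace(" ", "_").replace("-", "_")
    return cleaned if cleaned in categories else fallback

# Made-up replies showing the cases the cleanup is meant to handle.
for reply in ["shipping_delay", "Shipping delay.", "'billing'", "Bluetooth problem"]:
    print(f"{reply!r:22} -> {normalize_label(reply)}")
```

The fallback still does the important work: anything that cannot be mapped onto the set, such as an invented label, ends up in "other" rather than in your data as a new category.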
Handling “none” and “unsure”
Always include an escape-hatch label like other or unsure in your set, and tell the model when to use it. Without one, a message that fits nothing gets crammed into the nearest category and quietly pollutes your data. A model that can say “I’m not sure” is more trustworthy than one forced to guess.
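One way to notice that an escape-hatch label is absorbing too much is to look at the share of each label after a run. The sketch below uses collections.Counter on the four labels printed earlier. The helper name label_shares and the 30% threshold are assumptions chosen for illustration, not values from this lesson.

```python
from collections import Counter

def label_shares(labels):
    """Return {label: fraction of items} for a list of category labels."""
    total = len(labels)
    if total == 0:
        return {}
    return {label: count / total for label, count in Counter(labels).items()}

# The four labels from the classification run shown earlier in this lesson.
labels = ["product_quality", "other", "shipping_delay", "product_quality"]
shares = label_shares(labels)

for label, share in sorted(shares.items()):
    print(f"{label:16} {share:.0%}")

# Hypothetical threshold: flag the label set for review if 'other' dominates.
if shares.get("other", 0) > 0.3:
    print("Many items fell into 'other'; consider adding a category.")
```

A growing "other" share usually says more about the label set than about the model: it is a hint that a category, such as praise, is missing.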
Summarization: Shorten Under Constraints
A raw “summarize this” gives you an unpredictable blob. For data work you want a summary with a fixed shape — a set length, a specific focus, and a known audience — so every summary in your pipeline looks the same. You supply those as constraints, exactly the way you sharpened prompts in earlier lessons.
def summarize(message):
    prompt = (
        "You are a support team lead. Summarize this customer message in ONE sentence "
        "for an engineer who will act on it. State the product and the problem only. "
        "No greeting, no preamble.\n\n"
        f"Message: {message}"
    )
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=80,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text.strip()
print(summarize(reviews[2]))

Blender order from three weeks ago has not shipped and customer support has not responded.

Every constraint earns its place. Length (“ONE sentence”, plus a small max_tokens) keeps it scannable. Focus (“the product and the problem only”) strips out everything an engineer doesn’t need. Audience (“for an engineer who will act on it”) sets the register: terse and factual, not a customer-friendly apology. And “No greeting, no preamble” kills the “Here’s a summary:” opener that would otherwise break a downstream pipeline. The result reads like a ticket title, which is exactly what a support lead would want.
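The length and no-preamble constraints can also be checked in code, in the same spirit as the validation line in classify(). The sketch below is a rough heuristic, not a full sentence parser: the helper name summary_problems, the 30-word cap, and the list of openers are all assumptions made for illustration.

```python
import re

# Hypothetical list of openers that suggest the model added a preamble.
PREAMBLE_OPENERS = ("here's", "here is", "summary:", "sure", "certainly")

def summary_problems(text, max_words=30):
    """Return a list of ways a summary breaks the prompt's constraints."""
    problems = []
    stripped = text.strip()
    # Crude sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    if len(sentences) != 1:
        problems.append(f"expected 1 sentence, found {len(sentences)}")
    if len(stripped.split()) > max_words:
        problems.append(f"longer than {max_words} words")
    if stripped.lower().startswith(PREAMBLE_OPENERS):
        problems.append("starts with a preamble")
    return problems

ok = "Blender order from three weeks ago has not shipped and customer support has not responded."
chatty = "Here's a summary: The blender has not shipped. Support is silent."  # made-up bad reply

print(summary_problems(ok))      # an empty list means every check passed
print(summary_problems(chatty))  # reports the extra sentence and the preamble
```

The split will miscount text containing abbreviations such as "e.g.", so treat a failure as a reason to look at the summary, not as proof that it is wrong.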
Putting It Together: Processing a Batch
One message at a time is a demo. Real work means running the same prompt over many items and collecting the results into one structured table. The pattern is a plain Python loop that builds a list of dictionaries — one row per item:
results = []
for r in reviews:
    fields = extract(r)               # product, issue, sentiment
    fields["category"] = classify(r)  # add the classification label
    results.append(fields)
for row in results:
    print(json.dumps(row))

{"product": "wireless earbuds", "issue": "keeps disconnecting from phone", "sentiment": "negative", "category": "product_quality"}
{"product": "standing desk", "issue": "none", "sentiment": "positive", "category": "other"}
{"product": "blender", "issue": "hasn't shipped after three weeks and no reply from support", "sentiment": "negative", "category": "shipping_delay"}
{"product": "Coffee grinder", "issue": "Louder than expected, wakes up the whole house", "sentiment": "mixed", "category": "product_quality"}

Four messy sentences are now four clean records. Because extract() returns a dictionary, you can drop a new key (category) straight in and keep going. A list of these dicts goes directly into pandas.DataFrame(results), a CSV, or a database: the model has become the parsing step in an otherwise ordinary data pipeline.
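As a minimal sketch of the CSV route, the standard library's csv.DictWriter can serialize the list of dicts directly. The four rows are copied from the output above so the example runs without an API call. The helper name rows_to_csv is hypothetical.

```python
import csv
import io

# The four rows printed above, copied here so the sketch runs offline.
results = [
    {"product": "wireless earbuds", "issue": "keeps disconnecting from phone",
     "sentiment": "negative", "category": "product_quality"},
    {"product": "standing desk", "issue": "none",
     "sentiment": "positive", "category": "other"},
    {"product": "blender", "issue": "hasn't shipped after three weeks and no reply from support",
     "sentiment": "negative", "category": "shipping_delay"},
    {"product": "Coffee grinder", "issue": "Louder than expected, wakes up the whole house",
     "sentiment": "mixed", "category": "product_quality"},
]

def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # values containing commas are quoted automatically
    return buffer.getvalue()

csv_text = rows_to_csv(results, ["product", "issue", "sentiment", "category"])
print(csv_text)
```

To write a file instead of a string, the same writer can be pointed at open("reviews.csv", "w", newline=""), where the filename is only an example. Passing fieldnames explicitly fixes the column order and makes DictWriter raise an error if a row carries an unexpected key.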
Reliability scales with the batch
The bigger the batch, the more a single odd result hides in the pile. Lean on the three habits from this lesson: a schema with enums so fields can’t drift, a fixed label set with temperature=0 so categories stay consistent, and a validation step so a stray value is caught instead of stored. They cost almost nothing per item and save you from silently corrupt data at scale.
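On a larger batch, one failed call should not discard the rows already collected. The sketch below wraps the loop so failures are recorded next to the successes. The name process_batch and the two stand-in functions are hypothetical; in the lesson's setting you would pass extract and classify instead.

```python
def process_batch(items, extract_fn, classify_fn):
    """Run both steps over every item and return (rows, failures)."""
    rows, failures = [], []
    for index, item in enumerate(items):
        try:
            fields = extract_fn(item)
            if not isinstance(fields, dict):
                raise ValueError("extraction returned no fields")
            row = dict(fields)  # copy so the extractor's dict is left untouched
            row["category"] = classify_fn(item)
            rows.append(row)
        except Exception as exc:  # broad on purpose: one bad item must not stop the batch
            failures.append({"index": index, "text": item, "error": str(exc)})
    return rows, failures

# Stand-in functions so the sketch runs offline.
def fake_extract(text):
    if "blender" in text:
        return None  # simulate a response with no tool_use block
    return {"product": "demo", "issue": "none", "sentiment": "positive"}

def fake_classify(text):
    return "other"

rows, failures = process_batch(["desk review", "blender review"], fake_extract, fake_classify)
print(len(rows), "rows,", len(failures), "failures")
```

Keeping the index and original text in each failure record makes it straightforward to retry only the items that failed.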
Practice Exercises
Exercise 1: Add a field to the schema
Extend the record_review tool with a severity field — an enum of "low", "medium", "high" — and re-run extract() over all four reviews. Watch how the model fills it in without any change to your prose prompt.
Hint
Add "severity": {"type": "string", "enum": ["low", "medium", "high"]} to properties and to required. The schema is where you “ask” — you don’t need to mention severity in the user message at all.
Exercise 2: Break, then fix, the label set
Run classify() but temporarily replace the validation line with a plain return label, and remove other from CATEGORIES and from the prompt. Then feed it a message that fits none of your categories, like "What are your store hours?", and see what comes back. Put the validation and the other label back and confirm it’s handled cleanly.
Hint
The point is to feel why the safety net matters. Without it, an off-set reply flows straight into your results. The two-line fix — an other category plus the if label in CATEGORIES check — is what makes classification trustworthy.
Exercise 3: Re-target the summary
Rewrite the summarize() prompt for a different audience: a one-sentence summary for the customer, apologetic and friendly, instead of terse-for-an-engineer. Keep the same message and compare the two outputs side by side.
Hint
Change only the role and audience lines (“You are a customer-support agent… write a warm one-sentence reply”). Same task, same length constraint — the audience alone reshapes the tone. This is the format-and-audience lever from Lesson 1, applied to summarization.
Summary
Prompting is how you turn a language model into a data tool. Extraction pulls structured fields from text — use a tool/schema with enums so the output is a clean dictionary, not prose. Classification sorts text into categories — fix the label set, run at low temperature for consistency, give the model an escape-hatch label, and validate the reply in code. Summarization shortens under rules — constrain length, focus, and audience so every summary has the same usable shape. Wrap any of these in a loop and you have a batch pipeline that converts messy text into rows you can store and analyze.
Key Concepts
- Extraction — pulling structured fields from unstructured text, best done with a tool/schema.
- Controlled vocabulary — a fixed set of labels the model must choose from; prevents category drift.
- temperature=0: makes classification as repeatable as possible, so the same input almost always gives the same label.
- Validation step: checking the model’s output against your allowed set in code, with a fallback.
- Constrained summarization — controlling length, focus, and audience so summaries are uniform.
- Batch loop — running one prompt over many items and collecting results into a list of dicts.
Why This Matters
Most production LLM work is data work, not chat. The moment you need to process more than a handful of items — tagging tickets, structuring reviews, summarizing documents — these three tasks and the reliability habits around them are what stand between a clean dataset and a pile of inconsistent guesses. Get them right and a language model slots neatly into the data pipelines you already build.
Next Steps
Continue to Lesson 6 - Evaluating and Improving Prompts
Measure prompt quality objectively and iterate toward output that works every time.
Continue Building Your Skills
You can now make a language model extract, classify, and summarize — the workhorse tasks of real data pipelines — and run them reliably over a batch. Next you’ll learn to judge whether a prompt is actually good: how to evaluate output objectively and improve it on purpose, instead of changing words and hoping.