Lesson 5 - Prompting for Data Tasks

Welcome to Prompting for Data Tasks

So far you’ve used prompts to produce text — a blurb, an answer, a rewrite. But the same skills turn a language model into something more useful: a data tool that reads messy, free-form text and gives you back clean, structured results you can store, count, and act on.

Three tasks cover most of this work: pulling fields out of text (extraction), sorting text into categories (classification), and shortening text under rules (summarization). In this lesson you’ll run all three against a live model, then loop extraction and classification over a small batch, which is the shape most real data pipelines take.

By the end of this lesson, you will be able to:

  • Extract structured fields from unstructured text using a tool/schema
  • Classify text into a fixed set of labels with a controlled vocabulary
  • Write constrained summaries that respect length, focus, and audience
  • Run the same prompt over a batch of items and collect the results reliably

You’ll keep using the cheap claude-haiku-4-5 model. Let’s begin.


The Data We’ll Work With

Every example below runs against the same handful of short customer messages — the kind of free-form text a support inbox fills up with. We’ll write them inline so you can reproduce every result:

import anthropic
import json

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

reviews = [
    "The wireless earbuds keep disconnecting from my phone every few minutes. Really frustrating for the price.",
    "Love the new standing desk! Assembly took 20 minutes and it's rock solid. Worth every penny.",
    "Ordered the blender three weeks ago and it still hasn't shipped. No reply from support either.",
    "Coffee grinder works fine but it's much louder than I expected. Wakes up the whole house.",
]

Four messages, each a different shape: a defect, a happy customer, a shipping problem, and a mixed review. That variety is on purpose — it’s how you find out whether your prompt actually holds up.


Extraction: Pull Structured Fields from Messy Text

Extraction means turning a sentence a human wrote into fields a program can use. From the first review you’d want: which product, what issue, and the overall sentiment. You could ask for that in prose and parse the reply, but the reliable way — the one you met in Lesson 4 — is to define a tool with a schema and force the model to fill it in.

The schema is the contract. The model must return a product, an issue, and a sentiment that is one of three allowed values:

extract_tool = {
    "name": "record_review",
    "description": "Record the structured fields extracted from a single customer review.",
    "input_schema": {
        "type": "object",
        "properties": {
            "product": {"type": "string", "description": "The product the review is about."},
            "issue": {"type": "string", "description": "The main problem, or 'none' if positive."},
            "sentiment": {"type": "string", "enum": ["positive", "negative", "mixed"]},
        },
        "required": ["product", "issue", "sentiment"],
    },
}

def extract(review):
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=300,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "record_review"},
        messages=[{"role": "user", "content": f"Extract the fields from this review:\n\n{review}"}],
    )
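    for block in resp.content:
        if block.type == "tool_use":
            return block.input  # the tool arguments, already parsed into a dict
    # Forced tool_choice should always yield a tool_use block; fail loudly if it does not.
    raise ValueError("no tool_use block in the model response")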

print(json.dumps(extract(reviews[0]), indent=2))
{
  "product": "wireless earbuds",
  "issue": "keeps disconnecting from phone",
  "sentiment": "negative"
}

That’s a Python dictionary, not a paragraph you have to clean up. tool_choice={"type": "tool", "name": "record_review"} forces the model to call the tool, so you never get a chatty “Sure, here’s what I found…” reply; you get the fields. The enum on sentiment tells the model the value must be one of the three you expect, which matters the moment you try to count or filter on it.
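Because the result is an ordinary dictionary, a guard in code costs little before you store it. The following is a minimal sketch, not part of the lesson’s code: validate_fields is a hypothetical helper, and its required keys and allowed sentiments simply mirror the record_review schema above.

```python
# Hypothetical helper: checks a dict shaped like extract()'s return value.
REQUIRED_KEYS = ("product", "issue", "sentiment")
ALLOWED_SENTIMENTS = {"positive", "negative", "mixed"}


def validate_fields(fields):
    """Return a list of problems; an empty list means the record looks valid."""
    if not isinstance(fields, dict):
        return ["expected a dict, got " + type(fields).__name__]
    problems = []
    # Every required field must be present and be a non-empty string.
    for key in REQUIRED_KEYS:
        value = fields.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    # The sentiment must come from the same set the schema's enum lists.
    sentiment = fields.get("sentiment")
    if isinstance(sentiment, str) and sentiment not in ALLOWED_SENTIMENTS:
        problems.append(f"unexpected sentiment: {sentiment}")
    return problems


good = {"product": "wireless earbuds", "issue": "keeps disconnecting from phone", "sentiment": "negative"}
bad = {"product": "blender", "sentiment": "angry"}

print(validate_fields(good))  # []
print(validate_fields(bad))   # ['missing or empty field: issue', 'unexpected sentiment: angry']
```

Returning a list of problems, rather than raising on the first one, lets you log everything that is wrong with a record in one pass.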

The schema does the prompting

Notice how short the actual user message is — “Extract the fields from this review”. With a tool/schema, the field names and descriptions carry most of the instruction. A clear schema is worth more than a long prose prompt for extraction.


Classification: Force One Label from a Fixed Set

Classification sorts each message into a category. The trap is letting the model invent its own labels — one review comes back “connectivity issue”, the next “Bluetooth problem”, and now your data won’t group. The fix is a controlled vocabulary: you decide the labels up front and force the model to pick exactly one.

CATEGORIES = ["product_quality", "shipping_delay", "billing", "other"]

def classify(message):
    prompt = (
        "Classify the support message into exactly ONE of these categories:\n"
        f"{', '.join(CATEGORIES)}\n\n"
        "Reply with only the category name, nothing else. "
        "If none clearly fits, reply 'other'.\n\n"
        f"Message: {message}"
    )
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=10,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    label = resp.content[0].text.strip().lower()
    return label if label in CATEGORIES else "other"

for r in reviews:
    print(f"{classify(r):16} | {r[:55]}")
product_quality  | The wireless earbuds keep disconnecting from my phone e
other            | Love the new standing desk! Assembly took 20 minutes an
shipping_delay   | Ordered the blender three weeks ago and it still hasn't
product_quality  | Coffee grinder works fine but it's much louder than I e

Three deliberate choices make this reliable:

  • temperature=0: for classification you want the same input to produce the same label on every run. At temperature 0 the model picks the most likely answer instead of varying its wording, so your results are far more consistent across runs, though not strictly guaranteed to be identical.
  • A small max_tokens (here 10): the answer is one word, so cap it. This also stops the model from drifting into an explanation.
  • A validation line, return label if label in CATEGORIES else "other": even with a tight prompt, never trust that the reply lands in your set. Check it in code and fall back to "other". This is your safety net.
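The validation line compares the reply to the label set exactly, so a reply with stray quotes or a trailing period would fall through to "other". One possible refinement, sketched here on the assumption that such near-misses should still count, is to clean the reply before checking it. normalize_label is a hypothetical helper, not part of the lesson’s code.

```python
CATEGORIES = ["product_quality", "shipping_delay", "billing", "other"]


def normalize_label(raw, categories=CATEGORIES, fallback="other"):
    """Map a raw model reply onto the allowed label set, or fall back."""
    # Lowercase, trim whitespace, and drop surrounding quotes or punctuation.
    cleaned = raw.strip().lower().strip("'\"`.,:;!? ")
    # Treat spaces and hyphens as underscores so "shipping delay" still matches.
    cleaned = cleaned.replace(" ", "_").replace("-", "_")
    return cleaned if cleaned in categories else fallback


print(normalize_label("  Shipping_Delay.\n"))  # shipping_delay
print(normalize_label("'billing'"))            # billing
print(normalize_label("Bluetooth problem"))    # other
```

The fallback stays in place: cleaning only rescues replies that were already one of your labels in disguise.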

Look at the happy standing-desk review: it landed in other. That’s correct given these labels — there’s no “praise” bucket. It’s a useful reminder that your label set defines what the model can say. If you need to catch positive feedback, that’s a category you have to add; the model can only choose from what you give it.

Handling “none” and “unsure”

Always include an escape-hatch label like other or unsure in your set, and tell the model when to use it. Without one, a message that fits nothing gets crammed into the nearest category and quietly pollutes your data. A model that can say “I’m not sure” is more trustworthy than one forced to guess.
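One way to notice that a label set is missing a bucket is to watch how often the escape hatch gets used. This is a small sketch under that assumption; label_shares is a hypothetical helper, and the labels are the four from the run shown earlier in this lesson.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the batch as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = len(labels)
    # An empty batch has no counts, so this returns {} without dividing by zero.
    return {label: count / total for label, count in counts.items()}


# Labels from the four-review classification run above.
labels = ["product_quality", "other", "shipping_delay", "product_quality"]
shares = label_shares(labels)

print(shares["other"])            # 0.25
print(shares["product_quality"])  # 0.5
```

If the share of "other" keeps growing as the batch does, that may be a hint to read those messages and add a category, such as one for praise.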


Summarization: Shorten Under Constraints

A raw “summarize this” gives you an unpredictable blob. For data work you want a summary with a fixed shape — a set length, a specific focus, and a known audience — so every summary in your pipeline looks the same. You supply those as constraints, exactly the way you sharpened prompts in earlier lessons.

def summarize(message):
    prompt = (
        "You are a support team lead. Summarize this customer message in ONE sentence "
        "for an engineer who will act on it. State the product and the problem only. "
        "No greeting, no preamble.\n\n"
        f"Message: {message}"
    )
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=80,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text.strip()

print(summarize(reviews[2]))
Blender order from three weeks ago has not shipped and customer support has not responded.

Every constraint earns its place. Length (“ONE sentence”, plus a small max_tokens) keeps it scannable. Focus (“the product and the problem only”) strips out everything an engineer doesn’t need. Audience (“for an engineer who will act on it”) sets the register — terse and factual, not a customer-friendly apology. And “No greeting, no preamble” kills the “Here’s a summary:” opener that would otherwise break a downstream pipeline. The result reads like a ticket title, which is exactly what a support lead would want.
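Constraints you state in the prompt can also be checked in code afterwards. The sketch below is one rough way to do that; check_summary is a hypothetical helper, the 30-word limit is an arbitrary assumption, and the list of preamble openers is illustrative rather than complete.

```python
import re


def check_summary(text, max_words=30):
    """Return a list of constraint violations for a one-sentence summary."""
    problems = []
    stripped = text.strip()
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    if len(sentences) != 1:
        problems.append(f"expected 1 sentence, found {len(sentences)}")
    if len(stripped.split()) > max_words:
        problems.append(f"longer than {max_words} words")
    # Openers that suggest a preamble slipped through.
    if re.match(r"(?i)(here('s| is)|sure|summary:)", stripped):
        problems.append("starts with a preamble")
    return problems


ok = "Blender order from three weeks ago has not shipped and customer support has not responded."
chatty = "Here's a summary: The blender has not shipped. Support has not replied."

print(check_summary(ok))      # []
print(check_summary(chatty))  # ['expected 1 sentence, found 2', 'starts with a preamble']
```

The sentence split is deliberately naive (an abbreviation like "e.g. " would fool it), so treat a violation as a flag for review rather than proof of a bad summary.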


Putting It Together: Processing a Batch

One message at a time is a demo. Real work means running the same prompt over many items and collecting the results into one structured table. The pattern is a plain Python loop that builds a list of dictionaries — one row per item:

results = []
for r in reviews:
    fields = extract(r)            # product, issue, sentiment
    fields["category"] = classify(r)   # add the classification label
    results.append(fields)

for row in results:
    print(json.dumps(row))
{"product": "wireless earbuds", "issue": "keeps disconnecting from phone", "sentiment": "negative", "category": "product_quality"}
{"product": "standing desk", "issue": "none", "sentiment": "positive", "category": "other"}
{"product": "blender", "issue": "hasn't shipped after three weeks and no reply from support", "sentiment": "negative", "category": "shipping_delay"}
{"product": "Coffee grinder", "issue": "Louder than expected, wakes up the whole house", "sentiment": "mixed", "category": "product_quality"}

Four messy sentences are now four clean records. Because extract() returns a dictionary, you can drop a new key (category) straight in and keep going. A list of these dicts goes directly into pandas.DataFrame(results), a CSV, or a database — the model has become the parsing step in an otherwise ordinary data pipeline.
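To make the CSV route concrete, here is a minimal sketch that uses only the standard library. The to_csv helper is hypothetical, and the two rows are copied from the batch output above so the example runs without calling the model.

```python
import csv
import io

# Rows shaped like the batch output above (values copied from that run).
results = [
    {"product": "wireless earbuds", "issue": "keeps disconnecting from phone",
     "sentiment": "negative", "category": "product_quality"},
    {"product": "standing desk", "issue": "none",
     "sentiment": "positive", "category": "other"},
]


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(results, ["product", "issue", "sentiment", "category"])
print(csv_text)
```

Passing fieldnames explicitly fixes the column order, and DictWriter raises a ValueError if a row carries a key you did not list, which is one more cheap check on the data. To write a file instead, open it with newline="" and hand it to DictWriter in place of the buffer.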

Reliability scales with the batch

The bigger the batch, the more a single odd result hides in the pile. Lean on the three habits from this lesson: a schema with enums so fields can’t drift, a fixed label set with temperature=0 so categories stay consistent, and a validation step so a stray value is caught instead of stored. They cost almost nothing per item and save you from silently corrupt data at scale.
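On a larger batch you may also want one bad item to be recorded rather than stop the whole run. The sketch below shows one way to do that. run_batch is a hypothetical helper, and fake_extract is a stand-in for extract() so the example runs without an API key.

```python
def run_batch(items, process):
    """Apply process() to every item, keeping failures separate from results."""
    results, failures = [], []
    for index, item in enumerate(items):
        try:
            row = process(item)
            if not isinstance(row, dict):
                raise ValueError("process() did not return a dict")
            results.append({"index": index, **row})
        except Exception as exc:  # broad on purpose: one bad item must not stop the batch
            failures.append({"index": index, "item": item, "error": str(exc)})
    return results, failures


def fake_extract(review):
    """Stand-in for extract(); returns None for blank input to simulate a bad reply."""
    if not review.strip():
        return None
    return {"product": review.split()[0].lower(), "sentiment": "negative"}


items = ["Blender never shipped.", "   ", "Earbuds keep disconnecting."]
results, failures = run_batch(items, fake_extract)

print(len(results), len(failures))  # 2 1
print(failures[0]["error"])         # process() did not return a dict
```

Keeping the original index on every row means you can match results and failures back to the input, and retry only the failures later.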


Practice Exercises

Exercise 1: Add a field to the schema

Extend the record_review tool with a severity field — an enum of "low", "medium", "high" — and re-run extract() over all four reviews. Watch how the model fills it in without any change to your prose prompt.

Hint

Add "severity": {"type": "string", "enum": ["low", "medium", "high"]} to properties and to required. The schema is where you “ask” — you don’t need to mention severity in the user message at all.
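If you would rather keep the original tool intact while you experiment, one option is to build the extended schema from a copy. This is only a sketch: with_severity is a hypothetical helper, and the tool definition is repeated from earlier in the lesson so the block runs on its own.

```python
import copy

# The lesson's tool definition, repeated so this sketch is self-contained.
extract_tool = {
    "name": "record_review",
    "description": "Record the structured fields extracted from a single customer review.",
    "input_schema": {
        "type": "object",
        "properties": {
            "product": {"type": "string", "description": "The product the review is about."},
            "issue": {"type": "string", "description": "The main problem, or 'none' if positive."},
            "sentiment": {"type": "string", "enum": ["positive", "negative", "mixed"]},
        },
        "required": ["product", "issue", "sentiment"],
    },
}


def with_severity(tool):
    """Return a copy of the tool with a required severity enum added."""
    extended = copy.deepcopy(tool)  # deep copy so the original schema is untouched
    schema = extended["input_schema"]
    schema["properties"]["severity"] = {"type": "string", "enum": ["low", "medium", "high"]}
    schema["required"] = schema["required"] + ["severity"]
    return extended


severity_tool = with_severity(extract_tool)
print(severity_tool["input_schema"]["required"])  # ['product', 'issue', 'sentiment', 'severity']
```

Pass severity_tool in the tools list in place of extract_tool; the tool name is unchanged, so tool_choice keeps working as before.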

Exercise 2: Break, then fix, the label set

Run classify() with the safety net removed: temporarily take "other" out of CATEGORIES and replace the validation line with a plain return label. Then feed it a message that fits none of your categories, like "What are your store hours?", and see what comes back. Put the validation and the other label back and confirm the message is handled cleanly.

Hint

The point is to feel why the safety net matters. Without it, an off-set reply flows straight into your results. The two-line fix — an other category plus the if label in CATEGORIES check — is what makes classification trustworthy.

Exercise 3: Re-target the summary

Rewrite the summarize() prompt for a different audience: a one-sentence summary for the customer, apologetic and friendly, instead of terse-for-an-engineer. Keep the same message and compare the two outputs side by side.

Hint

Change only the role and audience lines (“You are a customer-support agent… write a warm one-sentence reply”). Same task, same length constraint — the audience alone reshapes the tone. This is the format-and-audience lever from Lesson 1, applied to summarization.
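If you expect to swap audiences often, it may help to pull the parts that change into parameters. The sketch below assumes a hypothetical build_summary_prompt helper; with the engineer settings it reproduces the lesson’s original prompt, and the customer-facing wording is only an example.

```python
def build_summary_prompt(message, role, audience, focus):
    """Assemble the one-sentence summary prompt from its variable parts."""
    return (
        f"You are {role}. Summarize this customer message in ONE sentence "
        f"for {audience}. {focus} "
        "No greeting, no preamble.\n\n"
        f"Message: {message}"
    )


message = "Ordered the blender three weeks ago and it still hasn't shipped. No reply from support either."

# Same settings as the summarize() prompt earlier in the lesson.
engineer_prompt = build_summary_prompt(
    message,
    role="a support team lead",
    audience="an engineer who will act on it",
    focus="State the product and the problem only.",
)

# Example wording for the customer-facing variant (an assumption, adjust freely).
customer_prompt = build_summary_prompt(
    message,
    role="a customer-support agent",
    audience="the customer who wrote it",
    focus="Be warm and apologetic.",
)

print(engineer_prompt)
print(customer_prompt)
```

Only the role, audience, and focus arguments differ between the two calls, which makes the side-by-side comparison a fair one.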


Summary

Prompting is how you turn a language model into a data tool. Extraction pulls structured fields from text — use a tool/schema with enums so the output is a clean dictionary, not prose. Classification sorts text into categories — fix the label set, run at low temperature for consistency, give the model an escape-hatch label, and validate the reply in code. Summarization shortens under rules — constrain length, focus, and audience so every summary has the same usable shape. Wrap any of these in a loop and you have a batch pipeline that converts messy text into rows you can store and analyze.

Key Concepts

  • Extraction: pulling structured fields from unstructured text, best done with a tool/schema.
  • Controlled vocabulary: a fixed set of labels the model must choose from; prevents category drift.
  • temperature=0: makes classification close to deterministic, so the same input almost always gives the same label.
  • Validation step: checking the model’s output against your allowed set in code, with a fallback.
  • Constrained summarization: controlling length, focus, and audience so summaries are uniform.
  • Batch loop: running one prompt over many items and collecting results into a list of dicts.

Why This Matters

Much production LLM work is data work, not chat. The moment you need to process more than a handful of items (tagging tickets, structuring reviews, summarizing documents), these three tasks and the reliability habits around them are what stand between a clean dataset and a pile of inconsistent guesses. Get them right and a language model slots neatly into the data pipelines you already build.


Next Steps

Continue to Lesson 6 - Evaluating and Improving Prompts

Measure prompt quality objectively and iterate toward output that works every time.


Continue Building Your Skills

You can now make a language model extract, classify, and summarize — the workhorse tasks of real data pipelines — and run them reliably over a batch. Next you’ll learn to judge whether a prompt is actually good: how to evaluate output objectively and improve it on purpose, instead of changing words and hoping.