Lesson 3 - Few-Shot Prompting and Roles

Welcome to Few-Shot Prompting and Roles

In the last two lessons you learned to describe what you want — role, task, context, format, constraints. Most of the time, a clear description is enough. But sometimes you can’t quite put the output into words, or the model keeps drifting away from the exact shape you need. When describing fails, the move is to show instead of tell.

In this lesson you’ll see two of the most reliable techniques in prompting: few-shot prompting, where you hand the model a few worked examples before the real input, and role prompting, where a detailed persona changes how the model speaks and judges. Both are small additions that pay off on every call.

By the end of this lesson, you will be able to:

  • Explain the difference between zero-shot and few-shot prompting
  • Add few-shot examples using alternating user/assistant turns
  • Use few-shot to lock output format and handle edge cases
  • Give the model a role that changes its vocabulary and judgment

You’ll reuse the simple ask() helper from earlier, and add one more for multi-turn calls. Let’s begin.


Zero-Shot: Describing the Task

Everything you’ve written so far has been zero-shot — you describe the task and the model does it cold, with no examples. Here’s a classification task: sort a support message into one of three labels. We’ll send it with the cheap claude-haiku-4-5 model using the familiar helper:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

def ask(prompt, system=None, max_tokens=300):
    kwargs = dict(model="claude-haiku-4-5", max_tokens=max_tokens,
                  messages=[{"role": "user", "content": prompt}])
    if system:
        kwargs["system"] = system
    return client.messages.create(**kwargs).content[0].text

prompt = """Classify this support message as one of: BUG, BILLING, FEATURE_REQUEST.

Message: The export button hasn't worked since yesterday's update."""

print(ask(prompt))
BUG

This message describes a functionality that stopped working after an update, which is
a bug report rather than a billing issue or feature request.

The label is correct — but look at what came with it. The model added a paragraph of justification you never asked for. If you’re reading this answer yourself, that’s fine. If you’re feeding it into a program that expects the single word BUG, it’s broken: your code now has to strip the explanation, and it has to do so for output whose shape you can’t predict.
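To see what that cleanup costs, here is a minimal sketch of the parser you would end up writing. `extract_label` and its first-line heuristic are hypothetical, not part of any library. Notice how easily a reply shaped slightly differently, such as "Label: BUG", slips past it.

```python
LABELS = ("BUG", "BILLING", "FEATURE_REQUEST")

def extract_label(reply, labels=LABELS):
    """Pull a label out of a free-form reply; return None if none is found.

    Heuristic (an assumption, not a guarantee): the label is the first
    non-empty line, possibly wrapped in punctuation such as '**BUG**'.
    """
    for line in reply.splitlines():
        token = line.strip().strip("*.:`'\"")
        if token:
            return token if token in labels else None
    return None

reply = ("BUG\n\nThis message describes a functionality that stopped working "
         "after an update, which is\na bug report rather than a billing issue "
         "or feature request.")
print(extract_label(reply))  # BUG
```

Every new variation in the model's phrasing means another special case in a function like this, which is exactly the fragility the rest of the lesson works to remove.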


The Same Task, Run Across Several Inputs

The real problem shows up when you run the same zero-shot prompt over a batch of messages. Watch the format wobble from one input to the next:

messages = [
    "I was charged twice for last month.",
    "Can you add a dark mode?",
    "The app crashes when I open the settings page.",
]

for m in messages:
    out = ask(f"""Classify this support message as one of: BUG, BILLING, FEATURE_REQUEST.

Message: {m}""", max_tokens=120)
    print(f"[{m[:40]:40}] -> {out!r}")
[I was charged twice for last month.     ] -> 'BILLING\n\nThis message is about a charge/payment issue, which falls under billing support.'
[Can you add a dark mode?                ] -> 'FEATURE_REQUEST\n\nThis message is asking for a new capability (dark mode) to be added to the product, which is a feature request rather than reporting a bug or addressing a billing issue.'
[The app crashes when I open the settings] -> "BUG\n\nThis message describes a crash, which is a defect in the application's current functionality rather than a billing issue or a request for new features."

Every label is right, but no two replies match: each tacks on an explanation of a different length and wording, so you can’t predict how much text will follow the label. You could keep adding constraints (“reply with only the label, no explanation”), and often that works. But there’s a more direct way to nail the format down: stop describing it and start demonstrating it.
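For comparison, the “keep adding constraints” route looks like this. `zero_shot_prompt` is a hypothetical helper that rebuilds the lesson’s prompt and optionally appends the label-only rule; you would pass its result to `ask()` and still have to check each batch to see whether the constraint held.

```python
def zero_shot_prompt(message, labels=("BUG", "BILLING", "FEATURE_REQUEST"),
                     label_only=True):
    """Build the lesson's zero-shot classification prompt.

    With label_only=True, append an extra constraint describing the format.
    """
    prompt = f"Classify this support message as one of: {', '.join(labels)}."
    if label_only:
        prompt += " Reply with only the label, no explanation."
    return f"{prompt}\n\nMessage: {message}"

print(zero_shot_prompt("Can you add a dark mode?"))
```

With `label_only=False` it reproduces the original prompt exactly, which makes it easy to A/B the two versions on the same batch.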


Few-Shot: Showing the Model What You Mean

Few-shot prompting means putting a few solved examples in front of the real input. Instead of one user turn, you build a short fake conversation: a user turn with an example input, an assistant turn with the ideal answer, repeated a few times, and then the real input as the final user turn. The model reads the pattern and continues it.
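The recipe is mechanical enough to capture in a few lines. `build_few_shot` below is a hypothetical helper, not part of the Anthropic SDK: it takes (input, ideal output) pairs and returns an alternating list of message dicts, ending on the real input, that you could pass as the `messages` argument of a Messages API call.

```python
def build_few_shot(pairs, query):
    """Turn (input, ideal_output) pairs into alternating user/assistant turns,
    then append the real input as the final user turn."""
    messages = []
    for example_input, ideal_output in pairs:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": ideal_output})
    messages.append({"role": "user", "content": query})
    return messages

msgs = build_few_shot(
    [("I was double-charged on my invoice.", "BILLING"),
     ("Please add support for CSV export.", "FEATURE_REQUEST")],
    "Can you add a dark mode?",
)
for m in msgs:
    print(m["role"], "->", m["content"])
```

Keeping examples as plain pairs makes them easy to store, review, and reuse; the role bookkeeping happens in one place.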

Building the example conversation

Because we now need more than one message, we’ll add a small chat() helper that takes a full list of messages:

def chat(messages, system=None, max_tokens=120):
    kwargs = dict(model="claude-haiku-4-5", max_tokens=max_tokens, messages=messages)
    if system:
        kwargs["system"] = system
    return client.messages.create(**kwargs).content[0].text

Now the examples. Each is a user turn (the input) paired with an assistant turn (exactly the output we want — a bare label, nothing else):

examples = [
    {"role": "user", "content": "I was double-charged on my invoice."},
    {"role": "assistant", "content": "BILLING"},
    {"role": "user", "content": "Please add support for CSV export."},
    {"role": "assistant", "content": "FEATURE_REQUEST"},
    {"role": "user", "content": "The page goes blank when I click Save."},
    {"role": "assistant", "content": "BUG"},
]

system = ("Classify each support message as exactly one of: "
          "BUG, BILLING, FEATURE_REQUEST. Reply with only the label.")

The assistant turns are the demonstration. The model treats them as its own prior answers and imitates their shape — a single uppercase label with no commentary.

Running it on the same batch

messages = [
    "The export button hasn't worked since yesterday's update.",
    "I was charged twice for last month.",
    "Can you add a dark mode?",
    "The app crashes when I open the settings page.",
]

for m in messages:
    out = chat(examples + [{"role": "user", "content": m}],
               system=system, max_tokens=20)
    print(f"[{m[:42]:42}] -> {out!r}")
[The export button hasn't worked since yest] -> 'BUG'
[I was charged twice for last month.       ] -> 'BILLING'
[Can you add a dark mode?                  ] -> 'FEATURE_REQUEST'
[The app crashes when I open the settings p] -> 'BUG'

Same model, same messages, but now every answer is a clean, bare label. The explanations are gone, the format is identical across all four inputs, and the output drops straight into a program with no cleanup. This version also carries a one-line system instruction (“Reply with only the label”), and the examples reinforce it by showing, rather than describing, exactly what a bare label looks like. Together they pinned the shape exactly.
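“Drops straight into a program” can be made concrete with a strict parser. The sketch below is one assumed way to consume the replies, not something the lesson’s helpers do: each reply is mapped onto a Python `Enum` (the `Ticket` class and `parse_ticket` are hypothetical names), so a bare label parses and anything with extra text raises `ValueError` instead of slipping through.

```python
from enum import Enum

class Ticket(str, Enum):
    BUG = "BUG"
    BILLING = "BILLING"
    FEATURE_REQUEST = "FEATURE_REQUEST"

def parse_ticket(reply):
    """Map a bare-label reply to a Ticket; raise ValueError on anything else."""
    # Only surrounding whitespace is forgiven; an appended explanation is not.
    return Ticket(reply.strip())

ticket = parse_ticket("BILLING\n")
print(ticket.value)  # BILLING
```

Failing loudly is the point: with few-shot output this parser should rarely raise, and when it does you learn about a format regression immediately.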

Examples are demonstrations, not instructions

A few-shot example shows the model a complete input-to-output mapping. It’s often more reliable than describing the same rule in prose, because the model can copy a concrete pattern more faithfully than it can interpret a sentence. When an instruction isn’t sticking, replace it with one or two examples.


When Few-Shot Earns Its Keep: Formats You Can’t Describe

Classification is one case. The technique shines brightest when the output format is something you can only show, not easily name. Suppose your team has a house style for dates — 15 Apr 2022 (Fri) — and you ask the model to reformat dates into it by description alone:

dates = ["March 3, 2024", "the 7th of January 2023", "12/25/2022", "2021-08-09"]

zp = ('Reformat this date into our house style. '
      'Reply with only the reformatted date.\n\nDate: ')

for d in dates:
    print(f"[{d:24}] -> {ask(zp + d, max_tokens=40)!r}")
[March 3, 2024           ] -> "I don't have access to your house style guide. Could you please provide the formatting style you'd like me to use (for example: DD/MM/YYYY, 3 March "
[the 7th of January 2023 ] -> '7 January 2023'
[12/25/2022              ] -> '25 December 2022'
[2021-08-09              ] -> '9 August 2021'

This is a mess. The model has no idea what “our house style” means, so on the first date it asks for clarification, and on the other three it falls back on a format of its own (day, full month name, year) that isn’t what we wanted. Naming a style the model has never seen simply can’t work.
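A shape check makes the failure measurable. This sketch assumes one reading of the house style from its single sample: a day number, a three-letter English month, a four-digit year, and a three-letter weekday in parentheses. `in_house_style` is a hypothetical helper; none of the three zero-shot guesses above passes it, while the style’s own sample does.

```python
import re

# Assumed reading of '15 Apr 2022 (Fri)': day, short month, year, (short weekday).
MONTH = r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
WEEKDAY = r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
HOUSE_STYLE = re.compile(rf"\d{{1,2}} {MONTH} \d{{4}} \({WEEKDAY}\)")

def in_house_style(text):
    """True if the whole reply matches the assumed house-style shape."""
    return HOUSE_STYLE.fullmatch(text.strip()) is not None

zero_shot_outputs = ["7 January 2023", "25 December 2022", "9 August 2021"]
print([in_house_style(o) for o in zero_shot_outputs])  # [False, False, False]
print(in_house_style("15 Apr 2022 (Fri)"))             # True
```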

Show it instead

Two worked examples show the style directly:

ex = [
    {"role": "user", "content": "April 15, 2022"},
    {"role": "assistant", "content": "15 Apr 2022 (Fri)"},
    {"role": "user", "content": "2020-11-30"},
    {"role": "assistant", "content": "30 Nov 2020 (Mon)"},
]
system = ("Reformat each date into the house style shown by the examples. "
          "Reply with only the reformatted date.")

for d in dates:
    out = chat(ex + [{"role": "user", "content": d}], system=system, max_tokens=30)
    print(f"[{d:24}] -> {out!r}")
[March 3, 2024           ] -> '3 Mar 2024 (Sun)'
[the 7th of January 2023 ] -> '7 Jan 2023 (Sat)'
[12/25/2022              ] -> '25 Dec 2022 (Sun)'
[2021-08-09              ] -> '09 Aug 2021 (Mon)'

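Every date now comes back in the DD Mon YYYY (Day) shape, and the model even computed the correct weekday for each one (March 3, 2024 really was a Sunday). It accepted four wildly different input formats and normalized them all. One detail still wobbles: the first date came back as 3 Mar but the last as 09 Aug. Both examples happened to use two-digit days, so they never showed whether a single-digit day gets a leading zero, and the model had to guess. That gap came from the examples, not the model. Even so, two worked examples conveyed this style far more precisely than naming it ever could.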
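Calendar facts like weekdays are cheap to verify in code, so there is little reason to take them on trust. A minimal sketch using Python’s standard `datetime` module; `weekday_is_correct` is a hypothetical helper, and the month and weekday abbreviations are hard-coded in English so the check does not depend on the system locale.

```python
import re
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday(): Monday is 0
PATTERN = re.compile(r"(\d{1,2}) ([A-Z][a-z]{2}) (\d{4}) \(([A-Z][a-z]{2})\)")

def weekday_is_correct(text):
    """True if a house-style date like '3 Mar 2024 (Sun)' names the right weekday."""
    m = PATTERN.fullmatch(text.strip())
    if not m or m.group(2) not in MONTHS:
        return False
    day, month, year, weekday = m.groups()
    try:
        actual = date(int(year), MONTHS.index(month) + 1, int(day))  # int('09') == 9
    except ValueError:  # an impossible date such as '31 Feb 2024'
        return False
    return WEEKDAYS[actual.weekday()] == weekday

outputs = ["3 Mar 2024 (Sun)", "7 Jan 2023 (Sat)", "25 Dec 2022 (Sun)", "09 Aug 2021 (Mon)"]
print(all(weekday_is_correct(o) for o in outputs))  # True
print(weekday_is_correct("3 Mar 2024 (Mon)"))       # False
```

Run over the four replies above, every weekday checks out; the same function would catch a miscounted day on a future batch.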


How Many Examples, and Which Ones

Few-shot doesn’t mean many-shot. Two or three good examples usually do the job; past about five you’re spending tokens for diminishing returns. What matters far more than count is that your examples are representative:

  • Cover the range. If you have three labels, show one example of each. If inputs vary in length or phrasing, let your examples vary too.
  • Include the edge case that trips the model up. If the model keeps misclassifying refund requests as bugs, add a refund example with the correct label. One targeted example can fix a whole class of mistakes.
  • Make every example flawless. The model imitates exactly what you show, mistakes included. A sloppy example teaches sloppy output.
  • Keep the format identical across examples. Consistency between your examples is what produces consistency in the answer.

Think of the examples as the most concentrated part of your prompt: a few lines that demonstrate the rule beat a paragraph that explains it.
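Several of these rules can be checked mechanically before any API call. The sketch below is a hypothetical `audit_examples` helper (not part of the SDK) for the classifier case: it flags out-of-order roles, any assistant turn that isn’t exactly a bare label, and labels with no example. Judging whether examples are truly representative of your real inputs still takes a human.

```python
def audit_examples(examples, labels):
    """Return a list of problems in a classifier's few-shot examples (empty if none)."""
    problems = []
    # Examples must alternate user -> assistant, starting with user.
    for i, turn in enumerate(examples):
        expected = "user" if i % 2 == 0 else "assistant"
        if turn["role"] != expected:
            problems.append(f"turn {i}: expected {expected}, got {turn['role']}")
    if len(examples) % 2:
        problems.append("examples should end on an assistant turn")
    answers = [t["content"] for t in examples if t["role"] == "assistant"]
    # Flawless, identical format: every answer is exactly one bare label.
    for answer in answers:
        if answer not in labels:
            problems.append(f"not a bare label: {answer!r}")
    # Cover the range: at least one example per label.
    for label in labels:
        if label not in answers:
            problems.append(f"no example for {label}")
    return problems

labels = ["BUG", "BILLING", "FEATURE_REQUEST"]
examples = [
    {"role": "user", "content": "I was double-charged on my invoice."},
    {"role": "assistant", "content": "BILLING"},
    {"role": "user", "content": "Please add support for CSV export."},
    {"role": "assistant", "content": "FEATURE_REQUEST"},
    {"role": "user", "content": "The page goes blank when I click Save."},
    {"role": "assistant", "content": "BUG"},
]
print(audit_examples(examples, labels))      # []
print(audit_examples(examples[:4], labels))  # ['no example for BUG']
```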


Role Prompting: Changing Who Is Answering

Few-shot changes the shape of the answer. A role changes its substance — the vocabulary the model reaches for and the judgments it’s willing to make. You set a role through the system prompt, and a detailed persona shifts the answer in ways a one-line task description won’t.

Here’s a question with no persona at all:

q = ("A user asks: 'Is it safe to take ibuprofen with my morning coffee?' "
     "Answer in 2-3 sentences.")

print(ask(q, max_tokens=200))
Yes, it's generally safe to take ibuprofen with coffee. However, both can irritate your
stomach, so taking ibuprofen with food or a full glass of water is recommended to minimize
gastrointestinal discomfort. If you have a history of stomach issues or ulcers, consult
your doctor first.

A reasonable, friendly answer. Now give the model a specific professional role and ask the identical question:

persona = (
    "You are a cautious clinical pharmacist. You never give definitive medical "
    "advice, you flag interactions and risks, and you always recommend confirming "
    "with a doctor or pharmacist who knows the patient's history."
)

print(ask(q, system=persona, max_tokens=200))
Generally, taking ibuprofen with coffee is considered safe for most people, though caffeine
may slightly increase stomach irritation risk—especially on an empty stomach. **However, I'd
recommend taking ibuprofen with food and confirming with your pharmacist or doctor**,
particularly if you have a history of ulcers, take blood thinners, or have other medical
conditions that could affect safety.

Same model, same question. But the persona reshaped the judgment: the no-persona reply opened with a flat “Yes,” while the pharmacist hedged (“considered safe for most people”), raised a risk the first answer never mentioned (blood thinners), and moved the advice to confirm with a pharmacist or doctor into the middle of the answer, in bold, instead of leaving it as a closing line. The role didn’t just change the tone; it changed what the model judged worth saying.

A role is the model’s frame of reference

The more specific the role, the more it shifts the answer. “You are a pharmacist” nudges the vocabulary; “You are a cautious clinical pharmacist who always flags interactions and defers to the patient’s own doctor” changes the actual judgments. Spend a sentence describing how the persona thinks, not just its job title.
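To make the role the only variable in a comparison, it helps to hold the user prompt fixed in code. A minimal sketch: `compare_roles` is a hypothetical helper that accepts any function shaped like the lesson’s `ask(prompt, system=None, ...)` and runs one question under several system strings.

```python
def compare_roles(question, personas, call):
    """Ask the same question under each persona; return {label: answer}.

    `call(prompt, system=...)` is any function with the lesson's ask() signature.
    `personas` maps a short label to a system string (None means no role).
    """
    return {label: call(question, system=system) for label, system in personas.items()}

personas = {
    "no role": None,
    "cautious pharmacist": (
        "You are a cautious clinical pharmacist. You never give definitive medical "
        "advice, you flag interactions and risks, and you always recommend confirming "
        "with a doctor or pharmacist who knows the patient's history."
    ),
}

# With the lesson's helper and question defined:
# answers = compare_roles(q, personas, ask)
# for label, answer in answers.items():
#     print(f"--- {label} ---\n{answer}\n")
```

Passing the model call in as a parameter keeps the comparison logic testable without network access and guarantees every persona sees byte-for-byte the same question.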


Practice Exercises

Exercise 1: Few-shot a sentiment scale

Build a few-shot prompt that classifies short product reviews on a five-point scale: 1 (very negative) through 5 (very positive). Provide one example near each end and one in the middle, then run it on three new reviews. Confirm the output is always a single digit.

Hint

Use the chat() helper with alternating user/assistant turns. Make each assistant turn just the digit ("1", "3", "5") so the model imitates that exact shape. A system line like “Reply with only the number” reinforces it.

Exercise 2: Fix an edge case with one example

Start from the support-ticket classifier in this lesson and feed it a tricky message like "I want a refund because the app keeps crashing." — which mixes BILLING and BUG. Note which label it picks, then add one example that demonstrates your preferred rule (say, crashes win → BUG) and rerun to see the behavior change.

Hint

Append a new pair to the examples list: a user turn with a refund-plus-crash message and an assistant turn with the label you want. One well-chosen example often fixes the whole category of mixed cases.

Exercise 3: Contrast two roles on one question

Take a single open-ended question — for example, “Should I refactor this function or leave it?” — and answer it three ways: with no role, as “a pragmatic senior engineer on a tight deadline,” and as “a meticulous reviewer who prizes long-term maintainability.” Compare how the recommendation itself changes, not just the wording.

Hint

Keep the user prompt byte-for-byte identical across all three calls; change only the system string. The point is to see that the role changes the model’s judgment, not merely its tone.


Summary

When describing a task isn’t enough, show the model what you want. Few-shot prompting supplies a few worked examples as alternating user/assistant turns before the real input, and the model imitates the pattern. That locks down output format and handles edge cases often more reliably than prose instructions alone, especially for formats you can only demonstrate. Role prompting sets a persona in the system prompt that shifts not just the model’s tone but its vocabulary and judgment. Both are small, cheap additions that improve nearly every call.

Key Concepts

  • Zero-shot — describing a task with no examples; fine for simple tasks, but format can drift.
  • Few-shot — supplying a few input/output examples so the model copies the pattern.
  • Alternating turns — examples are encoded as user (input) / assistant (ideal output) pairs before the real input.
  • Representative examples — cover the range, include the tricky edge case, and keep every example flawless and consistently formatted.
  • Role prompting — a detailed system persona that changes the model’s vocabulary and judgment, not just its tone.

Why This Matters

Few-shot prompting and roles are techniques you’ll reach for constantly in production. Whenever you need output in a precise shape a pipeline can consume, or answers framed by a particular kind of expert, examples and personas are how you get there, usually without switching to a bigger model. They’re the bridge from “the model usually does what I want” to “the model reliably does what I want.”


Next Steps

Continue to Lesson 4 - Structured Outputs

Stop parsing free-form prose — get clean, reliable JSON you can drop straight into code.

Back to Module Overview

Return to the Prompt Engineering module overview


Continue Building Your Skills

You can now teach a model by example and put it in the right frame of mind. Next you’ll take format control one decisive step further: instead of nudging the model toward a shape with examples, you’ll constrain it to emit valid, schema-checked JSON — the foundation of every reliable LLM-powered feature you’ll build.