Lesson 4 - Planning and Multi-Step Tasks

Welcome to Planning and Multi-Step Tasks

In Lesson 1 you watched an agent chain two tools, and in Lesson 2 you built the loop that drives it. Those examples needed a short, obvious sequence. Most real work isn’t like that — a single goal can hide four, five, or a dozen smaller steps, and the right order isn’t always clear up front. An agent handles this by planning: it reads the goal, works out what it needs, and gathers each piece with a tool before producing an answer. You don’t write that plan; you shape it. The main lever you have is the system prompt — a few sentences that tell the agent how to think, when to use tools, and when not to guess.

This lesson shows you how to write that prompt, give the agent a goal that genuinely needs several steps, and then read the trajectory it produces — counting the steps and seeing how each observation feeds the next.

By the end of this lesson, you will be able to:

  • Write a system prompt that encourages an agent to plan and use tools instead of guessing
  • Give an agent a multi-step goal that forces a real chain of tool calls
  • Read a multi-step trajectory: count the steps and trace how each result informs the next
  • Explain the practical limits of agent planning — step caps, cost, and wrong paths

You’ll build on the agent loop from Lesson 2. Let’s begin.


A System Prompt That Encourages Planning

The system prompt is the agent’s standing instructions — it sits in the system= parameter of every client.messages.create call and shapes how the model approaches the whole task. For a single tool round-trip you can often skip it. For multi-step work it earns its keep, because it’s where you tell the model two things it won’t reliably do on its own: plan before acting, and use tools rather than guessing.

Here is the prompt we’ll use for the rest of the lesson:

PLANNING_SYSTEM = (
    "You are a careful assistant that solves tasks step by step. "
    "When a task needs information you don't have, plan the steps first, "
    "then use the available tools to get real values instead of guessing. "
    "Never invent prices or do arithmetic in your head when a calculator tool "
    "is available—always call the calculator for any computation."
)

Look at what each clause does. “Solves tasks step by step” and “plan the steps first” push the model to lay out a sequence instead of lunging at an answer. “Use the available tools to get real values instead of guessing” tells it that when it lacks a fact, the move is a tool call, not a confident-sounding fabrication. And the last sentence is the strongest: it names a specific failure — doing arithmetic in its head — and forbids it. A language model will happily multiply numbers for you, but it can be subtly wrong, so for anything that must be correct we route it through a real calculator. The system prompt is how you make that the agent’s default behavior rather than something you hope for.


A Multi-Step Goal and the Tools It Needs

A good multi-step example needs a goal that cannot be answered in one tool call, plus tools that each do one small thing. We’ll give the agent two tools — a get_price lookup and a calculator — and a goal that requires several of each.

The tools are plain dictionaries with a name, a description, and an input_schema, exactly as in earlier lessons:

tools = [
    {
        "name": "get_price",
        "description": "Look up the monthly price in US dollars for a subscription plan. "
                       "Valid plan names: 'starter', 'pro', 'team'.",
        "input_schema": {
            "type": "object",
            "properties": {
                "plan": {"type": "string", "description": "The plan name: starter, pro, or team."}
            },
            "required": ["plan"],
        },
    },
    {
        "name": "calculator",
        "description": "Evaluate a single arithmetic expression and return the number. "
                       "Example expression: '8 * 12 * 3'.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "A Python arithmetic expression, e.g. '12 * 3'."}
            },
            "required": ["expression"],
        },
    },
]

The implementations are tiny. get_price reads from a small dictionary; calculator evaluates the expression with a restricted eval (passing {"__builtins__": {}} removes the built-in functions from the expression's namespace, which is adequate for a demo but is not a real sandbox, so don't expose it to untrusted input). Both return a string, the simplest form of tool_result content:

PRICES = {"starter": 0, "pro": 8, "team": 12}

def run_tool(name, args):
    if name == "get_price":
        plan = args["plan"].lower().strip()
        return str(PRICES.get(plan, "unknown plan"))
    if name == "calculator":
        return str(eval(args["expression"], {"__builtins__": {}}, {}))
    return "unknown tool"
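
The eval call above is fine for a demo, but an empty __builtins__ is not a full sandbox. As a hedged alternative, here is a minimal sketch of a calculator that walks the expression's syntax tree and accepts only numeric literals and basic operators. The name safe_calc and the set of allowed operators are assumptions for illustration, not part of the lesson's code.

```python
import ast
import operator

# Operators this sketch allows; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_calc(expression):
    """Evaluate +, -, *, / over numeric literals; raise ValueError otherwise."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # type(...) check (not isinstance) so that True/False are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        raise ValueError("not a valid expression")
    return walk(tree)


print(safe_calc("8 * 12 * 3"))                # 288
print(safe_calc("(8 * 12 * 3) - (12 * 12)"))  # 144
```

Names, function calls, and attribute access all fall through to the ValueError branch, so an input such as __import__('os') is rejected before anything runs.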

Now the goal. We want one sentence that forces the agent to retrieve two prices and then compute with them:

goal = ("Compare the yearly cost of the Pro and Team plans for a 3-person team, "
        "and tell me the difference in dollars.")

Nothing the agent knows up front answers this. It has to look up the Pro price, look up the Team price, turn monthly prices into yearly costs, and subtract. That’s a real chain — and the agent has to plan it.


Reading the Real Trajectory

We drive this with the same loop from Lesson 2: call the model, append the assistant content, run any tools and feed their results back keyed to the tool_use id, and repeat while stop_reason is tool_use — with a max_steps cap so a runaway agent can’t loop forever:

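import anthropic

# Client setup assumed to match Lesson 2; reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

def run_agent(goal, max_steps=10):
    messages = [{"role": "user", "content": goal}]
    for step in range(1, max_steps + 1):
        resp = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=500,
            system=PLANNING_SYSTEM,
            tools=tools,
            messages=messages,
        )
        # Keep the assistant turn (text and tool_use blocks) in the history.
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return messages  # finished: the last assistant turn holds the answer
        # Run every requested tool and key each result to its tool_use id.
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = run_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})
    return messages  # step cap reached before the model finished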

Here is the real trajectory it produced (get_price returns the monthly dollar figure; calculator returns the computed number):

--- step 1: stop_reason=tool_use ---
  text: I'll help you compare the yearly costs. Let me first get the monthly prices for both plans.
  tool_use: get_price({'plan': 'pro'})
  tool_use: get_price({'plan': 'team'})
  -> result: 8
  -> result: 12
--- step 2: stop_reason=tool_use ---
  text: Now let me calculate the yearly costs and the difference:
  tool_use: calculator({'expression': '8 * 12 * 3'})
  tool_use: calculator({'expression': '12 * 12'})
  tool_use: calculator({'expression': '(8 * 12 * 3) - (12 * 12)'})
  -> result: 288
  -> result: 144
  -> result: 144
--- step 3: stop_reason=end_turn ---
  text: Pro Plan (3 people): $8/month × 12 months × 3 people = $288/year
        Team Plan: $12/month × 12 months = $144/year
        Difference: $288 - $144 = $144/year
        The Team plan is $144 cheaper per year for a 3-person team.

Count the steps: three model turns, five tool calls. Nobody told the agent to fetch prices before computing, or which expressions to evaluate. It planned the chain itself. Trace how each observation feeds the next: the 8 and 12 from step 1’s lookups reappear inside step 2’s calculator expressions, and the 288 and 144 it computes there become the numbers in step 3’s final answer. That carry-forward, where the result of one step becomes the input to the next, is the whole point of a multi-step agent.

Notice too that step 1 and step 2 each fired several tools at once (the model can request multiple tool_use blocks in one turn, and our loop runs them all and returns every result together). That’s the agent batching independent work — fetching both prices in one go rather than one per turn.
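
The counts above can be checked offline. What follows is a minimal sketch that replays the recorded trajectory through the same loop shape, using scripted turns in place of the API. The names tool_use, SCRIPT, and replay are hypothetical, invented for this sketch, and the scripted turns are copied from the trajectory above rather than generated live.

```python
from types import SimpleNamespace

PRICES = {"starter": 0, "pro": 8, "team": 12}


def run_tool(name, args):
    # Same behavior as the lesson's tools, repeated so the sketch is self-contained.
    if name == "get_price":
        return str(PRICES.get(args["plan"].lower().strip(), "unknown plan"))
    if name == "calculator":
        return str(eval(args["expression"], {"__builtins__": {}}, {}))
    return "unknown tool"


def tool_use(name, **args):
    # Hypothetical stand-in for a tool_use content block.
    return SimpleNamespace(type="tool_use", name=name, input=args)


# The three recorded model turns, reduced to stop_reason and tool_use blocks.
SCRIPT = [
    ("tool_use", [tool_use("get_price", plan="pro"),
                  tool_use("get_price", plan="team")]),
    ("tool_use", [tool_use("calculator", expression="8 * 12 * 3"),
                  tool_use("calculator", expression="12 * 12"),
                  tool_use("calculator", expression="(8 * 12 * 3) - (12 * 12)")]),
    ("end_turn", []),
]


def replay(script, max_steps=10):
    """Run the loop over scripted turns; return (steps taken, tool results)."""
    results = []
    for step, (stop_reason, blocks) in enumerate(script[:max_steps], start=1):
        if stop_reason != "tool_use":
            return step, results
        for block in blocks:
            results.append(run_tool(block.name, block.input))
    return min(len(script), max_steps), results


steps, results = replay(SCRIPT)
print(steps, len(results))  # 3 5
print(results)              # ['8', '12', '288', '144', '144']
```

The third calculator call repeats the full expression instead of subtracting 288 and 144, because all three calls were requested in the same turn, before any of their results existed.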

The system prompt shapes how the agent plans

Every calculation in that trajectory went through the calculator tool — even 8 * 12 * 3, which the model could easily have done in its head. That isn’t an accident; it’s the last sentence of PLANNING_SYSTEM doing its job. Change the system prompt and the trajectory changes with it: drop the “always call the calculator” instruction and the model will often just write the arithmetic itself, which is faster but riskier. The system prompt is your steering wheel for how the agent plans, not just what it can do.

One honest detail worth reading carefully: the agent treated the Team plan as a single flat $12/month license (12 * 12), not $12 per person. That’s a reasonable interpretation of an ambiguous goal — but it’s a reminder that an agent plans against the goal as written. If you meant per-person pricing for Team too, you’d need to say so. Agents follow your specification, including its gaps.
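
The two readings of the goal can be worked through directly. The sketch below computes the yearly costs under the flat reading the agent chose and under a per-seat reading. Per-seat pricing for Team is an assumption made for illustration, since the lesson's price table does not say which applies.

```python
MONTHS = 12
SEATS = 3
PRO_MONTHLY, TEAM_MONTHLY = 8, 12  # values returned by get_price in the trajectory

pro_yearly = PRO_MONTHLY * MONTHS * SEATS      # per seat, as the agent computed
team_flat = TEAM_MONTHLY * MONTHS              # the agent's reading: one flat license
team_per_seat = TEAM_MONTHLY * MONTHS * SEATS  # assumed alternative: priced per seat

print(pro_yearly, team_flat, pro_yearly - team_flat)          # 288 144 144
print(pro_yearly, team_per_seat, team_per_seat - pro_yearly)  # 288 432 144
```

With these numbers the difference is $144 under either reading, but the cheaper plan flips: Team under the flat reading, Pro under the per-seat one. A plausible-looking answer can therefore still point the reader at the wrong plan.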


The Practical Limits of Planning

Planning is powerful, but it isn’t free, and it isn’t always right. Three limits are worth holding in mind whenever you build a multi-step agent.

Step caps exist for a reason. Our loop had max_steps=10. The trajectory finished in three, so the cap never mattered — but it’s there as a safety net. An agent that misreads a tool result, or gets a tool error it can’t recover from, can keep trying indefinitely. The cap turns “runs forever and burns money” into “stops after N steps so you can investigate.” Always set one.
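
One way to make the cap visible is to have the loop report why it stopped. This is a minimal sketch, not the lesson's loop: run_capped and the take_step callback are hypothetical names, and take_step stands in for one model call plus its tool runs, returning True when the agent is done.

```python
def run_capped(take_step, max_steps=10):
    """Call take_step until it reports completion or the cap is reached."""
    for step in range(1, max_steps + 1):
        if take_step(step):
            return "finished", step
    return "hit_step_cap", max_steps


# A well-behaved agent that finishes on its third step.
print(run_capped(lambda step: step == 3))  # ('finished', 3)

# A stuck agent that never finishes: the cap stops it.
print(run_capped(lambda step: False))      # ('hit_step_cap', 10)
```

Returning a status alongside the step count lets the caller tell a completed answer apart from a run that was cut off and needs investigating.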

More steps cost more. Each step is a full model call, and the conversation grows every turn: you resend the whole message history, including all prior tool results, on every request. A three-step task states the goal once, but the model reprocesses that goal and the accumulating context on each of the three calls. Token usage (and latency, and dollars) climbs with the length of the trajectory. A task that takes ten steps doesn’t cost ten times one step; it can cost more, because later steps carry more context. This is why you reach for an agent when the flexibility is worth it, not for tasks a fixed workflow handles.
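
The arithmetic behind that claim can be sketched with a simple model. Assume, purely for illustration, that the first request carries 500 input tokens and that each completed step adds 200 tokens of assistant output and tool results to the history. Both figures are made up; real usage depends on the model, the tools, and the task.

```python
def total_input_tokens(steps, base=500, growth=200):
    """Sum input tokens over all requests when the history grows each step.

    base and growth are illustrative assumptions, not measured values.
    """
    return sum(base + growth * i for i in range(steps))


for n in (1, 3, 10):
    print(n, total_input_tokens(n))
# 1 500
# 3 2100
# 10 14000
```

Under these assumed numbers, ten steps process 28 times the input of a single step, not 10 times, because every later request carries the history of the earlier ones.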

Agents can take wrong paths. The model decides the plan, which means the model can plan badly. It might call the wrong tool, pass a malformed argument, compute the wrong expression, or — as we saw — interpret an ambiguous goal differently than you intended. Nothing guarantees the trajectory is correct; it’s just the path the model chose this time. Good tool descriptions, a clear system prompt, an unambiguous goal, and a step cap all reduce the odds of a wrong path, but they don’t eliminate it. Reading the trajectory — exactly what you just did — is how you catch it.
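
One common safeguard is to turn a tool failure into an observation the model can read, instead of letting the exception end the loop. The sketch below wraps a tool call and marks failures with the is_error field of a tool_result block. The names safe_tool_result and calculator, and the ids call_1 and call_2, are hypothetical placeholders for illustration.

```python
def calculator(expression):
    # Toy stand-in for the lesson's calculator tool.
    return str(eval(expression, {"__builtins__": {}}, {}))


def safe_tool_result(tool_use_id, fn, *args):
    """Build a tool_result block, reporting exceptions instead of raising."""
    try:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": fn(*args)}
    except Exception as exc:
        # The error text becomes the observation the model sees next turn.
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": f"{type(exc).__name__}: {exc}", "is_error": True}


ok = safe_tool_result("call_1", calculator, "12 * 12")
bad = safe_tool_result("call_2", calculator, "12 / 0")
print(ok["content"])    # 144
print(bad["is_error"])  # True
```

With the error fed back as a result, the model has a chance to correct its expression on the next turn, and the step cap still bounds how many times it can retry.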


Practice Exercises

Exercise 1: Tighten the system prompt

Suppose your agent keeps doing arithmetic in its head and occasionally gets it wrong. Which sentence of PLANNING_SYSTEM is meant to prevent that, and how would you make it even more forceful?

Hint

The relevant sentence is the last one: “Never invent prices or do arithmetic in your head when a calculator tool is available—always call the calculator for any computation.” To strengthen it, name the consequence and remove wiggle room — e.g. “Any answer containing a number you did not get from a tool result is invalid. Every total, difference, or product must come from a calculator call.” The system prompt is the cheapest place to fix planning behavior.

Exercise 2: Count the steps and the carry-forward

In the trajectory above, the agent made five tool calls across three model turns. Identify which values produced in one step were used as inputs in a later step.

Hint

Step 1’s get_price calls returned 8 (Pro) and 12 (Team). Both appear in step 2’s calculator expressions (8 * 12 * 3, 12 * 12, and the combined difference). Step 2’s results 288 and 144 then appear in step 3’s final answer. Each observation became the input to the next action, and that chaining is what makes it a plan rather than five unrelated calls.

Exercise 3: Force a longer chain

How would you change the goal so the agent has to retrieve three prices instead of two? What does that tell you about the relationship between the goal and the number of steps?

Hint

Ask it to compare all three plans, for example: “Rank Starter, Pro, and Team by yearly cost for a 3-person team.” Now the agent must call get_price three times before it can compute. The length of the trajectory is driven by the goal: a goal that references more facts forces more lookups and more calculations. Independent lookups can share a turn, as step 1 showed, so the number of model turns may not grow one-for-one, but more tool calls still mean more context to carry and more cost.


Summary

Real tasks rarely fit in a single tool call. An agent handles a multi-step goal by planning — working out the steps it needs and gathering each piece with a tool before answering — and you steer that planning primarily through the system prompt, which tells the agent to plan first and to use tools instead of guessing. You gave the agent a goal that needed two price lookups and several calculations, watched it produce a real three-step, five-call trajectory, and traced how the result of each step fed the next. You also saw the limits: step caps that stop runaway loops, the rising cost of longer trajectories, and the fact that a model-chosen plan can still take a wrong path — which is exactly why reading the trajectory matters.

Key Concepts

  • Planning — the agent deciding, at runtime, what steps a goal needs and in what order.
  • System prompt — the standing instructions in system= that shape how the agent plans and whether it uses tools.
  • Multi-step trajectory — the full chain of model turns and tool calls an agent takes to reach an answer.
  • Carry-forward — using the result of one step as the input to a later step; the hallmark of a real plan.
  • Step cap — a max_steps limit that bounds the loop so a misbehaving agent can’t run forever.

Why This Matters

Every serious agent — a coding assistant, a research tool, an analyst that pulls data and computes over it — succeeds or fails on its ability to plan a multi-step path and stay on it. Knowing how to encourage good planning with a system prompt, how to read the resulting trajectory, and where the practical limits lie (cost, step caps, wrong paths) is what separates an agent you can trust in production from a demo that works once. These are the skills you’ll lean on in the guided project next.


Next Steps

Continue to Lesson 5 - Guided Project: Research Assistant

Put planning, tools, and the agent loop together to build a research assistant that works through a real multi-step task.


Continue Building Your Skills

You’ve seen how a few sentences of system prompt can turn a model into an agent that plans a multi-step path, and how to read the trajectory it produces. Next you’ll combine everything from this module — the loop, tools, memory, and planning — into a guided project: a research assistant that works through a genuine multi-step task from start to finish.