Lesson 3 - ReAct: Reasoning and Acting

Welcome to ReAct: Reasoning and Acting

In Lesson 2 you taught the agent to plan before it acts — decompose a hard goal into an ordered list of steps, then carry them out. That plan-then-execute approach is powerful, but it commits to the whole route up front, before the agent has seen a single result. What if the first step’s result changes everything? A plan made in advance can’t react to what it learns along the way.

This lesson is the opposite move. ReAct — short for Reasoning and Acting — interleaves a reasoning step with each action, so the agent decides its next action only after seeing the last result. Think, act, observe, think again. The best part, which you already glimpsed in Lesson 1: ReAct needs no new engine. It’s the exact agent loop you built in Module 2, plus a system prompt that asks the model to think before each tool call, plus a little code to surface the thinking as a trace. In this lesson you’ll build that on top of your loop and watch Atlas — our travel agent — reason its way to a recommendation.

By the end of this lesson, you will be able to:

  • Explain how ReAct interleaves reasoning and acting, and how it differs from decomposition
  • Identify the “thought,” “action,” and “observation” inside the agent loop you already have
  • Write a run_react function that surfaces the reason-act-observe cycle as a trace
  • Recognize ReAct’s trade-off — flexible, but able to wander — and when to combine it with decomposition

Let’s start with the contrast that makes ReAct worth its own pattern.


Plan-All-Upfront vs. Decide-As-You-Go

Decomposition (Lesson 2) is plan-then-execute: the agent writes the whole list of steps first, then runs them in order. That’s exactly right when the steps are independent and predictable — “check the weather, convert the budget, draft the itinerary” doesn’t change shape based on what the weather turns out to be.

But many tasks do change shape. The right second move depends on what the first move returned. ReAct is built for that: it decides the next action only after seeing the last observation. That single difference is the whole point — ReAct adapts. If Atlas checks Kyoto’s weather and sees 16°C, clear, its next thought is “great for being outside, recommend it.” If the very same check had returned 3°C, snow, the next thought — and the next action — would be different: maybe suggest an indoor day, or check another city. Plan-then-execute can’t do that pivot, because it chose its steps before any result existed.

So the two patterns sit at opposite ends of a spectrum: decomposition plans everything in advance; ReAct plans one step at a time, using each observation to inform the next. Neither is “better” everywhere — they’re tools for different shapes of problem, and later you’ll see they combine.
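The contrast can be sketched without a model at all. In the minimal sketch below, plan_then_execute and decide_as_you_go are hypothetical names, the weather table mirrors the small toy table used in this lesson, and the hand-written if/else rule stands in for the model's reasoning. It illustrates only the difference in control flow, not a real agent.

```python
# Toy data standing in for a real weather service.
WEATHER = {"Kyoto": "16°C, clear", "Sapporo": "3°C, snow"}

def get_weather(city):
    return WEATHER.get(city, "unknown")

def plan_then_execute(city):
    """Fix the whole route first, then run it; results cannot change the steps."""
    plan = [("get_weather", city), ("recommend", "outdoor day")]  # chosen up front
    log = []
    for action, arg in plan:
        if action == "get_weather":
            log.append((action, arg, get_weather(arg)))
        else:
            log.append((action, arg, None))
    return log

def decide_as_you_go(city):
    """Pick each next step only after seeing the previous observation."""
    observation = get_weather(city)
    log = [("get_weather", city, observation)]
    # Hand-written rule standing in for the model's next thought (an assumption).
    if "snow" in observation:
        log.append(("recommend", "indoor day", None))
    else:
        log.append(("recommend", "outdoor day", None))
    return log

for city in ("Kyoto", "Sapporo"):
    print(city,
          "| fixed plan:", plan_then_execute(city)[-1][1],
          "| adaptive:", decide_as_you_go(city)[-1][1])
```

For Sapporo the fixed plan still ends on the outdoor recommendation it chose in advance, while the adaptive version changes its last step because it looked at the observation first.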


The Reason-Act-Observe Cycle

The mechanics of ReAct are a tight little loop you’ve already met by name in Lesson 1:

[Figure: The ReAct cycle. A Thought box ('To plan an outdoor day, I should check the weather') leads to an Action box (get_weather('Kyoto')), which leads to an Observation box ('16°C, clear'); a dashed arrow feeds the observation back so the result informs the next thought, and the cycle repeats until the agent has enough to give a final answer.]
Caption: ReAct interleaves reasoning and acting. The agent emits a thought, takes an action, observes the result, then thinks again, so each action is a deliberate choice informed by the last observation.

Here is the insight that makes this almost free to build. Recall that an assistant response from the agent loop can carry both a text block and a tool-use block in the same turn. In ReAct terms:

  • The text block is the Thought — the agent’s one-line reason for what it’s about to do.
  • The tool-use block is the Action — the actual tool call, with its arguments.
  • The tool result you feed back is the Observation — what the world returned.

Read that again, because it’s the entire trick: the loop already produces a thought and an action together, and already feeds the observation back as the next user message. So you don’t build a ReAct engine. You take the loop you have, ask the model (via the system prompt) to state a reason before each call, and surface the thought/action/observation as you go. No new machinery — just more visibility into the loop you already wrote.
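To see that mapping in isolation, here is a minimal sketch that splits one assistant turn into its thought and its actions. The SimpleNamespace objects are stand-ins for the SDK's content blocks and carry only the attributes the loop reads; split_turn and the id value "toolu_demo" are hypothetical, introduced just for this illustration.

```python
from types import SimpleNamespace

# Stand-ins for content blocks: a text block and a tool-use block in one turn.
turn = [
    SimpleNamespace(type="text",
                    text="To plan an outdoor day, I should check the weather."),
    SimpleNamespace(type="tool_use", id="toolu_demo", name="get_weather",
                    input={"city": "Kyoto"}),
]

def split_turn(content):
    """Return (thought, actions) from one assistant turn's content blocks."""
    # All text blocks joined together form the Thought.
    thought = "".join(b.text for b in content if b.type == "text").strip()
    # Each tool-use block is one Action: a tool name plus its arguments.
    actions = [(b.name, b.input) for b in content if b.type == "tool_use"]
    return thought, actions

thought, actions = split_turn(turn)
print("Thought:", thought)
for name, args in actions:
    print(f"Action: {name}({args})")
```

The Observation is the only piece missing here, because it does not come from the model: it is whatever your tool returns when you run the action.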


Building run_react

Let’s make it concrete with Atlas and a weather tool. First, the tool and the system prompt that drives the behavior. The one load-bearing line is the system prompt: it tells the model to think before each tool call, which is what turns a bare loop into a ReAct trace.

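# Toy lookup table standing in for a real weather API.
WEATHER = {"Kyoto": "16°C, clear", "Sapporo": "3°C, snow"}

def get_weather(city):
    # Cities missing from the table return "unknown" instead of raising.
    return WEATHER.get(city, "unknown")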

tools = [{"name": "get_weather", "description": "Current weather for a city",
          "input_schema": {"type": "object",
                           "properties": {"city": {"type": "string"}},
                           "required": ["city"]}}]

system = "Think before each tool call: state a one-line reason, then call the tool."
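The loop also needs a way to turn a tool name from the model into a Python call. One plausible shape, shown here as an assumption rather than the course's exact code, is a plain dict from tool name to callable. The dispatch helper is hypothetical and exists only to show the lookup and the argument unpacking.

```python
WEATHER = {"Kyoto": "16°C, clear", "Sapporo": "3°C, snow"}

def get_weather(city):
    return WEATHER.get(city, "unknown")

# Assumed shape: tool name -> callable. Keys must match the "name" in each tool schema.
tool_functions = {"get_weather": get_weather}

def dispatch(name, arguments):
    """Run one tool call; `arguments` is the dict the model supplies as input."""
    return str(tool_functions[name](**arguments))

print(dispatch("get_weather", {"city": "Kyoto"}))
print(dispatch("get_weather", {"city": "Osaka"}))
```

Because the arguments are unpacked as keyword arguments, the property names in input_schema have to match the function's parameter names (here, city).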

Now the orchestration. Look closely and you’ll see it is the agent loop from Module 2 — call the model, append the turn, check the stop reason, run tools, feed results back — with two additions: it pulls the text block out as the Thought and prints it, and it records every thought, action, and observation into a trace so you can inspect the reasoning afterward. That’s all ReAct adds.

def run_react(client, user_message, *, system, tools, tool_functions,
              model="claude-haiku-4-5", max_steps=6):
    messages = [{"role": "user", "content": user_message}]
    trace = []
    for step in range(1, max_steps + 1):
        response = client.messages.create(
            model=model, max_tokens=512, system=system, tools=tools, messages=messages)
        messages.append({"role": "assistant", "content": response.content})

        # The text block is the agent's "Thought"; record it in the trace and print it.
        thought = "".join(b.text for b in response.content if b.type == "text").strip()
        if thought:
            trace.append(("thought", thought))
            print(f"Thought: {thought}")

        # Any stop reason other than tool_use means the model is done; its text is the answer.
        if response.stop_reason != "tool_use":
            return {"answer": thought, "trace": trace, "steps": step}

        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            trace.append(("action", (block.name, block.input)))
            print(f"Action: {block.name}({block.input})")
            # Look up the Python function for this tool and call it with the model's arguments.
            fn = tool_functions[block.name]
            observation = str(fn(**block.input))
            trace.append(("observation", observation))
            print(f"Observation: {observation}")
            tool_results.append({"type": "tool_result", "tool_use_id": block.id,
                                 "content": observation})
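        # Feed all observations from this turn back as a single user message.
        messages.append({"role": "user", "content": tool_results})
    # The loop used up max_steps without the model giving a final answer.
    return {"answer": "Stopped: step limit.", "trace": trace, "steps": max_steps}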

If run_agent from Module 2 feels familiar here, that’s the point. The skeleton is identical: a for loop bounded by max_steps, a messages.create call, messages.append(...) of the assistant turn, a stop-reason check that exits on any non-tool_use response, and one user message carrying the tool results. The only new code is the trace list and the three trace.append / print pairs, which name what’s already happening: the thought, the action, the observation. ReAct is a lens on the loop, not a replacement for it.
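To exercise this control flow without an API key, one option is to swap the real client for a scripted one. The sketch below repeats run_react in compact form (prints removed) so the block runs on its own. ScriptedClient, the text and tool_use helpers, and the id "toolu_demo" are hypothetical stand-ins, and the canned turns are written by hand rather than produced by a model.

```python
from types import SimpleNamespace

WEATHER = {"Kyoto": "16°C, clear", "Sapporo": "3°C, snow"}

def get_weather(city):
    return WEATHER.get(city, "unknown")

def text(t):
    return SimpleNamespace(type="text", text=t)

def tool_use(name, args, id_="toolu_demo"):
    return SimpleNamespace(type="tool_use", id=id_, name=name, input=args)

class ScriptedClient:
    """Hypothetical stand-in for the API client: replays canned turns in order."""
    def __init__(self, turns):
        self._turns = iter(turns)
        self.messages = self  # so client.messages.create(...) resolves to create()

    def create(self, **kwargs):
        content, stop_reason = next(self._turns)
        return SimpleNamespace(content=content, stop_reason=stop_reason)

def run_react(client, user_message, *, system, tools, tool_functions,
              model="claude-haiku-4-5", max_steps=6):
    # Same control flow as the lesson's run_react, with the prints removed.
    messages = [{"role": "user", "content": user_message}]
    trace = []
    for step in range(1, max_steps + 1):
        response = client.messages.create(
            model=model, max_tokens=512, system=system, tools=tools, messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        thought = "".join(b.text for b in response.content if b.type == "text").strip()
        if thought:
            trace.append(("thought", thought))
        if response.stop_reason != "tool_use":
            return {"answer": thought, "trace": trace, "steps": step}
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            trace.append(("action", (block.name, block.input)))
            observation = str(tool_functions[block.name](**block.input))
            trace.append(("observation", observation))
            tool_results.append({"type": "tool_result", "tool_use_id": block.id,
                                 "content": observation})
        messages.append({"role": "user", "content": tool_results})
    return {"answer": "Stopped: step limit.", "trace": trace, "steps": max_steps}

# Two hand-written turns: one thought plus tool call, then a final thought.
client = ScriptedClient([
    ([text("To recommend an outdoor day I should check Kyoto's weather first."),
      tool_use("get_weather", {"city": "Kyoto"})], "tool_use"),
    ([text("16°C and clear is great for being outside, so I can recommend it.")],
     "end_turn"),
])
result = run_react(client, "Is tomorrow a good day to spend outdoors in Kyoto?",
                   system="Think before each tool call.", tools=[],
                   tool_functions={"get_weather": get_weather})
for kind, value in result["trace"]:
    print(f"{kind}: {value}")
print("steps:", result["steps"])
```

A scripted client only checks the plumbing: it shows that thoughts, actions, and observations land in the trace in the right order, not that a real model will reason well.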


Reading the Trace

Run Atlas against “Recommend whether tomorrow is a good day to spend outdoors in Kyoto,” and the loop prints its own reasoning as it goes. Here is the verified trace:

Thought: To recommend an outdoor day I should check Kyoto's weather first.
Action: get_weather({'city': 'Kyoto'})
Observation: 16°C, clear
Thought: 16°C and clear is great for being outside, so I can recommend it.

Read it as the cycle from the figure. The first Thought is the text block from Claude’s first turn — the agent stating why it’s about to act. The Action is the tool-use block from that same turn: get_weather with Kyoto. The Observation is what your code fed back: 16°C, clear. Then the loop comes around, the model sees that observation, and produces a second Thought — this time with no tool call, because it now has what it needs. That final turn’s stop reason is not tool_use, so run_react returns, and the last thought is the answer.
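Because run_react also returns the trace as a list of (kind, value) tuples, the same reading can be done in code. The sketch below writes the Kyoto trace out as literal data and groups it into cycles; group_cycles is a hypothetical helper, and starting a new cycle at each thought is one reasonable convention, not the only one.

```python
# The Kyoto trace as run_react would record it, written out as literal data.
trace = [
    ("thought", "To recommend an outdoor day I should check Kyoto's weather first."),
    ("action", ("get_weather", {"city": "Kyoto"})),
    ("observation", "16°C, clear"),
    ("thought", "16°C and clear is great for being outside, so I can recommend it."),
]

def group_cycles(trace):
    """Group a flat trace into cycles, starting a new cycle at each thought."""
    cycles = []
    for kind, value in trace:
        # Open a new cycle on every thought, or if the trace starts without one.
        if kind == "thought" or not cycles:
            cycles.append({"thought": None, "steps": []})
        if kind == "thought":
            cycles[-1]["thought"] = value
        else:
            cycles[-1]["steps"].append((kind, value))
    return cycles

for number, cycle in enumerate(group_cycles(trace), start=1):
    print(f"Cycle {number}: {cycle['thought']}")
    for kind, value in cycle["steps"]:
        print(f"  {kind}: {value}")
```

The second cycle has a thought and no steps, which is the signature of a final answer: the model reasoned and did not ask for another tool.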

Now picture the adaptivity that decomposition lacks. If the observation had been 3°C, snow instead, the model would not have reached “great for being outside.” Its next thought would react to the cold — perhaps recommending an indoor day, or checking a warmer city. The agent never committed to a conclusion in advance; it reasoned from the observation it actually got. That’s the defining behavior of ReAct, and it falls out naturally because the next thought is generated only after the last observation is in the messages list.

ReAct is a system prompt plus a trace, not a new loop

It’s worth saying plainly: run_react and run_agent are the same control flow. What makes one “ReAct” is the system prompt asking the model to state a reason before each tool call, and the trace that surfaces the thought/action/observation triple the loop was already producing. The text block was always the thought and the tool-use block was always the action — ReAct just names them and prints them. When you internalize that, you stop looking for a “ReAct framework” and start seeing it for what it is: good prompting on the loop you already have.


The Trade-Off: Flexible, but It Can Wander

ReAct’s strength is its weakness. Because it decides one step at a time, it adapts beautifully to surprises — but it also has no overall plan. With nothing committing it to a route, a ReAct agent can meander: re-checking things it already knows, chasing a tangent an observation suggested, or taking a long path to an answer a plan would have reached directly. Adaptivity and aimlessness are two sides of “decide as you go.”
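One symptom of wandering, re-checking something the agent already knows, is easy to spot in a trace. The sketch below is a minimal check under stated assumptions: repeated_actions is a hypothetical helper, and the trace is invented for illustration rather than taken from a real run.

```python
import json
from collections import Counter

def repeated_actions(trace):
    """Return the actions that appear more than once, with their counts."""
    counts = Counter()
    for kind, value in trace:
        if kind == "action":
            name, args = value
            # Dicts are unhashable, so serialize the arguments to build a key.
            counts[(name, json.dumps(args, sort_keys=True))] += 1
    return {key: n for key, n in counts.items() if n > 1}

# Invented trace of a wandering run: Kyoto is checked twice.
trace = [
    ("action", ("get_weather", {"city": "Kyoto"})),
    ("observation", "16°C, clear"),
    ("action", ("get_weather", {"city": "Sapporo"})),
    ("observation", "3°C, snow"),
    ("action", ("get_weather", {"city": "Kyoto"})),  # re-checks a known result
    ("observation", "16°C, clear"),
]
print(repeated_actions(trace))
```

A repeat is not always a mistake (some tools return fresh data each time), so a check like this is better used as a signal to inspect the trace than as a hard rule.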

That’s exactly why real-world agents often use both patterns together. Decompose the hard goal into a short list of steps first (Lesson 2’s plan-then-execute), giving the agent a backbone it won’t wander off — then run ReAct within each step, so the agent can still adapt to what it observes while staying on the overall route. Plan for direction; ReAct for adaptation. You’ll put exactly this combination to work in the guided project at the end of the module: decompose the trip into steps, and reason-act-observe your way through each one.
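The combination can be sketched as an outer loop over the planned steps with an adaptive run inside each one. In the minimal sketch below, run_plan and fake_step are hypothetical names. fake_step is a stub standing in for a ReAct run on one step (in practice, something like run_react), and the per-step budget of 3 is an arbitrary choice.

```python
def run_plan(steps, run_step, max_steps_per_step=3):
    """Run each planned step in order, giving each its own adaptive budget."""
    results = []
    for number, goal in enumerate(steps, start=1):
        # The plan fixes the order; run_step is free to adapt inside the step.
        outcome = run_step(goal, max_steps=max_steps_per_step)
        results.append((number, goal, outcome["answer"]))
    return results

def fake_step(goal, *, max_steps):
    # Stub: a real version would reason, act, and observe until it has an answer.
    return {"answer": f"done: {goal}", "steps": 1}

plan = ["check the weather", "convert the budget", "draft the itinerary"]
for number, goal, answer in run_plan(plan, fake_step):
    print(number, goal, "->", answer)
```

Capping max_steps per step is what keeps the backbone intact: a step that starts to wander runs out of budget and hands control back to the plan.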


Practice Exercises

Exercise 1: What if it had snowed?

In the verified trace, get_weather('Kyoto') returned 16°C, clear, and the next thought was “great for being outside, so I can recommend it.” Suppose the observation had instead been 3°C, snow. What might the next thought and action be — and what does that show about ReAct?

Hint

The next thought would react to the cold rather than recommend the outdoors — e.g. “3°C and snowing is poor for being outside; I should suggest an indoor day” (possibly followed by a different action, like checking a warmer city). The point is adaptivity: ReAct chooses the next move from the observation it actually got, so the same starting question produces a different path depending on the result. A plan made up front couldn’t make that pivot.

Exercise 2: Why no new loop machinery?

ReAct adds a “thought” before every action — yet run_react reuses the agent loop almost verbatim, adding no new control flow. Why does ReAct need no new engine?

Hint

Because the loop already produces everything ReAct needs. An assistant turn carries both a text block (the thought) and a tool-use block (the action), and the loop already feeds the tool result back as the next observation. So you get reason-act-observe for free by (a) prompting the model in the system prompt to state a reason before each call, and (b) surfacing the thought/action/observation as a trace. The only additions to run_agent are the print/trace lines that name what was already there.

Exercise 3: Decompose, ReAct, or both?

For each, say whether you’d reach first for decomposition, ReAct, or a combination, and why: (a) a fixed checklist whose steps never change; (b) a task where each step’s next move depends heavily on what the last step returned; (c) a long, multi-stage trip-planning goal you want to finish reliably without wandering.

Hint

(a) Decomposition — predictable, independent steps suit plan-then-execute; there’s nothing to adapt to. (b) ReAct — when the next move depends on the last result, you need to decide-as-you-go. (c) Both — decompose the goal into a short ordered plan so the agent has a backbone and won’t wander, then run ReAct within each step so it can still adapt to what it observes. Plan for direction, ReAct for adaptation.


Summary

ReAct interleaves reasoning with acting: the agent emits a thought (one-line reason), takes an action (a tool call), sees an observation (the tool result), and thinks again — deciding the next action only after seeing the last result. That makes it adaptive, the very thing plan-then-execute decomposition (Lesson 2) lacks: change the observation and the next thought and action change with it. Crucially, ReAct needs no new machinery — it’s your existing agent loop, where the text block is the thought, the tool-use block is the action, and the tool result is the observation. run_react is run_agent plus a system prompt asking the model to reason before each call, plus a trace that surfaces the cycle. The trade-off: ReAct is flexible but can wander without an overall plan, which is why production agents often decompose first, then ReAct within each step — direction from the plan, adaptation from the cycle.

Key Concepts

  • ReAct — reason interleaved with act; decide the next action after seeing the last observation.
  • Adaptivity — ReAct pivots on results; plan-then-execute (Lesson 2) commits up front and can’t.
  • No new engine — text block = thought, tool-use block = action, tool result = observation; it’s the loop you have.
  • The change — a “think first” system prompt plus a trace; that’s all run_react adds to run_agent.
  • Trade-off and combination — flexible but can wander; decompose for direction, ReAct within each step for adaptation.

Why This Matters

ReAct is the pattern most people picture when they hear “agent that reasons” — and the most empowering realization here is that you already built its engine. Once you see that a thought is just a text block and an action is just a tool-use block, the whole pattern stops being a framework to learn and becomes a prompting move you control. That clarity also tells you when to use it: reach for ReAct when the next step truly depends on the last result, lean on decomposition when the route is predictable, and combine them when a task is both long and surprising. Next you’ll add the third planning pattern — reflection — teaching the agent to critique and revise its own output before it calls the job done.


Next Steps

Continue to Lesson 4 - Reflection and Self-Correction

Teach the agent to critique its own output and revise it before finishing — planning after acting.



Continue Building Your Skills

You now have ReAct working on top of your agent loop — a system prompt that asks for a thought before each action, and a trace that surfaces the reason-act-observe cycle so every tool call is a deliberate, result-informed choice. You’ve also seen its trade-off: adaptive, but able to wander without a plan. Next you’ll build the third and final pattern of this module — reflection — where the agent steps back, critiques what it produced, and corrects itself before declaring the task done.