Lesson 3 - Retrieval as a Tool

Welcome to Retrieval as a Tool

In Lesson 2 you built a knowledge base: chunk documents into passages, embed them, and search them by similarity. Now you have to decide when the agent retrieves. There are two ways, and the difference matters. The first is to always retrieve before answering — every question runs through search first, then the model answers from whatever came back. That’s a fixed pipeline, and it’s exactly what you’ll build in Lesson 4. The second is to hand the agent a search_knowledge tool and let it decide, mid-loop, whether and when to look something up. That’s agentic RAG, and it’s what this lesson is about.

The shift is small in code but large in capability. When retrieval is a tool, it becomes just another action in the agent loop you already wrote — the model can skip it, call it once, or call it several times with refined queries, all on its own judgment. This lesson builds the tool, plugs it into the canonical run_agent loop, and walks a verified trace where the agent chooses to retrieve and cites the source it found.

By the end of this lesson, you will be able to:

  • Distinguish always-retrieve (a fixed pipeline) from agentic RAG (retrieval as a tool the agent triggers)
  • Build a search_knowledge tool that wraps kb.search and returns formatted passages
  • Wire that tool into the canonical run_agent loop and read a real retrieval trace
  • Explain why making retrieval a tool lets the agent skip, repeat, or combine lookups

Let’s give the agent a way to look things up on its own.


Two Ways to Do RAG

A retrieval-augmented agent has to answer one question: who decides when to retrieve? You have two choices.

  • Always retrieve (a fixed pipeline). Before the model ever sees the question, your code runs kb.search, stuffs the top passages into the prompt, and asks the model to answer from them. Retrieval is mandatory and happens exactly once. Simple, predictable, and great when every question genuinely needs the knowledge base. That’s Lesson 4.
  • Retrieval as a tool (agentic RAG). You give the model a search_knowledge tool and let it decide. If the question is “what’s 2+2,” it just answers — no wasted lookup. If it’s “when should I visit Arashiyama,” it calls the tool, reads the passages, and answers from them. The model chooses, mid-loop, the same way it chooses any other tool.

This lesson teaches the second. The whole trick is that you already have the machinery: a knowledge base with a search method (Lesson 2) and an agent loop that runs tools (Module 2). Retrieval becomes a tool, and the loop does the rest.
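
The difference between the two patterns comes down to where the search call lives. The sketch below is a minimal illustration under stated assumptions, not the course's implementation: toy_search is a hypothetical keyword-overlap stand-in for kb.search, search_log exists only to count how often search runs, and the tool schema is abbreviated.

```python
# Hypothetical stand-ins: toy_search mimics kb.search from Lesson 2 with crude
# keyword overlap, and search_log only counts how often search actually runs.
PASSAGES = [
    ("kyoto-guide",
     "Arashiyama and the bamboo grove are a top autumn destination, "
     "best visited early morning to avoid crowds."),
]
search_log = []

def toy_search(query, k=2):
    """Return up to k (source, text, score) tuples that share a word with the query."""
    search_log.append(query)
    words = set(query.lower().replace("?", "").split())
    hits = []
    for src, txt in PASSAGES:
        passage_words = set(txt.lower().replace(",", "").replace(".", "").split())
        overlap = words & passage_words
        if overlap:
            hits.append((src, txt, len(overlap)))
    return sorted(hits, key=lambda h: h[2], reverse=True)[:k]

def always_retrieve_request(question):
    """Fixed pipeline: your code searches first, for every question."""
    hits = toy_search(question)
    context = "\n".join(f"[{src}] {txt}" for src, txt, _ in hits) or "(no passages)"
    return {"messages": [{"role": "user",
                          "content": f"Context:\n{context}\n\nQuestion: {question}"}]}

def agentic_request(question, tools):
    """Agentic RAG: nothing is searched up front; the model is only offered the tool."""
    return {"tools": tools, "messages": [{"role": "user", "content": question}]}

tools = [{"name": "search_knowledge"}]  # schema abbreviated for this sketch

always_retrieve_request("what's 2+2")   # search runs even though it cannot help
agentic_request("what's 2+2", tools)    # no search here; the model may just answer
print(len(search_log))  # 1
```

In the fixed pipeline the search happens in your code before the request is built; in the agentic version the request carries only the tool offer, and any search happens later, inside the loop, if the model asks for it.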


Building the search_knowledge Tool

A tool is two things: a Python function that does the work, and a schema that tells the model the tool exists and when to use it. Here’s the function — it wraps kb.search(query, k=2) and formats the hits into a string the model can read:

def search_knowledge(query):
    hits = kb.search(query, k=2)
    if not hits:
        return "No relevant passages found."
    return "\n".join(f"[{src}] {txt}" for src, txt, _ in hits)

That’s the entire tool. It takes a query, searches the knowledge base for the two best passages, and returns them as text — one line per passage, each tagged with its source like [kyoto-guide] Arashiyama and the bamboo grove are a top autumn destination.... If nothing matches, it returns "No relevant passages found." so the model gets an honest empty result instead of a crash. The source tag in front of each passage is deliberate: it’s what lets the model cite where a fact came from in its final answer.
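
To see both return shapes without the Lesson 2 knowledge base, the function can be run against a stub. This is a sketch under assumptions: StubKB is hypothetical, the score 0.82 is made up, and the (source, text, score) tuple shape is inferred from the unpacking in search_knowledge rather than confirmed from kb's definition.

```python
class StubKB:
    """Hypothetical stand-in for the Lesson 2 knowledge base."""
    def __init__(self, passages):
        self.passages = passages  # list of (source, text, score) tuples

    def search(self, query, k=2):
        # Crude rule for the demo: only queries mentioning Arashiyama match.
        if "arashiyama" not in query.lower():
            return []
        return self.passages[:k]

kb = StubKB([
    ("kyoto-guide",
     "Arashiyama and the bamboo grove are a top autumn destination, "
     "best visited early morning to avoid crowds.",
     0.82),  # made-up similarity score
])

def search_knowledge(query):
    hits = kb.search(query, k=2)
    if not hits:
        return "No relevant passages found."
    return "\n".join(f"[{src}] {txt}" for src, txt, _ in hits)

hit_text = search_knowledge("Arashiyama best time")
miss_text = search_knowledge("Lisbon tram routes")
print(hit_text)   # one line, starting with [kyoto-guide]
print(miss_text)  # No relevant passages found.
```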

Now the schema — the dictionary that describes the tool to the model:

tools = [{"name": "search_knowledge",
          "description": "Search the travel knowledge base for relevant passages. "
                         "Use before answering questions about destinations.",
          "input_schema": {"type": "object",
                           "properties": {"query": {"type": "string"}},
                           "required": ["query"]}}]

The name matches the function. The input_schema says the tool takes one string argument, query. But the load-bearing field is the description: it is how the model knows when to reach for this tool. “Search the travel knowledge base… Use before answering questions about destinations” states the trigger condition in plain language. A vague description (“searches stuff”) leaves the model guessing; a precise one (“use before answering questions about destinations”) makes it more likely to retrieve at the right moments and to skip retrieval when the question doesn’t need it. Alongside the system prompt, the description is your main lever on tool-selection behavior, so write it like an instruction.
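
Because the loop dispatches by name, a schema whose name has no matching function only fails once the model calls it. One way to catch that earlier is a small pre-flight check. The helpers below are hypothetical (they are not part of the lesson or the SDK) and cover only the two contracts described above: name matches function, and required arguments are present.

```python
def search_knowledge(query):
    # Placeholder body; the real tool wraps kb.search.
    return "No relevant passages found."

tools = [{"name": "search_knowledge",
          "description": "Search the travel knowledge base for relevant passages. "
                         "Use before answering questions about destinations.",
          "input_schema": {"type": "object",
                           "properties": {"query": {"type": "string"}},
                           "required": ["query"]}}]

def missing_tool_functions(tools, tool_functions):
    """Return schema names that have no matching callable in tool_functions."""
    return [t["name"] for t in tools
            if not callable(tool_functions.get(t["name"]))]

def missing_required_args(tool, tool_input):
    """Return required argument names absent from a proposed tool input."""
    required = tool["input_schema"].get("required", [])
    return [name for name in required if name not in tool_input]

print(missing_tool_functions(tools, {"search_knowledge": search_knowledge}))  # []
print(missing_tool_functions(tools, {"search_knowlege": search_knowledge}))   # ['search_knowledge']
print(missing_required_args(tools[0], {}))                                    # ['query']
```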

Figure: Retrieval as a tool. The agent (Claude) asks “do I need to look this up?” and, if so, emits a tool_use block calling search_knowledge with a query. The tool embeds the query, ranks passages by similarity, and reads from the knowledge base (documents, chunked and embedded). The retrieved passages, each tagged with its source (for example [kyoto-guide]), flow back as a tool_result, and the agent answers from them with the source cited inline. Because retrieval is just another action in the loop, the agent can skip it, repeat it, or combine it with other tools.

Wiring It Into the Loop

The tool plugs straight into the canonical run_agent loop from Module 2 — the same function, unchanged. It calls the model with the tools, and while the stop reason is tool_use, it runs the requested tool and feeds the result back:

def run_agent(client, user_message, *, system, tools, tool_functions,
              model="claude-haiku-4-5", max_steps=6):
    # The conversation starts with just the user's question.
    messages = [{"role": "user", "content": user_message}]
    for step in range(1, max_steps + 1):
        response = client.messages.create(
            model=model, max_tokens=512, system=system, tools=tools, messages=messages)
        # Record the assistant turn (text and any tool_use blocks) in the history.
        messages.append({"role": "assistant", "content": response.content})
        # Any stop reason other than tool_use ends the run: return the text produced.
        if response.stop_reason != "tool_use":
            final = "".join(b.text for b in response.content if b.type == "text")
            return {"answer": final, "steps": step, "messages": messages}
        # Run every requested tool and pair each result with its tool_use id.
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            fn = tool_functions[block.name]  # dispatch by name
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": str(fn(**block.input))})
        # Tool results go back to the model as the next user message.
        messages.append({"role": "user", "content": results})
    # Still requesting tools after max_steps: stop rather than loop forever.
    return {"answer": "Stopped: step limit.", "steps": max_steps, "messages": messages}

Nothing here knows it’s doing retrieval — search_knowledge is just an entry in tool_functions, dispatched by name like any other tool. You call it like this:

out = run_agent(client, "When should I visit Arashiyama?",
                system="Search the knowledge base before answering; cite the source.",
                tools=tools, tool_functions={"search_knowledge": search_knowledge})

Now watch what happens. The question is “When should I visit Arashiyama?” Here is the verified trace, step by step:

  1. Step 1 — the agent thinks, then acts. Claude’s first turn contains a text block — “Let me check the knowledge base for Arashiyama timing.” — and a tool_use block: search_knowledge({"query": "Arashiyama bamboo grove best time"}). The stop reason is tool_use, so the loop runs the tool. Notice the model rewrote the query: the user said “Arashiyama,” the model searched “Arashiyama bamboo grove best time,” a richer query for the knowledge base.
  2. The tool retrieves real passages. search_knowledge runs kb.search and returns the actual indexed passage: [kyoto-guide] Arashiyama and the bamboo grove are a top autumn destination, best visited early morning to avoid crowds. That string goes back as a tool_result.
  3. Step 2 — the agent answers from what it found. With the retrieved passage now in messages, Claude returns a final answer with an end_turn stop reason: “Visit Arashiyama and the bamboo grove early morning to avoid crowds [kyoto-guide].” The loop returns. steps == 2: one model call to request retrieval, one to answer.

The final answer carries [kyoto-guide] because the retrieved passage was tagged with its source — the model cited what it actually read.

What was verified, and what’s illustrative

There’s no ANTHROPIC_API_KEY in this environment, so live Claude calls can’t run. The orchestration (the run_agent loop, tool dispatch, and the real retrieval via kb.search over the real embedded passages) was verified against a mock client mirroring the Anthropic SDK surface. So the retrieved passage ("…best visited early morning to avoid crowds"), the [kyoto-guide] source tag, and steps == 2 are real and checked. Everything the model itself would generate is illustrative, because the mock replays scripted responses: the “Let me check…” thought, the refined search query, and the final sentence. With a live model, the phrasing, and even the decision to search, can vary run to run. The Claude API code follows the SDK’s Messages API; point it at a real client with a key and it should run unchanged.
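
A mock along the following lines is enough to reproduce the two-step trace offline. It is a hypothetical sketch, not the mock used for the verification described above: MockClient, StubKB, the block ids, and the 0.82 score are all invented here, and the knowledge base is stubbed with one passage rather than the real embedded index. The run_agent and search_knowledge bodies are copied from the lesson.

```python
from types import SimpleNamespace

class StubKB:
    """Hypothetical stand-in for the Lesson 2 knowledge base: one fixed passage."""
    def search(self, query, k=2):
        passage = ("kyoto-guide",
                   "Arashiyama and the bamboo grove are a top autumn destination, "
                   "best visited early morning to avoid crowds.",
                   0.82)  # made-up similarity score
        return [passage][:k]

kb = StubKB()

def search_knowledge(query):
    hits = kb.search(query, k=2)
    if not hits:
        return "No relevant passages found."
    return "\n".join(f"[{src}] {txt}" for src, txt, _ in hits)

def run_agent(client, user_message, *, system, tools, tool_functions,
              model="claude-haiku-4-5", max_steps=6):
    messages = [{"role": "user", "content": user_message}]
    for step in range(1, max_steps + 1):
        response = client.messages.create(
            model=model, max_tokens=512, system=system, tools=tools, messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            final = "".join(b.text for b in response.content if b.type == "text")
            return {"answer": final, "steps": step, "messages": messages}
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            fn = tool_functions[block.name]
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": str(fn(**block.input))})
        messages.append({"role": "user", "content": results})
    return {"answer": "Stopped: step limit.", "steps": max_steps, "messages": messages}

class MockClient:
    """Replays scripted responses through the same client.messages.create call."""
    def __init__(self, scripted):
        self._scripted = list(scripted)
        self.messages = SimpleNamespace(create=self._create)

    def _create(self, **kwargs):
        # Ignores the request entirely and returns the next scripted response.
        return self._scripted.pop(0)

scripted = [
    SimpleNamespace(stop_reason="tool_use", content=[
        SimpleNamespace(type="text",
                        text="Let me check the knowledge base for Arashiyama timing."),
        SimpleNamespace(type="tool_use", id="toolu_01", name="search_knowledge",
                        input={"query": "Arashiyama bamboo grove best time"}),
    ]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text",
                        text="Visit Arashiyama and the bamboo grove early morning "
                             "to avoid crowds [kyoto-guide]."),
    ]),
]

out = run_agent(MockClient(scripted), "When should I visit Arashiyama?",
                system="Search the knowledge base before answering; cite the source.",
                tools=[],  # the mock never reads the schema
                tool_functions={"search_knowledge": search_knowledge})
print(out["steps"])                                 # 2
print(out["messages"][2]["content"][0]["content"])  # the [kyoto-guide] passage
```

The scripted responses make the boundary explicit: the loop, the dispatch, and the tool's return value are exercised for real, while the thought, the query, and the final sentence are whatever the script says.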


Why Retrieval as a Tool Is Powerful

Making retrieval a tool isn’t just a tidy refactor — it unlocks behavior a fixed pipeline can’t do, because retrieval is now an action the agent reasons about:

  • It can skip retrieval. Ask “what’s 2+2” and the model can simply answer, with no pointless search, as long as the system prompt and tool description don’t demand a lookup for every question. A fixed pipeline retrieves for every question whether it needs to or not.
  • It can retrieve multiple times with refined queries. If the first search comes back thin, the model can call search_knowledge again with a better query, just as the first call in the trace searched “Arashiyama bamboo grove best time” rather than the bare “Arashiyama.”
  • It can combine retrieval with other tools. Retrieval sits in the same tool_functions dict as get_weather or convert_currency. The model can search the knowledge base and check the weather in one run, interleaving them however the task demands.
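
The third point can be sketched with the same dispatch logic the loop uses. Everything tool-specific here is an assumption: get_weather's signature and its forecast string are made up for illustration, search_knowledge returns a canned passage instead of calling kb.search, and the assistant turn is hand-built rather than produced by a model.

```python
from types import SimpleNamespace

def search_knowledge(query):
    # Canned result standing in for the real kb.search-backed tool.
    return ("[kyoto-guide] Arashiyama and the bamboo grove are a top autumn "
            "destination, best visited early morning to avoid crowds.")

def get_weather(city):
    # Hypothetical tool: the lesson names get_weather but does not show its signature.
    return f"Forecast for {city}: clear, 18C (made-up sample data)"

tool_functions = {"search_knowledge": search_knowledge, "get_weather": get_weather}

def run_tool_calls(content, tool_functions):
    """Mirror of the loop's inner dispatch: one tool_result per tool_use block."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue
        fn = tool_functions[block.name]
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": str(fn(**block.input))})
    return results

# A hand-built assistant turn that requests both tools at once.
content = [
    SimpleNamespace(type="text", text="I'll check the guide and the forecast."),
    SimpleNamespace(type="tool_use", id="toolu_a", name="search_knowledge",
                    input={"query": "Arashiyama best time"}),
    SimpleNamespace(type="tool_use", id="toolu_b", name="get_weather",
                    input={"city": "Kyoto"}),
]
results = run_tool_calls(content, tool_functions)
print([r["tool_use_id"] for r in results])  # ['toolu_a', 'toolu_b']
```

The dispatch code has no retrieval-specific branch; adding a tool means adding a schema and a dict entry.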

All of this falls out of one fact: retrieval is just another action in the loop. And the loop is exactly the ReAct shape from Module 5 — the text block is the Thought (“Let me check the knowledge base”), the tool_use is the Action (search_knowledge(...)), and the tool_result is the Observation (the retrieved passage). The agent thinks, acts, observes, and reasons on. You designed search_knowledge with the tool-description discipline from Modules 2 and 3; the loop runs it with the control flow from Module 2; and the result is an agent that retrieves on its own judgment. The one thing this lesson doesn’t enforce is grounding discipline — answering only from retrieved text and refusing when there’s nothing. That’s Lesson 4.
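
The Thought / Action / Observation mapping can be read straight off the messages list that run_agent returns. The helper below is a hypothetical sketch: label_react is not part of the lesson, the transcript is hand-built to match the trace's shape, and the "Answer" label for a final text-only turn is a naming choice made here.

```python
from types import SimpleNamespace

def label_react(messages):
    """Label each block in a run_agent-style transcript with its ReAct role."""
    labels = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            continue  # the opening user question is plain text
        if message["role"] == "user":
            # Tool results are plain dicts appended by the loop.
            labels += [("Observation", b["content"])
                       for b in content if b["type"] == "tool_result"]
            continue
        # Assistant turn: text next to a tool call is a Thought,
        # text on its own is the final Answer.
        calls_tool = any(b.type == "tool_use" for b in content)
        for b in content:
            if b.type == "text":
                labels.append(("Thought" if calls_tool else "Answer", b.text))
            elif b.type == "tool_use":
                labels.append(("Action", f"{b.name}({b.input})"))
    return labels

passage = ("[kyoto-guide] Arashiyama and the bamboo grove are a top autumn "
           "destination, best visited early morning to avoid crowds.")
transcript = [
    {"role": "user", "content": "When should I visit Arashiyama?"},
    {"role": "assistant", "content": [
        SimpleNamespace(type="text",
                        text="Let me check the knowledge base for Arashiyama timing."),
        SimpleNamespace(type="tool_use", id="toolu_01", name="search_knowledge",
                        input={"query": "Arashiyama bamboo grove best time"}),
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": passage},
    ]},
    {"role": "assistant", "content": [
        SimpleNamespace(type="text",
                        text="Visit Arashiyama and the bamboo grove early morning "
                             "to avoid crowds [kyoto-guide]."),
    ]},
]
labels = label_react(transcript)
print([role for role, _ in labels])  # ['Thought', 'Action', 'Observation', 'Answer']
```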


Practice Exercises

Exercise 1: Tool RAG vs. fixed pipeline

A teammate says “let’s always run search before the model answers — simpler than a tool.” For a bot that answers only product questions, that’s reasonable. When does the tool approach win instead?

Hint

The tool wins when not every question needs the knowledge base. A general assistant that handles math, chit-chat, and product questions shouldn’t retrieve for “what’s 2+2” — that’s wasted search and irrelevant passages cluttering the prompt. Let the model decide via the tool. A fixed pipeline wins only when you know every question needs retrieval; then mandatory search is simpler and more predictable (that’s Lesson 4).

Exercise 2: Why does the description matter so much?

The search_knowledge schema says “Use before answering questions about destinations.” Why is that sentence — not the function name or the input schema — the thing that controls when the agent retrieves?

Hint

The model never sees your Python; it only sees the tool’s name, description, and input_schema. The schema says how to call the tool (one string, query), and the name is a short label. The description is where you state when to call it. A vague description makes the model retrieve at the wrong moments or skip retrieval when it shouldn’t; a precise “use before answering questions about destinations” sets a clear trigger. Together with the system prompt, the description is your most direct lever on tool-selection behavior.

Exercise 3: Map the trace onto ReAct

In the verified run, the agent produced a text block, then a tool_use, then (after the tool ran) a final answer. Which ReAct role — Thought, Action, Observation — is each piece?

Hint

The text block “Let me check the knowledge base for Arashiyama timing” is the Thought. The tool_use block search_knowledge({"query": "Arashiyama bamboo grove best time"}) is the Action. The tool_result carrying the retrieved passage ("…best visited early morning to avoid crowds") is the Observation. The final answer is the agent reasoning on the observation. Agentic RAG is ReAct with retrieval as one of the available actions.


Summary

There are two ways to do RAG: always retrieve before answering (a fixed pipeline, Lesson 4) or expose retrieval as a search_knowledge tool the agent calls on its own judgment — agentic RAG. You built the tool as a function that wraps kb.search(query, k=2) and returns formatted, source-tagged passages (or “No relevant passages found.”), plus a schema whose description tells the model when to use it. Wired into the canonical run_agent loop, the tool is dispatched by name like any other. In the verified trace for “When should I visit Arashiyama?”, the agent emitted a Thought and a tool_use (with a refined query), the tool returned the real [kyoto-guide] passage, and the agent answered citing the source — steps == 2, verified against an SDK-shaped mock. Because retrieval is now just an action in the loop, the agent can skip it, repeat it with better queries, or combine it with other tools.

Key Concepts

  • Two RAG patterns — always-retrieve (fixed pipeline) vs. retrieval-as-a-tool (agentic, the model decides).
  • search_knowledge tool — wraps kb.search, returns source-tagged passages or an honest empty result.
  • The description is the trigger: it tells the model when to retrieve, and it’s your main lever on tool selection.
  • Retrieval is an action in the loop — Thought (text) → Action (tool_use) → Observation (tool_result), the ReAct shape from Module 5.

Why This Matters

Many production agents expose retrieval as a tool rather than a fixed pipeline, because real assistants handle a mix of questions, only some of which need the knowledge base. Giving the agent a search_knowledge tool lets it retrieve when it should, skip when it shouldn’t, and refine its query when the first search falls short, all without you hard-coding when to look. And because it’s just another tool in the loop, it composes with everything else the agent can do. What’s still missing is discipline: nothing yet forces the agent to answer only from what it retrieved or to refuse when the knowledge base comes up empty. That’s the grounding and citation work you’ll build next.


Next Steps

Continue to Lesson 4 - Grounding and Citations

Make the agent answer only from retrieved sources, cite every claim, and refuse honestly when the knowledge base has nothing relevant.


Continue Building Your Skills

You now have an agent that retrieves on its own judgment: a search_knowledge tool that wraps your knowledge base, a description that tells the model when to use it, and the canonical loop that dispatches it like any other action. Next you’ll add the discipline that makes it trustworthy — grounding answers strictly in retrieved passages, citing each source, and refusing when retrieval comes back empty.