Lesson 5 - Guided Project: Research Assistant

Welcome to the Research Assistant Project

This is where every piece of the module comes together. Over the last four lessons you built an agent loop, learned to give an agent more than one tool, wired a vector database into a retrieval step, and wrote a planning system prompt that lets the model decide its own sequence of actions. In this project you’ll assemble all of it into a single thing that feels genuinely useful: a research assistant that answers real questions about a document. You give it a goal — “how much does a year of Pro cost?” — and it works out the path on its own: search the handbook, find the price, do the arithmetic, and answer. No fixed pipeline, no hand-written “retrieve then compute.” The model drives.

You’ll work over a real document — the Acme Cloud Notes Product Handbook. Download it from https://datatweets.com/datasets/product-handbook.md and save it next to your script as product-handbook.md.

By the end of this project, you will be able to:

  • Build a knowledge-base search tool backed by a persistent vector database
  • Combine multiple tools (retrieval, computation) behind a single agent loop
  • Write a planning system prompt that makes the agent decide and chain its own steps
  • Run real multi-step research questions and read the agent’s full trajectory

Let’s build it stage by stage. Everything below was run against the live Claude API, and the output is what came back, with long lines trimmed to an ellipsis in places.


Stage 1: Build the Knowledge Base and the Search Tool

An agent is only as good as what it can look up. Our first job is to turn the handbook into something searchable. We’ll chunk the document by its section headings, store each chunk in a persistent Chroma collection, and wrap a query into a search_docs function. Chroma computes embeddings locally, so this needs no API key and nothing leaves your machine.

import os, re
os.environ["ANONYMIZED_TELEMETRY"] = "False"

import chromadb

def load_chunks(path):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # one chunk per "## " section of the handbook
    parts = re.split(r"\n(?=## )", text)
    return [p.strip() for p in parts if p.strip().startswith("## ")]

chunks = load_chunks("product-handbook.md")
print("number of chunks:", len(chunks))
for c in chunks:
    print(" -", c.splitlines()[0])

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("handbook")
if collection.count() == 0:
    collection.add(
        documents=chunks,
        ids=[f"chunk-{i}" for i in range(len(chunks))],
    )
print("collection count:", collection.count())
number of chunks: 7
 - ## Accounts and Sign-In
 - ## Plans and Billing
 - ## Syncing and Offline Access
 - ## Sharing and Collaboration
 - ## Exporting and Importing
 - ## Data, Privacy, and Deletion
 - ## Support
collection count: 7

Seven clean chunks, one per topic. Chunking on headings keeps each piece self-contained: a query about billing pulls back the whole Plans and Billing section, not a sentence fragment torn out of context. Now the search tool itself. It runs a similarity query and joins the most relevant chunks into a single string, because the loop passes each result back as the string content of a tool_result, as you saw in Lesson 2.

def search_docs(query, k=2):
    res = collection.query(query_texts=[query], n_results=k)
    return "\n\n---\n\n".join(res["documents"][0])

print(search_docs("how much does the Pro plan cost", k=1)[:200])
## Plans and Billing

Acme Cloud Notes offers three plans. The Starter plan is free forever and includes up to 100 notes and 1 GB of storage. The Pro plan costs 8 dollars per month and raises those li

The vector search found the billing section from a question that never used the word “billing.” That’s the payoff of retrieval over keyword matching: the agent can ask in plain language and still land on the right passage.
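To see why the heading split in load_chunks yields exactly one chunk per section, it can help to run the same regular expression on a tiny inline document. The sketch below uses a made-up miniature handbook (the sample text is an assumption for illustration, not part of the real file) and needs only the standard library.

```python
import re

# Hypothetical miniature handbook, for illustration only.
sample = """# Mini Handbook

Intro text that belongs to no section.

## Plans
Pro costs 8 dollars per month.

## Support
Email support replies within two days.
"""

# Split at every newline that is immediately followed by "## ".
# The lookahead (?=## ) is not consumed, so each heading stays with its own chunk.
parts = re.split(r"\n(?=## )", sample)

# Keep only pieces that begin with a section heading; this drops the preamble.
chunks = [p.strip() for p in parts if p.strip().startswith("## ")]

for c in chunks:
    print(c.splitlines()[0], "|", len(c.splitlines()), "lines")
```

The title and intro paragraph are discarded because they do not start with "## ", and a "### " subheading would stay inside its parent section because the pattern requires a space right after the second "#".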


Stage 2: Define the Tools and the Planning Prompt

The search tool covers “what does the handbook say?” But a real research question often needs more, such as arithmetic the model shouldn’t do in its head. So we add a calculator, using the restricted-eval pattern from earlier lessons (eval with builtins removed, which limits what an expression can reach but is not a true sandbox), and a small list_topics helper so the agent can see what the handbook even covers.

def calculator(expression):
    # eval with builtins removed: fine for a lesson, but not a real sandbox
    return str(eval(expression, {"__builtins__": {}}, {}))

def list_topics():
    return ", ".join(c.splitlines()[0].replace("## ", "") for c in chunks)
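The calculator above still hands model-written text to eval, and removing builtins does not stop every escape (attribute access on literals can still reach arbitrary classes). If you want something stricter, one option is to parse the expression and evaluate only arithmetic nodes. The sketch below is an alternative, not the lesson's implementation; the name safe_calculator and the chosen operator set are assumptions.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    # Plain int/float literals only (type() check also excludes True/False).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression):
    """Evaluate +, -, *, /, //, % over numbers and return the result as a string."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


print(safe_calculator("8 * 12"))       # 96
print(safe_calculator("12 * 5 * 12"))  # 720
```

Names, function calls, attribute access, and exponentiation all raise ValueError here. Leaving out the power operator is a deliberate choice in this sketch, since a huge exponent could tie up the process.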

Each Python function needs a schema so the model knows when and how to call it. Notice the descriptions are written for the model — they say what the tool is for and when to reach for it.

TOOLS = [
    {
        "name": "search_docs",
        "description": "Search the Acme Cloud Notes product handbook for passages "
                       "relevant to a query. Returns the most relevant sections.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to look for in the handbook."}
            },
            "required": ["query"],
        },
    },
    {
        "name": "calculator",
        "description": "Evaluate an arithmetic expression and return the result. Example: '8 * 12'.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "A Python arithmetic expression."}
            },
            "required": ["expression"],
        },
    },
    {
        "name": "list_topics",
        "description": "List the topics (sections) available in the product handbook.",
        "input_schema": {"type": "object", "properties": {}},
    },
]

DISPATCH = {
    "search_docs": lambda i: search_docs(i["query"]),
    "calculator":  lambda i: calculator(i["expression"]),
    "list_topics": lambda i: list_topics(),
}
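Because the schemas and the dispatch table are maintained separately, a renamed tool can leave them out of step. A quick check like the one below catches that before the first API call. It is a sketch: check_tool_wiring is a hypothetical helper, and the DEMO_ structures are stand-ins so the block runs on its own. In the project you would pass the real TOOLS and DISPATCH.

```python
# Stand-ins for illustration; pass the real TOOLS and DISPATCH in the project.
DEMO_TOOLS = [{"name": "search_docs"}, {"name": "calculator"}, {"name": "list_topics"}]
DEMO_DISPATCH = {"search_docs": None, "calculator": None, "list_topics": None}


def check_tool_wiring(tools, dispatch):
    """Return (schemas without a handler, handlers without a schema)."""
    schema_names = {t["name"] for t in tools}
    missing = sorted(schema_names - set(dispatch))
    unused = sorted(set(dispatch) - schema_names)
    return missing, unused


print(check_tool_wiring(DEMO_TOOLS, DEMO_DISPATCH))  # ([], [])
```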

The last ingredient is the planning system prompt from Lesson 4. This is what turns three loose tools into a disciplined researcher. It tells the model to plan before acting, to ground every fact in search_docs, to use the calculator for arithmetic, and — crucially — to admit when the handbook simply doesn’t have the answer.

SYSTEM = (
    "You are a research assistant for the Acme Cloud Notes product handbook. "
    "Answer the user's question using the tools provided. Plan before you act: "
    "decide what information you need, use search_docs to find facts in the handbook, "
    "and use calculator for any arithmetic. You may call tools several times and chain "
    "their results. Base every factual claim on what search_docs returns. "
    "If the handbook does not contain the answer, say so plainly instead of guessing."
)

Stage 3: The Agent Loop, and a Question That Needs Two Tools

The loop is the same one you built by hand in Lesson 2, unchanged in spirit: call the model, check stop_reason, run any requested tools, feed the results back, and repeat until the model stops asking for tools. The messages list is the agent’s memory — every turn, every tool call, and every observation accumulates there, so each decision is made with the full history in view.

import anthropic

llm = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(question, max_steps=6):
    messages = [{"role": "user", "content": question}]
    print(f"QUESTION: {question}")
    for step in range(1, max_steps + 1):
        resp = llm.messages.create(
            model="claude-haiku-4-5",
            max_tokens=600,
            system=SYSTEM,
            tools=TOOLS,
            messages=messages,
        )
        print(f"--- step {step}: stop_reason={resp.stop_reason} ---")
        for block in resp.content:
            if block.type == "text" and block.text.strip():
                print("  text:", block.text.strip())
            elif block.type == "tool_use":
                print(f"  tool_use: {block.name}({block.input})")

        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")

        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = DISPATCH[block.name](block.input)
                print(f"  -> result: {result.replace(chr(10), ' ')[:120]}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})
    return "(stopped: max steps reached)"
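As written, the loop calls DISPATCH[block.name] directly, so an unknown tool name or an exception inside a tool (a malformed calculator expression, say) ends the run with a traceback. One way to keep the agent going is to report the failure back to the model instead. The sketch below assumes a hypothetical run_tool helper and a stand-in divide tool. The is_error flag is an optional field on tool_result blocks in the Messages API.

```python
def divide(tool_input):
    # Stand-in tool for illustration; raises ZeroDivisionError when b is 0.
    return str(tool_input["a"] / tool_input["b"])


DEMO_DISPATCH = {"divide": divide}


def run_tool(name, tool_input, tool_use_id, dispatch=DEMO_DISPATCH):
    """Build a tool_result block; failures are reported to the model, not raised."""
    block = {"type": "tool_result", "tool_use_id": tool_use_id}
    if name not in dispatch:
        block.update(content=f"Unknown tool: {name}", is_error=True)
        return block
    try:
        block["content"] = dispatch[name](tool_input)
    except Exception as exc:  # broad on purpose: any tool failure becomes an observation
        block.update(content=f"{type(exc).__name__}: {exc}", is_error=True)
    return block


print(run_tool("divide", {"a": 6, "b": 3}, "toolu_demo_1"))
print(run_tool("divide", {"a": 1, "b": 0}, "toolu_demo_2"))
print(run_tool("translate", {}, "toolu_demo_3"))
```

Inside run_agent, the body of the tool_use branch would then become tool_results.append(run_tool(block.name, block.input, block.id, DISPATCH)), and the model can read the error text and try a corrected call on its next step.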

Two details carry over from Lesson 2 and matter here. The assistant’s content is appended whole — text blocks and tool-use blocks together — so the model sees its own reasoning next turn. And each tool_result is keyed to the tool_use_id it answers, so the model can match results to the calls it made. Now the question this whole module has been building toward — one that no single tool can answer:

answer = run_agent("If I'm on the Pro plan, how much do I pay for a full year "
                   "(ignore the annual discount)?")
print("FINAL ANSWER:", answer)
QUESTION: If I'm on the Pro plan, how much do I pay for a full year (ignore the annual discount)?
--- step 1: stop_reason=tool_use ---
  text: I'll search the handbook for information about the Pro plan pricing.
  tool_use: search_docs({'query': 'Pro plan pricing cost'})
  -> result: ## Plans and Billing  Acme Cloud Notes offers three plans. The Starter plan is free forever...
--- step 2: stop_reason=tool_use ---
  text: Great! I found the pricing information. According to the handbook, the Pro plan costs **$8 per month**. Let me calculate the annual cost without the discount:
  tool_use: calculator({'expression': '8 * 12'})
  -> result: 96
--- step 3: stop_reason=end_turn ---
  text: If you're on the Pro plan, you pay **$96 for a full year** (without the annual discount).
FINAL ANSWER: If you're on the Pro plan, you pay **$96 for a full year** (without the annual discount).

This is the entire module in one trajectory. The agent planned a two-step sequence on its own: it had no Pro price, so it searched the docs; it read back 8 dollars per month, recognized that “a full year” meant arithmetic, and called the calculator with 8 * 12; then it answered. You never wrote “retrieve, then compute” — the model chose that order because the question demanded it. Watch how the 8 from step 1 flows into the expression in step 2: an observation becoming the input to the next action. That is exactly what makes this an agent and not a script.
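After that run, the messages list holds the whole trajectory. The sketch below writes it out as plain dictionaries so the pairing between each tool_use and its tool_result is visible. The ids (toolu_demo_1, toolu_demo_2) are made up, the long texts are abbreviated, and the real SDK returns content-block objects rather than dicts, so treat this as an illustration of the shape only.

```python
messages = [
    {"role": "user", "content": "If I'm on the Pro plan, how much do I pay for a full year?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll search the handbook for Pro plan pricing."},
        {"type": "tool_use", "id": "toolu_demo_1", "name": "search_docs",
         "input": {"query": "Pro plan pricing cost"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_demo_1",
         "content": "## Plans and Billing ... The Pro plan costs 8 dollars per month ..."},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "The Pro plan costs $8 per month. Let me calculate."},
        {"type": "tool_use", "id": "toolu_demo_2", "name": "calculator",
         "input": {"expression": "8 * 12"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_demo_2", "content": "96"},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "You pay $96 for a full year."},
    ]},
]


def unmatched_tool_calls(history):
    """Return the ids of tool_use blocks that never received a tool_result."""
    called, answered = [], set()
    for message in history:
        if isinstance(message["content"], str):
            continue  # a plain-text user turn has no blocks
        for block in message["content"]:
            if block["type"] == "tool_use":
                called.append(block["id"])
            elif block["type"] == "tool_result":
                answered.add(block["tool_use_id"])
    return [call_id for call_id in called if call_id not in answered]


print(unmatched_tool_calls(messages))      # []
print(unmatched_tool_calls(messages[:4]))  # ['toolu_demo_2']
```

The second call cuts the history off right after the calculator request, which is the state the loop is in while a tool is running: one call is still waiting for its result.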


Stage 4: More Questions — Including One It Should Refuse

A real research assistant has to handle the easy questions, the harder ones, and the ones it honestly can’t answer. Let’s put ours through all three.

First, a straightforward retrieval — no arithmetic needed, so the agent should use exactly one tool and stop.

print("FINAL ANSWER:", run_agent("How long do I have to recover a deleted note?"))
QUESTION: How long do I have to recover a deleted note?
--- step 1: stop_reason=tool_use ---
  text: I'll search the handbook for information about deleted note recovery.
  tool_use: search_docs({'query': 'recover deleted note recovery time period'})
  -> result: ## Data, Privacy, and Deletion  Your notes belong to you. Acme Cloud Notes does not read your notes...
--- step 2: stop_reason=end_turn ---
  text: According to the Acme Cloud Notes handbook, when you delete a note, it goes to Trash where it remains recoverable for **30 days** before it is permanently removed.
FINAL ANSWER: ...

One search, one answer — the agent doesn’t over-tool a simple question. Now another two-tool question with more complex arithmetic, to confirm the retrieve-then-compute pattern holds for a different plan:

print("FINAL ANSWER:", run_agent("My team has 5 people on the Team plan. How much is that "
                                 "per year, ignoring the annual discount?"))
QUESTION: My team has 5 people on the Team plan. How much is that per year, ignoring the annual discount?
--- step 1: stop_reason=tool_use ---
  tool_use: search_docs({'query': 'Team plan pricing cost per person'})
  -> result: ## Plans and Billing  Acme Cloud Notes offers three plans...
--- step 2: stop_reason=tool_use ---
  text: The Team plan costs $12 per user per month. Let me calculate the annual cost for 5 people:
  tool_use: calculator({'expression': '12 * 5 * 12'})
  -> result: 720
--- step 3: stop_reason=end_turn ---
  text: With 5 people on the Team plan at $12 per user per month, the annual cost ... would be **$720 per year**.
FINAL ANSWER: ...

The agent built a three-factor expression — 12 * 5 * 12, dollars by users by months — from a price it had to look up and numbers it pulled from the question. Finally, the test that separates a trustworthy assistant from a confident liar: a question the handbook can’t answer.

print("FINAL ANSWER:", run_agent("Does Acme Cloud Notes integrate with Slack?"))
QUESTION: Does Acme Cloud Notes integrate with Slack?
--- step 1: stop_reason=tool_use ---
  text: I'll search the handbook to find information about Slack integration.
  tool_use: search_docs({'query': 'Slack integration'})
  -> result: ## Support  Starter and Pro users can reach support by email...
--- step 2: stop_reason=tool_use ---
  text: The search results don't mention Slack integration. Let me try a broader search for integrations.
  tool_use: search_docs({'query': 'integrations'})
  -> result: ## Syncing and Offline Access  Notes sync automatically across all your devices...
--- step 3: stop_reason=end_turn ---
  text: Based on my search of the Acme Cloud Notes handbook, I cannot find any information about Slack integration. The handbook does not mention Slack or any third-party integrations. If you need to know whether this feature exists, I'd recommend contacting Acme Cloud Notes support at [email protected]...
FINAL ANSWER: ...

This is the trajectory to be proudest of. The agent searched once, didn’t find the answer, searched again with a broader query to be sure, and still came up empty — so it declined honestly and even suggested where to get a real answer. The planning prompt’s instruction to “say so plainly instead of guessing” did its job. An assistant that knows the limits of its own knowledge base is far more useful than one that confabulates.

This is the architecture behind real assistants

What you just built — a loop, a few tools, a vector-store search, conversation memory, and a planning prompt — is the same shape that powers production research assistants and coding agents. The difference at scale is mostly more: more tools, bigger knowledge bases, smarter retrieval. In Module 9 you’ll meet LangChain and LangGraph, frameworks that hand you this exact loop, tool wiring, and memory out of the box, so you can focus on the tools and the prompt instead of rewriting the plumbing each time.


Extend the Project

The assistant works — now make it better. Each exercise builds on the code above.

Exercise 1: Add a “compare plans” capability

Right now the agent answers one plan at a time. Add a question like “Which is cheaper for one person over a year, Pro or Team?” and watch whether the agent searches once and runs two calculator calls, or searches twice. You don’t need a new tool — just a harder question. Then consider: would a dedicated compare_plans tool help, or does the model handle it well with what it has?

Hint

A single search_docs("plan pricing") returns the whole Plans and Billing section, which contains both prices — so the agent can read both and make two calculator calls in one trajectory. Try it and read the steps. Adding a narrow tool only pays off when the model struggles without it; often a good prompt and a good search tool are enough.
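For reference when reading the agent's steps, here is the arithmetic it should arrive at, using the two prices that appear in the trajectories above (Pro at 8 dollars per month, Team at 12 dollars per user per month). This is only a check on the expected numbers, not a proposed compare_plans tool.

```python
MONTHS_PER_YEAR = 12

pro_monthly = 8    # dollars per month, from the Pro trajectory
team_monthly = 12  # dollars per user per month, from the Team trajectory
users = 1

pro_year = pro_monthly * MONTHS_PER_YEAR
team_year = team_monthly * users * MONTHS_PER_YEAR

print("Pro:", pro_year, "Team:", team_year, "difference:", team_year - pro_year)
# Pro: 96 Team: 144 difference: 48
```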

Exercise 2: Cap the steps with a clear message

The loop returns "(stopped: max steps reached)" if it runs out of steps, but the agent never learns that happened. Change the loop so that on the final allowed step you inject a tool_result (or a system reminder) telling the model “this is your last step, answer now with what you have.” Lower max_steps to 2 and ask a two-tool question to trigger it.

Hint

Track the step number; when you’re about to exceed max_steps, append a user message like “You have reached the step limit — give your best answer now using the information gathered so far.” A graceful cap beats a silent truncation, especially for agents that might otherwise loop forever.
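One possible shape for that change is sketched below. It keeps the tool results and the reminder in the same user turn by appending a text block after the tool_result blocks, on the assumption that the API accepts text following tool results within one message. The helper name user_turn_content and the demo values are hypothetical.

```python
LAST_STEP_NOTICE = (
    "You have reached the step limit. Give your best answer now "
    "using the information gathered so far."
)


def user_turn_content(tool_results, step, max_steps):
    """Return the content for the next user turn.

    When the upcoming model call is the last one allowed, a text block
    with the notice is appended after the tool results.
    """
    content = list(tool_results)  # copy so the caller's list is not mutated
    if step == max_steps - 1:
        content.append({"type": "text", "text": LAST_STEP_NOTICE})
    return content


demo_results = [{"type": "tool_result", "tool_use_id": "toolu_demo_1", "content": "96"}]
for step in (1, 2):
    blocks = user_turn_content(demo_results, step, max_steps=3)
    print(step, [b["type"] for b in blocks])
# 1 ['tool_result']
# 2 ['tool_result', 'text']
```

In run_agent, the final messages.append line would pass user_turn_content(tool_results, step, max_steps) as the content. With max_steps=2, the notice rides along with the first tool result, so the second and last call is made with the reminder in view.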

Exercise 3: Log the trajectory and cite sources

Real research tools show their work. Add a step that collects which chunk each search_docs call returned, then have the agent append a “Sources” line naming the handbook sections it relied on. Store the trajectory (every tool_use and result) in a list so you can inspect or save it after the run.

Hint

Have search_docs also return the chunk’s heading (the first line of each document) and ask the system prompt to end answers with a “Sources:” line naming those sections. Collecting the trajectory is just appending each step’s tool name, input, and result to a list inside the loop, the same data you’re already printing.
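As a starting point, the sketch below records each tool call in a list and derives a “Sources” line from the section headings found in the search results. It builds the line in code rather than asking the model to write it, which is a variation on the hint. The helper names (record, section_headings, sources_line) and the abbreviated sample results are assumptions for illustration.

```python
trajectory = []


def record(step, tool, tool_input, result):
    """Append one tool call and its result to the trajectory log."""
    trajectory.append({"step": step, "tool": tool, "input": tool_input, "result": result})


def section_headings(search_result):
    """Pull the '## ' heading lines out of the string search_docs returns."""
    return [line[3:].strip() for line in search_result.splitlines() if line.startswith("## ")]


def sources_line(log):
    """List each handbook section that any search_docs call returned, in first-seen order."""
    seen = []
    for entry in log:
        if entry["tool"] == "search_docs":
            for heading in section_headings(entry["result"]):
                if heading not in seen:
                    seen.append(heading)
    return "Sources: " + ", ".join(seen) if seen else "Sources: none"


# Abbreviated sample results standing in for real tool output.
record(1, "search_docs", {"query": "Pro plan pricing cost"},
       "## Plans and Billing\n\nAcme Cloud Notes offers three plans."
       "\n\n---\n\n## Support\n\nStarter and Pro users can reach support by email.")
record(2, "calculator", {"expression": "8 * 12"}, "96")

print(sources_line(trajectory))  # Sources: Plans and Billing, Support
```

In the loop, record(step, block.name, block.input, result) would sit next to the existing result print. Note that with k=2 the line lists every section retrieved, not only the one the answer actually drew on.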


Summary

You built a complete research-assistant agent by combining everything from this module. A persistent Chroma collection holds the handbook as seven heading-based chunks, and a search_docs tool turns plain-language questions into relevant passages. A calculator (a restricted eval, not a true sandbox) and a list_topics helper round out the toolbox. The agent loop from Lesson 2 drives it all (checking stop_reason, running tools, and feeding tool_results back keyed by tool_use_id) while the growing messages list serves as the agent’s memory. A planning system prompt ties it together, telling the model to plan, ground its facts in retrieval, compute when needed, and decline honestly when the docs fall short. You watched it answer a question that needed both retrieval and computation, handle a simple lookup with a single tool, chain a three-factor calculation, and decline a question the handbook couldn’t answer, each time choosing its own steps.

Key Concepts

  • Research-assistant agent — a loop with retrieval, computation, and memory that answers multi-step questions from your documents.
  • Knowledge-base tool — search_docs over a persistent vector store, letting the agent look up facts in plain language.
  • Tool composition — multiple tools (search, calculator, list_topics) behind one loop, with the model choosing which and when.
  • Planning prompt — a system prompt that makes the agent plan, ground claims in retrieval, and decline when it can’t answer.
  • Trajectory — the real sequence of tool calls and observations that reveals exactly how the agent reasoned.

Why This Matters

This is the project that proves the module: you no longer just understand agents, you’ve shipped one that does real work over a real document. The same architecture — loop, tools, retrieval, memory, planning prompt — scales from this seven-chunk handbook to assistants searching entire knowledge bases. And the habits you practiced here, especially grounding every answer in retrieval and refusing to guess, are exactly what make an agent trustworthy enough to deploy. Next you’ll build agents like this faster, on frameworks designed for the job.


Next Steps

Continue to Module 9 - LangChain & LangGraph

Frameworks that give you the agent loop, tool wiring, and memory out of the box — so you build production agents without rewriting the plumbing.



Continue Building Your Skills

You’ve assembled a working research assistant from raw parts — a loop, a vector store, a couple of tools, and a careful prompt — and seen it reason through real questions on its own. From here, the next step is leverage: frameworks that hand you that machinery prebuilt, so you can spend your energy on the tools and tasks that make your agent genuinely useful.