Lesson 3 - Giving Agents Memory and Tools
Welcome to Giving Agents Memory and Tools
In Module 7 your RAG pipeline always retrieved before it answered. That was a workflow: you wrote “retrieve, then generate,” and it ran every time, even for a question the model could answer on its own. In this lesson you’ll hand that decision to the agent. You’ll wrap your vector search in a search_docs tool, describe it, and let the model decide when to reach for it — looking things up for a product question, and skipping the lookup entirely for a greeting or a bit of arithmetic. Then you’ll give the agent memory by keeping a single running messages list across turns, so a follow-up like “and how much is that per year?” knows what “that” refers to.
You’ll work with a real document, the Acme Cloud Notes product handbook. Download it from https://datatweets.com/datasets/product-handbook.md and save it next to your script as product-handbook.md.
By the end of this lesson, you will be able to:
- Wrap a vector-database retrieval call as a search_docs tool the agent can choose to call
- Explain how retrieval-as-a-tool differs from the forced retrieval of a RAG workflow
- Maintain conversation memory by keeping one messages list across multiple user turns
- Read a real trajectory where the agent retrieves only when needed and resolves a follow-up from memory
You’ll build on the agent loop from Lesson 2 and the vector store from Module 7. Let’s begin.
Retrieval as a Tool
Start by loading the handbook into a vector store, exactly as you did for RAG — but this time you won’t call it directly in your pipeline. You’ll expose it through a tool. Split the handbook into chunks by section heading, store them in a Chroma collection (Chroma computes the embeddings locally with its default embedding model, so no API key is needed), and wrap a query in a plain Python function.
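If you want to check the splitting rule before involving Chroma, here is a minimal sketch that runs the same re.split pattern on a small sample string. The sample text and its headings are made up for illustration; they are not taken from the real handbook.

```python
import re

# Hypothetical stand-in for product-handbook.md (the real file is longer).
sample = """# Acme Cloud Notes Handbook

Intro text that sits before the first section.

## Accounts and Sign-In
How to create an account.

## Plans and Billing
Plan details go here.

### Refunds
Refund rules.
"""

# Split just before every line that starts with "## ", then keep only
# the pieces that actually begin with a section heading.
sample_chunks = [p.strip() for p in re.split(r"\n(?=## )", sample)
                 if p.strip().startswith("## ")]

for c in sample_chunks:
    print(c.splitlines()[0])
```

The lookahead (?=## ) splits before a heading without consuming it, so each chunk keeps its own "## " title. The intro before the first section is dropped by the startswith filter, and a "### " subheading stays inside its parent section, because the third character of "###" is "#" rather than the space the pattern requires.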
import os
import re
os.environ["ANONYMIZED_TELEMETRY"] = "False" # quiet Chroma telemetry
import chromadb
# --- Build the knowledge base from the handbook ---
text = open("product-handbook.md", encoding="utf-8").read()
# One chunk per "## " section
chunks = [p.strip() for p in re.split(r"\n(?=## )", text)
          if p.strip().startswith("## ")]
chroma = chromadb.Client()  # in-memory client; the data is gone when the script exits
collection = chroma.create_collection(name="handbook")
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],  # ids must be strings
)
print("Chunks stored:", len(chunks))
for i, c in enumerate(chunks):
    print(f"  chunk-{i}: {c.splitlines()[0]}")
Chunks stored: 7
  chunk-0: ## Accounts and Sign-In
  chunk-1: ## Plans and Billing
  chunk-2: ## Syncing and Offline Access
  chunk-3: ## Sharing and Collaboration
  chunk-4: ## Exporting and Importing
  chunk-5: ## Data, Privacy, and Deletion
  chunk-6: ## Support
Seven clean chunks, one per section. Now the key step: turn the query into a function, and confirm it returns the right section for a real question.
def search_docs(query, n_results=2):
    """Return the most relevant handbook passages for a query."""
    res = collection.query(query_texts=[query], n_results=n_results)
    # res["documents"] holds one list of hits per query text; we sent one query
    return "\n\n---\n\n".join(res["documents"][0])

# Sanity check: which chunks come back for a pricing question?
res = collection.query(query_texts=["How much does the Pro plan cost?"],
                       n_results=2)
for cid, dist in zip(res["ids"][0], res["distances"][0]):
    print(f"{cid} distance={dist:.4f}")
chunk-1 distance=0.7799
chunk-6 distance=1.5203
The “Plans and Billing” section (chunk-1) comes back first, with a much smaller distance than the runner-up. Chroma’s default distance is squared L2, so lower means closer — chunk-1 is clearly the best match. This search_docs function is everything the agent needs; the next step is letting the agent decide when to call it.
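To make "lower means closer" concrete, here is a minimal sketch of squared L2 distance and ranking. The three-dimensional vectors are hypothetical toy values, not Chroma's real embeddings, which have hundreds of dimensions.

```python
def squared_l2(a, b):
    """Squared Euclidean distance: the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy "embeddings" for a query and two chunks.
query = [0.9, 0.1, 0.0]
chunk_vectors = {
    "chunk-1": [0.8, 0.2, 0.1],  # points in nearly the same direction as the query
    "chunk-6": [0.0, 0.3, 0.9],  # points somewhere else entirely
}

# Sort ascending: with a distance, the smallest value is the best match.
ranked = sorted(chunk_vectors, key=lambda cid: squared_l2(query, chunk_vectors[cid]))
for cid in ranked:
    print(f"{cid} distance={squared_l2(query, chunk_vectors[cid]):.4f}")
```

Plain Euclidean distance would take a square root at the end, but the square root never changes which chunk is nearest, so the squared form gives the same ranking with less work.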
Letting the Agent Decide When to Retrieve
To make search_docs available to the agent, describe it as a tool. Along with the tool’s name and input schema, the description is what the model reads when deciding whether to call it, so write it like instructions: say what the tool does and when to use it. We’ll add a calculator too, so the agent has a choice of tools.
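Because the model leans so heavily on the description, it can help to sanity-check tool definitions before sending them. The lint_tool helper below is a hypothetical sketch written for this lesson, not part of the anthropic SDK, and its eight-word threshold is an arbitrary assumption.

```python
def lint_tool(tool, min_description_words=8):
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    for key in ("name", "description", "input_schema"):
        if key not in tool:
            problems.append(f"missing '{key}'")
    # A vague one-liner gives the model little to go on when choosing tools.
    if len(tool.get("description", "").split()) < min_description_words:
        problems.append("description is very short; say what it does and when to use it")
    # Every required field should be described in properties.
    schema = tool.get("input_schema", {})
    props = schema.get("properties", {})
    for field_name in schema.get("required", []):
        if field_name not in props:
            problems.append(f"required field '{field_name}' is missing from properties")
    return problems

vague = {
    "name": "search_docs",
    "description": "Look things up.",
    "input_schema": {"type": "object", "properties": {}, "required": ["query"]},
}
good = {
    "name": "search_docs",
    "description": ("Search the product handbook for relevant passages. "
                    "Use this for any product question."),
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

for problem in lint_tool(vague):
    print("-", problem)
print(lint_tool(good))
```

A check like this only catches structural gaps; whether a description actually steers the model well is something you still have to observe in trajectories.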
import anthropic
tools = [
    {
        "name": "search_docs",
        "description": (
            "Search the Acme Cloud Notes product handbook for relevant "
            "passages. Use this whenever a question is about the product "
            "(plans, pricing, syncing, sharing, exporting, privacy, "
            "support). Returns the most relevant handbook sections as text."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to look up"}
            },
            "required": ["query"],
        },
    },
    {
        "name": "calculator",
        "description": "Evaluate an arithmetic expression and return the numeric result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "e.g. '8 * 12'"}
            },
            "required": ["expression"],
        },
    },
]
def calculator(expression):
    # Restricted eval with no builtins: fine for a demo, but not a real
    # sandbox, so never point it at untrusted input.
    return eval(expression, {"__builtins__": {}}, {})

tool_functions = {"search_docs": search_docs, "calculator": calculator}
The agent loop is the same one from Lesson 2, with one deliberate change: messages is passed in and mutated, instead of being created fresh inside the function. That single change is what gives us memory in the next section — the loop appends to whatever conversation you hand it.
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from the environment
SYSTEM = (
"You are a support assistant for the app Acme Cloud Notes. "
"When a question is about the product, use the search_docs tool to find "
"the answer in the handbook rather than guessing. For trivial questions "
"(greetings, simple arithmetic) just answer directly."
)
def run_agent(messages, max_steps=6):
    """Drive the think-act-observe loop over a running messages list."""
    for step in range(1, max_steps + 1):
        resp = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=400,
            system=SYSTEM,
            tools=tools,
            messages=messages,
        )
        text = "".join(b.text for b in resp.content if b.type == "text")
        print(f"--- step {step}: stop_reason={resp.stop_reason} ---")
        if text:
            print(f"  text: {text}")
        # Always record the assistant turn so memory stays complete
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return text
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = tool_functions[block.name](**block.input)
                print(f"  tool_use: {block.name}({block.input})")
                # One-line preview of the result: newlines collapsed, long text cut
                preview = " ".join(str(result).split())
                shown = preview if len(preview) <= 60 else preview[:60] + "..."
                print(f"    -> result: {shown}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),  # send the result back as text, not a raw number
                })
        messages.append({"role": "user", "content": tool_results})
    return "Stopped: reached max_steps."
Now watch the agent choose. First, a product question:
messages = [{"role": "user", "content": "How much does the Pro plan cost?"}]
run_agent(messages)
--- step 1: stop_reason=tool_use ---
  tool_use: search_docs({'query': 'Pro plan pricing cost'})
    -> result: ## Plans and Billing Acme Cloud Notes offers three plans...
--- step 2: stop_reason=end_turn ---
  text: The Pro plan costs **$8 per month**. It includes unlimited notes,
50 GB of storage, and version history. If you prefer annual billing, you
can save 20% with an annual subscription instead of paying monthly.
The agent decided, on its own, that this needed the docs. It wrote a search query (“Pro plan pricing cost” — note that’s not a verbatim copy of the user’s words), read the returned section, and answered from it. Compare with a question that doesn’t touch the product at all:
run_agent([{"role": "user", "content": "Hi! Can you help me out?"}])
--- step 1: stop_reason=end_turn ---
  text: Hi! I'd be happy to help you out. I'm a support assistant for Acme
Cloud Notes. What can I help you with? Whether you have questions about
the product, need help with features, or have other questions, just let
me know!
No tool call. The agent went straight to end_turn and answered directly. This is the whole point of retrieval-as-a-tool: in your Module 7 workflow, the greeting would still have triggered a vector search and stuffed irrelevant handbook text into the prompt. Here the agent skipped it, because nothing in the question needed the docs. The model judges relevance for itself, guided by the system prompt and the tool description.
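The branching you just saw lives in the loop's stop_reason check. To trace both paths without an API key, here is an offline sketch. Block, Response, scripted_model, run_agent_offline and fake_search_docs are hypothetical stand-ins written for this example (they are not the anthropic SDK's classes), and the scripted "model" replays fixed replies, so this demonstrates the plumbing of the loop, not the model's judgment.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the SDK's content blocks and responses.
@dataclass
class Block:
    type: str
    text: str = ""
    id: str = ""
    name: str = ""
    input: dict = field(default_factory=dict)

@dataclass
class Response:
    stop_reason: str
    content: list

def scripted_model(responses):
    """Fake model: ignores the messages and replays canned responses in order."""
    replies = iter(responses)
    return lambda messages: next(replies)

def run_agent_offline(model, messages, tool_functions, max_steps=6):
    """Same think-act-observe loop as run_agent, minus the API call and printing."""
    for _ in range(max_steps):
        resp = model(messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = tool_functions[block.name](**block.input)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": str(output)})
        messages.append({"role": "user", "content": results})
    return "Stopped: reached max_steps."

calls = []  # records every query the fake tool receives

def fake_search_docs(query):
    calls.append(query)
    return "## Plans and Billing (hypothetical excerpt)"

fake_tools = {"search_docs": fake_search_docs}

# Path 1: the scripted "model" asks for the tool, then answers.
product = [{"role": "user", "content": "How much does the Pro plan cost?"}]
run_agent_offline(scripted_model([
    Response("tool_use", [Block("tool_use", id="t1", name="search_docs",
                                input={"query": "Pro plan pricing"})]),
    Response("end_turn", [Block("text", text="Answer from the docs.")]),
]), product, fake_tools)

# Path 2: the scripted "model" answers a greeting directly.
greeting = [{"role": "user", "content": "Hi!"}]
reply = run_agent_offline(scripted_model([
    Response("end_turn", [Block("text", text="Hi! How can I help?")]),
]), greeting, fake_tools)

print(len(product), len(greeting), calls)
```

The tool path leaves four messages (question, tool request, tool result, answer) and records one search; the greeting path leaves two and never touches the tool.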
Conversation Memory Across Turns
So far each call started from a fresh messages list. Memory is just the decision not to throw that list away. Because run_agent appends every turn — user messages, the assistant’s tool calls, and the tool results — to the same list, you can ask a follow-up by appending to it and calling again. The agent sees the entire history, including its own earlier answer.
Pick up the conversation right where the pricing question left off (we still have that messages list from earlier). The follow-up never says “the Pro plan” — it just says “that”:
messages.append({
"role": "user",
"content": "And what does that come to per year, without the annual discount?",
})
run_agent(messages)
--- step 1: stop_reason=tool_use ---
  tool_use: calculator({'expression': '8 * 12'})
    -> result: 96
--- step 2: stop_reason=end_turn ---
  text: At $8 per month, the Pro plan comes to **$96 per year** without the
annual discount. With the 20% annual discount, it would be $76.80 per year.
This only works because of memory. The agent resolved “that” to the Pro plan’s $8/month from its previous answer — there’s nothing about $8 in the new user message. It then chose the calculator (not search_docs, because it already had the number it needed) and computed 8 * 12 = 96. A different tool for a different sub-task, all driven by what it remembered. If you had started a fresh messages list, the agent would have had no idea what “that” meant and would have had to search the docs again or ask you to clarify.
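The calculator tool in this lesson relies on eval with empty builtins, which narrows what an expression can reach but is not a real sandbox. If the model's expression ever comes from untrusted text, a safer option is to parse it with Python's ast module and evaluate only the node types you allow. This is a minimal sketch under that assumption, supporting numbers, parentheses, unary signs, and + - * / (it deliberately leaves out ** so a single expression cannot request an enormous power).

```python
import ast
import operator

# Operators the calculator accepts; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_calculator(expression):
    """Evaluate arithmetic on int/float literals only; raise ValueError otherwise."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return ev(ast.parse(expression, mode="eval"))

print(safe_calculator("8 * 12"))               # 96
print(round(safe_calculator("96 * 0.8"), 2))   # 76.8
try:
    safe_calculator("__import__('os')")
except ValueError as err:
    print("rejected:", type(err).__name__)
```

To try it in the agent, you could register it under the same tool name, for example tool_functions["calculator"] = safe_calculator, since it takes the same expression argument.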
The cost of memory is that the list keeps growing. After this exchange the conversation held eight messages, and it would have kept climbing with every turn. That growth is what eventually runs into the model’s context window — a problem you’ll tackle head-on later in this module.
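You can watch that growth with a few lines of bookkeeping. This sketch assumes plain-dict messages shaped like the lesson's (the real assistant turns hold SDK objects, which json.dumps cannot serialize directly), and one_turn is a hypothetical helper that fakes a turn with a single tool call. The JSON character count is only a rough stand-in for tokens, not what the API actually bills or limits.

```python
import json

def history_size(messages):
    """Return (message count, characters of JSON) as a rough size measure."""
    return len(messages), sum(len(json.dumps(m)) for m in messages)

def one_turn(n):
    """Hypothetical messages for one turn that uses a single tool call."""
    return [
        {"role": "user", "content": f"Question {n}"},
        {"role": "assistant", "content": [{"type": "tool_use", "id": f"t{n}",
                                           "name": "calculator",
                                           "input": {"expression": "8 * 12"}}]},
        {"role": "user", "content": [{"type": "tool_result",
                                      "tool_use_id": f"t{n}", "content": "96"}]},
        {"role": "assistant", "content": [{"type": "text", "text": "It comes to $96."}]},
    ]

history = []
growth = []
for turn in range(1, 4):
    history.extend(one_turn(turn))
    growth.append(history_size(history))
    count, chars = growth[-1]
    print(f"after turn {turn}: {count} messages, ~{chars} characters")
```

Each tool-using turn adds four messages, which is why the pricing conversation reached eight after two turns; the count climbs linearly and never shrinks on its own.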
Memory is the messages list; tools are a menu, not a script
Two ideas come together here. First, memory is nothing more than a messages list you don’t reset — keep appending and the agent remembers; start fresh and it forgets. Second, a tool is an option the model may take, not a step you force. The agent searched for the pricing question, skipped the search for the greeting, and reached for the calculator on the follow-up — three different paths through the same toolbox, each chosen by the model. The flip side of memory is that the list only grows, which is why context limits become the next thing to manage.
Practice Exercises
Exercise 1: Tool description as the trigger
The agent decided to call search_docs for the pricing question but not for the greeting. Apart from the system prompt, the main thing it read to make that call was the tool’s description. Rewrite the description so the agent uses search_docs more aggressively — even for borderline questions. What words would you add or change?
Hint
The description is the agent’s instruction manual for the tool. Phrases like “Use this for any question that mentions Acme Cloud Notes features, even indirectly” or “When in doubt, search first” push the model toward calling it. Make the description vague (“look things up”) and the agent will skip it more often. The behavior you want lives in the words.
Exercise 2: Break the memory
Take the follow-up question — “And what does that come to per year?” — but instead of appending it to the existing messages, send it in a brand-new list: run_agent([{"role": "user", "content": "..."}]). Predict what happens, then run it. Why does the answer change?
Hint
With a fresh list the agent has no record of the $8/month answer, so “that” is undefined. It will likely search the docs to figure out what plan you mean, or ask you to clarify. This shows memory is the running messages list — reset it and the agent’s “memory” is gone.
Exercise 3: A question that needs both tools
Ask the agent: “How much would the Team plan cost per month for 5 users?” Trace which tools it calls and in what order. Does it search first, then calculate? Why does the order matter?
Hint
The agent needs the per-user price from the handbook ($12/user/month) before it can multiply by 5, so it should call search_docs first and calculator second — the result of one step feeds the next, just like the chaining you saw in Lesson 1. If it tried to calculate first it wouldn’t have a number to work with.
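The dependency in that hint can be sketched offline by hard-coding the order the agent would choose on its own. Everything here is hypothetical: fake_search_docs returns a made-up passage (the real handbook wording may differ), and the regex that pulls out the price stands in for the model reading the passage.

```python
import re

def fake_search_docs(query):
    # Hypothetical handbook excerpt, invented for this sketch.
    return "Team: $12 per user per month, billed monthly."

def demo_calculator(expression):
    # Same restricted eval as the lesson's calculator (demo use only).
    return eval(expression, {"__builtins__": {}}, {})

# Step 1: search. Step 2: read the price out of the passage. Step 3: calculate.
passage = fake_search_docs("Team plan price per user")
price = re.search(r"\$(\d+(?:\.\d+)?)", passage).group(1)
total = demo_calculator(f"{price} * 5")
print(total)  # 60
```

Step 3 needs the value produced in step 2, which in turn needs the passage from step 1; reverse the order and the calculator has nothing to multiply.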
Summary
You turned retrieval into a tool the agent calls on its own, instead of a forced step in a workflow. By wrapping a Chroma vector search in a search_docs function and describing it well, you let the model decide when to look things up: it searched for a pricing question, skipped the search entirely for a greeting, and chose the calculator for a numeric follow-up. You also gave the agent memory by keeping one running messages list across turns — so a follow-up that said only “that” correctly resolved to the Pro plan’s price from the agent’s previous answer. Memory is simply the list you choose not to reset, and the price of keeping it is that the list grows toward the model’s context limit.
Key Concepts
- Retrieval as a tool — exposing your vector search as a search_docs tool the agent may call, rather than always running it.
- Forced retrieval vs. model choice — a RAG workflow always retrieves; an agent retrieves only when it judges the question needs it.
- Tool description as a trigger — the agent decides whether to call a tool based largely on the description you write, together with the tool’s name and the system prompt.
- Conversation memory — keeping a single messages list across turns so the agent remembers earlier context.
- Memory growth — the running list grows every turn, eventually pressing against the context window.
Why This Matters
Real assistants don’t search a knowledge base for every message, and they don’t forget what you said two sentences ago. Retrieval-as-a-tool keeps the agent fast and focused — no irrelevant context jammed into a simple “hello” — while memory makes conversations feel natural, because follow-ups just work. Together they’re the difference between a one-shot question box and an assistant you can actually talk to. These two patterns underpin nearly every production agent, and they set up the next challenge: managing the steps and the context as conversations get longer.
Next Steps
Continue to Lesson 4 - Planning and Multi-Step Tasks
Watch an agent break a goal into steps, chain several tool calls, and work through a multi-step task to completion.
Back to Module Overview
Return to the Building AI Agents module overview
Continue Building Your Skills
Your agent can now look things up when it needs to and remember what was said. Next you’ll push it harder: give it a goal that takes several steps to reach, and watch it plan and chain tool calls all the way to the finish — the behavior that makes agents genuinely useful.