Lesson 1 - Why Agents Need Retrieval
Welcome to Why Agents Need Retrieval
Your agent is capable now — it loops, calls tools, remembers, and plans. But ask it something that depends on your data — “what’s our refund policy?”, “what did the Q3 report say?”, “is the Arashiyama bamboo grove worth an early start?” — and it does one of two things: answers from whatever happened to be in its training data, or makes something up that sounds right. Neither is good enough for a real product. The model’s knowledge is frozen at training time and blind to anything private. This module gives the agent a way out: retrieval — looking facts up in a knowledge base you control, and answering from what it finds. This lesson is about why that matters before we build it.
By the end of this lesson, you will be able to:
- Explain the three things a model’s built-in knowledge can’t do: be current, be private, be certain
- Describe retrieval-augmented generation (RAG) and the retrieve-augment-generate pattern
- See why retrieval is the cure for confident wrong answers on out-of-knowledge questions
- Recognize how the pieces map onto tools and memory you’ve already built
Let’s start with what’s actually missing.
What a Model Doesn’t Know
A language model is trained once, on a snapshot of data, and then frozen. That gives it broad general knowledge — but it creates three hard limits that matter the moment you build something real:
- It isn’t current. Anything that happened after training simply isn’t there. Today’s prices, this week’s release notes, the latest policy — invisible.
- It isn’t private. Your company’s docs, your user’s history, your own notes were never in the training set. The model has no way to know them.
- It isn’t certain. When the model doesn’t know, it rarely says so. It produces a fluent, plausible answer that may be wrong — a hallucination. On exactly the questions where you most need a correct answer, a bare model is most likely to confidently invent one.
None of these is fixed by a bigger model or a better prompt. They’re structural: the knowledge isn’t in the model. The fix is to put the knowledge somewhere the agent can look it up at the moment it needs it — and to make it answer from what it looked up.
Retrieve, Augment, Generate
That’s exactly what retrieval-augmented generation (RAG) does. Instead of asking the model to answer from memory, you first retrieve relevant text from a knowledge base, augment the prompt with it, and only then generate the answer.
The shift is small but profound. A bare model answers the question from its weights. A retrieval-augmented agent answers it from sources you provided — which means the answer can be current (your knowledge base is as fresh as you make it), private (it’s your data), and checkable (you know which passage each claim came from). The model stops being the source of truth and becomes the thing that reads and summarizes the source of truth.
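The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the implementation the later lessons build: the knowledge base contents are invented, retrieval is plain word overlap rather than embeddings, and `fake_llm` is a hypothetical stand-in for a real model call.

```python
import re

# Hypothetical knowledge base: the ids and policy text are invented for this sketch.
KNOWLEDGE_BASE = {
    "refund-policy": "Our refund policy: refunds are available within 30 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "support-hours": "Support runs Monday to Friday, 9am to 5pm.",
}


def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, k=2):
    """Toy retrieval: rank passages by how many words they share with the question."""
    q = words(question)
    scored = [(len(q & words(text)), doc_id, text) for doc_id, text in KNOWLEDGE_BASE.items()]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [(doc_id, text) for _, doc_id, text in hits[:k]]


def augment(question, passages):
    """Put the retrieved passages in front of the question, tagged with their ids."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Answer using only these sources.\n\nSources:\n{sources}\n\nQuestion: {question}"


def generate(prompt, llm):
    """Hand the augmented prompt to whatever model callable the agent uses."""
    return llm(prompt)


def fake_llm(prompt):
    """Stand-in for a real model call so the sketch runs offline."""
    return f"(model reply to a {len(prompt)}-character prompt)"


question = "What is our refund policy?"
passages = retrieve(question)          # 1. retrieve
prompt = augment(question, passages)   # 2. augment
reply = generate(prompt, fake_llm)     # 3. generate
```

Printing `prompt` shows what the model actually sees: the instruction, the tagged source passage, then the question. Swapping `fake_llm` for a real model call changes nothing else in the flow.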
RAG is grounding, not training
Retrieval does not change the model’s weights: there’s no fine-tuning and no training run. You’re handing the model relevant text at question time and asking it to answer from that text. That’s why RAG is so practical: update a document in your knowledge base, re-embed the passages that changed, and the agent’s next answer reflects the update, with no retraining. The model supplies language and reasoning; your knowledge base supplies the facts.
You’ve Already Built Most of the Parts
Here’s the encouraging part: retrieval isn’t a new world. It’s the memory module’s machinery pointed at documents instead of conversation. In Module 4 you built a VectorMemory: it turned notes into vectors with embed() and searched them by similarity. A knowledge base is the same idea — chunk your documents into passages, embed each one, and search them with the user’s question. The “search the store, prepend what’s relevant” move you used for long-term memory is retrieval.
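As a rough preview of chunk → embed → search, here is a self-contained sketch. Everything in it is an assumption made for illustration: the `embed()` below is a bag-of-words stand-in (not the Module 4 `embed()`, which is not shown here), `chunk()` splits on sentence boundaries, `KnowledgeBase` is a hypothetical class name, and the sample document text is invented.

```python
import math
import re
from collections import Counter


def chunk(document):
    """Split a document into one passage per sentence (real chunkers often use larger passages)."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [s for s in sentences if s]


def embed(text):
    """Stand-in embedding: a sparse bag-of-words vector mapping word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Hypothetical store: keeps each passage alongside its vector."""

    def __init__(self):
        self.entries = []  # list of (passage, vector) pairs

    def add_document(self, document):
        for passage in chunk(document):
            self.entries.append((passage, embed(passage)))

    def search(self, query, k=1):
        """Return the k passages most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [passage for passage, _ in ranked[:k]]


kb = KnowledgeBase()
kb.add_document(
    "The bamboo grove is quietest just after sunrise. "
    "Tickets for the temple cost 500 yen. "
    "The river boats stop running at dusk."
)
top = kb.search("When is the bamboo grove quietest?")
```

Here `top` holds the sunrise sentence, because it shares the most words with the query. A real embedding model would also match paraphrases that share no words at all, which is the main reason to use one.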
And the way the agent uses retrieval is just a tool, the thing you designed in Modules 2 and 3. Give the agent a search_knowledge tool and it can decide, mid-loop, to look something up — retrieve, read the result, and reason on. So this module recombines two things you already have:
- Decomposition of knowledge (from memory): chunk → embed → search, now over documents — that’s the knowledge base (Lesson 2).
- Retrieval as an action (from tools): a search_knowledge tool the agent calls when it needs facts; that’s agentic RAG (Lesson 3).
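To make the second bullet concrete, here is one way a search_knowledge tool could look. Treat it as a sketch under assumptions: the tool-spec dictionary and the `{"name": ..., "arguments": {...}}` call shape are guesses at a typical format, not necessarily the one used in Modules 2 and 3, and the passages and the `run_tool_call` helper are hypothetical.

```python
import json
import re

# Hypothetical passages: ids and policy text are invented for this sketch.
PASSAGES = [
    {"id": "hr-leave", "text": "Parental leave is 16 weeks at full pay."},
    {"id": "hr-expenses", "text": "Expense reports are due by the fifth of each month."},
]


def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_knowledge(query, k=2):
    """Tool body: return up to k passages that share words with the query, as a JSON string."""
    q = words(query)
    scored = [(len(q & words(p["text"])), p) for p in PASSAGES]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return json.dumps([p for _, p in hits[:k]])


# Assumed description format: what the model reads when deciding whether to call the tool.
SEARCH_KNOWLEDGE_SPEC = {
    "name": "search_knowledge",
    "description": "Look up passages in the company knowledge base.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

TOOLS = {"search_knowledge": search_knowledge}


def run_tool_call(call):
    """Dispatch a tool call of the assumed shape {"name": ..., "arguments": {...}}."""
    return TOOLS[call["name"]](**call["arguments"])


# What the agent loop would do when the model asks for a lookup.
call = {"name": "search_knowledge", "arguments": {"query": "What is our parental leave policy?"}}
observation = run_tool_call(call)
```

The string in `observation` goes back into the conversation as the tool result, and the model reasons on from there.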
The new skill the module adds on top is discipline: answering only from what was retrieved, citing sources, and refusing honestly when the knowledge base has nothing relevant (Lesson 4). That discipline is what turns “the model looked at some text” into “the agent gave a grounded, trustworthy answer.”
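The three habits named above (answer only from sources, cite, refuse) can be sketched as a small gate in front of the model. This is an illustrative guess at the shape, not the Lesson 4 implementation: the `min_score` threshold of 0.3, the prompt wording, the refusal text, and the sample scores are all assumptions.

```python
REFUSAL = "I couldn't find that in the knowledge base."


def grounded_prompt(question, passages):
    """Number each (source, text, score) passage so the model can cite it as [1], [2], ..."""
    lines = [
        f"[{i}] ({source}) {text}"
        for i, (source, text, _score) in enumerate(passages, start=1)
    ]
    return (
        "Answer only from the numbered sources below and cite them like [1].\n"
        "If they do not contain the answer, say so.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


def answer(question, passages, llm, min_score=0.3):
    """Refuse without calling the model when nothing retrieved is relevant enough."""
    relevant = [p for p in passages if p[2] >= min_score]
    if not relevant:
        return REFUSAL
    return llm(grounded_prompt(question, relevant))


calls = []


def fake_llm(prompt):
    """Stand-in model that records the prompt it was given."""
    calls.append(prompt)
    return "Refunds are available within 30 days [1]."


# Invented retrieval results: (source id, passage text, similarity score).
hits = [
    ("refund-policy", "Refunds are available within 30 days of purchase.", 0.82),
    ("shipping", "Standard shipping takes 3 to 5 business days.", 0.05),
]
grounded = answer("What is the refund window?", hits, fake_llm)
refused = answer("Do you ship to Mars?", hits[1:], fake_llm)
```

The second call never reaches the model: the only passage scores below the threshold, so the agent returns the refusal text instead of letting the model fill the gap.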
Practice Exercises
Exercise 1: Which questions need retrieval?
Which of these can a bare model answer well, and which need a knowledge base? (a) “What’s the capital of France?” (b) “What’s our company’s parental-leave policy?” (c) “What were last night’s game scores?”
Hint
(a) is stable, public, general knowledge — a bare model is fine. (b) is private — it was never in training data, so it needs retrieval from your docs. (c) is current — it happened after training, so it needs retrieval from a fresh source. The pattern: anything private or recent needs retrieval; timeless public facts usually don’t.
Exercise 2: Why does retrieval reduce hallucination?
A bare model asked an out-of-knowledge question tends to invent a confident answer. Why does giving it retrieved passages plus the instruction “answer only from these sources” help?
Hint
Two reasons. First, the answer is now in front of the model as text, so it can summarize fact instead of guessing. Second, the instruction gives it permission — and a rule — to say “the sources don’t cover this” when retrieval comes back empty, instead of filling the gap. You’ll build exactly that refusal gate in Lesson 4.
Exercise 3: Map RAG onto what you’ve built
Retrieval reuses two things from earlier modules. Which earlier piece becomes the knowledge base, and which becomes the way the agent triggers a lookup?
Hint
The knowledge base is the VectorMemory/embedding machinery from Module 4 (chunk → embed → search), now applied to documents. The trigger is a tool (Modules 2–3): a search_knowledge function the agent calls inside the loop. RAG is those two ideas combined, plus grounding discipline on top.
Summary
A model’s built-in knowledge is frozen at training time and blind to private data, and on questions it can’t answer it tends to guess confidently. Retrieval fixes this without retraining: retrieve relevant text from a knowledge base you control, augment the prompt with it, and generate an answer from those sources. The result is current, private, and checkable, because the model summarizes your data instead of being the source of truth. And it’s built from parts you already have: the embed-and-search machinery from the memory module becomes the knowledge base, and a tool lets the agent trigger a lookup mid-loop. The one new skill is grounding discipline: answer only from what was retrieved, cite it, and refuse when there’s nothing relevant.
Key Concepts
- Three gaps in built-in knowledge — not current, not private, not certain.
- Retrieve-augment-generate — search a knowledge base, add the passages to the prompt, then answer.
- RAG is grounding, not training — no weight changes; update a doc and answers update instantly.
- Reuses what you’ve built — embedding/search (memory) is the knowledge base; a tool triggers retrieval.
Why This Matters
Retrieval is what takes an agent from “impressive on general questions” to “useful on your problem.” Almost every production agent — support bots, research assistants, internal copilots — is a retrieval-augmented agent at its core, because almost every real task depends on data the model was never trained on. Knowing why retrieval is needed, and that it’s grounding rather than training, is what lets you reach for it at the right moment instead of fine-tuning or over-prompting. Next, you’ll build the knowledge base itself: chunk documents, embed them, and search.
Next Steps
Continue to Lesson 2 - Building a Knowledge Base
Chunk documents into passages, embed them, and search by similarity — the store your agent retrieves from.
Continue Building Your Skills
You now know why an agent needs retrieval — current, private, checkable knowledge instead of frozen guesses — and the retrieve-augment-generate pattern that delivers it. Next you’ll build the knowledge base: take documents, chunk them into passages, embed each one, and search them with a query, reusing the embedding machinery from the memory module.