Lesson 1 - Why Agents Need Retrieval

Welcome to Why Agents Need Retrieval

Your agent is capable now — it loops, calls tools, remembers, and plans. But ask it something that depends on your data — “what’s our refund policy?”, “what did the Q3 report say?”, “is the Arashiyama bamboo grove worth an early start?” — and it does one of two things: answers from whatever happened to be in its training data, or makes something up that sounds right. Neither is good enough for a real product. The model’s knowledge is frozen at training time and blind to anything private. This module gives the agent a way out: retrieval — looking facts up in a knowledge base you control, and answering from what it finds. This lesson is about why that matters before we build it.

By the end of this lesson, you will be able to:

  • Explain the three things a model’s built-in knowledge can’t do: be current, be private, be certain
  • Describe retrieval-augmented generation (RAG) and the retrieve-augment-generate pattern
  • Explain why retrieval reduces confident wrong answers on out-of-knowledge questions
  • Recognize how the pieces map onto tools and memory you’ve already built

Let’s start with what’s actually missing.


What a Model Doesn’t Know

A language model is trained once, on a snapshot of data, and then frozen. That gives it broad general knowledge — but it creates three hard limits that matter the moment you build something real:

  • It isn’t current. Anything that happened after training simply isn’t there. Today’s prices, this week’s release notes, the latest policy — invisible.
  • It isn’t private. Your company’s docs, your user’s history, your own notes were never in the training set. The model has no way to know them.
  • It isn’t certain. When the model doesn’t know, it rarely says so. It produces a fluent, plausible answer that may be wrong — a hallucination. On exactly the questions where you most need a correct answer, a bare model is most likely to confidently invent one.

None of these is fixed by a bigger model or a better prompt. They’re structural: the knowledge isn’t in the model. The fix is to put the knowledge somewhere the agent can look it up at the moment it needs it — and to make it answer from what it looked up.


Retrieve, Augment, Generate

That’s exactly what retrieval-augmented generation (RAG) does. Instead of asking the model to answer from memory, you first retrieve relevant text from a knowledge base, augment the prompt with it, and only then generate the answer:

Figure: Retrieval-augmented generation pipeline. A Question box ('When do leaves peak in Kyoto?') flows left to right through three stages: Retrieve (search the knowledge base by similarity), Augment (put the retrieved passages into the prompt), and Generate (answer grounded in sources). Below, a Knowledge base cylinder labeled 'your documents, chunked and embedded' feeds the Retrieve step, and a cited-answer box ('...mid-to-late November [1]', 'every claim cites a source') comes out of Generate. A caption reads: the model answers from what it retrieved, not from memory, so answers stay current, private, and checkable.
Retrieve-augment-generate: search a knowledge base you control, put the relevant passages into the prompt, and have the model answer from them — so the answer reflects your data, not just the model's frozen training.

The shift is small but profound. A bare model answers the question from its weights. A retrieval-augmented agent answers it from sources you provided — which means the answer can be current (your knowledge base is as fresh as you make it), private (it’s your data), and checkable (you know which passage each claim came from). The model stops being the source of truth and becomes the thing that reads and summarizes the source of truth.
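
As a rough illustration of the three steps, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the sample passages are invented, word overlap stands in for embedding similarity, and generate() is a placeholder where a real model call would go.

```python
import re
from collections import Counter

# Invented sample passages; a real knowledge base would hold chunks of your documents.
KNOWLEDGE_BASE = [
    "Kyoto autumn leaves usually peak from mid to late November.",
    "The hotel front desk is staffed around the clock.",
    "Train passes can be bought at the airport station.",
]

def word_counts(text):
    """Lowercase word counts, used here as a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, k=2):
    """Return up to k passages that share words with the question, best match first."""
    q = word_counts(question)
    scored = []
    for passage in KNOWLEDGE_BASE:
        overlap = sum((q & word_counts(passage)).values())
        if overlap > 0:
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: -pair[0])
    return [passage for _, passage in scored[:k]]

def augment(question, passages):
    """Put the retrieved passages into the prompt as numbered sources."""
    sources = "\n".join(f"[{n}] {text}" for n, text in enumerate(passages, start=1))
    return (
        "Answer only from the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

def generate(prompt):
    """Placeholder for a real model call; it only reports what it was given."""
    return f"<model answer based on a {len(prompt)}-character prompt>"

question = "When do leaves peak in Kyoto?"
passages = retrieve(question)
prompt = augment(question, passages)
print(prompt)
print(generate(prompt))
```

The point of the sketch is the order of operations: the model never sees the question without the sources attached.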

RAG is grounding, not training

Retrieval does not change the model’s weights: there’s no fine-tuning and no training run. You’re handing the model relevant text at question time and asking it to answer from that text. That’s why RAG is so practical: update a document in your knowledge base, re-embed the changed passages, and the agent’s answers reflect the change on the next query, with no retraining. The model supplies language and reasoning; your knowledge base supplies the facts.
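
A small sketch can make the grounding-versus-training distinction concrete. It assumes a hypothetical dictionary-backed knowledge base with an invented policy text, and it skips the search step entirely so that only the effect of editing a document is visible. In a vector store, the edited passage would also need to be re-embedded.

```python
# Invented placeholder document; the policy text is illustrative only.
knowledge_base = {"refund-policy": "Refunds are accepted within 14 days."}

def grounded_prompt(question, kb):
    """Build the prompt from whatever the knowledge base holds right now."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in kb.items())
    return f"Answer only from these sources:\n{sources}\n\nQuestion: {question}"

question = "What is the refund window?"
before = grounded_prompt(question, knowledge_base)

# Edit the document. No model weights are touched anywhere in this script.
knowledge_base["refund-policy"] = "Refunds are accepted within 30 days."
after = grounded_prompt(question, knowledge_base)

print(before)
print(after)
```

The same question produces a different prompt after the edit, so the model reads different facts without any training step in between.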


You’ve Already Built Most of the Parts

Here’s the encouraging part: retrieval isn’t a new world. It’s the memory module’s machinery pointed at documents instead of conversation. In Module 4 you built a VectorMemory: it turned notes into vectors with embed() and searched them by similarity. A knowledge base is the same idea — chunk your documents into passages, embed each one, and search them with the user’s question. The “search the store, prepend what’s relevant” move you used for long-term memory is retrieval.
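
The chunk, embed, search sequence might look roughly like the sketch below. It is not the VectorMemory class from Module 4: the KnowledgeBase class, its method names, and the sample documents are all hypothetical, and embed() here is a simple word-count vector standing in for a real embedding model.

```python
import math
import re

def chunk(document):
    """Split a document into sentence-sized passages."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]

def words_of(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class KnowledgeBase:
    """Toy store: chunk documents, embed each passage, search by cosine similarity."""

    def __init__(self, documents):
        self.passages = [p for doc in documents for p in chunk(doc)]
        vocab = sorted({w for p in self.passages for w in words_of(p)})
        self.index = {word: i for i, word in enumerate(vocab)}
        self.vectors = [self.embed(p) for p in self.passages]

    def embed(self, text):
        """Word-count vector over the corpus vocabulary (stand-in for a real embedding)."""
        vector = [0.0] * len(self.index)
        for word in words_of(text):
            if word in self.index:
                vector[self.index[word]] += 1.0
        return vector

    def search(self, query, k=2):
        """Return the k passages most similar to the query as (score, passage) pairs."""
        q = self.embed(query)
        scored = [(cosine(q, v), p) for v, p in zip(self.vectors, self.passages)]
        scored.sort(key=lambda pair: -pair[0])
        return scored[:k]

# Invented sample documents for illustration.
kb = KnowledgeBase([
    "Kyoto autumn leaves usually peak from mid to late November. "
    "Weekends at the temples get crowded.",
    "The bamboo grove in Arashiyama is quietest just after sunrise.",
])
for score, passage in kb.search("When is the bamboo grove quiet?"):
    print(f"{score:.2f}  {passage}")
```

Swapping the word-count embed() for a learned embedding changes the quality of the matches but not the shape of the store.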

And the way the agent uses retrieval is just a tool, the thing you designed in Modules 2 and 3. Give the agent a search_knowledge tool and it can decide, mid-loop, to look something up — retrieve, read the result, and reason on. So this module recombines two things you already have:

  • Knowledge as a searchable store (from memory): chunk → embed → search, now over documents. That is the knowledge base (Lesson 2).
  • Retrieval as an action (from tools): a search_knowledge tool the agent calls when it needs facts — agentic RAG (Lesson 3).
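
To show what retrieval as a tool could look like, here is a minimal sketch. The registry layout, the run_tool_call helper, and the shape of the tool-call dictionary are assumptions for illustration, not the format of any particular model API or of the earlier modules. The documents are invented placeholders, and word overlap stands in for similarity search.

```python
import json

# Invented placeholder documents, keyed by a source id.
DOCS = {
    "refund-policy": "Refund policy: purchases can be returned within 30 days.",
    "q3-report": "Q3 report: placeholder summary of quarterly results.",
}

def search_knowledge(query, k=2):
    """Return up to k documents that share words with the query, best match first."""
    q = set(query.lower().replace("?", "").split())
    hits = []
    for source_id, text in DOCS.items():
        words = set(text.lower().replace(":", "").replace(".", "").split())
        score = len(q & words)
        if score:
            hits.append({"source": source_id, "text": text, "score": score})
    hits.sort(key=lambda hit: -hit["score"])
    return hits[:k]

# Hypothetical tool registry: a description the model can read, plus the callable.
TOOLS = {
    "search_knowledge": {
        "description": "Look up facts in the knowledge base. "
                       "Use for private or recent information.",
        "parameters": {"query": "string", "k": "integer (optional)"},
        "function": search_knowledge,
    }
}

def run_tool_call(call):
    """Execute one tool call requested by the model and return a JSON string result."""
    tool = TOOLS[call["name"]]
    return json.dumps(tool["function"](**call["arguments"]))

# What the agent loop might do when the model asks for a lookup.
observation = run_tool_call(
    {"name": "search_knowledge", "arguments": {"query": "What is our refund policy?"}}
)
print(observation)
```

The result goes back into the loop as an observation, and the model decides what to do next with it.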

The new skill the module adds on top is discipline: answering only from what was retrieved, citing sources, and refusing honestly when the knowledge base has nothing relevant (Lesson 4). That discipline is what turns “the model looked at some text” into “the agent gave a grounded, trustworthy answer.”
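
As a preview of that discipline, the sketch below checks the retrieved hits before any model call and refuses when nothing clears a relevance threshold. The function name, the hit dictionary fields, the sample hit, and the 0.3 threshold are all assumptions chosen for the example. The version built in Lesson 4 may look different.

```python
REFUSAL = "The sources don't cover this."

def grounded_request(question, hits, min_score=0.3):
    """Decide whether to answer from sources or refuse, before calling the model."""
    relevant = [hit for hit in hits if hit["score"] >= min_score]
    if not relevant:
        # Nothing relevant was retrieved: refuse instead of letting the model guess.
        return {"action": "refuse", "message": REFUSAL}
    sources = "\n".join(
        f"[{n}] ({hit['source']}) {hit['text']}"
        for n, hit in enumerate(relevant, start=1)
    )
    prompt = (
        "Answer using only the numbered sources. "
        "Cite a source number after every claim. "
        f"If the sources do not answer the question, reply exactly: {REFUSAL}\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    return {"action": "generate", "prompt": prompt}

# Invented sample hit for illustration.
hits = [
    {"source": "kyoto-guide",
     "text": "Leaves usually peak from mid to late November.",
     "score": 0.71},
]
print(grounded_request("When do leaves peak in Kyoto?", hits)["action"])
print(grounded_request("What is the visa fee?", [])["action"])
```

Refusing in code, rather than relying only on the prompt instruction, means an empty retrieval can never reach the model as an invitation to improvise.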


Practice Exercises

Exercise 1: Which questions need retrieval?

Which of these can a bare model answer well, and which need a knowledge base? (a) “What’s the capital of France?” (b) “What’s our company’s parental-leave policy?” (c) “What were last night’s game scores?”

Hint

(a) is stable, public, general knowledge — a bare model is fine. (b) is private — it was never in training data, so it needs retrieval from your docs. (c) is current — it happened after training, so it needs retrieval from a fresh source. The pattern: anything private or recent needs retrieval; timeless public facts usually don’t.

Exercise 2: Why does retrieval reduce hallucination?

A bare model asked an out-of-knowledge question tends to invent a confident answer. Why does giving it retrieved passages plus the instruction “answer only from these sources” help?

Hint

Two reasons. First, the answer is now in front of the model as text, so it can summarize fact instead of guessing. Second, the instruction gives it permission — and a rule — to say “the sources don’t cover this” when retrieval comes back empty, instead of filling the gap. You’ll build exactly that refusal gate in Lesson 4.

Exercise 3: Map RAG onto what you’ve built

Retrieval reuses two things from earlier modules. Which earlier piece becomes the knowledge base, and which becomes the way the agent triggers a lookup?

Hint

The knowledge base is the VectorMemory/embedding machinery from Module 4 (chunk → embed → search), now applied to documents. The trigger is a tool (Modules 2–3): a search_knowledge function the agent calls inside the loop. RAG is those two ideas combined, plus grounding discipline on top.


Summary

A model’s built-in knowledge is frozen at training time and blind to private data, and on questions it can’t answer it tends to guess confidently. Retrieval addresses this without retraining: retrieve relevant text from a knowledge base you control, augment the prompt with it, and generate an answer from those sources. The result is current, private, and checkable, because the model summarizes your data instead of being the source of truth. And it’s built from parts you already have: the embed-and-search machinery from the memory module becomes the knowledge base, and a tool lets the agent trigger a lookup mid-loop. The one new skill is grounding discipline: answer only from what was retrieved, cite it, and refuse when there’s nothing relevant.

Key Concepts

  • Three gaps in built-in knowledge — not current, not private, not certain.
  • Retrieve-augment-generate — search a knowledge base, add the passages to the prompt, then answer.
  • RAG is grounding, not training: no weight changes; update a doc, re-embed it, and answers change without retraining.
  • Reuses what you’ve built — embedding/search (memory) is the knowledge base; a tool triggers retrieval.

Why This Matters

Retrieval is what takes an agent from “impressive on general questions” to “useful on your problem.” Many production agents (support bots, research assistants, internal copilots) are retrieval-augmented at their core, because so many real tasks depend on data the model was never trained on. Knowing why retrieval is needed, and that it’s grounding rather than training, is what lets you reach for it at the right moment instead of fine-tuning or over-prompting. Next, you’ll build the knowledge base itself: chunk documents, embed them, and search.


Next Steps

Continue to Lesson 2 - Building a Knowledge Base

Chunk documents into passages, embed them, and search by similarity — the store your agent retrieves from.


Continue Building Your Skills

You now know why an agent needs retrieval — current, private, checkable knowledge instead of frozen guesses — and the retrieve-augment-generate pattern that delivers it. Next you’ll build the knowledge base: take documents, chunk them into passages, embed each one, and search them with a query, reusing the embedding machinery from the memory module.