Lesson 1 - The Messages List Is Memory
Welcome to The Messages List Is Memory
People often imagine “agent memory” as some special subsystem. It isn’t — at least not the short-term kind. Look back at the agent loop you built: every turn, you append the model’s response and the tool results to one list called messages, and you send that whole list on the next call. That list is the agent’s memory. It’s why the agent can refer to something from three steps ago: the entire history is right there in the prompt. Seeing this clearly is the key to the whole module, because it immediately reveals the two problems you’ll spend the rest of it solving.
By the end of this lesson, you will be able to:
- Explain why the messages list is the agent’s short-term memory
- Identify the two problems with relying on it alone
- Distinguish short-term memory from long-term memory
- Describe what each kind of memory is good for
This builds directly on the agent loop from Module 2. Let’s begin.
Memory Is Just the Running List
Here’s the loop’s core move, the one you wrote in Module 2:
messages = [{"role": "user", "content": user_message}]
# ... each turn:
messages.append({"role": "assistant", "content": response.content}) # what the model said
messages.append({"role": "user", "content": tool_results}) # what the tools returned
# ... and the whole list goes back on the next call:
response = client.messages.create(model=..., max_tokens=1024, messages=messages, tools=tools)  # max_tokens is required
Because the entire messages list is sent on every call, the model sees the full conversation each time: the original request, every tool it called, every result it got back, and everything it said. That’s what makes the agent feel like it “remembers.” There’s no separate memory store involved; the transcript is the memory. Each step, the loop just adds to it.
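To see that the list really is the only memory, here is a minimal sketch with a stand-in for the model. `fake_model` is hypothetical: it calls no API and can only look at the messages it is handed, which is also all a real model can do. Send the history and the earlier fact is available; send only the latest turn and it is gone.

```python
def fake_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for a model call: it can only use what is in `messages`."""
    for msg in messages:
        if msg["role"] == "user" and msg["content"].startswith("My name is "):
            name = msg["content"].removeprefix("My name is ").rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


# Full history: the earlier turn is still in the list, so it can be "remembered".
messages = [{"role": "user", "content": "My name is Ada."}]
messages.append({"role": "assistant", "content": "Nice to meet you, Ada."})
messages.append({"role": "user", "content": "What is my name?"})
with_history = fake_model(messages)

# Same question, but only the latest turn is sent: there is nothing else to recall.
without_history = fake_model([{"role": "user", "content": "What is my name?"}])

print(with_history)     # Your name is Ada.
print(without_history)  # I don't know your name.
```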
This is wonderfully simple, and for short interactions it’s all you need. But “send the whole list every time” carries two consequences that bite as soon as conversations get longer.
Problem One: It Grows Without Bound
Every turn adds messages, and you resend all of them. A long agent run — many tool calls, long results — produces a long list, and that creates two costs:
- The context window. A model can only read so many tokens at once. Keep appending and eventually the conversation no longer fits — the call fails or older content has to be dropped.
- Money and latency. You pay per input token, and you resend the whole history on every call. Once a conversation reaches thousands of tokens, every single step resends those thousands of tokens. Because each call’s input grows with the length of the run, the total cost of a run grows faster than linearly with the number of steps, and each call also gets slower.
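Some back-of-the-envelope arithmetic makes the growth concrete. The sketch assumes, purely for illustration, that every call adds a fixed number of tokens to the history and that nothing is ever trimmed; the 500-token turns and the 8,000-token limit are made-up numbers, far smaller than real context windows. Call k then resends k turns’ worth of history, so a run of T calls sends tokens_per_turn * T * (T + 1) / 2 input tokens in total.

```python
def input_tokens_per_call(tokens_per_turn: int, turns: int) -> list[int]:
    """Input tokens sent on each call if every turn adds `tokens_per_turn` and nothing is trimmed."""
    return [tokens_per_turn * k for k in range(1, turns + 1)]


def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens across the run: tokens_per_turn * T * (T + 1) / 2."""
    return sum(input_tokens_per_call(tokens_per_turn, turns))


per_call = input_tokens_per_call(tokens_per_turn=500, turns=20)
total = cumulative_input_tokens(tokens_per_turn=500, turns=20)
print(per_call[0], per_call[-1])  # 500 10000
print(total)                      # 105000

# Hypothetical context limit, just to show when the list stops fitting.
CONTEXT_LIMIT = 8_000
first_overflow = next(k for k, n in enumerate(per_call, start=1) if n > CONTEXT_LIMIT)
print(first_overflow)  # 17
```

Doubling the run to 40 calls doubles the size of the last call but roughly quadruples the total (410,000 tokens under the same assumptions), which is the practical case for keeping the list bounded.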
So an agent that does real, multi-step work will outgrow a naive “just keep appending” approach. You need ways to keep the list bounded — which is exactly truncation (Lesson 2) and summarization (Lesson 3).
Problem Two: It Vanishes When the Session Ends
The messages list lives in a variable, in memory, for the duration of one run. When that run ends, the list is gone. Start a new session tomorrow and the agent knows nothing about today — not the user’s name, not their preferences, not what you already figured out together.
For a one-off task that’s fine. But many agents should remember things across sessions: “this traveler is vegetarian,” “this customer’s account ID,” “we already tried that approach and it failed.” That kind of durable memory can’t live in a per-run list — it needs to be stored somewhere and retrieved later. That’s long-term memory, which you’ll build with a vector store in Lesson 4.
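The difference between the two lifetimes can be sketched in a few lines. This is a deliberately minimal stand-in, not the vector store you will build in Lesson 4: durable facts are written to a JSON file (the path and the helpers `save_fact`, `load_facts`, and `run_session` are hypothetical), while each session starts with a fresh, empty messages list.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location for long-term facts; a real agent would use a database or vector store.
FACTS_PATH = Path(tempfile.gettempdir()) / "agent_facts_demo.json"


def load_facts(path: Path = FACTS_PATH) -> list[str]:
    """Read durable facts from disk; an empty list if nothing was saved yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_fact(fact: str, path: Path = FACTS_PATH) -> None:
    """Add one durable fact (skipping duplicates) and write the whole list back to disk."""
    facts = load_facts(path)
    if fact not in facts:
        facts.append(fact)
    path.write_text(json.dumps(facts), encoding="utf-8")


def run_session(user_message: str) -> list[dict]:
    """Each session builds a new messages list: short-term memory starts empty."""
    return [{"role": "user", "content": user_message}]


FACTS_PATH.unlink(missing_ok=True)  # start the demo from a clean slate

# Session 1: the traveler states a preference, and we persist it explicitly.
today = run_session("I'm vegetarian, please remember that.")
save_fact("This traveler is vegetarian.")

# Session 2: session 1's messages list is gone, but the saved fact survives.
tomorrow = run_session("Book me a dinner.")
print(len(tomorrow))  # 1
print(load_facts())   # ['This traveler is vegetarian.']
```

Nothing carries over on its own: the fact survives only because the code chose to write it somewhere outside the per-run list.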
Two Kinds of Memory
Those two problems map onto the two kinds of memory an agent uses, and it’s worth holding the distinction clearly:
- Short-term memory is the messages list: the full, detailed transcript of this conversation. It’s complete and immediate, but temporary and size-limited.
- Long-term memory is a store of durable facts: notes the agent should remember beyond a single run. It’s persistent and searchable, but you only pull the relevant pieces into context when you need them, rather than carrying everything.
Real agents use both: a managed short-term transcript for the current task, plus a long-term store they write to and read from across sessions. The rest of this module builds each one.
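Here is a rough sketch of how the two kinds meet on a single call. The helper names are hypothetical, and the retrieval is a crude word-overlap score swapped in for the vector search used later in the module; only the shape matters. The full short-term transcript goes in `messages`, while just the relevant long-term facts are pulled into a `system` prompt, giving a dict you could unpack into a `messages.create` call alongside `model` and `max_tokens`.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve_facts(facts: list[str], query: str, k: int = 2) -> list[str]:
    """Return up to k facts sharing the most words with the query (stand-in for vector search)."""
    query_words = _words(query)
    scored = [(len(query_words & _words(f)), f) for f in facts]
    scored = [pair for pair in scored if pair[0] > 0]  # drop facts with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:k]]


def build_request(messages: list[dict], facts: list[str]) -> dict:
    """Combine relevant long-term facts (system prompt) with the short-term transcript."""
    latest = messages[-1]["content"]
    relevant = retrieve_facts(facts, latest)
    system = "Known facts about this user:\n" + "\n".join(f"- {f}" for f in relevant)
    return {"system": system, "messages": messages}


long_term = [
    "This traveler is vegetarian.",
    "This traveler always flies economy.",
    "Customer account ID is on file.",
]
short_term = [{"role": "user", "content": "Find me a vegetarian dinner near the hotel."}]

request = build_request(short_term, long_term)
print(request["system"])
# Known facts about this user:
# - This traveler is vegetarian.
```

The economy-class fact stays in the store for this request because it has nothing to do with dinner; that selectivity is what keeps long-term memory from bloating the context the way an ever-growing transcript does.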
State is just data you carry between turns
“Memory” and “state” sound abstract, but for an agent they’re concrete: state is whatever data you carry from one turn (or session) to the next. Short-term state is the messages list you already maintain; long-term state is whatever you persist to a store. There’s no magic — designing an agent’s memory is really just deciding what to keep, where, and for how long.
Practice Exercises
Exercise 1: Where does the agent’s memory live?
A teammate asks, “Where is my agent storing the conversation so far? Do I need a database?” For the current run, what’s the honest answer?
Hint
For the current run, the conversation lives in the messages list you pass to messages.create each turn — an ordinary Python list in memory. No database is needed for short-term memory; the transcript is the memory. You only need a store (like a database or vector store) for memory that must survive across sessions.
Exercise 2: Diagnose the failure
An agent works fine for short chats but, deep into a long multi-tool session, starts failing or getting very slow and expensive. What’s the likely cause, and which problem from this lesson is it?
Hint
The messages list has grown large, so every call resends a huge history — driving up cost and latency, and eventually approaching or exceeding the context window. That’s Problem One (it grows without bound), which truncation and summarization address.
Exercise 3: Short-term or long-term?
For each, say whether it belongs in short-term (the messages list) or long-term (a store): (a) the tool result from two steps ago in this run; (b) “this user always flies economy”; (c) the user’s question you’re answering right now.
Hint
(a) short-term — it’s part of this run’s transcript; (b) long-term — a durable fact you’d want in future sessions; (c) short-term — it’s the current turn in the messages list. The rule of thumb: this-conversation detail is short-term; facts worth remembering next time are long-term.
Summary
An agent’s short-term memory is simply the messages list the loop appends to and resends every call — there’s no separate subsystem, the transcript is the memory. That simplicity creates two problems. First, the list grows without bound, eventually overflowing the context window and driving up cost and latency, which truncation and summarization will fix. Second, it vanishes when the session ends, so anything the agent should remember across runs needs long-term memory — a persistent, searchable store you pull relevant facts from. Real agents use both: a managed short-term transcript and a long-term store. Memory design is really just deciding what to keep, where, and for how long.
Key Concepts
- The messages list is short-term memory — the full transcript of this run, resent each call.
- Grows without bound — long runs overflow the context window and cost more.
- Vanishes at session end — per-run state doesn’t persist.
- Long-term memory — a persistent, searchable store of durable facts, retrieved by relevance.
Why This Matters
Almost every “my agent forgot…” or “my agent got slow and expensive…” problem traces back to misunderstanding this one thing: short-term memory is a list you must manage, and long-term memory is a store you must build. Seeing memory as plain data you carry between turns demystifies it and tells you exactly what to do — bound the list so it stays affordable, and persist the facts worth keeping. The next lesson tackles the first half: keeping that growing list inside the context window.
Next Steps
Continue to Lesson 2 - Managing Context: Truncation and Budgets
Keep the messages list inside the context window by trimming old turns and watching your token budget.
Back to Module Overview
Return to the Memory and State module overview
Continue Building Your Skills
You can now see short-term memory for what it is — the running messages list — and you understand the two problems it creates. Next you’ll solve the first one: keeping that list inside the context window with truncation and an eye on your token budget.