Lesson 5 - Guided Project: Give Atlas Memory

Welcome to the Guided Project

Across this module you took memory apart piece by piece: the messages list is short-term memory; long runs overflow the context window unless you trim or summarize them; and durable facts that should survive a session belong in a searchable store. Now you’ll put it all back together on one agent. Atlas, the travel-planning assistant you’ve been building, is going to get both kinds of memory at once — a short-term transcript it keeps bounded through compaction, and a long-term VectorMemory it writes traveler preferences to and consults before it plans. By the end, Atlas will hold a long planning conversation without blowing its budget and recall “this traveler is vegetarian” from an earlier session, then plan around it.

By the end of this project, you will be able to:

  • Keep Atlas’s running conversation bounded by compacting old turns into a summary
  • Build a VectorMemory store Atlas can write durable traveler facts to
  • Search that store with the user’s request and prepend the relevant facts before Atlas plans
  • Combine short-term and long-term memory so Atlas both stays affordable and remembers across sessions

We’ll build it in stages, reusing the exact pieces you wrote earlier in the module. Let’s give Atlas a memory.


Stage 1: Short-Term Memory Is the Messages List — Plus Compaction

Start with what Atlas already has. The agent loop from Module 2 maintains one list, messages, and resends the whole thing every turn. That list is Atlas’s short-term memory — the full transcript of the current planning conversation. Nothing special, just data the loop carries forward.

The problem you met in Lessons 1 through 3 is that this list grows without bound. A long planning session — many destinations considered, many tool results — eventually pushes past the context window and makes every call slower and more expensive. The fix from Lesson 3 is compaction: replace the older turns with a short summary and keep only the most recent turns verbatim.

def compact_history(messages, summarize, keep_last=2):
    if len(messages) <= keep_last + 1:
        return messages
    old, recent = messages[1:-keep_last], messages[-keep_last:]
    summary_msg = {"role": "user", "content": f"[Summary of earlier conversation: {summarize(old)}]"}
    return [messages[0], summary_msg] + recent

This keeps messages[0] (the system or opening message) untouched, hands the older middle turns to a summarize function, and preserves the last couple of turns in full so the immediate context stays sharp. Call it whenever the transcript gets long, and Atlas’s short-term memory stays inside the window no matter how long the planning session runs. That handles “remembers this conversation, affordably.” Now for the part the transcript can’t do: remembering across sessions.
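"Whenever the transcript gets long" needs a concrete trigger. The sketch below is one possible way to do it, not the lesson's prescribed method: rough_size, maybe_compact, and max_chars are hypothetical names, character count is only a crude stand-in for tokens, and fake_summarize replaces the model call you would normally use. compact_history is repeated from above so the block runs on its own.

```python
def compact_history(messages, summarize, keep_last=2):
    # Repeated from Stage 1 so this block is self-contained.
    if len(messages) <= keep_last + 1:
        return messages
    old, recent = messages[1:-keep_last], messages[-keep_last:]
    summary_msg = {"role": "user", "content": f"[Summary of earlier conversation: {summarize(old)}]"}
    return [messages[0], summary_msg] + recent

def rough_size(messages):
    """Crude size proxy: total characters of content (an assumption; real budgets use token counts)."""
    return sum(len(m["content"]) for m in messages)

def maybe_compact(messages, summarize, max_chars=2000, keep_last=2):
    """Compact only when the transcript exceeds the (hypothetical) character budget."""
    if rough_size(messages) <= max_chars:
        return messages
    return compact_history(messages, summarize, keep_last=keep_last)

def fake_summarize(old_messages):
    # Stand-in for a model call that would write a real summary.
    return f"{len(old_messages)} earlier messages"

system = {"role": "system", "content": "You are Atlas."}
short_chat = [system, {"role": "user", "content": "Plan a weekend in Lisbon."}]
long_chat = [system] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": "x" * 500}
    for i in range(6)
]

print(len(maybe_compact(short_chat, fake_summarize)))  # 2 (under budget, unchanged)
compacted = maybe_compact(long_chat, fake_summarize)
print(len(compacted))                                  # 4 (system, summary, last two turns)
print(compacted[1]["content"])  # [Summary of earlier conversation: 4 earlier messages]
```

Calling maybe_compact once per turn, just before the model call, keeps the check cheap and leaves short conversations untouched.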


Stage 2: The VectorMemory — Atlas’s Long-Term Store

When the run ends, the messages list is gone. So anything Atlas should remember next time (the traveler is vegetarian, prefers hostels, hates cold winters) has to live somewhere durable that Atlas can search. That’s the VectorMemory you built in Lesson 4: each note is turned into a vector by embed(), and a search embeds the query and ranks notes by similarity. The embed() below is the lightweight keyword version: it needs only NumPy and no model download. It hashes each non-stop-word into one of 256 buckets, counts the hits, and scales the vector to unit length, so the dot product in search is cosine similarity. In production you’d swap embed() for a real embedding model such as one from sentence-transformers, and the class wouldn’t change.

import hashlib, re
import numpy as np

STOP = {"the","a","an","and","or","is","are","to","of","in","on","for","with",
        "i","you","my","me","do","does","what","how","much","want","like","mind",
        "traveler","trips","trip","stays","options","prefers","loves"}

def embed(text, dim=256):
    vec = np.zeros(dim)
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP:
            continue
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

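class VectorMemory:
    """In-memory store of notes and their embedding vectors."""

    def __init__(self):
        self.notes, self.vecs = [], []

    def add(self, note):
        # Keep each note and its vector at the same index.
        self.notes.append(note)
        self.vecs.append(embed(note))

    def search(self, query, k=1):
        # embed() returns unit-length vectors, so the dot product is cosine similarity.
        q = embed(query)
        sims = [float(q @ v) for v in self.vecs]
        order = sorted(range(len(sims)), key=lambda i: -sims[i])[:k]
        return [(self.notes[i], round(sims[i], 3)) for i in order]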

add stores a note alongside its vector; search returns the top-k notes with their similarity scores. This is the store Atlas writes its durable facts into — separate from the transcript, persistent, and queried only when relevant.
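To see where a similarity score comes from, it can help to work one by hand. The sketch below is a simplified stand-in for embed(), not the lesson's code: it counts words directly instead of hashing them into buckets, so it gives the score embed() would produce if no two words happened to collide in the hash. STOP_SUBSET, content_tokens, and cosine are hypothetical names, and STOP_SUBSET lists only the stop words these sentences need.

```python
import math
import re
from collections import Counter

# Subset of the lesson's STOP set, enough for the sentences below.
STOP_SUBSET = {"the", "is", "and", "for", "my", "trip", "trips",
               "traveler", "loves", "prefers"}

def content_tokens(text):
    """Lowercase, split into words, drop stop words (same idea as embed())."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_SUBSET]

def cosine(a_text, b_text):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(content_tokens(a_text)), Counter(content_tokens(b_text))
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "suggest some food spots for my trip"
food_note = "The traveler is vegetarian and loves ramen and street food."
budget_note = "The traveler prefers budget-friendly trips and cheap hostels."

print(cosine(query, food_note))    # 0.25: one shared word ("food"), four content words each
print(cosine(query, budget_note))  # 0.0: no shared content words
```

The query and the food note each have four content words and share one, so the score is 1 / (2 × 2) = 0.25. A keyword embedding can only score above zero when words overlap.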


Stage 3: Saving Durable Facts to Memory

A store is only useful if Atlas writes to it. The moment to save is when the traveler states a preference — that’s exactly the kind of fact worth keeping across sessions. Saving is just a call to add:

memory = VectorMemory()
memory.add("The traveler is vegetarian and loves ramen and street food.")
memory.add("The traveler prefers budget-friendly trips and cheap hostels.")

Each saved note is one durable fact. In a full agent you’d trigger these writes automatically — when the traveler says “I’m vegetarian,” Atlas records it — but the mechanism is the same: turn the fact into a sentence and add it. These two notes now live in long-term memory, ready to be recalled in any future session, long after the original messages list is gone.


Stage 4: Recalling Relevant Memory Before Atlas Plans

Here’s the move that ties it together. Before Atlas answers, it searches its memory with the user’s request, collects the relevant saved facts, and prepends them to the conversation as a note. That way Atlas’s plan respects what it already knows — without carrying every stored fact in the prompt, only the ones that matter for this request.

memory = VectorMemory()
memory.add("The traveler is vegetarian and loves ramen and street food.")
memory.add("The traveler prefers budget-friendly trips and cheap hostels.")

def recall(memory, user_message, k=2):
    hits = memory.search(user_message, k=k)
    facts = "; ".join(note for note, score in hits)
    return f"Relevant things you remember about this traveler: {facts}"

print(recall(memory, "suggest some food spots for my trip"))

Running it:

Relevant things you remember about this traveler: The traveler is vegetarian and loves ramen and street food.; The traveler prefers budget-friendly trips and cheap hostels.

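Atlas searched its memory with the user’s request and assembled the hits into one context note. Notice why both facts came back: the store holds only two notes and k=2, so search returned everything. The ramen note matched on the shared word “food”; the budget note shares no keywords with the query and was returned only because k allowed it. In a larger store, k (or a minimum similarity score) is what keeps unrelated facts out of the prompt. Prepend the note to the conversation, as a system or user message, and Atlas plans with the preferences in front of it: it will suggest vegetarian, budget food spots rather than generic ones. The keyword embed() here matches on shared words, but with real semantic embeddings (sentence-transformers) the recall fires even when the wording doesn’t overlap: “where should I eat” would still surface the ramen note.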
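Because search always returns its top k notes, a note can come back with a similarity of zero when the store is small. One way to keep such notes out of the prompt is a score cutoff. The sketch below is an assumption layered on top of the lesson's recall, not a replacement for it: recall_with_threshold and min_score are hypothetical names, and the scores in hits are illustrative values written by hand in the (note, score) shape that search returns.

```python
def recall_with_threshold(hits, min_score=0.1):
    """Build the context note from (note, score) pairs, dropping weak matches.

    min_score is a hypothetical cutoff; a sensible value depends on the embedding used.
    """
    kept = [note for note, score in hits if score >= min_score]
    if not kept:
        return ""  # nothing relevant: prepend nothing
    return "Relevant things you remember about this traveler: " + "; ".join(kept)

# Illustrative hits, hand-written rather than produced by a real search.
hits = [
    ("The traveler is vegetarian and loves ramen and street food.", 0.25),
    ("The traveler prefers budget-friendly trips and cheap hostels.", 0.0),
]

print(recall_with_threshold(hits))
# Relevant things you remember about this traveler: The traveler is vegetarian and loves ramen and street food.
```

The trade-off is that a cutoff tuned for keyword scores will not carry over to a semantic model, whose scores fall in a different range, so treat min_score as something to re-tune whenever embed() changes.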


Stage 5: Putting It Together — Atlas Uses a Remembered Preference

Now picture the full turn. A traveler returns for a new planning session — the old messages list is long gone — and asks Atlas to suggest food spots:

  1. Recall first. Atlas calls recall(memory, user_message) and gets back the note above: vegetarian, loves ramen and street food, budget-friendly.
  2. Prepend it. That note goes into messages ahead of the request, so the model sees the remembered preferences as context.
  3. Plan with it. Atlas answers — and because the preferences are right there in the prompt, it recommends vegetarian, wallet-friendly food spots instead of generic tourist restaurants. It remembered.
  4. Stay bounded. As the conversation continues and the transcript grows, compact_history summarizes the older turns so the running session never overflows the window.
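The four steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not a full agent: start_session is a hypothetical helper, memory_note is a fixed string standing in for the result of recall(memory, user_message), the placeholder turns stand in for real model replies, and fake_summarize stands in for a model-written summary. compact_history is repeated from Stage 1 so the block runs on its own.

```python
def compact_history(messages, summarize, keep_last=2):
    # Repeated from Stage 1 so this block is self-contained.
    if len(messages) <= keep_last + 1:
        return messages
    old, recent = messages[1:-keep_last], messages[-keep_last:]
    summary_msg = {"role": "user", "content": f"[Summary of earlier conversation: {summarize(old)}]"}
    return [messages[0], summary_msg] + recent

def start_session(system_prompt, memory_note, user_message):
    """Steps 1-2: open a new transcript with the recalled note folded into messages[0]."""
    system = f"{system_prompt}\n\n{memory_note}" if memory_note else system_prompt
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]

def fake_summarize(old_messages):
    # Stand-in for a model call that would write a real summary.
    return f"{len(old_messages)} earlier messages about food spots"

# Stand-in for recall(memory, user_message).
memory_note = ("Relevant things you remember about this traveler: "
               "The traveler is vegetarian and loves ramen and street food.")

messages = start_session("You are Atlas, a travel-planning assistant.",
                         memory_note, "suggest some food spots for my trip")

# Step 3 would be the model call; placeholder turns grow the transcript instead.
for i in range(3):
    messages.append({"role": "assistant", "content": f"placeholder reply {i}"})
    messages.append({"role": "user", "content": f"placeholder follow-up {i}"})

# Step 4: compact once the transcript is long.
messages = compact_history(messages, fake_summarize, keep_last=2)

print(len(messages))                           # 4
print("vegetarian" in messages[0]["content"])  # True
```

Putting the recalled note in messages[0] is a design choice: compact_history never summarizes that message, so the remembered preferences survive every compaction. A note prepended as an ordinary user turn would eventually be folded into the summary.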

That’s both kinds of memory working at once: the short-term transcript kept affordable by compaction, and the long-term store giving Atlas continuity across sessions. The traveler never re-explained they were vegetarian — Atlas already knew.

Every serious agent does both

The pattern you just built is the one real agents rely on. Keep the running transcript bounded — truncate or summarize old turns so the current conversation stays inside the context window and stays cheap. And keep durable facts in a store you search on demand — write what’s worth remembering, then pull back only the relevant pieces when a request comes in. Short-term memory you manage; long-term memory you build. Atlas now has both.


Practice Exercises

Exercise 1: Auto-save a stated preference

Right now you call memory.add(...) by hand. Make Atlas save a fact automatically when the traveler states a preference — for example, when a message contains “I’m vegetarian” or “I prefer.” How would you detect a preference and turn it into a stored note?

Hint

The simplest version is a keyword check: if the user’s message contains a phrase like “i prefer”, “i love”, “i’m”, or “i hate”, build a note from it and call memory.add(note). A more robust version asks the model itself — “Does this message state a durable preference? If so, summarize it as one sentence” — and adds whatever it returns. Either way, the save happens inside the loop, right after you read the user’s message.
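As one possible starting point for the keyword version, here is a small sketch. The cue list and the name preference_note are hypothetical, and the note wording is just one option; treat it as something to adapt rather than the expected answer.

```python
# Hypothetical cue phrases; extend or trim them for your own conversations.
PREFERENCE_CUES = ("i prefer", "i love", "i hate", "i'm ", "i am ")

def preference_note(user_message):
    """Return a note to store if the message looks like a stated preference, else None."""
    # Lowercase and normalize curly apostrophes so "I’m" and "I'm" both match.
    text = user_message.lower().replace("\u2019", "'")
    if any(cue in text for cue in PREFERENCE_CUES):
        return f'The traveler said: "{user_message.strip()}"'
    return None

print(preference_note("I'm vegetarian, by the way."))
# The traveler said: "I'm vegetarian, by the way."
print(preference_note("What's the weather in Lisbon in May?"))
# None
```

Inside the loop you would call preference_note on each user message and pass any non-None result to memory.add. Note the weakness: a message like "I'm flying on Tuesday" also matches, which is why the model-based check described in the hint is the more robust route.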

Exercise 2: Persist the store to disk

The VectorMemory lives in a Python list, so it disappears when the program exits — the same problem as the messages list. Make it survive restarts by saving the notes to a file and reloading them on startup.

Hint

You only need to persist self.notes — the vectors can be recomputed by re-embedding each note on load. Add a save(path) that writes the notes as JSON, and a load(path) that reads them back and calls add on each (which rebuilds the vectors). On startup, load the file if it exists; on each new fact, save again. Now a preference saved today is recalled tomorrow.
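The file handling can be sketched separately from the class. The helpers below are a hedged illustration using only the standard library: save_notes, load_notes, and the file name atlas_memory.json are hypothetical, and wiring them into VectorMemory as save and load methods is left to you.

```python
import json
import os
import tempfile

def save_notes(notes, path):
    """Write the list of note strings to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notes, f, ensure_ascii=False, indent=2)

def load_notes(path):
    """Read the notes back; a missing file means an empty memory."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Round trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "atlas_memory.json")  # hypothetical file name
    print(load_notes(path))  # [] on a first run
    save_notes(["The traveler is vegetarian and loves ramen and street food."], path)
    print(load_notes(path))
    # ['The traveler is vegetarian and loves ramen and street food.']
```

On startup, looping over load_notes(path) and calling memory.add on each note rebuilds the vectors, so the file never needs to store them.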

Exercise 3: Swap in real embeddings and recall non-overlapping words

The keyword embed() only matches on shared words. Replace it with sentence-transformers so Atlas recalls a fact even when the query uses different words — for instance, recalling the ramen note from “where can I grab a bite?”

Hint

Install sentence-transformers, load a model like all-MiniLM-L6-v2, and have embed(text) return model.encode(text, normalize_embeddings=True). Normalizing keeps the vectors at unit length, so the dot product in search is still cosine similarity. The VectorMemory class doesn’t change: add and search still call embed and take a dot product. Now “grab a bite” should land near “ramen and street food” in vector space even with no shared words, because the embeddings capture meaning, not just tokens. That’s the whole point of using a vector store over keyword matching.


Summary

You gave Atlas both kinds of memory and saw them work together. Short-term memory is the messages list the loop already maintains; you kept it bounded with compact_history, which summarizes older turns and preserves the most recent ones so long planning sessions stay inside the context window. Long-term memory is a VectorMemory store: Atlas adds durable traveler facts to it, and before planning it searches the store with the user’s request, assembles the relevant hits into a context note with recall, and prepends that note so its plan respects what it remembers. The verified recall returned the vegetarian and budget-friendly facts for a food query — exactly the preferences Atlas should plan around. Together, compaction keeps Atlas affordable within a session, and the vector store gives it continuity across sessions.

Key Concepts

  • Compaction for short-term memory — summarize old turns, keep the recent ones, so the transcript stays bounded.
  • VectorMemory as long-term store — add durable facts; search returns the most relevant by similarity.
  • Save on preference — write a fact to the store the moment the traveler states one.
  • Recall before planning — search with the user’s request and prepend the hits so the plan respects remembered facts.

Why This Matters

This is the shape of memory in real agents: a managed transcript plus a searchable store. Get it right and your agent stays cheap on long tasks and feels like it knows the user — it stops re-asking things it was already told and starts building on what it learned. Get it wrong and you either overflow the context window or build an agent with amnesia. Atlas now avoids both, which is the foundation every harder behavior — multi-step planning, reasoning, reflection — is built on top of.


Next Steps

Continue to Module 5 - Planning and Reasoning

Decomposition, ReAct-style reasoning, and reflection for harder multi-step tasks.



Continue Building Your Skills

Atlas can now hold a long conversation without losing its budget and recall a traveler’s saved preferences across sessions — both halves of agent memory, working at once. With memory in place, the next module turns to how Atlas decides what to do: breaking hard requests into steps, reasoning through them, and reflecting on its own work. On to planning and reasoning.