Lesson 4 - Long-Term Memory with a Vector Store
Welcome to Long-Term Memory with a Vector Store
In Lesson 1 you saw the limit clearly: the messages list vanishes when the session ends. Start a new run tomorrow and the agent knows nothing about today — not that the traveler is vegetarian, not that they hate the cold, not what you already worked out together. The fixes from Lessons 2 and 3 keep the current conversation inside the context window, but they do nothing for facts that should outlive a single run. That’s what long-term memory is for, and the standard way to build it is a vector store.
The idea is small once you see it. Store each durable fact as a short text note. To recall something, don’t try to match exact words — find the notes most relevant to what you need right now by comparing meaning. A vector store does exactly that: it turns each note into a vector (an embedding), turns your query into a vector too, and returns the notes whose vectors are closest. This lesson builds a tiny one you can run with only numpy, then shows the one-line swap to real semantic embeddings.
By the end of this lesson, you will be able to:
- Explain why long-term memory stores notes and retrieves them by relevance, not exact match
- Build a small
VectorMemorywith anadd(note)andsearch(query, k)interface - Run a dependency-free version using a keyword embedding
- Swap in
sentence-transformersorchromadbfor real semantic search without changing the interface - Wire memory into an agent: write durable notes, search before answering
This is the second half of the memory story Lesson 1 opened. Let’s build it.
The Idea: Store Notes, Retrieve by Relevance
Long-term memory has two jobs: keep the facts worth remembering, and find the relevant ones later. The keeping part is easy — write each fact as a short note and save it. The finding part is where the design choice lives.
You could store notes in a list and grep them by keyword, but that breaks the moment the words don’t match. A note says “the traveler is vegetarian” and the agent later asks “what does she eat?” — there’s not a single shared word, yet they’re clearly about the same thing. Exact-match retrieval misses it. What you actually want is retrieval by meaning.
A vector store gets you that. The trick is to convert text into a vector — a list of numbers — such that texts with similar meaning land near each other in that number space. That conversion is called an embedding. Once every note is a vector, recall is just geometry:
- Embed each note as you add it, and keep the vectors alongside the notes.
- Embed the query the same way when you need to recall something.
- Return the nearest notes by cosine similarity — the closer two unit vectors point in the same direction, the more related their meaning.
The interface we want is deliberately tiny: an add(note) to store a fact and a search(query, k) to pull back the k most relevant notes. Everything else is an implementation detail of how embed() works — and that detail is exactly what you’ll swap when you move from a demo to production.
A Runnable VectorMemory with add and search
Here’s a complete, dependency-free version. The only thing it needs beyond the standard library is numpy. Read the comment on embed() carefully — it is a keyword embedding, chosen so the demo runs anywhere; it is not the real semantic version (that comes next).
import hashlib, re
import numpy as np
# A simple KEYWORD embedding: bag-of-words over a hashed vocabulary, with common
# words removed. Good enough to run with only numpy; for real SEMANTIC search,
# swap embed() for a sentence-embedding model (see the note below).
STOP = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in", "on", "for",
"with", "i", "you", "my", "me", "do", "does", "what", "how", "much",
"want", "like", "mind", "traveler", "trips", "trip", "stays", "options",
"prefers", "loves"}
def embed(text, dim=256):
vec = np.zeros(dim)
for tok in re.findall(r"[a-z]+", text.lower()):
if tok in STOP:
continue
i = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
vec[i] += 1.0
norm = np.linalg.norm(vec)
return vec / norm if norm else vec
class VectorMemory:
def __init__(self):
self.notes, self.vecs = [], []
def add(self, note):
self.notes.append(note)
self.vecs.append(embed(note))
def search(self, query, k=1):
q = embed(query)
sims = [float(q @ v) for v in self.vecs]
order = sorted(range(len(sims)), key=lambda i: -sims[i])[:k]
return [(self.notes[i], round(sims[i], 3)) for i in order]Walk through what each piece does. embed() lowercases the text, splits it into word tokens, drops common “stop” words that carry little meaning, and hashes each remaining word into one of dim slots — counting how often each slot is hit. Normalizing to unit length means a dot product (q @ v) is the cosine similarity directly. VectorMemory.add() stores the note and its vector side by side. VectorMemory.search() embeds the query, scores it against every stored vector, and returns the top k notes with their similarity scores.
Now store a few facts and ask for the relevant one:
mem = VectorMemory()
mem.add("The traveler is vegetarian and loves ramen and street food.")
mem.add("The traveler prefers budget-friendly trips and cheap hostels.")
mem.add("The traveler dislikes cold winter weather.")
for query in ["vegetarian food preferences",
"budget and cheap accommodation",
"cold winter weather"]:
note, score = mem.search(query, k=1)[0]
print(f"{query!r} -> {note} (score {score})")Output:
'vegetarian food preferences' -> The traveler is vegetarian and loves ramen and street food. (score 0.577)
'budget and cheap accommodation' -> The traveler prefers budget-friendly trips and cheap hostels. (score 0.577)
'cold winter weather' -> The traveler dislikes cold winter weather. (score 0.866)Each query pulls back exactly the note it’s about, with a similarity score attached. Notice the scores differ: “cold winter weather” shares two strong words (“cold”, “weather”) with its note, so it scores high (0.866); the others share one meaningful word each, so they land lower (0.577) but still win their match. This is the whole mechanism — embed, score, rank — and it already works. But it works because the queries happen to share words with their notes. That’s the limitation we fix next.
The Production Swap: Real Semantic Embeddings
The keyword embed() matches on shared words. Real long-term memory needs to match on meaning, so that “what does she eat?” finds the vegetarian note even though they share no words. The fix is to replace embed() with a trained sentence-embedding model — and only embed() changes. add() and search(), the entire interface, stay exactly as they are.
# Production: real semantic embeddings. Only embed() changes; add()/search() stay the same.
from sentence_transformers import SentenceTransformer
_model = SentenceTransformer("all-MiniLM-L6-v2")
def embed(text):
return _model.encode(text, normalize_embeddings=True)
# Or use a vector database (e.g. chromadb) that handles storage + search for you.all-MiniLM-L6-v2 is a small, fast, widely used sentence-embedding model. It maps text into a space where sentences about the same idea sit close together regardless of the exact words — so “the traveler is vegetarian” and “what does she eat?” end up near each other, and search() returns the right note with no shared vocabulary at all. Because the function signature is unchanged, your VectorMemory keeps working line for line; you have simply upgraded how it understands text.
For anything beyond a toy, you’d reach for a vector database like chromadb instead of two parallel Python lists. It handles embedding, storage, persistence to disk, and nearest-neighbor search for you — but the mental model is identical to what you just built: add a note, search for the relevant ones. The DataTweets Generative AI course covers sentence-transformers and chromadb in depth, including how embeddings are trained and how to run a real vector store; here, the point is that long-term memory is this add/search shape, and choosing a backend is choosing your embed.
Keyword matches words; embeddings match meaning
The demo’s keyword embed() only finds a note when the query shares actual words with it — that’s why “cold winter weather” works but “what does she eat?” would not. A real sentence-embedding model matches on meaning, so “what does she eat?” still retrieves the vegetarian note despite zero shared words. That is the entire point of the word semantic in “semantic search.” Swap embed() and the rest of VectorMemory — add, search, cosine ranking — is unchanged.
How the Agent Uses Memory
A vector store is only useful if the agent actually writes to it and reads from it at the right moments. Two simple habits make it work:
Write a note whenever the agent learns something durable. Not every line of conversation belongs in long-term memory — most of it is this-run detail that the messages list already holds. But when a fact surfaces that should survive the session, store it:
mem.add("The traveler is vegetarian.")
mem.add("The traveler dislikes cold winter destinations.")Good candidates are stable facts about the user or the task: preferences (“flies economy”), constraints (“vegetarian”), identifiers (“account ID 4471”), and conclusions you reached together (“we tried the overnight train and it was too slow”). Transient details — the weather today, the exact phrasing of one question — are not worth storing.
Search the store before answering. At the start of a turn, embed what the user is asking about and pull back the most relevant notes, then fold them into the prompt as context:
hits = mem.search(user_message, k=3)
recalled = "\n".join(f"- {note}" for note, score in hits)
context = f"Relevant things you know about this traveler:\n{recalled}"
# prepend `context` to the messages you send to the modelThis is the key difference from short-term memory: you do not carry the whole store into the prompt. You retrieve only the few notes relevant to the current need and inject those. The store can grow to thousands of notes across many sessions, yet each turn pays for just the handful that matter. That’s what makes long-term memory both durable and affordable — store everything worth keeping, recall only what’s relevant. In the next lesson you’ll wire exactly this write-and-search loop into Atlas, your travel agent.
Practice Exercises
Exercise 1: Why vectors instead of a keyword list?
A teammate proposes storing notes in a plain list and recalling them with a substring or keyword match. Why does the lesson embed notes as vectors and rank by cosine similarity instead?
Hint
Keyword matching only finds notes that share literal words with the query, so it misses paraphrases — “what does she eat?” would never match “the traveler is vegetarian.” Embedding text as vectors lets you rank notes by meaning (cosine similarity in the embedding space), which retrieves the relevant note even when the wording is completely different. That semantic recall is the whole reason to use a vector store.
Exercise 2: What do add and search each do?
In one or two sentences each, describe what VectorMemory.add(note) and VectorMemory.search(query, k) do, and why search returns a score alongside each note.
Hint
add(note) stores the note’s text and saves its embedding vector alongside it. search(query, k) embeds the query, computes cosine similarity against every stored vector, and returns the k highest-scoring notes. The score is that similarity value — it tells you how relevant each returned note is, so the agent can judge whether a match is strong enough to trust or ignore weak ones.
Exercise 3: What changes when you go to production?
You move from the dependency-free demo to real semantic search with sentence-transformers. Which part of the code changes, and which part stays the same — and what new capability do you gain?
Hint
Only embed() changes: you replace the keyword bag-of-words with a sentence-embedding model like all-MiniLM-L6-v2 (or hand storage/search to a vector database such as chromadb). The add()/search() interface and the cosine ranking stay identical. The new capability is matching on meaning rather than shared words, so a query like “what does she eat?” can still retrieve the vegetarian note with no overlapping vocabulary.
Summary
Long-term memory gives an agent facts that survive across sessions by storing durable notes and retrieving the relevant ones by meaning rather than exact words. A vector store makes that possible: embed each note as a vector, embed the query, and return the nearest by cosine similarity. You built a tiny VectorMemory with just add(note) and search(query, k), ran it dependency-free with a keyword embedding, and saw each query pull back its matching note with a similarity score. The production version changes only embed() — drop in sentence-transformers (all-MiniLM-L6-v2) or a vector database like chromadb — to match on meaning instead of shared words. The agent uses it by writing notes when it learns something durable and searching the store for the relevant few before answering, so the store can grow without inflating every prompt.
Key Concepts
- Notes as durable memory — store facts worth keeping as short text notes that outlive a session.
- Retrieval by meaning — embed notes and queries as vectors; rank by cosine similarity, not keyword match.
- The add/search interface —
add(note)stores a fact;search(query, k)returns thekmost relevant notes with scores. - Swap only embed() — keyword embedding for a dependency-free demo;
sentence-transformersorchromadbfor real semantic search, interface unchanged. - Write then search — record durable facts, and recall only the relevant ones before answering.
Why This Matters
Almost every “my agent forgot what I told it yesterday” problem is a missing long-term store. Once you see that long-term memory is just add a note and search for the relevant ones by meaning, the design becomes concrete: decide what’s durable enough to write, retrieve only what’s relevant to the current turn, and pick a backend by choosing your embed. That keeps an agent both knowledgeable across sessions and cheap per turn — and it’s the foundation for the guided project, where you give Atlas exactly this kind of memory.
Next Steps
Continue to Lesson 5 - Guided Project: Give Atlas Memory
Put it all together: wire short-term context management and a long-term vector store into your travel agent.
Back to Module Overview
Return to the Memory and State module overview
Continue Building Your Skills
You can now build long-term memory the way real agents do: store durable facts as notes, retrieve them by meaning with a vector store, and swap a keyword demo for sentence-transformers or chromadb without touching the add/search interface. Next you’ll bring everything in this module together in a guided project — giving Atlas both a managed short-term transcript and a persistent long-term store.