Lesson 4 - Multi-Turn Conversations
Welcome to Multi-Turn Conversations
You have made single calls to Claude and seen exactly what comes back. But a single question-and-answer is not a conversation. A conversation is a back-and-forth where each new message can depend on everything said before it. To build that, you need the model to remember — and here is the surprise that trips up nearly everyone the first time: the API remembers nothing.
In this lesson you will see that statelessness firsthand, then learn the one technique that turns a memoryless endpoint into a chatbot that recalls your name twenty turns later. It is simpler than you might expect, and it ties directly back to the context window from Lesson 1.
By the end of this lesson, you will be able to:
- Explain why the Messages API is stateless and what that means for your code
- Build a growing messages list that gives the model memory of earlier turns
- Wrap the pattern in a small, reusable Conversation helper class
- Predict how conversation cost grows as the history gets longer
You should be comfortable making a basic call from Lesson 2. Let’s begin.
The API Has No Memory
When you call client.messages.create(...), the model reads the messages you send, produces a reply, and then forgets the entire exchange. There is no session, no server-side history, no hidden thread tying one call to the next. Each request is judged entirely on its own.
The cleanest way to feel this is to not give the model any history and watch it come up empty. Suppose an earlier conversation established a name — but this call doesn’t include it:
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=200,
    messages=[
        {"role": "user", "content": "What's my name and what do I love?"},
    ],
)
print(response.content[0].text)
I don't have any information about you. I don't have access to previous
conversations, personal data, or any identifying information about who you
are.
If you'd like to tell me your name and what you love, I'm happy to listen!
Feel free to share.
The model is not being difficult; it genuinely has nothing to go on. The messages list held a single question and no earlier context, so that question is the entire universe the model can see. This is the core mental model for the rest of the lesson: the model knows only what is in the messages list you send on this call. Memory is not something the API provides. Memory is something you provide, by choosing what to put in that list.
Stateless is a feature, not a limitation
A stateless API is easy to scale, easy to reason about, and easy to debug — the response depends only on the request, nothing hidden. The cost is that conversation memory becomes your responsibility. The rest of this lesson is about meeting that responsibility cleanly.
Giving the Model Memory
If the model only sees what you send, the fix is direct: send the whole conversation every time. You keep a Python list of messages, and on each turn you (1) append the new user message, (2) call the API with the full list, and (3) append the model’s reply back onto the list so it is there for next time.
That third step is the one people forget. The model’s answer has to become part of the history, or the next turn won’t know it ever happened.
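The three steps can be tried without touching the network. In this minimal sketch, `take_turn` and `fake_model` are hypothetical names: `fake_model` stands in for `client.messages.create` and only reports how many messages it was shown, and a plain string plays the role of `response.content`. The point is to see exactly what history each call receives.

```python
def fake_model(messages):
    """Hypothetical stand-in for client.messages.create: it can only report
    how much history it was handed, which is all a stateless model knows."""
    return f"I can see {len(messages)} message(s)."


def take_turn(messages, user_text, call_model=fake_model):
    """Hypothetical helper: one conversational turn, in three steps."""
    # 1. Append the new user message
    messages.append({"role": "user", "content": user_text})
    # 2. Call the model with the full history
    reply = call_model(messages)
    # 3. Append the reply so the next turn can see it
    messages.append({"role": "assistant", "content": reply})
    return reply


history = []
print(take_turn(history, "My name is Ada."))  # I can see 1 message(s).
print(take_turn(history, "What's my name?"))  # I can see 3 message(s).
print(len(history))                           # 4
```

Remove step 3 and the second call would see 2 messages instead of 3: the reply simply vanishes from the record.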
A two-turn memory test
Let’s prove memory works. In turn 1 we tell the model a name and an interest. In turn 2 we ask it to recall them — and because the history is resent, it can:
import anthropic
client = anthropic.Anthropic()
messages = []
# Turn 1: tell the model something
messages.append({"role": "user", "content": "My name is Ada and I love number theory."})
response = client.messages.create(model="claude-haiku-4-5", max_tokens=300, messages=messages)
print("Turn 1:", response.content[0].text)
# Append the assistant's reply back into the history
messages.append({"role": "assistant", "content": response.content})
# Turn 2: ask it to recall
messages.append({"role": "user", "content": "What's my name and what do I love?"})
response = client.messages.create(model="claude-haiku-4-5", max_tokens=300, messages=messages)
print("Turn 2:", response.content[0].text)
Turn 1: Hi Ada! It's great to meet you. Number theory is a fascinating
field—there's something deeply elegant about the properties of integers and
the patterns they follow.
...
Turn 2: Your name is Ada, and you love number theory!
Turn 2 answers correctly, not because the model remembered, but because we reminded it by resending turn 1 inside the messages list. Compare this with the first example in this lesson, where the very same question produced "I don't have any information about you." The only difference is the history. Resending the history is the entire mechanism behind conversation memory.
Appending response.content, not a string
Notice that we appended response.content — the model’s reply blocks — back as an assistant turn, not a hand-written string:
messages.append({"role": "assistant", "content": response.content})
response.content is the list of content blocks the model actually produced. Passing it back verbatim is the safe, faithful way to record what the assistant said, and it keeps the structure the API expects. Each turn the list grows by two entries: one user message and one assistant message. That alternating, ever-growing list is the conversation.
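Because a forgotten assistant append is the most common way memory breaks, a quick check on the roles can catch it early. This is a minimal sketch; `check_history` is a hypothetical debugging aid, not part of the SDK. It enforces the strict user, assistant, user, ... alternation this lesson's pattern produces. The API itself may accept consecutive turns of the same role (it combines them), so treat this as a check on your own bookkeeping rather than an API rule.

```python
def check_history(messages):
    """Hypothetical debugging aid: roles should alternate user, assistant, user, ..."""
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(
                f"message {i} has role {message['role']!r}, expected {expected!r}"
            )
    return True


# A healthy two-turn history, right before the turn-2 call
good = [
    {"role": "user", "content": "My name is Ada and I love number theory."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user", "content": "What's my name and what do I love?"},
]
print(check_history(good))  # True

# The same history with the assistant append forgotten
bad = [good[0], good[2]]
try:
    check_history(bad)
except ValueError as err:
    print(err)  # message 1 has role 'user', expected 'assistant'
```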
A Reusable Conversation Class
Appending messages by hand works, but you will repeat the same three steps on every turn. A small helper class hides the bookkeeping behind a single say() method, while still storing the full history internally. Note that the system prompt stays constant across every turn — it sets the assistant’s role once and never changes.
import anthropic
class Conversation:
    def __init__(self, system=None, model="claude-haiku-4-5"):
        self.client = anthropic.Anthropic()
        self.model = model
        self.system = system  # constant across all turns
        self.messages = []  # the growing history
    def say(self, text, max_tokens=300):
        # 1. Append the new user message
        self.messages.append({"role": "user", "content": text})
        kwargs = {
            "model": self.model,
            "max_tokens": max_tokens,
            "messages": self.messages,
        }
        if self.system:
            kwargs["system"] = self.system
        # 2. Call the API with the full history
        response = self.client.messages.create(**kwargs)
        # 3. Append the assistant's reply back into the history
        self.messages.append({"role": "assistant", "content": response.content})
        return response.content[0].text
chat = Conversation(system="You are a concise math tutor. Keep answers to one or two sentences.")
print(chat.say("My name is Ada and I love number theory."))
print(chat.say("Given that, suggest one topic I'd enjoy."))
print(chat.say("Remind me — what's my name?"))
print("Turns stored:", len(chat.messages))
Nice to meet you, Ada! Number theory is a fascinating field—from primes and
divisibility to modular arithmetic and Diophantine equations. What area of
number theory interests you most?
You might really enjoy Fermat's Little Theorem and modular arithmetic—they're
elegant tools that unlock patterns in number theory and have beautiful proofs
that feel like "aha!" moments.
Your name is Ada!
Turns stored: 6
Three calls to say(), three coherent replies: the second references number theory from the first, and the third recalls the name from the first. After three exchanges, self.messages holds 6 entries (three user, three assistant). The class is doing nothing magical: it is the same append-call-append loop you wrote by hand, wrapped so you never have to think about it again.
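say() returns response.content[0].text, which assumes the first content block is a text block. If a reply could ever hold several blocks, or a non-text block first, a more defensive option is to join the text of every text block. This is a minimal sketch: `reply_text` is a hypothetical helper, and `SimpleNamespace` objects stand in for the SDK's content blocks, which expose a `type` attribute and, for text blocks, a `text` attribute.

```python
from types import SimpleNamespace


def reply_text(content_blocks):
    """Hypothetical helper: concatenate the text of every text block in a reply,
    skipping any block whose type is not "text"."""
    return "".join(
        block.text for block in content_blocks if getattr(block, "type", None) == "text"
    )


# Stand-ins for response.content; a real reply is a list of SDK block objects
blocks = [
    SimpleNamespace(type="text", text="Your name is "),
    SimpleNamespace(type="text", text="Ada!"),
]
print(reply_text(blocks))  # Your name is Ada!
```

Inside say(), the last line would then become `return reply_text(response.content)`. The history itself is unchanged: the full response.content is still what gets appended.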
Keep system constant
The system prompt is the assistant’s standing instructions — its role, tone, and rules. Set it once when you create the Conversation and leave it fixed. Changing it mid-conversation is rarely what you want, and (as you’ll learn in later modules) keeping it stable also helps the API cache and reuse work across turns.
The Cost of Remembering
Resending the whole history is what makes memory work — and it is also what makes long conversations expensive. Every turn, the entire list of prior messages is sent again, and you are billed for every input token on every call. The history only grows, so the per-turn input cost grows with it.
You can watch this happen with count_tokens, the same tool from Lesson 1. Here we count the input tokens of the history at turn 1, then again after the assistant's reply and a second user message have been appended:
import anthropic
client = anthropic.Anthropic()
running = []
running.append({"role": "user", "content": "My name is Ada and I love number theory."})
count = client.messages.count_tokens(model="claude-haiku-4-5", messages=running)
print("Turn 1 — history size:", count.input_tokens, "input tokens")
running.append({"role": "assistant", "content": "Nice to meet you, Ada! Number theory is a beautiful field."})
running.append({"role": "user", "content": "What's my name and what do I love?"})
count = client.messages.count_tokens(model="claude-haiku-4-5", messages=running)
print("Turn 2 — history size:", count.input_tokens, "input tokens")
Turn 1 — history size: 17 input tokens
Turn 2 — history size: 47 input tokens
The history nearly tripled, from 17 to 47 tokens, after just one round trip, because turn 2 must resend turn 1's question, the assistant's reply, and the new question. Over dozens of turns, each call's input keeps growing, and the total input billed across the conversation grows faster still, since every earlier message is paid for again on every later call.
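A back-of-the-envelope sketch makes the growth concrete. It assumes, hypothetically, that every user message costs `u` tokens and every reply costs `r` tokens, and it ignores the system prompt and any per-request overhead. On call `t` you resend `t` user messages and `t - 1` replies, so input per call grows linearly and the cumulative input billed grows roughly quadratically. The values `u=20` and `r=40` are illustrative, not measured.

```python
def input_tokens_per_call(turns, u, r):
    """Input tokens sent on each call t = 1..turns, assuming fixed message sizes."""
    return [t * u + (t - 1) * r for t in range(1, turns + 1)]


def cumulative_input(turns, u, r):
    """Total input tokens billed across the whole conversation."""
    return sum(input_tokens_per_call(turns, u, r))


print(input_tokens_per_call(5, u=20, r=40))  # [20, 80, 140, 200, 260]
print(cumulative_input(5, u=20, r=40))       # 700
print(cumulative_input(50, u=20, r=40))      # 74500
```

Under these assumptions, ten times as many turns costs over a hundred times as much cumulative input.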
This ties straight back to Lesson 1. The whole history has to fit inside the context window — claude-haiku-4-5’s 200,000-token budget — alongside room for the reply. A conversation that runs long enough will eventually press against that ceiling, and every token in it is billed on every turn. That is exactly why later modules introduce techniques to send a summary of old turns instead of the raw history: you keep the useful memory while cutting the token bill. For now, the takeaway is the trade-off itself.
- Memory comes from resending history — there is no other source.
- Resent history is billed every turn — longer conversations cost more per call.
- History must fit the context window — it cannot grow without limit.
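Summaries come in later modules. A simpler and lossier stopgap, shown here only as a sketch, is a sliding window: drop the oldest user/assistant exchange until the history fits a budget. `trim_history` and `word_count` are hypothetical; in practice the counting function could wrap `client.messages.count_tokens(model=..., messages=...).input_tokens`, but a crude word count keeps this example offline. It assumes the history starts with a user message and alternates roles, and the result can still exceed the budget if the newest message alone is too large.

```python
def trim_history(messages, budget, count_fn):
    """Hypothetical sliding window: drop the oldest user/assistant pair until
    count_fn(history) <= budget. Never trims below the last one or two messages."""
    trimmed = list(messages)  # copy, so the caller's history is untouched
    while count_fn(trimmed) > budget and len(trimmed) > 2:
        trimmed = trimmed[2:]  # forget the oldest exchange
    return trimmed


def word_count(messages):
    """Crude offline stand-in for a real token counter (counts words, not tokens)."""
    return sum(len(m["content"].split()) for m in messages)


history = [
    {"role": "user", "content": "My name is Ada and I love number theory."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user", "content": "Suggest a topic."},
    {"role": "assistant", "content": "Try modular arithmetic."},
    {"role": "user", "content": "What's my name?"},
]
recent = trim_history(history, budget=10, count_fn=word_count)
print(len(recent), recent[0]["content"])  # 3 Suggest a topic.
```

The trimmed history no longer contains Ada's name, so the model could not recall it. That loss is the price of a sliding window, and it is why keeping a summary of old turns is the more useful technique.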
Why This Matters for Building
Almost everything that feels like “the AI remembers” — chatbots, assistants, agents that carry context across many steps — is built on the plain mechanic in this lesson: a messages list that you grow and resend. Get this right and the rest of the course’s agent and tool work has a solid foundation. Get it wrong — forget to append the assistant turn, or assume the server is holding state — and your bot will mysteriously “lose its memory” mid-conversation.
The cost angle matters just as much. Because you pay for the full history every turn, conversation length is a budget decision, not just a UX one. Knowing why tokens accumulate is the first step toward managing them, which is what Modules 5 through 7 are about.
Practice Exercises
Exercise 1: Break memory on purpose
Take the two-turn memory test and delete the line that appends response.content back as an assistant turn. Run it again and observe what the model says when you ask it to recall the name in turn 2. Explain in one sentence whether the answer changes, and why.
Hint
Without that append, turn 2's messages list contains two user messages in a row, and the model never sees its own turn-1 reply. Your turn-1 message, with the name in it, is still in the list, though (the API combines consecutive turns of the same role), so the model may well still answer correctly: the memory lives in the history you resend, not in the reply. To reproduce the "I don't have any information about you" response, also delete turn 1 entirely.
Exercise 2: Add a reset() method
Extend the Conversation class with a reset() method that clears the history so you can start a fresh conversation without creating a new object. Confirm that after reset(), the model can no longer recall something you told it before the reset.
Hint
Resetting is just self.messages = []. Keep self.system untouched — the assistant’s role should survive a reset; only the conversation history is cleared.
Exercise 3: Measure the growth
Using count_tokens, have a Conversation exchange five turns, and after each call record count_tokens(...).input_tokens for the current self.messages. Print the five numbers. Do they only ever increase?
Hint
Call client.messages.count_tokens(model=..., messages=chat.messages) right after each say(). The input count should rise every turn, since the history never shrinks — that is the cost trade-off made visible.
Summary
The Claude Messages API is stateless: each call sees only the messages you send and forgets everything afterward. To hold a conversation, you maintain a messages list that grows every turn — append the user’s message, call the API, then append the model’s response.content back as an assistant turn — and resend the whole list each time. A small Conversation class makes this effortless while keeping the system prompt constant. The catch is cost: because the full history is resent and billed every turn, longer conversations use more tokens and must still fit inside the context window.
Key Concepts
- Stateless API: each request is independent; the server keeps no conversation memory between calls.
- Message history: the growing list of user and assistant turns you resend on every request to give the model memory.
- Appending response.content: recording the model's reply back into the history so future turns can see it.
- History grows → tokens grow: every turn resends and is billed for the full prior conversation, which must fit the context window.
Why This Matters
Every “memory” in an LLM application — from a help-desk bot to a multi-step agent — is this pattern underneath. Understanding that memory is something you supply (and pay for) every turn is what lets you build conversational systems that are both coherent and affordable.
Next Steps
Continue to Lesson 5 - Tokens, Cost, and Streaming
Measure and budget tokens, understand pricing, and stream responses so users see output as it's generated.
Continue Building Your Skills
You can now turn a memoryless endpoint into a real conversation: grow a messages list, resend it each turn, and keep the system prompt fixed. Next you’ll look closely at the tokens flowing through those calls — how to count them, what they cost, and how to stream replies so your users never stare at a blank screen.