Lesson 4 - Multi-Turn Conversations

Welcome to Multi-Turn Conversations

You have made single calls to Claude and seen exactly what comes back. But a single question-and-answer is not a conversation. A conversation is a back-and-forth where each new message can depend on everything said before it. To build that, you need the model to remember — and here is the surprise that trips up nearly everyone the first time: the API remembers nothing.

In this lesson you will see that statelessness firsthand, then learn the one technique that turns a memoryless endpoint into a chatbot that recalls your name twenty turns later. It is simpler than you might expect, and it ties directly back to the context window from Lesson 1.

By the end of this lesson, you will be able to:

  • Explain why the Messages API is stateless and what that means for your code
  • Build a growing messages list that gives the model memory of earlier turns
  • Wrap the pattern in a small, reusable Conversation helper class
  • Predict how conversation cost grows as the history gets longer

You should be comfortable making a basic call from Lesson 2. Let’s begin.


The API Has No Memory

When you call client.messages.create(...), the model reads the messages you send, produces a reply, and then forgets the entire exchange. There is no session, no server-side history, no hidden thread tying one call to the next. Each request is judged entirely on its own.

The clearest way to see this is to send a question with no history and watch the model come up empty. Suppose an earlier conversation established a name, but this call doesn’t include it:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=200,
    messages=[
        {"role": "user", "content": "What's my name and what do I love?"},
    ],
)
print(response.content[0].text)

Output:

I don't have any information about you. I don't have access to previous
conversations, personal data, or any identifying information about who you
are.

If you'd like to tell me your name and what you love, I'm happy to listen!
Feel free to share.

The model is not being difficult — it genuinely has nothing to go on. The messages list held a single question and no earlier context, so that question is the entire universe the model can see. This is the core mental model for the rest of the lesson: the model knows only what is in the messages list you send on this call. Memory is not something the API provides. Memory is something you provide, by choosing what to put in that list.

Stateless is a feature, not a limitation

A stateless API is easy to scale, easy to reason about, and easy to debug — the response depends only on the request, nothing hidden. The cost is that conversation memory becomes your responsibility. The rest of this lesson is about meeting that responsibility cleanly.
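
Because the response depends only on the request, saving the messages list is enough to reproduce a call later when debugging. The following is a minimal sketch of that idea, assuming the history holds plain-string content. The helper names save_request and load_request are hypothetical and not part of the SDK.

```python
import json


def save_request(model, messages):
    """Serialize everything the model would see into a JSON string."""
    return json.dumps({"model": model, "messages": messages}, sort_keys=True)


def load_request(blob):
    """Restore a saved request so it can be resent unchanged."""
    return json.loads(blob)


history = [
    {"role": "user", "content": "My name is Ada and I love number theory."},
]
blob = save_request("claude-haiku-4-5", history)
restored = load_request(blob)

# The restored request matches the original, so resending it would give
# the model exactly the same view of the conversation.
print(restored["messages"] == history)
```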


Giving the Model Memory

If the model only sees what you send, the fix is direct: send the whole conversation every time. You keep a Python list of messages, and on each turn you (1) append the new user message, (2) call the API with the full list, and (3) append the model’s reply back onto the list so it is there for next time.

That third step is the one people forget. The model’s answer has to become part of the history, or the next turn won’t know it ever happened.
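
The three steps can be sketched without touching the network. Here fake_model is a hypothetical stand-in for client.messages.create(...) that only reports how many messages it was shown, and take_turn is an illustrative helper name. The point is the shape of the list, not the reply.

```python
def fake_model(messages):
    """Hypothetical stand-in for the API call: reports how much history it saw."""
    return f"I can see {len(messages)} message(s)."


def take_turn(messages, user_text, call_model=fake_model):
    # 1. Append the new user message
    messages.append({"role": "user", "content": user_text})
    # 2. Call the model with the full list
    reply = call_model(messages)
    # 3. Append the reply so the next turn can see it
    messages.append({"role": "assistant", "content": reply})
    return reply


history = []
first = take_turn(history, "My name is Ada.")
second = take_turn(history, "What's my name?")

print(first)
print(second)
print([m["role"] for m in history])
```

On the second turn the stand-in sees three messages: the first user message, its own earlier reply, and the new question. Skip step 3 and it would see only two.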

A two-turn memory test

Let’s prove memory works. In turn 1 we tell the model a name and an interest. In turn 2 we ask it to recall them — and because the history is resent, it can:

import anthropic

client = anthropic.Anthropic()
messages = []

# Turn 1: tell the model something
messages.append({"role": "user", "content": "My name is Ada and I love number theory."})
response = client.messages.create(model="claude-haiku-4-5", max_tokens=300, messages=messages)
print("Turn 1:", response.content[0].text)

# Append the assistant's reply back into the history
messages.append({"role": "assistant", "content": response.content})

# Turn 2: ask it to recall
messages.append({"role": "user", "content": "What's my name and what do I love?"})
response = client.messages.create(model="claude-haiku-4-5", max_tokens=300, messages=messages)
print("Turn 2:", response.content[0].text)

Output:

Turn 1: Hi Ada! It's great to meet you. Number theory is a fascinating
field—there's something deeply elegant about the properties of integers and
the patterns they follow.
...
Turn 2: Your name is Ada, and you love number theory!

Turn 2 answers correctly — not because the model remembered, but because we reminded it by resending turn 1 inside the messages list. Compare this against the first example in this lesson, where the very same turn-2 question produced “I don’t have any information about you.” The only difference is the history. Resending the history is the entire mechanism behind conversation memory.

Appending response.content, not a string

Notice that we appended response.content — the model’s reply blocks — back as an assistant turn, not a hand-written string:

messages.append({"role": "assistant", "content": response.content})

response.content is the list of content blocks the model actually produced. Passing it back verbatim is the safe, faithful way to record what the assistant said, and it keeps the structure the API expects. Each turn the list grows by two entries: one user message and one assistant message. That alternating, ever-growing list is the conversation.
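
Because response.content is a list of blocks rather than a string, reading only content[0].text assumes the first block is a text block. A slightly more defensive sketch joins every text block instead. The helper name text_of is hypothetical, and the blocks below are simulated with SimpleNamespace so the example runs offline; the assumption is that real blocks expose type and text attributes in the same way.

```python
from types import SimpleNamespace


def text_of(content):
    """Join the text of every text block, skipping any other block types."""
    return "".join(block.text for block in content if block.type == "text")


# Simulated content blocks (assumption: stand-ins for SDK block objects)
content = [
    SimpleNamespace(type="text", text="Hi Ada! "),
    SimpleNamespace(type="text", text="Number theory is a fascinating field."),
]

print(text_of(content))
```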


A Reusable Conversation Class

Appending messages by hand works, but you will repeat the same three steps on every turn. A small helper class hides the bookkeeping behind a single say() method, while still storing the full history internally. Note that the system prompt stays constant across every turn — it sets the assistant’s role once and never changes.

import anthropic


class Conversation:
    def __init__(self, system=None, model="claude-haiku-4-5"):
        self.client = anthropic.Anthropic()
        self.model = model
        self.system = system          # constant across all turns
        self.messages = []            # the growing history

    def say(self, text, max_tokens=300):
        # 1. Append the new user message
        self.messages.append({"role": "user", "content": text})

        kwargs = {
            "model": self.model,
            "max_tokens": max_tokens,
            "messages": self.messages,
        }
        if self.system:
            kwargs["system"] = self.system

        # 2. Call the API with the full history
        response = self.client.messages.create(**kwargs)

        # 3. Append the assistant's reply back into the history
        self.messages.append({"role": "assistant", "content": response.content})
        return response.content[0].text


chat = Conversation(system="You are a concise math tutor. Keep answers to one or two sentences.")
print(chat.say("My name is Ada and I love number theory."))
print(chat.say("Given that, suggest one topic I'd enjoy."))
print(chat.say("Remind me — what's my name?"))
print("Turns stored:", len(chat.messages))

Output:

Nice to meet you, Ada! Number theory is a fascinating field—from primes and
divisibility to modular arithmetic and Diophantine equations. What area of
number theory interests you most?
You might really enjoy Fermat's Little Theorem and modular arithmetic—they're
elegant tools that unlock patterns in number theory and have beautiful proofs
that feel like "aha!" moments.
Your name is Ada!
Turns stored: 6

Three calls to say(), three coherent replies — the second references number theory from the first, the third recalls the name from the first. After three exchanges, self.messages holds 6 entries (three user, three assistant). The class is doing nothing magical: it is the same append-call-append loop you wrote by hand, wrapped so you never have to think about it again.
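
One edge case the class does not handle: if the API call raises, the user message has already been appended, so the history is left with an unanswered user turn. Below is a minimal sketch of one way to deal with that, by rolling the append back on failure. SafeHistory, flaky_model, and the injected call_model function are all hypothetical, used here so the example runs without network access.

```python
class SafeHistory:
    """History keeper that undoes the user append if the model call fails."""

    def __init__(self, call_model):
        self.call_model = call_model  # hypothetical stand-in for the API call
        self.messages = []

    def say(self, text):
        self.messages.append({"role": "user", "content": text})
        try:
            reply = self.call_model(self.messages)
        except Exception:
            # Remove the unanswered user message so the history stays paired
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def flaky_model(messages):
    """Fails when shown exactly three messages, to simulate a dropped request."""
    if len(messages) == 3:
        raise RuntimeError("simulated network error")
    return "ok"


chat = SafeHistory(flaky_model)
chat.say("first")
try:
    chat.say("second")
except RuntimeError:
    pass

# The failed turn left no trace: one user message, one assistant reply
print(len(chat.messages))
```

Whether to roll back or retry is a design choice; the sketch only shows that the history should not be left half-updated.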

Keep system constant

The system prompt is the assistant’s standing instructions — its role, tone, and rules. Set it once when you create the Conversation and leave it fixed. Changing it mid-conversation is rarely what you want, and (as you’ll learn in later modules) keeping it stable also helps the API cache and reuse work across turns.


The Cost of Remembering

Resending the whole history is what makes memory work — and it is also what makes long conversations expensive. Every turn, the entire list of prior messages is sent again, and you are billed for every input token on every call. The history only grows, so the per-turn input cost grows with it.

You can watch this happen with count_tokens, the same tool from Lesson 1. Here we count the input tokens of the history at turn 1, then again at turn 2, once the assistant’s reply and a second user message have been added:

import anthropic

client = anthropic.Anthropic()
running = []

running.append({"role": "user", "content": "My name is Ada and I love number theory."})
count = client.messages.count_tokens(model="claude-haiku-4-5", messages=running)
print("Turn 1 — history size:", count.input_tokens, "input tokens")

running.append({"role": "assistant", "content": "Nice to meet you, Ada! Number theory is a beautiful field."})
running.append({"role": "user", "content": "What's my name and what do I love?"})
count = client.messages.count_tokens(model="claude-haiku-4-5", messages=running)
print("Turn 2 — history size:", count.input_tokens, "input tokens")

Output:

Turn 1 — history size: 17 input tokens
Turn 2 — history size: 47 input tokens

The history nearly tripled, from 17 to 47 tokens, after just one round-trip, because turn 2 must resend turn 1’s message, the assistant’s reply, and the new question. Carry that across dozens of turns and the input sent on each call keeps climbing.
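
The arithmetic behind that growth is worth making explicit. If each exchange adds roughly the same number of tokens, the input for a single turn grows linearly, but the total billed input over the whole conversation grows roughly quadratically. The sketch below assumes a flat 30 tokens per exchange starting from 17, loosely based on the counts above. Both figures and the function name per_turn_input are illustrative assumptions; real conversations vary.

```python
def per_turn_input(first_turn_tokens, tokens_per_exchange, turns):
    """Input tokens sent on each turn, assuming a fixed growth per exchange."""
    return [first_turn_tokens + tokens_per_exchange * i for i in range(turns)]


# Assumed figures, for illustration only
sizes = per_turn_input(first_turn_tokens=17, tokens_per_exchange=30, turns=5)
total_billed = sum(sizes)

print(sizes)         # [17, 47, 77, 107, 137]
print(total_billed)  # 385
```

Under these assumptions, turn 5 sends 137 input tokens on its own, yet the conversation as a whole has been billed for 385, because every earlier message was paid for again on each later turn.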

This ties straight back to Lesson 1. The whole history has to fit inside the context window (claude-haiku-4-5’s 200,000-token budget), alongside room for the reply. A conversation that runs long enough will eventually press against that ceiling, and every token in it is billed on every turn. That is exactly why later modules introduce techniques to send a summary of old turns instead of the raw history: you keep the useful memory while cutting the token bill. For now, the takeaway is the trade-off itself.

  • Memory comes from resending history — there is no other source.
  • Resent history is billed every turn — longer conversations cost more per call.
  • History must fit the context window — it cannot grow without limit.
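
The third point can be turned into a simple pre-flight check. This is a rough sketch, assuming the 200,000-token window mentioned above and an input count obtained from count_tokens. The function name fits_context is hypothetical.

```python
CONTEXT_WINDOW = 200_000  # token budget cited above for claude-haiku-4-5


def fits_context(input_tokens, max_tokens, window=CONTEXT_WINDOW):
    """Return True if the history plus the reserved reply space fits the window."""
    return input_tokens + max_tokens <= window


print(fits_context(input_tokens=47, max_tokens=300))       # True
print(fits_context(input_tokens=199_900, max_tokens=300))  # False
```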

Why This Matters for Building

Almost everything that feels like “the AI remembers” — chatbots, assistants, agents that carry context across many steps — is built on the plain mechanic in this lesson: a messages list that you grow and resend. Get this right and the rest of the course’s agent and tool work has a solid foundation. Get it wrong — forget to append the assistant turn, or assume the server is holding state — and your bot will mysteriously “lose its memory” mid-conversation.

The cost angle matters just as much. Because you pay for the full history every turn, conversation length is a budget decision, not just a UX one. Knowing why tokens accumulate is the first step toward managing them, which is what Modules 5 through 7 are about.


Practice Exercises

Exercise 1: Break memory on purpose

Take the two-turn memory test and delete the line that appends response.content back as an assistant turn. Run it again and observe what the model says when you ask it to recall the name in turn 2. Explain in one sentence why the answer changes.

Hint

Without that append, turn 2’s messages list contains only the two user messages. Turn 1’s content is still there, so the model may still find the name, but it never sees its own turn-1 reply and the record of the conversation is incomplete. Try also deleting turn 1 entirely to reproduce the “I don’t have any information about you” response.

Exercise 2: Add a reset() method

Extend the Conversation class with a reset() method that clears the history so you can start a fresh conversation without creating a new object. Confirm that after reset(), the model can no longer recall something you told it before the reset.

Hint

Resetting is just self.messages = []. Keep self.system untouched — the assistant’s role should survive a reset; only the conversation history is cleared.

Exercise 3: Measure the growth

Have a Conversation exchange five turns and, after each call to say(), use count_tokens to record the input_tokens for the current self.messages. Print the five numbers. Do they only ever increase?

Hint

Call client.messages.count_tokens(model=..., messages=chat.messages) right after each say(). The input count should rise every turn, since the history never shrinks — that is the cost trade-off made visible.


Summary

The Claude Messages API is stateless: each call sees only the messages you send and forgets everything afterward. To hold a conversation, you maintain a messages list that grows every turn — append the user’s message, call the API, then append the model’s response.content back as an assistant turn — and resend the whole list each time. A small Conversation class makes this effortless while keeping the system prompt constant. The catch is cost: because the full history is resent and billed every turn, longer conversations use more tokens and must still fit inside the context window.

Key Concepts

  • Stateless API — each request is independent; the server keeps no conversation memory between calls.
  • Message history — the growing list of user and assistant turns you resend on every request to give the model memory.
  • Appending response.content — recording the model’s reply back into the history so future turns can see it.
  • History grows → tokens grow — every turn resends and is billed for the full prior conversation, which must fit the context window.

Why This Matters

Every “memory” in an LLM application — from a help-desk bot to a multi-step agent — is this pattern underneath. Understanding that memory is something you supply (and pay for) every turn is what lets you build conversational systems that are both coherent and affordable.


Next Steps

Continue to Lesson 5 - Tokens, Cost, and Streaming

Measure and budget tokens, understand pricing, and stream responses so users see output as it's generated.



Continue Building Your Skills

You can now turn a memoryless endpoint into a real conversation: grow a messages list, resend it each turn, and keep the system prompt fixed. Next you’ll look closely at the tokens flowing through those calls — how to count them, what they cost, and how to stream replies so your users never stare at a blank screen.