Lesson 2 - Your First Claude Call

Welcome to Your First Claude Call

In Lesson 1 you built the mental model: a language model predicts the next token, works in tokens, sees a context window, and samples its output. Now you’ll put that model to work. By the end of this lesson you’ll have written real Python that sends a message to Claude and prints its reply — and, just as importantly, you’ll understand exactly what comes back.

The call itself takes only a few lines of Python. The interesting part is everything around it: installing the SDK, keeping your API key out of your code, and reading the response object so you know where the answer lives, why the model stopped, and what it cost. Get those right once and every later lesson builds on them without friction.

By the end of this lesson, you will be able to:

  • Install the Anthropic SDK and set your API key as an environment variable (never in your code)
  • Create a client and make a call with client.messages.create()
  • Read the response object — .content, .stop_reason, .usage, and the rest
  • Wrap the whole thing in a small reusable ask() helper

You’ll need Python and a terminal. Let’s begin.


Installing the SDK

The Anthropic SDK is a Python package that wraps the API in clean, typed Python so you don’t have to make raw HTTP requests by hand. Install it with pip:

pip install anthropic

That’s the only dependency you need for this lesson. To confirm it installed, you can check the version:

import anthropic
print(anthropic.__version__)
0.112.0

Your version number may be higher — the SDK is updated often, and that’s fine. Everything in this course uses stable, long-lived parts of the library.
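
If you would rather check from a script without importing the package itself, the standard library can report the installed version. This is only a small sketch; the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("anthropic")
if found is None:
    print("anthropic is not installed; run: pip install anthropic")
else:
    print("anthropic", found)
```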


Getting and Setting Your API Key

To talk to Claude, your code needs an API key — a secret string that identifies your account and bills usage to it. You get one from the Anthropic Console: sign in, open API Keys, and create a new key. Copy it somewhere safe the moment it’s shown, because the Console will not display the full key again.

Now the single most important rule in this lesson:

Never put your API key in your code

A key pasted directly into a .py file is a key you will eventually commit to Git, paste into a screenshot, or share by accident. Anyone who has it can spend money on your account. Keep the key in your environment, not in your source. The rest of this section shows you how.

The SDK is built around this rule. By default it looks for your key in an environment variable called ANTHROPIC_API_KEY. Set it in your terminal before running your script:

export ANTHROPIC_API_KEY="your-key-here"

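Replace your-key-here with the real key from the Console. This sets the variable for your current terminal session only, so a new terminal window needs it set again. The export command is for macOS and Linux shells; in Windows PowerShell the equivalent is $env:ANTHROPIC_API_KEY = "your-key-here". When the SDK starts up, it reads ANTHROPIC_API_KEY automatically, so your Python code never has to mention the key at all.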
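
Before going further, it can help to confirm that Python actually sees the variable. The sketch below reports whether the key is set without ever printing it; api_key_status is a hypothetical helper name, not part of the SDK.

```python
import os


def api_key_status(env=os.environ):
    """Describe whether ANTHROPIC_API_KEY is set, without revealing its value."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key.strip():
        return "missing"
    return f"set ({len(key)} characters)"


print("ANTHROPIC_API_KEY:", api_key_status())
```

If this prints "missing", the usual cause is that the variable was set in a different terminal session from the one running the script.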

For projects, a common and tidier alternative is a .env file plus the python-dotenv package. Put the key in a file named .env:

ANTHROPIC_API_KEY=your-key-here

Then load it at the top of your script, before you create the client:

from dotenv import load_dotenv

load_dotenv()  # reads .env and sets ANTHROPIC_API_KEY in the environment
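
To make it concrete what loading a .env file involves, here is a simplified sketch of the idea in plain Python. It is not python-dotenv’s actual implementation, and the names parse_env_text and apply_env are invented for illustration. It assumes simple KEY=VALUE lines and, like load_dotenv() with its default settings, leaves a variable alone if it is already set.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Drop surrounding whitespace and one layer of optional quotes.
        values[name.strip()] = value.strip().strip("\"'")
    return values


def apply_env(values, env=os.environ):
    """Set each variable unless it already exists, so a shell export wins over the file."""
    for name, value in values.items():
        env.setdefault(name, value)


# A plain dict stands in for os.environ so the demo leaves your real environment alone.
sample = 'ANTHROPIC_API_KEY="your-key-here"\n# a comment line\n'
fake_env = {}
apply_env(parse_env_text(sample), fake_env)
print(fake_env)
```

In a real project, prefer python-dotenv itself, which handles quoting and edge cases more carefully than this sketch.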

Either way, the key lives outside your code. The one essential follow-up is to make sure the .env file never reaches version control.

Add .env to your .gitignore

If you use a .env file, add a line containing .env to your project’s .gitignore so Git leaves it untracked. Do this before your first commit, because .gitignore has no effect on a file Git is already tracking. Use environment variables (or .env + python-dotenv) for the key, and never hardcode it. A leaked key should be deleted from the Console immediately and replaced; treat it like a password, because it is one.


Creating the Client

With the key in your environment, creating a client is a single line:

import anthropic

client = anthropic.Anthropic()

Notice there is no key in that code. anthropic.Anthropic() reads ANTHROPIC_API_KEY from the environment for you — which is exactly why the env-var approach is worth the small setup. The client object is your handle to the whole API; every call you make in this course goes through it.


Making Your First Call

The method you’ll use most is client.messages.create(). It sends a list of messages to a model and returns the model’s reply. Here is the smallest useful call:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=200,
    messages=[
        {"role": "user", "content": "In one sentence, what is a large language model?"}
    ],
)

print(response.content[0].text)
A large language model is an artificial intelligence system trained on vast amounts of text data to predict and generate human-like language.

That’s a real, working call to Claude. Three arguments do the work:

  • model — which model to use. We use claude-haiku-4-5 throughout this course: it is fast and inexpensive, which is exactly what you want while learning and experimenting.
  • max_tokens — the most tokens the model may generate in its reply. This is a hard cap (recall from Lesson 1 that output is measured in tokens). If the answer would be longer, it gets cut off. Set it high enough for the answer you expect; 200 is plenty for one sentence.
  • messages — the conversation, as a list. Each message is a dictionary with a role ("user" for you, "assistant" for the model) and content (the text). For a single question you send one user message.
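
The three arguments are ordinary Python values, so you can assemble them before you ever touch the network. The sketch below builds them as a dictionary; build_request is a hypothetical helper, and the defaults simply mirror the example above.

```python
def build_request(question, model="claude-haiku-4-5", max_tokens=200):
    """Return the keyword arguments for client.messages.create() as a plain dict."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return {
        "model": model,
        "max_tokens": max_tokens,
        # A single-question conversation is a list holding one user message.
        "messages": [{"role": "user", "content": question}],
    }


request = build_request("In one sentence, what is a large language model?")
print(request["messages"][0]["role"])  # user
# With a client in hand, the call would be: client.messages.create(**request)
```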

The reply is in response.content[0].text. The next section explains why the answer is buried two levels deep like that — it’s not an accident.

Why the answer can vary

If you run this exact code, your sentence may differ from the one above. That’s the sampling from Lesson 1 at work — the model picks tokens with a little randomness, so the same prompt can produce slightly different wording each time. It’s the system working as designed, not a bug.


The Anatomy of the Response

response is not just a string — it’s a rich object with several fields, and learning to read them is most of what this lesson is about. Let’s print the important ones:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=200,
    messages=[
        {"role": "user", "content": "In one sentence, what is a large language model?"}
    ],
)

print("id:         ", response.id)
print("model:      ", response.model)
print("role:       ", response.role)
print("stop_reason:", response.stop_reason)
print("usage:      ", response.usage)
print("content:    ", response.content)
id:          msg_01TeWp413ZR9iZJWP4fLPjz9
model:       claude-haiku-4-5-20251001
role:        assistant
stop_reason: end_turn
usage:       Usage(cache_creation=CacheCreation(ephemeral_1h_input_tokens=0, ephemeral_5m_input_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', input_tokens=18, output_tokens=39, output_tokens_details=None, server_tool_use=None, service_tier='standard')
content:     [TextBlock(citations=None, text='A large language model is an artificial intelligence system trained on vast amounts of text data to predict and generate human language by learning statistical patterns in how words and concepts relate to each other.', type='text')]

Here is what each field tells you:

  • .id — a unique identifier for this message (msg_...). You’ll reference it in logs or when reporting an issue.
  • .model — the exact model that answered. You asked for claude-haiku-4-5; the API resolves that to a specific dated version, claude-haiku-4-5-20251001, and tells you which one ran.
  • .role — always assistant on a response. The model’s turn is the assistant’s turn.
  • .stop_reason — why the model stopped generating. end_turn means it finished naturally — it said everything it wanted to. The other one you’ll meet soon is max_tokens, which means it hit your output cap mid-answer and was cut off. Checking this field is how you catch truncated replies.
  • .usage — how many tokens this call used. The two you care about are input_tokens (your prompt) and output_tokens (the reply). This is what you’re billed on, and — straight from Lesson 1 — tokens, not words, are the unit. Here the prompt was 18 tokens and the answer 39.
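
Reading these fields in code might look like the sketch below. To keep it runnable without an API call, it uses a stand-in object that carries only the attributes discussed above; summarize is a hypothetical helper name, and a real response from messages.create() can be passed to it in the same way.

```python
from types import SimpleNamespace


def summarize(response):
    """Collect the stop reason and token counts from a response-like object."""
    usage = response.usage
    return {
        "truncated": response.stop_reason == "max_tokens",
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": usage.input_tokens + usage.output_tokens,
    }


# Stand-in carrying the values from the printed output above (not a real API response).
fake_response = SimpleNamespace(
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=18, output_tokens=39),
)
print(summarize(fake_response))
# {'truncated': False, 'input_tokens': 18, 'output_tokens': 39, 'total_tokens': 57}
```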

Why content Is a List

Look closely at .content: it’s a list containing one TextBlock, and the actual answer is that block’s .text. That’s why the first call reached for response.content[0].text: first block, then its text.

Why a list and not a plain string? Because a response can contain more than one block. For now every answer is a single text block, so content[0].text always works. But in later modules the model will return other kinds of blocks too — for example, when it decides to use a tool, that request arrives as its own block alongside any text. Designing content as a list from the start means your mental model doesn’t have to change when that day comes; you’ll just look at more of the list.

For this course, until tools appear, this is the line to remember:

answer = response.content[0].text
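
Once responses start carrying more than one block, a slightly more defensive version could walk the whole list and keep only the text blocks. The sketch below assumes each block exposes a type attribute, as the TextBlock printed earlier does; text_of is a hypothetical helper, and the objects are stand-ins rather than real API responses.

```python
from types import SimpleNamespace


def text_of(response):
    """Join the text of every text block, skipping blocks of any other type."""
    return "".join(block.text for block in response.content if block.type == "text")


# Stand-in response: one text block plus one non-text block that has no .text.
fake_response = SimpleNamespace(content=[
    SimpleNamespace(type="text", text="Tokyo."),
    SimpleNamespace(type="tool_use"),
])
print(text_of(fake_response))  # Tokyo.
```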

Inspect the object yourself

A fast way to learn any SDK is to print(response) or explore it in a Python REPL and poke at the fields. Everything you see above is plain Python — attributes you can read, lists you can index. There’s no magic hidden from you.


A Reusable ask() Helper

You’ll be making the same kind of call over and over: send one question, get one answer back as a string. Rather than repeat the whole messages.create(...) block every time, wrap it in a small helper function. This is the first reusable tool you’ll build, and you’ll lean on it for the rest of the module.

import anthropic

client = anthropic.Anthropic()


def ask(question):
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=300,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text


print(ask("What is the capital of Japan?"))
print("---")
print(ask("Give me a fun fact about octopuses."))
The capital of Japan is Tokyo. It is the country's largest city and has been the capital since 1868.
---
# Octopuses Have Blue Blood! 🐙

Instead of iron-based hemoglobin like humans, octopuses use a copper-based protein called hemocyanin to carry oxygen through their blood. This makes their blood blue instead of red!

Even cooler: this adaptation actually helps them survive in cold, low-oxygen ocean environments where they live.

The ask() function hides the boilerplate — the model name, the max_tokens, the message structure, the content[0].text unwrapping — behind a clean one-argument call. Give it a question string, get a plain answer string. The client is created once, outside the function, so every ask() call reuses it rather than building a new client each time.

This is deliberately bare-bones, and that’s the point: it’s the seed that later lessons grow. In Lesson 3 you’ll add a system prompt to steer its behavior; in Lesson 4 you’ll teach it to remember a conversation. For now, a single dependable helper is exactly enough.
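
One optional variation, shown here only as a sketch: if ask() takes the client as a parameter, you can try it out with a stand-in client that never touches the network or your token budget. FakeMessages and fake_client are invented for this illustration and are not part of the SDK.

```python
from types import SimpleNamespace


def ask(question, client):
    """Send one question through the given client and return the reply text."""
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=300,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text


class FakeMessages:
    """Stand-in for client.messages that echoes the question instead of calling the API."""

    def create(self, model, max_tokens, messages):
        reply = "You asked: " + messages[0]["content"]
        return SimpleNamespace(content=[SimpleNamespace(type="text", text=reply)])


fake_client = SimpleNamespace(messages=FakeMessages())
print(ask("What is the capital of Japan?", fake_client))
# You asked: What is the capital of Japan?
```

Passing a real anthropic.Anthropic() client instead of fake_client makes the same function call Claude.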


Practice Exercises

Exercise 1: Make it your own

Copy the first-call example, change the question to something you actually want to know, and run it. Then change model to a deliberately misspelled string like "claude-haiku" and run it again to see what an error looks like.

Hint

A bad model name produces a NotFoundError (HTTP 404) — the API doesn’t recognize the model. Read the message; it tells you the model ID was the problem. Fixing it is just restoring the correct claude-haiku-4-5 string. Getting comfortable reading errors now pays off later.

Exercise 2: Watch the token count and stop reason

Call the model with a question whose answer is long (for example, “Explain how a rainbow forms”), but set max_tokens=30. Print response.stop_reason and response.usage. What stopped the model, and how many output tokens did it use?

Hint

With a tight max_tokens, the answer gets cut off and stop_reason becomes "max_tokens" instead of "end_turn". The output_tokens count will sit right at your cap. This is exactly how you detect a truncated reply in real code: check stop_reason, and raise max_tokens if it’s "max_tokens".
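
In real code, "raise max_tokens if it’s truncated" can be turned into a small retry loop. The sketch below doubles the cap until the reply finishes or a ceiling is reached. The names first_complete and fake_call and the limit of 2000 are assumptions for illustration, and the stand-in function replaces the API so the example runs offline.

```python
from types import SimpleNamespace


def first_complete(call, start=30, limit=2000):
    """Call call(max_tokens), doubling the cap until the reply is not truncated."""
    max_tokens = start
    while True:
        response = call(max_tokens)
        # Stop when the model finished on its own, or when the ceiling is reached.
        if response.stop_reason != "max_tokens" or max_tokens >= limit:
            return response, max_tokens
        max_tokens = min(max_tokens * 2, limit)


def fake_call(max_tokens):
    """Stand-in for the API: pretends the full answer needs 100 tokens."""
    done = max_tokens >= 100
    return SimpleNamespace(stop_reason="end_turn" if done else "max_tokens")


response, used = first_complete(fake_call)
print(response.stop_reason, used)  # end_turn 120
```

With a real client, call would wrap client.messages.create() and pass max_tokens through. Keep in mind that every retry is a separate request with its own token usage.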

Exercise 3: Extend the helper

Add a second parameter to ask() so the caller can control the length: def ask(question, max_tokens=300). Pass it through to messages.create(). Then call ask("List 10 uses for a paperclip", max_tokens=500).

Hint

Only one line inside the function changes — max_tokens=max_tokens. The default value 300 means existing calls like ask("...") keep working unchanged, while new calls can request more room when they need it. This pattern — sensible defaults, optional overrides — is how you’ll grow ask() throughout the module.


Summary

You installed the Anthropic SDK, set your API key as an environment variable so it stays out of your code, created a client with anthropic.Anthropic(), and made a real call with client.messages.create(). Then you took the response apart: the answer lives at response.content[0].text, stop_reason tells you why generation ended, and usage reports the input and output token counts you’re billed on. Finally you wrapped it all in a reusable ask() helper.

Key Concepts

  • API key — your secret account credential; set it as the ANTHROPIC_API_KEY environment variable and never hardcode it.
  • Client — anthropic.Anthropic(), your handle to the API; it reads the key from the environment automatically.
  • messages.create() — the core call: takes model, max_tokens, and a messages list; returns a response object.
  • content — a list of blocks; the text answer is content[0].text. It’s a list because later responses can carry multiple blocks (e.g. tool requests).
  • stop_reason — why generation stopped: end_turn (finished naturally) or max_tokens (hit your cap, likely truncated).
  • usage — the token counts (input_tokens, output_tokens) that determine cost.

Why This Matters

Every technique in the rest of this course is a variation on this one call — adding a system prompt, conversation history, tools, or retrieved documents all happen inside messages.create(), and all return the same response object you just learned to read. Master this anatomy now and the advanced lessons are additions, not surprises.


Next Steps

Continue to Lesson 3 - System Prompts and Roles

Steer the model's behavior and persona with a system prompt, and understand the user and assistant roles.

Back to Module Overview

Return to the Working with LLMs in Python module overview


Continue Building Your Skills

You’ve gone from zero to a working, reusable call to Claude — and you can read everything that comes back. Next you’ll take control of how the model responds: a system prompt lets you set its role, tone, and rules before the user ever speaks.