Lesson 4 - Securing and Serving

Welcome to Securing and Serving

So far every call you’ve made has run in your own script, with your own key, against input you typed yourself. The moment you put an AI app in front of other people, two new responsibilities arrive together. The first is security: your API key is a credential that bills to your account, and the requests now come from people you don’t control. The second is serving: a script that runs once on your laptop isn’t a product — a product answers over the network, with a defined shape, predictable status codes, and graceful behavior when the provider hiccups. This lesson handles both, because in a real service they live in the same request path.

Neither concern requires new AI knowledge. It’s the ordinary engineering that turns your model call into something you’d trust with a real key and real users.

By the end of this lesson, you will be able to:

  • Keep your API key out of code by reading it from the environment, and explain why hardcoding or committing a key is dangerous
  • Validate untrusted user input before it reaches the model (reject empty, oversized, or malformed requests)
  • Serve your model behind a FastAPI endpoint with typed request and response schemas
  • Map provider errors to proper HTTP status codes so callers get a predictable contract

You’ll build on the calls you already know how to make. Let’s secure them, then serve them.


Keeping Your Key Out of Code

Your API key is the most sensitive thing in your project. Anyone who has it can spend your money. The single most common way keys leak is the simplest: someone hardcodes the key into a source file, commits it, and pushes it to a repository. Once a key reaches a commit history — even a private one, even one you later delete — treat it as compromised. The fix is to never put it in code in the first place.

The Anthropic SDK is designed for exactly this. When you write anthropic.Anthropic() with no arguments, the client reads the ANTHROPIC_API_KEY environment variable for you. Your code never sees the literal key; it only references a name.

import os
import anthropic

# The SDK reads ANTHROPIC_API_KEY from the environment automatically.
client = anthropic.Anthropic()

# You can confirm the variable is present without ever printing its value.
print("key configured:", "ANTHROPIC_API_KEY" in os.environ)
key configured: True

Notice what this code does not do: it never writes the key, never prints the key, and never accepts the key as a literal string. It checks only for presence. That presence check is the most you should ever do with a key in code — confirm it exists, then let the SDK use it.

How does the variable get set? In development, many teams keep secrets in a .env file (loaded by a library like python-dotenv) and add .env to .gitignore so it is never committed. In production, you set the variable through your platform’s secrets manager or environment configuration — the same name, ANTHROPIC_API_KEY, but supplied by the deployment environment rather than a file. Either way, the rule is the same: the key lives outside your source tree, and your code reads it by name.
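
Because the key arrives from outside the code, it helps to fail fast when it is missing rather than discover the problem on the first request. The following is a minimal sketch using only the standard library; require_env is a hypothetical helper name introduced here, not part of the Anthropic SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail fast if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        # The message names the variable only; the value is never printed or logged.
        raise RuntimeError(f"{name} is not set; supply it through the environment")
    return value


# Typical use at startup, before the service accepts traffic:
# require_env("ANTHROPIC_API_KEY")
```

Calling this once at startup turns a missing secret into an immediate, clearly named error instead of an authentication failure on the first model call.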

Never commit a key — and rotate it the moment one leaks

Add .env (and any other secret file) to .gitignore before you write your first key into it. If a key ever lands in a commit, a log, a screenshot, or a chat message, rotating it — revoking the old one and issuing a new one in your provider console — is not optional. A leaked key is a live credential until you revoke it, no matter how briefly it was exposed.
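
One way to catch a hardcoded key before it reaches a commit is to scan source text for strings that look like one. This is a rough sketch, not a replacement for a dedicated secret scanner. It assumes keys begin with the prefix sk-ant-, and find_suspected_keys is a hypothetical helper name; adjust the pattern for whichever provider you use.

```python
import re

# Assumption: keys start with "sk-ant-"; change the pattern for other providers.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}")


def find_suspected_keys(text: str) -> list[int]:
    """Return the 1-based line numbers that appear to contain a hardcoded key."""
    return [
        line_no
        for line_no, line in enumerate(text.splitlines(), start=1)
        if KEY_PATTERN.search(line)
    ]
```

The function reports line numbers only, so the check itself never echoes the secret into a log or terminal.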


Validating Untrusted Input

Once strangers can send requests, you can no longer assume the input is well-formed. Someone will send an empty string. Someone will send fifty thousand words. Someone will send the wrong field entirely. Every one of those reaches your model — and your bill — unless you stop it first. Validation is the gate that rejects bad requests cheaply, before they cost you a token.

The cheapest checks are the structural ones: is the required field present, is it the right type, is it within a sane length? Pydantic makes these declarative. You describe the shape you accept, and anything that doesn’t fit is rejected automatically.

from pydantic import BaseModel, Field, ValidationError

class AskRequest(BaseModel):
    # Required, must be a string, between 1 and 500 characters.
    question: str = Field(min_length=1, max_length=500)

# A reasonable request passes.
ok = AskRequest(question="What is the capital of France?")
print("accepted:", repr(ok.question))

# An oversized request is rejected before it ever reaches the model.
try:
    AskRequest(question="x" * 5000)
except ValidationError as e:
    print("rejected:", e.errors()[0]["type"])
accepted: 'What is the capital of France?'
rejected: string_too_long

The 5,000-character request never makes it past the schema. That matters for two reasons: it protects your wallet (a giant prompt is a giant input-token bill), and it protects your service (a flood of oversized requests can’t tie up your model calls). A length cap is the simplest, highest-value validation you can add — pick a maximum that fits your use case and enforce it at the boundary.

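Beyond length, the other essential check is emptiness. A blank or whitespace-only question is meaningless to the model but still costs a request. Note that min_length=1 alone does not catch it: a string of spaces has a nonzero length, so reject it explicitly. The live endpoint in the next section wires in the emptiness check; adding the length cap to that endpoint is Exercise 3.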


Serving the Model with FastAPI

A validated call is still just a function until something can reach it over the network. FastAPI turns it into an HTTP endpoint with very little ceremony: you declare the request and response shapes as Pydantic models, write a handler, and FastAPI gives you JSON parsing, validation, status codes, and an OpenAPI spec for free.

Here is the full service — schemas, validation, the model call, and error mapping in one place. Read it once, then we’ll walk through what each part guarantees.

import anthropic
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

client = anthropic.Anthropic()   # key from ANTHROPIC_API_KEY env var
app = FastAPI()

class AskRequest(BaseModel):
    question: str

class AskResponse(BaseModel):
    answer: str
    input_tokens: int
    output_tokens: int

@app.post("/ask", response_model=AskResponse)
def ask(req: AskRequest):
    if not req.question.strip():
        raise HTTPException(status_code=400, detail="question must not be empty")
    try:
        resp = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=80,
            messages=[{"role": "user", "content": req.question}],
        )
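    except anthropic.RateLimitError:
        # Caught first: RateLimitError is a subclass of APIError.
        raise HTTPException(status_code=429, detail="rate limited, try again")
    except anthropic.APIError:
        # Keep the detail generic so provider internals are not echoed to callers.
        raise HTTPException(status_code=502, detail="upstream error")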
    return AskResponse(
        answer=resp.content[0].text,
        input_tokens=resp.usage.input_tokens,
        output_tokens=resp.usage.output_tokens,
    )

Three things make this a service rather than a script. The request schema (AskRequest) means FastAPI rejects any body that isn’t a JSON object with a string question — before your handler runs. The response schema (AskResponse, named via response_model) means callers get a stable, documented shape every time: an answer plus the token counts you learned to read in Lesson 1. And the explicit checks — the empty-question guard and the error mapping — turn messy real-world failures into clean HTTP responses.

You can exercise the endpoint without launching a server by using FastAPI’s TestClient, which sends requests through the app in-process. The valid request below still makes a real model call, so ANTHROPIC_API_KEY must be set; only the empty and missing cases are rejected before the model is reached. (To run it as a real server, you’d use uvicorn module_name:app --reload and send requests to it over HTTP; the TestClient exercises the same handler either way.)

from fastapi.testclient import TestClient

tc = TestClient(app)

print("valid:  ", tc.post("/ask", json={"question": "Capital of France? One word."}).json())
print("empty:  ", tc.post("/ask", json={"question": "   "}).status_code)
print("missing:", tc.post("/ask", json={}).status_code)
valid:   {'answer': 'Paris', 'input_tokens': 14, 'output_tokens': 4}
empty:   400
missing: 422

Three requests, three predictable outcomes. A good request returns 200 with the typed body. An empty question returns 400 because your handler rejected it. A request missing the question field never reaches your handler at all — Pydantic catches it and FastAPI returns 422 Unprocessable Entity automatically. That 422 is the schema doing its job: structural validation happens before any of your code, and before any model call.


Mapping Provider Errors and Running the Service

The model call itself can fail in ways that have nothing to do with the caller. The provider might rate-limit you. The upstream service might have a transient problem. If you let those exceptions bubble up untouched, the caller gets an opaque 500 Internal Server Error and no idea whether to retry. A predictable service translates each failure into an HTTP status that tells the caller what to do.

That’s the purpose of the except blocks in the endpoint above. A RateLimitError becomes 429 Too Many Requests, a signal the caller can honor by backing off and retrying. Any other API error becomes 502 Bad Gateway, the standard code for “I’m a server, and the service I depend on failed.” The caller now has a contract: 2xx means success; 400 and 422 mean the request itself was the problem (it was empty or malformed, so fix it before resending); 429 means the request was fine but arrived at a bad time (back off, then retry); and 5xx means something upstream broke (the request was fine; retrying later may work).
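
The order of the except blocks matters, because in the Anthropic Python SDK RateLimitError derives from APIError. The sketch below shows the same specific-before-general rule with stand-in exception classes; UpstreamError, UpstreamRateLimit, and status_for are hypothetical names, not SDK or FastAPI APIs.

```python
class UpstreamError(Exception):
    """Stand-in for a provider's base API error (hypothetical)."""


class UpstreamRateLimit(UpstreamError):
    """Stand-in for a provider's rate-limit error (hypothetical)."""


def status_for(exc: Exception) -> int:
    """Map an exception to the HTTP status the service should return."""
    # Check the most specific class first: a rate limit is also an UpstreamError,
    # so reversing these two checks would turn every 429 into a 502.
    if isinstance(exc, UpstreamRateLimit):
        return 429
    if isinstance(exc, UpstreamError):
        return 502
    # Anything else is unexpected and most likely a bug in the service itself.
    return 500
```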

This mapping is what makes the service safe and predictable. Schemas guarantee the shape going in and coming out; validation rejects bad input before it costs anything; error mapping ensures every failure mode has a defined, documented response instead of a stack trace. Together they are the difference between an endpoint that happens to work and one another team can build against.
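
From the caller’s side, the contract can be turned into a retry policy. This is a minimal sketch of one reasonable policy, not a prescribed one; should_retry and backoff_seconds are hypothetical helpers, and the base and cap values are arbitrary defaults chosen for illustration.

```python
import random


def should_retry(status: int) -> bool:
    """Retry rate limits and server-side failures; never retry other 4xx responses."""
    return status == 429 or 500 <= status <= 599


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter; `attempt` starts at 0."""
    # The upper bound doubles with each attempt until it reaches `cap`;
    # the random draw spreads retries out so callers do not retry in lockstep.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A 400 or 422 is never retried because resending the same body will fail the same way; only the caller fixing the request helps.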

To run it for real, save the code to a file — say service.py — and start a server:

uvicorn service:app --reload

uvicorn is the ASGI server that actually listens on a port; service:app points it at the app object in service.py; and --reload restarts on code changes during development. The server reads ANTHROPIC_API_KEY from its environment exactly as your script did — which is precisely why keeping the key in the environment, not the code, pays off the moment you deploy.
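
Once the server is up, any HTTP client can call it. The sketch below builds the request with the standard library only. It assumes uvicorn’s default address of http://127.0.0.1:8000, and build_ask_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_ask_request(
    question: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST request for the /ask endpoint with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the server running in another terminal:
# with urllib.request.urlopen(build_ask_request("Capital of France? One word.")) as r:
#     print(json.load(r))
```

Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so a caller that wants to inspect the 400, 422, 429, or 502 cases needs to catch that exception and read its code attribute.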


Practice Exercises

Exercise 1: Spot the leak

A teammate writes client = anthropic.Anthropic(api_key="<the-actual-secret-key-pasted-here>") directly in service.py and commits it so the app “just works” for everyone. Name two things wrong with this, and the correct fix.

Hint

First, the key is now in source — and once committed, it’s in the repository history permanently, even if deleted later; anyone with repo access (or anyone the repo ever leaks to) has a live credential. Second, every environment is now stuck with that one key. The fix is anthropic.Anthropic() with no arguments, which reads ANTHROPIC_API_KEY from the environment; the key is supplied by a .env file (in .gitignore) locally and a secrets manager in production. And since the key was committed, it must be rotated.

Exercise 2: Predict the status code

Using the /ask endpoint from this lesson, what HTTP status code does each request produce? (a) {"question": "Hello"}; (b) {"question": ""}; (c) {"prompt": "Hello"}; (d) the provider returns a rate-limit error during the call.

Hint

(a) 200 — a valid request returns the typed AskResponse. (b) 400 — your handler’s empty-question check rejects it. (c) 422 — there’s no question field, so Pydantic rejects it before your code runs. (d) 429 — the except anthropic.RateLimitError block maps it so the caller knows to back off and retry.

Exercise 3: Add a length cap

The AskRequest in the live endpoint accepts a plain str, so a 50,000-character question would reach the model. Change the schema so any question over 1,000 characters is rejected at the boundary. Which status code will an oversized request now get, and why is rejecting it here cheaper than catching it later?

Hint

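Use question: str = Field(min_length=1, max_length=1000), and add Field to the pydantic import. An oversized request now returns 422: FastAPI rejects it during schema validation, before your handler and before any model call. That’s cheaper because you never pay the input tokens for a 50,000-character prompt; the request is stopped at the gate, not after the provider has already processed (and billed) it. One side effect: an empty string now fails min_length and returns 422 instead of the handler’s 400, while a whitespace-only question still passes the schema and gets 400 from the handler.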


Summary

Shipping an AI app means handling security and serving as one job, because they share the request path. On security, your API key is a credential: read it from the environment with anthropic.Anthropic(), never hardcode or commit it, keep .env in .gitignore, and rotate any key that leaks. On the input side, treat every request as untrusted — use Pydantic schemas to reject empty, oversized, or malformed bodies before they reach the model and your bill. On serving, FastAPI turns a validated model call into an HTTP endpoint with typed request and response schemas, automatic structural validation (422 for bad shapes), and explicit error mapping (400 for bad input, 429 for rate limits, 502 for upstream failures). The TestClient proved the contract end-to-end; uvicorn service:app --reload runs it for real. What makes the service safe and predictable isn’t the model — it’s the schemas, the validation, and the error mapping wrapped around it.

Key Concepts

  • Keys in the environment: anthropic.Anthropic() reads ANTHROPIC_API_KEY; code references a name, never the secret.
  • Never commit, always rotate: .gitignore your .env; a leaked key is live until you revoke it.
  • Input validation: reject empty, oversized, and malformed requests at the boundary with Pydantic.
  • Schemas and status codes: typed request/response models give callers a stable contract; structural validation returns 422 automatically.
  • Provider-error-to-HTTP mapping: 429 for rate limits, 502 for upstream failures, so callers know whether to retry.

Why This Matters

The gap between a working demo and a service other people can rely on is almost entirely security and serving. A demo that hardcodes a key and trusts every input is one leaked commit or one oversized request away from a problem — a drained budget, a wedged service, an opaque crash. Treating the key as a managed secret and the request as untrusted, then wrapping the model call in schemas and error mapping, is what lets you hand the endpoint to another team, point it at real users, and trust it with a real key. It’s the engineering that makes your AI shippable.


Next Steps

Continue to Lesson 5 - Guided Project: Deploying an AI App

Put it all together: take a secured, validated, served model and walk through deploying it as a working AI application.

Back to Module Overview

Return to the Shipping AI Applications module overview


Continue Building Your Skills

You can now keep your key out of code, validate the requests strangers send, and serve your model behind a real HTTP API with schemas and error handling. Next you’ll bring every piece of this module together in a guided project — taking a secured, validated, served model all the way to a deployed AI application.