Lesson 5 - Guided Project: Robust Atlas Tools

Welcome to the Guided Project

Over the last four lessons you’ve collected every piece you need to make a tool trustworthy: a clear name and description, a tight typed schema, a Pydantic model that validates inputs, and a loop that hands failures back to Claude as is_error tool results. In this guided project you’ll put all of it together by rebuilding Atlas’s tools — the travel assistant you’ve been growing across the course — so they’re genuinely robust. You’ll define inputs as Pydantic models, generate the schemas Claude sees from those same models, wrap each tool in a validate-then-run guard, register everything, and watch the agent loop turn a bad call into a clean recovery. When you finish, you’ll have a small tools module you can drop straight into Atlas.

By the end of this project, you will be able to:

  • Define a tool’s inputs as a Pydantic model with constraints and enums, and generate its schema from that model
  • Wrap a tool in a validate-then-run function that never lets a bad input or a tool crash escape
  • Drive tools through a registry so the agent loop stays small and uniform
  • Trace a real repair loop where Claude sends an invalid call, sees the error, and self-corrects

We’ll build it in five stages, each one small and testable. Let’s start with the inputs.


Stage 1: Define the Pydantic Input Models

The foundation of a robust tool is a model that describes exactly what a valid input looks like. Here is the input model for Atlas’s currency converter. Every field is typed, the amount is constrained to be positive, and the currencies are locked to a fixed set with Literal — so an unsupported currency can’t even be expressed:

from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class ConvertCurrencyInput(BaseModel):
    amount: float = Field(gt=0, description="The amount of money to convert, must be positive.")
    from_currency: Literal["JPY", "USD", "EUR"] = Field(description="Currency to convert from.")
    to_currency: Literal["JPY", "USD", "EUR"] = Field(description="Currency to convert to.")

The gt=0 constraint and the Literal enums are doing real work here. They mean Pydantic will reject a negative amount or a currency outside the supported three before your conversion code ever runs. Atlas’s other tools — get_weather, search_destinations — follow the exact same pattern: one BaseModel per tool, with Field constraints and Literal enums wherever a value is bounded or comes from a fixed set. Get this stage right and the rest of the module falls into place, because everything downstream reads from these models.


Stage 2: Write the Real Tool Functions

The model guards the inputs; the function does the work. Atlas’s currency tool is a plain Python function that looks up an exchange rate and returns a short, labeled string — exactly the kind of clear return value you want a model to read back:

_RATES = {("JPY", "USD"): 0.0067, ("USD", "EUR"): 0.92, ("USD", "JPY"): 149.3}

def convert_currency(amount, from_currency, to_currency):
    rate = _RATES[(from_currency, to_currency)]
    return f"{amount} {from_currency} = {round(amount * rate, 2)} {to_currency}"

Notice that the function itself does no validation. It doesn’t check that amount is positive or that the currencies are supported — that’s the model’s job, and we’re about to wire the two together so the function only ever runs on validated input. The function returns a concise, unambiguous result with the currencies and amounts spelled out, so when this lands back in Claude’s context the model can act on it directly.


Stage 3: Generate the Tool Definition from the Model

Atlas needs a tool definition to send to Claude — and the schema in that definition should come straight from the Pydantic model, so the two can never drift apart. model_json_schema() does that generation for you. We also build a small registry that maps each tool name to its (model, function) pair, so the loop can look up everything it needs by name:

TOOLS = [{
    "name": "convert_currency",
    "description": "Convert money between JPY, USD, and EUR. Use this when the user compares or budgets prices across currencies.",
    "input_schema": ConvertCurrencyInput.model_json_schema(),
}]
REGISTRY = {"convert_currency": (ConvertCurrencyInput, convert_currency)}

TOOLS is what you pass to Claude — the description says when to reach for the tool, and the input_schema is generated from ConvertCurrencyInput so it stays in sync automatically. REGISTRY is the loop’s lookup table: given the name Claude calls, it hands back the model to validate against and the function to run. Adding a second tool is now two lines — append a definition to TOOLS and an entry to REGISTRY — with no changes to the loop itself.


Stage 4: The Validate-Then-Run Wrapper and Loop Dispatch

This is the heart of the project: a single wrapper that validates the input against the model, runs the function only if validation passes, and turns any failure — a bad input or a crash inside the tool — into a structured result the loop can return to Claude. Nothing escapes it:

def run_tool(model_cls, fn, raw_input):
    try:
        validated = model_cls(**raw_input)
    except ValidationError as exc:
        msgs = "; ".join(f"{e['loc'][0]}: {e['msg']}" for e in exc.errors())
        return {"ok": False, "content": f"Invalid input -> {msgs}"}
    try:
        return {"ok": True, "content": fn(**validated.model_dump())}
    except Exception as exc:
        return {"ok": False, "content": f"Tool error -> {exc}"}

The wrapper has two guards. The first try constructs the model; if Claude sent a bad input, Pydantic raises ValidationError, and we flatten it into a short, readable message like from_currency: Input should be 'JPY', 'USD' or 'EUR'. The second try runs the validated function; if the tool itself throws, we catch that too. Either way the wrapper returns a uniform {"ok": ..., "content": ...} — success or failure, the shape is the same.

In the agent loop, each tool-use block dispatches through the registry and the wrapper, and the outcome becomes a tool_result. The crucial line is is_error: when the wrapper reports failure, we mark the result as an error so Claude knows the call didn’t work and can try again:

model_cls, fn = REGISTRY[block.name]
outcome = run_tool(model_cls, fn, block.input)
results.append({"type": "tool_result", "tool_use_id": block.id,
                "content": outcome["content"], "is_error": not outcome["ok"]})

This is the whole pattern in four lines: look up the tool by name, run it through the guard, and feed the result — success or is_error — back to Claude. The loop never crashes, and a failure is just information the model gets to act on.

This is your default tool pattern

The pipeline you just built — Pydantic model → schema generated from it → validate-then-run wrapper → is_error on any failure → Claude self-corrects — is reusable for every tool you’ll ever add to an agent. Don’t treat it as a one-off for the currency converter. Copy run_tool and the registry shape as your default, and adding a robust tool becomes “write a model, write a function, register them.” That uniformity is what keeps a growing agent reliable.


Stage 5: Run It and Watch the Repair Loop Recover

Now the payoff. Below is a real, verified trace from running Atlas with these tools against an SDK-shaped mock. Claude is asked to convert currencies and, on its first attempt, reaches for GBP — a currency Atlas doesn’t support. Watch what happens:

  step 1: convert_currency({'amount': 100, 'from_currency': 'GBP', 'to_currency': 'USD'}) -> ERROR (is_error): Invalid input -> from_currency: Input should be 'JPY', 'USD' or 'EUR'
  step 2: convert_currency({'amount': 100, 'from_currency': 'USD', 'to_currency': 'EUR'}) -> ok: 100.0 USD = 92.0 EUR
  final answer: 100 USD is about 92.0 EUR.

Trace it through. In step 1, Claude sends from_currency: 'GBP'. The Literal["JPY", "USD", "EUR"] constraint rejects it, the wrapper returns ok: False with a clear message, and the loop sends that back as a tool_result with is_error set. In step 2, Claude reads the error, understands GBP isn’t supported, and retries with a valid currency — which now passes validation, runs, and returns 100.0 USD = 92.0 EUR. The model then gives its final answer in plain language.

The exact wording of Claude’s final answer will vary from run to run — that’s normal, since it’s generated text. What’s guaranteed is the behavior: an invalid call produces an is_error result with a clear message, Claude retries with a valid call, and the conversation recovers without any crash or human intervention. That recovery is the entire point of the module, and you just watched all five pieces — model, schema, wrapper, registry, loop — produce it together.


Extend Atlas

Exercise 1: Add a get_weather tool

Atlas should be able to answer “what’s the weather in Kyoto?” Add a get_weather tool following the exact pattern from this project: a Pydantic input model, a real function, a tool definition with a generated schema, and a registry entry.

Hint

Write a GetWeatherInput(BaseModel) with a single city: str = Field(description="The city to get weather for."). Write a get_weather(city) function that returns a short labeled string like "Kyoto: 16°C, clear". Then append {"name": "get_weather", "description": "...", "input_schema": GetWeatherInput.model_json_schema()} to TOOLS and "get_weather": (GetWeatherInput, get_weather) to REGISTRY. You changed zero lines of the loop — that’s the registry paying off.

Exercise 2: Add a range constraint and test it

Suppose Atlas gains a book_seats tool where travelers can book 1–9 seats. Add a constrained field to its input model, then construct the model directly with an out-of-range value to confirm it’s rejected.

Hint

Use count: int = Field(ge=1, le=9, description="Number of seats, 1 to 9."). Then try run_tool(BookSeatsInput, book_seats, {"count": 12}) — the wrapper should return {"ok": False, "content": "Invalid input -> count: Input should be less than or equal to 9"}, and the loop would hand that to Claude as an is_error result so it can retry with a valid count.

Exercise 3: Sharpen an error message and watch repair

The clearer your error message, the faster Claude self-corrects. Rewrite the validation message in run_tool to include the field’s allowed values explicitly, then re-run the bad-currency scenario and compare how quickly the model recovers.

Hint

Pydantic already includes the allowed values in e['msg'] for a Literal (Input should be 'JPY', 'USD' or 'EUR'), which is why the repair in Stage 5 worked in a single retry. Experiment with adding a leading hint like f"Invalid input (fix and retry) -> {msgs}". The goal is to internalize why the repair loop works: a precise, actionable message gives Claude exactly what it needs to send a correct call on the next turn.


Summary

You rebuilt Atlas’s tools to be genuinely robust by assembling the whole module into one pipeline. Each tool starts as a Pydantic input model with Field constraints and Literal enums, so invalid values can’t be expressed. The tool definition’s schema is generated from that same model with model_json_schema(), so the schema Claude sees never drifts from what you validate against. A validate-then-run wrapper (run_tool) guards every call — catching ValidationError on bad inputs and any exception inside the tool — and returns a uniform {"ok", "content"} result. A registry maps each tool name to its (model, function) pair so the loop stays tiny and uniform. And the loop dispatch turns any failure into a tool_result with is_error set, which is what lets Claude read the error and self-correct. The verified trace showed it end to end: a bad GBP call becomes an is_error message, and Claude retries with a valid currency.

Key Concepts

  • Pydantic input model — typed fields with constraints and Literal enums; the single source of truth for what’s valid.
  • Generated schemamodel_json_schema() produces the tool’s input_schema from the model, so the two stay in sync.
  • Validate-then-run wrapper — validates against the model, runs the function only on success, and converts any failure into a structured result.
  • Registry — a name → (model, function) map that keeps the agent loop small and makes adding a tool trivial.
  • is_error repair loop — a failed call is returned to Claude as an error, which the model reads and corrects on the next turn.

Why This Matters

This pattern is the difference between an agent that’s a demo and one you can run unattended. Without it, a single bad tool call crashes the loop or, worse, corrupts an answer silently. With it, every failure becomes information Claude can act on, and the system repairs itself in-conversation. Because the pipeline is uniform — model, schema, wrapper, registry — it scales: adding the tenth tool is as safe as adding the first. You now have a tools module you can drop into Atlas and keep building on, knowing that bad inputs and tool crashes are handled by design rather than by luck.


Next Steps

Continue to Module 4 - Memory and State

Give the agent short-term conversation memory, summarization, and long-term memory via a vector store.

Back to Module Overview

Return to the Designing Tools module overview


Continue Building Your Skills

You’ve finished Designing Tools with a robust, reusable tools module for Atlas — Pydantic models, generated schemas, a validate-then-run wrapper, a registry, and a repair loop that turns failures into self-correction. Atlas can now call tools reliably and recover when a call goes wrong. The next module gives it something it’s been missing: memory. You’ll add short-term conversation memory so the agent remembers the current chat, summarization to keep long conversations within context, and long-term memory backed by a vector store so Atlas can recall facts across sessions.