<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reliability and Evaluation on DATATWEETS</title><link>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/</link><description>Recent content in Reliability and Evaluation on DATATWEETS</description><generator>Hugo</generator><language>en</language><copyright>Copyright (c) 2025 Datatweets</copyright><lastBuildDate>Sun, 28 Jun 2026 09:00:00 +0200</lastBuildDate><atom:link href="https://datatweets.com/courses/ai-agents/reliability-and-evaluation/index.xml" rel="self" type="application/rss+xml"/><item><title>Lesson 1 - Why Reliability and Evaluation</title><link>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-1-why-reliability-and-evaluation/</link><pubDate>Fri, 06 Feb 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-1-why-reliability-and-evaluation/</guid><description>The gap between &amp;lsquo;works when I try it&amp;rsquo; and &amp;lsquo;works for every user, every time, affordably&amp;rsquo; is where agents fail in the real world. This lesson maps that gap to four pillars — guardrails, retries, cost control, and evaluation — and sets up the module that closes it.</description></item><item><title>Lesson 2 - Guardrails</title><link>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-2-guardrails/</link><pubDate>Fri, 06 Feb 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-2-guardrails/</guid><description>Guardrails protect both ends of the agent: an input check refuses out-of-scope or unsafe requests before you spend anything, and an output check validates the answer and repairs it if it&amp;rsquo;s empty or incomplete. 
This lesson builds both in Python around the loop you already have, extending the validate-then-repair-or-refuse discipline from Module 3 to the agent&amp;rsquo;s whole boundary.</description></item><item><title>Lesson 3 - Retries, Timeouts, and Cost Control</title><link>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-3-retries-timeouts-and-cost-control/</link><pubDate>Fri, 06 Feb 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-3-retries-timeouts-and-cost-control/</guid><description>Real agents run over flaky networks and on a budget. This lesson builds two small, verified patterns that wrap the agent loop: retries with exponential backoff and an attempt cap for robustness, and a token budget for affordability — so no single run can crash on a hiccup or run away with your bill.</description></item><item><title>Lesson 4 - Evaluating Agents</title><link>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-4-evaluating-agents/</link><pubDate>Fri, 06 Feb 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-4-evaluating-agents/</guid><description>You can&amp;rsquo;t improve what you don&amp;rsquo;t measure. This lesson builds a small evaluation harness — a fixed test set of cases, an LLM-as-judge that returns PASS or FAIL against a rubric, and a pass rate you track over time. 
Run it on every change to catch regressions before users do.</description></item><item><title>Lesson 5 - Guided Project: Production-Ready Atlas</title><link>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-5-guided-project-production-ready-atlas/</link><pubDate>Fri, 06 Feb 2026 09:00:00 +0200</pubDate><guid>https://datatweets.com/courses/ai-agents/reliability-and-evaluation/lesson-5-guided-project-production-ready-atlas/</guid><description>The course capstone: take the Atlas you built across seven modules (loop, tools, memory, planning, retrieval, multi-agent) and wrap it in Module 8&amp;rsquo;s four protective layers. Guardrails refuse or repair, retries survive blips, a budget caps spend, and an eval harness scores quality so you can ship changes with confidence.</description></item></channel></rss>