Ship a production agentic assistant with guardrails
The RAG bot worked, so now the business wants it to *act*: look things up, call internal tools, and complete multi-step tasks. You'll build that agent responsibly: observable, bounded, evaluated, and safe to put in front of users.
What you'll build
A multi-tool agent that plans and executes tasks, with tool-call guardrails, PII/injection defenses, full tracing, an eval suite, and latency/cost budgets, packaged and deployed.
See how we teach, before you sign up
You don't just get code dumped on you. Every starter file and every solution is explained line-by-line, in plain English. Here's one real file from this project:
import time

MAX_STEPS, MAX_SECONDS, MAX_COST_USD = 8, 60, 0.50


def run(task: str, step_fn) -> dict:
    """step_fn(history) -> {'done': bool, 'tool': str|None, 'args': dict, 'cost': float}"""
    history, spent, t0, last = [], 0.0, time.perf_counter(), None
    for step in range(MAX_STEPS):
        if time.perf_counter() - t0 > MAX_SECONDS:
            return {"status": "timeout", "history": history}
        if spent > MAX_COST_USD:
            return {"status": "cost_capped", "history": history}
        out = step_fn(history)
        spent += out.get("cost", 0.0)
        sig = (out.get("tool"), str(out.get("args")))
        if sig == last and out.get("tool"):  # stuck calling the same thing
            return {"status": "loop_detected", "history": history}
        last = sig
        history.append(out)
        if out["done"]:
            return {"status": "ok", "history": history}
    return {"status": "max_steps", "history": history}

Reading this file
MAX_STEPS, MAX_SECONDS, MAX_COST_USD
Three independent limits, because a runaway agent finds creative ways to blow past whichever one you forgot.

if time.perf_counter() - t0 > MAX_SECONDS
A wall-clock timeout, checked before each step, so a slow task cannot keep the loop running indefinitely. A single step that never returns is not interrupted by this check.

if spent > MAX_COST_USD
A hard dollar ceiling, the backstop that stops an agent from quietly running up a bill.

if sig == last and out.get("tool")
Detects the agent repeating the same call back-to-back and breaks out, catching loops the global caps would only catch slowly.
The safety chassis: caps on steps, wall-clock, and cost, plus repeated-call detection.
That's 1 of 11 explained code blocks in this single project.
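To see each exit path fire, you can drive `run` with stub step functions. This is a minimal sketch, not part of the project files: `run` is repeated from the file above only so the snippet executes on its own, and the four stub agents (`finishes_in_two`, `stuck`, `expensive`, `wanders`) are hypothetical stand-ins for a real model call.

```python
import time

MAX_STEPS, MAX_SECONDS, MAX_COST_USD = 8, 60, 0.50


def run(task, step_fn):
    # Same logic as the file above, repeated so this snippet is self-contained.
    history, spent, t0, last = [], 0.0, time.perf_counter(), None
    for step in range(MAX_STEPS):
        if time.perf_counter() - t0 > MAX_SECONDS:
            return {"status": "timeout", "history": history}
        if spent > MAX_COST_USD:
            return {"status": "cost_capped", "history": history}
        out = step_fn(history)
        spent += out.get("cost", 0.0)
        sig = (out.get("tool"), str(out.get("args")))
        if sig == last and out.get("tool"):
            return {"status": "loop_detected", "history": history}
        last = sig
        history.append(out)
        if out["done"]:
            return {"status": "ok", "history": history}
    return {"status": "max_steps", "history": history}


# Hypothetical stub agents: each fakes what a model-backed step_fn would return.
def finishes_in_two(history):
    if not history:
        return {"done": False, "tool": "search", "args": {"q": "refund policy"}, "cost": 0.01}
    return {"done": True, "tool": None, "args": {}, "cost": 0.01}


def stuck(history):
    # Always issues the identical tool call.
    return {"done": False, "tool": "search", "args": {"q": "same"}, "cost": 0.01}


def expensive(history):
    # Varies its args (so no loop is detected) but costs $0.30 per step.
    return {"done": False, "tool": "lookup", "args": {"n": len(history)}, "cost": 0.30}


def wanders(history):
    # Free, never repeats, never finishes.
    return {"done": False, "tool": "lookup", "args": {"n": len(history)}, "cost": 0.0}


results = {fn.__name__: run("demo", fn)["status"]
           for fn in (finishes_in_two, stuck, expensive, wanders)}
print(results)
# {'finishes_in_two': 'ok', 'stuck': 'loop_detected', 'expensive': 'cost_capped', 'wanders': 'max_steps'}
```

Note that the cost check runs before each step, so `expensive` completes two steps and spends $0.60 before the cap stops it. If you need a strict ceiling, one option is to estimate a step's cost up front and refuse to start it.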
The build, milestone by milestone
1. Design the agent loop
   5 guided steps
   The loop is the safety chassis. Without hard step, time, and cost limits, a confused agent will happily burn your budget in an infinite tool-calling spiral.
2. Wire real tools
   5 guided steps
   Tools are where an agent gains both power and risk. Least-privilege scoping and per-call validation are what stand between a helpful agent and one that deletes production data.
3. Make it observable
   5 guided steps
   Agents fail in non-obvious, multi-step ways. Without end-to-end tracing, every bug report is "it gave a weird answer once" with no way to reproduce it.
4. Harden it
   5 guided steps
   An acting agent with no guardrails is a liability. Injection defense, PII handling, and a human gate on high-impact actions are what make it safe to put in front of real users.
5. Evaluate & budget
   5 guided steps
   Task success rate and cost-per-task are the numbers that decide whether this ships. "It works in the demo" is not an answer leadership accepts for an agent that spends money per run.
6. Deploy
   5 guided steps
   An agent that only runs on your laptop is a notebook, not a system. Packaging and deploying it is what makes it something a team could actually operate.
7. Break it on purpose, then write it up
   5 guided steps
   An agent in production will face a provider outage, a 429 storm, a hung tool, and a cost spike. It is a question of when, not if. Injecting those failures deliberately, while you are watching, is the only way to know your fallbacks and breakers actually fire instead of cascading into a stuck or runaway agent.
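To make the last milestone concrete, here is one way deliberate failure injection could look. It is a rough sketch under stated assumptions, not the project's reference solution: `RateLimited`, `flaky`, `Breaker`, `lookup`, and `cached_lookup` are all hypothetical names, the 429 is simulated with a plain exception rather than a real provider client, and the breaker has no cool-down or half-open state.

```python
class RateLimited(Exception):
    """Hypothetical stand-in for a provider returning HTTP 429."""


def flaky(tool, fail_first_n):
    """Wrap `tool` so its first `fail_first_n` calls raise RateLimited."""
    calls = {"n": 0}

    def wrapped(*args, **kwargs):
        calls["n"] += 1
        if calls["n"] <= fail_first_n:
            raise RateLimited(f"injected 429 on call {calls['n']}")
        return tool(*args, **kwargs)

    return wrapped


class Breaker:
    """Opens after `threshold` consecutive failures, then routes to a fallback."""

    def __init__(self, tool, fallback, threshold=3):
        self.tool, self.fallback, self.threshold = tool, fallback, threshold
        self.failures = 0    # consecutive failures seen so far
        self.tool_calls = 0  # how many times the real tool was attempted

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, *args, **kwargs):
        if self.open:
            # Breaker is open: do not touch the failing tool at all.
            return self.fallback(*args, **kwargs)
        self.tool_calls += 1
        try:
            result = self.tool(*args, **kwargs)
        except RateLimited:
            self.failures += 1
            return self.fallback(*args, **kwargs)
        self.failures = 0  # any success resets the streak
        return result


def lookup(order_id):
    return {"order_id": order_id, "source": "live"}


def cached_lookup(order_id):
    return {"order_id": order_id, "source": "fallback"}


# Inject a 429 storm: the first 10 calls to the live tool fail.
breaker = Breaker(flaky(lookup, fail_first_n=10), cached_lookup, threshold=3)
sources = [breaker.call("A1")["source"] for _ in range(5)]
print(sources, breaker.tool_calls, breaker.open)
# ['fallback', 'fallback', 'fallback', 'fallback', 'fallback'] 3 True
```

The point of the exercise is the assertion, not the wrapper: after three injected failures the live tool stops being called, and every request still gets an answer from the fallback.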
What's inside when you start
You'll walk away with
This is portfolio-grade. Build it free.
Sign up to unlock every milestone step-by-step, the code skeletons, full reference solutions, and checkable tasks, with your progress saved as you build.