Large · Portfolio centerpiece · ~33h · 7 milestones

Ship a production agentic assistant with guardrails

The RAG bot worked, so now the business wants it to *act*: look things up, call internal tools, and complete multi-step tasks. You build that agent responsibly: observable, bounded, evaluated, and safe to put in front of users.

Agent architecture · Tool/function calling · Guardrails & safety · Observability & tracing · Eval harness · Cost/latency budgeting · Deployment · Chaos / failover testing · Multi-model fallback & circuit breakers · Blameless postmortems

What you'll build

A multi-tool agent that plans and executes tasks, with tool-call guardrails, PII/injection defenses, full tracing, an eval suite, and latency/cost budgets, packaged and deployed.
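To make "tool-call guardrails" concrete, here is a minimal sketch of a per-call check. It is not the project's reference solution: `ToolSpec`, `guarded_call`, `ToolCallRejected`, and the two example tools are hypothetical names made up for illustration. The idea is an allowlist of tools, a check on argument names and types, and an approval flag for high-impact actions.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolCallRejected(Exception):
    """Raised when a proposed tool call fails a guardrail check."""


@dataclass(frozen=True)
class ToolSpec:
    fn: Callable[..., Any]
    required: dict                 # argument name -> expected type
    needs_approval: bool = False   # high-impact tools wait for a human


def guarded_call(registry: dict, tool: str, args: dict, approved: bool = False) -> Any:
    """Run a tool only if it is allowlisted and its arguments validate."""
    spec = registry.get(tool)
    if spec is None:
        raise ToolCallRejected(f"unknown tool: {tool!r}")
    missing = set(spec.required) - set(args)
    unexpected = set(args) - set(spec.required)
    if missing or unexpected:
        raise ToolCallRejected(f"bad arguments: missing={missing}, unexpected={unexpected}")
    for name, expected in spec.required.items():
        if not isinstance(args[name], expected):
            raise ToolCallRejected(f"{name!r} must be {expected.__name__}")
    if spec.needs_approval and not approved:
        raise ToolCallRejected(f"{tool!r} requires human approval")
    return spec.fn(**args)


# Hypothetical tools, for illustration only.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


def refund_order(order_id: str, amount_usd: float) -> dict:
    return {"order_id": order_id, "refunded": amount_usd}


REGISTRY = {
    "lookup_order": ToolSpec(lookup_order, {"order_id": str}),
    "refund_order": ToolSpec(
        refund_order, {"order_id": str, "amount_usd": float}, needs_approval=True
    ),
}
```

With this shape, a read-only lookup goes straight through, while a refund is rejected unless the caller passes `approved=True`. That is one simple way to put a human gate in front of actions that move money.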

See how we teach, before you sign up

You don't just get code dumped on you. Every starter file and every solution is explained line-by-line, in plain English. Here's one real file from this project:

agent/loop.py (python)
import time

MAX_STEPS, MAX_SECONDS, MAX_COST_USD = 8, 60, 0.50


def run(task: str, step_fn) -> dict:
    """step_fn(history) -> {'done': bool, 'tool': str|None, 'args': dict, 'cost': float}"""
    history, spent, t0, last = [], 0.0, time.perf_counter(), None
    for step in range(MAX_STEPS):
        if time.perf_counter() - t0 > MAX_SECONDS:
            return {"status": "timeout", "history": history}
        if spent > MAX_COST_USD:
            return {"status": "cost_capped", "history": history}
        out = step_fn(history)
        spent += out.get("cost", 0.0)
        sig = (out.get("tool"), str(out.get("args")))
        if sig == last and out.get("tool"):      # stuck calling the same thing
            return {"status": "loop_detected", "history": history}
        last = sig
        history.append(out)
        if out["done"]:
            return {"status": "ok", "history": history}
    return {"status": "max_steps", "history": history}

Reading this file

  • MAX_STEPS, MAX_SECONDS, MAX_COST_USD: Three independent limits, because a runaway agent finds creative ways to blow past whichever one you forgot.
  • if time.perf_counter() - t0 > MAX_SECONDS: A wall-clock timeout, checked before each step, so a slow task cannot hang the system indefinitely.
  • if spent > MAX_COST_USD: A hard dollar ceiling, the backstop that stops an agent from quietly running up a bill.
  • if sig == last and out.get("tool"): Detects the agent making the same call with the same arguments twice in a row and breaks out, catching loops the global caps would only catch slowly.

The safety chassis: caps on steps, wall-clock, and cost, plus repeated-call detection.
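The check in loop.py compares each call only with the one before it, so an agent alternating between two calls (A, B, A, B) would still run until a global cap fires. One possible extension, sketched here as an assumption and not taken from the project, is to remember the last few call signatures. `RepeatDetector` and its `window` parameter are hypothetical names.

```python
import json
from collections import deque


class RepeatDetector:
    """Flags a tool call whose signature appeared in the last `window` calls."""

    def __init__(self, window: int = 4):
        # deque(maxlen=...) silently drops the oldest signature when full
        self.recent = deque(maxlen=window)

    def seen_recently(self, tool, args) -> bool:
        if not tool:
            return False  # steps without a tool call never count as loops
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal
        sig = (tool, json.dumps(args, sort_keys=True, default=str))
        repeated = sig in self.recent
        self.recent.append(sig)
        return repeated
```

The trade-off is false positives: a tool that is legitimately called twice with identical arguments, such as polling a job status, would be flagged. A real agent might exempt such tools or require several repeats before stopping.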

That's 1 of 11 explained code blocks in this single project.

The build, milestone by milestone

  1. Design the agent loop

    5 guided steps

    The loop is the safety chassis. Without hard step/time/cost limits, a confused agent will happily burn your budget in an infinite tool-calling spiral.

  2. Wire real tools

    5 guided steps

    Tools are where an agent gains both power and risk. Least-privilege scoping and per-call validation are what stand between a helpful agent and one that deletes production data.

  3. Make it observable

    5 guided steps

    Agents fail in non-obvious, multi-step ways. Without end-to-end tracing, every bug report is "it gave a weird answer once" with no way to reproduce it.

  4. Harden it

    5 guided steps

    An acting agent with no guardrails is a liability. Injection defense, PII handling, and a human gate on high-impact actions are what make it safe to put in front of real users.

  5. Evaluate & budget

    5 guided steps

    Task success rate and cost-per-task are the numbers that decide whether this ships. "It works in the demo" is not an answer leadership accepts for an agent that spends money per run.

  6. Deploy

    5 guided steps

    An agent that only runs on your laptop is a notebook, not a system. Packaging and deploying it is what makes it something a team could actually operate.

  7. Break it on purpose, then write it up

    5 guided steps

    An agent in production will face a provider outage, a 429 storm, a hung tool, and a cost spike; the question is when, not if. Injecting those failures deliberately, while you are watching, is the only way to know your fallbacks and breakers actually fire instead of cascading into a stuck or runaway agent.
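To give a feel for the fallbacks and breakers the last milestone exercises, here is a minimal single-threaded sketch of a circuit breaker with ordered provider fallback. It rests on assumptions: `CircuitBreaker`, `call_with_fallback`, the thresholds, and the provider callables are hypothetical, and a production version would also need timeouts and thread safety.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then allows a trial call after `cooldown_s`."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so drills can fake time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has passed, let a trial call through (half-open).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()   # open, or re-open after a failed trial


def call_with_fallback(prompt: str, providers: list) -> dict:
    """providers: list of (name, call_fn, breaker) tuples, tried in order."""
    errors = []
    for name, call_fn, breaker in providers:
        if not breaker.allow():
            errors.append((name, "circuit open"))
            continue
        try:
            result = call_fn(prompt)
        except Exception as exc:    # in a drill, this is the injected failure
            breaker.record(False)
            errors.append((name, repr(exc)))
            continue
        breaker.record(True)
        return {"provider": name, "result": result}
    raise RuntimeError(f"all providers failed: {errors}")
```

A chaos drill can pass a `call_fn` that always raises and a fake `clock`, then check that the primary stops being called once its breaker opens and is retried only after the cooldown.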

What's inside when you start

4 starter files, ready to clone
7 guided milestones
7 full reference solutions
11 code blocks explained line-by-line
7 "is it working?" checks
4 interview questions it prepares you for

You'll walk away with

A deployed agent service with traced runs
An eval suite scoring task success and safety
A design doc covering guardrails and the cost/latency budget
A chaos/failover drill report covering provider, tool, and cost-spike failures with time-to-detect/recover
A blameless postmortem (timeline, root cause, action items) of the worst drill
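As a rough illustration of what an eval suite might report, the sketch below aggregates task success rate and cost per task from a batch of runs. The `RunResult` fields and the `summarize` function are assumptions made for this example; only the `"ok"` status value comes from the loop shown earlier.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    task_id: str
    status: str      # e.g. "ok", "timeout", "cost_capped", "loop_detected", "max_steps"
    passed: bool     # did the output satisfy the task's checker?
    cost_usd: float


def summarize(results: list) -> dict:
    """Aggregate a batch of agent runs into headline numbers."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    # A run counts as a success only if it finished cleanly AND passed its check.
    successes = sum(1 for r in results if r.status == "ok" and r.passed)
    total_cost = sum(r.cost_usd for r in results)
    return {
        "tasks": n,
        "success_rate": successes / n,
        "cost_per_task_usd": total_cost / n,
        # Failed runs still cost money, so cost per *successful* task is higher.
        "cost_per_success_usd": total_cost / successes if successes else None,
    }
```

Reporting cost per successful task alongside cost per task keeps failed runs from hiding in the average, which matters for an agent that spends money on every attempt.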

This is portfolio-grade. Build it free.

Sign up to unlock every milestone step-by-step, the code skeletons, full reference solutions, and checkable tasks, with your progress saved as you build.
