Building AI Agents: From One LLM Call to a Reasoning Loop
A single LLM call answers once. An agent reasons, calls tools, observes the result, and loops until the job is done. Here's how the ReAct loop works, how to build a minimal one, and how to keep it from burning your budget.
You wire up an LLM, send a prompt, get a paragraph back. It's impressive, until the task needs more than one shot. "What's our current AWS bill, and which service jumped this month?" A single call can't answer that. It has no live data, it can't run a query, and it can't check its own math. So it guesses, confidently, and you ship a wrong number.
The fix isn't a smarter prompt. It's a different shape. Instead of one call that must know everything, you give the model tools and let it loop: think about what it needs, call a tool to get it, look at the result, and decide whether it's done. That loop is what people mean by an agent.
Who this is for
Engineers who've made a few LLM calls and now want the model to *do* things (query a database, hit an API, read a file) and self-correct along the way. You should be comfortable with [structured output and tool calling](/blog/structured-output-and-tool-calling) and have a rough feel for [LLM cost and latency](/blog/llm-cost-and-latency-optimization). No agent framework required; we build the loop by hand so you can see every moving part.
What an agent actually is
An agent is an LLM running in a loop: it reasons about a goal, picks a tool, executes it, observes the result, and repeats, until it decides the goal is met or it hits a limit you set.
The pattern has a name: ReAct (*reason → act → observe → repeat*). The model reasons in plain language about what to do next, acts by emitting a tool call, then observes the tool's output, which gets fed back into the next turn. Nothing magical: it's a while loop where the LLM is the decision-maker and tools are the hands.
| The intern | The agent |
| --- | --- |
| You hand an intern a goal, not step-by-step instructions | You give the agent a task and a system prompt, not a fixed script |
| The intern looks things up, runs the report, calls the API | The agent calls tools: search, SQL, HTTP, a calculator |
| They read the result and decide what to do next | Tool output is fed back; the model reasons about the next step |
| They stop when the task is done, or ask when stuck | The loop ends on a 'final answer' or a step / cost limit |
| A bad intern loops forever re-checking the same thing | An unguarded agent loops, re-calls tools, and burns tokens |
An agent is an intern who can use tools and check their own work.
The agent loop, drawn out
Every agent, no matter the framework, is this cycle. The model never touches the outside world directly; it asks for a tool, your code runs the tool, and the result comes back as the next observation.
The ReAct loop: reason, act, observe, repeat, with a guard that forces an exit.
1. **Start with a goal.** The user's task plus a system prompt describing the role and the tools available. This is the only fixed input.
2. **Reason.** The model thinks about what it needs next, in plain language. "To find the cost spike I need this month's bill broken down by service."
3. **Choose a tool.** It emits a structured tool call: a name (`query_billing`) and arguments (`{ month: "2026-06" }`). This is just structured output.
4. **Execute.** Your code, not the model, runs the tool, hits the API or DB, and captures the result. The model is sandboxed; it only asks.
5. **Observe.** The tool's output is appended to the conversation as an observation and handed back to the model for the next turn.
6. **Done?** The model decides: is the goal met? If yes, it returns a final answer. If no, the loop repeats, bounded by a max-step guard so it can never run forever.
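The six steps can be traced without touching any API. The sketch below swaps the LLM for a hypothetical `scripted_model` function that returns canned decisions. `scripted_model`, `fake_query_billing`, and the dictionary shapes are assumptions made for illustration, not part of any real SDK.

```python
from typing import Any

MAX_STEPS = 6  # the guard from step 6: the loop can never exceed this


def fake_query_billing(month: str) -> dict[str, int]:
    """Hypothetical tool: returns canned spend per service."""
    return {"EC2": 4200, "S3": 180, "RDS": 2100, "NAT": 950}


TOOLS = {"query_billing": fake_query_billing}


def scripted_model(history: list[dict[str, Any]]) -> dict[str, Any]:
    """Stand-in for the LLM (an assumption, not a real API).

    Asks for the tool until an observation exists, then answers.
    """
    observations = [m for m in history if m["role"] == "observation"]
    if not observations:
        return {
            "thought": "I need this month's bill by service.",
            "tool": "query_billing",
            "args": {"month": "2026-06"},
        }
    spend = observations[-1]["content"]
    top = max(spend, key=spend.get)
    return {
        "thought": "I have the numbers.",
        "final": f"{top} is the largest line item at ${spend[top]}.",
    }


def run_loop(goal: str) -> tuple[str, list[dict[str, Any]]]:
    history = [{"role": "goal", "content": goal}]     # 1. start with a goal
    for _ in range(MAX_STEPS):
        decision = scripted_model(history)             # 2. reason
        if "final" in decision:                        # 6. done?
            return decision["final"], history
        tool = TOOLS[decision["tool"]]                 # 3. choose a tool
        result = tool(**decision["args"])              # 4. execute (your code)
        history.append({"role": "observation", "content": result})  # 5. observe
    return "Stopped: hit MAX_STEPS.", history


answer, trace = run_loop("Which service costs the most this month?")
print(answer)
```

The scripted model needs exactly one tool call before it answers, so the trace ends with two entries: the goal and one observation. A real model takes the same path through the same loop; only the decision function changes.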
One call vs. chain vs. agent: pick the smallest thing that works
An agent is the most powerful shape, and also the most expensive, slowest, and least predictable. Reach for it only when the *number of steps isn't known up front*. If you can hardcode the steps, don't make the model decide them.
| Shape | How it works | Use when | Cost & risk |
| --- | --- | --- | --- |
| Single LLM call | One prompt, one response. No tools, no loop. | The model already knows the answer or it's pure text-in / text-out (summarize, classify, rewrite). | Cheapest, fastest, most predictable. Bounded by one call. |
| Chain | A fixed sequence of calls / steps you wire yourself (e.g. retrieve → answer). | You know the steps in advance and the order never changes. RAG is a chain. | Predictable cost (N steps). You control the flow, not the model. |
| Agent | The model decides which tools to call and how many times, in a loop. | The path depends on intermediate results: unknown number of steps, branching, self-correction. | Variable cost, can loop, hardest to debug. Needs guardrails. |
Match the shape to the task. Default to the simplest one that fits.
The honest default
Most "agent" projects are really a chain in disguise. If you can draw the steps on a whiteboard and they never branch, build a chain: it's cheaper, faster, and you can actually test it. Save the agent for the genuinely open-ended cases.
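For contrast with the agent loop, here is roughly what a chain looks like: three hardcoded steps, in an order the model never gets to change. This is a sketch under assumptions. `retrieve`, `build_prompt`, `fake_llm`, and the `DOCS` data are hypothetical stand-ins, and the word-overlap retriever is a toy rather than a real search.

```python
DOCS = {
    "nat": "NAT gateway pricing is per hour and per GB processed",
    "s3": "S3 storage classes and lifecycle rules",
}


def retrieve(question: str, docs: dict[str, str]) -> str:
    """Step 1 (toy retriever): pick the doc sharing the most words with the question."""
    words = set(question.lower().split())
    return max(docs.values(), key=lambda text: len(words & set(text.lower().split())))


def build_prompt(question: str, context: str) -> str:
    """Step 2: a fixed prompt template."""
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."


def fake_llm(prompt: str) -> str:
    """Step 3: stand-in for a single LLM call (an assumption for illustration)."""
    return f"[answer based on {len(prompt)} prompt characters]"


def chain(question: str, docs: dict[str, str]) -> str:
    # The order is hardcoded: retrieve -> prompt -> answer.
    # The model never chooses a step, so cost is always exactly one LLM call.
    context = retrieve(question, docs)
    prompt = build_prompt(question, context)
    return fake_llm(prompt)


print(chain("How is NAT gateway priced?", DOCS))
```

Every function here can be unit-tested on its own, which is the practical payoff of keeping control flow in your code instead of in the model.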
A minimal agent loop you can read
Here's the whole idea in ~50 lines: one tool, the loop, and a hard step cap. No framework. The model gets a tool definition, and on each turn it either calls the tool or returns a final answer. The `MAX_STEPS` guard is the single most important line: it's the difference between a bounded agent and a runaway bill.
agent.py

```python
import anthropic

client = anthropic.Anthropic()
MAX_STEPS = 6  # hard guard: the loop can NEVER run longer than this

# One tool the model is allowed to use.
TOOLS = [{
    "name": "query_billing",
    "description": "Return this month's cloud spend, broken down by service.",
    "input_schema": {
        "type": "object",
        "properties": {"month": {"type": "string", "description": "YYYY-MM"}},
        "required": ["month"],
    },
}]


def run_tool(name, args):
    # YOUR code runs the tool, the model only asked for it.
    if name == "query_billing":
        return {"EC2": 4200, "S3": 180, "RDS": 2100, "NAT": 950}
    raise ValueError(f"unknown tool: {name}")


def agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    for step in range(MAX_STEPS):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system="You are a FinOps assistant. Use tools to get real numbers; never guess.",
            tools=TOOLS,
            messages=messages,
        )
        # No tool call -> the model is done. Return its answer.
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        # Otherwise: execute every requested tool, feed results back, loop.
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                out = run_tool(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(out),
                })
        messages.append({"role": "user", "content": results})
    # Hit the cap without finishing, fail loud, don't loop forever.
    return "Stopped: hit MAX_STEPS without a final answer."


print(agent("Which service drove our cost up this month, and by how much?"))
```
Read the control flow, not the API. The model reasons, asks for `query_billing`, your `run_tool` executes it, the result goes back, and the model reasons again, this time with real numbers, and returns the answer. The `for step in range(MAX_STEPS)` line is the loop *and* the guard in one. Everything an agent framework adds is built on exactly this skeleton.
Planning, memory, and context
Three things separate a toy loop from something that survives real tasks.
Planning
For multi-step tasks, ask the model to lay out a plan *first*, then execute step by step. A plan in the context window keeps the agent on rails and gives it something to check progress against, instead of re-deciding the whole approach on every turn (which is how agents drift and loop).
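One way to keep a plan in the context window is to parse the model's numbered plan once, then re-inject it as a checklist on each turn. The sketch below assumes the model writes its plan as lines like `1. Fetch the bill`. The `PLAN_PROMPT` wording, the `Plan` class, and that line format are all hypothetical choices, not an established convention.

```python
from dataclasses import dataclass, field

# Hypothetical instruction to append to the system prompt.
PLAN_PROMPT = (
    "Before calling any tool, write a numbered plan of at most 5 steps. "
    "Then execute one step per turn and say which step you are on."
)


@dataclass
class Plan:
    steps: list[str]
    done: set[int] = field(default_factory=set)

    @classmethod
    def parse(cls, text: str) -> "Plan":
        """Collect lines shaped like '1. Do something'; ignore everything else."""
        steps = []
        for line in text.splitlines():
            head, sep, rest = line.strip().partition(".")
            if sep and head.isdigit() and rest.strip():
                steps.append(rest.strip())
        return cls(steps)

    def mark_done(self, index: int) -> None:
        """Mark a zero-based step index as complete."""
        if not 0 <= index < len(self.steps):
            raise IndexError(f"no step {index}")
        self.done.add(index)

    def render(self) -> str:
        """Checklist text to re-inject into the context each turn."""
        return "\n".join(
            f"[{'x' if i in self.done else ' '}] {i + 1}. {step}"
            for i, step in enumerate(self.steps)
        )


plan = Plan.parse(
    "Here is my plan.\n"
    "1. Fetch this month's bill\n"
    "2. Fetch last month's bill\n"
    "3. Compare by service"
)
plan.mark_done(0)
print(plan.render())
```

The rendered checklist is a few dozen tokens, so re-sending it every turn is cheap compared with letting the model re-derive its approach from the raw history.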
Memory and context
The model has no memory between turns except what's in the message list. Every tool result you append is context the model now "remembers", and context isn't free. Each turn re-sends the entire growing history, so a 10-step agent pays for its earliest messages up to 10 times over. The practical levers: summarize old tool results once they're no longer needed verbatim, trim giant payloads before appending them, and keep the system prompt tight. This is the same cost math from [LLM cost and latency optimization](/blog/llm-cost-and-latency-optimization), just amplified by the loop.
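Two of those levers are easy to sketch: clipping a payload before it enters the history, and replacing older tool results with a short stub. The code below assumes messages are plain dicts in the same shape as the `tool_result` blocks used in this article. The limits and the helper names (`clip`, `compact_history`) are illustrative assumptions, and a production version would likely substitute a real summary for the stub.

```python
import copy

MAX_RESULT_CHARS = 500  # assumption: cap for any single tool payload
KEEP_RECENT = 2         # assumption: newest tool-result messages kept verbatim
STUB = "[older tool result omitted to save tokens]"


def clip(text: str, limit: int = MAX_RESULT_CHARS) -> str:
    """Truncate a giant payload before it is appended to the history."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [truncated {len(text) - limit} chars]"


def is_tool_result_message(message: dict) -> bool:
    """True if the message content is a list containing a tool_result block."""
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(block, dict) and block.get("type") == "tool_result"
        for block in content
    )


def compact_history(messages: list[dict], keep_recent: int = KEEP_RECENT) -> list[dict]:
    """Return a copy in which all but the newest tool results are stubbed out."""
    compacted = copy.deepcopy(messages)  # never mutate the caller's history
    tool_msgs = [m for m in compacted if is_tool_result_message(m)]
    cutoff = max(len(tool_msgs) - keep_recent, 0)
    for message in tool_msgs[:cutoff]:
        for block in message["content"]:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                block["content"] = STUB
    return compacted
```

The `tool_use_id` on each block is left untouched, so every earlier tool call still has a matching result; only the payload shrinks.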
Stopping conditions
Natural stop: the model returns a final answer with no tool call (`stop_reason != "tool_use"`). The good case.
Step cap: `MAX_STEPS` hit. Non-negotiable; it's your circuit breaker.
Budget cap: track cumulative tokens / dollars and bail when you cross a threshold, independent of step count.
Wall-clock timeout: for user-facing agents, cap total latency so a slow tool can't hang the request forever.
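The three enforced limits (step cap, budget cap, wall-clock timeout) can live in one small object that the loop consults before every model call. This is a minimal sketch: the `LoopGuard` class, its default limits, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class LoopLimitExceeded(Exception):
    """Raised when any stopping condition trips."""


@dataclass
class LoopGuard:
    max_steps: int = 6
    max_tokens: int = 50_000   # illustrative budget, in total tokens
    max_seconds: float = 30.0  # illustrative wall-clock limit
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    steps: int = field(default=0, init=False)
    tokens: int = field(default=0, init=False)
    started: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def check(self) -> None:
        """Call BEFORE each model call; raises if a limit is already spent."""
        if self.steps >= self.max_steps:
            raise LoopLimitExceeded(f"step cap reached ({self.max_steps})")
        if self.tokens >= self.max_tokens:
            raise LoopLimitExceeded(f"token budget spent ({self.tokens})")
        if self.clock() - self.started >= self.max_seconds:
            raise LoopLimitExceeded(f"timeout after {self.max_seconds}s")

    def record(self, tokens_used: int) -> None:
        """Call AFTER each model call with that turn's token count."""
        self.steps += 1
        self.tokens += tokens_used
```

In the loop from this article, `guard.check()` would go at the top of each iteration and `guard.record(...)` right after `client.messages.create(...)`, with the count taken from `resp.usage.input_tokens + resp.usage.output_tokens`. Note that the timeout is only checked between turns, so it does not interrupt a tool that is already hanging; that needs a timeout on the tool call itself.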
Guardrails: tools are the attack surface
The model decides *what* to call; your code decides *whether to allow it*. Never trust tool arguments blindly: the model can hallucinate a `DROP TABLE` just as easily as a `SELECT`. Treat every tool call as untrusted input.
Validate arguments against the schema and your own rules before executing; reject anything out of bounds.
Scope tool permissions: give a support agent read-only DB access, not write. The blast radius of a bad call is whatever the tool can do.
Make destructive tools confirm: require a human approval step for anything that deletes, pays, or emails.
Return errors as observations, not exceptions: when a tool fails, feed the error back so the model can recover instead of crashing the loop.
Log every reason → tool → result triple: when an agent does something weird, this trace is the only way you'll understand why.
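Most of these rules fit in a single wrapper around tool execution. The sketch below is one possible shape, not a definitive implementation: `guarded_call`, the `REGISTRY` layout, the validators, and the `delete_bucket` tool are all hypothetical, and the approval flag stands in for whatever human-review step you actually use.

```python
import json
import logging
import re
from typing import Any

log = logging.getLogger("agent.tools")


def query_billing(month: str) -> dict[str, int]:
    """Hypothetical read-only tool with canned data."""
    return {"EC2": 4200, "S3": 180, "RDS": 2100, "NAT": 950}


def delete_bucket(bucket: str) -> str:
    """Hypothetical destructive tool; a real one would call a cloud API."""
    return f"deleted {bucket}"


def check_month(args: dict[str, Any]) -> None:
    month = args.get("month")
    ok = set(args) == {"month"} and isinstance(month, str)
    if not ok or not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", month):
        raise ValueError("expected exactly one argument: month as YYYY-MM")


def check_bucket(args: dict[str, Any]) -> None:
    if set(args) != {"bucket"} or not isinstance(args["bucket"], str):
        raise ValueError("expected exactly one argument: bucket (string)")


# name -> (function, validator, needs human approval)
REGISTRY = {
    "query_billing": (query_billing, check_month, False),
    "delete_bucket": (delete_bucket, check_bucket, True),
}


def guarded_call(name: str, args: Any, approved: bool = False, reason: str = "") -> str:
    """Validate, gate, execute, and log one tool call.

    Never raises: failures come back as an observation string,
    so the model can read the error and try something else.
    """
    try:
        if name not in REGISTRY:
            raise ValueError(f"unknown tool {name!r}")
        func, validate, destructive = REGISTRY[name]
        validate(args)
        if destructive and not approved:
            raise PermissionError(f"{name} needs human approval")
        observation = json.dumps(func(**args))
    except Exception as exc:  # error-as-observation
        observation = f"ERROR: {type(exc).__name__}: {exc}"
    log.info("reason=%r tool=%s args=%s result=%s", reason, name, args, observation)
    return observation
```

A call with a malformed month, an invented tool name, or an unapproved destructive tool all come back as strings starting with `ERROR:`, which can be appended to the conversation like any other tool result.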
Failure modes that bite in production
Agents fail in ways single calls never do, because the loop compounds mistakes. The five below cause most of the pain.
Infinite / stuck loops. The model keeps re-calling the same tool with the same args, never converging, often because the tool result didn't actually answer its question. Without a step cap, this runs until you notice the bill.
No step cap. The single most common omission. One off-by-one in the model's reasoning and you've got an unbounded `while True`. Always set `MAX_STEPS` and fail loud when you hit it.
Cost blowups. Every turn re-sends the full history, so cost grows roughly quadratically with step count. A chatty agent that takes 12 turns can cost 20× a single call. Track cumulative tokens, summarize old context, and cap the budget.
Bad tool use. The model calls the wrong tool, passes malformed args, or invents a tool that doesn't exist. Clear tool descriptions, strict schemas, and good error-as-observation handling fix most of it, but expect it and test for it.
Over-agenting. Using an agent where a chain or a single call would do. You inherit all the failure modes above for a task whose steps you already knew. The cure is upstream: pick the right shape.
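Two of these failures are easy to put numbers and code against. The sketch below first totals the input tokens billed across a loop, assuming every turn re-sends a fixed base prompt plus a constant amount of new history, then detects a stuck loop by checking whether the last few tool calls are identical. The token figures, the constant-growth model, and the window size are illustrative assumptions, not measurements.

```python
import json


def cumulative_input_tokens(turns: int, base: int, per_turn: int) -> int:
    """Total input tokens when turn i (0-based) sends base + i * per_turn.

    Closed form: turns * base + per_turn * turns * (turns - 1) / 2,
    which is where the roughly quadratic growth comes from.
    """
    return sum(base + i * per_turn for i in range(turns))


def is_stuck(calls: list[tuple[str, dict]], window: int = 3) -> bool:
    """True if the last `window` tool calls share the same name and arguments."""
    if len(calls) < window:
        return False
    recent = {
        (name, json.dumps(args, sort_keys=True))
        for name, args in calls[-window:]
    }
    return len(recent) == 1


single = cumulative_input_tokens(turns=1, base=1000, per_turn=300)
looped = cumulative_input_tokens(turns=12, base=1000, per_turn=300)
print(single, looped, looped / single)
```

With these made-up numbers, 12 turns bill 31,800 input tokens against 1,000 for a single call. For the stuck-loop check, keep a list of `(name, args)` pairs as the loop runs and bail out, or feed a corrective observation back to the model, once `is_stuck` returns true.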
Takeaways
The whole article in seven lines
An agent is an LLM in a loop: **reason → act → observe → repeat** (ReAct) until done.
The model only *asks* for tools; **your code executes them** and feeds results back as observations.
Pick the smallest shape: **single call** if it knows the answer, **chain** if the steps are fixed, **agent** only when the path is unknown.
A **max-step guard** is mandatory: it's the line between a bounded agent and a runaway bill.
Context isn't free: every turn re-sends the history, so **summarize and trim** as the loop grows.
Tools are the attack surface: **validate args, scope permissions, confirm destructive actions.**
The classic failures are **infinite loops, no step cap, cost blowups, bad tool use, and over-agenting**; design against all five.
Where to go next
Build the minimal loop above with one real tool against an API you control, then add a step cap and a token counter before you add a second tool. You'll learn more from one hand-built agent than from any framework tutorial.
Tighten the foundation: [Structured Output & Tool Calling](/blog/structured-output-and-tool-calling). Tool calls are just structured output, so reliability there is reliability here.
Control the loop's economics: [LLM Cost & Latency Optimization](/blog/llm-cost-and-latency-optimization). The techniques matter most once every turn re-sends the context.
See where agents fit in the broader role: the AI Engineer career path goes from prompting to retrieval to agentic systems.