Classic RAG grabs the top-k chunks once and hopes they're enough. Agentic RAG lets the model decide what to fetch, reflect on what came back, and re-query; GraphRAG walks a knowledge graph for multi-hop questions. Here's when each beats classic RAG, and what they cost.
Your support bot has indexed every internal doc. A user asks: "Which of our enterprise customers are affected by the auth library CVE we patched last quarter, and who owns those accounts?" You watch the trace. The retriever pulls the top 5 chunks for that query, the model reads them, and answers: "I don't have information about which customers are affected."
It's not wrong to give up: the answer doesn't live in any single chunk. To get it you must connect the CVE to the patched library, the library to the services that depend on it, the services to the customers running them, and the customers to their account owners. That's a multi-hop question: four joins across four different documents. Classic RAG does one retrieval, with one query, against one similarity index. It never had a chance.
Two patterns fix this from different angles. Agentic RAG lets the model run retrieval in a loop: search, look at what came back, decide it needs more, and search again with a sharper query. GraphRAG indexes your knowledge as entities and relationships, so a multi-hop question becomes a graph traversal instead of a prayer over cosine similarity.
Who this is for
Engineers who have already shipped a classic RAG pipeline (embed, top-k, stuff the prompt) and keep hitting questions it can't answer. You know what an embedding and a vector store are. You want to know when to reach for agentic loops or graphs, and when that's overkill. If you're brand new to RAG, start with the sibling article on [Advanced RAG](/blog/advanced-rag-reranking-hybrid-search).
The principle: retrieval is a process, not a lookup
Classic RAG treats retrieval as a single lookup. Agentic and graph retrieval treat it as a process, one where the system can follow a trail, notice it's incomplete, and go back for more.
Picture two people answering a hard research question. The first one googles it, grabs the very first result, and writes their answer from that single page. If the page is incomplete, so is the answer; they never look again. That's classic RAG: one query, top-k results, done.
The second person is a real researcher. They read a paper, notice it cites an earlier study for a key claim, follow that citation, find it points at a dataset, look up the dataset, and only then write a grounded answer. They followed a trail across sources, and they knew when they had enough. Agentic RAG is the researcher who re-queries; GraphRAG is the researcher who follows citations, except the citations are pre-built edges in a graph.
| In the analogy | In RAG |
|---|---|
| Grabbing the first search result and writing from it | Classic RAG: one query, top-k chunks, single generation |
| A researcher who reads, spots a gap, and searches again | Agentic RAG: retrieve → reflect → re-query loop |
| Following a chain of citations from paper to source | GraphRAG: traversing entity-relationship edges (multi-hop) |
| Deciding "I have enough now" and stopping | A termination condition: confidence, budget, or max steps |
Same goal (a complete, grounded answer), three ways of getting there.
The picture: the agentic loop (with a graph branch)
The agentic loop reflects and re-queries until it has enough; one of its retrieval tools can be a graph traversal for multi-hop questions.
1. Plan. The agent reads the question and decides what to look for first, and which tool fits. A factual lookup goes to the vector store; a relationship question ("who depends on X") goes to the graph.
2. Retrieve. It runs the chosen tool. Vector search returns top-k chunks by similarity; graph traversal returns connected entities and the edges between them.
3. Reflect. The agent inspects what came back: "Does this actually answer the question, or did it surface a new entity I now need to look up?" This is the step classic RAG skips entirely.
4. Re-query or answer. If there's a gap, it loops back with a sharper, narrower query (now informed by what it just learned). If it has enough, or hits its step/budget limit, it writes the final grounded answer with citations.
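In the Plan step the model itself decides which tool to call. While prototyping, a rule-based router can serve as a cheap stand-in for that decision. The sketch below is a hypothetical keyword heuristic, not how any particular framework routes. `plan_tool` and the cue phrases in `RELATIONSHIP_CUES` are illustrative assumptions that you would tune against your own logged questions.

```python
from typing import Literal

Tool = Literal["vector", "graph"]

# Illustrative cue phrases for relationship questions (an assumption; tune
# them against your own logged queries). In the agentic loop the LLM makes
# this decision; this is only a cheap rule-based stand-in.
RELATIONSHIP_CUES = ("depends on", "who owns", "affected by", "connected to", "related to")


def plan_tool(question: str) -> Tool:
    """Choose the first retrieval tool: graph for relationships, vector otherwise."""
    q = question.lower()
    if any(cue in q for cue in RELATIONSHIP_CUES):
        return "graph"
    return "vector"


print(plan_tool("Who depends on auth-lib?"))                     # graph
print(plan_tool("What is the refund window for annual plans?"))  # vector
```

A keyword router fails silently on phrasings it has never seen, which is why the article's loop leaves planning to the model.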
Classic vs agentic vs GraphRAG, side by side
The three aren't competitors so much as points on a cost/capability curve. You move up the curve only when the question shape demands it.
| | Classic RAG | Agentic RAG | GraphRAG |
|---|---|---|---|
| Best at | Single-fact lookups | Open-ended, decomposable questions | Multi-hop relationship questions |
| Retrieval | One query, top-k, once | Many queries, model-driven loop | Traversal over entity edges |
| Latency | Low (1 LLM call) | High (N retrieve+reflect rounds) | Medium (build cost is up front) |
| Cost per query | $ (one generation) | $$$ (multiple LLM calls) | $$ (cheap queries, expensive indexing) |
| Build complexity | Low | Medium (loop, tools, stop rule) | High (extract + maintain graph) |
| Fails when | Answer spans >1 doc | Question is trivial (wasted spend) | Graph is stale or sparse |
Pick the cheapest pattern that can actually answer your question.
Default to classic
Most production questions are single-hop. Ship classic RAG first, log the questions it fails, and only then decide whether the failures cluster around decomposition (reach for agentic) or relationships (reach for a graph). Don't pay for a loop you don't need.
A code sketch: the retrieve-reflect loop
Agentic RAG is less a framework than a control flow. Strip away the libraries and it's a bounded loop with three moving parts: a step budget, a reflection that returns a structured decision, and a re-query that feeds the gap back in. Here's the skeleton in Python.
```python
# agentic_rag.py
from dataclasses import dataclass

# retrieve(), generate(), llm_json() and parse_reflection() are your own
# helpers (vector/graph search, the final LLM call, a JSON-mode LLM call,
# and a validator for its output); they are not shown here.

MAX_STEPS = 5  # the runaway-loop guardrail


@dataclass
class Reflection:
    enough: bool      # do we have what we need?
    next_query: str   # if not, what to search for next


def agentic_answer(question: str) -> str:
    context: list[str] = []
    query = question
    for _ in range(MAX_STEPS):
        # 1. RETRIEVE: vector search for facts, graph traversal for relationships
        chunks = retrieve(query, context)
        context.extend(chunks)
        # 2. REFLECT: ask the model to judge its own context
        r: Reflection = reflect(question, context)
        if r.enough:
            break
        # 3. RE-QUERY: target the gap, sharpened by what we just learned
        query = r.next_query
    # 4. ANSWER: grounded in everything we gathered
    return generate(question, context)


def reflect(question: str, context: list[str]) -> Reflection:
    # The model returns STRUCTURED output, not prose, so the
    # loop can branch on it deterministically.
    # chr(10) is a newline: f-string expressions could not contain
    # backslashes before Python 3.12.
    prompt = (
        f"Question: {question}\n"
        f"Context so far:\n{chr(10).join(context)}\n\n"
        "Can you fully answer with this context? "
        'Reply as JSON: {"enough": bool, "next_query": str}'
    )
    return parse_reflection(llm_json(prompt))
```
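The skeleton leaves `parse_reflection` undefined. Here is one way to write it. It assumes `llm_json` returns either the raw JSON string or an already-decoded dict, since the skeleton doesn't say which. The parser validates the shape strictly and raises instead of guessing, so a malformed verdict surfaces as an error rather than silently steering the loop.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Reflection:
    enough: bool
    next_query: str


def parse_reflection(raw: Union[str, dict]) -> Reflection:
    """Validate the model's JSON verdict; raise ValueError on anything malformed."""
    # json.JSONDecodeError subclasses ValueError, so callers catch a single type.
    data = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    enough = data.get("enough")
    next_query = data.get("next_query", "")
    if not isinstance(enough, bool):
        raise ValueError(f"'enough' must be a boolean, got {enough!r}")
    if not isinstance(next_query, str):
        raise ValueError(f"'next_query' must be a string, got {next_query!r}")
    if not enough and not next_query.strip():
        # "Not enough" with an empty follow-up gives the loop nothing to search for.
        raise ValueError("insufficient context reported but no next_query given")
    return Reflection(enough=enough, next_query=next_query)


print(parse_reflection('{"enough": false, "next_query": "services that depend on auth-lib"}'))
# Reflection(enough=False, next_query='services that depend on auth-lib')
```

On a `ValueError` you can retry the reflection once or take the loop's "couldn't resolve" exit. Either way is safer than treating prose like "I think that's enough" as a decision.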
MAX_STEPS is not optional
Without a hard step cap, a confused agent will re-query forever, burning tokens and latency on a question it can't answer. Always pair the loop with a budget (steps, tokens, or wall-clock) and a graceful "I couldn't fully resolve this" fallback.
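One way to make the budget and the fallback concrete is the sketch below. It caps both steps and wall-clock time, returns an explicit `resolved` flag instead of answering from partial context, and records a per-step trace. The helpers are passed in as parameters so the loop can run with toy stand-ins. `bounded_loop`, `LoopResult`, the toy helpers, their tiny corpus, and the fallback wording are all hypothetical. This `retrieve` also takes only the query, which simplifies the skeleton above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Reflection:
    enough: bool
    next_query: str


@dataclass
class LoopResult:
    answer: str
    resolved: bool                          # False means a budget ran out first
    trace: list = field(default_factory=list)


FALLBACK = "I couldn't fully resolve this from the sources I found."


def bounded_loop(
    question: str,
    retrieve: Callable[[str], list],
    reflect: Callable[[str, list], Reflection],
    generate: Callable[[str, list], str],
    max_steps: int = 5,
    max_seconds: float = 30.0,
) -> LoopResult:
    context: list = []
    trace: list = []
    query = question
    deadline = time.monotonic() + max_seconds
    for step in range(max_steps):           # step budget
        if time.monotonic() > deadline:      # wall-clock budget
            break
        chunks = retrieve(query)
        context.extend(chunks)
        r = reflect(question, context)
        trace.append({"step": step, "query": query, "hits": len(chunks), "enough": r.enough})
        if r.enough:
            return LoopResult(generate(question, context), True, trace)
        query = r.next_query
    # Budget exhausted: admit it instead of looping forever.
    return LoopResult(FALLBACK, False, trace)


# Toy stand-ins (hypothetical) so the loop runs without an LLM.
CORPUS = {
    "Which customers are affected by CVE-2024-1234?": ["CVE-2024-1234 was patched in auth-lib."],
    "services that depend on auth-lib": ["billing-service depends on auth-lib."],
    "customers running billing-service": ["Acme Corp runs billing-service."],
}


def toy_retrieve(query):
    return CORPUS.get(query, [])


def toy_reflect(question, context):
    text = " ".join(context)
    if "Acme Corp" in text:
        return Reflection(True, "")
    if "billing-service" in text:
        return Reflection(False, "customers running billing-service")
    return Reflection(False, "services that depend on auth-lib")


def toy_generate(question, context):
    return " ".join(context)


QUESTION = "Which customers are affected by CVE-2024-1234?"
result = bounded_loop(QUESTION, toy_retrieve, toy_reflect, toy_generate)
print(result.resolved, len(result.trace))  # True 3
```

Returning the trace alongside the answer also covers the logging advice later in this article: every query and reflection decision is kept.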
GraphRAG: entities, relations, multi-hop
GraphRAG changes what you index. Instead of slicing documents into chunks and embedding them, you run an extraction pass that pulls out entities (CVE-2024-1234, the auth-lib package, the billing service, Acme Corp) and the relations between them (patched-by, depends-on, runs, owned-by). The result is a knowledge graph: nodes and labeled edges.
Now the multi-hop question from the opening becomes a traversal. Start at the CVE node, hop along patched-by to the library, along depends-on to the affected services, along runs to the customers, and along owned-by to the account managers. That's four hops, each a deterministic edge-walk, with no guessing which chunk happened to mention all four facts together (none did).
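Here is a minimal sketch of that traversal over in-memory triples. It assumes a tiny hand-written graph; the second service, the customer Globex, and the owners Dana and Raj are invented for illustration. Edges are stored in their natural direction (a service depends-on a library), so some hops walk an edge backwards, and each hop names both a relation and a direction. A real system would run the equivalent query in its graph store rather than in Python.

```python
from collections import defaultdict

# Hypothetical graph for the opening question: (subject, relation, object).
# search-service, Globex, Dana and Raj are invented for illustration.
TRIPLES = [
    ("CVE-2024-1234", "patched-by", "auth-lib"),
    ("billing-service", "depends-on", "auth-lib"),
    ("search-service", "depends-on", "auth-lib"),
    ("Acme Corp", "runs", "billing-service"),
    ("Globex", "runs", "search-service"),
    ("Acme Corp", "owned-by", "Dana"),
    ("Globex", "owned-by", "Raj"),
]


def build_index(triples):
    """Index edges both ways so a hop can follow or reverse a relation."""
    out_edges = defaultdict(list)  # (subject, relation) -> [objects]
    in_edges = defaultdict(list)   # (object, relation) -> [subjects]
    for s, r, o in triples:
        out_edges[(s, r)].append(o)
        in_edges[(o, r)].append(s)
    return out_edges, in_edges


def traverse(start, path, triples):
    """Walk a fixed relation path hop by hop; return every complete chain."""
    out_edges, in_edges = build_index(triples)
    chains = [[start]]
    for relation, direction in path:
        edges = out_edges if direction == "out" else in_edges
        # Extend each chain by every neighbour; chains with no neighbour drop out.
        chains = [
            chain + [nxt]
            for chain in chains
            for nxt in edges.get((chain[-1], relation), [])
        ]
    return chains


# CVE -> library -> services -> customers -> account owners (four hops).
PATH = [("patched-by", "out"), ("depends-on", "in"), ("runs", "in"), ("owned-by", "out")]

for chain in traverse("CVE-2024-1234", PATH, TRIPLES):
    print(" -> ".join(chain))
# CVE-2024-1234 -> auth-lib -> billing-service -> Acme Corp -> Dana
# CVE-2024-1234 -> auth-lib -> search-service -> Globex -> Raj
```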
Extraction: an LLM (or an NER plus relation-extraction model) reads each document and emits (entity, relation, entity) triples. This is the expensive, error-prone step.
Storage: triples land in a graph store (a property graph such as Neo4j, or an RDF triple store) or in a graph layer over your existing database.
Retrieval: a question maps to a starting entity, then a traversal (or a generated graph query) walks the edges to gather a connected subgraph.
Generation: the subgraph, serialized to text, becomes the context the model answers from, with the full chain visible as citations.
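The two ends of that pipeline are easy to sketch. Assume the extraction prompt asks the model to emit one `subject | relation | object` line per fact; that format is chosen here for illustration only. Parsing should then drop malformed lines rather than trust them, and each fact should remember its source document so the serialized subgraph can carry citations. `Fact`, `parse_triples`, `serialize`, and the document name `security-advisory.md` are hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str
    source: str  # document the triple was extracted from, kept for citations


def parse_triples(llm_output: str, source: str) -> list[Fact]:
    """Parse 'subject | relation | object' lines, skipping malformed ones."""
    facts = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            facts.append(Fact(parts[0], parts[1], parts[2], source))
        # Anything else is extraction noise; dropping it beats inventing an edge.
    return facts


def serialize(facts: list[Fact]) -> str:
    """Render a retrieved subgraph as prompt context, one cited edge per line."""
    return "\n".join(
        f"{f.subject} --{f.relation}--> {f.obj} [source: {f.source}]" for f in facts
    )


raw = (
    "CVE-2024-1234 | patched-by | auth-lib\n"
    "The advisory also mentions several services.\n"
    "billing-service | depends-on | auth-lib"
)
facts = parse_triples(raw, "security-advisory.md")
print(serialize(facts))
# CVE-2024-1234 --patched-by--> auth-lib [source: security-advisory.md]
# billing-service --depends-on--> auth-lib [source: security-advisory.md]
```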
The payoff is reasoning over connections that no single chunk contains. The price is the graph itself: extraction is lossy, schemas drift, and the graph must be kept in sync with your source data or it quietly answers from a stale world. See Building AI Agents for how the agent-as-orchestrator pattern wraps both retrieval styles behind one planner.
Common mistakes that cost hours (or dollars)
Going agentic when classic suffices. A retrieve-reflect loop on a single-fact FAQ is 4x the cost and latency for an identical answer. Profile your real questions before adding loops; most are single-hop.
No loop guardrail. Without a step/token/time budget, a stuck agent re-queries indefinitely. Every loop needs a hard cap and a "couldn't resolve" exit.
Reflection that returns prose. If the model's self-assessment is free text, your loop can't branch on it reliably. Force structured output (JSON with enough + next_query) so control flow is deterministic.
Underestimating graph upkeep. A knowledge graph is a second source of truth that decays. Budget for re-extraction when source docs change, and monitor for orphaned or stale nodes: a graph that lies is worse than no graph.
Sparse-graph disappointment. GraphRAG only shines when entities are densely connected. If your docs are isolated facts with few relationships, the traversal finds nothing the vector store wouldn't, and you've paid the build cost for no multi-hop benefit.
Skipping the trace. Both patterns are loops or walks; you cannot debug them blind. Log every query, every reflection decision, and every hop so you can see exactly where an answer went wrong.
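For graph upkeep, one common approach is to store a content hash per source document at extraction time and compare against it on each sync. New or edited documents get re-extracted, and triples from deleted documents get dropped before they become orphans. Below is a sketch using SHA-256 from the standard library. The document ids are hypothetical, and what you do with the returned sets is left to your pipeline.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a document's text so edits are detectable."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_reextract(current_docs: dict, indexed_hashes: dict):
    """Compare today's docs with the hashes recorded at extraction time.

    Returns (changed, deleted): ids to re-extract, and ids whose triples
    should be removed from the graph before they become orphans.
    """
    changed = {
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)  # new or edited
    }
    deleted = set(indexed_hashes) - set(current_docs)
    return changed, deleted


indexed = {
    "advisory.md": content_hash("CVE-2024-1234 was patched in auth-lib."),
    "old-runbook.md": content_hash("Restart the auth proxy nightly."),
}
current = {
    "advisory.md": "CVE-2024-1234 was patched in auth-lib.",
    "services.md": "billing-service depends on auth-lib.",
}
changed, deleted = docs_to_reextract(current, indexed)
print(changed, deleted)  # {'services.md'} {'old-runbook.md'}
```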
Takeaways
The whole article in seven lines
Classic RAG = one query, top-k, once. It fails when the answer spans multiple documents.
Agentic RAG = the model loops: retrieve → reflect → re-query → answer, until it has enough.
GraphRAG = index entities and relations, then traverse edges to answer multi-hop questions.
Use agentic for open-ended, decomposable questions; use a graph for relationship/multi-hop questions.
Default to classic RAG: it's the cheapest pattern, and it answers most single-hop questions.
Every agentic loop needs a hard budget (steps/tokens/time) and structured reflection output.
A graph is a second source of truth: budget for extraction errors and keeping it in sync.
Where to go next
Get the retrieval fundamentals solid before you add loops or graphs on top: re-ranking and hybrid search often fix "bad retrieval" more cheaply than going agentic. Then layer in agentic orchestration once you've confirmed your failures are multi-hop, not just low-recall.