AI Engineering · 13 min read · Jun 2026

Agentic RAG & GraphRAG: Retrieval Beyond One-Shot Top-K

Classic RAG grabs the top-k chunks once and hopes they're enough. Agentic RAG lets the model decide what to fetch, reflect, and re-query; GraphRAG walks a knowledge graph for multi-hop questions. Here's when each beats classic RAG, and what they cost.

AI · RAG · GraphRAG · Agents
Sri Balaji

Founder · TheSimplifiedTech

The question classic RAG quietly fails

Your support bot has indexed every internal doc. A user asks: "Which of our enterprise customers are affected by the auth library CVE we patched last quarter, and who owns those accounts?" You watch the trace. The retriever pulls the top 5 chunks for that query, the model reads them, and answers: "I don't have information about which customers are affected."

It's not wrong to give up: the answer doesn't live in any single chunk. To get it you must connect the CVE to the patched library, the library to the services that depend on it, the services to the customers running them, and the customers to their account owners. That's a multi-hop question: four joins across four different documents. Classic RAG does one retrieval, with one query, against one similarity index. It never had a chance.

Two patterns fix this from different angles. Agentic RAG lets the model run retrieval in a loop: search, look at what came back, decide it needs more, and search again with a sharper query. GraphRAG indexes your knowledge as entities and relationships, so a multi-hop question becomes a graph traversal instead of a prayer over cosine similarity.

Who this is for

Engineers who have already shipped a classic RAG pipeline (embed, top-k, stuff the prompt) and keep hitting questions it can't answer. You know what an embedding and a vector store are. You want to know when to reach for agentic loops or graphs, and when that's overkill. If you're brand new to RAG, start with the sibling article on [Advanced RAG](/blog/advanced-rag-reranking-hybrid-search) first.

The principle: retrieval is a process, not a lookup

Classic RAG treats retrieval as a single lookup. Agentic and graph retrieval treat it as a process, one where the system can follow a trail, notice it's incomplete, and go back for more.
The mental shift behind both patterns

Picture two people answering a hard research question. The first one googles it, grabs the very first result, and writes their answer from that single page. If the page is incomplete, the answer is incomplete, because they never look again. That's classic RAG: one query, top-k results, done.

The second person is a real researcher. They read a paper, notice it cites an earlier study for a key claim, follow that citation, find it points at a dataset, look up the dataset, and only then write a grounded answer. They followed a trail across sources, and they knew when they had enough. Agentic RAG is the researcher who re-queries; GraphRAG is the researcher who follows citations, except the citations are pre-built edges in a graph.

| Analogy | Retrieval pattern |
| --- | --- |
| Grabbing the first search result and writing from it | Classic RAG: one query, top-k chunks, single generation |
| A researcher who reads, spots a gap, and searches again | Agentic RAG: retrieve → reflect → re-query loop |
| Following a chain of citations from paper to source | GraphRAG: traversing entity-relationship edges (multi-hop) |
| Deciding "I have enough now" and stopping | A termination condition: confidence, budget, or max steps |

Same goal (a complete, grounded answer), three ways of getting there.

The picture: the agentic loop (with a graph branch)

[Figure: User question (multi-hop) → Agent / planner (LLM decides next action) → Retrieve (vector + graph tools: the vector store returns top-k chunks, the knowledge graph does multi-hop traversal) → Reflect (enough? gaps?) → either "re-query (gap found)" back to the planner, or "enough" → Answer (grounded + cited).]

The agentic loop reflects and re-queries until it has enough; one of its retrieval tools can be a graph traversal for multi-hop questions.

  1. Plan

    The agent reads the question and decides what to look for first, and which tool fits. A factual lookup goes to the vector store; a relationship question ("who depends on X") goes to the graph.

  2. Retrieve

    It runs the chosen tool. Vector search returns top-k chunks by similarity; graph traversal returns connected entities and the edges between them.

  3. Reflect

    The agent inspects what came back. Does this actually answer the question, or did it surface a new entity I now need to look up? This is the step classic RAG skips entirely.

  4. Re-query or answer

    If there's a gap, it loops back with a sharper, narrower query (now informed by what it just learned). If it has enough, or hits its step/budget limit, it writes the final grounded answer with citations.
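The Plan step above can be sketched as a tiny router. In a real system the LLM makes this choice; the keyword heuristic here is a stand-in assumption so the example runs without a model, and `choose_tool` and `RELATION_CUES` are hypothetical names, not part of any library.

```python
from typing import Literal

Tool = Literal["vector", "graph"]

# Hypothetical cue phrases that hint at a relationship question.
# A production planner would ask the LLM instead of matching strings.
RELATION_CUES = ("depend", "who owns", "owned by", "affected by", "related to")


def choose_tool(question: str) -> Tool:
    """Route relationship-shaped questions to the graph, the rest to vectors."""
    q = question.lower()
    if any(cue in q for cue in RELATION_CUES):
        return "graph"
    return "vector"


print(choose_tool("Who depends on auth-lib?"))
print(choose_tool("What is the refund policy?"))
```

The useful part is the shape rather than the heuristic: the planner's output is a small, closed set of tool names, which keeps the Retrieve step easy to dispatch and easy to log.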

Classic vs agentic vs GraphRAG, side by side

The three aren't competitors so much as points on a cost/capability curve. You move up the curve only when the question shape demands it.

| | Classic RAG | Agentic RAG | GraphRAG |
| --- | --- | --- | --- |
| Best at | Single-fact lookups | Open-ended, decomposable questions | Multi-hop relationship questions |
| Retrieval | One query, top-k, once | Many queries, model-driven loop | Traversal over entity edges |
| Latency | Low (1 LLM call) | High (N retrieve+reflect rounds) | Medium (build cost is up front) |
| Cost per query | $ (one generation) | $$$ (multiple LLM calls) | $$ (query cheap, indexing dear) |
| Build complexity | Low | Medium (loop, tools, stop rule) | High (extract + maintain graph) |
| Fails when | Answer spans >1 doc | Question is trivial (wasted spend) | Graph is stale or sparse |

Pick the cheapest pattern that can actually answer your question.

Default to classic

Most production questions are single-hop. Ship classic RAG first, log the questions it fails, and only then decide whether the failures cluster around decomposition (reach for agentic) or relationships (reach for a graph). Don't pay for a loop you don't need.
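One way to make "see where the failures cluster" concrete is to tally a failure log by question shape. This is a minimal sketch: the log entries, the three labels, and the `recommend` helper are all illustrative assumptions, and the labels are presumed to come from someone reviewing each failed trace by hand.

```python
from collections import Counter

# Hypothetical failure log: (question, label) pairs from manual review.
failures = [
    ("Compare our Q1 and Q2 incident causes", "decomposition"),
    ("Which customers run services that depend on auth-lib?", "relationship"),
    ("Who owns the accounts affected by the CVE?", "relationship"),
    ("What is the SSO session timeout?", "low_recall"),
]

# Which upgrade each failure shape points toward.
SUGGESTION = {
    "decomposition": "agentic loop",
    "relationship": "graph retrieval",
    "low_recall": "re-ranking or hybrid search",
}


def recommend(log: list[tuple[str, str]]) -> str:
    """Return the upgrade suggested by the most common failure label."""
    if not log:
        return "stay on classic RAG"
    label, _count = Counter(label for _, label in log).most_common(1)[0]
    return SUGGESTION.get(label, "review manually")


print(recommend(failures))
```

With this sample log, relationship failures dominate, so the tally points at a graph rather than a loop.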

A code sketch: the retrieve-reflect loop

Agentic RAG is less a framework than a control flow. Strip away the libraries and it's a bounded loop with three moving parts: a step budget, a reflection that returns a structured decision, and a re-query that feeds the gap back in. Here's the skeleton in Python.

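agentic_rag.py

```python
import json
from dataclasses import dataclass

MAX_STEPS = 5  # the runaway-loop guardrail


@dataclass
class Reflection:
    enough: bool        # do we have what we need?
    next_query: str     # if not, what to search for next


# Placeholders: wire these three to your own retriever and LLM client.
def retrieve(query: str, context: list[str]) -> list[str]:
    raise NotImplementedError("vector search and/or graph traversal")


def llm_json(prompt: str) -> str:
    raise NotImplementedError("LLM call that returns a JSON string")


def generate(question: str, context: list[str]) -> str:
    raise NotImplementedError("final grounded generation")


def parse_reflection(raw: str) -> Reflection:
    # Turn the model's JSON reply into a typed decision.
    data = json.loads(raw)
    return Reflection(enough=bool(data["enough"]),
                      next_query=str(data.get("next_query", "")))


def agentic_answer(question: str) -> str:
    context: list[str] = []
    query = question

    for step in range(MAX_STEPS):
        # 1. RETRIEVE: vector for facts, graph for relationships
        chunks = retrieve(query, context)
        context.extend(chunks)

        # 2. REFLECT: ask the model to judge its own context
        r: Reflection = reflect(question, context)
        if r.enough:
            break

        # 3. RE-QUERY: the gap, sharpened by what we just learned
        query = r.next_query

    # 4. ANSWER: grounded in everything we gathered
    return generate(question, context)


def reflect(question: str, context: list[str]) -> Reflection:
    # The model returns STRUCTURED output, not prose, so the
    # loop can branch on it deterministically.
    prompt = (
        f"Question: {question}\n"
        f"Context so far:\n{chr(10).join(context)}\n\n"
        "Can you fully answer with this context? "
        "Reply as JSON: {\"enough\": bool, \"next_query\": str}"
    )
    return parse_reflection(llm_json(prompt))
```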

MAX_STEPS is not optional

Without a hard step cap, a confused agent will re-query forever, burning tokens and latency on a question it can't answer. Always pair the loop with a budget (steps, tokens, or wall-clock) and a graceful "I couldn't fully resolve this" fallback.
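As a rough sketch of that pairing, the helper below enforces both a step cap and a wall-clock deadline, and returns a fallback message when either runs out. `run_with_budget`, `FALLBACK`, and the `step` callable are hypothetical names introduced for illustration; `step(i)` is assumed to perform one retrieve-reflect round and return the final answer, or `None` if a gap remains.

```python
import time
from typing import Callable, Optional

FALLBACK = "I couldn't fully resolve this with the sources available."


def run_with_budget(
    step: Callable[[int], Optional[str]],
    max_steps: int = 5,
    max_seconds: float = 10.0,
) -> str:
    """Call `step` until it returns an answer or a budget runs out."""
    deadline = time.monotonic() + max_seconds
    for i in range(max_steps):
        if time.monotonic() >= deadline:
            break  # wall-clock budget spent
        answer = step(i)
        if answer is not None:
            return answer
    # Either budget was exhausted: exit gracefully instead of looping on.
    return FALLBACK


# A step that never resolves hits the step cap and falls back.
print(run_with_budget(lambda i: None, max_steps=3))
```

The deadline is checked between rounds, so a single slow round can still overshoot it; a stricter version would also pass a timeout into the retrieval and LLM calls themselves.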

GraphRAG: entities, relations, multi-hop

GraphRAG changes what you index. Instead of slicing documents into chunks and embedding them, you run an extraction pass that pulls out entities (CVE-2024-1234, the auth-lib package, the billing service, Acme Corp) and the relations between them (patched-by, depends-on, runs, owned-by). The result is a knowledge graph: nodes and labeled edges.

Now the multi-hop question from the opening becomes a traversal. Start at the CVE node, hop along patched-by to the library, along depends-on to the affected services, along runs to the customers, and along owned-by to the account managers. That's four hops, each one a deterministic edge-walk, with no guessing which chunk happened to mention all four facts together (none did).
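Here is a minimal sketch of that four-hop walk over an in-memory dictionary. The graph contents are made up for illustration (the service, customer, and owner names beyond those mentioned above are assumptions), and edges are stored in the direction the walk follows them; a real deployment would query a graph store instead.

```python
# Toy graph: (source node, relation) -> target nodes. Illustrative data only.
EDGES: dict[tuple[str, str], list[str]] = {
    ("CVE-2024-1234", "patched-by"): ["auth-lib"],
    ("auth-lib", "depends-on"): ["billing-service", "login-service"],
    ("billing-service", "runs"): ["Acme Corp"],
    ("login-service", "runs"): ["Acme Corp", "Globex"],
    ("Acme Corp", "owned-by"): ["Dana"],
    ("Globex", "owned-by"): ["Lee"],
}


def walk(start: str, path: list[str]) -> list[list[str]]:
    """Follow `path` one relation at a time; return every complete node chain."""
    chains = [[start]]
    for relation in path:
        # Extend each partial chain along every matching edge;
        # chains with no matching edge are dropped.
        chains = [
            chain + [target]
            for chain in chains
            for target in EDGES.get((chain[-1], relation), [])
        ]
    return chains


chains = walk("CVE-2024-1234", ["patched-by", "depends-on", "runs", "owned-by"])
for chain in chains:
    print(" -> ".join(chain))
```

Each returned chain is a full trail from the CVE to an account owner, so the same structure that produces the answer also serves as its citation.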

  • Extraction: an LLM (or NER model) reads each document and emits (entity, relation, entity) triples. This is the expensive, error-prone step.
  • Storage: triples land in a graph store (Neo4j, a triple store) or a graph layer over your existing DB.
  • Retrieval: a question maps to a starting entity, then a traversal (or a generated graph query) walks the edges to gather a connected subgraph.
  • Generation: the subgraph, serialized to text, becomes the context the model answers from, with the full chain visible as citations.
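The last three stages can be sketched end to end in plain Python, taking the extraction output as given. Everything here is an assumption for illustration: the sample triples, the helper names (`build_index`, `subgraph`, `serialize`), and the arrow format of the serialized text. A dictionary stands in for the graph store.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]  # (entity, relation, entity)

# Hypothetical extraction output; in practice an LLM or NER pass emits these.
TRIPLES: list[Triple] = [
    ("CVE-2024-1234", "patched-by", "auth-lib"),
    ("auth-lib", "depends-on", "billing-service"),
    ("billing-service", "runs", "Acme Corp"),
    ("Acme Corp", "owned-by", "Dana"),
    ("payments-docs", "mentions", "invoice-format"),  # unrelated to the CVE
]


def build_index(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Storage: group triples by their source entity."""
    index: dict[str, list[Triple]] = defaultdict(list)
    for triple in triples:
        index[triple[0]].append(triple)
    return index


def subgraph(index: dict[str, list[Triple]], start: str, max_hops: int) -> list[Triple]:
    """Retrieval: breadth-first walk collecting edges within max_hops of start."""
    seen = {start}
    found: list[Triple] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop budget reached for this branch
        for triple in index.get(node, []):
            found.append(triple)
            if triple[2] not in seen:
                seen.add(triple[2])
                queue.append((triple[2], depth + 1))
    return found


def serialize(triples: list[Triple]) -> str:
    """Generation input: one readable line per edge, usable as a citation trail."""
    return "\n".join(f"{s} --{r}--> {o}" for s, r, o in triples)


context = serialize(subgraph(build_index(TRIPLES), "CVE-2024-1234", max_hops=4))
print(context)
```

The `max_hops` limit plays the same role for a traversal that a step budget plays for an agentic loop: it bounds how much of the graph one question can pull into the prompt.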

The payoff is reasoning over connections that no single chunk contains. The price is the graph itself: extraction is lossy, schemas drift, and the graph must be kept in sync with your source data or it quietly answers from a stale world. See Building AI Agents for how the agent-as-orchestrator pattern wraps both retrieval styles behind one planner.

Common mistakes that cost hours (or dollars)

  1. Going agentic when classic suffices. A retrieve-reflect loop on a single-fact FAQ can be 4x the cost and latency for an identical answer. Profile your real questions before adding loops; most are single-hop.
  2. No loop guardrail. Without a step/token/time budget, a stuck agent re-queries indefinitely. Every loop needs a hard cap and a "couldn't resolve" exit.
  3. Reflection that returns prose. If the model's self-assessment is free text, your loop can't branch on it reliably. Force structured output (JSON with enough + next_query) so control flow is deterministic.
  4. Underestimating graph upkeep. A knowledge graph is a second source of truth that decays. Budget for re-extraction when source docs change, and monitor for orphaned or stale nodes. A graph that lies is worse than no graph.
  5. Sparse-graph disappointment. GraphRAG only shines when entities are densely connected. If your docs are isolated facts with few relationships, the traversal finds nothing the vector store wouldn't, and you paid the build cost for no multi-hop benefit.
  6. Skipping the trace. Both patterns are loops or walks; you cannot debug them blind. Log every query, every reflection decision, and every hop so you can see exactly where an answer went wrong.
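Mistake 3 has a follow-on: even when you ask for JSON, the model will occasionally return something malformed. Below is a minimal sketch of a defensive parser for the reflection step. Treating bad output as "stop now" is an assumption made here so a broken reply cannot keep the loop alive; retrying the call is an equally valid choice.

```python
import json
from dataclasses import dataclass


@dataclass
class Reflection:
    enough: bool
    next_query: str


STOP = Reflection(enough=True, next_query="")


def parse_reflection(raw: str) -> Reflection:
    """Parse the model's JSON reply; fall back to STOP on anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return STOP  # prose or truncated JSON
    if not isinstance(data, dict) or not isinstance(data.get("enough"), bool):
        return STOP  # wrong shape or wrong type for the decision field
    next_query = data.get("next_query")
    has_query = isinstance(next_query, str) and bool(next_query.strip())
    if not data["enough"] and not has_query:
        # "Not enough" with no usable follow-up gives the loop nothing to do.
        return STOP
    return Reflection(enough=data["enough"], next_query=next_query if has_query else "")


print(parse_reflection('{"enough": false, "next_query": "auth-lib dependents"}'))
print(parse_reflection("Sure! I think we need more context."))
```

Whichever policy you pick, log the raw reply alongside the parsed decision so the trace shows when the fallback fired.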

Takeaways

The whole article in seven lines

  • Classic RAG = one query, top-k, once. It fails when the answer spans multiple documents.
  • Agentic RAG = the model loops: retrieve → reflect → re-query → answer, until it has enough.
  • GraphRAG = index entities and relations, then traverse edges to answer multi-hop questions.
  • Use agentic for open-ended, decomposable questions; use a graph for relationship/multi-hop questions.
  • Default to classic RAG: it's the cheapest pattern that answers most single-hop questions.
  • Every agentic loop needs a hard budget (steps/tokens/time) and structured reflection output.
  • A graph is a second source of truth: budget for extraction errors and keeping it in sync.

Where to go next

Get the retrieval fundamentals solid before you add loops or graphs on top: re-ranking and hybrid search often fix "bad retrieval" more cheaply than going agentic. Then layer in the agentic orchestration once you've confirmed your failures are multi-hop, not just low-recall.
