Distributed Tracing: Following a Request Across Services
One request, ten services, no idea which hop is slow. Distributed tracing stitches the whole journey into a single timeline so you can find the bottleneck in seconds, not hours.
A user hits Checkout. The page takes nine seconds to load. Behind that one click, the request fans out: the API gateway calls the orders service, which calls the inventory service, which calls the payments service, which talks to a database, a cache, and a third-party fraud API. Ten hops, maybe more.
Your dashboards say everything is "healthy." CPU is fine. Every service reports a green light. Yet the request is slow. So which hop ate those seconds? Without a way to follow that *one* request across *every* service, you are reduced to grepping logs in seven repos and guessing. That guessing is what costs you the night.
Metrics tell you *something* is slow. Logs tell you what one service did. Distributed tracing tells you the story of a single request end to end: who called whom, in what order, and exactly how long each hop took.
Who this is for
Backend and platform engineers running more than one service who keep asking "where did the time go?" If you have ever stared at a slow request and had no idea which service to blame, this is for you. No prior tracing experience assumed.
What distributed tracing actually is
A distributed trace is the complete, time-ordered record of a single request as it travels through every service that touches it. The record is broken into spans, where each span is one unit of work with a start time, a duration, and (unless it is the root) a parent.
The key word is *single*. A trace is not a summary of a thousand requests like a metric is. It is one request, reconstructed across process and network boundaries, so you can see the whole timeline as if it had happened in a single function call.
| Parcel tracking | Distributed tracing |
| --- | --- |
| Tracking number on a parcel | Trace ID: one unique ID for the whole request |
| Each depot scan (arrived, sorted, departed) | A span: one unit of work in one service |
| Timestamp on each scan | Span start time + duration |
| "Out for delivery from Depot B" | Parent/child link between spans |
| The full scan history page | The flame / waterfall view of the trace |
A trace is a parcel-tracking number that records every depot the package passed through.
When a parcel is late, you do not call every depot. You open the tracking page and instantly see it sat in Depot B for two days. Tracing gives your requests the same page.
The picture: one request, one trace, many spans
A single request creates one trace. Each service it touches adds a child span, nested under the span of its caller.
Every service emits its own spans to a collector (the dashed lines). The collector groups them by trace ID and reassembles the tree. Here is how one request becomes that tree:
1. **The gateway starts a trace.** The first service to see the request finds no incoming trace context, so it mints a new trace ID and a root span. That root span's clock starts now.
2. **It propagates context downstream.** Before calling Orders, the gateway injects the trace ID and its own span ID into the outgoing request, typically the W3C `traceparent` HTTP header.
3. **Each service creates a child span.** Orders reads the incoming `traceparent`, sees it belongs to an existing trace, and opens a child span whose parent is the gateway's span. It repeats the inject step when it calls Payments.
4. **Leaf work gets its own span.** The database query, the cache lookup, the external API call: each becomes a short leaf span so you can see exactly how long the I/O took.
5. **Spans are exported and stitched.** As each span ends, it is sent to the collector. The collector keys everything off the shared trace ID and rebuilds the parent/child tree, the waterfall you read later.
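The stitching in step 5 can be sketched in a few lines. This is a minimal illustration, not how any particular collector is implemented: the `Span` record, `build_trees`, and `render` are hypothetical names, and the timings are made up to resemble the slow checkout from the introduction.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span record; real exporters carry more fields.
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    duration_ms: int
    children: list = field(default_factory=list)


def build_trees(spans):
    """Group spans by trace ID, then hang each span under its parent."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)

    roots = {}
    for trace_id, group in by_trace.items():
        by_id = {s.span_id: s for s in group}
        for s in group:
            parent = by_id.get(s.parent_id)
            if parent is not None:
                parent.children.append(s)
            elif s.parent_id is None:
                roots[trace_id] = s
            # A span whose parent never arrived is an orphan; this sketch skips it.
        for s in group:
            s.children.sort(key=lambda c: c.start_ms)
    return roots


def render(span, depth=0):
    """Return the tree as indented text lines, one per span."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Spans arrive out of order, as they would from independent services.
spans = [
    Span("t1", "c", "b", "payments: charge", 120, 8600),
    Span("t1", "a", None, "gateway: POST /checkout", 0, 9000),
    Span("t1", "d", "c", "db: INSERT payment", 200, 8400),
    Span("t1", "b", "a", "orders: checkout", 50, 8800),
]
tree = build_trees(spans)["t1"]
print("\n".join(render(tree)))
```

Notice that arrival order does not matter. The trace ID picks the bucket and the parent ID picks the position, which is why both have to survive every hop.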
The four words you have to know
Tracing has a small vocabulary. Get these four straight and everything else falls into place.
| Concept | What it is | Scope |
| --- | --- | --- |
| Trace | The whole request, end to end, a tree of spans sharing one trace ID. | One request, all services |
| Span | One unit of work: a name, start time, duration, status, and key/value attributes. | One operation in one service |
| Span context | The small bundle (trace ID + span ID + flags) passed across boundaries so the next service can link its span to yours. | Carried in headers between hops |
| Sampling | The decision to keep or drop a trace, so you store useful traces without paying to keep all of them. | Per trace, at the start or the end |
The core tracing concepts and what each one is responsible for.
Pro tip
A trace is the *what happened*. A span is the *one thing that happened*. Span context is the *envelope* that keeps spans from different services in the same story. Sampling is the *budget*.
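To make the *envelope* concrete, here is a minimal sketch of reading and writing a `traceparent` value. It assumes version `00` of the W3C Trace Context format (`version-traceid-parentid-flags`, all lowercase hex) and ignores the companion `tracestate` header. `SpanContext`, `parse_traceparent`, `format_traceparent`, and `child_of` are illustrative helpers, not OpenTelemetry APIs.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version (2 hex) - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the span context, or None if the header is unusable."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # This sketch only understands version 00; all-zero IDs are invalid.
    if version != "00" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


def child_of(parent: SpanContext) -> SpanContext:
    """Same trace ID and sampling flag, fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_of(incoming)
print(format_traceparent(outgoing))
```

The trace ID is copied unchanged from hop to hop, while the span ID is replaced at every hop. That is what lets the next service name you as its parent.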
Context propagation: the part everyone breaks
A trace only holds together if the span context travels with the request across every boundary. Drop it on one hop and the trace splits in two: the downstream work shows up as a separate, orphaned trace, and your waterfall has a hole exactly where you needed to look.
The good news: with OpenTelemetry's auto-instrumentation, propagation is usually automatic for standard HTTP and gRPC clients. The example below shows what it does under the hood (extract context on the way in, start a child span, inject context on the way out) so you can fix it when a custom client or a queue breaks the chain.
orders_service.py
```python
from opentelemetry import trace
from opentelemetry.propagate import extract, inject
import requests

tracer = trace.get_tracer("orders-service")


def handle_checkout(request):
    # 1. EXTRACT the incoming span context from request headers.
    # If a traceparent header is present, this span joins the
    # existing trace; if not, a brand-new trace is started.
    ctx = extract(request.headers)

    # 2. Start a CHILD span under that context.
    with tracer.start_as_current_span("checkout", context=ctx) as span:
        # Attributes make the span searchable + readable later.
        span.set_attribute("order.id", request.order_id)
        span.set_attribute("user.tier", request.user_tier)

        # 3. INJECT the current context into OUTGOING headers so the
        # payments service can continue the same trace.
        outgoing = {}
        inject(outgoing)  # writes the traceparent header for us
        resp = requests.post(
            "http://payments/charge",
            json={"order_id": request.order_id},
            headers=outgoing,  # <-- without this, the trace breaks here
        )
        if resp.status_code >= 500:
            span.set_status(trace.StatusCode.ERROR)
        return resp
```
The three labeled steps (extract, start child span, inject) are the whole pattern. Every framework integration is just an automated version of these three steps. When a trace breaks, it is almost always because one of them was skipped on one hop.
Sampling: head vs tail
At scale you cannot keep every trace. A busy service produces millions an hour, and storing them all is expensive and mostly useless (who reads the trace of a fast, successful request?). Sampling decides which traces to keep. There are two moments to make that call.
Head-based sampling
The keep/drop decision is made at the start, by the first service, before anything has happened. Usually it is a fixed percentage, such as "keep 5% of traces." That decision rides along in the span context flags, so every downstream service honors it and the trace stays whole. It is cheap and simple, but it is blind: it might drop the one slow, failing request you actually needed, because the decision was made before the request went wrong.
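A head-based decision can be sketched as a pure function of the trace ID. This is an illustration under assumptions, not a library API: `head_sample` is a hypothetical helper that keeps a trace when the low 64 bits of its ID fall below a threshold, and defers to the caller's decision when there is one. OpenTelemetry SDKs include a trace-ID ratio sampler built on the same general idea, though the exact comparison it uses may differ from this sketch.

```python
import secrets
from typing import Optional


def head_sample(trace_id: str, ratio: float, parent_sampled: Optional[bool] = None) -> bool:
    """Decide keep/drop for a trace before any work has happened.

    trace_id is 32 lowercase hex characters; ratio is the fraction to keep.
    """
    # Downstream services honor the decision carried in the span context flags.
    if parent_sampled is not None:
        return parent_sampled
    # Deterministic: the same trace ID always produces the same answer.
    bound = int(ratio * (1 << 64))
    return int(trace_id[-16:], 16) < bound


trace_ids = [secrets.token_hex(16) for _ in range(100_000)]
kept = sum(head_sample(t, 0.05) for t in trace_ids)
print(f"kept {kept / len(trace_ids):.1%} of traces")
```

The function takes no duration and no status, because neither exists yet when it runs. That is the blindness described above.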
Tail-based sampling
The decision is made at the end, after all spans for a trace have arrived at the collector, when you can see the whole picture: "keep every trace that errored or took over one second, plus 1% of the rest." This keeps exactly the interesting traces, but the collector must buffer all spans for a trace in memory until it is complete (in practice, until a configured wait window expires, since the collector cannot know for certain that the last span has arrived), which costs more compute and infrastructure. Most teams start head-based and graduate to tail-based once they care about catching rare failures.
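The policy quoted above ("errored, or over one second, plus 1% of the rest") might look like this as code. It is a sketch only: `FinishedSpan` and `TailSampler` are hypothetical names, the buffer is a plain in-memory dict, and `decide` is called by hand where a real collector would wait for a timeout.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FinishedSpan:
    trace_id: str
    name: str
    duration_ms: int
    error: bool = False


class TailSampler:
    def __init__(self, slow_ms=1000, baseline_ratio=0.01, rng=random.random):
        self.slow_ms = slow_ms
        self.baseline_ratio = baseline_ratio
        self.rng = rng
        self.buffer = defaultdict(list)  # trace ID -> spans seen so far

    def on_span_end(self, span):
        # Every span is held in memory until its trace is decided.
        self.buffer[span.trace_id].append(span)

    def decide(self, trace_id):
        """Return True to keep the trace; the buffered spans are released either way."""
        spans = self.buffer.pop(trace_id, [])
        if not spans:
            return False
        if any(s.error for s in spans):
            return True
        # The root span covers the whole request, so the longest span
        # stands in for the trace's total duration here.
        if max(s.duration_ms for s in spans) > self.slow_ms:
            return True
        return self.rng() < self.baseline_ratio


sampler = TailSampler(rng=lambda: 0.5)  # fixed "random" value so the demo is repeatable
for span in [
    FinishedSpan("slow", "gateway: POST /checkout", 9000),
    FinishedSpan("slow", "db: INSERT payment", 8400),
    FinishedSpan("failed", "payments: charge", 40, error=True),
    FinishedSpan("boring", "gateway: GET /health", 3),
]:
    sampler.on_span_end(span)

decisions = {tid: sampler.decide(tid) for tid in ("slow", "failed", "boring")}
print(decisions)
```

The `buffer` dict is the cost: it grows with every in-flight trace, whether or not that trace ends up being kept.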
Reading the waterfall to find the slow hop
Open a trace in Jaeger, Tempo, or your APM and you get a waterfall (also drawn as a flame graph): each span is a horizontal bar, the x-axis is time, and child spans nest indented under their parent. Reading it is a skill, and it is fast once you know where to look.
Scan the bar lengths, not the names. The longest bar is your suspect. Width is duration; ignore everything that is thin.
Find the gap. If a parent span is long but its visible children are all short, the missing time is work *inside* that service (not a downstream call): slow code, lock contention, or GC pauses.
Look for serial vs parallel. Bars that stack diagonally one after another are sequential calls; if three downstream calls could run in parallel but render staircase-style, you found an easy win.
Read the attributes on the slow span. A two-second DB span with `db.statement` set tells you the exact query. A span with no attributes tells you nothing, which is why attributes matter.
Check the status colors. A red span marks where the error originated; the bars above it, which start further to the left, are just the callers waiting on it.
Pro tip
Most latency hunts end the same way: the widest bar, two levels deep, is a single slow database query or external API call. The trace points straight at it.
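The "find the gap" step can be done with arithmetic instead of eyeballing. Here is a small sketch that computes a span's self time: its own duration minus the time covered by its direct children, counting overlapping (parallel) children only once. `self_time_ms` is a hypothetical helper and the intervals are invented for illustration.

```python
def self_time_ms(parent, children):
    """Time spent inside the parent that no child span accounts for.

    parent and each child are (start_ms, end_ms) tuples on the same clock.
    """
    p_start, p_end = parent
    covered = 0
    cursor = p_start  # everything before the cursor is already counted
    for start, end in sorted(children):
        start, end = max(start, cursor), min(end, p_end)
        if end > start:
            covered += end - start
            cursor = end
    return (p_end - p_start) - covered


# Orders waits almost entirely on Payments: the time is downstream.
downstream_bound = self_time_ms((0, 8800), [(10, 60), (70, 8670)])

# Short children under a long parent: the time is inside this service.
local_bound = self_time_ms((0, 2000), [(100, 200), (150, 300)])

print(downstream_bound, local_bound)
```

A small self time sends you down into the children. A large one means the slow code, lock, or GC pause is in the service that owns the span.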
Common mistakes that cost hours
Broken propagation. A custom HTTP client, a message queue, or a background job that does not forward `traceparent` silently splits the trace. Your waterfall ends abruptly and the rest shows up as orphans. Always inject context on every outbound call, including async ones.
Sampling at 100% in production. Tempting ("keep everything!"), but it floods your collector and your storage bill. The cost is real and rarely worth it. Sample head-based at a few percent, or go tail-based to keep only the interesting traces.
Spans with no attributes. A span named `query` with zero attributes tells you it was slow but not *what* was slow. Add `db.statement`, `http.url`, `order.id`, `user.tier`: the fields you will want to filter and read by at 2 a.m.
Too many trivial spans. Wrapping every tiny function in a span makes the waterfall unreadable and inflates cost. Span the boundaries that matter: inbound requests, outbound calls, DB queries, queue operations.
Treating traces like logs. Do not dump giant payloads into span attributes. Attributes are for small, searchable facts (IDs, tiers, status codes), not for full request bodies or secrets.
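The first mistake on this list, broken propagation through a queue, is easiest to see in code. The sketch below uses only the standard library to show the idea: the producer writes `traceparent` into the message's own headers, and the consumer reads it back before starting its span. `publish` and `consume` are hypothetical helpers and the message envelope is an assumed shape. With OpenTelemetry you would call `inject` and `extract` on the message headers instead of formatting the string by hand.

```python
import json
import queue
import secrets


def publish(q, body, trace_id, span_id):
    """Producer side: carry the span context inside the message itself."""
    message = {
        "headers": {"traceparent": f"00-{trace_id}-{span_id}-01"},
        "body": body,
    }
    q.put(json.dumps(message))


def consume(q):
    """Consumer side: continue the producer's trace if context was forwarded."""
    message = json.loads(q.get())
    header = message["headers"].get("traceparent")
    if header:
        _version, trace_id, parent_id, _flags = header.split("-")
    else:
        # No context forwarded: this work becomes a separate, orphaned trace.
        trace_id, parent_id = secrets.token_hex(16), None
    return {
        "name": "process order.created",
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),
        "parent_id": parent_id,
    }


q = queue.Queue()
publish(q, {"order_id": "o-1"},
        trace_id="4bf92f3577b34da6a3ce929d0e0e4736", span_id="00f067aa0ba902b7")
consumer_span = consume(q)
print(consumer_span["trace_id"], consumer_span["parent_id"])
```

The `else` branch is the silent failure: nothing errors, and the consumer simply starts a new trace that nobody will connect to the original request.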
Takeaways
The whole article in seven lines
A **trace** is one request reconstructed across every service; a **span** is one unit of work inside it.
Spans share a **trace ID** and link through **span context** carried in the `traceparent` header.
Context propagation is **extract → start child span → inject**; break any one on any hop and the trace splits.
**Head sampling** decides at the start (cheap, blind); **tail sampling** decides at the end (keeps the interesting traces, costs more).
Read the **waterfall** by bar width: the widest bar two levels deep is usually your culprit.
**Attributes** turn a slow span into an actionable one: name the query, the URL, the IDs.
Never sample at 100% in prod, and never forget propagation on queues and background jobs.
Where to go next
Tracing is one of the three pillars of observability. It answers *where* a request slowed down, while metrics tell you *that* something is wrong and logs tell you *what* a service did. Read them together.
OpenTelemetry in Practice: the vendor-neutral way to actually emit the spans, attributes, and context shown here.
SRE career path: where tracing sits in a broader reliability practice, including SLOs, alerting, and incident response.
Next time a request is slow across ten services, you will not grep seven repos. You will open one trace, scan for the widest bar, and read the attribute that names the slow hop. That is the whole point.