OpenTelemetry in Practice

Every vendor wanted their own agent

There was a time when adding observability to a service meant picking a vendor and surrendering. You installed *their* agent, imported *their* SDK, sprinkled *their* tracer all over your handlers, and shipped data to *their* backend in *their* proprietary format. Then a year later the bill tripled, or the tool got acquired, or a new team standardized on something else, and ripping that instrumentation back out touched every file you owned.

The instrumentation (the actual lines of code that say "this is a request, time it, tag it") was the expensive, hand-written part, and it was welded to a vendor. That coupling is exactly the problem OpenTelemetry (OTel) was built to kill: you instrument *once*, in a standard way, and decide *later* and *separately* where the data goes.

Who this is for

Backend, DevOps, and SRE folks who already know roughly what a trace or a metric is and now have to actually wire up observability, without marrying a vendor. If "trace," "span," and "metric" are brand new, read [Observability: Metrics, Logs & Traces](/blog/observability-metrics-logs-traces) first, then come back.

One standard, three signals

The mental model in one sentence

OpenTelemetry is a single, vendor-neutral set of APIs, SDKs, and tools for generating, collecting, and exporting telemetry (traces, metrics, and logs), so you instrument your code once and send it anywhere.

OTel covers three signals. Traces follow a single request as it hops across services (each unit of work along the way is a *span*). Metrics are aggregated numbers over time: request rate, error count, latency histograms, queue depth. Logs are timestamped event records, now correlated to the trace that produced them. One project, one wire format, three kinds of data.

Travel-adapter analogy	OpenTelemetry equivalent
A wall socket shape that differs in every country	Each vendor's proprietary agent and data format
A universal travel adapter you plug into once	The OTel API/SDK you instrument your code with once
Swapping the country plug on the far end	Swapping the Collector's exporter to a new backend
Your laptop never knows which country it's in	Your code never knows which vendor it's shipping to

OTel is a universal power adapter for telemetry.

The picture: how telemetry flows

Your app uses the OTel SDK to produce spans, metrics, and logs. It exports them over OTLP (the OpenTelemetry Protocol) to a Collector, a standalone process that receives, processes, and exports telemetry to one or more backends. The Collector is the seam that decouples your code from any vendor.

App + OTel SDK → OTLP → Collector (receive → process → export) → backends.

The three boxes inside the Collector (receiver, processor, exporter) are the pipeline. Everything to the left of the exporter is standard and vendor-neutral. Only the exporter knows where data ultimately lands, and swapping it is a config change, not a code change.
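
To make the receive → process → export idea concrete, here is a toy model of the pipeline in plain Python. It is not the Collector (which is a separate process configured through its own config file); the `Pipeline` class, the `redact_email` processor, and the two in-memory "backends" are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable

Span = dict  # in this sketch a span is just a plain dict


@dataclass
class Pipeline:
    """receive -> process -> export, mirroring the Collector's three stages."""

    processors: list[Callable[[list[Span]], list[Span]]]
    exporters: list[Callable[[list[Span]], None]]

    def receive(self, batch: list[Span]) -> None:
        # Every processor sees the batch in order, then every exporter gets the result.
        for process in self.processors:
            batch = process(batch)
        for export in self.exporters:
            export(batch)


def redact_email(batch: list[Span]) -> list[Span]:
    # Processor (hypothetical): drop a sensitive attribute before export.
    return [{k: v for k, v in span.items() if k != "user.email"} for span in batch]


# Two stand-in backends; real exporters would send data over the network.
backend_a: list[Span] = []
backend_b: list[Span] = []

pipeline = Pipeline(processors=[redact_email], exporters=[backend_a.extend])
pipeline.receive([{"name": "checkout.process", "user.email": "a@example.com"}])

# Switching backends only touches the exporter list; the sending side is unchanged.
pipeline.exporters = [backend_b.extend]
pipeline.receive([{"name": "checkout.process", "user.email": "b@example.com"}])
```

The app-side call (`pipeline.receive(...)`) is identical before and after the swap, which is the property the real Collector gives you through configuration.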

Instrument a service: the steps

1. Add the SDK. Install the language SDK plus the instrumentation packages for your web framework, HTTP client, and database driver.
2. Configure a resource. Set service.name, service.version, and deployment.environment. Without service.name your spans show up as "unknown_service" and nothing groups correctly.
3. Turn on auto-instrumentation. Let the agent wrap your framework and libraries so inbound requests, outbound calls, and DB queries produce spans automatically, with zero handler edits.
4. Add manual spans where it matters. Wrap the business logic auto-instrumentation can't see (a pricing calculation, a batch job step) in your own spans with meaningful attributes.
5. Point the exporter at a Collector. Set OTEL_EXPORTER_OTLP_ENDPOINT to your Collector's OTLP address. Your app talks only to the Collector, never directly to a vendor.
6. Run the Collector. Deploy it as a sidecar or a shared service with a receive → process → export pipeline, and choose backends in its config.
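
Steps 2 and 5 can usually be handled through the standard `OTEL_*` environment variables rather than code. The sketch below only builds those variables as a dictionary so you can see their shape; the helper name `otel_env` and the example values (`1.4.2`, `staging`, the `otel-collector` hostname) are assumptions for illustration.

```python
def otel_env(service_name, version, environment, collector_endpoint):
    """Build the OTEL_* environment variables for one service (illustrative helper)."""
    resource_attrs = {
        "service.version": version,
        "deployment.environment": environment,
    }
    return {
        # service.name has its own dedicated variable.
        "OTEL_SERVICE_NAME": service_name,
        # Other resource attributes travel as comma-separated key=value pairs.
        "OTEL_RESOURCE_ATTRIBUTES": ",".join(
            f"{key}={value}" for key, value in resource_attrs.items()
        ),
        # The app only ever knows the Collector's address, never a vendor's.
        "OTEL_EXPORTER_OTLP_ENDPOINT": collector_endpoint,
    }


env = otel_env("checkout", "1.4.2", "staging", "http://otel-collector:4317")
for key, value in env.items():
    print(f"{key}={value}")
```

Setting these in the deployment manifest rather than in code keeps the service binary identical across environments.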

The four pieces, and what each does

OTel is easier once you separate the parts. The API is what your code calls; the SDK is the engine behind it; OTLP is the wire format; the Collector is the routing layer. You can adopt them incrementally.

Component	What it is	What it does
API	Stable interfaces	What your code (and libraries) call to create spans, record metrics, and emit logs. It carries no implementation, so libraries can depend on it safely.
SDK	The implementation	Plugs in behind the API: samples, batches, adds resource attributes, and exports. This is where you configure behavior.
OTLP	The wire protocol	The vendor-neutral format (gRPC or HTTP) for shipping all three signals from app to Collector, and Collector to backend.
Collector	Standalone proxy	Receives OTLP, processes it (batch, filter, redact, tail-sample), and exports to one or more backends. The vendor-decoupling seam.

The OpenTelemetry components and their jobs.

Why the API/SDK split matters

Libraries instrument against the **API** only. If no SDK is configured, those calls become cheap no-ops. That is what makes it safe for a database driver or web framework to ship OTel instrumentation by default: it costs nothing until *you* turn on an SDK.
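
The no-op behaviour is easier to trust once you see the pattern. The following is a toy re-creation of the API/SDK split, not the real OpenTelemetry classes: `NoOpTracer`, `RecordingTracer`, `set_tracer`, and `run_query` are all hypothetical names for this sketch.

```python
class NoOpSpan:
    """What the 'API' hands out when no SDK is installed: every call does nothing."""

    def set_attribute(self, key, value):
        pass

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


class NoOpTracer:
    def start_span(self, name):
        return NoOpSpan()


class RecordingSpan(NoOpSpan):
    """What the 'SDK' hands out: same interface, but it keeps the data."""

    def __init__(self, name, sink):
        self.name, self.attributes, self._sink = name, {}, sink

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def __exit__(self, *exc):
        self._sink.append((self.name, dict(self.attributes)))
        return False


class RecordingTracer:
    def __init__(self, sink):
        self._sink = sink

    def start_span(self, name):
        return RecordingSpan(name, self._sink)


_tracer = NoOpTracer()  # the default until the application installs an SDK


def set_tracer(tracer):
    global _tracer
    _tracer = tracer


def run_query(sql):
    # "Library" code: depends only on the interface, never on an implementation.
    with _tracer.start_span("db.query") as span:
        span.set_attribute("db.statement", sql)
        return "rows"


finished = []
run_query("SELECT 1")                  # no SDK yet: nothing is recorded
before_sdk = list(finished)
set_tracer(RecordingTracer(finished))  # the application opts in
run_query("SELECT 2")                  # the same library code now produces a span
```

The library function never changes; only the object behind the interface does.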

Instrument a handler in code

Here is a minimal Python service: configure the SDK to export over OTLP, then wrap a request handler in a manual span with useful attributes. In real life auto-instrumentation would create the outer HTTP span for you; this shows the manual layer you add on top.

app.py

python

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter,
)

# 1. Identify this service, then wire the SDK to export over OTLP.
resource = Resource.create({"service.name": "checkout"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(
        # Talks to the Collector, NOT a vendor.
        OTLPSpanExporter(endpoint="http://otel-collector:4317")
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("checkout")


# charge() and PaymentError come from your own payment code (not shown here).
def handle_checkout(cart):
    # 2. A manual span around the business logic.
    with tracer.start_as_current_span("checkout.process") as span:
        span.set_attribute("cart.items", len(cart.items))
        span.set_attribute("cart.total_eur", cart.total)
        try:
            receipt = charge(cart)
            span.set_attribute("payment.id", receipt.id)
            return receipt
        except PaymentError as e:
            # 3. Record the failure on the span so it shows up in the trace.
            span.record_exception(e)
            span.set_status(trace.Status(trace.StatusCode.ERROR))
            raise

Notice what is *not* here: no vendor name, no API key, no proprietary client. The code targets the OTLP endpoint of a Collector. Where the data goes after that is decided entirely in the Collector's config, which is the whole point.

Auto vs manual, and how context travels

Auto-instrumentation wraps the libraries you already use (the web framework, HTTP client, and database driver) so common operations produce spans without you editing handlers. It is the fastest way to get 80% of the value: every inbound request and outbound call becomes a span for free.

Manual instrumentation is where you add what auto can't infer: spans around your own business logic, custom attributes (customer tier, feature flag, cart size), and domain events. The right answer is almost always *both*: auto for the plumbing, manual for the meaning.
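
As a rough picture of how the two layers combine, the sketch below wraps a stand-in library method (the "auto" part) and then records a hand-written span with business attributes (the "manual" part). It uses plain dicts instead of the real SDK, and `HttpClient`, `auto_instrument`, `price_cart`, and the tax-service URL are hypothetical.

```python
import functools
import time

spans = []  # finished spans, in the order they end


def auto_instrument(func, span_name):
    """Wrap a library function so every call records a span (the 'auto' layer)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            spans.append({
                "name": span_name,
                "duration_s": time.perf_counter() - start,
                "attributes": {},
            })

    return wrapper


class HttpClient:
    """Stand-in for a third-party HTTP library."""

    def get(self, url):
        return f"200 OK from {url}"


# Patch the library once at startup; no handler code has to change.
HttpClient.get = auto_instrument(HttpClient.get, "HTTP GET")


def price_cart(items, customer_tier):
    # The 'manual' layer: only you know that tier and cart size matter here.
    span = {
        "name": "pricing.calculate",
        "attributes": {"cart.items": len(items), "customer.tier": customer_tier},
    }
    HttpClient().get("http://tax-service/rates")  # produces a span automatically
    spans.append(span)
    return sum(items)


total = price_cart([10, 20], "gold")
```

The wrapped call knows nothing about customers or carts, and the manual span knows nothing about HTTP; together they tell the whole story.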

Context propagation

A trace only stays connected across services because the trace context travels with the request. When service A calls service B, the SDK injects the trace id and current span id into outbound HTTP headers (the W3C traceparent header); service B's SDK extracts them and makes its new spans children of A's. That handoff is context propagation, and it is what turns a pile of unrelated spans into one end-to-end trace. Auto-instrumentation handles inject/extract for standard protocols; you only do it by hand for custom transports or message queues. For the deeper story on how spans link into a tree, see Distributed Tracing.
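
The handoff is small enough to sketch by hand. The code below builds and parses a `traceparent` value in the W3C layout (`version-traceid-parentid-flags`) using only the standard library. In a real service the SDK's propagators do this for you; the `inject` and `extract` helpers here are simplified stand-ins that skip parts of the specification, such as rejecting all-zero ids.

```python
import re
import secrets

# version "00", 32 hex chars of trace id, 16 hex chars of span id, 2 hex chars of flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span_id():
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters


def inject(carrier, trace_id, span_id, sampled=True):
    """Write the current trace context into outbound headers (or message metadata)."""
    flags = "01" if sampled else "00"
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(carrier):
    """Read the trace context from inbound headers; None if missing or malformed."""
    match = TRACEPARENT.match(carrier.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": (int(flags, 16) & 1) == 1,
    }


# Service A: start a trace and call service B.
trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
span_a = new_span_id()
headers = {}
inject(headers, trace_id, span_a)

# Service B: continue the same trace, with A's span as the parent.
ctx = extract(headers)
span_b = {
    "trace_id": ctx["trace_id"],
    "span_id": new_span_id(),
    "parent_span_id": ctx["parent_span_id"],
}
```

The carrier is just a dictionary, so the same two helpers work for message-queue metadata as well as HTTP headers.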

Break the chain, lose the trace

If even one hop drops the `traceparent` header (a service that strips unknown headers, a queue you forgot to propagate through), the trace splits in two and the end-to-end view silently disappears. Propagation is all-or-nothing across a path.

Common mistakes that cost hours

Forgetting `service.name`. Spans land as "unknown_service" and nothing groups by service. Set the resource before anything else.
Exporting straight to a vendor from the app. You re-couple your code to a backend and lose the Collector's batching, retry, and redaction. Always export to a Collector.
Dropping context across a queue. Auto-instrumentation covers HTTP and gRPC, not your bespoke message format; inject and extract traceparent yourself for async hops.
No batching or sampling, then a surprise bill. Use a batch processor, and sample high-volume traces; 100% of everything at scale is expensive and rarely necessary.
Putting high-cardinality values in metric labels. User id or request id as a metric attribute explodes your time series. Those belong on spans, not metrics.
Treating logs as separate. Emit logs through OTel (or correlate via trace id) so a log line links back to the trace that produced it.
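
On the sampling point, one common approach is to decide from the trace id itself, so every service that sees the same trace makes the same keep-or-drop decision. The sketch below shows that idea in plain Python. It is written in the spirit of ratio-based samplers, not copied from any SDK, and the `should_sample` name and the 10% ratio are assumptions.

```python
import random


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding from the trace id alone."""
    # Treat the low 64 bits of the trace id as a uniformly distributed number.
    low_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_bits < int(ratio * (1 << 64))


# Simulate 10,000 random 128-bit trace ids (seeded so the run is repeatable).
rng = random.Random(7)
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]

kept = sum(should_sample(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace id, a trace is either kept whole or dropped whole, which avoids half-sampled traces with missing spans.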

Takeaways

The whole article in seven lines

OpenTelemetry is the vendor-neutral standard: instrument once, send anywhere.
Three signals: traces (one request's journey), metrics (aggregates), logs (events).
Four pieces: API (what code calls), SDK (the engine), OTLP (the wire format), Collector (the router).
The Collector pipeline, receive → process → export, is the seam that decouples you from any vendor.
Use auto-instrumentation for plumbing, manual spans for business meaning. Do both.
Context propagation (the `traceparent` header) is what stitches spans into one end-to-end trace.
Export to a Collector, never straight to a vendor; that's what keeps you free to switch.

Where to go next

OTel is the *how* of observability; pair it with the *what* and the *why*. Start with the signals, then go deep on tracing, then bring it together on the SRE path.

Observability: Metrics, Logs & Traces: the three signals and when to reach for each.
Distributed Tracing: spans, parent/child links, and reading a trace across services.
The SRE career path: where instrumentation, SLOs, and incident response fit together.
