Frontend Observability: RUM, Web Vitals, and Error Tracking
Your app works on your machine, but real users on a mid-range Android are suffering in silence. Here is how to see what the browser sees: Web Vitals, JS errors, session replay, and traces from click to backend.
You ship a feature, it works on your machine, the demo is flawless. Then a support ticket says 'the checkout button does nothing.' You can't reproduce it. Somewhere a real user on a mid-range Android, on a flaky 4G connection, with three ad-blockers, is suffering, and you have **zero** signal. This article is for frontend and SRE engineers who want eyes inside the browser: what real users experience, what's breaking, and why.
Backend observability is a solved conversation: metrics, logs, traces, dashboards. But the browser is the last dark mile. The code runs on hardware you don't own, on networks you can't test, in a runtime that varies wildly. Frontend observability closes that gap: it streams the real experience back to you so you stop guessing.
Synthetic tests tell you whether the app *can* be fast. Real User Monitoring tells you whether it *is* fast, for the people actually using it.
A dashcam for your app
- A test-track lap in perfect weather → Synthetic monitoring: scripted, clean lab, repeatable
- A dashcam recording every real commute → RUM: every real session, real device, real network
- The crash you only understand from the footage → Session replay of the broken session
- The black-box flight recorder → Error and Web Vitals telemetry, beaconed when the page is hidden
Synthetic monitoring is the test-track lap; RUM is the dashcam recording every real drive.
Both matter. The test track catches regressions before release with zero noise. The dashcam catches what only happens at 9pm on a Samsung A14 in a basement. You want both running, but if you only have budget for one, the dashcam is where the truth lives.
Synthetic vs RUM, lab vs field
The single most useful mental model is lab vs field data. Synthetic (lab) data comes from a controlled environment on a schedule; RUM (field) data is sampled from real sessions. Google's own Chrome UX Report (CrUX), the dataset behind its page-experience ranking signals, is field data, so RUM isn't optional if you care about real outcomes.
| Dimension | Synthetic (lab) | RUM (field) |
|---|---|---|
| Source | Scripted bot, fixed device/network | Real users, real devices/networks |
| Best for | Pre-release regression gates, uptime checks | Real distributions, the long tail, surprises |
| Noise | Low: deterministic, repeatable | High: needs aggregation and percentiles |
| Catches | Known flows breaking | Unknown unknowns, device-specific bugs |
| Latency to signal | Every run (e.g. every 5 min) | Continuous, but per-session |
| Weakness | Never sees the real long tail | Can't run before users exist (pre-prod) |
Pick the right tool for the question you're asking.
Use percentiles, never averages
An average LCP hides the suffering. The p75 and p95 are where real users live. A 2.0s average can easily hide a p95 of 9s, which means one in twenty page loads takes 9s or longer.
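Here is a concrete version of that effect. This minimal TypeScript sketch uses the nearest-rank percentile definition on a hypothetical set of 20 LCP samples. Nearest-rank is one of several conventions, and RUM backends may interpolate instead. The function names and the data are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile of an empty sample set');
  const sorted = [...samples].sort((a, b) => a - b);
  // Integer-friendly form of ceil(p/100 * n), avoiding floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical LCP samples in ms: 18 fast page loads and 2 very slow ones.
const lcpMs = [...new Array<number>(18).fill(1500), 9000, 9000];

console.log(mean(lcpMs)); // 2250
console.log(percentile(lcpMs, 75)); // 1500
console.log(percentile(lcpMs, 95)); // 9000
```

The mean (2.25s) sits under the 2.5s "good" LCP threshold, and so does p75. Only p95 exposes that one load in ten takes 9s, which is why both percentiles belong on the dashboard.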
The telemetry pipeline
Everything flows the same shape. The browser collects three signal types (Web Vitals, errors, and traces), batches them, and beacons them to a collector. The collector enriches and samples them and forwards them to a backend, which powers dashboards, SLOs, and alerts.
From the browser to alerts: one pipeline, three signal types.
1. **Collect in the browser.** Hook Web Vitals, global errors, and unhandled rejections. Optionally record a redacted session replay buffer.
2. **Batch and sample in the SDK.** Queue events and flush on visibilitychange/pagehide via sendBeacon so you never block navigation or lose data on unload.
3. **Ingest at the collector.** An OTel Collector or vendor ingest endpoint authenticates, enriches (geo, device, release), and applies server-side sampling.
4. **Stitch the trace.** The browser propagates a `traceparent` header to your API, so a single trace spans click → fetch → backend → DB.
5. **Store, alert, replay.** Aggregate into percentiles, drive SLOs and alerts, and link each error to its session replay for root cause.
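Step 2 has a practical wrinkle. The Fetch spec caps in-flight keepalive request bodies at 64 KiB, and `sendBeacon` is subject to that quota too. A large queue flushed in a single call can therefore be rejected. The sketch below packs queued events into size-bounded JSON payloads. The `RumEvent` shape, the 60 KiB budget, and the function name are assumptions for illustration.

```typescript
// Hypothetical event shape; real SDKs carry many more fields.
interface RumEvent {
  type: 'vital' | 'error' | 'span';
  name: string;
  value?: number;
}

// Assumed per-payload budget, kept under the 64 KiB keepalive limit for headroom.
const MAX_PAYLOAD_BYTES = 60 * 1024;

const encoder = new TextEncoder();
const byteLength = (s: string): number => encoder.encode(s).length;

// Greedily pack events, in order, into JSON-array payloads that each stay within
// maxBytes. An event too large to fit even on its own is dropped, not sent.
function chunkPayloads(events: RumEvent[], maxBytes = MAX_PAYLOAD_BYTES): string[] {
  const payloads: string[] = [];
  let current: string[] = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const event of events) {
    const json = JSON.stringify(event);
    const size = byteLength(json);
    if (size + 2 > maxBytes) continue;

    // Close the current payload if adding this event (plus a comma) would overflow.
    if (current.length > 0 && currentBytes + size + 1 > maxBytes) {
      payloads.push(`[${current.join(',')}]`);
      current = [];
      currentBytes = 2;
    }
    currentBytes += current.length > 0 ? size + 1 : size;
    current.push(json);
  }
  if (current.length > 0) payloads.push(`[${current.join(',')}]`);
  return payloads;
}
```

In the browser, the flush handler would then send each payload separately, e.g. `for (const body of chunkPayloads(events)) navigator.sendBeacon('/rum', body);`. Dropping an oversized event keeps one huge stack trace from blocking the rest of the batch. Truncating it instead is a reasonable alternative.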
Capturing Core Web Vitals from real users
The web-vitals library is the canonical way to measure the metrics Google actually ranks on. In March 2024, INP (Interaction to Next Paint) replaced FID as the responsiveness Core Web Vital. FID only measured the input delay of the *first* interaction. INP observes the latency of every interaction across the whole visit and reports approximately the worst one, which is far closer to felt sluggishness. For the metric definitions and thresholds, see core web vitals and frontend performance.
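INP is not strictly the single worst interaction. On busy pages, one of the highest interactions is ignored for every 50 interactions, so a lone outlier can't define a long visit. The simplified TypeScript sketch below illustrates that rule. The function name and sample latencies are hypothetical. The real web-vitals implementation also groups events by interaction and keeps only the longest few.

```typescript
// Estimate INP (ms) from the latency of every interaction in one visit.
// Simplified rule: skip one of the highest latencies per 50 interactions.
function estimateINP(latenciesMs: number[]): number | undefined {
  if (latenciesMs.length === 0) return undefined; // no interactions, no INP
  const slowestFirst = [...latenciesMs].sort((a, b) => b - a);
  const skip = Math.floor(latenciesMs.length / 50);
  return slowestFirst[Math.min(skip, slowestFirst.length - 1)];
}

// A short visit: INP is simply the slowest interaction.
console.log(estimateINP([40, 80, 350, 120])); // 350

// A long visit with 120 interactions: floor(120 / 50) = 2 outliers are skipped.
const longVisit = [900, 700, 300, ...new Array<number>(117).fill(100)];
console.log(estimateINP(longVisit)); // 300
```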
src/observability/vitals.ts

```typescript
import { onLCP, onINP, onCLS, onTTFB, onFCP, type Metric } from 'web-vitals';

const ENDPOINT = '/rum/vitals';

// Buffer metrics and flush them as one beacon: one request, not five.
const queue = new Set<Metric>();

function flush() {
  if (queue.size === 0) return;
  const body = JSON.stringify({
    release: process.env.NEXT_PUBLIC_RELEASE,
    url: location.pathname,
    metrics: [...queue].map((m) => ({
      name: m.name, // 'LCP' | 'INP' | 'CLS' | ...
      value: m.value,
      rating: m.rating, // 'good' | 'needs-improvement' | 'poor'
      id: m.id, // unique per metric instance per page load; lets the backend dedupe
    })),
  });
  queue.clear();
  // sendBeacon survives unload; fetch with keepalive is the fallback.
  // Telemetry is best-effort, so a failed fallback is swallowed, never thrown.
  if (!navigator.sendBeacon(ENDPOINT, body)) {
    fetch(ENDPOINT, { body, method: 'POST', keepalive: true }).catch(() => {});
  }
}

export function trackWebVitals() {
  // reportAllChanges: false (the default) → report only the final value, the one that ranks.
  const opts = { reportAllChanges: false };
  onLCP((m) => queue.add(m), opts);
  onINP((m) => queue.add(m), opts);
  onCLS((m) => queue.add(m), opts);
  onTTFB((m) => queue.add(m));
  onFCP((m) => queue.add(m));

  // INP/CLS finalize only on the way out: flush on hide, not unload.
  addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden') flush();
  });
  // Backstop for browsers (notably older Safari) that don't reliably fire
  // visibilitychange when the page unloads.
  addEventListener('pagehide', flush);
}
```
Flush on 'hidden', not 'unload'
INP and CLS are cumulative: their final value only exists when the user leaves. The classic mistake is sending on page load, which ships a half-measured number. Always flush on **visibilitychange → hidden** (and pagehide), which fires reliably on mobile where unload does not.
Tracking JS errors and unhandled rejections
Without instrumentation, most production exceptions never reach you. A user hits one, the page breaks, they leave: no log, no ticket. Two global hooks catch the vast majority. `window.onerror` (or an `error` listener) handles uncaught exceptions. `unhandledrejection` handles promises nobody caught, the silent killers, such as a failed `await fetch()` with no try/catch.
src/observability/errors.ts

```typescript
const ENDPOINT = '/rum/errors';
const seen = new Set<string>(); // de-dupe identical errors within this page load

function report(err: { message: string; stack?: string; kind: string }) {
  const key = err.kind + ':' + err.message + ':' + (err.stack ?? '');
  if (seen.has(key)) return; // don't beacon the same loop 500 times
  seen.add(key);
  const body = JSON.stringify({
    ...err,
    release: process.env.NEXT_PUBLIC_RELEASE, // ties the stack to its source maps
    url: location.href,
    userAgent: navigator.userAgent,
    ts: Date.now(),
  });
  navigator.sendBeacon(ENDPOINT, body);
}

export function trackErrors() {
  // Uncaught exceptions: synchronous throws, plus throws in timers and event handlers.
  window.addEventListener('error', (e) => {
    report({ kind: 'error', message: e.message, stack: e.error?.stack });
  });
  // The silent killers: rejected promises with no .catch / no try-await.
  window.addEventListener('unhandledrejection', (e) => {
    const reason = e.reason;
    report({
      kind: 'unhandledrejection',
      message: reason?.message ?? String(reason),
      stack: reason?.stack,
    });
  });
}
```
Stamp every event with the release
The single most valuable field is the **release/build id**. It tells you whether an error spiked with a deploy (instant rollback signal) and tells the backend which source map to use to symbolicate the stack. Set it once at build time and attach it everywhere.
Source maps: make stack traces readable
Production JS is minified, so a raw stack trace reads like `a.b is not a function at t (main.4f8c.js:1:24817)`, which is useless. Source maps reverse the minification back to your real file, function, and line. The catch: you must upload them privately to your error backend and never ship them to the browser.
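A source map's `mappings` string holds up to five numbers per generated position: generated column, source file index, original line, original column, and name index. They are stored as Base64 VLQ deltas, as defined by the Source Map Revision 3 format. The minimal decoder below handles a single segment and shows what symbolication starts from. The function name is hypothetical, and real backends use a maintained source-map library rather than hand-rolled code.

```typescript
const BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode one comma-free segment of a source map "mappings" string into its
// integer fields. Each field is a delta relative to the previous segment.
function decodeVlqSegment(segment: string): number[] {
  const values: number[] = [];
  let shift = 0;
  let accum = 0;
  for (const ch of segment) {
    const digit = BASE64.indexOf(ch);
    if (digit === -1) throw new Error(`invalid base64 VLQ character: ${ch}`);
    accum += (digit & 0b11111) * 2 ** shift; // low 5 bits carry data
    if (digit & 0b100000) {
      shift += 5; // continuation bit set: more digits follow
      continue;
    }
    // The lowest bit of the assembled value is the sign.
    const magnitude = Math.floor(accum / 2);
    values.push(accum % 2 === 1 ? -magnitude : magnitude);
    accum = 0;
    shift = 0;
  }
  if (shift !== 0) throw new Error('truncated VLQ segment');
  return values;
}

console.log(decodeVlqSegment('AAAA')); // → [0, 0, 0, 0]
console.log(decodeVlqSegment('AACA')); // → [0, 0, 1, 0]
console.log(decodeVlqSegment('gBD')); // → [16, -1]
```

Because each value is a delta, the backend accumulates them across segments to turn `main.4f8c.js:1:24817` back into a file, line, and column in your source.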
Never serve source maps publicly
A public .map file hands attackers your readable, commented source: business logic, internal endpoints, sometimes secrets. Upload maps to your observability backend at deploy time and **strip the `//# sourceMappingURL` comment** (or delete the .map files) from what you serve. The backend symbolicates server-side; the browser never needs them.
deploy.sh

```bash
set -euo pipefail                    # stop the deploy if the build or the upload fails
: "${RELEASE:?RELEASE must be set}"  # the release id every event is stamped with

# Build with source maps (for Next.js, enable productionBrowserSourceMaps in next.config.js).
npm run build # emits .js + .js.map

# Upload maps to the error backend, keyed by release (Sentry shown as an example).
npx sentry-cli sourcemaps upload \
  --release "$RELEASE" \
  --url-prefix "~/_next/static" \
  ./.next/static

# Critically: do NOT deploy the .map files to your CDN.
find ./.next/static -name '*.map' -delete
```
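Deleting the maps still leaves a `//# sourceMappingURL=` comment in each served file, pointing at a map that no longer exists. Stripping that comment, the other option named in the callout above, takes a small post-build step. This minimal Node sketch assumes the comment sits at the end of each emitted .js file, where bundlers normally place it. The function names are hypothetical.

```typescript
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Matches a trailing `//# sourceMappingURL=...` (or legacy `//@ ...`) comment.
const SOURCE_MAP_COMMENT = /\/\/[#@] sourceMappingURL=\S+\s*$/;

// Return the code without its trailing source-map comment (unchanged if none).
export function stripSourceMappingUrl(code: string): string {
  if (!SOURCE_MAP_COMMENT.test(code)) return code;
  return code.replace(SOURCE_MAP_COMMENT, '').trimEnd() + '\n';
}

// Rewrite every .js file under dir in place; returns how many files changed.
export function stripDir(dir: string): number {
  let changed = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      changed += stripDir(path);
    } else if (entry.name.endsWith('.js')) {
      const code = readFileSync(path, 'utf8');
      const stripped = stripSourceMappingUrl(code);
      if (stripped !== code) {
        writeFileSync(path, stripped);
        changed++;
      }
    }
  }
  return changed;
}
```

Calling `stripDir('./.next/static')` right after the `find` step leaves the served bundles with no pointer to a map at all.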
Session replay and PII redaction
Session replay reconstructs what the user saw. It is not a video but a lightweight stream of DOM mutations plus input, scroll, and click events, re-rendered later. It turns 'the button does nothing' into a watchable recording with the console and network attached, the closest thing to looking over the user's shoulder.
Redact PII before it leaves the browser
Replay can capture everything a user types: emails, card numbers, health data. Redaction must happen **client-side, before the data is sent**, or you've built a compliance breach. Under GDPR/CCPA this is a legal requirement, not a nice-to-have.
- Mask all inputs by default: an allow-list is safer than a block-list, so opt specific fields *in* to recording, not *out*.
- Class-based masking: tag sensitive nodes (`.rr-block`, `.sensitive`) so they record as a redacted block, never their contents.
- Scrub network bodies: strip auth headers, tokens, and request/response payloads containing PII before they attach to the replay.
- Sample replay hard: full-fidelity recording is heavy, so record a small percentage of sessions and 100% of error sessions (record-on-error).
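The first three rules can be sketched as pure functions that run on captured data before anything is queued for upload. This illustrates mask-by-default and is not any replay library's API. The `RecordedNode` shape, the `data-record-ok` opt-in attribute, and the header list are hypothetical. `.rr-block` and `.sensitive` come from the list above.

```typescript
// Hypothetical, simplified shape of one captured DOM node.
interface RecordedNode {
  tag: string;
  classes: string[];
  attrs: Record<string, string>;
  value?: string; // current value of a form field
  text?: string;
  children: RecordedNode[];
}

const MASK = '***'; // fixed-width mask, so even the input length is hidden
const BLOCK_CLASSES = new Set(['rr-block', 'sensitive']);
const OPT_IN_ATTR = 'data-record-ok'; // hypothetical allow-list marker
const FORM_TAGS = new Set(['input', 'textarea', 'select']);
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'set-cookie', 'x-api-key']);

// Returns a redacted copy; the input tree is never mutated.
function redactNode(node: RecordedNode): RecordedNode {
  // Blocked nodes keep only their tag and classes, never their contents.
  if (node.classes.some((c) => BLOCK_CLASSES.has(c))) {
    return { tag: node.tag, classes: node.classes, attrs: {}, children: [] };
  }
  const copy: RecordedNode = { ...node, children: node.children.map(redactNode) };
  // Mask by default: form values survive only if the field explicitly opted in.
  if (FORM_TAGS.has(node.tag) && !(OPT_IN_ATTR in node.attrs)) {
    if (copy.value !== undefined) copy.value = MASK;
    if ('value' in copy.attrs) copy.attrs = { ...copy.attrs, value: MASK };
  }
  return copy;
}

// Replace credential-bearing header values before a request joins the replay.
function scrubHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? '[REDACTED]' : value;
  }
  return out;
}
```

The opt-in attribute is the only way a value survives. A form field added next quarter is therefore masked automatically, whereas a block-list would leak it until someone remembered to tag it.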
Distributed tracing into the browser
The most powerful move is extending your backend traces into the browser. When the SDK starts a span for a user click and propagates a W3C `traceparent` header on the resulting fetch, your backend continues the *same* trace. Now one waterfall shows: click → 40ms of JS → 220ms fetch → 180ms API handler → 90ms DB query. The slow checkout becomes a single, undeniable picture instead of a finger-pointing match between frontend and backend.
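src/observability/trace.ts

```typescript
// Wrap fetch so every request carries the active span's trace context.
// The backend reads traceparent and continues the SAME trace.
// Assumes a tracer provider and W3C propagator are registered first (e.g. with
// @opentelemetry/sdk-trace-web); without one, these API calls are no-ops.
import { context, propagation, trace, SpanStatusCode } from '@opentelemetry/api';

const realFetch = window.fetch.bind(window);

window.fetch = (input, init = {}) => {
  const tracer = trace.getTracer('browser');
  const span = tracer.startSpan('http.client');
  // Start from the caller's headers (or the Request's own) so none are dropped.
  const headers = new Headers(init.headers ?? (input instanceof Request ? input.headers : undefined));
  // Inject W3C traceparent (and tracestate) into the outgoing request.
  // Cross-origin APIs must allow these headers via CORS, or the preflight fails.
  propagation.inject(trace.setSpan(context.active(), span), headers, {
    set: (carrier, k, v) => (carrier as Headers).set(k, v),
  });
  return realFetch(input, { ...init, headers })
    .then((res) => {
      span.setAttribute('http.status_code', res.status);
      return res;
    })
    .catch((err) => {
      // Network failures reject: mark the span as failed, then rethrow.
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    })
    .finally(() => span.end());
};
```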
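The injected header follows the W3C Trace Context specification. It has four fields: a version, a 32-hex-digit trace id, the caller's 16-hex-digit span id, and trace flags, where bit 0x01 means sampled. A minimal parser helps when you need to check what actually reached your API. This sketch handles only version `00`, and the function names are hypothetical.

```typescript
interface TraceParent {
  traceId: string; // 16 bytes as 32 lowercase hex chars
  parentId: string; // 8 bytes as 16 lowercase hex chars: the caller's span id
  sampled: boolean; // trace-flags bit 0x01
}

const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse a version-00 traceparent header. Returns null for anything invalid,
// including the all-zero ids the spec forbids.
function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT_V00.exec(header.trim());
  if (!m) return null;
  const [, traceId, parentId, flags] = m;
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceparent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.parentId}-${tp.sampled ? '01' : '00'}`;
}

// Example value from the W3C Trace Context specification.
const tp = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(tp?.sampled); // true
```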
To make the waterfall trustworthy, you need to understand what those 40ms of browser work actually are: main-thread blocking, layout, and paint. The mental model lives in how the browser renders a page.
Sampling and cost control
RUM is typically billed by event volume, and a popular site generates a flood. The goal is to keep statistically valid signal while throwing away redundant data. The art is sampling the *boring* and keeping the *interesting*.
Sample by category, not uniformly
Sample healthy Web Vitals at, say, 10%; at scale, the aggregate percentiles stay accurate. But keep **100% of errors** and **100% of error-session replays**: those are rare and each one is precious. A flat 10% across everything throws away nine out of ten of your most valuable signals.
- Head sampling in the SDK for vitals: cheap and decided before sending, so client overhead and bills stay low.
- Tail sampling at the collector for traces: decide *after* seeing the whole trace, so you keep every slow or errored one and drop the fast-and-boring.
- Always-keep rules for errors, slow INP/LCP outliers, and any session that hit an exception.
- Cap per-session events so one runaway error loop can't beacon 10,000 events and blow the budget.
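Together, these rules reduce to one keep/drop decision per event in the SDK and one per finished trace at the collector. The sketch below makes several assumptions, all illustrative: the event shape, the 10% and 1% rates, the 200-event cap, the 1-second slow-trace cutoff, and the 5% trace baseline. The LCP/INP cutoffs reuse the Core Web Vitals "good" thresholds (2.5s and 200ms). Hashing the session id (FNV-1a here) rather than calling `Math.random()` keeps each session consistently in or out of the vitals sample.

```typescript
// Hypothetical shape of a candidate event at the SDK.
interface SampleInput {
  kind: 'vital' | 'error' | 'replay';
  sessionId: string;
  sessionHadError: boolean;
  name?: string; // e.g. 'LCP' or 'INP'
  value?: number; // metric value in ms
}

const VITALS_RATE = 0.1; // assumed: keep vitals from 10% of healthy sessions
const REPLAY_RATE = 0.01; // assumed: record 1% of ordinary sessions
const SESSION_CAP = 200; // assumed: max events any one session may send
const SLOW_MS: Record<string, number> = { LCP: 2500, INP: 200 }; // outliers above these are always kept

const sentPerSession = new Map<string, number>();

// 32-bit FNV-1a hash, so a given session id always lands on the same side of the sample.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function inSample(sessionId: string, rate: number): boolean {
  return fnv1a(sessionId) / 2 ** 32 < rate;
}

// Head sampling in the SDK: decided before anything is sent.
function shouldSend(e: SampleInput): boolean {
  const sent = sentPerSession.get(e.sessionId) ?? 0;
  if (sent >= SESSION_CAP) return false; // runaway-loop guard applies to everything

  let keep: boolean;
  if (e.kind === 'error' || e.sessionHadError) {
    keep = true; // always keep errors and anything from an errored session
  } else if (e.kind === 'vital') {
    const limit = e.name !== undefined ? SLOW_MS[e.name] : undefined;
    const isSlowOutlier = limit !== undefined && e.value !== undefined && e.value > limit;
    keep = isSlowOutlier || inSample(e.sessionId, VITALS_RATE);
  } else {
    keep = inSample(e.sessionId, REPLAY_RATE);
  }
  if (keep) sentPerSession.set(e.sessionId, sent + 1);
  return keep;
}

// Tail sampling at the collector: decided once the whole trace is known.
function keepTrace(t: { durationMs: number; hasError: boolean }, slowMs = 1000, baseRate = 0.05): boolean {
  return t.hasError || t.durationMs > slowMs || Math.random() < baseRate;
}
```

In a real deployment, the tail rule usually lives in collector configuration rather than code. The OpenTelemetry Collector's tail-sampling processor, for example, expresses status-code, latency, and probabilistic policies declaratively.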
What to alert on (and what not to)
Alert on user-centric symptoms, not internal vanity metrics. The discipline of alerting on symptoms over causes is the same one covered in security logging and monitoring: page a human only when real users are hurting.
- INP p75 crosses 200ms: responsiveness is degrading and interactions feel laggy.
- LCP p75 crosses 2.5s: the page feels slow to load on at least a quarter of visits.
- JS error rate spikes vs the 7-day baseline: usually a regression in the last deploy; cross-reference the release id.
- A new error type appears post-deploy: even at low volume, a brand-new signature right after a release is a rollback flag.
- Crash-free session rate drops below SLO (e.g. < 99.5%): your top-line frontend health number.
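Here is a minimal sketch of the first four rules as a pure function over two summary windows: the current one and a 7-day baseline. The `WindowStats` shape, the function name, and the 2x spike factor are hypothetical. The 200ms and 2.5s limits are the thresholds from the list. In practice these checks usually live as queries in your RUM or alerting backend rather than in application code.

```typescript
// Hypothetical per-window summary produced by the RUM backend.
interface WindowStats {
  inpP75Ms: number;
  lcpP75Ms: number;
  errorRate: number; // errors per session
  errorSignatures: Set<string>; // grouped error fingerprints
}

interface Alert {
  rule: string;
  detail: string;
}

const INP_P75_LIMIT_MS = 200;
const LCP_P75_LIMIT_MS = 2500;
const ERROR_SPIKE_FACTOR = 2; // assumed: alert at 2x the 7-day baseline

function evaluateAlerts(current: WindowStats, baseline: WindowStats): Alert[] {
  const alerts: Alert[] = [];
  if (current.inpP75Ms > INP_P75_LIMIT_MS) {
    alerts.push({ rule: 'inp-p75', detail: `INP p75 ${current.inpP75Ms}ms > ${INP_P75_LIMIT_MS}ms` });
  }
  if (current.lcpP75Ms > LCP_P75_LIMIT_MS) {
    alerts.push({ rule: 'lcp-p75', detail: `LCP p75 ${current.lcpP75Ms}ms > ${LCP_P75_LIMIT_MS}ms` });
  }
  if (current.errorRate > baseline.errorRate * ERROR_SPIKE_FACTOR) {
    alerts.push({ rule: 'error-spike', detail: `error rate ${current.errorRate} vs baseline ${baseline.errorRate}` });
  }
  // Any signature absent from the pre-deploy baseline is a rollback flag, even at low volume.
  for (const sig of current.errorSignatures) {
    if (!baseline.errorSignatures.has(sig)) alerts.push({ rule: 'new-error', detail: sig });
  }
  return alerts;
}
```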
Turn metrics into SLOs
An SLO like '99% of sessions have good LCP over 28 days' gives you an error budget. Spend it on shipping fast; when it's burning too quickly, the burn-rate alert tells you to slow down and fix. That's how raw RUM data becomes an engineering decision instead of a noisy dashboard nobody reads.
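The arithmetic behind a burn-rate alert is short. Burn rate is the observed bad fraction divided by the error budget, which is 1 minus the SLO target. The sketch below shows the calculation; the function names and counts are hypothetical.

```typescript
// Fraction of sessions allowed to be "bad" under the SLO, e.g. 0.01 for 99%.
function errorBudget(sloTarget: number): number {
  return 1 - sloTarget;
}

// Budget burn speed: 1 spends exactly the whole budget over the SLO window,
// 2 spends it in half the window, and so on.
function burnRate(badSessions: number, totalSessions: number, sloTarget: number): number {
  if (totalSessions === 0) return 0;
  return badSessions / totalSessions / errorBudget(sloTarget);
}

// Days until the budget is gone if the current burn rate holds.
function daysToExhaustion(rate: number, windowDays: number): number {
  return rate > 0 ? windowDays / rate : Infinity;
}

// Hypothetical last hour: 600 of 20,000 sessions lacked good LCP, against a 99% / 28-day SLO.
const rate = burnRate(600, 20_000, 0.99);
console.log(rate.toFixed(1)); // 3.0
console.log(daysToExhaustion(rate, 28).toFixed(1)); // 9.3
```

At a burn rate of 3, a 28-day budget is gone in about 9 days. That pace is the signal to slow down and fix.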
Common mistakes that cost hours
- Averaging Web Vitals. The mean hides the suffering tail. Always report p75/p95.
- Sending vitals on load. INP and CLS aren't final yet, so you ship wrong numbers. Flush on visibilitychange → hidden.
- Forgetting `unhandledrejection`. Uncaught promise rejections are a large share of real-world errors, and onerror never sees them.
- Shipping source maps to the CDN. Stack traces become readable, for attackers too. Upload privately, delete from the bundle.
- Recording replays without redaction. That's a compliance incident waiting to happen. Mask inputs by default, before sending.
- Flat-sampling everything. You throw away your rarest, most valuable errors. Keep 100% of errors, sample the healthy traffic.
- Not stamping the release id. Without it you can't tie a spike to a deploy or pick the right source map.
- Blocking navigation with synchronous reporting. Use sendBeacon / keepalive so telemetry never slows the user.
Takeaways
The whole article in nine lines
- Synthetic is the lab (regression gates); RUM is the field (real truth). Run both.
- Measure with the web-vitals library; INP replaced FID in 2024 for responsiveness.
- Report percentiles (p75/p95), never averages; the tail is where users suffer.
- Catch errors with `onerror` **and** `unhandledrejection`; de-dupe and beacon them.
- Upload source maps privately so stack traces are readable; never serve .map files publicly.
- Session replay is gold for root cause, but redact PII client-side before sending.
- Propagate `traceparent` so one trace spans click → API → DB.
- Sample healthy traffic; keep 100% of errors and error-session replays.
- Alert on user symptoms (INP, LCP, error rate) and wrap them in SLOs with error budgets.
Where to go next
Observability tells you *that* something is slow or broken; the next step is knowing *why* and fixing it. Start with the metrics themselves, then go deeper into the rendering pipeline, then connect it to the broader monitoring discipline.