DORA Metrics: Measuring Delivery Performance

Why "are we shipping well?" is so hard to answer

Someone in leadership asks: "Is our delivery getting faster or slower?" The room goes quiet. One person says deploys feel slow. Another says they feel fine. A third pulls up a Jira burndown that measures something else entirely. Everyone has a vibe; nobody has a number. The conversation ends with "let's circle back" and nothing changes.

DORA metrics end that argument. They come out of the DevOps Research and Assessment program (the team behind the *Accelerate* book and the annual *State of DevOps* reports), which spent years correlating engineering practices with business outcomes. The headline finding: four metrics, and only four, reliably separate high-performing teams from the rest. They are simple to define, hard to fake when measured honestly, and they pull in the same direction as the things you actually care about: speed and stability.

Who this is for

Senior engineers, tech leads, and platform/SRE folks who want to measure delivery performance with numbers instead of vibes. You should already know what a CI/CD pipeline and a production incident are. No statistics background needed: if you can run a git command and read a SQL query, you're set.

What DORA actually measures

DORA's four keys answer two questions at once: how fast can you deliver change (throughput), and how safely (stability)? Elite teams are good at both; speed and stability are not a trade-off.
Paraphrasing the Accelerate / State of DevOps findings

The most common mistake is treating speed and stability as a dial you slide between: "we could ship faster if we accepted more breakage." DORA's data says the opposite: the same practices (small batches, automated testing, fast pipelines) make you both faster *and* safer. The four metrics split neatly into those two halves.

Car gauge	DORA metric
Speedometer: how fast are you going?	Deployment frequency: how often you ship to prod
Time from key-turn to highway speed	Lead time for changes: commit to production
How often the trip ends in a breakdown	Change failure rate: % of deploys that break prod
How long the breakdown keeps you stranded	Recovery time (MTTR): time to restore service

DORA is your delivery dashboard: the same gauges every car has.

The picture: one change, four measurements

All four metrics are timestamps and counts pulled from the same journey: a commit travels through a pipeline, deploys to production, and occasionally causes an incident that someone has to recover from. The diagram below overlays each metric on the point in the flow where it's measured.

A single change flowing from commit to deploy to incident, with the four DORA metrics overlaid on where each is measured.

1. Pick your two source systems. Almost everything comes from version control (git) together with your CI/CD tool. Incidents come from your alerting/on-call tool (PagerDuty, Opsgenie) or an incidents table. You rarely need anything else.
2. Define a "deploy" precisely. Agree on one signal that means "this reached production": a successful prod pipeline run, a GitHub deployment event, or a release tag. Every metric anchors to this definition, so write it down.
3. Capture commit and deploy timestamps. Lead time = deploy time minus the timestamp of the earliest commit in that release. Deploy frequency = count of prod deploy events per day/week.
4. Capture failures and recoveries. A deploy that triggers a rollback, hotfix, or incident is a failure. Recovery time = restored timestamp minus incident-start timestamp.
5. Aggregate over a rolling window. Report all four as a trailing 30- or 90-day rolling figure. Single data points are noise; the trend is the signal.
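
The counting and windowing steps can be sketched in a few lines of shell. This is a minimal illustration, not a finished collector: it assumes you can dump one production deploy timestamp per line in epoch seconds, and the function name `deploys_in_window` and the sample timestamps are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Count deploys inside a trailing window.
# stdin: one production deploy timestamp (epoch seconds) per line.
# $1: "now" in epoch seconds, $2: window length in days.
deploys_in_window() {
  local now_epoch=$1 window_days=$2
  awk -v now="$now_epoch" -v days="$window_days" '
    $1 > now - days * 86400 && $1 <= now { n++ }
    END { print n + 0 }'
}

# Hypothetical deploy log: 1, 10, 29, 31, and 45 days before "now".
now=1700000000
printf '%s\n' 1699913600 1699136000 1697494400 1697321600 1696112000 |
  deploys_in_window "$now" 30   # prints 3: the 31- and 45-day-old deploys fall outside
```

Passing "now" in as an argument, rather than calling `date` inside the function, keeps the window reproducible when you recompute last month's figure.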

The four keys at a glance

Here are the four metrics, how each is measured, and what DORA's "elite" tier looks like. Use elite as a north star, not a stick; most teams should aim to move *up one tier* before chasing the top.

Metric	How it's measured	What "elite" looks like
Deployment frequency	Count of successful production deploys per unit of time	On-demand, multiple deploys per day
Lead time for changes	Time from code committed to that code running in production	Less than one day
Change failure rate	% of deploys that cause a failure needing remediation (rollback/hotfix/incident)	0–15%
Recovery time (MTTR)	Time from a failed deploy / incident start to service restored	Less than one hour

The four DORA metrics: definition, measurement, and the elite benchmark.

Pro tip

The first two metrics (frequency, lead time) measure **throughput**; the last two (failure rate, recovery) measure **stability**. A healthy team improves both halves together. If one half races ahead while the other rots, you're optimizing in isolation.

Computing the metrics from CI + git data

You don't need a commercial dashboard to start. If your deploys and commits land in a database (or you export CI logs to one), two SQL queries get you the throughput half. Assume a deployments table with one row per successful deploy and three columns: environment, deployed_at (the deploy time), and first_commit_at (the earliest commit time for that release). The queries use PostgreSQL syntax.

dora_throughput.sql

sql

-- Deployment frequency: prod deploys per day over the last 30 days
SELECT
  date_trunc('day', deployed_at) AS day,
  count(*)                       AS deploys
FROM deployments
WHERE environment = 'production'
  AND deployed_at >= now() - interval '30 days'
GROUP BY day
ORDER BY day;

-- Lead time for changes: median commit -> deploy, last 30 days
-- first_commit_at = timestamp of the earliest commit in the release
SELECT
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY extract(epoch FROM deployed_at - first_commit_at) / 3600
  ) AS median_lead_time_hours
FROM deployments
WHERE environment = 'production'
  AND deployed_at >= now() - interval '30 days';

No database yet? You can approximate lead time straight from git and your CI tool with a few shell commands. The script compares a commit timestamp from the release with the time the deploy job finishes. It's crude, but enough to see a trend on day one.

lead_time.sh

bash

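#!/usr/bin/env bash
set -euo pipefail

# Earliest commit time in this release (epoch seconds, committer date).
# RELEASE_BASE is the previously deployed ref; defaults to the commit before HEAD.
commit_epoch=$(git log --format=%ct "${RELEASE_BASE:-HEAD~1}..HEAD" | sort -n | sed -n '1p')

if [[ -z "${commit_epoch}" ]]; then
  echo "no commits between release base and HEAD; skipping lead time" >&2
  exit 0
fi

# Deploy time = now (run this at the end of your prod deploy job)
deploy_epoch=$(date +%s)

lead_seconds=$(( deploy_epoch - commit_epoch ))
lead_hours=$(awk -v s="${lead_seconds}" 'BEGIN { printf "%.1f", s / 3600 }')

echo "lead_time_hours=${lead_hours}"

# Emit a metric your monitoring stack can scrape/aggregate.
# Skipped when METRICS_URL is unset, so `set -u` does not abort the deploy job.
if [[ -n "${METRICS_URL:-}" ]]; then
  curl -fsS -X POST "${METRICS_URL}/dora/lead_time" \
    -d "value=${lead_hours}" -d "service=${SERVICE_NAME:-unknown}" || true
fi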
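
The stability half can be approximated the same way without a database. As a minimal sketch, assume you can export one line per production deploy as `deploy_id,needed_remediation`, where the second field is 1 if the deploy needed a rollback, hotfix, or incident and 0 otherwise. That CSV layout and the function name `change_failure_rate` are assumptions for illustration, not something a CI tool emits by default.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Change failure rate as a percentage, one decimal place.
# stdin: CSV lines "deploy_id,needed_remediation" (1 = needed remediation, 0 = clean).
change_failure_rate() {
  awk -F, '
    NF >= 2 { total++; if ($2 == 1) failed++ }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * failed / total
    }'
}

# Hypothetical sample: 8 deploys, 1 of which needed remediation.
printf '%s\n' d1,0 d2,0 d3,1 d4,0 d5,0 d6,0 d7,0 d8,0 | change_failure_rate   # prints 12.5
```

The denominator is every production deploy in the window, not only the ones someone remembered to log as risky, which is what keeps the percentage honest.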

Watch out

Pick a **median or 90th percentile**, never a mean. One stuck release that sat in review for three weeks will drag a mean lead time into nonsense while the median stays honest. The same goes for recovery time: a single average blurs your typical recovery and your worst incidents into one number, so report the median alongside the p90.
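
To see the effect with numbers, here is a small sketch that computes both statistics over the same data. The lead times are made up: six ordinary releases plus one that sat in review for three weeks (504 hours). The helper names `median` and `mean` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Median of newline-separated numbers on stdin.
median() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Arithmetic mean of newline-separated numbers on stdin.
mean() {
  awk '{ sum += $1 } END { if (NR == 0) exit 1; print sum / NR }'
}

# Hypothetical lead times in hours; the last release was stuck for three weeks.
lead_times=(4 6 5 7 3 5 504)
printf '%s\n' "${lead_times[@]}" | median   # prints 5
printf '%s\n' "${lead_times[@]}" | mean     # prints 76.2857
```

One outlier moves the mean from about five hours to more than three days, while the median still describes what a typical release experienced.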

Improving each metric, without gaming it

The genius and the danger of DORA is that all four are easy to move *on paper*. Make deploys tiny and trivial and your frequency soars. Stop counting hotfixes as failures and your failure rate drops to zero. Redefine "recovered" loosely and MTTR shrinks. Each of these makes the dashboard greener while delivery gets no better, or gets worse. Real improvement comes from changing the *system*, not the *counting*.

Deployment frequency: make deploys boring

The honest lever is batch size. Smaller, more frequent merges to main reduce risk per deploy and naturally raise frequency. Pair this with trunk-based development and feature flags so half-finished work can ship dark. The gaming version, splitting one change into ten no-op deploys, inflates the number without shipping value, so anchor frequency to deploys that contain real merged changes.

Lead time: shorten the wait, not the work

Most lead time is *waiting*, not *working*: code sitting in review, builds queuing, manual approval gates. Attack the queue with faster CI/CD pipelines, smaller PRs, and automated tests replacing manual sign-off. The anti-pattern is measuring lead time from PR-open instead of first-commit, which quietly hides the days work sat on a branch. Measure from the first commit.
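
A quick worked example shows how much the starting point matters. The timestamps below are hypothetical epoch seconds for a single change, and `days_between` is an illustrative helper rather than part of any tool.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Days between two epoch-second timestamps, one decimal place.
days_between() {
  awk -v s="$1" -v e="$2" 'BEGIN { printf "%.1f", (e - s) / 86400 }'
}

# Hypothetical timeline for one change.
first_commit_at=1700000000
pr_opened_at=1700259200    # 3 days after the first commit
deployed_at=1700345600     # 1 day after the PR was opened

echo "from_pr_open_days=$(days_between "$pr_opened_at" "$deployed_at")"          # 1.0
echo "from_first_commit_days=$(days_between "$first_commit_at" "$deployed_at")"  # 4.0
```

Measured from PR-open, this change looks like a one-day delivery; measured from the first commit, it took four days, and three of them never show up on the dashboard.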

Change failure rate: fewer failures, honestly counted

Lower it with better tests, progressive delivery (canary catches bad releases before full rollout), and strong pre-prod environments. The gaming version is reclassifying incidents as "not really a failure" or excluding hotfixes. Define a failure once, in writing (*any* deploy needing remediation), and never relitigate it during a bad week.

Recovery time (MTTR): practice getting back up

The biggest lever is the ability to roll back instantly: a one-command revert or an automated rollback on health-check failure beats any heroic forward-fix. Good runbooks, clear ownership, and observability that points at the cause all shave minutes. Don't game it by marking incidents "resolved" the moment the alert clears while users are still hurting; anchor "recovered" to a real service-health signal.
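
An automated rollback on health-check failure can be sketched as a small wrapper around three commands you already have. This is an illustration under assumptions, not a drop-in tool: `deploy_with_rollback` is a hypothetical function, and the deploy, health-check, and rollback commands are whatever your platform provides.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Deploy, poll a health check, and roll back if the service never turns healthy.
# The three commands are supplied by the caller; nothing here is tool-specific.
# Usage: deploy_with_rollback DEPLOY_CMD HEALTH_CMD ROLLBACK_CMD [ATTEMPTS] [DELAY_SECONDS]
deploy_with_rollback() {
  local deploy_cmd=$1 health_cmd=$2 rollback_cmd=$3
  local attempts=${4:-5} delay=${5:-10}
  local i

  if ! "$deploy_cmd"; then
    echo "deploy command failed" >&2
    return 1
  fi

  for (( i = 1; i <= attempts; i++ )); do
    if "$health_cmd"; then
      echo "healthy after ${i} check(s)"
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done

  echo "unhealthy after ${attempts} check(s); rolling back" >&2
  "$rollback_cmd"
  return 1
}
```

A deploy job might call it as `deploy_with_rollback ./deploy.sh ./healthcheck.sh ./rollback.sh 5 10`, where the three scripts are placeholders for your own. The health check should probe the same service-health signal you use to declare an incident recovered, so the rollback and the MTTR clock agree on what "healthy" means.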

Don't game the metrics: measure the system, not the people

The fastest way to ruin DORA metrics is to turn them into individual performance targets. The moment "deploy frequency" lands on someone's review, you'll get deploy-frequency, and nothing else. This is Goodhart's Law: when a measure becomes a target, it stops being a good measure. DORA metrics are diagnostics for a *delivery system*, not scorecards for *humans*.

Report all four together. Any single metric in isolation invites a trade-off that wrecks another; fast deploys with a soaring failure rate are not progress.
Track trends, not absolutes. "We cut lead time 40% this quarter" is the win. Hitting an arbitrary elite threshold is not.
Keep definitions stable. Changing what counts as a failure or a deploy mid-stream makes every comparison meaningless. Decide once, document it.
Tie metrics to outcomes, not bonuses. Use them to find bottlenecks and justify investment in tooling, never to rank engineers.

Common mistakes that cost hours

Using averages instead of percentiles. One outlier release or one bad incident skews the mean into a lie. Use median and p90.
Measuring lead time from PR-open. That hides the days a branch sat untouched. Measure from the first commit in the change.
Counting non-prod deploys. Staging deploys inflate frequency without shipping anything to users. Filter to the production environment only.
Quietly redefining "failure." Excluding hotfixes or reclassifying incidents during a rough week corrupts the one metric people watch most.
Turning the dashboard into a leaderboard. Per-team or per-person rankings guarantee gaming. Aggregate at the system level.
Chasing elite on day one. Moving up *one tier* is the realistic goal. Skip the vanity sprint to multiple-deploys-per-day on a codebase that isn't ready.

Takeaways

DORA metrics in eight lines

Four metrics, two halves: deployment frequency + lead time (throughput), change failure rate + recovery time (stability).
Speed and stability are not a trade-off; the same practices improve both.
All four come from two systems you already have: git/CI and your incident tool.
Define a "deploy" and a "failure" once, in writing, and never move the goalposts.
Use medians and p90, never means; outliers ruin averages.
Report all four together and track trends, not absolute thresholds.
Improve the system (smaller batches, faster pipelines, instant rollback), not the counting.
Never make DORA metrics an individual performance target; Goodhart's Law will punish you.

Where to go next

DORA metrics measure the pipeline, so the next step is making that pipeline faster and safer. Start with the mechanics, then the deployment patterns that move failure rate and recovery time directly.

CI/CD Fundamentals: What a Pipeline Really Does. The throughput half of DORA lives in your pipeline; this is how it works.
Deployment Strategies: Blue-Green, Canary & Progressive Delivery. These are the patterns that cut change failure rate and recovery time.
Practice the plumbing in the CI/CD lab and the Git lab; both feed the timestamps your metrics depend on.
Follow the full DevOps Engineer career path to put measurement, pipelines, and delivery practices together.
