Vulnerability Management That Actually Works

On this page

The 10,000-CVE backlog
Think like an ER, not a to-do list
The lifecycle, end to end
Four signals, four different questions
Reachability: the multiplier nobody uses
Turn signals into SLAs
Filter the scanner output yourself
Patch, mitigate, or accept
Common mistakes that cost you months
Measure the program, not the noise
Where to go next

The 10,000-CVE backlog

Who this is for

Engineers and security folks who have *already* turned on scanning (Trivy, Snyk, Dependabot, a cloud posture tool) and are now drowning. You open the dashboard, see five-digit numbers, and have no honest way to decide what to touch first. This article is about prioritization and process, not how to run a scanner.

Here is the uncomfortable math. A typical container image plus its OS packages will surface hundreds to thousands of CVEs. Multiply by every service you ship and you are staring at tens of thousands of findings. Your team can realistically remediate a few dozen per sprint. So the game is not *finding* vulnerabilities; that part is solved and automated. The game is deciding which 1% actually matter and getting those fixed before an attacker notices.

Most teams fail here in the same way: they sort by CVSS, start at the 'Criticals', and grind. Six months later the backlog is *bigger*, morale is gone, and the one bug that actually got exploited was a CVSS 6.5 nobody looked at. Let's fix the method.

Vulnerability management is not a scan you run. It is a decision system that converts a stream of CVEs into a ranked, owned, time-bound list of work.

Think like an ER, not a to-do list

An emergency room never treats patients in the order they arrived, and it never treats them by how loudly they complain. It triages: a quiet patient turning blue jumps the queue ahead of a screaming one with a sprained wrist. Vulnerability management is the same discipline applied to CVEs.

In the ER	In vulnerability management
The waiting room full of patients	Your raw scanner backlog of CVEs
How serious the injury *could* be	CVSS base score (severity ceiling)
Vital signs: is this getting worse right now?	EPSS: probability it gets exploited soon
Ambulances already inbound with this exact case	CISA KEV: it is being exploited in the wild today
Is the injured limb one the patient actually uses?	Reachability: is the vulnerable code path even called?
Triage nurse assigning a color tag	Your prioritization policy assigning an SLA

ER triage maps almost perfectly onto vulnerability prioritization.

The mental shift

CVSS tells you how bad a wound *could* be. It does not tell you who is bleeding. You need the vital signs (EPSS), the inbound ambulances (KEV), and whether the patient even uses that limb (reachability) before you spend a doctor on it.

The lifecycle, end to end

Scanning is one box in a six-box loop. A program is the whole loop running continuously, with an owner for each box. If any box is missing, findings either never get prioritized, never get assigned, or never get verified, and the backlog quietly rots.

The vulnerability-management lifecycle, a continuous loop, not a one-shot scan.

1. Discover. Continuously inventory what you run and what is in it. Scanners and an SBOM give you the raw findings; you cannot manage what you cannot see.
2. Enrich & prioritize. Join each CVE to threat intel (EPSS, KEV) and to your own environment (is it reachable? internet-facing? holding PII?). This is where the firehose becomes a shortlist.
3. Assign. Route each prioritized finding to the team that owns the asset, with an SLA attached. An unowned finding is a finding nobody fixes.
4. Remediate. Pick one of three outcomes: patch, mitigate, or formally accept. 'Ignore' is not on the menu.
5. Verify. Rescan and confirm the finding is gone. Trust nothing until the scanner agrees the version moved.
6. Report. Track MTTR and backlog age over time. Metrics tell you whether the program is winning or drowning, and they feed the next discovery cycle.
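
One way to make the loop concrete is to model each finding as a record that can only move forward through the six stages. The sketch below is illustrative only: the `Stage` and `Finding` names, and the rule that a finding cannot reach the assign stage without an owner, are assumptions layered on the lifecycle above, not a standard schema. The CVE id is a placeholder.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """The six lifecycle boxes, in loop order (hypothetical names)."""
    DISCOVERED = 1
    PRIORITIZED = 2
    ASSIGNED = 3
    REMEDIATED = 4
    VERIFIED = 5
    REPORTED = 6


@dataclass
class Finding:
    cve_id: str
    asset: str
    stage: Stage = Stage.DISCOVERED
    owner: Optional[str] = None
    history: List[Stage] = field(default_factory=lambda: [Stage.DISCOVERED])

    def advance(self, owner: Optional[str] = None) -> Stage:
        """Move to the next stage; refuse to enter ASSIGNED without an owner."""
        if self.stage is Stage.REPORTED:
            raise ValueError("finding already completed the loop")
        nxt = Stage(self.stage + 1)
        if nxt is Stage.ASSIGNED:
            if not owner:
                raise ValueError("an unowned finding cannot be assigned")
            self.owner = owner
        self.stage = nxt
        self.history.append(nxt)
        return nxt


finding = Finding("CVE-2099-0001", asset="api")  # placeholder id
finding.advance()                       # DISCOVERED -> PRIORITIZED
finding.advance(owner="team-payments")  # PRIORITIZED -> ASSIGNED
print(finding.stage.name, finding.owner)
```

Encoding the ownership rule in the transition itself means a finding cannot silently skip the assign box, which is the failure the lifecycle warns about.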

Four signals, four different questions

The core insight: CVSS, EPSS, KEV, and reachability answer different questions. They are not competing scores to choose between; they are layers you stack. CVSS is the only one most teams use, and it is the weakest for prioritization because the base score is a static measure of *potential* impact, assigned when the CVE is published, with no knowledge of your environment or of what attackers are actually doing.

Signal	Answers	Source / range	Weakness alone
CVSS	How bad could this be?	FIRST, 0.0–10.0 static	Ignores real-world exploitation; ~60% of CVEs are 'High/Critical'
EPSS	How likely is exploitation in 30 days?	FIRST, 0–1 probability, daily	Probabilistic, not a guarantee; needs a threshold
CISA KEV	Is it being exploited right now?	CISA catalog, yes/no, updated often	Binary; absence is not proof of safety
Reachability	Is the vulnerable code path used here?	SCA tool / call-graph analysis	Tool-dependent; not all ecosystems supported

Each signal answers a question the others cannot. Use them together.

Why CVSS alone over-counts

Most CVEs never get a public exploit, and only a tiny fraction are ever exploited at scale. EPSS data consistently shows that the *vast* majority of high-CVSS CVEs have a low probability of exploitation. If you patch by CVSS, you spend most of your effort on bugs no attacker will ever touch, while a 'Medium' on the KEV list sits open.
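
A tiny worked example shows how much the queue changes once the other signals are stacked on top of CVSS. The four findings below are invented for illustration (placeholder CVE ids, made-up scores), and the sort key is just one reasonable way to combine the signals, not a prescribed formula.

```python
# Hypothetical findings: (cve_id, cvss, epss, on_kev). All values are invented.
findings = [
    ("CVE-2099-0001", 9.8, 0.002, False),
    ("CVE-2099-0002", 6.5, 0.610, True),
    ("CVE-2099-0003", 8.1, 0.350, False),
    ("CVE-2099-0004", 7.5, 0.001, False),
]

# The classic queue: highest CVSS first.
by_cvss = sorted(findings, key=lambda f: -f[1])

# Stacked signals: KEV first, then EPSS descending, CVSS only as a tie-breaker.
by_risk = sorted(findings, key=lambda f: (not f[3], -f[2], -f[1]))

print("CVSS order:", [f[0] for f in by_cvss])
print("Risk order:", [f[0] for f in by_risk])
```

In the CVSS ordering the known-exploited 6.5 is worked last; in the stacked ordering it is worked first, and the 9.8 with a near-zero EPSS drops to third.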

Reachability: the multiplier nobody uses

A CVE in a dependency only matters if your code actually executes the vulnerable function. If you import a sprawling library but only call two helpers, a CVE in a code path you never touch is, for you, not exploitable. Reachability analysis (call-graph / data-flow based, offered by modern SCA tools) tells you whether the vulnerable symbol is on a path your application can reach.

This is the single biggest backlog reducer most teams have never turned on. In practice it can mark a large share of dependency findings as not-reachable, letting you defer them honestly rather than ignoring them blindly. It pairs naturally with the deeper SCA topic; see DevSecOps: SAST, DAST, SCA for how the scanners themselves work.

Reachability is a deferral signal, not a dismissal

'Not currently reachable' can flip the moment someone adds a new call site. Treat unreachable findings as *lower priority with a recheck on every build*, never as permanently closed. Reachability lowers urgency; it does not delete the vulnerability.
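
To see why reachability has to be rechecked on every build, here is a toy version of the idea: a breadth-first search over a call graph. This is a simplified sketch, not how any particular SCA tool works; real analyzers also have to handle dynamic dispatch, reflection, and data flow. The function names and the graph are hypothetical.

```python
from collections import deque


def is_reachable(call_graph, entry_points, vulnerable_symbol):
    """Breadth-first search: can any entry point reach the vulnerable symbol?"""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn == vulnerable_symbol:
            return True
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "app.main": ["app.handle_upload", "lib.parse_json"],
    "app.handle_upload": ["lib.save_file"],
    "lib.parse_yaml": ["lib.unsafe_load"],  # vulnerable path, but nobody calls parse_yaml
}

before = is_reachable(call_graph, ["app.main"], "lib.unsafe_load")

# One new call site in the next commit flips the answer.
call_graph["app.handle_upload"].append("lib.parse_yaml")
after = is_reachable(call_graph, ["app.main"], "lib.unsafe_load")

print(before, after)
```

The same CVE goes from deferred to urgent without the dependency changing at all, which is why an unreachable verdict should expire with each build.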

Turn signals into SLAs

Prioritization has to be a *policy*, not a vibe. Write down a small decision table that maps the combination of signals to a remediation SLA, publish it, and apply it mechanically. The point is to make the urgent obvious and to give teams air cover to *deprioritize* the rest without guilt.

Severity	On KEV?	Reachable?	SLA to remediate
Any	Yes	Yes / unknown	Drop everything, 24–72 hours
Critical / High	No	Yes (high EPSS)	7 days
Critical / High	No	No	30 days, recheck each build
Medium	No	Yes	30–60 days
Medium / Low	No	No	Batch quarterly or accept

An example prioritization policy. Tune the windows to your risk appetite, but write it down.

KEV is your trump card

If a CVE is on the CISA Known Exploited Vulnerabilities catalog, severity and reachability become tie-breakers, not gates. Known-exploited means attackers already have working code. That row jumps to the top no matter what CVSS says.
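
Applying the policy mechanically is easier when it lives in code. The function below is a minimal sketch of the example table, returning the upper bound of each window in days. The table leaves a few combinations open, so the sketch fills them with assumptions that are marked in the comments: unknown reachability is treated as reachable, a KEV finding gets the top tier even when unreachable (per the callout above), and a reachable Critical/High with low EPSS falls into the 30-day row. The function name and the 0.10 default threshold are illustrative.

```python
def sla_days(severity, on_kev, reachable, epss, epss_threshold=0.10):
    """Map the four signals to a remediation window in days.

    reachable is True, False, or None (unknown).
    """
    sev = severity.upper()
    if on_kev:
        # "Drop everything": upper bound of the 24-72 hour window.
        # Assumption: KEV wins even when the finding looks unreachable.
        return 3
    # Assumption: unknown reachability is treated as reachable.
    maybe_reachable = reachable is not False
    if sev in ("CRITICAL", "HIGH"):
        if maybe_reachable and epss >= epss_threshold:
            return 7
        # Unreachable, or (assumption) reachable with low EPSS.
        return 30
    if sev == "MEDIUM" and maybe_reachable:
        return 60  # upper bound of the 30-60 day window
    # Medium/Low and unreachable; assumption: reachable Low lands here too.
    return 90      # batch quarterly or formally accept


print(sla_days("MEDIUM", on_kev=True, reachable=None, epss=0.02))
print(sla_days("CRITICAL", on_kev=False, reachable=True, epss=0.45))
print(sla_days("HIGH", on_kev=False, reachable=False, epss=0.45))
```

Whatever windows you choose, keeping the gaps explicit in one function makes it obvious when a finding falls through a hole in the policy.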

Filter the scanner output yourself

You do not need a six-figure platform to start. A scanner that emits JSON plus the public EPSS and KEV feeds gets you most of the way. Here is a small Python filter that takes a Trivy report and the two feeds, and prints only the findings that are known-exploited or high-EPSS: the rows your policy says are urgent.

fetch-feeds.sh

bash

# Scan an image to JSON, then grab the public threat-intel feeds.
trivy image --format json -o report.json myorg/api:latest

# CISA KEV catalog (known-exploited CVEs).
# -f fails on HTTP errors instead of saving an error page; -L follows redirects.
curl -fsSL https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json \
  -o kev.json

# EPSS scores (exploit-prediction, one row per CVE).
# The "current" URL redirects to a dated file, so -L is required here.
curl -fsSL https://epss.cyentia.com/epss_scores-current.csv.gz | gunzip > epss.csv

prioritize.py

python

import csv
import json

EPSS_THRESHOLD = 0.10  # 10% chance of exploitation in the next 30 days

# --- load EPSS into {cve: probability} ---
# The feed typically starts with a comment line and a header row; keeping
# only rows whose first column is a CVE id skips both.
epss = {}
with open("epss.csv", newline="") as f:
    for row in csv.reader(f):
        if row and row[0].startswith("CVE-"):
            epss[row[0]] = float(row[1])

# --- load the KEV catalog into a set of CVE ids ---
with open("kev.json") as f:
    kev = {v["cveID"] for v in json.load(f)["vulnerabilities"]}

# --- walk the Trivy report and keep only what matters ---
with open("report.json") as f:
    report = json.load(f)

total = 0
urgent = []
# "or []" guards against keys that are present but null.
for result in report.get("Results") or []:
    for vuln in result.get("Vulnerabilities") or []:
        total += 1
        cid = vuln["VulnerabilityID"]
        score = epss.get(cid, 0.0)  # no EPSS entry (e.g. non-CVE ids) counts as 0
        on_kev = cid in kev
        # "Reachable" is a placeholder key, not something Trivy emits: merge it
        # in from your SCA tool, and treat a missing value as unknown.
        reachable = vuln.get("Reachable", "unknown")
        if on_kev or score >= EPSS_THRESHOLD:
            severity = vuln.get("Severity", "UNKNOWN")
            urgent.append((cid, severity, on_kev, round(score, 3), reachable))

urgent.sort(key=lambda r: (not r[2], -r[3]))  # KEV first, then EPSS descending
print(f"{len(urgent)} urgent of {total} findings (KEV or EPSS >= {EPSS_THRESHOLD})\n")
for cid, sev, on_kev, score, reach in urgent:
    flag = "KEV" if on_kev else f"epss={score}"
    print(f"{cid:18} {sev:9} {flag:12} reachable={reach}")

Run that against a real image and the five-digit backlog usually collapses to a one-screen list. *That* list is what you assign and attach SLAs to, not the raw report. The same idea wires straight into CI; see container image security scanning for gating builds on it.
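
As a rough idea of what the CI hook could look like, the sketch below turns the `urgent` rows from prioritize.py into a process exit code. The blocking rule (fail on any KEV finding, or on EPSS at or above a stricter cutoff) and the `fail_on_epss` value are assumptions for illustration, not a recommendation from any tool; pick a rule that matches your own SLA policy.

```python
import sys


def gate_exit_code(urgent, fail_on_epss=0.5):
    """Return 1 if any row should block the build, else 0.

    Rows use the same shape as prioritize.py:
    (cve_id, severity, on_kev, epss, reachable).
    """
    blockers = [row for row in urgent if row[2] or row[3] >= fail_on_epss]
    for cid, sev, on_kev, score, reach in blockers:
        why = "on KEV" if on_kev else f"epss={score}"
        print(f"BLOCKING {cid} ({sev}, {why}, reachable={reach})", file=sys.stderr)
    return 1 if blockers else 0


# Hypothetical rows with placeholder ids and invented scores.
rows = [
    ("CVE-2099-0002", "MEDIUM", True, 0.61, "unknown"),
    ("CVE-2099-0003", "HIGH", False, 0.35, True),
]
print(gate_exit_code(rows))

# In a pipeline step you would end with: sys.exit(gate_exit_code(urgent))
```

A nonzero exit code is all most CI systems need to fail the job, so the gate stays independent of any particular pipeline product.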

Patch, mitigate, or accept

Once a finding is prioritized, there are exactly three legitimate outcomes. 'Leave it open and hope' is not one. Every urgent finding must land in one of these buckets with a name and a date attached.

Patch: bump to a fixed version. Always the default. Cheapest long-term, and it actually removes the vulnerability instead of hiding it.
Mitigate: when you cannot patch now (no fix yet, breaking change, frozen release), reduce exploitability with a WAF rule, network policy, feature flag, or config change. This buys time; the underlying CVE is still there, so it keeps an SLA to patch later.
Accept: a formal, *time-boxed, signed-off* risk acceptance for low-risk findings (not reachable, low EPSS, not on KEV). Accepting risk is a legitimate decision; *silently ignoring* it is not. Record who accepted, why, and when it gets re-reviewed.
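
A formal acceptance is mostly a small record with an expiry date. The sketch below shows one possible shape; the `RiskAcceptance` class, its field names, and the 90-day default review window are hypothetical, so adapt them to whatever your ticketing or GRC system already stores.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RiskAcceptance:
    """Who accepted which finding, why, and when it must be re-reviewed."""
    cve_id: str
    asset: str
    accepted_by: str
    reason: str
    accepted_on: date
    review_after_days: int = 90  # assumption: quarterly re-review

    @property
    def expires_on(self) -> date:
        return self.accepted_on + timedelta(days=self.review_after_days)

    def is_expired(self, today: date) -> bool:
        """An expired acceptance puts the finding back in the queue."""
        return today >= self.expires_on


acceptance = RiskAcceptance(
    cve_id="CVE-2099-0004",  # placeholder id
    asset="api",
    accepted_by="jane.doe",
    reason="not reachable, low EPSS, not on KEV",
    accepted_on=date(2025, 1, 1),
)
print(acceptance.expires_on, acceptance.is_expired(date(2025, 3, 31)))
```

Making the record immutable (`frozen=True`) means extending an acceptance requires creating a new, newly signed-off record rather than quietly editing a date.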

Do not prioritize by CVSS alone

If your remediation queue is sorted purely by CVSS, you are working the wrong list. You will burn sprints on high-severity bugs with near-zero exploitation probability while a known-exploited 'Medium' stays open. CVSS sets the *ceiling* of impact; EPSS, KEV, and reachability decide the *order*. Sort by the combination, never by CVSS by itself.

Common mistakes that cost you months

Treating the scanner as the program. Buying Trivy or Snyk and calling it 'vulnerability management'. The scanner is one box of six; without ownership, SLAs, and verification it just generates guilt.
Sorting by CVSS and starting at the top. The classic failure. You optimize for severity instead of risk and never reach the bugs attackers actually use.
No owner per finding. A finding routed to 'security' or to nobody never gets fixed. Map every asset to a team *before* findings arrive.
Ignoring reachability. Patching dead code paths is real effort spent on zero real risk. Turn reachability on and defer the unreachable majority.
Accept-by-silence. Letting findings age out without a decision. If you are not going to fix it, *formally* accept it with an expiry; don't let it quietly rot.
No verification step. Marking things 'done' from the ticket without rescanning. Many 'fixed' findings come back because the base image never actually moved.
Vanity metrics. Reporting 'total findings' (which only ever goes up) instead of MTTR and backlog age, which tell you if you are actually winning.
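
The verification mistake is cheap to avoid: after the fix ships, rescan and check whether the finding is still in the report. Below is a minimal sketch that does this against an already-parsed Trivy-style JSON report. It matches on the `VulnerabilityID` and `PkgName` fields; the helper name and the two tiny sample reports are made up for illustration.

```python
def still_present(report, cve_id, pkg_name):
    """True if the parsed report still lists cve_id against pkg_name."""
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("VulnerabilityID") == cve_id and vuln.get("PkgName") == pkg_name:
                return True
    return False


# Hypothetical before/after scans of the same image (placeholder CVE id).
scan_before = {"Results": [{"Vulnerabilities": [
    {"VulnerabilityID": "CVE-2099-0002", "PkgName": "openssl"},
]}]}
scan_after = {"Results": [{"Vulnerabilities": None}]}

fixed = still_present(scan_before, "CVE-2099-0002", "openssl") and not still_present(
    scan_after, "CVE-2099-0002", "openssl"
)
print("verified fixed" if fixed else "still open, do not close the ticket")
```

Closing the ticket only when the rescan agrees is what catches the case where the dependency was bumped in the repo but the deployed image was never rebuilt.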

Measure the program, not the noise

Two metrics tell you almost everything. MTTR (mean time to remediate), measured separately per severity and ideally for KEV findings on their own, tells you how fast the loop turns. Backlog age (how long open findings have been sitting, and how many are past SLA) tells you whether you are gaining or losing ground. Total finding count is a vanity metric; it grows forever and tells you nothing about risk.
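
Both metrics fall out of three fields per finding: when it was opened, when it was closed (if ever), and its SLA. The sketch below assumes a simple list of dicts with those hypothetical keys and invented dates; in practice the data would come from your ticketing system, and you would filter by severity or KEV status before calling the functions to get the per-tier numbers.

```python
from datetime import date
from statistics import mean


def mttr_days(findings):
    """Mean days from open to close, over closed findings only."""
    durations = [(f["closed"] - f["opened"]).days for f in findings if f["closed"]]
    return mean(durations) if durations else None


def backlog(findings, today):
    """Count of open findings, age of the oldest, and how many are past SLA."""
    open_findings = [f for f in findings if not f["closed"]]
    ages = [(today - f["opened"]).days for f in open_findings]
    past_sla = sum(1 for f, age in zip(open_findings, ages) if age > f["sla_days"])
    return {"open": len(open_findings), "oldest_days": max(ages, default=0), "past_sla": past_sla}


# Invented sample data.
findings = [
    {"opened": date(2025, 1, 1), "closed": date(2025, 1, 4), "sla_days": 3},
    {"opened": date(2025, 1, 1), "closed": date(2025, 1, 12), "sla_days": 7},
    {"opened": date(2025, 1, 10), "closed": None, "sla_days": 30},
    {"opened": date(2025, 2, 20), "closed": None, "sla_days": 30},
]

print(mttr_days(findings))
print(backlog(findings, today=date(2025, 3, 1)))
```

Tracked week over week, a falling MTTR and a shrinking past-SLA count indicate the loop is keeping up even when the raw finding count keeps climbing.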

The whole article in seven lines

Finding CVEs is solved and automated; *prioritizing* them is the actual job.
CVSS measures potential impact; it is a weak priority signal on its own.
Stack four signals: CVSS (ceiling), EPSS (likelihood), KEV (exploited now), reachability (used here?).
KEV findings jump the queue regardless of CVSS; unreachable findings get deferred, not deleted.
Write prioritization down as a policy mapping signals to SLAs, then apply it mechanically.
Every urgent finding ends as patch, mitigate, or *formal* accept, never silent ignore.
Track MTTR and backlog age, not total finding count.

Where to go next

Vulnerability management sits in the middle of a wider security practice. Once your prioritization loop runs, push it left into the pipeline and back into the supply chain so fewer vulnerable artifacts reach production in the first place.

Container image security scanning: gate builds on the urgent-only list this article produces.
DevSecOps: SAST, DAST, SCA: the scanner families that feed discovery, and where reachability comes from.
Securing the software supply chain: SBOMs, provenance, and stopping vulnerable dependencies upstream.
Practice the pipeline mechanics in the CI/CD lab and the Docker lab.
