Site Reliability Engineer (SRE)
From Linux internals to running planet-scale systems: the complete path to becoming a job-ready Site Reliability Engineer.
Curated from the best — MDN · Kubernetes · AWS · OWASP · Google SRE & more
SREs command $180K-$350K+ at FAANG companies. The role is expanding as more companies realize they need production excellence, not just more features.
The complete path: 12 of 133 topics have lessons here; the other 121 are marked "learn anywhere". We won't pretend we cover everything.
Foundations: What SRE Is
Understand the discipline, its origins, and how SRE differs from and relates to DevOps and traditional ops.
Linux & Operating System Internals
The OS is the substrate of everything SRE. Master processes, memory, I/O, and the kernel boundary.
Networking Fundamentals
Distributed systems are networked systems. Know the stack from cables to TLS to load balancers.
Programming & Automation
SREs write software. Build solid coding skills plus the scripting that automates operations away.
Distributed Systems Theory
The mental models behind why large systems fail in surprising ways.
Cloud Platforms
Modern SRE runs on the cloud. Know the core service categories and at least one provider deeply.
Containers
Packaging and isolating workloads — the foundation under orchestration.
Kubernetes & Orchestration
The dominant orchestration platform. Operating it well is central to most SRE roles.
Infrastructure as Code & Config Mgmt
Declarative, version-controlled infrastructure is non-negotiable at scale.
CI/CD & Release Engineering
Safe, fast, repeatable delivery is how reliability ships to production.
Observability & Monitoring
You cannot operate what you cannot see. The instrumentation core of SRE.
SLIs, SLOs & Reliability Engineering
The quantitative heart of SRE: defining, measuring, and budgeting reliability.
Incident Management & On-Call
When things break, this is the SRE's defining moment. Respond, mitigate, and learn.
Capacity, Performance & Scalability
Ensuring systems have the headroom to serve load — and finding bottlenecks when they don't.
Resilience & Chaos Engineering
Designing systems that degrade gracefully and proving it deliberately.
Databases & Stateful Systems
Stateful services are the hardest to operate reliably — and where outages hurt most.
Security & Compliance
Reliability includes security. SREs own the operational side of keeping systems safe.
Platform Engineering & Service Mesh
Advanced infrastructure: SREs increasingly build internal platforms and run service meshes.
Automation, AIOps & Modern Practice
Where SRE is heading: heavy automation, ML-assisted ops, and running AI systems.
Career, Interviews & Soft Skills
Landing and thriving in an SRE role takes more than technical depth.
You're job-ready.
Clear every stage, earn the certificate, and walk into interviews prepared. The complete path: every topic listed, nothing hidden.