The gap between 'it runs' and 'it survives'
Every Kubernetes tutorial ends the same way: kubectl apply, a pod goes Running, you curl it, victory. Then you put it in production and discover everything the tutorial left out. One pod with a memory leak takes down its neighbours. A deploy goes out while a node is draining and you serve 502s. Traffic triples and nothing scales. A bad config gets applied cluster-wide because everything shares one namespace with god-mode access.
None of these are exotic. They're the default failure modes of a cluster run the way tutorials teach it. Production Kubernetes is mostly a handful of guardrails (resource requests and limits, health probes, autoscaling, disruption budgets, namespaces and RBAC) that the tutorial skipped because they're boring until the night they save you. This article covers those guardrails, explained and shown in real YAML.
Who this is for
Engineers who can deploy to Kubernetes but haven't yet operated it under real traffic, real incidents, and real teammates. You should know what a Pod, Deployment, and Service are; we cover the production layer on top.
The first guardrail: requests and limits
A pod with no resource requests is a pod the scheduler is guessing about. Requests are what the pod is guaranteed: the scheduler uses them to decide which node has room. Limits are the ceiling: exceed CPU and you get throttled; exceed memory and you get OOMKilled. Without them, one greedy pod can starve every other pod on its node, and the scheduler can pack a node past the point of stability.
| | Request | Limit |
|---|---|---|
| Means | Guaranteed minimum | Hard ceiling |
| Used by | The scheduler (placement) | The kubelet (enforcement) |
| Hit CPU ceiling | - | Throttled (slowed) |
| Hit memory ceiling | - | OOMKilled (restarted) |
| Set too low | Pod evicted under node pressure | Memory: OOMKilled under normal load; CPU: constantly throttled |
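To make the scheduling side of the table concrete, here is a toy Python model of the resource-fit check. It is a sketch under simplifying assumptions. The helper names are hypothetical, and only the `m` CPU suffix and the binary memory suffixes (`Ki`, `Mi`, `Gi`) are parsed. The real scheduler also runs many more filters than this one. What it shows is that placement is decided from the requests already promised on a node, not from limits or live usage.

```python
# Toy model of the scheduler's resource-fit check (hypothetical helpers).
# Placement compares the new pod's *requests* with what the node still has
# unpromised; limits and actual usage play no part in this decision.

MEM_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu_millicores(q: str) -> int:
    """'250m' -> 250, '2' -> 2000, '0.5' -> 500."""
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def parse_mem_bytes(q: str) -> int:
    """'256Mi' -> 268435456; plain integers are taken as bytes."""
    for suffix, factor in MEM_FACTORS.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


PARSERS = {"cpu": parse_cpu_millicores, "memory": parse_mem_bytes}


def fits(pod_requests: dict, node_allocatable: dict, scheduled: list) -> bool:
    """True if pod_requests fit in the node's allocatable minus requests already scheduled.

    A missing request counts as zero, which is exactly why a pod with no
    requests "fits" on any node, however full.
    """
    for resource, parse in PARSERS.items():
        promised = sum(parse(p.get(resource, "0")) for p in scheduled)
        free = parse(node_allocatable[resource]) - promised
        if parse(pod_requests.get(resource, "0")) > free:
            return False
    return True


node = {"cpu": "2", "memory": "4Gi"}
api_pod = {"cpu": "250m", "memory": "256Mi"}

print(fits(api_pod, node, [api_pod] * 7))  # True: 7 x 250m + 250m = 2000m, exactly full
print(fits(api_pod, node, [api_pod] * 8))  # False: no CPU left to promise
print(fits({}, node, [api_pod] * 8))       # True: a request-less pod always "fits"
```

The last call is the "scheduler is guessing" problem in miniature: with no requests declared, the node looks empty to placement no matter how busy it really is.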
Memory limits are not like CPU limits
CPU is compressible: over the limit, you're just throttled and slow. Memory is not: over the limit, the container is killed outright. Set memory requests and limits equal for critical workloads so the scheduler reserves exactly what the pod can use. The pod can still be OOMKilled if it outgrows its own limit, but it becomes one of the last candidates for eviction when a neighbour's growth pushes the node into memory pressure.
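A quick back-of-the-envelope sketch shows why equal memory requests and limits matter. The scheduler only guarantees that the sum of *requests* on a node fits, while the sum of *limits* can be far larger. The hypothetical function below compares the two for one node's memory. It uses MiB integers, and the pod sizes are made up for illustration.

```python
def memory_overcommit(allocatable_mi: int, pods: list[tuple[int, int]]):
    """pods is a list of (request_mi, limit_mi) pairs for the pods on one node.

    Returns (reserved_mi, worst_case_mi, overcommitted):
      reserved_mi   - what the scheduler promised (sum of requests)
      worst_case_mi - what the pods may use if all grow to their limits at once
      overcommitted - True if that worst case exceeds the node's memory
    """
    reserved = sum(request for request, _ in pods)
    worst_case = sum(limit for _, limit in pods)
    return reserved, worst_case, worst_case > allocatable_mi


# Eight pods with a 256Mi request but a 1Gi limit: the scheduler sees plenty of room...
print(memory_overcommit(4096, [(256, 1024)] * 8))  # (2048, 8192, True)

# ...whereas request == limit means nothing is promised that the node can't hold.
print(memory_overcommit(4096, [(512, 512)] * 8))   # (4096, 4096, False)
```

In the first case, if several pods grow toward their limits together, the node runs out of memory, and pods get evicted or OOMKilled even though none of them exceeded its own limit. In the second case, in this model, the only way to be OOMKilled is to outgrow your own limit.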
The second guardrail: liveness and readiness probes
These two probes sound similar but do opposite things, and confusing them causes some of the nastiest production incidents. Readiness answers "should this pod receive traffic right now?" If it fails, the pod is pulled from the Service's load-balancing pool but left running. Liveness answers "is this pod broken beyond recovery?" If it fails, the pod is killed and restarted.
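The difference in consequences is easiest to see by replaying the same probe results through both probe types. This is a simplified Python model, not kubelet code. The function name is hypothetical, and it uses the Kubernetes defaults of `failureThreshold: 3` and `successThreshold: 1`. It ignores timing, timeouts, and the restarted container's own startup delay.

```python
def replay_probe(kind, results, failure_threshold=3, success_threshold=1):
    """Return the actions a kubelet would take for a sequence of probe results.

    kind is "readiness" or "liveness"; results is a list of booleans (True = probe passed).
    """
    actions = []
    consecutive_fail = consecutive_ok = 0
    ready = True  # start from a healthy pod that is already receiving traffic
    for ok in results:
        if ok:
            consecutive_ok, consecutive_fail = consecutive_ok + 1, 0
        else:
            consecutive_ok, consecutive_fail = 0, consecutive_fail + 1

        if kind == "readiness":
            # Readiness only changes Service membership; the container keeps running.
            if ready and consecutive_fail >= failure_threshold:
                ready = False
                actions.append("remove from Service endpoints")
            elif not ready and consecutive_ok >= success_threshold:
                ready = True
                actions.append("add back to Service endpoints")
        elif kind == "liveness":
            # Liveness failure kills the container; the kubelet restarts it.
            if consecutive_fail >= failure_threshold:
                actions.append("restart container")
                consecutive_fail = 0  # the new container starts a fresh count
        else:
            raise ValueError(f"unknown probe kind: {kind!r}")
    return actions


stall = [True, False, False, False, True, True]
print(replay_probe("readiness", stall))  # ['remove from Service endpoints', 'add back to Service endpoints']
print(replay_probe("liveness", stall))   # ['restart container']
```

The probe results are the same and so is the number of failures, but the outcomes differ. Readiness sheds traffic and lets the pod recover in place, while liveness throws the running process away.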
The classic outage: liveness that's too aggressive
Point liveness at a heavy endpoint, or set the timeout too tight, and a brief slowdown makes the probe fail, so Kubernetes kills the pod. Under load, every pod slows, every probe fails, and the cluster restarts your entire fleet mid-traffic-spike. Keep liveness cheap and forgiving; use readiness for the strict checks.
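To see why forgiving liveness settings matter, the sketch below models a temporary stall (say, a GC pause or a slow dependency) as a window in which every probe fails. It reports when the first restart would happen. It is a deliberately simplified model with hypothetical names:

- Probes fire exactly every `period` seconds after `initial_delay`.
- A probe that would exceed `timeoutSeconds` simply counts as a failure.
- No jitter is modelled.

The "aggressive" settings are illustrative. The "forgiving" ones match the liveness probe in the Deployment in the next section.

```python
import math


def first_restart(stall_start, stall_end, initial_delay, period, failure_threshold, horizon=300):
    """Time (s) of the first liveness-triggered restart, or None if the pod survives.

    Every probe fired in [stall_start, stall_end) fails; all others succeed.
    A restart happens once failure_threshold probes fail in a row.
    """
    consecutive_failures = 0
    t = initial_delay
    while t < horizon:
        failed = stall_start <= t < stall_end
        consecutive_failures = consecutive_failures + 1 if failed else 0
        if consecutive_failures >= failure_threshold:
            return t
        t += period
    return None


# A 30-second slowdown from t=30s to t=60s.
print(first_restart(30, 60, initial_delay=5, period=5, failure_threshold=1))    # 30: killed on the first slow probe
print(first_restart(30, 60, initial_delay=15, period=20, failure_threshold=3))  # None: rides out the hiccup

# A pod that is genuinely wedged from t=30s onwards is still caught.
print(first_restart(30, math.inf, initial_delay=15, period=20, failure_threshold=3))  # 75
```

Under load the stall hits every replica at roughly the same moment. The aggressive version therefore restarts the whole fleet together, which is the restart storm described above.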
A production-grade Deployment
Here's a Deployment with all the guardrails wired in. Compare it to the three-line tutorial version: every extra block earns its place.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: payments
spec:
  replicas: 3
  selector:
    matchLabels: { app: api }   # required in apps/v1; must match the template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # add before removing: zero-downtime
  template:
    metadata:
      labels: { app: api }      # also what the PodDisruptionBudget selects on
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v2
          ports:
            - containerPort: 8080
          resources:
            requests: { cpu: "250m", memory: "256Mi" }
            limits: { cpu: "500m", memory: "256Mi" }   # mem = request
          readinessProbe:   # gate traffic
            httpGet: { path: /readyz, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:    # restart if wedged
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 3
            failureThreshold: 3   # forgiving, not trigger-happy
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
```

Note the deliberate asymmetry between the probes. Readiness checks often (every 5s, starting at 5s), so traffic is gated tightly. Liveness checks rarely (every 20s, starting at 15s, tolerating 3 failures), so a transient hiccup never triggers a needless restart. That single difference prevents the most common self-inflicted Kubernetes outage.
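The `strategy` block is worth checking with numbers. Kubernetes resolves a percentage `maxSurge` by rounding up and a percentage `maxUnavailable` by rounding down. It also rejects a strategy where both resolve to 0. The hypothetical helper below applies those rules to compute the floor and ceiling on pod count during a rollout. It assumes new pods only count as available once their readiness probe passes, which is what makes the floor meaningful.

```python
def _resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an int or a 'NN%' string into an absolute pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling/floor division avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%"):
    """Return (min_available, max_total) pods during a RollingUpdate."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas - unavailable, replicas + surge


print(rollout_bounds(3, max_surge=1, max_unavailable=0))  # (3, 4): the Deployment above
print(rollout_bounds(8))                                  # (6, 10): defaults drop 2 of 8
print(rollout_bounds(10))                                 # (8, 13)
```

With only three replicas, the 25% default happens to round down to 0 as well. The explicit `maxUnavailable: 0` therefore matters most once the Deployment grows, for example when the autoscaler scales it out.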
Autoscaling and disruption budgets
Three replicas is a guess. The Horizontal Pod Autoscaler adjusts the replica count to match load (scaling out when CPU climbs and back in when it's quiet), so you neither fall over at peak nor pay for idle capacity at 3am.
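The Kubernetes HPA documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). Scaling is skipped when that ratio is within a tolerance (0.1 by default), and the result is clamped to minReplicas/maxReplicas. The Python sketch below applies that rule to CPU utilization measured against the request. It uses the same 3-to-20 range and 65% target as the manifest below. It ignores stabilization windows, unready pods, and missing metrics, and the usage figures are invented for illustration.

```python
import math


def desired_replicas(usage_millicores, request_millicores, target_pct=65,
                     min_replicas=3, max_replicas=20, tolerance=0.1):
    """Simplified HPA rule for a CPU Utilization target.

    usage_millicores: current CPU usage of each pod (one entry per replica).
    request_millicores: the CPU request each pod declares.
    """
    current = len(usage_millicores)
    # Utilization is usage as a percentage of the *request*, averaged over pods.
    utilization_pct = 100 * sum(usage_millicores) / (request_millicores * current)
    ratio = utilization_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: leave replicas alone
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


busy = [400, 400, 400]  # three pods each burning 400m of CPU

print(desired_replicas(busy, request_millicores=250))   # 8: 160% of request, scale out
print(desired_replicas(busy, request_millicores=1000))  # 3: request inflated, HPA sees only 40%
print(desired_replicas([150, 150, 150], request_millicores=250))  # 3: 60% is within tolerance
```

The second call shows what a wrong request does. The pods are just as busy, but the inflated request hides that from the autoscaler.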
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 65 }
```

The HPA needs your requests set correctly: averageUtilization: 65 means 65% *of the request*, averaged across the pods. This is why the guardrails compound: get requests wrong and your autoscaler does the wrong thing. The other half of staying up during change is the PodDisruptionBudget, which protects you during *voluntary* disruptions like node drains and cluster upgrades.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
  namespace: payments
spec:
  minAvailable: 2   # never let a drain take us below 2 pods
  selector:
    matchLabels: { app: api }
```

Pro tip
Without a PDB, a routine node upgrade can evict all your pods at once: the cluster is just doing maintenance and has no idea your three pods all sat on the same node. minAvailable: 2 makes the eviction API refuse any eviction that would leave fewer than 2 healthy pods, so the drain waits until replacements are ready. This is the cheapest reliability win in all of Kubernetes.
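A small Python model makes the drain behaviour concrete. It evicts the pods on one node one at a time, as `kubectl drain` does, and tracks the lowest number of healthy replicas along the way. The simplifying assumptions are important:

- Without a budget, nothing makes the drain wait, so every eviction is assumed to land before any replacement is Ready (the worst case).
- With a budget, a refused eviction is assumed to be retried only after the earlier replacements are Ready.

The function name is hypothetical.

```python
def lowest_healthy_during_drain(pods_on_node: int, healthy: int, min_available: int) -> int:
    """Lowest healthy-replica count seen while draining one node (simplified)."""
    lowest = healthy
    replacements_pending = 0
    for _ in range(pods_on_node):
        if healthy - min_available <= 0:  # disruptions allowed == 0: eviction refused
            if replacements_pending == 0:
                raise RuntimeError("drain blocked: the PDB can never be satisfied")
            # Drain retries; by now the replacements scheduled elsewhere are Ready.
            healthy += replacements_pending
            replacements_pending = 0
        healthy -= 1               # evict one pod from the draining node
        replacements_pending += 1  # the Deployment creates a replacement elsewhere
        lowest = min(lowest, healthy)
    return lowest


# Three replicas, all on the node being drained.
print(lowest_healthy_during_drain(3, healthy=3, min_available=0))  # 0: no PDB, full outage
print(lowest_healthy_during_drain(3, healthy=3, min_available=2))  # 2: PDB holds the line
```

The guard clause mirrors a real trap. A budget whose minAvailable equals the replica count allows zero disruptions, so a drain can never make progress.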
Namespaces and RBAC: blast-radius control
Running everything in default with shared credentials is how a single mistake becomes a cluster-wide outage. Namespaces partition the cluster into separate environments, each with its own quotas, network policies, and RBAC. RBAC then grants each human and service account the *least* access it needs. The payments team's CI should be able to deploy to the payments namespace and nothing else.
```yaml
# A role scoped to ONE namespace, allowing only what's needed
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: payments
  name: deployer
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: payments
  name: payments-ci
subjects:
  - kind: ServiceAccount
    name: ci
    namespace: payments
roleRef:
  kind: Role
  name: deployer
  apiGroup: rbac.authorization.k8s.io
```

cluster-admin is not a starting point
It's tempting to bind everything to cluster-admin to make errors go away. Don't. A leaked cluster-admin token is the entire cluster. Start from zero permissions and add exactly what each workload needs: a Role (namespace-scoped) beats a ClusterRole every time you can get away with it.
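RBAC rules are purely additive (there are no deny rules). A request is allowed if any rule in a bound Role matches its API group, resource, and verb, and a Role only ever applies inside its own namespace. The Python sketch below encodes that matching logic for the deployer Role above. It is a simplified model with hypothetical names that ignores resourceNames, subresources, and ClusterRoles.

```python
# The rules of the "deployer" Role above, as plain data.
DEPLOYER = {
    "namespace": "payments",
    "rules": [
        {"apiGroups": ["apps"], "resources": ["deployments"],
         "verbs": ["get", "list", "update", "patch"]},
    ],
}


def _matches(allowed: list, value: str) -> bool:
    return "*" in allowed or value in allowed


def role_allows(role: dict, namespace: str, api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: additive rules, '*' wildcards, namespace-scoped Role."""
    if namespace != role["namespace"]:
        return False  # a Role grants nothing outside its own namespace
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in role["rules"]
    )


print(role_allows(DEPLOYER, "payments", "apps", "deployments", "patch"))   # True
print(role_allows(DEPLOYER, "payments", "apps", "deployments", "delete"))  # False: verb not granted
print(role_allows(DEPLOYER, "default", "apps", "deployments", "patch"))    # False: wrong namespace
print(role_allows(DEPLOYER, "payments", "", "secrets", "get"))             # False: core-group Secrets not granted
```

Note that the Role as written has no `create` verb. The CI account can therefore update existing Deployments but not create new ones. On a real cluster, `kubectl auth can-i patch deployments -n payments --as=system:serviceaccount:payments:ci` asks the API server the same question.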
Common mistakes that cost hours
- No resource requests. The scheduler packs nodes blindly and one leaky pod starves the rest. Always set requests; set memory request = limit for critical pods.
- Liveness probe doing real work. A liveness check that hits the database or a slow endpoint turns a brief slowdown into a fleet-wide restart storm. Keep liveness trivially cheap.
- maxUnavailable left at the default during rollouts. The 25% default (rounded down) can take up to a quarter of your pods out of service mid-deploy. Set it to 0 with a positive maxSurge.
- No PodDisruptionBudget. A routine node drain can evict all your replicas at once. A few lines of PDB prevent the most surprising self-inflicted outage there is.
- HPA with wrong or missing requests. Utilization targets are a percentage of the request: wrong requests mean the autoscaler scales on a lie, and missing requests leave it nothing to compute utilization from.
- Everything in default with broad RBAC. No isolation means no blast-radius control. Namespace per team/env, least-privilege RBAC, no casual cluster-admin.
- Ignoring `kubectl describe` and events. When a pod won't start, the answer is almost always in its events and status: ImagePullBackOff, OOMKilled, FailedScheduling. Read them first; never guess.
Takeaways
Production Kubernetes in seven lines
- Requests are for scheduling; limits protect the neighbours. Memory over limit = OOMKilled.
- Readiness gates traffic (no kill); liveness restarts a wedged pod (kill). Don't swap them.
- Keep liveness cheap and forgiving; aggressive liveness causes restart storms under load.
- maxUnavailable: 0 + maxSurge gives true zero-downtime rollouts.
- HPA scales on a percentage of the request, so correct requests come first.
- A PodDisruptionBudget is the cheapest reliability win: it keeps you available through node drains.
- Namespaces + least-privilege RBAC keep one mistake from becoming a cluster outage.
Where to go next
Production Kubernetes connects to how you decided to use containers, how you deliver to the cluster, and how you observe it once it's live:
- Docker vs Kubernetes: When to Use Each. Make sure Kubernetes is actually the right tool first.
- GitOps: Declarative Delivery with ArgoCD & Flux. Deliver these manifests safely and auditably.
- The Three Pillars of Observability. You can't operate what you can't see.
- Hands-on kubectl lab. Practise probes, scaling, and reading events on a live cluster.
Want to go deeper?
This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.