Kubernetes in Production: Beyond the Tutorial

The gap between 'it runs' and 'it survives'

Every Kubernetes tutorial ends the same way: kubectl apply, a pod goes Running, you curl it, victory. Then you put it in production and discover everything the tutorial left out. One pod with a memory leak takes down its neighbours. A deploy goes out while a node is draining and you serve 502s. Traffic triples and nothing scales. A bad config gets applied cluster-wide because everything shares one namespace with god-mode access.

None of these are exotic. They're the default failure modes of a cluster run the way tutorials teach it. Production Kubernetes is mostly a handful of guardrails (resource requests and limits, health probes, autoscaling, disruption budgets, namespaces, and RBAC) that the tutorial skipped because they're boring until the night they save you. This article walks through those guardrails, explained and shown in real YAML.

Who this is for

Engineers who can deploy to Kubernetes but haven't yet operated it under real traffic, real incidents, and real teammates. You should know what a Pod, Deployment, and Service are; we cover the production layer on top.

The first guardrail: requests and limits

A pod with no resource requests is a pod the scheduler is guessing about. Requests are what the pod is guaranteed; the scheduler uses them to decide which node has room. Limits are the ceiling: exceed the CPU limit and the container is throttled; exceed the memory limit and it is OOMKilled. Without them, one greedy pod can starve every other pod on its node, and the scheduler can pack a node past the point of stability.

	Request	Limit
Means	Guaranteed minimum	Hard ceiling
Used by	The scheduler (placement)	The kubelet (enforcement)
Hit CPU ceiling	-	Throttled (slowed)
Hit memory ceiling	-	OOMKilled (restarted)
Set too low	Pod evicted under pressure	Killed under normal load

Requests are for scheduling and guarantees; limits are for protecting the neighbours.
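
To make the scheduling half concrete, here is a minimal Python sketch of the fit check. It is an illustration under stated assumptions, not the real scheduler: the helper names (`cpu_millicores`, `memory_bytes`, `fits`) and the node size are made up for this example, and the parsers handle only the quantity suffixes used in this article. What it shows is that placement is decided on the sum of requests, never on what pods actually use.

```python
def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "250m" or "2" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def memory_bytes(quantity: str) -> int:
    """Parse a memory quantity such as "256Mi" into bytes (binary suffixes only)."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def fits(node_allocatable: dict, scheduled_requests: list, new_request: dict) -> bool:
    """True if the new pod's requests still fit once existing requests are summed."""
    cpu = sum(cpu_millicores(r["cpu"]) for r in scheduled_requests)
    mem = sum(memory_bytes(r["memory"]) for r in scheduled_requests)
    return (
        cpu + cpu_millicores(new_request["cpu"]) <= cpu_millicores(node_allocatable["cpu"])
        and mem + memory_bytes(new_request["memory"]) <= memory_bytes(node_allocatable["memory"])
    )


# Hypothetical node: 2 CPUs and 4Gi allocatable.
node = {"cpu": "2", "memory": "4Gi"}
pod = {"cpu": "250m", "memory": "256Mi"}
print(fits(node, [pod] * 7, pod))  # True: the eighth pod brings requests to 2000m of 2000m
print(fits(node, [pod] * 8, pod))  # False: a ninth pod would need 2250m
```

A pod with no requests adds zero to those sums, which is exactly why the scheduler will keep packing pods next to it.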

Memory limits are not like CPU limits

CPU is compressible: over the limit, you're just throttled and slow. Memory is not: over the limit, the container is killed outright. Set memory requests and limits equal for critical workloads so the scheduler reserves exactly what the pod can use. The pod then never depends on memory the node may not have, which removes the surprise kills that happen when a node runs short. Exceeding its own limit will still get it OOMKilled.

The second guardrail: liveness and readiness probes

These two probes sound similar and do opposite things, and confusing them causes some of the nastiest production incidents. Readiness answers "should this pod receive traffic right now?" If it fails, the pod is pulled from the Service's load-balancing pool but left running. Liveness answers "is this container broken beyond recovery?" If it fails, the kubelet kills the container and restarts it.

🪑 Still in onboarding, don't send them customers yet. Readiness failing: no traffic, still running.

✅ Onboarded and ready, start routing work. Readiness passing: added to the pool.

🚑 Collapsed at their desk, call an ambulance. Liveness failing: killed and restarted.

A new hire on their first day.

The classic outage: liveness that's too aggressive

Point liveness at a heavy endpoint, or set the timeout too tight, and a brief slowdown makes the probe fail, so Kubernetes kills the pod. Under load, every pod slows, every probe fails, and the cluster restarts your entire fleet mid-traffic-spike. Keep liveness cheap and forgiving; use readiness for the strict checks.

A production-grade Deployment

Here's a Deployment with all the guardrails wired in. Compare it to the three-line tutorial version: every extra block earns its place.

deployment.yaml

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: payments
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0          # add before removing: zero-downtime
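  selector:
    matchLabels: { app: api }    # required in apps/v1; must match the template labels
  template:
    metadata:
      labels: { app: api }       # the PodDisruptionBudget selects on this label too
    spec: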
      containers:
        - name: api
          image: registry.example.com/api:v2
          ports:
            - containerPort: 8080
          resources:
            requests: { cpu: "250m", memory: "256Mi" }
            limits:   { cpu: "500m", memory: "256Mi" }  # mem = request
          readinessProbe:                    # gate traffic
            httpGet: { path: /readyz, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:                     # restart if wedged
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 3
            failureThreshold: 3              # forgiving, not trigger-happy
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true

Note the deliberate asymmetry between the probes: readiness checks often (every 5s, starting at 5s) so traffic is gated tightly; liveness checks rarely (every 20s, starting at 15s, restarting only after 3 consecutive failures) so a transient hiccup never triggers a needless restart. That single difference prevents the most common self-inflicted Kubernetes outage.
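
The asymmetry is easier to judge as numbers. The sketch below is a rough estimate, not kubelet behaviour to the second: `Probe` and `reaction_seconds` are hypothetical names, and the model assumes one probe per period with each failing probe taking up to its full timeout. The defaults in the dataclass are the Kubernetes defaults, which apply to the readiness probe above because it sets neither `failureThreshold` nor `timeoutSeconds`.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    period_seconds: int = 10     # Kubernetes default
    failure_threshold: int = 3   # Kubernetes default
    timeout_seconds: int = 1     # Kubernetes default


def reaction_seconds(probe: Probe) -> int:
    """Rough worst case between a container starting to fail and the kubelet acting.

    Assumes failure_threshold consecutive failures, one per period, with the
    last probe running until its timeout.
    """
    return probe.failure_threshold * probe.period_seconds + probe.timeout_seconds


readiness = Probe(period_seconds=5)
liveness = Probe(period_seconds=20, failure_threshold=3, timeout_seconds=3)

print(reaction_seconds(readiness))  # 16: traffic is pulled within seconds
print(reaction_seconds(liveness))   # 63: a restart needs about a minute of sustained failure
```

Under these assumptions a slowdown shorter than about a minute costs you some traffic routing, not a restart.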

Autoscaling and disruption budgets

Three replicas is a guess. The Horizontal Pod Autoscaler adjusts the replica count to match load: it scales out when CPU climbs and back in when it's quiet, so you neither fall over at peak nor pay for idle capacity at 3am.

hpa.yaml

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 65 }

The HPA needs your requests set correctly: averageUtilization: 65 means 65% *of the request*, averaged across the pods. This is why the guardrails compound: get requests wrong and your autoscaler does the wrong thing. The other half of staying up during change is the PodDisruptionBudget, which protects you during *voluntary* disruptions like node drains and cluster upgrades.
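
Before moving on to the PodDisruptionBudget, the utilization arithmetic can be sketched in a few lines of Python. The formula is the one the Kubernetes documentation gives for the HPA (desired = ceil(current × current metric / target metric), skipped when the ratio is within a 10% tolerance by default). The function name and the usage figures are assumptions for illustration, and the real controller also handles unready pods, missing metrics, and stabilization windows, which this ignores.

```python
import math


def desired_replicas(current: int, avg_usage_m: int, request_m: int, target_pct: int,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Replica count the HPA would aim for, given average CPU usage per pod in millicores."""
    utilization_pct = 100 * avg_usage_m / request_m   # usage as a percentage of the REQUEST
    ratio = utilization_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        return current                                # close enough to target: do nothing
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Pods averaging 325m against the 250m request are at 130% utilization: double the fleet.
print(desired_replicas(3, 325, 250, 65, 3, 20))    # 6

# Same real load, but a request inflated to 1000m reads as 32.5%: the HPA sees no problem.
print(desired_replicas(3, 325, 1000, 65, 3, 20))   # 3
```

The second call is the failure mode in practice: the pods are just as busy, but the autoscaler stays at the minimum because the request it measures against is wrong. The PodDisruptionBudget for the same Deployment follows.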

pdb.yaml

yaml

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
  namespace: payments
spec:
  minAvailable: 2            # never let a drain take us below 2 pods
  selector:
    matchLabels: { app: api }

Pro tip

Without a PDB, a routine node upgrade can evict all your pods at once: the cluster is just doing maintenance and has no idea your three pods all sat on the same node. minAvailable: 2 makes the drain wait whenever an eviction would leave fewer than 2 pods available, so replacements must be ready before it continues. This is the cheapest reliability win in all of Kubernetes.
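
As a rough model of what the budget buys you, the Python sketch below walks through a drain of the node holding all three pods. It is a simplification with hypothetical function names: it assumes every evicted pod is rescheduled elsewhere and becomes ready before the next round of evictions, which is the behaviour the PDB is waiting for.

```python
from typing import Optional


def disruptions_allowed(healthy: int, min_available: int) -> int:
    """How many pods may be evicted right now without breaking the budget."""
    return max(0, healthy - min_available)


def lowest_healthy_during_drain(pods_on_node: int, healthy: int,
                                min_available: Optional[int]) -> int:
    """Simulate draining one node; return the lowest healthy pod count seen."""
    lowest = healthy
    remaining = pods_on_node
    while remaining > 0:
        if min_available is None:
            batch = remaining  # no PDB: nothing stops every eviction succeeding together
        else:
            batch = min(remaining, disruptions_allowed(healthy, min_available))
            if batch == 0:
                raise RuntimeError("drain blocked: the budget allows no evictions")
        lowest = min(lowest, healthy - batch)
        remaining -= batch
        # Assumption: replacements are ready elsewhere before the next round,
        # so `healthy` is back to its starting value here.
    return lowest


print(lowest_healthy_during_drain(3, 3, None))  # 0: all three pods gone at once
print(lowest_healthy_during_drain(3, 3, 2))     # 2: one eviction at a time
```

The model also exposes the opposite mistake: with minAvailable equal to the replica count, `disruptions_allowed` is zero and the drain can never proceed.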

Namespaces and RBAC: blast-radius control

Running everything in default with shared credentials is how a single mistake becomes a cluster-wide outage. Namespaces partition the cluster into separate scopes, each with its own quotas, network policies, and RBAC. RBAC then grants each human and service account the *least* access it needs. The payments team's CI should be able to deploy to the payments namespace and nothing else.

rbac.yaml

yaml

# A role scoped to ONE namespace, allowing only what's needed
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: payments
  name: deployer
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: payments
  name: payments-ci
subjects:
  - kind: ServiceAccount
    name: ci
    namespace: payments
roleRef:
  kind: Role
  name: deployer
  apiGroup: rbac.authorization.k8s.io

cluster-admin is not a starting point

It's tempting to bind everything to cluster-admin to make errors go away. Don't. A leaked cluster-admin token is the entire cluster. Start from zero permissions and add exactly what each workload needs; a Role (namespace-scoped) beats a ClusterRole every time you can get away with it.
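
"Start from zero and add" is literally how RBAC evaluates: rules only ever allow, and anything not matched is denied. The Python sketch below models that for the deployer Role above. It is a deliberately simplified illustration with hypothetical names (`Rule`, `allowed`); real RBAC also handles resourceNames, subresources, and cluster-scoped rules, none of which appear here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    api_groups: tuple
    resources: tuple
    verbs: tuple


def _matches(granted: tuple, wanted: str) -> bool:
    """A rule field matches if it lists the value or the "*" wildcard."""
    return "*" in granted or wanted in granted


def allowed(role_namespace: str, rules: list, namespace: str,
            api_group: str, resource: str, verb: str) -> bool:
    """Default deny: a request passes only if some rule in this namespace's Role matches."""
    if namespace != role_namespace:
        return False  # a Role grants nothing outside its own namespace
    return any(
        _matches(rule.api_groups, api_group)
        and _matches(rule.resources, resource)
        and _matches(rule.verbs, verb)
        for rule in rules
    )


# The deployer Role from rbac.yaml, expressed as data.
deployer = [Rule(("apps",), ("deployments",), ("get", "list", "update", "patch"))]

print(allowed("payments", deployer, "payments", "apps", "deployments", "patch"))   # True
print(allowed("payments", deployer, "payments", "apps", "deployments", "delete"))  # False
print(allowed("payments", deployer, "payments", "", "secrets", "get"))             # False
print(allowed("payments", deployer, "default", "apps", "deployments", "patch"))    # False
```

A cluster-admin binding is the equivalent of a single rule of wildcards in every namespace, which is why a leaked token for it has no boundary at all.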

Common mistakes that cost hours

No resource requests. The scheduler packs nodes blindly and one leaky pod starves the rest. Always set requests; set memory request = limit for critical pods.
Liveness probe doing real work. A liveness check that hits the database or a slow endpoint turns a brief slowdown into a fleet-wide restart storm. Keep liveness trivially cheap.
maxUnavailable left at the default during rollouts. The 25% default can drop a quarter of your capacity mid-deploy. Set it to 0 with a positive maxSurge.
No PodDisruptionBudget. A routine node drain can evict all your replicas at once. A few-line PDB prevents the most surprising self-inflicted outage there is.
HPA with wrong or missing requests. Utilization targets are a percentage of the request, so wrong requests mean the autoscaler scales on a lie.
Everything in default with broad RBAC. No isolation means no blast-radius control. Namespace per team/env, least-privilege RBAC, no casual cluster-admin.
Ignoring `kubectl describe` and events. When a pod won't start, the answer is almost always in the events: ImagePullBackOff, OOMKilled, FailedScheduling. Read them first, guess never.
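
The maxUnavailable mistake in the list above is easy to check with arithmetic. This Python sketch computes the capacity bounds of a rolling update, assuming the documented rounding rules (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down); the function names are hypothetical.

```python
def _resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling or floor of scaled / 100, avoiding float rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> tuple:
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


print(rollout_bounds(8))         # (6, 10): defaults let a quarter of capacity go
print(rollout_bounds(8, 1, 0))   # (8, 9): add one before removing one
print(rollout_bounds(3, 1, 0))   # (3, 4): the settings from deployment.yaml
```

With only three replicas the 25% default happens to round down to zero unavailable pods, so the problem shows up only once the Deployment scales, which is a good reason to set the value explicitly.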

Takeaways

Production Kubernetes in seven lines

Requests are for scheduling; limits protect the neighbours. Memory over limit = OOMKilled.
Readiness gates traffic (no kill); liveness restarts a wedged pod (kill). Don't swap them.
Keep liveness cheap and forgiving; aggressive liveness causes restart storms under load.
maxUnavailable: 0 + maxSurge gives true zero-downtime rollouts.
HPA scales on a percentage of the request, so correct requests come first.
A PodDisruptionBudget is the cheapest reliability win; it keeps you serving through node drains.
Namespaces + least-privilege RBAC keep one mistake from becoming a cluster outage.

Where to go next

Production Kubernetes connects to how you decided to use containers, how you deliver to the cluster, and how you observe it once it's live:

Docker vs Kubernetes: When to Use Each. Make sure Kubernetes is actually the right tool first.
GitOps: Declarative Delivery with ArgoCD & Flux. Deliver these manifests safely and auditably.
The Three Pillars of Observability. You can't operate what you can't see.
Hands-on kubectl lab. Practise probes, scaling, and reading events on a live cluster.
