Expert · Storefront · Project 9 of 11 · ~12h · 5 milestones

Autoscale it to ride real traffic spikes

Continues from the last build: the Storefront is meshed, progressively delivered, and chaos-tested, but it still runs a fixed pod count on a fixed node pool, so it either tips over under a traffic spike or burns money on idle capacity the rest of the time.

The Storefront is resilient on paper, but it is sized for a guess.

HorizontalPodAutoscaler (CPU + custom metrics) · Custom/external metrics adapters (Prometheus Adapter) · KEDA event-driven autoscaling · Scale-to-zero workloads · Cluster Autoscaler / Karpenter · Resource requests as the basis for scaling · Load testing with k6 · Capacity planning & cost vs reliability

What you'll build

A Storefront that scales on three axes from one load test: a HorizontalPodAutoscaler on the web tier that reacts to both CPU and a custom requests-per-second metric, a KEDA ScaledObject on the order worker that scales on queue depth and scales to zero when idle, and a cluster autoscaler (or Karpenter) that provisions new nodes when pods go Pending and reclaims them when load drops. You can show a graph of replicas and nodes rising under a k6 load test and falling back afterwards, with no manual intervention and no fixed-capacity waste.

See how we teach before you sign up

You don't just get code dumped on you. Every starter file and every solution is explained line-by-line, in plain English. Here's one real file from this project:

base/web.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: storefront
spec:
  replicas: 3            # fixed today; the HPA will own this soon
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: ghcr.io/your-org/web:3.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi

Reading this file

  • replicas: 3 is a fixed guess at capacity. The HPA takes ownership of this number once it is attached, so you stop hardcoding it.
  • requests: CPU and memory requests are the denominator the HPA divides actual usage by. Without them, a utilization-based target (such as CPU percent) cannot be computed.
  • cpu: 200m is the per-pod CPU request. With an HPA target of 60 percent utilization, the HPA aims to keep average usage near 120m per pod and adds replicas when it climbs above that.

The web Deployment as it runs today: a hardcoded replica count and, crucially, resource requests the HPA will scale against.
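To make the "requests are the denominator" point concrete, here is the HPA's core replica calculation as described in the Kubernetes documentation, worked with the numbers above. The 180m average usage is a made-up load figure for illustration only.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\text{utilization} = \frac{180\text{m}}{200\text{m}} = 90\%, \qquad
\text{desiredReplicas} = \left\lceil 3 \times \frac{90}{60} \right\rceil = \lceil 4.5 \rceil = 5
\]
```

The HPA also skips scaling when the ratio is within a tolerance band (10 percent by default), so pods averaging, say, 125m against the 120m target (a ratio of about 1.04) would not trigger a change.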

That's 1 of 8 explained code blocks in this single project.

The build, milestone by milestone

  1. Scale the web tier on CPU with an HPA, and prove it with load

    5 guided steps

    CPU-based HPA is the foundation every other axis builds on, and it is the one most often broken by a missing piece (no metrics-server, no requests). Getting a clean scale-out under real load first means later milestones add metrics, not debug plumbing.

  2. Scale on a custom requests-per-second metric, not just CPU

    5 guided steps

    CPU lags the thing you actually care about. A tier can be saturating its connection pool or latency budget while CPU still looks fine. Scaling on requests-per-second (or latency) tracks user-facing demand directly, and because the HPA computes a desired replica count for each metric and acts on the largest, whichever signal is hotter wins.

  3. Scale the order worker on queue depth with KEDA, down to zero

    5 guided steps

    A worker that polls a queue does not have CPU that tracks demand; it has a backlog. The right signal is queue depth, and the right idle behavior is zero pods, not one warm but useless pod. KEDA gives you both: event-driven scale-out and scale-to-zero, which a plain HPA cannot do (by default, an HPA's minReplicas cannot go below 1).

  4. Add cluster/node autoscaling so Pending pods summon nodes

    5 guided steps

    Pod autoscaling is only half the story. If the HPA wants 18 web pods but the node pool only fits 12, six pods sit Pending and the scale-out is a lie. Cluster autoscaling closes the loop: a Pending pod is the signal to grow the cluster, and an empty or underused node is the signal to shrink it.

  5. Run one load test end to end and capture it breathing

    5 guided steps

    Three autoscalers configured in isolation can each look fine and still fail together (a fast HPA that node scaling cannot keep up with, a worker that never drains). One end-to-end run is the only proof that the axes cooperate, and the graph is the artifact that makes this rung portfolio-worthy.
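Milestones 1 and 2 converge on a single HPA for the web tier. A minimal sketch, assuming the Prometheus Adapter exposes a per-pod metric named http_requests_per_second; the replica bounds and the 100 rps target are illustrative choices, not values from the project.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: storefront
spec:
  scaleTargetRef:            # the Deployment from base/web.yaml
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3             # illustrative floor, matching today's fixed count
  maxReplicas: 20            # illustrative ceiling
  metrics:
    # Axis 1: CPU utilization, computed against the 200m request.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    # Axis 2: custom per-pod rps served by the Prometheus Adapter.
    # The metric name and the 100 rps target are assumptions.
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
```

The HPA evaluates each entry under metrics independently and scales to the largest result. Once it is attached, consider removing replicas: 3 from base/web.yaml; otherwise every kubectl apply of the Deployment resets it to 3 and fights the HPA.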
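The rps metric has to come from somewhere. A sketch of a Prometheus Adapter rule, assuming the web pods export a counter named http_requests_total that carries namespace and pod labels (the counter name and the 2m rate window are assumptions): it turns the counter into a rate and publishes it as http_requests_per_second on the custom metrics API.

```yaml
# Prometheus Adapter config: the "rules:" section of its config file
# (with the Helm chart, the same list goes under rules.custom).
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      # Map Prometheus labels onto Kubernetes objects so the HPA
      # can request the metric per pod within a namespace.
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      # Rename http_requests_total -> http_requests_per_second.
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    # Convert the ever-increasing counter into a per-second rate.
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

If the rule is picked up, kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/storefront/pods/*/http_requests_per_second" should return one value per web pod.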
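For milestone 3, a minimal KEDA ScaledObject sketch. The project does not say which broker carries orders, so this assumes a Redis list named orders at a hypothetical in-cluster address and a Deployment named order-worker; swap the trigger type and metadata for whatever queue your Storefront actually uses.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker
  namespace: storefront
spec:
  scaleTargetRef:
    name: order-worker        # assumed Deployment name
  minReplicaCount: 0          # scale to zero when the queue is empty
  maxReplicaCount: 30         # illustrative ceiling
  pollingInterval: 15         # seconds between queue checks
  cooldownPeriod: 120         # seconds after the last active trigger before dropping to 0
  triggers:
    - type: redis
      metadata:
        address: redis.storefront.svc.cluster.local:6379   # hypothetical address
        listName: orders                                    # hypothetical list name
        listLength: "20"      # target backlog per replica
```

KEDA handles the 0-to-1 step itself and creates an HPA behind the scenes for 1-to-N, so once this is applied, kubectl get hpa -n storefront should also list a KEDA-managed entry.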
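For milestone 4, if you choose Karpenter, a NodePool along these lines lets Pending pods summon nodes and lets Karpenter consolidate them away afterwards. This sketch assumes Karpenter v1 on AWS with an existing EC2NodeClass named default; the pool name, CPU limit, and consolidation delay are illustrative. With Cluster Autoscaler or on other clouds, the equivalent settings live elsewhere.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: storefront            # illustrative pool name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # assumed to exist already
  limits:
    cpu: "64"                  # cap on total vCPU this pool may provision
  disruption:
    # Remove nodes that are empty or whose pods fit elsewhere.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```

In the 18-versus-12 example from milestone 4, the six Pending web pods are what trigger provisioning; when the load test ends and the HPA scales back in, consolidation should remove the extra nodes.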

What's inside when you start

3 starter files, ready to clone
5 guided milestones
5 full reference solutions
8 code blocks explained line-by-line
5 "is it working?" checks
4 interview questions it prepares you for

You'll walk away with

An autoscale/ folder with the web HPA (CPU + custom rps), the Prometheus Adapter rule, the KEDA ScaledObject, and the k6 load script
A web tier that scales on both CPU and a custom requests-per-second metric, proven with kubectl get hpa under load
A KEDA-driven order worker that scales on queue depth and scales to zero when idle
Cluster/node autoscaling (Cluster Autoscaler or Karpenter) that adds nodes for Pending pods and reclaims them when idle
A recorded end-to-end timeline (graph or watch log) of web replicas, worker replicas, and node count rising under a load test and falling back afterwards
