Large · Portfolio centerpiece · ~33h · 6 milestones

Run a resilient multi-service app with progressive delivery + chaos

The full cloud-native picture: multiple services, packaged with Helm and delivered via GitOps so the cluster always matches Git, released progressively through the mesh with automated promotion or rollback on real metrics, observable end-to-end, and proven resilient because you deliberately broke it and watched it degrade gracefully instead of falling over.

Helm · Istio · Progressive delivery · GitOps · Observability (Kiali/Grafana) · HPA · Chaos engineering · Infra cost modeling · Blameless postmortems

What you'll build

A Helm-packaged, multi-service application on Kubernetes delivered via GitOps, with mesh-based canary releases that auto-rollback on bad metrics, strict mTLS, full observability (Kiali + Grafana), autoscaling, and a documented chaos experiment proving the system degrades gracefully under pod kills and injected latency.
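
As a minimal sketch of the "strict mTLS" piece, assuming the services run in a namespace called platform (a hypothetical name, not taken from the project), an Istio PeerAuthentication policy along these lines would reject plaintext traffic between workloads:

```yaml
# Sketch only: the namespace name "platform" is an assumption.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: platform
spec:
  mtls:
    mode: STRICT   # sidecars accept only mutual-TLS connections
```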

See how we teach, before you sign up

You don't just get code dumped on you. Every starter file and every solution is explained line-by-line, in plain English. Here's one real file from this project:

chart/Chart.yaml (YAML)
apiVersion: v2
name: platform
description: Multi-service app (frontend, API, worker)
type: application
version: 0.1.0
appVersion: "1.0.0"

Reading this file

  • apiVersion: v2 - Marks this as a Helm 3 chart, the current chart format you should be using.
  • type: application - Says this chart deploys a runnable app, as opposed to a library chart meant only for reuse.
  • version: 0.1.0 - The chart's own version. Bump it on every change so GitOps and Helm see a new release.
  • appVersion: "1.0.0" - Tracks the version of the app being deployed, separate from the chart version itself.

The chart identity. Bump version on every change so GitOps sees a new release.

That's 1 of 10 explained code blocks in this single project.

The build, milestone by milestone

  1. Package the app with Helm

    5 guided steps

    Hand-maintained YAML per service per environment doesn’t scale, and it drifts. A Helm chart makes the whole app one versioned, parameterized unit: the artifact GitOps will deploy.
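
    To make "parameterized" concrete, here is a rough sketch of what a values file for such a chart might look like. Only the three service names come from the chart description; the key layout, registry path, and replica counts are hypothetical, not the project's actual starter file.

    ```yaml
    # values.yaml (illustrative layout; all keys and values are assumptions)
    frontend:
      image:
        repository: registry.example.com/platform/frontend  # placeholder registry
        tag: "1.0.0"
      replicas: 2
    api:
      image:
        repository: registry.example.com/platform/api
        tag: "1.0.0"
      replicas: 3
    worker:
      image:
        repository: registry.example.com/platform/worker
        tag: "1.0.0"
      replicas: 1
    ```

    With a layout like this, a per-environment override file only needs to restate the keys that differ, for example a smaller replicas value for staging, and is passed to Helm with the -f flag.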

  2. Deliver via GitOps

    5 guided steps

    GitOps makes deployments auditable, reproducible, and self-correcting: drift is reverted automatically, rollback is a git revert, and every change has a reviewer and a history.
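
    The page does not say which GitOps controller the project uses. Assuming Argo CD purely for illustration, an Application pointing at the chart could look roughly like the sketch below; the repository URL, branch, and namespace names are placeholders.

    ```yaml
    # Sketch assuming Argo CD; repoURL and namespaces are hypothetical.
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: platform
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://git.example.com/your-org/platform.git  # placeholder
        targetRevision: main
        path: chart
        helm:
          valueFiles:
            - values.yaml
      destination:
        server: https://kubernetes.default.svc
        namespace: platform
      syncPolicy:
        automated:
          prune: true     # delete resources that were removed from Git
          selfHeal: true  # revert manual changes made in the cluster
        syncOptions:
          - CreateNamespace=true
    ```

    In this sketch, selfHeal is the setting behind "drift is reverted automatically", and rolling back amounts to reverting the commit that main points at.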

  3. Release progressively with metric-gated promotion

    5 guided steps

    Manual canary watching doesn’t scale, and humans miss tail regressions. Tying promotion to live success-rate and latency metrics makes releases safe by default: bad versions roll themselves back.
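
    One way to express metric-gated promotion on an Istio mesh is a Flagger Canary resource. Flagger is an assumption here (the page names only Istio), and the service name, port, and thresholds are illustrative values rather than the project's settings.

    ```yaml
    # Sketch assuming Flagger with its Istio provider; values are hypothetical.
    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: api
      namespace: platform
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api
      service:
        port: 8080
      analysis:
        interval: 1m     # how often the metrics are checked
        threshold: 5     # failed checks tolerated before rollback
        stepWeight: 10   # traffic percentage added per successful check
        maxWeight: 50    # canary traffic ceiling before promotion
        metrics:
          - name: request-success-rate
            thresholdRange:
              min: 99    # percent of requests that must not be 5xx
            interval: 1m
          - name: request-duration
            thresholdRange:
              max: 500   # latency ceiling in milliseconds
            interval: 1m
    ```

    Under these assumptions, a new version receives traffic in 10% steps, and five failed metric checks send all traffic back to the previous version without human intervention.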

  4. Observe end-to-end

    5 guided steps

    A multi-service system fails in ways no single service’s logs explain. End-to-end observability (golden signals plus distributed traces) is how you find which hop in a chain is actually slow or failing.
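
    Two of the golden signals, errors and latency, can be derived per workload from Istio's standard metrics. The Prometheus rule file below is a sketch under that assumption; the group and recording-rule names are made up for illustration.

    ```yaml
    # Sketch of a Prometheus rule file; rule names are hypothetical.
    groups:
      - name: golden-signals
        rules:
          # Fraction of requests per destination workload that returned 5xx.
          - record: workload:request_error_ratio:rate5m
            expr: |
              sum by (destination_workload) (
                rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m])
              )
              /
              sum by (destination_workload) (
                rate(istio_requests_total{reporter="destination"}[5m])
              )
          # 99th-percentile request latency per destination workload, in ms.
          - record: workload:request_duration_ms:p99_5m
            expr: |
              histogram_quantile(0.99,
                sum by (destination_workload, le) (
                  rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m])
                )
              )
    ```

    Graphing these per workload in Grafana narrows a slow request down to a hop; a distributed trace then shows where inside that hop the time went.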

  5. Model the cost and load-test to failure

    5 guided steps

    A multi-service platform with a mesh, GitOps controller, and observability stack has real and often surprising running costs, and you don’t know its capacity until you push it past the point where it falls over.

  6. Break it on purpose

    6 guided steps

    Resilience you haven’t tested is a hope, not a property. A chaos experiment turns “it should survive a pod dying” into evidence and surfaces the gaps (missing retries, no PDB, a single point of failure) before users do. A blameless postmortem then turns each gap into a tracked fix instead of a one-off scare.
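
    As a sketch of the two ideas above, the first manifest injects latency with an Istio VirtualService fault, and the second is the kind of PodDisruptionBudget whose absence an experiment might surface. The host, namespace, labels, and numbers are all hypothetical.

    ```yaml
    # Sketch only: names, labels, and values are assumptions.
    # Delay half of the requests to the api service by two seconds.
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: api-latency-fault
      namespace: platform
    spec:
      hosts:
        - api
      http:
        - fault:
            delay:
              percentage:
                value: 50
              fixedDelay: 2s
          route:
            - destination:
                host: api
    ---
    # Keep at least one api pod running during voluntary evictions
    # such as node drains.
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: api
      namespace: platform
    spec:
      minAvailable: 1
      selector:
        matchLabels:
          app: api
    ```

    A PodDisruptionBudget only constrains evictions that go through the eviction API, so a pod deleted directly during a chaos run is not protected by it. Surviving that case depends on replica count, readiness probes, and client retries.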

What's inside when you start

4 starter files, ready to clone
6 guided milestones
6 full reference solutions
10 code blocks explained line-by-line
6 "is it working?" checks
4 interview questions it prepares you for

You'll walk away with

A Helm + GitOps repo that deploys the whole platform from Git
A recorded canary release that auto-promotes when healthy and auto-rolls-back on a bad metric
A chaos-experiment report with hypothesis, results, observability evidence, and the resilience gaps found
A cost model and load-test-to-failure report (app vs platform overhead, monthly figure, capacity ceiling)
A blameless postmortem for a surfaced gap (timeline, systemic contributing factors, dated action items)

This is portfolio-grade. Build it free.

Sign up to unlock every milestone step-by-step, the code skeletons, full reference solutions, and checkable tasks, with your progress saved as you build.
