How senior engineers ship code safely and reliably: automated pipelines, container orchestration, zero-downtime deployments, and the IaC practices that prevent configuration drift.
Every manual deployment step is a risk: the wrong branch checked out, a step skipped, a config not updated. Teams that deploy manually accumulate "deployment fear" — they delay releases because deployments are risky. This creates a vicious cycle: longer delays → bigger changes → higher risk.
Automated CI/CD breaks this cycle. Every commit is built, tested, and deployable. You deploy small, frequent changes, and when something goes wrong, rollback is one click. This is why high-performing engineering teams, as measured by the DORA metrics, pull so far ahead: the 2018 State of DevOps report found elite performers deploying 46x more frequently, with 2,555x faster lead time from commit to deploy, than low performers.
A CI/CD pipeline is a series of automated stages that transform source code into a running service:
1. Trigger: push to branch or PR opened
2. Build: compile, type-check, lint (fast feedback: under 2 minutes)
3. Test: unit tests, integration tests (parallel where possible)
4. Security scan: SAST (static analysis), dependency audit, container image scan
5. Build artifact: Docker image, tagged with commit SHA
6. Deploy to staging: update staging environment, run smoke tests
7. Deploy to production: gate on approval (if needed) or automatic on main
8. Post-deploy: verify SLOs, automated rollback if error rate spikes
Golden rule: fail fast. Put the fastest checks first (lint, type-check). Run tests in parallel. A pipeline that takes 45 minutes is not used — developers skip it.
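The fail-fast rule can be illustrated with a minimal sketch. The stage names and the pass/fail lambdas below are hypothetical stand-ins for real checks; a real pipeline would shell out to the linter, type checker, and test runner.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, return the names that ran."""
    executed = []
    for name, check in stages:
        executed.append(name)
        if not check():
            break  # fail fast: the slower stages after this one never start
    return executed


# Hypothetical stages, ordered fastest first.
stages = [
    ("lint", lambda: True),
    ("typecheck", lambda: False),  # simulated failure
    ("unit-tests", lambda: True),
    ("docker-build", lambda: True),
]

print(run_pipeline(stages))
```

Because the type check fails, the test and image build stages are skipped, so the developer gets the failure after seconds rather than after the full run.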
Tag images with commit SHA, not "latest"
"latest" is mutable — you cannot roll back to it reliably. Tag every image with the commit SHA: registry/app:abc1234. This makes rollbacks precise: "deploy the image from commit abc1234."
```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  # Placeholder: set this to your container registry host.
  REGISTRY: registry.example.com

# Registry login, cluster credentials, and per-job checkout are omitted for brevity.
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        # npm ci installs exactly what package-lock.json specifies,
        # so it is reproducible and usually faster than npm install.
        run: npm ci

      - name: Type check
        run: npm run typecheck  # Fast feedback: runs in ~10s

      - name: Lint
        run: npm run lint

      - name: Unit tests (parallel)
        run: npm run test -- --maxWorkers=4

      - name: Build Docker image
        # Image tagged with commit SHA: enables precise rollbacks.
        run: |
          docker build -t ${{ env.REGISTRY }}/app:${{ github.sha }} .
          docker push ${{ env.REGISTRY }}/app:${{ github.sha }}

  deploy-staging:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to staging
        run: |
          kubectl set image deployment/app-staging \
            app=${{ env.REGISTRY }}/app:${{ github.sha }}
          kubectl rollout status deployment/app-staging --timeout=120s

      - name: Smoke test staging
        run: npm run test:smoke -- --env=staging

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    # Manual approval gate for production: the "production" environment
    # requires explicit sign-off in GitHub before this job starts.
    environment: production
    steps:
      - name: Deploy to production (rolling update)
        run: |
          kubectl set image deployment/app-prod \
            app=${{ env.REGISTRY }}/app:${{ github.sha }}
          kubectl rollout status deployment/app-prod --timeout=300s

      - name: Verify SLOs post-deploy
        run: npm run verify:slos -- --window=5m
```
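The SHA-tagging rule can also be enforced in tooling rather than left to convention. The helper below is a hypothetical sketch, not part of any CI product: `image_ref` and its `short` parameter are assumed names, and the 7-character short SHA simply matches the `abc1234` style used in this lesson.

```python
import re

# A Git commit SHA is 7 to 40 lowercase hex characters (short or full form).
SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")


def image_ref(registry: str, app: str, commit_sha: str, short: int = 7) -> str:
    """Build an immutable image reference; reject mutable tags such as 'latest'."""
    sha = commit_sha.strip().lower()
    if not SHA_RE.match(sha):
        raise ValueError(f"not a commit SHA: {commit_sha!r}")
    return f"{registry}/{app}:{sha[:short]}"


full_sha = "abc1234" + "0" * 33  # stand-in for a real 40-character SHA
print(image_ref("registry", "app", full_sha))
```

Calling `image_ref("registry", "app", "latest")` raises `ValueError`, which is the point: a deploy script built on this helper cannot ship a tag that might later point at a different image.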
Rolling update: Replace old pods one at a time. Default Kubernetes strategy. Zero downtime, but both old and new versions run simultaneously — backward-compatible API changes only.
Blue-green deployment: Maintain two identical environments (blue = current, green = new). Switch traffic from blue to green. Instant rollback by switching back. Double the infrastructure cost.
Canary deployment: Route a small percentage of traffic (1-5%) to the new version. Monitor SLOs. Gradually increase to 100% if metrics are healthy. Automatically roll back if the error rate spikes. Best for high-risk changes.
Feature flags: Ship code dark (disabled). Enable for specific users or cohorts. Separate deployment from feature launch. Roll back features without rolling back code.
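One common way to enable a flag for a cohort is deterministic hashing, sketched below. This is an assumption about how a flag system might work, not a description of any specific product; `bucket`, `is_enabled`, and the flag name are hypothetical.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int,
               allowlist: frozenset = frozenset()) -> bool:
    """Enable for allowlisted users, plus a stable percentage of everyone else."""
    if user_id in allowlist:
        return True
    return bucket(flag, user_id) < rollout_percent


# Hypothetical flag: dark-launched at 0%, visible only to an internal tester.
print(is_enabled("new-checkout", "user-42", 0, frozenset({"user-42"})))
```

Hashing the flag name together with the user ID keeps each user's experience stable across requests, and raising `rollout_percent` only ever adds users. Setting it back to 0 turns the feature off without a redeploy.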
| Strategy | Rollback Speed | Infrastructure Cost | Risk | Best For |
|---|---|---|---|---|
| Rolling update | Minutes (rollback deployment) | 1x | Low | Most deployments |
| Blue-green | Instant (switch LB) | 2x | Medium | Database migrations, major changes |
| Canary | Instant (drain canary) | 1.1x | Very low | High-risk changes, algorithm updates |
| Feature flags | Instant (toggle off) | 1x | Very low | A/B tests, gradual rollouts |
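The canary loop (route a little traffic, compare SLOs, promote or roll back) can be sketched as a pure decision function. The traffic schedule and thresholds below are illustrative assumptions, not recommended values, and `next_action` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


STEPS = [1, 5, 25, 50, 100]  # percent of traffic; hypothetical schedule


def next_action(current_percent: int, canary: Metrics, baseline: Metrics,
                max_ratio: float = 2.0, min_requests: int = 500,
                floor: float = 0.001):
    """Return ('hold' | 'rollback' | 'promote' | 'done', target_percent)."""
    if canary.requests < min_requests:
        return ("hold", current_percent)  # not enough data to judge yet
    # Tolerate up to max_ratio times the baseline error rate, with a small
    # floor so a near-perfect baseline does not make any single error fatal.
    allowed = max(baseline.error_rate * max_ratio, floor)
    if canary.error_rate > allowed:
        return ("rollback", 0)
    later = [p for p in STEPS if p > current_percent]
    return ("promote", later[0]) if later else ("done", 100)


baseline = Metrics(requests=100_000, errors=100)  # 0.1% errors
print(next_action(1, Metrics(1_000, 1), baseline))
print(next_action(5, Metrics(1_000, 50), baseline))
```

Comparing the canary against the live baseline, rather than against a fixed threshold, avoids rolling back a healthy release during an unrelated incident that affects both versions.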
Kubernetes (k8s) is a container orchestration platform that handles: scheduling (which node runs this container?), scaling (how many replicas?), self-healing (restart crashed containers), service discovery (how do services find each other?), and rolling updates.
Core objects: Pod (one or more containers sharing network/storage), Deployment (manages desired replica count and rolling updates), Service (stable DNS + load balancing for a set of pods), ConfigMap (non-secret config), Secret (sensitive config; values are base64-encoded, which is an encoding and not encryption), Ingress (HTTP routing from outside the cluster to Services).
Resource limits are mandatory in production: Without CPU/memory limits, one runaway pod can starve all other pods on the node. Set requests (what the scheduler reserves for the pod) and limits (the most the pod is allowed to use).
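A pre-deploy check can catch missing or inconsistent values before a manifest reaches the cluster. The sketch below is a simplified assumption: it parses only the `m` CPU suffix and the `Ki`/`Mi`/`Gi` memory suffixes, and `check_resources` is a hypothetical helper, not a Kubernetes API.

```python
def parse_cpu(quantity: str) -> float:
    """CPU quantity to cores: '250m' -> 0.25, '2' -> 2.0."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


_MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Memory quantity to bytes, for the binary suffixes listed above."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)  # no suffix: plain bytes


def check_resources(resources: dict) -> list:
    """Return a list of problems; an empty list means the block looks sane."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name, parse in (("cpu", parse_cpu), ("memory", parse_memory)):
        if name not in requests or name not in limits:
            problems.append(f"{name}: request and limit must both be set")
        elif parse(requests[name]) > parse(limits[name]):
            problems.append(f"{name}: request exceeds limit")
    return problems


print(check_resources({
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}))
```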
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels:
    app: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Allow 1 extra pod during update
      maxUnavailable: 0  # Zero-downtime: never kill a pod before a new one is ready

  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: api
          image: registry/orders-api:abc1234  # Always use a specific tag, never 'latest'
          ports:
            - containerPort: 3000

          # Resource limits are mandatory: without them, one pod can starve others
          resources:
            requests:
              cpu: 250m      # 0.25 CPU cores guaranteed
              memory: 256Mi  # 256 MiB RAM guaranteed
            limits:
              cpu: 500m      # Never use more than 0.5 cores
              memory: 512Mi  # OOMKilled if exceeded, so set carefully

          # readinessProbe gates traffic: the pod is NOT ready until this passes
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3

          livenessProbe:  # Pod is restarted if this fails
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20

          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
```
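To see why `maxSurge: 1` with `maxUnavailable: 0` keeps capacity intact, the rollout can be simulated. This is a deliberately simplified model, not the Deployment controller's actual algorithm: it assumes every new pod passes its readiness probe immediately and handles only integer values, not percentages.

```python
def rolling_update(replicas: int, max_surge: int, max_unavailable: int):
    """Simulate a rollout; return a list of (old_pods, new_pods) snapshots."""
    if max_surge == 0 and max_unavailable == 0:
        # With both at 0 no pod could ever be added or removed.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    min_available = replicas - max_unavailable  # availability floor
    max_total = replicas + max_surge            # surge ceiling
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Remove old pods only as far as the availability floor allows.
        removable = min(old, (old + new) - min_available)
        old -= max(removable, 0)
        # Start new pods up to the surge ceiling (assumed ready at once).
        startable = min(replicas - new, max_total - (old + new))
        new += max(startable, 0)
        history.append((old, new))
    return history


for old, new in rolling_update(replicas=3, max_surge=1, max_unavailable=0):
    print(f"old={old} new={new} total={old + new}")
```

In this model the total never drops below 3 and never exceeds 4: an old pod is removed only after a new one has taken its place, which is what makes the update zero-downtime.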
Infrastructure as Code means your cloud resources (VPCs, databases, load balancers, Kubernetes clusters) are defined in code, version-controlled, peer-reviewed, and applied automatically.
Terraform workflow: `terraform plan` (shows what will change — review before applying), `terraform apply` (make the changes), `terraform destroy` (tear down). State is stored remotely (S3 + DynamoDB lock) — never commit terraform.tfstate.
Why IaC matters: Reproducible environments (staging matches production), drift detection (catch manual changes), disaster recovery (rebuild from scratch in minutes), audit trail (who changed what and when).
Never make manual cloud console changes
Every manual cloud console change creates drift between your IaC code and reality. The next terraform apply may destroy your manual change. All infrastructure changes go through IaC and code review.
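Drift detection boils down to comparing what the code declares with what actually exists. The sketch below illustrates only that idea on plain dictionaries; it is not how Terraform is implemented, and `detect_drift` and the attribute values are hypothetical.

```python
ABSENT = "<absent>"  # marker for an attribute present on only one side


def detect_drift(declared: dict, actual: dict) -> dict:
    """Return {attribute: (declared_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(declared) | set(actual)):
        want = declared.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: someone changed the database in the cloud console.
declared = {"instance_class": "db.r6g.xlarge", "multi_az": True}
actual = {"instance_class": "db.r6g.xlarge", "multi_az": False,
          "publicly_accessible": True}

for attribute, (want, have) in detect_drift(declared, actual).items():
    print(f"{attribute}: declared={want!r} actual={have!r}")
```

In practice `terraform plan` plays this role: a plan that shows changes nobody committed is the signal that someone edited infrastructure by hand.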
```hcl
# Terraform: Production ECS + RDS setup (RDS excerpt).
# The security group, subnet group, and secret data source referenced
# below are assumed to be defined elsewhere in the configuration.
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
  backend "s3" {
    bucket = "mycompany-terraform-state"
    key    = "production/app/terraform.tfstate"
    region = "us-east-1"
    # DynamoDB lock prevents two engineers from running terraform apply simultaneously
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# RDS PostgreSQL
resource "aws_db_instance" "postgres" {
  identifier        = "prod-postgres"
  engine            = "postgres"
  engine_version    = "16.1"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 100
  storage_type      = "gp3"

  db_name  = "production"
  username = "postgres"
  # Fetch password from Secrets Manager: never hardcode it in Terraform.
  # The resolved value is still written to state, so keep the state encrypted.
  password = data.aws_secretsmanager_secret_version.db_password.secret_string

  # multi_az = true for production: a single AZ is a single point of failure
  multi_az                = true  # High availability: standby in another AZ
  deletion_protection     = true  # Prevent accidental destroy
  backup_retention_period = 30    # 30-day automated backups
  skip_final_snapshot     = false # Take snapshot before destroy
  # Required when skip_final_snapshot is false; the name is an assumed placeholder.
  final_snapshot_identifier = "prod-postgres-final"

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  tags = {
    Environment = "production"
    Terraform   = "true"
  }
}
```
Deployment questions test whether you understand reliability engineering, not just Docker commands.
From the books
Accelerate: The Science of Lean Software and DevOps — Forsgren, Humble, Kim (2018)
Chapter 2: Measuring Performance
The four DORA metrics (deployment frequency, lead time for changes, time to restore service, change failure rate) are the best predictors of organizational performance. High performers deploy multiple times per day and restore service far faster than low performers.