Deployment & Infrastructure: From Code to Production

How senior engineers ship code safely and reliably: automated pipelines, container orchestration, zero-downtime deployments, and the IaC practices that prevent configuration drift.

🎯 Key Takeaways
CI/CD pipelines are mandatory — manual deployments create fear and risk
Tag Docker images with commit SHA — "latest" makes rollbacks impossible
Zero-downtime deployments: rolling update (default), blue-green (instant rollback), canary (gradual rollout)
Kubernetes resource limits are mandatory — without them, one pod can starve all others
Infrastructure as Code with Terraform — never make manual cloud console changes
readinessProbe gates traffic; livenessProbe triggers restart — both are required in production


The cost of manual deployments

Every manual deployment step is a risk: the wrong branch checked out, a step skipped, a config not updated. Teams that deploy manually accumulate "deployment fear" — they delay releases because deployments are risky. This creates a vicious cycle: longer delays → bigger changes → higher risk.

Automated CI/CD eliminates this. Every commit is buildable, testable, and deployable. You ship small, frequent changes, and when something goes wrong, rollback is one click. This is why, in the DORA research, high-performing engineering teams deploy 46x more frequently and recover from failures 2,555x faster than low performers.

CI/CD pipeline design

A CI/CD pipeline is a series of automated stages that transform source code into a running service:

1. Trigger: push to branch or PR opened
2. Build: compile, type-check, lint (fast feedback — under 2 minutes)
3. Test: unit tests, integration tests (parallel where possible)
4. Security scan: SAST (static analysis), dependency audit, container image scan
5. Build artifact: Docker image, tagged with commit SHA
6. Deploy to staging: update staging environment, run smoke tests
7. Deploy to production: gate on approval (if needed) or automatic on main
8. Post-deploy: verify SLOs, automated rollback if error rate spikes

Golden rule: fail fast. Put the fastest checks first (lint, type-check). Run tests in parallel. A pipeline that takes 45 minutes is not used — developers skip it.

Tag images with commit SHA, not "latest"

"latest" is mutable — you cannot roll back to it reliably. Tag every image with the commit SHA: registry/app:abc1234. This makes rollbacks precise: "deploy the image from commit abc1234."

.github/workflows/deploy.yml

```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      # npm ci is reproducible (uses package-lock.json) and faster than npm install
      - name: Install dependencies
        run: npm ci

      - name: Type check
        run: npm run typecheck # Fast feedback — runs in ~10s

      - name: Lint
        run: npm run lint

      - name: Unit tests (parallel)
        run: npm run test -- --maxWorkers=4

      # Image tagged with commit SHA — enables precise rollbacks
      - name: Build Docker image
        run: |
          docker build -t ${{ env.REGISTRY }}/app:${{ github.sha }} .
          docker push ${{ env.REGISTRY }}/app:${{ github.sha }}

  deploy-staging:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to staging
        run: |
          kubectl set image deployment/app-staging \
            app=${{ env.REGISTRY }}/app:${{ github.sha }}
          kubectl rollout status deployment/app-staging --timeout=120s

      - name: Smoke test staging
        run: npm run test:smoke -- --env=staging

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    # Manual approval gate for production — require explicit sign-off
    environment: production # Requires manual approval in GitHub
    steps:
      - name: Deploy to production (rolling update)
        run: |
          kubectl set image deployment/app-prod \
            app=${{ env.REGISTRY }}/app:${{ github.sha }}
          kubectl rollout status deployment/app-prod --timeout=300s

      - name: Verify SLOs post-deploy
        run: npm run verify:slos -- --window=5m
```
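The final "Verify SLOs post-deploy" step boils down to a simple decision: compare the observed post-deploy error rate to the error budget. Here is a minimal TypeScript sketch of that check — the 1% budget and the rollback command in the comment are illustrative assumptions, not the pipeline's actual implementation:

```typescript
// Post-deploy SLO gate: decide whether to keep a release or roll it back.
type SloVerdict = "keep" | "rollback";

function verifySlo(observedErrorRate: number, sloErrorBudget = 0.01): SloVerdict {
  // observedErrorRate: e.g. 5xx responses / total responses over the check window
  return observedErrorRate > sloErrorBudget ? "rollback" : "keep";
}

// "rollback" would fail the CI step with a non-zero exit code, which can then
// trigger e.g. `kubectl rollout undo deployment/app-prod` to restore the prior image.
console.log(verifySlo(0.002)); // → "keep"
console.log(verifySlo(0.05));  // → "rollback"
```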

Zero-downtime deployment strategies

Rolling update: Replace old pods one at a time. Default Kubernetes strategy. Zero downtime, but both old and new versions run simultaneously — backward-compatible API changes only.

Blue-green deployment: Maintain two identical environments (blue = current, green = new). Switch traffic from blue to green. Instant rollback by switching back. Double the infrastructure cost.

Canary deployment: Route small percentage of traffic (1-5%) to new version. Monitor SLOs. Gradually increase to 100% if metrics are healthy. Automatically rollback if error rate spikes. Best for high-risk changes.

Feature flags: Ship code dark (disabled). Enable for specific users or cohorts. Separate deployment from feature launch. Roll back features without rolling back code.
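The feature-flag pattern above can be sketched with deterministic percentage bucketing, so a user stays in the same cohort across requests. This is a minimal illustration, not any specific flag service's API; the hashing scheme and function names are assumptions:

```typescript
import { createHash } from "node:crypto";

// Deterministically map (flag, user) to a bucket 0-99 so rollout percentages
// are stable: the same user gets the same answer on every request.
function bucket(flag: string, userId: string): number {
  const digest = createHash("sha256").update(`${flag}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

// Code ships dark at rolloutPercent = 0; launch is a config change, not a deploy.
function isEnabled(flag: string, userId: string, rolloutPercent: number): boolean {
  return bucket(flag, userId) < rolloutPercent;
}
```

Rolling a feature back is then a toggle (set the rollout percentage to 0), with no code rollback required.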

| Strategy | Rollback Speed | Infrastructure Cost | Risk | Best For |
| --- | --- | --- | --- | --- |
| Rolling update | Minutes (rollback deployment) | 1x | Low | Most deployments |
| Blue-green | Instant (switch LB) | 2x | Medium | Database migrations, major changes |
| Canary | Instant (drain canary) | 1.1x | Very low | High-risk changes, algorithm updates |
| Feature flags | Instant (toggle off) | 1x | Very low | A/B tests, gradual rollouts |
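The canary strategy hinges on an automated promote-or-rollback decision. A hedged sketch of that logic follows — the thresholds are illustrative, and real progressive-delivery systems compare many metrics, not just error rate:

```typescript
type CanaryDecision = "promote" | "hold" | "rollback";

// Compare the canary's error rate against the stable baseline over the same window.
function canaryDecision(
  canaryErrorRate: number,   // e.g. 0.02 = 2% of canary requests failing
  baselineErrorRate: number, // stable version's error rate, same window
  maxAbsoluteError = 0.05,   // never tolerate more than 5% errors (assumed threshold)
  maxRelativeFactor = 2.0,   // or more than 2x the baseline (assumed threshold)
): CanaryDecision {
  if (canaryErrorRate > maxAbsoluteError) return "rollback";
  if (baselineErrorRate > 0 && canaryErrorRate > baselineErrorRate * maxRelativeFactor) {
    return "rollback";
  }
  // As good as baseline: safe to widen the traffic split (1% → 5% → 25% → 100%).
  if (canaryErrorRate <= baselineErrorRate) return "promote";
  return "hold"; // slightly elevated — keep the current split and keep watching
}
```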

Kubernetes fundamentals for full stack engineers

Kubernetes (k8s) is a container orchestration platform that handles: scheduling (which node runs this container?), scaling (how many replicas?), self-healing (restart crashed containers), service discovery (how do services find each other?), and rolling updates.

Core objects: Pod (one or more containers sharing network/storage), Deployment (manages desired replica count and rolling updates), Service (stable DNS + load balancing for a set of pods), ConfigMap (non-secret config), Secret (sensitive config — base64-encoded, which is encoding, not encryption), Ingress (HTTP routing from outside the cluster to Services).

Resource limits are mandatory in production: Without CPU/memory limits, one runaway pod can starve all other pods on the node. Set requests (what the pod needs) and limits (what it is allowed to use).

k8s-deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels:
    app: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1       # Allow 1 extra pod during update
      maxUnavailable: 0 # Zero-downtime: never kill a pod before the new one is ready

  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: api
          image: registry/orders-api:abc1234 # Always use a specific tag, never 'latest'
          ports:
            - containerPort: 3000

          # Resource limits are mandatory — without them, one pod can starve others
          resources:
            requests:
              cpu: 250m     # 0.25 CPU cores guaranteed
              memory: 256Mi # 256MB RAM guaranteed
            limits:
              cpu: 500m     # Never use more than 0.5 cores
              memory: 512Mi # OOMKilled if exceeded — set carefully

          readinessProbe: # Gates traffic — pod receives traffic only when this passes
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3

          livenessProbe: # Pod is restarted if this fails
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20

          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
```
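The two probes map to two deliberately different application endpoints. A minimal Node sketch using only the built-in http module — the dbConnected flag is a stand-in assumption for real dependency checks (DB pool connected, caches warmed):

```typescript
import * as http from "node:http";

// Stand-in for real dependency state; flipped to true once the DB pool is ready.
let dbConnected = false;

// Liveness: "is the process alive?" — deliberately checks no dependencies,
// otherwise a slow database would make Kubernetes restart a healthy pod.
function liveness(): number {
  return 200;
}

// Readiness: "can this pod serve traffic right now?" — fails while dependencies
// are down, so the Service stops routing requests to this pod without restarting it.
function readiness(): number {
  return dbConnected ? 200 : 503;
}

const server = http.createServer((req, res) => {
  const status =
    req.url === "/health/live" ? liveness() :
    req.url === "/health/ready" ? readiness() :
    404;
  res.writeHead(status).end();
});
// server.listen(3000); // matches containerPort in the Deployment above
```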

Infrastructure as Code with Terraform

Infrastructure as Code means your cloud resources (VPCs, databases, load balancers, Kubernetes clusters) are defined in code, version-controlled, peer-reviewed, and applied automatically.

Terraform workflow: `terraform plan` (shows what will change — review before applying), `terraform apply` (make the changes), `terraform destroy` (tear down). State is stored remotely (S3 + DynamoDB lock) — never commit terraform.tfstate.

Why IaC matters: Reproducible environments (staging matches production), drift detection (catch manual changes), disaster recovery (rebuild from scratch in minutes), audit trail (who changed what and when).

Never make manual cloud console changes

Every manual cloud console change creates drift between your IaC code and reality. The next terraform apply may destroy your manual change. All infrastructure changes go through IaC and code review.

terraform/main.tf

```hcl
# Terraform: Production ECS + RDS setup
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket = "mycompany-terraform-state"
    key    = "production/app/terraform.tfstate"
    region = "us-east-1"

    # DynamoDB lock prevents two engineers from running terraform apply simultaneously
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# RDS PostgreSQL
resource "aws_db_instance" "postgres" {
  identifier        = "prod-postgres"
  engine            = "postgres"
  engine_version    = "16.1"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 100
  storage_type      = "gp3"

  db_name  = "production"
  username = "postgres"
  # Fetch the password from Secrets Manager — never hardcode it in Terraform
  password = data.aws_secretsmanager_secret_version.db_password.secret_string

  multi_az                = true  # High availability — standby in another AZ (single AZ = single point of failure)
  deletion_protection     = true  # Prevent accidental destroy
  backup_retention_period = 30    # 30-day automated backups
  skip_final_snapshot     = false # Take snapshot before destroy

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  tags = {
    Environment = "production"
    Terraform   = "true"
  }
}
```
How this might come up in interviews

Deployment questions test whether you understand reliability engineering, not just Docker commands.

Common questions:

  • Walk me through your CI/CD pipeline design.
  • How would you deploy a breaking API change with zero downtime?
  • What is the difference between a readiness probe and a liveness probe?
  • How do you handle database migrations in a Kubernetes deployment?

Strong answers include:

  • Knows canary vs blue-green vs rolling and when to use each
  • Understands expand-contract for breaking changes
  • Mentions resource limits for Kubernetes
  • Uses IaC — never manually edits cloud console

Red flags:

  • Deploys by SSH-ing into servers
  • Does not know what a readiness probe does
  • Uses "latest" as the Docker tag
  • Cannot explain how to deploy a breaking API change safely
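The "expand-contract" pattern mentioned above deserves a concrete shape. Here is a hedged TypeScript sketch of migrating a response field without breaking old clients — the field names are invented for illustration:

```typescript
// Old clients read `customerName`; we want to move to a nested `customer` object.
interface OrderV1 {
  id: string;
  customerName: string;
}

// Phase 1 — EXPAND: serve both the old and the new shape simultaneously,
// so old and new clients work against the same API during the migration.
interface OrderExpanded extends OrderV1 {
  customer: { name: string };
}

function expand(order: OrderV1): OrderExpanded {
  return { ...order, customer: { name: order.customerName } };
}

// Phase 2 — MIGRATE: update clients to read `customer.name`; instrument reads
// of the deprecated `customerName` field and watch them drop to zero.
// Phase 3 — CONTRACT: only then remove `customerName`, in a later deploy.
```

Because rolling updates run old and new code side by side, each phase must be backward compatible on its own; the removal ships only once nothing reads the old field.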


From the books

Accelerate: The Science of Lean Software and DevOps — Forsgren, Humble, Kim (2018)

Chapter 2: Measuring Performance

The four DORA metrics — deployment frequency, lead time for changes, time to restore service, change failure rate — are the best predictors of organizational performance. High performers deploy multiple times per day with 2,555x faster recovery than low performers.
