
Kubernetes Evolution: From Borg to the CNCF Ecosystem

Kubernetes did not emerge in a vacuum. It was shaped by a decade of operating Borg and Omega at Google scale, distilled into an open-source project in 2014. Understanding the design decisions inherited from Borg -- and the ones deliberately changed -- explains why Kubernetes behaves the way it does under failure.

Relevant for: Junior, Mid-level, Senior
Why this matters at your level
Junior

Know the historical arc: Borg -> Kubernetes -> CNCF ecosystem. Understand why containers became popular before Kubernetes existed. Know what the CRI, CNI, and CSI interfaces enable.

Mid-level

Understand the design decisions inherited from Borg: declarative desired state, reconciliation loops, separation of control and data plane. Know why these matter when you're debugging why a pod did not get scheduled.

Senior

Evaluate the CNCF ecosystem critically -- distinguish projects that solve real problems from those that add complexity. Know the Kubernetes release cycle and its impact on deprecation planning (deprecated APIs are typically removed 12-18 months after the deprecation notice).

~5 min read
Borg era

Google's Borg manages hundreds of thousands of machines across cells -- tightly coupled control and data plane

2006

Google engineers propose cgroups ("process containers") for the Linux kernel -- the foundation for container resource management, merged into the mainline kernel in 2008

2013

Docker democratises containers -- makes Linux namespace/cgroup stack accessible to developers

June 2014

Google open-sources Kubernetes at DockerCon -- based on Borg lessons, designed from day one with control/data plane separation

2015

CNCF (Cloud Native Computing Foundation) formed -- Kubernetes donated as the founding project

2018

Kubernetes 1.13 -- CSI reaches GA, completing the pluggable architecture: runtime (CRI), network (CNI), and storage (CSI) are all swappable via stable interfaces


The question this raises

Why do existing Kubernetes pods keep running even when etcd is down and the API server is unreachable -- and what does this tell you about the control plane / data plane separation that Borg taught the Kubernetes designers?

Test your assumption first

Your cluster's etcd is down and the API server is unreachable. What happens to pods that were already running before the outage?

Lesson outline

What Problem Kubernetes Solves

The scale problem that created Kubernetes

By the early 2010s, Google was launching billions of containers per week on Borg. The lessons: manually placing workloads does not scale, tightly coupled control and data planes create failure cascades, and the same container runtime should run anywhere (test, staging, prod, across cloud providers). Kubernetes was designed to solve exactly these problems -- with open interfaces so the community could build the runtime, network, and storage layers independently.

Declarative desired state

You declare what should run (a Deployment with 3 replicas). Controllers continuously reconcile actual state toward that desired state. No imperative "start this container now" -- instead "ensure 3 replicas always exist".
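The reconcile step can be sketched as a function from desired and actual replica counts to actions. This is a toy model for intuition, not the real controller code, which watches full object specs via the API server:

```python
# Toy model of one reconciliation pass for a ReplicaSet-like controller.
# The controller never executes "start a container" directly -- it only
# computes the difference between desired and actual state.

def reconcile(desired: int, actual: int) -> list[str]:
    """Return the actions needed to move actual state toward desired state."""
    if actual < desired:
        return ["create-pod"] * (desired - actual)
    if actual > desired:
        return ["delete-pod"] * (actual - desired)
    return []  # already converged: reconciliation is a no-op

print(reconcile(desired=3, actual=0))  # ['create-pod', 'create-pod', 'create-pod']
print(reconcile(desired=3, actual=3))  # [] -- converged, nothing to do
```

Because the loop runs continuously, a pod crash simply makes the next pass return a create action -- self-healing falls out of the model for free.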

CRI (Container Runtime Interface)

A gRPC API between the kubelet and the container runtime. Enables swapping runtimes (Docker -> containerd -> CRI-O -> gVisor) without changing Kubernetes code. Created after tight coupling to Docker caused integration problems.

CNI (Container Network Interface)

A plugin spec for container networking setup. Enables swapping network implementations (Flannel, Calico, Cilium) without changing Kubernetes. Called by the kubelet at pod creation to set up veth pairs and IP assignment.

CSI (Container Storage Interface)

A plugin spec for persistent storage. Enables storage vendors (AWS EBS, GCP Persistent Disk, Portworx, Ceph) to integrate without modifying Kubernetes source code. Replaced the old in-tree volume plugins.
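The common thread of CRI, CNI, and CSI is substitution without touching the caller. A minimal sketch of that principle -- hypothetical class and function names, not the real CRI gRPC API:

```python
# Sketch of interface-based pluggability, the design principle behind
# CRI/CNI/CSI. The kubelet-analogue depends only on the interface, so
# runtimes can be swapped without changing its code.
from typing import Protocol

class ContainerRuntime(Protocol):          # stands in for the CRI contract
    def run(self, image: str) -> str: ...

class Containerd:
    def run(self, image: str) -> str:
        return f"containerd running {image}"

class CriO:
    def run(self, image: str) -> str:
        return f"cri-o running {image}"

def kubelet_start_pod(runtime: ContainerRuntime, image: str) -> str:
    # This code is identical regardless of which runtime is plugged in.
    return runtime.run(image)

print(kubelet_start_pod(Containerd(), "nginx:1.27"))  # containerd running nginx:1.27
print(kubelet_start_pod(CriO(), "nginx:1.27"))        # cri-o running nginx:1.27
```

Before CRI, the equivalent of `kubelet_start_pod` contained Docker-specific code paths -- every runtime change meant changing Kubernetes itself.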

The System View: From Borg to Kubernetes

BORG (2003-2014)                   KUBERNETES (2014-present)
+---------------------------+      +----------------------------------+
| Borgmaster (control)      |      | Control Plane                    |
| - Task scheduling         |      | - kube-apiserver (REST + auth)   |
| - Health monitoring       |      | - etcd (distributed state)       |
| - Tightly coupled to      |      | - kube-scheduler (placement)     |
|   Borglet (data plane)    |      | - controller-manager (reconcile) |
+---------------------------+      +-----------|----------------------+
| Borglet (per-machine)     |      |           | (independent)        |
| - Runs tasks              |      | Data Plane (per node)            |
| - Reports to Borgmaster   |      | - kubelet (local reconciliation) |
| - Cannot operate if       |      | - container runtime (CRI)        |
|   Borgmaster unreachable  |      | - kube-proxy (network rules)     |
+---------------------------+      | RUNS INDEPENDENTLY FROM CP!      |
                                   +----------------------------------+

KEY DIFFERENCE:
Borg: data plane depends on control plane -- CP down = DP disrupted
K8s:  data plane operates autonomously -- CP down = no new pods
      but existing pods keep running until control plane recovers

The control plane / data plane separation is the key design lesson from Borg. Existing workloads continue running autonomously even during complete control plane outages.
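A toy simulation can make the autonomy concrete. The kubelet caches the specs of pods assigned to its node, so restarting a crashed container needs no API server round-trip (hypothetical `Kubelet` class, not real kubelet code):

```python
# Toy illustration of data-plane autonomy during a control-plane outage.
# The kubelet caches pod specs while the control plane is healthy; local
# restarts then use that cache, not the API server.

class Kubelet:
    def __init__(self) -> None:
        self.cached_specs: dict[str, dict] = {}  # synced while the CP is up
        self.running: set[str] = set()

    def sync_from_api(self, pods: dict[str, dict]) -> None:
        """Normal operation: learn assigned pods from the API server."""
        self.cached_specs.update(pods)
        self.running.update(pods)

    def handle_crash(self, pod: str) -> None:
        """Local reconciliation: restart from the cached spec, no CP needed."""
        self.running.discard(pod)
        if pod in self.cached_specs:
            self.running.add(pod)  # restarted without any API server call

kubelet = Kubelet()
kubelet.sync_from_api({"my-app-abc123": {"image": "my-app:v2"}})
# ... etcd loses quorum, API server becomes unreachable ...
kubelet.handle_crash("my-app-abc123")
print("my-app-abc123" in kubelet.running)  # True -- pod restarted locally
```

What the real kubelet cannot do during the outage is learn about *new* pods -- there is no API server to watch -- which is exactly the "CP down = no new pods" behaviour in the diagram above.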

Design model evolution

Situation: The etcd cluster loses quorum and the API server becomes unavailable.

Before (naive mental model): "All workloads stop immediately because the orchestration system is unavailable. Similar to a database being down -- nothing works."

After (accurate mental model): "Existing pods continue running. The kubelet on each node continues to maintain its locally known pod state. Pods that crash are restarted by the local kubelet using its cached spec. New scheduling, new deployments, and config changes are blocked -- but the data plane runs autonomously."

How It Actually Works: The Reconciliation Loop

How Kubernetes maintains desired state

1. You declare desired state via the API -- kubectl apply -f deployment.yaml sends a REST request to the API server, which validates and stores the Deployment object in etcd. No container has been created yet.

2. Controllers watch for changes -- the Deployment controller (inside controller-manager) watches the API server for Deployment objects. It sees the new Deployment and calculates: "desired replicas=3, actual replicas=0 -- create a ReplicaSet, which in turn creates 3 Pods".

3. Scheduler assigns pods to nodes -- the scheduler watches for unscheduled pods (pod.spec.nodeName is empty). It evaluates node resources, affinity rules, and taints. It assigns each pod to a node by writing pod.spec.nodeName.

4. Kubelet creates the container -- the kubelet on each node watches for pods assigned to it (pod.spec.nodeName == this node). It calls the CRI to pull the image and create the container. It reports status back to the API server.

5. Continuous reconciliation -- if a pod crashes, the kubelet reports the failure to the API server. The ReplicaSet controller sees "desired=3, actual=2" and creates a new pod. The scheduler assigns it. The kubelet runs it. This loop runs continuously, forever.


reconciliation-watch.sh

# Watch the reconciliation loop in action
# Terminal 1: watch pod count
$ kubectl get pods -w

# Terminal 2: manually delete a pod (the control plane will fight this)
$ kubectl delete pod my-app-abc123
pod "my-app-abc123" deleted

# Immediately in Terminal 1:
# NAME            READY  STATUS
# my-app-abc123   1/1    Running            <- being deleted
# my-app-abc123   0/1    Terminating        <- deletion in progress
# my-app-xyz789   0/1    Pending            <- NEW pod created by ReplicaSet controller
# my-app-xyz789   0/1    ContainerCreating
# my-app-xyz789   1/1    Running            <- back to 3 replicas in ~10 seconds

# The ReplicaSet controller saw desired=3, actual=2, created 1 new pod.
# Manually deleting a pod does not reduce the replica count -- the desired
# state (3 replicas) lives in the Deployment spec in etcd, and this loop
# runs continuously.

# To ACTUALLY reduce replicas, change the desired state; the controller
# then reconciles actual state toward it:
$ kubectl scale deployment my-app --replicas=2
# Now desired=2, actual=3, controller DELETES one pod.

What Breaks in Production: Blast Radius

Blast radius when Kubernetes design decisions are misunderstood

  • Treating kubectl delete pod as a scale-down operation — Pod is replaced immediately by the ReplicaSet controller. Use kubectl scale or edit the Deployment replicas to actually reduce the count.
  • Applying a Deployment change without understanding rollout strategy — Default maxUnavailable=25% means 25% of pods are replaced simultaneously -- can cause brief capacity reduction. Set maxUnavailable=0 and maxSurge=1 for zero-downtime rolling updates.
  • Using kubectl apply with server-side apply disabled on shared resources — Multiple controllers applying to the same resource can cause field ownership conflicts and unexpected reversions -- use server-side apply (kubectl apply --server-side) for shared managed resources.
  • Not understanding the CNCF graduated vs sandbox distinction — CNCF sandbox projects are experimental; graduated projects (Prometheus, Envoy, Kubernetes, Fluentd, Jaeger) are production-proven. Evaluating ecosystem tools requires knowing where they sit on the maturity spectrum.
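The rollout arithmetic behind maxUnavailable and maxSurge is worth making concrete: for percentage values, Kubernetes rounds maxUnavailable down and maxSurge up, so small replica counts behave differently than the headline "25%" suggests. A sketch of that rounding:

```python
import math

# Rolling-update arithmetic: percentage maxUnavailable rounds DOWN,
# percentage maxSurge rounds UP (per the Deployment rollout rules).

def rollout_bounds(replicas: int, max_unavailable_pct: int = 25,
                   max_surge_pct: int = 25) -> tuple[int, int]:
    """Return (max pods below desired, max pods above desired) mid-rollout."""
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    surge = math.ceil(replicas * max_surge_pct / 100)
    return unavailable, surge

# With the defaults, 8 replicas may briefly drop to 6 serving pods
# (2 unavailable) while up to 10 pods exist (surge of 2):
print(rollout_bounds(8))            # (2, 2)
# With 2 replicas, 25% rounds down to 0 unavailable -- the rollout
# effectively replaces one pod at a time via surge:
print(rollout_bounds(2))            # (0, 1)
# maxUnavailable=0, maxSurge=25% forces surge-only, zero-capacity-loss:
print(rollout_bounds(4, 0, 25))     # (0, 1)
```

Run the numbers for your own replica counts before tuning these values -- the defaults are a capacity trade-off, not a guarantee.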

Imperative kubectl commands in CI/CD instead of declarative GitOps

Bug
# Imperative: state lives in CI/CD history, not in a source of truth
# Different team members run different versions of this script
#!/bin/bash
kubectl scale deployment my-app --replicas=5
kubectl set image deployment/my-app app=my-app:v2
kubectl set resources deployment/my-app --limits=cpu=500m,memory=512Mi
# If any command fails, state is partially applied.
# No audit trail. No reconciliation. No self-healing.
Fix
# Declarative: desired state in git, reconciled continuously
# deployment.yaml in git (source of truth):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5                   # scale: change here and commit
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app               # must match spec.selector.matchLabels
    spec:
      containers:
      - name: app
        image: my-app:v2        # image: change here and commit
        resources:
          limits:
            cpu: "500m"
            memory: "512Mi"
# kubectl apply -f deployment.yaml
# OR let ArgoCD/Flux apply from git automatically

Imperative commands create state that exists only in shell history. Declarative YAML in git creates state that is: auditable (git blame), reconciled (controllers fix drift), and reproducible (re-apply restores desired state). The reconciliation loop is only as useful as the quality of your declared desired state stored in version control.

Decision Guide: CNCF Project Evaluation

Is the CNCF project at "Graduated" maturity level?
YesProduction-safe to evaluate. Examples: Prometheus, Envoy, Fluentd, Jaeger, Argo CD, Flux. These have passed TOC review, production adoption requirements, and security audits.
NoContinue -- evaluate Incubating or Sandbox projects more carefully.
Is the project at "Incubating" maturity level with 3+ major production users?
YesMay be appropriate for non-critical paths. Evaluate API stability, release cadence, and community health before adopting in critical infrastructure.
NoContinue.
Is this a Sandbox project or non-CNCF tool for a critical infrastructure role?
YesTreat as experimental. Run a proof-of-concept in a non-production environment. Evaluate the project team's responsiveness and whether a graduated alternative exists first.
NoEvaluate the specific trade-offs. Consider build vs buy for infrastructure components not in the CNCF ecosystem.

Cost and Complexity: Kubernetes Adoption Trade-offs

Deployment Model | Operational Overhead | Flexibility | Best For
Managed K8s (EKS/GKE/AKS) | Low -- cloud provider manages control plane | Medium -- limited control plane customisation | Most teams -- start here unless requirements dictate self-managed
Self-managed (kubeadm) | High -- all control plane upgrades manual | Full -- any configuration | Teams with specific compliance or on-prem requirements
K3s / K0s (lightweight) | Low -- minimal footprint | Medium -- some features removed | Edge computing, IoT, dev environments
Fully managed + GitOps (EKS + ArgoCD) | Low ops, high automation | High -- declarative everything | Platform teams wanting self-service developer experience

Exam Answer vs. Production Reality


Declarative vs imperative orchestration

📖 What the exam expects

Kubernetes uses a declarative model: you specify desired state in YAML, the control plane continuously reconciles actual state toward desired state. Controllers watch resource objects and make changes to move the cluster toward the desired configuration.
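The idempotency that follows from this model can be shown in miniature: applying the same desired state twice leaves the cluster unchanged, which is why kubectl apply is safe to re-run. A toy model, not real apply semantics:

```python
# Toy model of why declarative apply is idempotent: the operation is
# "set desired state to X", so repeating it is a no-op.

def apply(cluster: dict, name: str, spec: dict) -> bool:
    """Store spec under name; return True if anything changed."""
    changed = cluster.get(name) != spec
    cluster[name] = dict(spec)
    return changed

cluster: dict = {}
spec = {"replicas": 3, "image": "my-app:v2"}
print(apply(cluster, "my-app", spec))  # True  -- first apply creates it
print(apply(cluster, "my-app", spec))  # False -- re-apply changes nothing
```

Contrast with an imperative "create" operation, which fails on the second run because the object already exists -- the behavioural difference behind kubectl apply vs kubectl create.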

How this might come up in interviews

Asked in senior engineering and staff interviews as "why is Kubernetes designed this way" or "what is the control plane / data plane separation and why does it matter". Also appears as context for deep-dive questions on any specific component.

Common questions:

  • Why do pods keep running when the Kubernetes API server is unreachable?
  • What is the difference between the control plane and data plane in Kubernetes?
  • What problem does the CRI (Container Runtime Interface) solve?
  • Why does kubectl apply work differently from kubectl create?
  • What did Kubernetes learn from Google's Borg system?

Strong answer: Mentioning Borg as the design ancestor. Explaining the reconciliation loop and why it makes kubectl apply idempotent. Knowing that the CNCF graduated projects (Prometheus, Envoy, Fluentd) represent production-proven ecosystem choices.

Red flags: Thinking the control plane and data plane must both be healthy for pods to run. Not knowing what CRI, CNI, or CSI abstractions enable (pluggable runtime, network, and storage). Treating Kubernetes as a black box without understanding the declarative reconciliation model.

Related concepts

Explore topics that connect to this one.

  • Cluster Architecture: Control Plane vs Data Plane Deep Dive
  • Docker and containers
  • Kubernetes fundamentals

Suggested next

Often learned after this topic.

Cluster Architecture: Control Plane vs Data Plane Deep Dive

