
Containers: Linux Kernel Foundations

Docker and Kubernetes are wrappers around Linux kernel features. Understand namespaces, cgroups, and Union File Systems — the real primitives behind every container.

🎯Key Takeaways
Containers are not VMs — they share the host kernel. Isolation is provided by Linux namespaces (visibility) and cgroups (resource limits).
Seven namespace types: PID, NET, MNT, UTS, IPC, User, Cgroup — each isolates a different resource view.
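cgroups enforce CPU, memory, and I/O limits. Without a memory limit, a container can grow until the whole host runs short of memory, and the kernel's global OOM killer then picks a victim that may not be the container at fault.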
OverlayFS enables fast, storage-efficient images through copy-on-write layers. Layer order in Dockerfiles controls build cache efficiency.
OOMKilled = cgroup limit. Network pod failures = CNI/namespace. Slow builds = layer cache miss. Know the layer, find the fix fast.
Runtime stack: Linux kernel → runc (low-level OCI runtime) → containerd or CRI-O (high-level runtime) → Docker or Kubernetes.


Docker and Kubernetes are wrappers

To understand containers deeply, stop seeing Docker and Kubernetes as magical tools. They are wrappers: user interfaces that make complex Linux kernel features easy to manage. Most infrastructure-level problems you encounter with containers (OOM kills, network isolation, slow image pulls) trace back to these kernel primitives.

Everything containers do is performed by the Linux kernel. Docker and Kubernetes automate the setup and management of kernel features that existed long before containers were popular. The first namespace (mount) arrived in Linux 2.4.19 in 2002, and cgroups were merged in Linux 2.6.24 in early 2008. Docker was released in 2013, Kubernetes in 2014.

Why this matters for debugging

When a container is OOM-killed, that is a cgroup limit being enforced. When two containers cannot talk to each other, that is a network namespace or iptables issue. When an image pull is slow, the time goes into downloading and unpacking the layers that OverlayFS later stacks into the container's filesystem. Knowing the kernel layer tells you where to look.

Figure: the wrapper stack. Kubernetes (orchestration wrapper) sits on Docker (packaging wrapper), which sits on the Linux kernel (namespaces, cgroups, UnionFS).

Namespaces: isolation ("What can I see?")

Linux namespaces give a process a private view of a system resource. A process inside a namespace sees only what the namespace shows it — not the host's full resource.

The seven namespace types

  • PID namespace: The container's first process sees itself as PID 1 and cannot see the host's processes. This is why ps aux inside a container shows only container processes.
  • NET namespace: A private network stack with its own interfaces, IP addresses, routing tables, and iptables rules. This is how each pod in Kubernetes gets its own IP (all containers in a pod share one network namespace).
  • MNT namespace: The container has its own mount table and filesystem root (/app, /bin, /etc). It cannot see the host's /etc/passwd or other host files unless they are explicitly mounted in.
  • UTS namespace: The container has its own hostname. Running hostname inside a Docker container returns the short container ID by default (the pod name in Kubernetes), not the host's name.
  • IPC namespace: Isolated inter-process communication (System V shared memory, semaphores, message queues). Containers cannot interfere with each other's IPC.
  • User namespace: Maps host user IDs to container user IDs. Allows running as "root" inside a container while being an unprivileged user on the host.
  • Cgroup namespace: The container sees its own cgroup hierarchy, not the host's.

Linux 5.6 added an eighth type, the Time namespace, which gives a container its own offsets for the monotonic and boot-time clocks.

You can inspect namespaces directly: `lsns` lists all namespaces on a host. `ls -la /proc/1/ns/` shows the namespaces of PID 1. `nsenter --target <pid> --net` lets you enter a running container's network namespace from the host.
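The same information that `ls -la /proc/<pid>/ns/` shows can be read programmatically. The sketch below is a minimal Python version. The helper names are hypothetical, and it assumes Linux with permission to read the target process's /proc entries. Each link in /proc/<pid>/ns points to a target such as `net:[4026531992]`. Two processes are in the same namespace when those inode numbers match.

```python
import os


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a link target such as 'net:[4026531992]' into ('net', 4026531992)."""
    ns_type, _, inode = target.partition(":")
    return ns_type, int(inode.strip("[]"))


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns (e.g. 'net', 'pid', 'uts') to its namespace inode."""
    ns_dir = f"/proc/{pid}/ns"
    result: dict[str, int] = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except OSError:
            continue  # some links need extra privileges to read
        result[name] = inode
    return result


def shared_namespaces(pid_a: str, pid_b: str) -> set[str]:
    """Names of the namespaces two processes have in common (same inode)."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return {name for name in a.keys() & b.keys() if a[name] == b[name]}
```

To use it, run as root on the host and call `shared_namespaces("self", str(host_pid))`. Take `host_pid` from `docker inspect --format '{{.State.Pid}}' <container>`. The result lists which namespaces the container still shares with your shell.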

cgroups: resource limits ("How much can I consume?")

Control Groups (cgroups) enforce resource limits and accounting on groups of processes. While namespaces hide resources, cgroups restrict how much a process can use. Without cgroups, one container could consume all host RAM and CPU, crashing everything else.

What cgroups control

  • cpu: CPU shares and hard limits. --cpus=0.5 in Docker sets a CFS quota of 50 ms per 100 ms period (cpu.cfs_quota_us / cpu.cfs_period_us on cgroup v1, cpu.max on cgroup v2), so the container cannot use more than half a CPU's worth of time, which may be spread across cores.
  • memory: RAM limit. --memory=512m sets memory.limit_in_bytes (cgroup v1) or memory.max (cgroup v2). When usage hits the limit and the kernel cannot reclaim enough memory, the OOM killer kills a process inside that cgroup. This is the OOMKilled status you see in Kubernetes.
  • blkio (io on cgroup v2): Block I/O bandwidth limits. Prevents one container from monopolizing disk throughput.
  • pids: Maximum number of processes. Prevents fork bombs from exhausting the host's PID space.
  • net_cls / net_prio (cgroup v1 only): Network traffic classification for QoS, used to prioritize or throttle container network traffic.
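The --cpus arithmetic can be written out directly. The sketch below is a minimal Python version. It assumes Docker's default CFS period of 100,000 microseconds (100 ms), and the function names are hypothetical. The quota is the CPU count multiplied by the period. On cgroup v1 it is written to cpu.cfs_quota_us and cpu.cfs_period_us. On cgroup v2 it becomes a single `"<quota> <period>"` line in cpu.max.

```python
DEFAULT_PERIOD_US = 100_000  # 100 ms scheduling period (Docker's default)


def cpus_to_quota(cpus: float, period_us: int = DEFAULT_PERIOD_US) -> tuple[int, int]:
    """Convert a --cpus value into (quota_us, period_us) for the CFS bandwidth controller."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return round(cpus * period_us), period_us


def cgroup_v1_cpu_files(cpus: float) -> dict[str, str]:
    """File contents a runtime would write under a cgroup v1 cpu controller directory."""
    quota, period = cpus_to_quota(cpus)
    return {"cpu.cfs_quota_us": str(quota), "cpu.cfs_period_us": str(period)}


def cgroup_v2_cpu_max(cpus: float) -> str:
    """The single line written to cpu.max on cgroup v2: '<quota> <period>'."""
    quota, period = cpus_to_quota(cpus)
    return f"{quota} {period}"
```

For example, `cgroup_v2_cpu_max(0.5)` returns `"50000 100000"`. The container may run for 50 ms in every 100 ms window. Once it has used that budget, it is throttled for the rest of the window.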

Kubernetes maps directly to cgroups: `resources.limits.memory: 512Mi` in a Pod spec becomes a cgroup memory limit. When a Pod is OOMKilled, the kernel log (`dmesg`) shows an `oom-kill:` line naming the memory cgroup path and the killed task. It is followed by `Memory cgroup out of memory: Killed process <pid> (<name>)`.
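To see exactly how many bytes a limit like `512Mi` means, the quantity can be converted by hand. The sketch below is a simplified Python parser with a hypothetical function name. It handles plain numbers with decimal suffixes (k, M, G, T, P, E) and binary suffixes (Ki, Mi, Gi, Ti, Pi, Ei). It ignores exponent forms such as `129e6` and the milli suffix `m`, and it rounds fractional byte counts up.

```python
import re
from decimal import ROUND_CEILING, Decimal

# Decimal (power-of-ten) and binary (power-of-two) suffixes used in Kubernetes quantities
_MULTIPLIERS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}
_QUANTITY = re.compile(r"([0-9]+(?:\.[0-9]+)?)([A-Za-z]*)")


def memory_quantity_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to a byte count (rounded up)."""
    match = _QUANTITY.fullmatch(quantity.strip())
    if match is None or match.group(2) not in _MULTIPLIERS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    value = Decimal(match.group(1)) * _MULTIPLIERS[match.group(2)]
    return int(value.to_integral_value(rounding=ROUND_CEILING))
```

`memory_quantity_to_bytes("512Mi")` gives 536870912. That is the value the runtime ends up writing into the container's memory.max (cgroup v2) or memory.limit_in_bytes (cgroup v1). A common trap: `512M` is 512,000,000 bytes, about 4.6% less than `512Mi`.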

Union File Systems / OverlayFS: the layer cake

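OverlayFS (the union filesystem Docker uses by default through its overlay2 storage driver) solves a practical problem. If every container needed a full copy of its OS filesystem, a host with 50 containers would need 50 × 200MB = 10GB just for base images. OverlayFS instead stacks shared read-only layers under a per-container writable layer. Because writes are copy-on-write, each image layer is stored on disk only once.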
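The storage arithmetic can be checked in a few lines. The sketch below is minimal Python with hypothetical function names. The 5 MB per-container writable layer in the example is also a hypothetical figure, since real usage depends on what each container writes.

```python
def full_copies_mb(containers: int, base_mb: int) -> int:
    """Disk needed if every container got its own copy of the base filesystem."""
    return containers * base_mb


def overlay_mb(containers: int, base_mb: int, writable_mb: int) -> int:
    """Disk needed when base layers are stored once and each container adds only a writable layer."""
    return base_mb + containers * writable_mb


# 50 containers on a 200 MB base image
print(full_copies_mb(50, 200))  # 10000 (MB), the 10GB from the text
print(overlay_mb(50, 200, 5))   # 450 (MB), assuming a hypothetical 5 MB written per container
```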

How Docker image layers work

  • Base layer: The FROM image (e.g. ubuntu:22.04). Read-only. Shared by every container using that image.
  • Intermediate layers: Each RUN, COPY, and ADD instruction in a Dockerfile creates a new read-only layer. Cached by Docker and reused across builds.
  • Container layer: A thin writable layer added when a container starts. All writes (new files, modified files) go here; modifying a file from a lower layer first copies it up into this layer. Destroyed when the container is removed.
  • Volumes: Mount points that bypass the container layer. Data written to volumes persists after the container is removed. Used for databases, logs, and config.
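The read and write rules for these layers can be modeled in a few lines. This is a toy Python model, not how the kernel implements OverlayFS, and the class and method names are hypothetical. Layers are plain dicts from path to bytes. A lookup searches from the writable container layer down through the image layers. The first modification of a lower file copies it up into the writable layer. A deletion records a whiteout marker (real OverlayFS uses a 0/0 character device for this).

```python
from dataclasses import dataclass, field

WHITEOUT = object()  # sentinel meaning "deleted in the writable layer"


@dataclass
class OverlayView:
    lower_layers: list  # read-only image layers as dicts {path: bytes}, bottom layer first
    upper: dict = field(default_factory=dict)  # this container's writable layer

    def read(self, path: str) -> bytes:
        # Search top-down: writable layer first, then image layers from newest to oldest
        for layer in [self.upper, *reversed(self.lower_layers)]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    raise FileNotFoundError(path)
                return layer[path]
        raise FileNotFoundError(path)

    def exists(self, path: str) -> bool:
        try:
            self.read(path)
            return True
        except FileNotFoundError:
            return False

    def append(self, path: str, data: bytes) -> None:
        # Copy-up: read the current content (possibly from a lower layer), store the result upstairs
        self.upper[path] = self.read(path) + data

    def write(self, path: str, data: bytes) -> None:
        self.upper[path] = data  # new or replaced files live only in the writable layer

    def delete(self, path: str) -> None:
        self.upper[path] = WHITEOUT  # hide the file without touching the shared layers
```

Two `OverlayView` objects built on the same layer dicts share those dicts without copying them, which is where the storage saving comes from. After `a.append(...)` on one view, that container sees the modified file. The other container and the shared base layer still see the original.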

This is why `docker build` is fast on subsequent builds. Docker computes a cache key for each instruction. If the instruction hasn't changed, the layer below hasn't changed, and (for COPY and ADD) the copied files' contents haven't changed, it reuses the cached layer. This is also why layer order matters: put frequently changing instructions (COPY . .) last.
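The cache rule is easier to see with the keys chained explicitly. This is a simplified Python model, not BuildKit's actual algorithm, and the function names are hypothetical. Each step's key is a SHA-256 over three inputs: the parent key, the instruction text, and (for COPY/ADD) a digest of the copied files. The file digests here are hypothetical placeholder strings. Because every key includes its parent, one changed step changes every key after it.

```python
import hashlib
from typing import Optional


def layer_cache_keys(instructions: list[str], content_digests: dict[str, str]) -> list[str]:
    """Chained cache key per instruction: hash(parent key + instruction + copied-file digest)."""
    keys: list[str] = []
    parent = ""
    for instruction in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        # COPY/ADD results depend on file contents, not just the instruction text
        h.update(content_digests.get(instruction, "").encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def first_rebuilt_step(old: list[str], new: list[str]) -> Optional[int]:
    """Index of the first step whose cached layer cannot be reused, or None if every step hits."""
    for i, (old_key, new_key) in enumerate(zip(old, new)):
        if old_key != new_key:
            return i
    return None if len(old) == len(new) else min(len(old), len(new))
```

Take the Node.js steps used in this lesson. Changing only source files (the digest for `COPY . .`) leaves the first four keys intact and rebuilds just the last step. Changing package-lock.json invalidates `COPY package*.json ./`, `RUN npm ci`, and everything after them.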

The full stack: kernel → Docker → Kubernetes

Understanding the evolutionary flow helps you know which layer a problem lives in:

Problem → layer mapping

  • OOMKilled: cgroup memory limit enforced by the kernel. Raise the container's memory limit (resources.limits.memory) or fix the memory leak.
  • Container sees wrong hostname: UTS namespace. Expected behavior, since the container has its own hostname.
  • Pod cannot reach another pod: NET namespace + CNI plugin (Calico/Flannel/Cilium). Check network policy and CNI logs.
  • Slow image pull: Layer download and extraction. Cache base layers in a registry mirror or use a smaller base image.
  • CrashLoopBackOff: The container keeps exiting and Kubernetes backs off between restarts. Usually this is an application crash, so check the previous container's logs: kubectl logs <pod> --previous. A container that is OOMKilled repeatedly also ends up in this state.
  • Cannot run as root in container: User namespace remapping, a runAsNonRoot securityContext, or Pod Security Admission (which replaced PodSecurityPolicy, removed in Kubernetes 1.25). Expected in hardened environments.

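The evolutionary chain runs: Linux kernel (namespaces + cgroups + OverlayFS) → runc (low-level container runtime that calls kernel APIs) → containerd (manages container lifecycle, image pulling, snapshotting) → Docker (user-friendly CLI and daemon on top of containerd) → Kubernetes (multi-host orchestration). Since Kubernetes 1.24 removed dockershim, the kubelet no longer goes through Docker. It talks to containerd or CRI-O directly through the Container Runtime Interface (CRI).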


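Multi-stage Build Comparison

Single-stage build

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Hypothetical entry point; adjust to your app
CMD ["node", "dist/index.js"]
```

Image size: 233 MB

Includes dev dependencies, build tools, and all intermediate artifacts.

Multi-stage build

```dockerfile
# Stage 1: builder (dev dependencies and build tools live here)
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: final image (assumes dist/ is self-contained; entry point is hypothetical)
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
```

Builder stage: 233 MB
Final image: 48 MB

Only production files are shipped. Dev tooling stays in the discarded builder stage.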

Docker caches layers — put things that change rarely (dependencies) before things that change often (source code). A cache miss rebuilds all layers above it.

How this might come up in interviews

Container kernel questions are common in senior DevOps and SRE interviews. Be ready to explain what happens at the kernel level when a container starts: clone() or unshare() with flags such as CLONE_NEWPID|CLONE_NEWNET, cgroup creation, and the OverlayFS mount. Know the difference between cgroups v1 and v2. v2 uses a single unified hierarchy and is the default on most current distributions, such as Fedora 31+, Ubuntu 21.10+, and Debian 11+. Be able to debug an OOMKilled pod: run `kubectl describe pod`, look for `Reason: OOMKilled` (exit code 137) under Last State, and read the cgroup memory stats (memory.events and memory.current on cgroup v2). Explain why rootless containers improve security: a user namespace maps container root to an unprivileged host UID.
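The OOMKilled check can also be scripted. The sketch below is minimal Python with hypothetical helper names. The first helper expects the output of `kubectl get pod <pod> -o json`. The second expects the text of a cgroup v2 `memory.events` file, whose location under /sys/fs/cgroup depends on the cgroup driver and is not shown here. The field names `containerStatuses`, `state`, `lastState`, `terminated`, and `reason` come from the Kubernetes Pod status.

```python
import json


def oom_killed_containers(pod_json: str) -> list[str]:
    """Names of containers whose current or previous run ended with reason OOMKilled."""
    pod = json.loads(pod_json)
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for state_key in ("state", "lastState"):
            terminated = status.get(state_key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break  # report each container once
    return names


def parse_memory_events(text: str) -> dict[str, int]:
    """Parse cgroup v2 memory.events ('<counter> <value>' per line) into a dict."""
    events = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            events[key] = int(value)
    return events
```

A non-zero `oom_kill` counter in memory.events confirms that the kernel killed processes in that cgroup. This matches the OOMKilled reason kubectl reports.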


🧠Mental Model

💡 Analogy

Containers are like apartments in a single building. The land and foundation (the Linux kernel) are shared by every apartment. Each apartment (container) has its own locked front door (PID namespace: your process list), its own mailbox address (NET namespace: your IP), and its own interior walls and layout (MNT namespace: your filesystem). The building's electrical panel enforces power limits per apartment (cgroups: your CPU and memory limits). The apartments are built from standard prefabricated floor plans (image layers, stacked by OverlayFS), so new ones can be added quickly without building from scratch. Docker is the property manager who handles leases. Kubernetes is the city planning department that manages the entire district of buildings.

⚡ Core Idea

Containers are not VMs — they share the host kernel. Isolation comes from Linux namespaces (visibility) and resource limits from cgroups (consumption). OverlayFS makes images fast and storage-efficient through copy-on-write layers. Docker wraps this into a developer-friendly API. Kubernetes adds multi-host coordination.

🎯 Why It Matters

Most container infrastructure incidents trace to these primitives. OOM kills are cgroup limits. Network failures are namespace/CNI issues. Slow image pulls come down to layer size and download. Slow builds are layer cache misses. Engineers who understand the kernel layer debug container problems in minutes instead of hours. They also write better Dockerfiles, set appropriate resource limits, and make informed decisions about security (user namespaces, seccomp profiles, AppArmor).
