Kubernetes Services provide stable VIPs and DNS for pod groups. Endpoints track which pods are ready. kube-proxy programs iptables/IPVS rules. Misconfiguration creates invisible routing failures that bypass all application logs.
Know the 4 Service types (ClusterIP, NodePort, LoadBalancer, ExternalName) and when to use each. Understand how DNS resolves service names.
Understand Endpoints vs EndpointSlices, kube-proxy iptables vs IPVS mode, sessionAffinity tradeoffs, and how readiness probes gate Endpoint registration.
Design service networking for large clusters (1000+ Services): IPVS mode adoption, EndpointSlice scalability, topology-aware routing for latency reduction, and service mesh offload (Envoy sidecar vs kube-proxy).
Microservice count grows past 500; iptables rules exceed 200k on busy nodes
p99 latency alerts fire: inter-service calls at 800ms (SLO: 50ms)
Network packet captures show kernel time dominated by iptables traversal
kube-proxy mode switched to IPVS on affected nodes
p99 latency drops to 4ms; SLO restored; full fleet migration planned
The question this raises
What does kube-proxy actually do to your node's network stack, and when does its default implementation become a performance bottleneck?
A pod cannot reach a Service by its ClusterIP. kubectl get endpoints shows the expected pod IPs. What is the most likely cause?
Lesson outline
The Ephemeral Pod Problem
Pod IPs change on every restart. Pods scale up and down. A caller cannot hardcode a pod IP -- it needs a stable address that always routes to a healthy pod. Services provide that stable address (ClusterIP VIP + DNS name) backed by dynamic Endpoints that track live, ready pods.
ClusterIP
Use for: Internal-only stable VIP. DNS resolves to the VIP; kube-proxy load-balances to ready pods. Default for all internal microservices. Not reachable from outside the cluster.
NodePort
Use for: Exposes the Service on a static port (30000-32767) on every node's IP. External traffic hits any node:port -> forwarded to pod. Used for development/testing or when a cloud LoadBalancer is unavailable.
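A minimal NodePort manifest looks like the following (the Service name and `nodePort` value are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api-nodeport
spec:
  type: NodePort
  selector:
    app: my-api
  ports:
    - port: 80          # ClusterIP port (NodePort Services also get a ClusterIP)
      targetPort: 8080  # container port
      nodePort: 30080   # must fall in 30000-32767; omit to auto-assign
```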
LoadBalancer
Use for: Provisions a cloud load balancer (AWS NLB, GCP L4 LB). External traffic -> cloud LB -> NodePort -> pod. The primary pattern for exposing production services externally. Each Service creates one cloud LB (cost adds up).
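A sketch of a LoadBalancer Service, assuming an AWS cluster (the annotation shown is the AWS-specific key for requesting an NLB; other clouds use their own annotations):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api-public
  annotations:
    # Provider-specific; this requests an AWS NLB instead of the classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: my-api
  ports:
    - port: 443
      targetPort: 8080
```

Because each LoadBalancer Service provisions its own cloud LB, many teams front multiple Services with a single Ingress controller instead.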
Headless (clusterIP: None)
Use for: No VIP assigned. DNS returns individual pod IPs directly. Used by StatefulSets for direct pod addressing (kafka-0.kafka-headless) and by service meshes that do their own load balancing.
```
Client Pod (10.0.1.5)
  |  DNS: my-service.default.svc.cluster.local -> 10.96.0.50 (VIP)
  v
iptables PREROUTING chain (on source node)
  |  DNAT rule: 10.96.0.50:80 -> randomly select from:
  |    10.0.1.10:8080 (pod-a, Ready)
  |    10.0.2.20:8080 (pod-b, Ready)
  |    [10.0.3.30:8080 excluded -- pod-c not Ready]
  v
Selected Pod (10.0.2.20:8080)

kube-proxy watches: Service + Endpoints objects
  -> programs iptables rules on EVERY node
  -> rules updated within ~1s of pod Ready state change

IPVS mode (alternative):
  kernel hash table: O(1) lookup
  ipvsadm -Ln shows virtual server + real servers
```
VIP is not a real interface -- it exists only as iptables DNAT rules programmed by kube-proxy on every node
Service Networking at Scale
500+ Services on iptables mode kube-proxy
Problem: 200k+ iptables rules; O(n) packet matching; p99 latency climbs to 800ms as rule count grows.
Fix: IPVS mode replaces the linear rule list with a kernel hash table, giving O(1) lookup regardless of Service count; p99 returns to single digits.
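Switching modes is a kube-proxy configuration change. A minimal KubeProxyConfiguration fragment (typically delivered via the kube-proxy ConfigMap) might look like:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # round-robin; other schedulers (lc, sh, ...) are available
```

Note that IPVS mode requires the relevant kernel modules (ip_vs, ip_vs_rr, nf_conntrack) to be loadable on each node; kube-proxy falls back to iptables mode if they are missing.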
Service with 500 pod IPs (Endpoints object)
Problem: each pod IP change rewrites the entire 500-entry Endpoints object, which is then propagated to all nodes; O(n) update cost.
Fix: EndpointSlices shard the backends into slices of up to 100 entries; only the affected slice is updated, so a single pod change costs O(1).
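For illustration, one slice for a hypothetical my-api Service might look like the following (in practice the EndpointSlice controller creates and manages these objects; you rarely write them by hand):

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-api-abc12                      # controller-generated name (hypothetical)
  labels:
    kubernetes.io/service-name: my-api    # ties the slice back to its Service
addressType: IPv4
ports:
  - name: http
    port: 8080
    protocol: TCP
endpoints:
  - addresses: ["10.0.1.10"]
    conditions:
      ready: true
  - addresses: ["10.0.3.30"]
    conditions:
      ready: false                        # not Ready -> excluded from load balancing
```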
From DNS query to pod response
1. App calls http://my-service:8080; the OS resolves the name via /etc/resolv.conf -> CoreDNS.
2. CoreDNS returns the ClusterIP VIP (e.g., 10.96.0.50) from its Service cache.
3. The packet leaves the pod with dst=10.96.0.50:8080 and enters the iptables PREROUTING chain on the node.
4. A kube-proxy iptables rule DNATs the VIP to one of the ready pod IPs (random, or sticky with session affinity).
5. The packet is routed to the selected pod (possibly crossing nodes via the CNI overlay).
6. The pod processes the request; the response returns directly (no SNAT in the default mode).
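Step 4's session-affinity variant is a Service-level setting. A sketch of a Service pinning each client IP to one backend:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-service
  ports:
    - port: 8080
      targetPort: 8080
  sessionAffinity: ClientIP       # default is None (random backend per connection)
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800       # affinity window; defaults to 3 hours
```

ClientIP affinity is implemented in kube-proxy's rules, so it is per-node and breaks down behind NAT where many clients share one source IP.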
```yaml
# ClusterIP (internal only)
apiVersion: v1
kind: Service
metadata:
  name: my-api
spec:
  selector:            # must exactly match pod labels -- mismatch = empty Endpoints = traffic black hole
    app: my-api        # must match pod labels
  ports:
    - port: 80         # Service port (VIP) -- what callers use
      targetPort: 8080 # container port -- what containers listen on; these can differ
  type: ClusterIP      # default

---
# Headless (StatefulSet use)
spec:
  clusterIP: None      # no VIP; DNS returns pod IPs
  selector:
    app: kafka
```
Service routing failure modes
Selector label mismatch creates silent traffic black hole
Broken:

```yaml
# Service selector
spec:
  selector:
    app: my-api
    version: v2   # <- requires BOTH labels
---
# Deployment pod template labels
metadata:
  labels:
    app: my-api
    # version label missing -> Endpoints = empty
    # Service exists but routes to nobody
```

Fixed:

```yaml
# Service selector (match only stable labels)
spec:
  selector:
    app: my-api   # single stable label
---
# Deployment pod template labels
metadata:
  labels:
    app: my-api
    version: v2   # extra labels are fine; selector only needs a subset
```

Service selectors require ALL listed labels to be present on the pod. Adding a version label to the selector means every pod must carry that label or it is excluded from Endpoints. Use minimal, stable selectors.
| Type | Reachability | Load balancing | Cost | Use case |
|---|---|---|---|---|
| ClusterIP | Cluster-internal only | kube-proxy (iptables/IPVS) | Free | All internal microservices |
| NodePort | Any node IP + port | kube-proxy then pod | Free (infra cost only) | Dev/test, bare-metal |
| LoadBalancer | External via cloud LB | Cloud LB then kube-proxy | Cloud LB cost per Service | Production external Services |
| Headless | Cluster-internal, direct pod IPs | Client-side (DNS round-robin) | Free | StatefulSets, service mesh |
| ExternalName | CNAME to external DNS | None (DNS redirect only) | Free | External service aliasing |
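The ExternalName row maps to the smallest possible manifest: no selector, no Endpoints, just a DNS alias (the name and target below are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: billing-db
spec:
  type: ExternalName
  externalName: db.example.com   # in-cluster DNS returns a CNAME to this host; no proxying
```

Pods can then use billing-db as a hostname, and the target can later be swapped for an in-cluster Service without changing callers.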
Service VIP and kube-proxy
📖 What the exam expects
A ClusterIP Service gets a virtual IP (VIP). kube-proxy on each node programs iptables/IPVS rules to DNAT packets destined for the VIP to one of the ready pod IPs.
Debugging questions about pods that cannot reach Services, and architecture questions about service networking at scale.
Common questions:
Strong answer: Mentions EndpointSlices for large-scale services, topology-aware routing to keep traffic in-zone, and IPVS mode for clusters with many Services.
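A strong answer can back the topology-aware routing point with config. Assuming a recent Kubernetes version (1.27+ uses the topology-mode annotation; 1.23-1.26 used topology-aware-hints), a sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
  annotations:
    service.kubernetes.io/topology-mode: Auto   # ask the control plane to add zone hints to EndpointSlices
spec:
  selector:
    app: my-api
  ports:
    - port: 80
      targetPort: 8080
```

With hints in place, kube-proxy prefers same-zone backends, cutting cross-zone latency and data-transfer cost; it falls back to all endpoints when zones are imbalanced.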
Red flags: Thinking a Service VIP is a real IP with a listening process, or not knowing that kube-proxy programs iptables rules on every node.