Kubernetes Services provide stable VIPs and DNS for pod groups. Endpoints track which pods are ready. kube-proxy programs iptables/IPVS rules. Misconfiguration creates invisible routing failures that bypass all application logs.
Know the 4 Service types (ClusterIP, NodePort, LoadBalancer, ExternalName) and when to use each. Understand how DNS resolves service names.
Understand Endpoints vs EndpointSlices, kube-proxy iptables vs IPVS mode, sessionAffinity tradeoffs, and how readiness probes gate Endpoint registration.
Design service networking for large clusters (1000+ Services): IPVS mode adoption, EndpointSlice scalability, topology-aware routing for latency reduction, and service mesh offload (Envoy sidecar vs kube-proxy).
Microservice count grows past 500; iptables rules exceed 200k on busy nodes
p99 latency alerts fire: inter-service calls at 800ms (SLO: 50ms)
Network packet captures show kernel time dominated by iptables traversal
kube-proxy mode switched to IPVS on affected nodes
p99 latency drops to 4ms; SLO restored; full fleet migration planned
The question this raises
What does kube-proxy actually do to your node's network stack, and when does its default implementation become a performance bottleneck?
A pod cannot reach a Service by its ClusterIP. kubectl get endpoints shows the expected pod IPs. What is the most likely cause?
Lesson outline
The Ephemeral Pod Problem
Pod IPs change on every restart. Pods scale up and down. A caller cannot hardcode a pod IP -- it needs a stable address that always routes to a healthy pod. Services provide that stable address (ClusterIP VIP + DNS name) backed by dynamic Endpoints that track live, ready pods.
ClusterIP
Use for: Internal-only stable VIP. DNS resolves to the VIP; kube-proxy load-balances to ready pods. Default for all internal microservices. Not reachable from outside the cluster.
NodePort
Use for: Exposes the Service on a static port (30000-32767) on every node's IP. External traffic hits any node:port -> forwarded to pod. Used for development/testing or when a cloud LoadBalancer is unavailable.
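A minimal NodePort manifest looks like the following (the Service name and `nodePort` value are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api-nodeport
spec:
  type: NodePort
  selector:
    app: my-api
  ports:
    - port: 80          # ClusterIP port (NodePort Services also get a ClusterIP)
      targetPort: 8080  # container port
      nodePort: 30080   # must fall in 30000-32767; omit to auto-assign
```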
LoadBalancer
Use for: Provisions a cloud load balancer (AWS NLB, GCP L4 LB). External traffic -> cloud LB -> NodePort -> pod. The primary pattern for exposing production services externally. Each Service creates one cloud LB (cost adds up).
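A sketch of a LoadBalancer Service, assuming an AWS cluster (the annotation shown is the AWS-specific key for requesting an NLB; other clouds use their own annotations):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api-public
  annotations:
    # Provider-specific; this requests an AWS NLB instead of the classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: my-api
  ports:
    - port: 443
      targetPort: 8080
```

Because each LoadBalancer Service provisions its own cloud LB, many teams front multiple Services with a single Ingress controller instead.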
Headless (clusterIP: None)
Use for: No VIP assigned. DNS returns individual pod IPs directly. Used by StatefulSets for direct pod addressing (kafka-0.kafka-headless) and by service meshes that do their own load balancing.
```
Client Pod (10.0.1.5)
  |  DNS: my-service.default.svc.cluster.local -> 10.96.0.50 (VIP)
  v
iptables PREROUTING chain (on source node)
  |  DNAT rule: 10.96.0.50:80 -> randomly select from:
  |    10.0.1.10:8080 (pod-a, Ready)
  |    10.0.2.20:8080 (pod-b, Ready)
  |    [10.0.3.30:8080 excluded -- pod-c not Ready]
  v
Selected Pod (10.0.2.20:8080)

kube-proxy watches: Service + Endpoints objects
  -> programs iptables rules on EVERY node
  -> rules updated within ~1s of pod Ready state change

IPVS mode (alternative):
  kernel hash table: O(1) lookup
  ipvsadm -Ln shows virtual server + real servers
```
VIP is not a real interface -- it exists only as iptables DNAT rules programmed by kube-proxy on every node
Service Networking at Scale
500+ Services on iptables mode kube-proxy
Problem: 200k+ iptables rules; O(n) packet matching; p99 latency climbs to 800ms as rule count grows.
Fix: IPVS mode replaces the linear rule list with a kernel hash table, giving O(1) lookup regardless of Service count; p99 returns to single digits.
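Switching modes is a kube-proxy configuration change. A minimal KubeProxyConfiguration fragment (typically delivered via the kube-proxy ConfigMap) might look like:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # round-robin; other schedulers (lc, sh, ...) are available
```

Note that IPVS mode requires the relevant kernel modules (ip_vs, ip_vs_rr, nf_conntrack) to be loadable on each node; kube-proxy falls back to iptables mode if they are missing.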
Service with 500 pod IPs (Endpoints object)
Problem: each pod IP change rewrites the entire 500-entry Endpoints object, which is then propagated to all nodes; O(n) update cost.
Fix: EndpointSlices shard the backends into slices of up to 100 entries; only the affected slice is updated, so a single pod change costs O(1).
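For illustration, one slice for a hypothetical my-api Service might look like the following (in practice the EndpointSlice controller creates and manages these objects; you rarely write them by hand):

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-api-abc12                      # controller-generated name (hypothetical)
  labels:
    kubernetes.io/service-name: my-api    # ties the slice back to its Service
addressType: IPv4
ports:
  - name: http
    port: 8080
    protocol: TCP
endpoints:
  - addresses: ["10.0.1.10"]
    conditions:
      ready: true
  - addresses: ["10.0.3.30"]
    conditions:
      ready: false                        # not Ready -> excluded from load balancing
```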
From DNS query to pod response
1. App calls http://my-service:8080; the OS resolves the name via /etc/resolv.conf -> CoreDNS.
2. CoreDNS returns the ClusterIP VIP (e.g., 10.96.0.50) from its Service cache.
3. The packet leaves the pod with dst=10.96.0.50:8080 and enters the iptables PREROUTING chain on the node.
4. A kube-proxy iptables rule DNATs the VIP to one of the ready pod IPs (random, or sticky with session affinity).
5. The packet is routed to the selected pod (possibly crossing nodes via the CNI overlay).
6. The pod processes the request; the response returns directly (no SNAT in the default mode).
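Step 4's session-affinity variant is a Service-level setting. A sketch of a Service pinning each client IP to one backend:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-service
  ports:
    - port: 8080
      targetPort: 8080
  sessionAffinity: ClientIP       # default is None (random backend per connection)
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800       # affinity window; defaults to 3 hours
```

ClientIP affinity is implemented in kube-proxy's rules, so it is per-node and breaks down behind NAT where many clients share one source IP.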
```yaml
# ClusterIP (internal only)
apiVersion: v1
kind: Service
metadata:
  name: my-api
spec:
  selector:            # must exactly match pod labels -- mismatch = empty Endpoints = traffic black hole
    app: my-api        # must match pod labels
  ports:
    - port: 80         # Service port (VIP) -- what callers use
      targetPort: 8080 # container port -- what containers listen on; these can differ
  type: ClusterIP      # default

---
# Headless (StatefulSet use)
spec:
  clusterIP: None      # no VIP; DNS returns pod IPs
  selector:
    app: kafka
```
Service routing failure modes
Selector label mismatch creates silent traffic black hole
Broken:

```yaml
# Service selector
spec:
  selector:
    app: my-api
    version: v2   # <- requires BOTH labels
---
# Deployment pod template labels
metadata:
  labels:
    app: my-api
    # version label missing -> Endpoints = empty
    # Service exists but routes to nobody
```

Fixed:

```yaml
# Service selector (match only stable labels)
spec:
  selector:
    app: my-api   # single stable label
---
# Deployment pod template labels
metadata:
  labels:
    app: my-api
    version: v2   # extra labels are fine; selector only needs a subset
```

Service selectors require ALL listed labels to be present on the pod. Adding a version label to the selector means every pod must carry that label or it is excluded from Endpoints. Use minimal, stable selectors.
| Type | Reachability | Load balancing | Cost | Use case |
|---|---|---|---|---|
| ClusterIP | Cluster-internal only | kube-proxy (iptables/IPVS) | Free | All internal microservices |
| NodePort | Any node IP + port | kube-proxy then pod | Free (infra cost only) | Dev/test, bare-metal |
| LoadBalancer | External via cloud LB | Cloud LB then kube-proxy | Cloud LB cost per Service | Production external Services |
| Headless | Cluster-internal, direct pod IPs | Client-side (DNS round-robin) | Free | StatefulSets, service mesh |
| ExternalName | CNAME to external DNS | None (DNS redirect only) | Free | External service aliasing |
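The ExternalName row maps to the smallest possible manifest: no selector, no Endpoints, just a DNS alias (the name and target below are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: billing-db
spec:
  type: ExternalName
  externalName: db.example.com   # in-cluster DNS returns a CNAME to this host; no proxying
```

Pods can then use billing-db as a hostname, and the target can later be swapped for an in-cluster Service without changing callers.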
Service VIP and kube-proxy
📖 What the exam expects
A ClusterIP Service gets a virtual IP (VIP). kube-proxy on each node programs iptables/IPVS rules to DNAT packets destined for the VIP to one of the ready pod IPs.
Debugging questions about pods that cannot reach Services, and architecture questions about service networking at scale.
Common questions:
Strong answer: Mentions EndpointSlices for large-scale services, topology-aware routing to keep traffic in-zone, and IPVS mode for clusters with many Services.
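A strong answer can back the topology-aware routing point with config. Assuming a recent Kubernetes version (1.27+ uses the topology-mode annotation; 1.23-1.26 used topology-aware-hints), a sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
  annotations:
    service.kubernetes.io/topology-mode: Auto   # ask the control plane to add zone hints to EndpointSlices
spec:
  selector:
    app: my-api
  ports:
    - port: 80
      targetPort: 8080
```

With hints in place, kube-proxy prefers same-zone backends, cutting cross-zone latency and data-transfer cost; it falls back to all endpoints when zones are imbalanced.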
Red flags: Thinking a Service VIP is a real IP with a listening process, or not knowing that kube-proxy programs iptables rules on every node.