Kubernetes Autoscaling: HPA, VPA, Cluster Autoscaler, and Resource Management
Autoscaling adjusts replica counts, resource requests, and node counts based on demand. HPA (Horizontal Pod Autoscaler) scales the number of Pods, VPA (Vertical Pod Autoscaler) adjusts resource requests, and Cluster Autoscaler adds or removes nodes.
What you'll learn
- HPA scales Pods based on metrics (CPU, memory, custom); requires accurate resource requests to work correctly
- VPA adjusts resource requests based on historical usage; less common but useful for right-sizing
- Cluster autoscaler adds nodes when Pods can't fit; works with cloud provider node groups
- Autoscaling amplifies misconfiguration; test thoroughly and set reasonable min/max limits
Lesson outline
Horizontal Pod Autoscaler (HPA)
HPA watches a metric (CPU, memory, custom) and scales the number of replicas up or down.
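Example: with a target of 70% average CPU utilization and bounds of 2 to 10 replicas, HPA adds replicas when average CPU runs above 70% and removes them when it runs below, never going under 2 or over 10. The replica count is computed from the ratio of the current metric value to the target, not from separate scale-up and scale-down thresholds.
By default, HPA re-evaluates every 15 seconds. CPU and memory metrics require Metrics Server (or another implementation of the resource metrics API) to be installed.
Custom metrics (for example, from Prometheus through a metrics adapter) enable scaling on application-specific signals such as requests per second or queue depth.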
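The replica calculation can be sketched in a few lines. This is a simplified model of the documented HPA formula (desired = ceil(current replicas × current metric / target metric)) with the default 10% tolerance; the function name and parameters are illustrative, and the real controller also handles missing metrics, Pod readiness, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation for one utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 70% target, bounds 2..10.
    print(desired_replicas(4, 90, 70, min_replicas=2, max_replicas=10))
```

With 4 replicas at 90% against a 70% target, the ratio is about 1.29, so the sketch asks for 6 replicas. At 72% the ratio falls inside the tolerance band and nothing changes.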
Vertical Pod Autoscaler (VPA)
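VPA adjusts CPU and memory resource requests based on historical usage. By default, limits are scaled along with requests to keep the original request-to-limit ratio.
Useful if you don't know what resources your app needs. VPA can only recommend values, or it can apply them automatically, which traditionally means evicting and recreating the Pod.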
VPA is less commonly used than HPA; most teams prefer manual tuning or HPA with appropriate requests.
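To make "based on historical usage" concrete, here is a rough sketch of a percentile-based recommendation. It is not the VPA recommender's actual algorithm (which works on decaying histograms); the function name, the 90th percentile, and the 15% safety margin are assumptions chosen for illustration.

```python
def recommend_request(usage_samples, percentile=90, safety_margin_percent=15):
    """Suggest a resource request from observed usage samples (e.g. millicores).

    Takes a high percentile of the samples (nearest-rank method) and adds a
    safety margin. Integer arithmetic avoids floating-point rounding surprises.
    """
    if not usage_samples:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: ceil(percentile * n / 100), at least 1.
    rank = max(1, -(-percentile * len(ordered) // 100))
    base = ordered[rank - 1]
    # Ceiling of base * (100 + margin) / 100.
    return -(-base * (100 + safety_margin_percent) // 100)


if __name__ == "__main__":
    cpu_millicores = [100, 120, 110, 300, 130, 125, 115, 105, 140, 135]
    print(recommend_request(cpu_millicores))
```

Using a percentile rather than the maximum keeps a single spike (the 300m sample above) from inflating the request for every replica.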
Cluster Autoscaler
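Cluster autoscaler watches for unschedulable Pods (stuck in Pending). If a Pod can't fit on any existing node, it adds nodes; it also removes nodes that stay underutilized once their Pods can be rescheduled elsewhere.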
Works with cloud providers (AWS, Azure, GCP) to scale node groups up or down.
Paired with HPA: HPA scales Pods, cluster autoscaler scales nodes to accommodate.
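The scale-up decision can be pictured as a packing problem: try to place each pending Pod on an existing node, and add a node only when nothing fits. The sketch below is a toy first-fit model under stated assumptions; the `Node` class, `nodes_to_add` function, and their parameters are hypothetical, and the real Cluster Autoscaler simulates the full scheduler (taints, affinity, multiple node groups).

```python
from dataclasses import dataclass


@dataclass
class Node:
    free_cpu_m: int   # unreserved CPU in millicores
    free_mem_mi: int  # unreserved memory in MiB


def fits(node, cpu_m, mem_mi):
    """True if the node has enough unreserved CPU and memory for the requests."""
    return node.free_cpu_m >= cpu_m and node.free_mem_mi >= mem_mi


def nodes_to_add(nodes, pending_pods, new_node_cpu_m, new_node_mem_mi, max_new_nodes):
    """Count the new nodes a first-fit placement of pending_pods would need.

    pending_pods is a list of (cpu_request_m, mem_request_mi) tuples.
    """
    pool = [Node(n.free_cpu_m, n.free_mem_mi) for n in nodes]  # work on a copy
    added = 0
    for cpu_m, mem_mi in pending_pods:
        target = next((n for n in pool if fits(n, cpu_m, mem_mi)), None)
        if target is None:
            too_big = cpu_m > new_node_cpu_m or mem_mi > new_node_mem_mi
            if too_big or added >= max_new_nodes:
                continue  # this Pod stays Pending; adding a node would not help
            target = Node(new_node_cpu_m, new_node_mem_mi)
            pool.append(target)
            added += 1
        # Reserve the Pod's requests on the chosen node.
        target.free_cpu_m -= cpu_m
        target.free_mem_mi -= mem_mi
    return added


if __name__ == "__main__":
    existing = [Node(free_cpu_m=500, free_mem_mi=1024)]
    pending = [(400, 512), (400, 512), (400, 512)]
    print(nodes_to_add(existing, pending, 2000, 4096, max_new_nodes=5))
```

Note that the decision is driven entirely by requests, not live usage: a Pod that requests more than any node type offers stays Pending no matter how many nodes are added.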
Resource Requests and Limits
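Request: the amount of CPU or memory reserved for a container. The scheduler uses requests to fit Pods on nodes, and HPA computes CPU utilization as a percentage of the request.
Limit: the maximum a container may use. A container that hits its CPU limit is throttled; one that exceeds its memory limit is killed (OOMKilled).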
Proper requests are critical for HPA and cluster autoscaler to work correctly.
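A small calculation shows why requests matter so much for HPA: utilization is measured against the request, so the same real usage looks very different depending on what was requested. The helper below is an illustrative sketch for a single Pod, not part of any Kubernetes API.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU utilization as HPA sees it: usage as a percentage of the request."""
    if request_millicores <= 0:
        raise ValueError("a positive CPU request is required to compute utilization")
    return 100.0 * usage_millicores / request_millicores


if __name__ == "__main__":
    usage = 350  # the Pod actually uses 350m of CPU
    for request in (250, 500, 1000):
        pct = cpu_utilization_percent(usage, request)
        print(f"request={request}m -> utilization={pct:.0f}%")
```

With a 70% target, the under-requested Pod (250m) appears to be at 140% and triggers constant scale-up, while the over-requested Pod (1000m) sits at 35% and may never scale up even under real load.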
💡 Analogy
HPA is like a restaurant that hires waiters based on table occupancy (Pods = waiters, load = tables). VPA is like a tailor adjusting uniform sizes based on employee measurements. Cluster autoscaler is like the restaurant chain opening a new branch when every existing branch is full.
⚡ Core Idea
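HPA scales replicas based on metrics. VPA adjusts resource requests. Cluster autoscaler adds or removes nodes. Together they cover Pod count, Pod size, and node capacity, though HPA and VPA should not both act on the same CPU or memory metric for one workload.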
🎯 Why It Matters
Autoscaling reduces costs (scale down when traffic drops) and improves availability (scale up when traffic spikes). Proper resource requests are the foundation.