
High availability (HA) & resilience

Keep systems up when things fail: redundancy, failover, and simple targets.

What is high availability?

High availability (HA) = your system stays up even when parts fail. One server dies? Traffic goes to another. A whole datacenter loses power? Another datacenter or region takes over.

Resilience = the system bends instead of breaking. It recovers from failures without you having to fix it by hand every time.

Redundancy

More than one of something. One fails, the other keeps serving.

Failover

Primary fails → traffic switches to backup automatically. No manual flip.

RTO & RPO

RTO = how long you can be down. RPO = how much data loss you accept.

Multi-AZ / multi-region

Run or replicate in more than one place. One AZ down? Another takes over.

HA = system stays up when parts fail. Set RTO and RPO, then design redundancy and failover to meet them.

Redundancy and failover

Redundancy = you have more than one of something. Two servers, two regions, two copies of data. If one fails, the other keeps serving.
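Why does a second copy help so much? A quick back-of-the-envelope sketch: if a group stays up as long as at least one member is up, its availability is 1 minus the chance that all members are down at once. This assumes the members fail independently, which is exactly what a shared power feed or a single AZ breaks. The 99% figure below is illustrative, not a real SLA.

```python
def parallel_availability(availabilities):
    """Availability of a redundant group that is up if at least one member is up.

    Assumes members fail independently of each other.
    """
    prob_all_down = 1.0
    for a in availabilities:
        prob_all_down *= (1.0 - a)  # chance that this member is down too
    return 1.0 - prob_all_down


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


one = 0.99
two = parallel_availability([0.99, 0.99])
print(f"one server:  {one:.4%}, ~{downtime_minutes_per_year(one):.0f} min/year down")
print(f"two servers: {two:.4%}, ~{downtime_minutes_per_year(two):.0f} min/year down")
```

Under those assumptions, one 99% server is down about 5,256 minutes a year (roughly 3.65 days), while two independent ones reach 99.99%, about 53 minutes a year.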

Failover = when the primary fails, traffic or work automatically switches to the backup. No manual flip. Health checks detect failure; the system reroutes.
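Here is a minimal sketch of health-check-driven failover. The class name, the backend names, and the "fail over after N consecutive failed checks" rule are all hypothetical; real load balancers and DNS failover services have their own settings for this.

```python
class FailoverRouter:
    """Route to the primary until it fails N health checks in a row, then to the backup.

    Hypothetical sketch, not any specific load balancer's behavior.
    """

    def __init__(self, primary: str, backup: str, failure_threshold: int = 3):
        self.primary = primary
        self.backup = backup
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active = primary

    def record_health_check(self, primary_healthy: bool) -> None:
        if primary_healthy:
            # One good check resets the counter, so a single blip never triggers failover.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = self.backup  # automatic switch, no manual flip

    def route(self) -> str:
        return self.active


router = FailoverRouter("primary-a", "backup-b", failure_threshold=3)
for healthy in [True, False, False, True, False, False, False]:
    router.record_health_check(healthy)
    print(router.route())
```

The two early failures are forgiven because a healthy check resets the counter; only the final three in a row move traffic to `backup-b`. In this sketch the router stays on the backup even if the primary recovers, so failing back is a deliberate step rather than something that can flap back and forth.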

Cloud platforms make this easier: multiple availability zones (AZs), managed load balancers, and auto-scaling groups that replace unhealthy instances automatically.
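To make "replace bad instances automatically" concrete, here is a simplified reconcile loop in the spirit of an auto-scaling group: drop unhealthy instances, then launch replacements until the desired count is back, putting each new one in the AZ with the fewest healthy instances. The `Instance` type, the instance IDs, and the AZ names are hypothetical, and this is not how any particular provider implements it.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # hypothetical ID generator for replacement instances


@dataclass
class Instance:
    id: str
    az: str
    healthy: bool = True


def reconcile(instances: list[Instance], desired: int, azs: list[str]) -> list[Instance]:
    """Keep only healthy instances and launch replacements up to `desired`.

    Each replacement goes to the AZ with the fewest healthy instances so capacity
    stays spread out. Scaling in (removing extras) is left out of this sketch.
    """
    healthy = [i for i in instances if i.healthy]
    while len(healthy) < desired:
        per_az = Counter({az: 0 for az in azs})
        per_az.update(i.az for i in healthy)
        target_az = min(azs, key=lambda az: per_az[az])
        healthy.append(Instance(id=f"i-new-{next(_next_id)}", az=target_az))
    return healthy


fleet = [
    Instance("i-a1", "az-a"),
    Instance("i-b1", "az-b", healthy=False),  # failed instance
    Instance("i-c1", "az-c"),
]
fleet = reconcile(fleet, desired=3, azs=["az-a", "az-b", "az-c"])
print([(i.id, i.az) for i in fleet])
```

The failed instance in `az-b` is dropped and its replacement lands back in `az-b`, so the fleet can still lose any one AZ and keep two instances running.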

RTO and RPO: simple targets

RTO (Recovery Time Objective) = how long you can afford to be down. "We need to be back within 1 hour." That drives how fast failover and restore must be.
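A rough way to check a design against RTO is to add up the phases of recovery: noticing the failure, doing the recovery itself (switching to a standby, or restoring from backup), and warming up. The breakdown and every number below are hypothetical placeholders; measure your own in a drill.

```python
def estimated_recovery_seconds(check_interval_s, failure_threshold,
                               recovery_action_s, warmup_s):
    """Approximate time from failure to serving again (hypothetical breakdown)."""
    detection_s = check_interval_s * failure_threshold  # roughly how long to notice
    return detection_s + recovery_action_s + warmup_s


def meets_rto(recovery_s, rto_s):
    return recovery_s <= rto_s


RTO_SECONDS = 60 * 60  # "We need to be back within 1 hour."

# Illustrative numbers: automatic failover vs. a 2-hour restore from backup.
failover = estimated_recovery_seconds(10, 3, recovery_action_s=60, warmup_s=120)
restore = estimated_recovery_seconds(10, 3, recovery_action_s=2 * 60 * 60, warmup_s=120)
print(failover, meets_rto(failover, RTO_SECONDS))
print(restore, meets_rto(restore, RTO_SECONDS))
```

With these assumed numbers, automatic failover comes in around 210 seconds and fits a 1-hour RTO, while a 2-hour restore cannot, no matter how fast detection is.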

RPO (Recovery Point Objective) = how much data loss you can accept. "We can lose at most 5 minutes of data." That drives how often you replicate or back up.
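For periodic backups, the worst case is a failure just before the next backup finishes. This sketch assumes each backup captures the data as of when it started and takes a fixed time to complete, so the worst-case loss is the interval plus that duration. It then works backwards from an RPO to the longest allowed interval.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data lost with periodic backups.

    Assumes a backup reflects the data as of its start time and the failure
    hits just before the next backup completes.
    """
    return backup_interval + backup_duration


def max_backup_interval(rpo: timedelta,
                        backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Longest gap between backup starts that still meets the RPO."""
    interval = rpo - backup_duration
    if interval <= timedelta(0):
        raise ValueError("backups alone cannot meet this RPO; consider continuous replication")
    return interval


rpo = timedelta(minutes=5)  # "We can lose at most 5 minutes of data."
print(max_backup_interval(rpo, backup_duration=timedelta(minutes=1)))  # start one every 4 min
print(worst_case_data_loss(timedelta(hours=24)) <= rpo)                # nightly backups? no
```

Nightly backups put up to a day of data at risk, far beyond a 5-minute RPO; that gap is why tight RPOs usually push you toward replication instead of backups alone.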

Set RTO and RPO based on business impact. Then design redundancy, backups, and failover to meet them.
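Once the targets are set, picking a design can be as simple as filtering out options that miss either target and taking the cheapest one left. The option names, their recovery and data-loss figures, and the cost ranks below are made-up placeholders for illustration, not benchmarks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DesignOption:
    name: str
    recovery_minutes: float   # expected time to be back up (compare with RTO)
    data_loss_minutes: float  # worst-case data lost (compare with RPO)
    relative_cost: int        # rough cost rank, 1 = cheapest


def cheapest_option(options, rto_minutes, rpo_minutes) -> Optional[DesignOption]:
    """Cheapest option that meets both RTO and RPO, or None if none does."""
    fits = [o for o in options
            if o.recovery_minutes <= rto_minutes and o.data_loss_minutes <= rpo_minutes]
    return min(fits, key=lambda o: o.relative_cost, default=None)


# Hypothetical options with illustrative numbers.
OPTIONS = [
    DesignOption("nightly backup, manual restore", 240, 24 * 60, 1),
    DesignOption("hourly snapshots, scripted restore", 60, 60, 2),
    DesignOption("multi-AZ replica, automatic failover", 2, 0, 3),
]

print(cheapest_option(OPTIONS, rto_minutes=60, rpo_minutes=5).name)
print(cheapest_option(OPTIONS, rto_minutes=8 * 60, rpo_minutes=24 * 60).name)
```

A strict target (1-hour RTO, 5-minute RPO) only leaves the multi-AZ option; a relaxed one lets the cheapest option win. Tightening targets is a cost decision, which is why they should come from business impact.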

Real-world scenario: database in one AZ

Expert scenario

Scenario: Your app runs in two AZs, but the database lives in only one. That AZ goes down.

Decision: The app tier fails over to the other AZ, but the database is unavailable. You are down until that AZ recovers or you restore the database from backup in another AZ, and anything written since the last backup may be lost. To get HA, you need a multi-AZ database (replica in another AZ with automatic failover) or you must accept a longer RTO and RPO and restore from backups.
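The trap in this scenario is a tier that lives in only one AZ. A small sketch can catch it: the system survives losing an AZ only if every tier still has at least one instance somewhere else. The tier names and AZ labels are hypothetical, and the check only looks at placement; it does not tell you whether failover to the survivor is actually automatic.

```python
def survives_az_loss(placement: dict[str, set[str]], failed_az: str) -> bool:
    """True if every tier still has an instance outside the failed AZ."""
    return all(azs - {failed_az} for azs in placement.values())


def single_az_tiers(placement: dict[str, set[str]]) -> list[str]:
    """Tiers that live in only one AZ (single points of failure for an AZ outage)."""
    return [tier for tier, azs in placement.items() if len(azs) < 2]


placement = {"app": {"az-a", "az-b"}, "database": {"az-a"}}
print(survives_az_loss(placement, "az-a"))  # database has nowhere to go
print(single_az_tiers(placement))

placement["database"] = {"az-a", "az-b"}    # add a replica in the second AZ
print(survives_az_loss(placement, "az-a"))
```

Adding the database replica in `az-b` is the placement half of the fix; the other half is automatic failover to that replica, as described above.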

