
AWS Well-Architected Framework

Six pillars for building secure, reliable, efficient, and cost-optimized systems — and the trade-offs between them.

Key Takeaways
  • Six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability.
  • Pillars trade off: maximize reliability for payment processing; optimize cost for internal batch jobs. Match investment to business criticality.
  • Reliability pillar key patterns: Multi-AZ databases, Auto Scaling Groups, automated recovery, and recovery procedures tested with game days.
  • Security pillar: least-privilege IAM, all data encrypted at rest and in transit, no secrets in code or EC2 user data, CloudTrail enabled.
  • Use the free AWS Well-Architected Tool in the console to run a structured review against all six pillars.

The checklist that saved a company from a $30M outage

A Series C startup was acquired and their architecture needed to scale from 50,000 to 5 million users. Before the migration, the acquiring company ran an AWS Well-Architected Review.

The review produced three critical findings: a single point of failure in the database (no Multi-AZ, a reliability gap), secrets stored in EC2 user data (a security gap), and no documented runbook for any incident (an operational excellence gap). All three were fixed before the migration.

Seven months later, the primary database failed. Instead of a 6-hour outage, RDS Multi-AZ automatic failover restored service in 73 seconds. The Well-Architected review paid for itself 1,000 times over.

What is the AWS Well-Architected Framework?

The framework is a set of architectural best practices organized into six pillars, developed by AWS from reviewing thousands of customer workloads. Each pillar has its own design principles, review questions, and best practices. The free AWS Well-Architected Tool in the console lets you review any workload against these pillars.

The six pillars — and what they each protect

The six pillars of the Well-Architected Framework

  • I. Operational Excellence: Running and monitoring systems to deliver business value and continually improving operations. Key practices: infrastructure as code, annotated documentation, frequent small reversible changes, regular game days, post-mortems.
  • II. Security: Protecting information, systems, and assets. Key practices: identity foundation (least privilege), traceability (CloudTrail), security at all layers, automated security best practices, protect data in transit and at rest, keep people away from data, prepare for security events.
  • III. Reliability: Ensuring a workload performs its intended function correctly and consistently. Key practices: test recovery procedures, automatically recover from failure, scale horizontally, stop guessing capacity, manage change through automation.
  • IV. Performance Efficiency: Using computing resources efficiently. Key practices: democratize advanced technologies (use managed services), go global in minutes, use serverless architectures, experiment more often, consider mechanical sympathy (use the right tool for the job).
  • V. Cost Optimization: Avoiding unnecessary costs. Key practices: implement cloud financial management, adopt a consumption model (pay for what you use), measure overall efficiency, stop spending on undifferentiated heavy lifting, analyze and attribute expenditure.
  • VI. Sustainability (added 2021): Minimizing the environmental impacts of running cloud workloads. Key practices: understand your impact, establish sustainability goals, maximize utilization, anticipate and adopt more efficient hardware and software offerings, use managed services that spread load efficiently, reduce downstream impact.
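
Two of the Security practices above (least privilege, and keeping secrets out of code and user data) can be sketched in Terraform. This is a minimal illustration, not a complete security baseline: the resource names, the secret name, and the policy name are all hypothetical, and it assumes the application reads the secret at runtime through an IAM role that this policy is attached to.

```hcl
# Sketch only: every name below is hypothetical.

# The secret lives in Secrets Manager instead of in code or EC2 user data.
# The secret value itself is set outside Terraform so it never lands in source control.
resource "aws_secretsmanager_secret" "db_password" {
  name = "production/db/password"
}

# Least privilege: allow reading this one secret and nothing else.
data "aws_iam_policy_document" "read_db_secret" {
  statement {
    effect    = "Allow"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.db_password.arn]
  }
}

resource "aws_iam_policy" "read_db_secret" {
  name   = "read-production-db-password"
  policy = data.aws_iam_policy_document.read_db_secret.json
}
```

Attaching the policy to the application's instance or task role (not shown) lets the workload fetch the password at startup, so nothing sensitive appears in the launch configuration.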

The trade-offs between pillars

The hardest part of the Well-Architected Framework is that the pillars trade off against each other. There is no architecture that is simultaneously maximally reliable, maximally performant, and maximally cost-optimized.

| Trade-off scenario | Decision | What you gain | What you sacrifice |
|---|---|---|---|
| Multi-AZ vs single-AZ database | Multi-AZ | Reliability (73 s failover vs 6 h recovery) | Cost (roughly 2× database cost) |
| DynamoDB on-demand vs provisioned | On-demand | Cost efficiency at low or unpredictable load; Reliability (no capacity planning, much lower throttling risk) | Cost at high, predictable load (provisioned capacity can be substantially cheaper) |
| Synchronous vs async processing | Async (SQS + Lambda) | Reliability (retries, dead-letter queues); Cost (pay per invocation) | Complexity; Operational excellence (harder to debug) |
| Caching (ElastiCache vs no cache) | Add cache | Performance; Cost (fewer DB reads) | Reliability (cache invalidation bugs); Operational excellence (more components) |
| Spot Instances vs On-Demand | Spot for batch jobs | Cost (up to 90% cheaper) | Reliability (Spot interruptions); must design for graceful shutdown |
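
As one illustration of the async row in the table, the following Terraform sketch wires an SQS queue to a Lambda function with a dead-letter queue for messages that keep failing. The queue names, the `worker_function_arn` variable, and the numeric values are assumptions made for this example; the visibility timeout assumes a 30-second function timeout.

```hcl
# Sketch only: names and numbers are hypothetical.
variable "worker_function_arn" {
  type        = string
  description = "ARN of an existing Lambda function that processes messages."
}

# Messages that fail repeatedly land here for inspection instead of being lost.
resource "aws_sqs_queue" "orders_dlq" {
  name                      = "orders-dlq"
  message_retention_seconds = 1209600 # 14 days, the SQS maximum
}

resource "aws_sqs_queue" "orders" {
  name                       = "orders"
  visibility_timeout_seconds = 180 # six times the assumed 30 s function timeout

  # After 5 failed receives, SQS moves the message to the dead-letter queue.
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 5
  })
}

# Lambda polls the queue and invokes the worker with batches of up to 10 messages.
resource "aws_lambda_event_source_mapping" "orders" {
  event_source_arn = aws_sqs_queue.orders.arn
  function_name    = var.worker_function_arn
  batch_size       = 10
}
```

The retries and the dead-letter queue are the reliability gain; the extra queue and the harder-to-trace failure path are the operational cost the table describes.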

How to make pillar trade-off decisions

Match your reliability and cost investments to the business criticality of the workload. A payment processing service needs maximum reliability (Multi-AZ, read replicas, chaos testing). An internal analytics dashboard can trade reliability for cost (single-AZ, no HA, spot instances for batch processing).
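
One way to make this decision explicit is to encode it as a tier lookup in Terraform, so the criticality of a workload is stated once and the database settings follow from it. The tier names and the settings per tier are hypothetical; treat this as a sketch of the idea, not a recommended policy.

```hcl
# Sketch only: tier names and per-tier values are hypothetical.
variable "criticality" {
  type        = string
  description = "Business criticality of the workload."

  validation {
    condition     = contains(["critical", "internal"], var.criticality)
    error_message = "The criticality value must be \"critical\" or \"internal\"."
  }
}

locals {
  # Each tier states its reliability-versus-cost trade-off in one place.
  db_settings_by_tier = {
    critical = { multi_az = true, backup_retention_days = 7, deletion_protection = true }
    internal = { multi_az = false, backup_retention_days = 1, deletion_protection = false }
  }

  db_settings = local.db_settings_by_tier[var.criticality]
}

output "db_settings" {
  value = local.db_settings
}
```

A database resource can then read `local.db_settings.multi_az` and the other fields, which keeps the trade-off visible in code review instead of buried in individual resources.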

Quick check

A company uses a single-AZ RDS instance to reduce database costs. Which Well-Architected pillar are they trading off?

Pillar deep-dive: Reliability

Reliability is often the pillar where gaps carry the highest risk of an outright outage. Here are the key design patterns:

Key reliability patterns with AWS services

  • Automatic recovery (self-healing): Amazon EC2 auto recovery, ECS health checks, and ALB target group health checks detect unhealthy instances and replace them, or route traffic away from them, without human intervention.
  • Test recovery procedures: Run game days that simulate AZ failures, instance terminations, and database failovers. Never guess your RTO; measure it. Teams often discover during game days that their actual recovery time is far longer than the RTO they assumed.
  • Horizontal scaling: Auto Scaling Groups with EC2, ECS Service Auto Scaling, and DynamoDB on-demand add capacity as load increases and remove it when load drops, so no single instance becomes a single point of failure.
  • Manage change with automation: Use CloudFormation, CDK, or Terraform for all infrastructure changes, with no manual console changes in production. Every change is code-reviewed, tested in staging, and deployed with rollback capability.
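
The automatic recovery and horizontal scaling patterns above can be combined in a single Auto Scaling Group. The sketch below assumes a launch template, a load balancer target group, and one private subnet per Availability Zone already exist; the variable names, group name, sizes, and CPU target are hypothetical.

```hcl
# Sketch only: variables, names, and numbers are hypothetical.
variable "private_subnet_ids" {
  type        = list(string)
  description = "One subnet per Availability Zone."
}

variable "launch_template_id" {
  type = string
}

variable "target_group_arn" {
  type = string
}

resource "aws_autoscaling_group" "web" {
  name                = "web"
  min_size            = 2 # at least two instances, spread across AZs
  max_size            = 10
  vpc_zone_identifier = var.private_subnet_ids

  # Automatic recovery: instances that fail load balancer health checks are replaced.
  health_check_type         = "ELB"
  health_check_grace_period = 300
  target_group_arns         = [var.target_group_arn]

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }
}

# Horizontal scaling: add or remove instances to hold average CPU near 50%.
resource "aws_autoscaling_policy" "cpu" {
  name                   = "web-cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50
  }
}
```

Because the whole group is defined in code, it also serves as a small example of managing change with automation: resizing the fleet is a reviewed diff, not a console click.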
well-architected-rds.tf

# Well-Architected: Reliability pillar. RDS with Multi-AZ and automated backups.
resource "aws_db_instance" "main" {
  identifier     = "production-db"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.large"

  # Required by the provider but absent from the original snippet; values are illustrative.
  allocated_storage           = 100 # GiB
  username                    = "app_admin"
  manage_master_user_password = true # RDS keeps the password in Secrets Manager, not in code

  # Reliability: Multi-AZ is the most important reliability setting. It enables
  # automatic failover to a standby (73 seconds in the story above, versus hours).
  multi_az = true

  # Reliability: automated backups with 7-day retention enable point-in-time
  # recovery, so your RPO is minutes, not days.
  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:05:00-sun:06:00"

  # Security: encryption at rest is a Security pillar requirement; always enable
  # it for production data. aws_kms_key.db is assumed to be defined elsewhere.
  storage_encrypted = true
  kms_key_id        = aws_kms_key.db.arn

  # Reliability: automated minor version upgrades
  auto_minor_version_upgrade = true

  # Reliability: deletion protection stops someone from accidentally
  # `terraform destroy`-ing your production database.
  deletion_protection = true

  # Cost and operations: storage autoscaling lets you start small and grow
  # up to 1000 GiB without manual intervention.
  max_allocated_storage = 1000
}

How this might come up in interviews

This topic comes up in cloud architecture interviews, solutions architect roles, and technical leadership discussions. The AWS Solutions Architect certifications (SAA, SAP) test it extensively.

Common questions:

  • What are the six pillars of the AWS Well-Architected Framework?
  • How do you trade off reliability vs cost in a real architecture decision?
  • What does the Well-Architected Framework say about security best practices?
  • Have you done a Well-Architected Review? What did you find?

Before you move on: can you answer these?

A startup wants to minimize AWS costs on their MVP. Should they use Multi-AZ RDS?

It depends on business criticality. For an MVP with no paying customers, single-AZ is an acceptable cost trade-off. Once customers are paying or the product is business-critical, Multi-AZ is required (Reliability pillar). This is the explicit trade-off the framework asks you to make consciously.

What is the purpose of a "game day" in the Reliability pillar?

A game day is a scheduled exercise where you simulate failures (AZ outage, database failover, instance termination) to measure your actual RTO and RPO, and to discover recovery procedures that do not work before a real incident does.
