How Route 53 routing policies, record types, and TTL strategy work in production -- and how CloudFront distributions, cache behaviours, origin failover, and invalidation costs shape CDN architecture decisions.
Configures Route 53 records and CloudFront distributions. Knows the difference between CNAME and ALIAS. Can trace a "DNS change isn't working" issue to TTL or client-side caching. Sets cache-control headers correctly for static vs dynamic content.
Designs DNS architecture for multi-region systems including failover routing, health check configuration, and TTL strategy for planned changes. Owns CloudFront distribution design with correct cache behaviours, Origin Shield decision, and cost model. Writes runbooks for DNS failover procedures.
Sets organisation-wide DNS and CDN standards. Decides between CloudFront vs third-party CDN (Fastly, Akamai) based on cost model, feature requirements, and vendor risk. Ensures all customer-facing endpoints have tested failover paths with defined RTO/RPO.
A single customer changes a config setting, triggering a latent software bug in Fastly's network.
WARNING: Within 49 seconds, 85% of Fastly's global PoPs begin failing and returning errors.
CRITICAL: The New York Times, BBC, Reddit, GitHub, Twitch, Amazon UK, and thousands of other sites return 503 errors globally.
CRITICAL: Fastly engineers identify the root cause and begin deploying the software fix across the global network.
WARNING: Services restore globally. Total outage duration: 49 minutes.
The question this raises
Does your CDN configuration contain a latent bug that a single customer config change could trigger -- and would you detect it before 85% of your edge nodes were returning errors?
You want to point your root domain (example.com) directly to an Application Load Balancer. You try to create a CNAME record for example.com pointing to the ALB DNS name, but your DNS provider rejects it. Why?
DNS for application engineers means "it translates hostnames to IPs." DNS for cloud engineers means controlling how traffic is routed globally -- which region handles a user's request, what happens when an origin fails, and how long clients cache the answer. Route 53 routing policies make DNS a traffic management layer, not just a name lookup service.
Route 53 record types that matter
CNAME at the zone apex is invalid -- use ALIAS
example.com is the zone apex. You cannot create a CNAME for example.com: the DNS specification does not allow a CNAME to coexist with any other record at the same name, and the apex must always carry SOA and NS records. You can create a CNAME for www.example.com. For the root domain pointing to an ALB or CloudFront distribution, use an ALIAS record. Route 53 resolves ALIAS records server-side and answers with the target's IP addresses directly, so clients avoid the extra lookup of a CNAME chain.
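As a rough sketch of what the apex alias looks like in practice, the Python below builds a change batch in the shape that Route 53's `ChangeResourceRecordSets` API accepts. The helper name `alias_change_batch`, the ALB DNS name, and both hosted zone IDs are placeholders made up for illustration; the boto3 call appears only as a comment and assumes boto3 and credentials are available.

```python
def alias_change_batch(record_name, target_dns_name, target_zone_id):
    """Build an UPSERT change batch for an alias A record (hypothetical helper)."""
    return {
        "Comment": "Point the zone apex at a load balancer via an alias record",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    # Alias records use an address type (A or AAAA), never CNAME.
                    "Type": "A",
                    "AliasTarget": {
                        # Hosted zone ID of the *target* (the load balancer's zone),
                        # not the ID of your own hosted zone.
                        "HostedZoneId": target_zone_id,
                        "DNSName": target_dns_name,
                        "EvaluateTargetHealth": True,
                    },
                    # No "TTL" key: alias records do not take a TTL of their own.
                },
            }
        ],
    }


# Placeholder values, not real resources.
batch = alias_change_batch(
    "example.com.",
    "my-alb-123456.eu-west-1.elb.amazonaws.com.",
    "ZALBPLACEHOLDER",
)

# With boto3 installed and credentials configured, this could be applied with:
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="ZYOURZONEPLACEHOLDER", ChangeBatch=batch)
```

The same shape should work for a CloudFront target by swapping in the distribution's domain name and the hosted zone ID that AWS documents for CloudFront.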
Route 53 transforms DNS from a static name-to-IP mapping into a programmable traffic management layer. The right policy depends on whether you are optimising for latency, cost, resilience, or gradual rollout.
| Policy | How it works | Primary use case | Gotcha |
|---|---|---|---|
| Simple | Returns a single record (or all values if multiple). No health check by default. | Single-region, single-origin endpoints | No automatic failover -- if the target is unhealthy, clients receive the broken IP. |
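| Weighted | Splits traffic in proportion to each record's weight. Weights are relative, not absolute percentages. | Canary rollouts and A/B traffic splitting | A weight of 0 removes a record from rotation entirely rather than just reducing its share (unless every record in the group has weight 0, in which case all are returned with equal probability). |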
| Latency-based | Routes to the AWS region with the lowest measured latency from the user's resolver. | Multi-region active-active deployments | Measures latency from resolver location, not user location. Corporate DNS proxies can skew results. |
| Failover | Routes to primary unless primary fails health check; then routes to secondary. | Active-passive disaster recovery | The primary record must have a health check attached and correctly configured -- without one, failover never triggers. |
| Geolocation | Routes based on user's geographic location (country or continent). | Data residency compliance and localisation | Requires a default record for locations not explicitly mapped, or DNS queries fail. |
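To make "weights are relative" concrete, here is a minimal sketch that converts a set of weights into the expected share of DNS responses per record. The function name and the record labels (`stable`, `canary`) are invented for the example, and the model ignores resolver caching, which in practice blurs the split over short time windows.

```python
def weighted_shares(weights):
    """Return each record's expected share of responses, given relative weights."""
    total = sum(weights.values())
    if total == 0:
        # When every record has weight 0, Route 53 returns them with equal probability.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


# A 5% canary: the numbers only matter relative to their sum.
canary_rollout = weighted_shares({"stable": 95, "canary": 5})

# Draining the canary: weight 0 removes it from rotation while "stable" is non-zero.
drained = weighted_shares({"stable": 100, "canary": 0})

for name, share in canary_rollout.items():
    print(f"{name}: {share:.0%}")
```

Weights of 19 and 1 would give the same 95/5 split, which is why it is worth picking a convention (for example, weights that sum to 100) and sticking to it.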
Failover routing requires a correctly configured health check
A failover record without a health check never fails over -- it always serves the primary. The health check interval (10 or 30 seconds) plus the failure threshold (default 3 consecutive failures) means failover triggers after 30-90 seconds of primary failure. Set the health check endpoint to something lightweight that does not call downstream dependencies -- otherwise a slow database triggers DNS failover when the application server is actually healthy.
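The 30-90 second figure is simply interval times threshold, and can be sketched as below. Adding the record TTL on top is an assumption of this example: it models a client whose resolver cached the primary answer just before the failure, and it ignores resolvers that do not honour TTLs. The function names are hypothetical.

```python
def detection_seconds(interval_s, failure_threshold=3):
    """Time for a health check to be declared unhealthy: interval x consecutive failures."""
    return interval_s * failure_threshold


def worst_case_client_seconds(interval_s, record_ttl_s, failure_threshold=3):
    """Simplified upper bound: detection time plus one full TTL of cached answers."""
    return detection_seconds(interval_s, failure_threshold) + record_ttl_s


fast = detection_seconds(10)       # 10-second interval, threshold 3
standard = detection_seconds(30)   # 30-second interval, threshold 3

# With a 60-second TTL on the failover record, a client may keep using the
# failed primary for roughly detection time + TTL.
with_ttl = worst_case_client_seconds(30, record_ttl_s=60)

print(fast, standard, with_ttl)
```

Under this model, a long TTL on the failover record can dominate the recovery time, which is one reason failover records are usually given short TTLs.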
A CloudFront distribution sits in front of one or more origins (S3, ALB, API Gateway, custom HTTP server). Requests hit one of 600+ global PoPs; if the content is cached, it is served immediately. If not, the request is forwarded to the origin. The cache hit rate determines how much origin load and cost is saved.
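A back-of-the-envelope sketch of why the hit rate matters: the requests that reach the origin are the total multiplied by the miss rate. The traffic figures are illustrative only, and the function name is made up for this example.

```python
def origin_requests(total_requests, cache_hit_rate):
    """Estimate requests forwarded to the origin for a given cache hit rate (0.0-1.0)."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0.0 and 1.0")
    return round(total_requests * (1.0 - cache_hit_rate))


# Illustrative numbers: one million viewer requests at two different hit rates.
at_80_percent = origin_requests(1_000_000, 0.80)
at_95_percent = origin_requests(1_000_000, 0.95)

print(at_80_percent, at_95_percent)
```

Moving from an 80% to a 95% hit rate in this example cuts origin traffic by a factor of four, which is why small hit-rate improvements are usually worth chasing.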
Cache behaviour key decisions
Origin Shield adds a caching layer between PoPs and your origin
Without Origin Shield, a cache miss for the same object can reach your origin from many different CloudFront locations independently. When a popular uncached object is requested worldwide at once, for example during a traffic spike, the origin can receive a burst of duplicate requests. Origin Shield routes those misses through a single shield region, which fetches the object from the origin once and then serves it to the other locations. It is billed as an additional per-request charge that varies by region, and it can dramatically reduce origin load and prevent origin overload during traffic spikes.
CNAME vs ALIAS
📖 What the exam expects
Both CNAME and ALIAS records map one name to another name or resource. Use CNAME for subdomains and ALIAS for root domains.
Mid-level and senior cloud engineer interviews, solutions architect interviews, and production incident investigations. Often framed as "why did this DNS change break things?" or "design a globally distributed static site."
Common questions:
Try this question: What is your current CDN cache hit rate? Do you use versioned asset filenames or rely on invalidations for deployments? Have you tested your DNS failover routing recently?
Strong answer: Reduces TTL one week before a planned DNS change. Uses versioned filenames for all static assets. Can explain what a 600-second versus an 86400-second TTL means for how long a stale answer survives in resolver caches. Mentions Origin Shield unprompted when discussing CloudFront.
Red flags: Uses CNAME at the zone apex without knowing why it fails. Believes DNS changes propagate instantly. Treats cache invalidation as the primary deployment mechanism. Cannot explain what a health check must test to trigger DNS failover.
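The "versioned filenames instead of invalidations" approach can be sketched as follows: derive the filename from a hash of the content so that every deploy produces new URLs, then cache those files for a long time while keeping the HTML that references them revalidated. The function names, the 8-character digest length, and the header policy are assumptions for illustration, not a prescribed standard.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_len=8):
    """Insert a short content hash before the extension, e.g. app.js -> app.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def cache_control_for(path):
    """Hypothetical policy: revalidate HTML, cache hashed assets for a year."""
    if path.endswith(".html"):
        return "no-cache"
    return "public, max-age=31536000, immutable"


old_release = versioned_name("static/app.js", b"console.log('v1');")
new_release = versioned_name("static/app.js", b"console.log('v2');")

# Different content yields a different URL, so the edge never serves a stale copy
# under the new name and no invalidation is needed for the asset itself.
print(old_release, new_release, cache_control_for(new_release))
```

Most bundlers can emit hashed filenames directly; the point of the sketch is only that the cache key changes whenever the content does.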