Every request to your backend passes through these layers — understand them deeply
Before API gateways, every client connected to every backend service directly. Adding authentication meant adding it to every service. Rate limiting meant reimplementing it everywhere. An API gateway centralizes these cross-cutting concerns.
What an API Gateway Does
It sits in front of your services and handles authentication, rate limiting, request routing, and load balancing in one place — each request passes through it once instead of every service reimplementing those concerns.
Load Balancing Algorithms
| Algorithm | How It Works | Best For | Limitation |
|---|---|---|---|
| Round Robin | Cycle through servers: 1, 2, 3, 1, 2, 3… | Stateless services with similar request costs | Ignores server load; slow servers still get traffic |
| Weighted Round Robin | Server A gets 2× requests of Server B | Heterogeneous server capacities | Static weights don't adapt to real-time load |
| Least Connections | Route to server with fewest active connections | Long-lived connections, variable request time | Doesn't account for request cost differences |
| IP Hash | hash(client_ip) % N → same server always | Session affinity (sticky sessions) | Uneven distribution; adding servers changes mappings |
| Least Response Time | Route to server with lowest avg latency + fewest connections | General production (e.g. NGINX Plus `least_time`; note plain nginx and HAProxy default to round robin) | Requires tracking response times |
| Consistent Hashing | Virtual nodes on a ring; minimal redistribution on changes | Caches, distributed KV stores | Complex implementation |
Production Recommendation
Stateless HTTP services: Least Response Time (or Round Robin with health checks). Stateful services or caches: Consistent Hashing. Sticky sessions required: IP Hash (but prefer making services stateless instead).
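To make the consistent-hashing row concrete, here is a minimal sketch of a hash ring with virtual nodes. It is illustrative, not from the lesson: the class name, the MD5-based hash, and the vnode count are all arbitrary choices.

```typescript
// Sketch: consistent hashing with virtual nodes. Each physical server is
// hashed onto the ring at many points; a key routes to the first virtual
// node clockwise from the key's own hash.
import { createHash } from 'node:crypto';

class ConsistentHashRing {
  private ring: Array<{ point: number; server: string }> = [];

  constructor(servers: string[], private vnodes = 100) {
    for (const s of servers) this.add(s);
  }

  private hash(s: string): number {
    // first 8 hex chars of md5 → a 32-bit point on the ring
    return parseInt(createHash('md5').update(s).digest('hex').slice(0, 8), 16);
  }

  add(server: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: this.hash(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  lookup(key: string): string {
    const h = this.hash(key);
    // first virtual node at or past h, wrapping to the start of the ring
    const node = this.ring.find(n => n.point >= h) ?? this.ring[0];
    return node.server;
  }
}

const ring = new ConsistentHashRing(['cache-a', 'cache-b', 'cache-c']);
console.log(ring.lookup('user:123')); // always the same server for this key
```

The payoff over `hash(key) % N` is the "minimal redistribution" property from the table: adding or removing a server remaps only roughly 1/N of the keys, because only the ring segments adjacent to its virtual nodes change owners.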
```typescript
// Token Bucket Rate Limiter using Redis
// Handles distributed rate limiting across multiple API gateway instances
import Redis from 'ioredis'; // assuming the ioredis client
import type { Request, Response, NextFunction } from 'express';

const redis = new Redis(process.env.REDIS_URL);

async function checkRateLimit(
  identifier: string, // 'user:123' or 'ip:1.2.3.4'
  maxTokens: number,
  refillRate: number // tokens per second
): Promise<{ allowed: boolean; remaining: number }> {
  const key = `ratelimit:${identifier}`;
  const now = Date.now() / 1000;

  // Lua script: atomic check-and-update. The script executes atomically
  // in Redis, so there is no race condition between check and decrement.
  const luaScript = `
    local key = KEYS[1]
    local max_tokens = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or max_tokens
    local last_refill = tonumber(data[2]) or now

    -- Refill based on elapsed time. Token bucket: a full bucket allows a
    -- burst up to max_tokens; steady state is limited to refill_rate/sec.
    local elapsed = now - last_refill
    tokens = math.min(max_tokens, tokens + elapsed * refill_rate)

    if tokens >= 1 then
      tokens = tokens - 1
      redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', key, 86400)
      return {1, math.floor(tokens)} -- {allowed, remaining}
    else
      return {0, 0} -- denied
    end
  `;

  const [allowed, remaining] = await redis.eval(
    luaScript, 1, key,
    maxTokens.toString(), refillRate.toString(), now.toString()
  ) as [number, number];

  return { allowed: allowed === 1, remaining };
}

// Middleware — req.user is assumed to be populated by earlier auth middleware
export async function rateLimitMiddleware(req: Request, res: Response, next: NextFunction) {
  const result = await checkRateLimit(`user:${req.user.id}`, 100, 10);

  // Always return rate limit headers — clients need these to implement backoff
  res.setHeader('X-RateLimit-Remaining', result.remaining);

  if (!result.allowed) {
    // 429 Too Many Requests is the correct status code for rate limiting (RFC 6585)
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
}
```
Health Check Types (Kubernetes)
Liveness probe: is the process alive? On failure, Kubernetes restarts the container. Readiness probe: can it serve traffic right now? On failure, the pod is removed from Service endpoints but not restarted. Startup probe: holds off the other probes until a slow-starting application has finished booting.
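A service typically exposes separate endpoints for Kubernetes liveness and readiness probes. This is a hypothetical, framework-agnostic sketch using Node's built-in http module; the `/healthz` and `/readyz` paths and the simulated warm-up are illustrative conventions, not part of the lesson.

```typescript
// Liveness answers "is the process up?"; readiness answers "is it safe
// to route traffic here?" — they must be separate so a not-yet-warm pod
// is skipped by the load balancer without being restarted.
import http from 'node:http';

let ready = false;                          // flips once caches/DB pools are warm
setTimeout(() => { ready = true; }, 1000);  // simulate startup work

const health = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    res.writeHead(200).end('alive');         // liveness: process is running
  } else if (req.url === '/readyz') {
    res.writeHead(ready ? 200 : 503).end();  // readiness: 503 until warmed up
  } else {
    res.writeHead(404).end();
  }
});
health.listen(0); // ephemeral port for the sketch; a real service picks a fixed one
```

Returning 503 from readiness while staying 200 on liveness is what lets Kubernetes drain traffic from a struggling pod without killing it.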
Graceful Shutdown Pattern
On SIGTERM: (1) stop accepting new connections, (2) finish processing in-flight requests, (3) close DB connections, (4) exit. Kubernetes waits terminationGracePeriodSeconds (default 30s). Without graceful shutdown, in-flight requests return 500 errors.
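The four steps above can be sketched for a plain Node http server as follows; `db.close()` stands in for whatever connection cleanup your service needs and is hypothetical here.

```typescript
// Graceful shutdown on SIGTERM: stop accepting, drain, clean up, exit.
import http from 'node:http';

const server = http.createServer((_req, res) => res.end('ok'));
server.listen(0); // ephemeral port for the sketch

process.on('SIGTERM', () => {
  // (1) server.close() stops accepting new connections, and
  // (2) waits for in-flight requests to finish before its callback fires
  server.close(() => {
    // (3) close DB connections here, e.g. await db.close() (hypothetical)
    // (4) exit cleanly before terminationGracePeriodSeconds elapses,
    //     so Kubernetes never has to SIGKILL the container
    process.exit(0);
  });
});
```

Note that `server.close()` alone does not terminate long-lived connections such as WebSockets or keep-alive sockets with no active request; those need explicit tracking and teardown before step (4).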
Load balancing and gateway questions test system design fundamentals. Know the algorithms and tradeoffs. The sticky sessions and consistent hashing questions are classics.
Common questions:
Strong answers include:
Red flags:
From the books
Web Scalability for Startup Engineers — Artur Ejsmont (2015)
Chapter 7: Scaling with a Load Balancer
Load balancers are not just for horizontal scaling — they provide health checking, SSL termination, and graceful deployments (rolling restarts) that would otherwise require downtime.