Your retries are turning a blip into a full outage
When a service slows down, naive retries multiply the load exactly when the system is weakest, creating a retry storm that finishes the job the original failure started. Always pair retries with exponential backoff AND jitter, cap the total attempts, and add a circuit breaker so you stop hammering a dependency that is already down.