© 2026 TheSimplifiedTech. All rights reserved.

Interactive Explainer

Istio Troubleshooting

The systematic approach to debugging Istio issues -- proxy-status STALE, missing VirtualService effect, and the 503 that is not a 503.

Relevant for: Mid-level, Senior, Staff
Why this matters at your level
Mid-level

Know the basic debugging ladder: proxy-status, proxy-config routes, proxy-config endpoints. Know the response flag meanings for the most common 503 types.

Senior

Debug complex scenarios: namespace scope, cross-cluster, STALE config. Know how to enable per-component Envoy debug logging without causing performance issues.

Staff

Build internal debugging toolkits that codify the debugging ladder as scripts. Define response flag monitoring as part of the cluster observability standard. Write the Istio troubleshooting runbook for on-call engineers.


~4 min read
Data Plane Mystery -- proxy-status STALE -- Config Sync Broken, Canary Stuck
T+0

VirtualService applied -- expected 10% traffic to v2

T+2h

Grafana shows 0% traffic to v2 -- team re-applies VS, still no effect

WARNING
T+3h

istioctl proxy-status shows STALE for payment namespace pods

CRITICAL
T+3.5h

NetworkPolicy blocking port 15012 (xDS) discovered and removed

T+3.5h

Proxies resync -- STALE -> SYNCED -- canary traffic appears

1.5h (T+2h to T+3.5h) — Debugging time before finding NetworkPolicy
0% — v2 traffic despite VirtualService applied
1 — NetworkPolicy rule blocking xDS port
15012 — Port blocked (Envoy-to-istiod xDS)

The question this raises

What is the systematic debugging approach for Istio issues, and which command do you run first when traffic is not behaving as expected?

Test your assumption first

You apply a VirtualService to route 10% of traffic to v2. After 30 minutes, Grafana shows 0% to v2. You run kubectl get virtualservice and the spec shows the correct 90/10 weights. What is your next debugging step?

Lesson outline

The Istio debugging ladder

Always start with proxy-status -- it tells you whether config is applied before debugging config content

The most common mistake when debugging Istio is reading VirtualService YAML and wondering why it does not work. The first question must be: does the proxy even HAVE this config? istioctl proxy-status answers that. STALE means the proxy has not acknowledged the latest push. SYNCED means it has. Debug the config content ONLY after confirming SYNCED.
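As a quick illustration, the SYNCED-vs-STALE check can be automated by filtering proxy-status output. The sample output below is hypothetical and simplified (real `istioctl proxy-status` output has more columns), but the filter works the same way against the live command:

```shell
# Hypothetical, simplified `istioctl proxy-status` output for illustration.
proxy_status_sample() {
cat <<'EOF'
NAME                          CDS     LDS     EDS     RDS     ISTIOD
payment-svc-7d9f.production   SYNCED  SYNCED  SYNCED  SYNCED  istiod-abc
checkout-svc-5b2c.production  STALE   STALE   STALE   STALE   istiod-abc
frontend-6c1a.web             SYNCED  SYNCED  SYNCED  SYNCED  istiod-abc
EOF
}

# Print any pod whose xDS columns are not all SYNCED.
# Against a real cluster you would pipe `istioctl proxy-status` instead.
find_unsynced() {
  proxy_status_sample | awk 'NR > 1 {
    for (i = 2; i <= 5; i++)
      if ($i != "SYNCED") { print $1; next }
  }'
}

find_unsynced   # → checkout-svc-5b2c.production
```

Any pod this prints is a "stop debugging config content" signal: fix xDS connectivity first.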

Systematic Istio debugging ladder:

  1. Is the pod in the mesh? kubectl get pod my-pod -o jsonpath='{.spec.containers[*].name}' | grep istio-proxy
  2. Is config synced? istioctl proxy-status -- look for SYNCED vs STALE vs NOT_SENT
  3. What routes does the proxy have? istioctl proxy-config routes my-pod.namespace
  4. What endpoints does the proxy know about? istioctl proxy-config endpoints my-pod.namespace
  5. Is mTLS configured correctly? istioctl x describe pod my-pod.namespace
  6. Is AuthorizationPolicy blocking? istioctl x authz check my-pod.namespace
  7. Full config dump for comparison? istioctl proxy-config all my-pod.namespace --output json
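At the Staff level this ladder is expected to be codified as a script. A minimal sketch -- the pod-name argument and the DRY_RUN convention are my own assumptions for review and testing without a live cluster, not any istioctl feature:

```shell
# Sketch: the debugging ladder as one runbook function.
# POD is "name.namespace" as istioctl expects; names are placeholders.
# With DRY_RUN=1 each command is printed instead of executed.
ladder() {
  pod="$1"
  name="${pod%%.*}"
  ns="${pod#*.}"
  run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
  }
  run kubectl get pod "$name" -n "$ns" -o jsonpath='{.spec.containers[*].name}'  # 1: sidecar present?
  run istioctl proxy-status                                                      # 2: SYNCED vs STALE
  run istioctl proxy-config routes "$pod"                                        # 3: route table
  run istioctl proxy-config endpoints "$pod"                                     # 4: endpoint health
  run istioctl x describe pod "$pod"                                             # 5: mTLS policy
  run istioctl x authz check "$pod"                                              # 6: authorization
}

DRY_RUN=1
ladder payment-svc-xxx.production   # prints the six ladder commands
```

Running it with DRY_RUN unset executes the steps in order against the cluster, giving on-call engineers one command instead of six.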


Common failure patterns and diagnosis

Most common Istio issues and their root causes

  • VirtualService has no effect (traffic unchanged) — Check proxy-status first -- STALE = xDS blocked. Then check VS namespace scope vs caller namespace. Then verify DestinationRule subsets exist.
  • 503 with no upstream error — Check response_flags: UH = upstream unhealthy (circuit breaker), URX = retry exhausted, NR = no route, UF = upstream connection failure
  • Liveness probe failing suddenly — Did you recently add an AuthorizationPolicy? Kubelet has no SPIFFE cert -- check for missing health check ALLOW rule.
  • mTLS handshake error — Is STRICT applied but some callers not in mesh? Check PeerAuthentication mode. Are certs expired? Check with openssl on the sidecar.
  • Cross-cluster call failing — Check istioctl remote-clusters for SYNCED. Verify east-west gateway IP reachable from remote cluster. Compare root CA fingerprints.
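One of these patterns -- a VirtualService routing to a subset the DestinationRule never defines -- can be caught offline by diffing subset names. A sketch using hypothetical minimal YAML fragments; in practice you would feed the output of `kubectl get virtualservice,destinationrule -o yaml`:

```shell
# Extract subset names referenced by a VirtualService and defined by a
# DestinationRule (naive grep-based sketch, not a full YAML parser).
vs_subsets() {
  grep -o 'subset: *[a-z0-9-]*' "$1" | awk '{print $2}' | sort -u
}
dr_subsets() {
  grep -o 'name: *[a-z0-9-]*' "$1" | awk '{print $2}' | sort -u
}

cat > /tmp/vs.yaml <<'EOF'
# VirtualService fragment (hypothetical)
route:
- destination:
    host: payment-svc
    subset: v1
  weight: 90
- destination:
    host: payment-svc
    subset: v2
  weight: 10
EOF

cat > /tmp/dr.yaml <<'EOF'
# DestinationRule fragment (hypothetical) -- note v2 is NOT defined
subsets:
- name: v1
  labels: {version: v1}
EOF

vs_subsets /tmp/vs.yaml > /tmp/vs_subs
dr_subsets /tmp/dr.yaml > /tmp/dr_subs
# Subsets the VS routes to but the DR never defines -- each one is a
# guaranteed 503 (NR) for that weight percentage:
comm -23 /tmp/vs_subs /tmp/dr_subs   # → v2
```

`istioctl analyze` reports the same mismatch against the live cluster; the sketch just shows why the check is mechanical enough to put in CI.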

Response flag decoder (most important for 503 debugging):
  UH  = Upstream Unhealthy -- circuit breaker / outlier detection ejected endpoint
  URX = Upstream Retry Exhausted -- all retry attempts failed
  NR  = No Route -- VirtualService has no matching route (misconfigured or REGISTRY_ONLY)
  UF  = Upstream Connection Failure -- could not connect (wrong port, wrong IP, STRICT mTLS with non-mesh)
  DC  = Downstream Connection Termination -- client disconnected before response
  RL  = Rate Limited -- connection pool overflow (maxPendingRequests exceeded)
  UAEX = Unauthorized -- AuthorizationPolicy DENY

How to read response_flags:
  kubectl logs my-pod -c istio-proxy | grep response_flags | head -20
  Or in Prometheus:
  sum by (response_flags) (rate(istio_requests_total[5m]))
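The decoder and the log-reading step can be combined into one small tally script. The access-log lines below are hypothetical and simplified so the response flag sits in a fixed column; real Envoy access logs need a format-specific field index:

```shell
# Map a response flag to its meaning (subset of the decoder above).
decode_flag() {
  case "$1" in
    UH)  echo "upstream unhealthy (circuit breaker / outlier ejection)";;
    URX) echo "retry exhausted";;
    NR)  echo "no route";;
    UF)  echo "upstream connection failure";;
    DC)  echo "downstream disconnected";;
    RL)  echo "rate limited (connection pool overflow)";;
    *)   echo "see Envoy response flag docs";;
  esac
}

# Hypothetical simplified access-log lines; flag is field 5, "-" means none.
sample_logs() {
cat <<'EOF'
[2026-01-10] "GET /pay" 503 UH outbound|8080||payment-svc
[2026-01-10] "GET /pay" 503 UH outbound|8080||payment-svc
[2026-01-10] "GET /pay" 503 NR outbound|8080||payment-svc
[2026-01-10] "GET /pay" 200 - outbound|8080||payment-svc
EOF
}

# Tally flags (the shell analogue of sum by (response_flags)), then decode.
sample_logs | awk '$5 != "-" {count[$5]++} END {for (f in count) print count[f], f}' |
while read -r n f; do
  echo "$n x $f: $(decode_flag "$f")"
done
```

The Prometheus query is still the right tool for trending; this is for the one pod you are tailing right now.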
kubectl

# === STEP 1: Is the pod in the mesh? ===
kubectl get pod payment-svc-xxx -n production \
  -o jsonpath='{.spec.containers[*].name}'
# Expected: payment-svc istio-proxy

# === STEP 2: Is config synced? ===
istioctl proxy-status
# SYNCED: proxy has latest config
# STALE: push sent but not acknowledged (check port 15012 connectivity)
# NOT SENT: istiod has not pushed to this proxy

# === STEP 3: What routes does the proxy have? ===
istioctl proxy-config routes payment-svc-xxx.production
# Check: does the VirtualService route appear?
istioctl proxy-config routes payment-svc-xxx.production \
  --name 8080 --output json | python3 -m json.tool

# === STEP 4: Check endpoints ===
istioctl proxy-config endpoints payment-svc-xxx.production \
  --cluster "outbound|8080||payment-svc.production.svc.cluster.local"
# HEALTHY: endpoint in pool
# EJECTED: circuit breaker ejected this endpoint

# === STEP 5: Check mTLS policy ===
istioctl x describe pod payment-svc-xxx.production
# Shows: PeerAuthentication mode, AuthorizationPolicy matches

# === STEP 6: Check AuthorizationPolicy ===
istioctl x authz check payment-svc-xxx.production
# Shows which policies allow/deny each traffic pattern

# === Debug: response flags in Prometheus ===
# Query: sum by (response_flags) (rate(istio_requests_total{destination_service="payment-svc"}[5m]))
# Response flags: UH=circuit breaker, URX=retry exhausted, NR=no route, RL=rate limited

# === Debug: enable logging for a specific component ===
istioctl proxy-config log payment-svc-xxx.production --level rbac:debug
# After debugging, restore default:
istioctl proxy-config log payment-svc-xxx.production --level default
debug-checklist.sh

# istioctl analyze: validates all Istio resources in a namespace
istioctl analyze -n production
# Common warnings:
# - VirtualService gateway does not exist
# - DestinationRule subset has no matching pods
# - PeerAuthentication in namespace conflicts with mesh default
# - ServiceEntry hostname shadows Kubernetes service

# Check if a specific request would be allowed/denied
istioctl x authz check payment-svc-xxx.production \
  --namespace production \
  --header "x-forwarded-proto: https"

# Compare two pods' xDS config (should be identical for same service)
istioctl proxy-config all payment-svc-xxx.production --output json > pod1.json
istioctl proxy-config all payment-svc-yyy.production --output json > pod2.json
diff pod1.json pod2.json  # should show only minor differences (pod IP, name)

What breaks in production

Most common causes of "VirtualService has no effect"

  • proxy-status STALE — xDS connection blocked (NetworkPolicy, firewall rule on port 15012/15010) -- VS applied but never received by proxy
  • Wrong namespace scope — VS in namespace A does not govern traffic FROM namespace B -- callers in B bypass the VS entirely
  • DestinationRule subset missing — VS references subset "v2" but DR does not define it -- 503 for that weight percentage
  • VS applied to wrong host — hosts: ["payment"] but service FQDN is "payment-svc" -- no host match, no VS applied
  • istiod down — VS applied to API server but istiod is down -- no xDS push -- proxies run on last known config
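The first cause -- a NetworkPolicy silently blocking xDS, exactly what happened in the incident above -- can be sanity-checked offline by grepping an exported egress policy for the xDS port. The policy YAML below is a hypothetical deny-by-default export; in practice feed `kubectl get networkpolicy -n <ns> -o yaml`:

```shell
# Hypothetical egress NetworkPolicy export that forgot the xDS port.
cat > /tmp/netpol.yaml <<'EOF'
kind: NetworkPolicy
spec:
  policyTypes: [Egress]
  egress:
  - ports:
    - port: 8080
    - port: 443
EOF

# Naive check: if egress is restricted, port 15012 (Envoy-to-istiod xDS)
# must appear somewhere in the allowed ports.
if grep -q 'port: 15012' /tmp/netpol.yaml; then
  echo "xDS port 15012 allowed"
else
  echo "WARNING: egress restricted but port 15012 not allowed -- proxies will go STALE"
fi
```

A grep is obviously no substitute for evaluating the full policy semantics, but it catches the common "locked down egress, forgot the mesh control plane" mistake in seconds.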

Debugging config content before checking proxy-status

Bug
# Wrong debugging order:
# 1. VS not working -> re-read the VS YAML
kubectl get virtualservice payment-svc -o yaml
# Looks correct...
# 2. Delete and recreate the VS
kubectl delete vs payment-svc && kubectl apply -f vs.yaml
# Still not working...
# 3. Add more rules, check edge cases
# Hours pass...
# 4. Never checked if proxy even received the config
Fix
# Correct debugging order:
# Step 1: Is the proxy synced?
istioctl proxy-status | grep payment-svc
# Shows STALE -> proxy never received the config
# -> check xDS connectivity: port 15012 from pod to istiod

# If SYNCED -> proxy HAS the config
# Step 2: Does the proxy have the right routes?
istioctl proxy-config routes payment-svc-pod.production --name 8080
# Shows route table -> is the VS rule in there?

# Step 3: Are the endpoints correct?
istioctl proxy-config endpoints payment-svc-pod.production
# Shows endpoint health -> any EJECTED endpoints?

# Step 4: Is policy blocking?
istioctl x authz check payment-svc-pod.production

The debugging ladder is ordered by what eliminates the most hypotheses fastest. proxy-status is first because a STALE status immediately explains any routing anomaly regardless of what the VS YAML says. Reading the VS YAML when the proxy is STALE is wasted effort. Always eliminate "does the proxy have the config?" before debugging "is the config correct?"

Decision guide: where is the failure?

Does istioctl proxy-status show STALE for the affected pods?
  • Yes -- xDS sync is broken: check NetworkPolicy on port 15012, istiod health, and istiod-to-pod connectivity. Fix sync before debugging config.
  • No -- does istioctl proxy-config routes show the expected route?
      • Yes -- routing config is present: check endpoints (circuit breaker state) and authorization policy (authz check).
      • No -- config is not in the proxy: check VS namespace scope, DestinationRule subset definitions, and run istioctl analyze for validation errors.
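The guide reduces to two yes/no answers, which makes it easy to embed in a runbook script. A sketch (the function name and input convention are my own, for illustration):

```shell
# diagnose STALE? ROUTE_PRESENT? -- each argument is "yes" or "no",
# taken from `istioctl proxy-status` and `istioctl proxy-config routes`.
diagnose() {
  stale="$1"; route_present="$2"
  if [ "$stale" = "yes" ]; then
    echo "xDS sync broken: check NetworkPolicy on 15012, istiod health"
  elif [ "$route_present" = "yes" ]; then
    echo "routing config present: check endpoints and authorization policy"
  else
    echo "config not in proxy: check VS namespace scope, DR subsets, istioctl analyze"
  fi
}

diagnose yes no   # → xDS sync broken: check NetworkPolicy on 15012, istiod health
diagnose no yes
diagnose no no
```

Note the ordering: STALE short-circuits everything else, mirroring "fix sync before debugging config."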

Istio debugging commands reference

Command — what it shows — when to use:

  • istioctl proxy-status — CDS/LDS/EDS/RDS sync state per pod — always first; eliminates config sync issues
  • istioctl proxy-config routes — HTTP route table (VirtualService rules) — VS not taking effect; verify rule is in proxy
  • istioctl proxy-config endpoints — endpoint health and circuit breaker state — 503s from healthy pods; check for EJECTED endpoints
  • istioctl proxy-config clusters — cluster config (DestinationRule settings) — connection pool / LB config questions
  • istioctl x describe pod — mTLS policy, AuthorizationPolicy summary — mTLS errors, auth policy questions
  • istioctl x authz check — policy evaluation for a pod — AuthorizationPolicy blocking unexpected traffic
  • istioctl analyze — validation of all Istio CRDs in namespace — post-apply validation; finds misconfigured resources

Exam Answer vs. Production Reality


Istio debugging approach

📖 What the exam expects

Start with istioctl proxy-status to verify config sync. Then use proxy-config commands to inspect the actual Envoy config. Use istioctl analyze for resource validation.


How this might come up in interviews

Scenario-based questions in all Istio interviews. "Walk me through how you would debug a 503 in a mesh" is the standard entry point.

Common questions:

  • What is the first command you run when a VirtualService seems to have no effect?
  • What does proxy-status STALE mean and what causes it?
  • How do you debug a 503 in Istio?
  • What are response flags and how do you use them?
  • How would you verify that an AuthorizationPolicy is correctly blocking traffic?

Strong answer: Says "proxy-status first" immediately, knows STALE vs SYNCED distinction, knows NR/UH/URX flags, has debugged a NetworkPolicy blocking xDS.

Red flags: Starts by reading the VirtualService YAML, does not know proxy-status, cannot name any response flags.

Related concepts

Explore topics that connect to this one.

  • Istio Performance Tuning
  • Istio Observability & Distributed Tracing
  • Istio Production Best Practices

Suggested next

Often learned after this topic.

Istio Production Best Practices
