The systematic approach to debugging Istio issues -- proxy-status STALE, missing VirtualService effect, and the 503 that is not a 503.
Know the basic debugging ladder: proxy-status, proxy-config routes, proxy-config endpoints. Know the response flag meanings for the most common 503 types.
Debug complex scenarios: namespace scope, cross-cluster, STALE config. Know how to enable per-component Envoy debug logging without causing performance issues.
Build internal debugging toolkits that codify the debugging ladder as scripts. Define response flag monitoring as part of the cluster observability standard. Write the Istio troubleshooting runbook for on-call engineers.
VirtualService applied -- expected 10% traffic to v2
Grafana shows 0% traffic to v2 -- team re-applies VS, still no effect
WARNING: istioctl proxy-status shows STALE for payment namespace pods
CRITICAL: NetworkPolicy blocking port 15012 (xDS) discovered and removed
Proxies resync -- STALE -> SYNCED -- canary traffic appears
The question this raises
What is the systematic debugging approach for Istio issues, and which command do you run first when traffic is not behaving as expected?
You apply a VirtualService to route 10% of traffic to v2. After 30 minutes, Grafana shows 0% to v2. You run kubectl get virtualservice and the spec shows the correct 90/10 weights. What is your next debugging step?
Lesson outline
Always start with proxy-status -- it tells you whether config is applied before debugging config content
The most common mistake when debugging Istio is reading VirtualService YAML and wondering why it does not work. The first question must be: does the proxy even HAVE this config? istioctl proxy-status answers that. STALE means the proxy has not acknowledged the latest push. SYNCED means it has. Debug the config content ONLY after confirming SYNCED.
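The "is it SYNCED?" check can be reduced to a one-line filter. The following is a minimal sketch, not an official tool: `unsynced_proxies` is a hypothetical helper name, and it assumes only that the first column of `istioctl proxy-status` output is the proxy name and that a stale proxy has the literal word STALE somewhere on its row. Column layout differs between Istio versions, so the filter deliberately avoids depending on column positions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `istioctl proxy-status` output on stdin and
# print the name (first column) of every proxy whose row contains STALE.
# NOT SENT is deliberately ignored: it usually means istiod had nothing
# to send for that xDS type, which is not an error by itself.
unsynced_proxies() {
  # NR > 1 skips the header row; /STALE/ keeps rows with a stale xDS type.
  awk 'NR > 1 && /STALE/ { print $1 }'
}
```

Typical use would be `istioctl proxy-status | unsynced_proxies`. Empty output means no proxy is STALE, and only then is it worth reading the VirtualService YAML.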
Systematic Istio debugging ladder
1. Is the pod in the mesh? kubectl get pod my-pod -o jsonpath='{.spec.containers[*].name}' | grep istio-proxy
2. Is config synced? istioctl proxy-status -- look for SYNCED vs STALE vs NOT SENT
3. What routes does the proxy have? istioctl proxy-config routes my-pod.namespace
4. What endpoints does the proxy know about? istioctl proxy-config endpoints my-pod.namespace
5. Is mTLS configured correctly? istioctl x describe pod my-pod.namespace
6. Is AuthorizationPolicy blocking? istioctl x authz check my-pod.namespace
7. Full config dump for comparison? istioctl proxy-config all my-pod.namespace --output json
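One way to codify the ladder for an on-call runbook is a function that prints the commands in order for a given pod, so the engineer can review and run them one at a time. This is a sketch under stated assumptions: `ladder_commands` is a hypothetical name, and the pod and namespace are whatever the caller supplies. It only prints; it does not execute anything against the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the debugging ladder for one pod as
# ready-to-run commands, in the order they should be executed.
ladder_commands() {
  local pod="$1" ns="$2"
  if [ -z "$pod" ] || [ -z "$ns" ]; then
    echo "usage: ladder_commands <pod> <namespace>" >&2
    return 1
  fi
  # Unquoted heredoc: ${pod} and ${ns} are expanded, everything else is literal.
  cat <<EOF
kubectl get pod ${pod} -n ${ns} -o jsonpath='{.spec.containers[*].name}'
istioctl proxy-status
istioctl proxy-config routes ${pod}.${ns}
istioctl proxy-config endpoints ${pod}.${ns}
istioctl x describe pod ${pod}.${ns}
istioctl x authz check ${pod}.${ns}
istioctl proxy-config all ${pod}.${ns} --output json
EOF
}
```

For example, `ladder_commands payment-svc-xxx production` prints seven commands. Printing rather than executing is a deliberate choice here: each rung should be read before moving to the next, because a STALE result on the second rung makes the remaining ones moot.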
Most common Istio issues and their root causes
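Response flag decoder (most important for 503 debugging):
UH = No Healthy Upstream -- no healthy hosts in the upstream cluster (for example, outlier detection ejected every endpoint)
URX = Upstream Retry Limit Exceeded -- all retry attempts failed
NR = No Route -- no route matched the request (misconfigured VirtualService or REGISTRY_ONLY)
UF = Upstream Connection Failure -- could not connect (wrong port, wrong IP, STRICT mTLS with non-mesh)
UO = Upstream Overflow -- circuit breaking on connection pool limits (for example, pending request limit exceeded)
DC = Downstream Connection Termination -- client disconnected before response
RL = Rate Limited -- rejected by the local rate limit filter, returned as a 429
UAEX = Unauthorized External Service -- denied by an external authorization service (AuthorizationPolicy with action CUSTOM). A plain AuthorizationPolicy DENY returns 403 "RBAC: access denied" rather than this flag.
How to read response_flags (requires Envoy access logging, JSON-encoded for this grep to match): kubectl logs my-pod -c istio-proxy | grep response_flags | head -20
Or in Prometheus: sum by (response_flags) (rate(istio_requests_total[5m]))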
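Response flags are easier to act on when they are tallied instead of read line by line. The sketch below is illustrative only: `describe_flag` and `count_flags` are hypothetical helper names, and `count_flags` assumes JSON-encoded access logs in which each line carries a compact `"response_flags":"..."` field. With the default text log format the extraction would need to be adapted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a single Envoy response flag to a short explanation.
describe_flag() {
  case "$1" in
    UH)   echo "no healthy upstream hosts" ;;
    UF)   echo "upstream connection failure" ;;
    UO)   echo "upstream overflow (circuit breaking)" ;;
    NR)   echo "no route configured" ;;
    URX)  echo "upstream retry limit exceeded" ;;
    DC)   echo "downstream connection termination" ;;
    RL)   echo "rate limited by local rate limit filter" ;;
    UAEX) echo "denied by external authorization service" ;;
    -)    echo "no flag set" ;;
    *)    echo "unknown flag: $1" ;;
  esac
}

# Hypothetical helper: tally response flags from JSON access log lines on stdin.
# Assumption: the field appears as "response_flags":"VALUE" with no spaces.
count_flags() {
  grep -o '"response_flags":"[^"]*"' \
    | cut -d'"' -f4 \
    | sort | uniq -c | sort -rn
}
```

A typical invocation would be `kubectl logs my-pod -c istio-proxy | count_flags`, followed by `describe_flag` on whichever flag dominates. Envoy can emit several flags on one request as a comma-separated value such as `UF,URX`; this sketch treats that as a single unknown key rather than splitting it.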
# === STEP 1: Is the pod in the mesh? ===
kubectl get pod payment-svc-xxx -n production \
  -o jsonpath='{.spec.containers[*].name}'
# Expected: payment-svc istio-proxy

# === STEP 2: Is config synced? ===
istioctl proxy-status
# SYNCED: proxy has acknowledged the latest config
# STALE: push sent but not acknowledged (check port 15012 connectivity)
# NOT SENT: istiod has sent nothing for this xDS type, usually because there is nothing to send

# === STEP 3: What routes does the proxy have? ===
istioctl proxy-config routes payment-svc-xxx.production
# Check: does the VirtualService route appear?
istioctl proxy-config routes payment-svc-xxx.production \
  --name 8080 --output json | python3 -m json.tool

# === STEP 4: Check endpoints ===
istioctl proxy-config endpoints payment-svc-xxx.production \
  --cluster "outbound|8080||payment-svc.production.svc.cluster.local"
# STATUS HEALTHY: endpoint in pool
# OUTLIER CHECK FAILED: outlier detection ejected this endpoint

# === STEP 5: Check mTLS policy ===
istioctl x describe pod payment-svc-xxx.production
# Shows: PeerAuthentication mode, AuthorizationPolicy matches

# === STEP 6: Check AuthorizationPolicy ===
istioctl x authz check payment-svc-xxx.production
# Shows which AuthorizationPolicies are applied to the pod

# === Debug: response flags in Prometheus ===
# Query: sum by (response_flags) (rate(istio_requests_total{destination_service_name="payment-svc"}[5m]))
# Response flags: UH=no healthy upstream, URX=retry limit exceeded, NR=no route, UO=upstream overflow

# === Debug: enable logging for a specific component ===
istioctl proxy-config log payment-svc-xxx.production --level rbac:debug
# After debugging, reset all loggers to their default level:
istioctl proxy-config log payment-svc-xxx.production --reset
# istioctl analyze: validates all Istio resources in a namespace
istioctl analyze -n production
# Common warnings:
# - VirtualService gateway does not exist
# - DestinationRule subset has no matching pods
# - PeerAuthentication in namespace conflicts with mesh default
# - ServiceEntry hostname shadows Kubernetes service

# List the AuthorizationPolicies in effect on a pod (read from its Envoy config).
# This does not simulate an individual request.
istioctl x authz check payment-svc-xxx.production

# Compare two pods' xDS config (should be nearly identical for the same service)
istioctl proxy-config all payment-svc-xxx.production --output json > pod1.json
istioctl proxy-config all payment-svc-yyy.production --output json > pod2.json
diff pod1.json pod2.json  # should show only minor differences (pod IP, name)
Most common causes of "VirtualService has no effect"
Debugging config content before checking proxy-status
# Wrong debugging order:
# 1. VS not working -> re-read the VS YAML
kubectl get virtualservice payment-svc -o yaml
# Looks correct...
# 2. Delete and recreate the VS
kubectl delete vs payment-svc && kubectl apply -f vs.yaml
# Still not working...
# 3. Add more rules, check edge cases
# Hours pass...
# 4. Never checked if proxy even received the config

# Correct debugging order:
# Step 1: Is the proxy synced?
istioctl proxy-status | grep payment-svc
# Shows STALE -> proxy never received the config
# -> check xDS connectivity: port 15012 from pod to istiod
# If SYNCED -> proxy HAS the config
# Step 2: Does the proxy have the right routes?
istioctl proxy-config routes payment-svc-pod.production --name 8080
# Shows route table -> is the VS rule in there?
# Step 3: Are the endpoints correct?
istioctl proxy-config endpoints payment-svc-pod.production
# Shows endpoint health -> any endpoints with OUTLIER CHECK = FAILED (ejected)?
# Step 4: Is policy blocking?
istioctl x authz check payment-svc-pod.production

The debugging ladder is ordered by what eliminates the most hypotheses fastest. proxy-status is first because a STALE status immediately explains any routing anomaly regardless of what the VS YAML says. Reading the VS YAML when the proxy is STALE is wasted effort. Always eliminate "does the proxy have the config?" before debugging "is the config correct?"
| Command | What it shows | When to use |
|---|---|---|
| istioctl proxy-status | CDS/LDS/EDS/RDS sync state per pod | Always first -- eliminates config sync issues |
| istioctl proxy-config routes | HTTP route table (VirtualService rules) | VS not taking effect -- verify rule is in proxy |
| istioctl proxy-config endpoints | Endpoint health and outlier detection state | 503s from healthy pods -- check for ejected endpoints (OUTLIER CHECK = FAILED) |
| istioctl proxy-config clusters | Cluster config (DestinationRule settings) | Connection pool / LB config questions |
| istioctl x describe pod | mTLS policy, AuthorizationPolicy summary | mTLS errors, auth policy questions |
| istioctl x authz check | AuthorizationPolicies applied to a pod | AuthorizationPolicy blocking unexpected traffic |
| istioctl analyze | Validation of all Istio CRDs in namespace | Post-apply validation -- finds misconfigured resources |
Istio debugging approach
📖 What the exam expects
Start with istioctl proxy-status to verify config sync. Then use proxy-config commands to inspect the actual Envoy config. Use istioctl analyze for resource validation.
Scenario-based questions come up in most Istio interviews. "Walk me through how you would debug a 503 in a mesh" is the standard entry point.
Strong answer: Says "proxy-status first" immediately, knows STALE vs SYNCED distinction, knows NR/UH/URX flags, has debugged a NetworkPolicy blocking xDS.
Red flags: Starts by reading the VirtualService YAML, does not know proxy-status, cannot name any response flags.