Kubernetes Audit Logging: Who Did What, When
Kubernetes audit logs record every API request including who made it, what they did, and when. Without audit logging, security incidents are unrecoverable -- you cannot determine what an attacker accessed or modified. Audit logging is the forensic foundation of Kubernetes security.
Why this matters at your level
Enable kube-apiserver audit logging. Configure an AuditPolicy to capture relevant events without excessive noise. Ship logs to a SIEM. Know what the four audit stages are (RequestReceived, ResponseStarted, ResponseComplete, Panic).
Design audit log retention policy (compliance: often 1 year minimum). Set up alerting on high-value events (Secret reads, ClusterRoleBinding creates, privileged pod creates). Integrate with SOC workflows for tier-1 security response.
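The four stages listed above can also be pruned per policy with the `omitStages` field. A minimal sketch (the file path is illustrative):

```yaml
# /etc/kubernetes/audit-policy.yaml (path is illustrative)
apiVersion: audit.k8s.io/v1
kind: Policy
# RequestReceived events add volume without adding forensic detail;
# omitting them is a common first noise reduction.
omitStages:
  - "RequestReceived"
rules:
  - level: Metadata
```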
- Attacker enters the cluster; no audit logs record the initial access method
- Anomalous AWS charges detected; incident response begins
- Investigation finds no audit logs; the scope of the breach cannot be determined
- Decision: treat the entire cluster as compromised; full rebuild begins
- Full cluster rebuild and credential rotation complete; estimated $150k response cost
The question this raises
What does Kubernetes audit logging capture, and what is unrecoverable if audit logs are absent during a security incident?
A pod was deleted at 3 AM and you need to determine who deleted it. Audit logging is enabled at Metadata level for pod deletions. What information is available in the audit log?
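At Metadata level, the answer is recoverable from a single event. A sketch of what such an entry might look like (field values here are illustrative, following the audit.k8s.io/v1 event schema):

```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "verb": "delete",
  "user": {"username": "jane@example.com", "groups": ["dev-team"]},
  "sourceIPs": ["10.0.2.7"],
  "objectRef": {"resource": "pods", "namespace": "prod", "name": "payments-7d9f"},
  "responseStatus": {"code": 200},
  "requestReceivedTimestamp": "2023-03-15T03:02:11Z"
}
```

Metadata captures who, what, when, and from where, but not the pod spec that was deleted; capturing the request body would require Request level.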
Lesson outline
What Audit Logging Solves
No Audit Logs = No Forensics
Without audit logs, a security incident becomes unrecoverable: you cannot determine initial access method, what an attacker read or modified, or what persistent backdoors were left. Audit logging is the difference between a targeted incident response and rebuilding the entire cluster from scratch.
AuditPolicy: tiered approach
Metadata level for most resources (low volume, high forensic value). Request level for RBAC changes and pod creates (captures intent). None for high-volume read paths (metric scrapers, controllers polling status).
Out-of-band log shipping
Ship audit logs to an external SIEM or immutable log store immediately. Never rely on logs stored on the API server node: an attacker with node access can delete them. Common targets: CloudWatch, Splunk, or Elasticsearch with an S3 backup.
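A minimal Fluent Bit sketch for shipping the audit log off the control-plane node; the log path, SIEM host, and index name are assumptions:

```ini
# fluent-bit.conf -- runs on each control-plane node (paths are illustrative)
[INPUT]
    Name    tail
    Path    /var/log/kubernetes/audit.log
    Parser  json
    Tag     k8s-audit

[OUTPUT]
    Name    es
    Match   k8s-audit
    Host    siem.internal.example.com
    Port    9200
    Index   k8s-audit
    tls     On
```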
High-value event alerting
Alert immediately (not in batch) on ClusterRoleBinding creates/updates, Secret reads in production namespaces, privileged pod creates, and node shell commands. These are early indicators of attack progression.
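The alerting rules above can be sketched as a small filter over parsed audit events. This is a minimal illustration, not a production detector; the rule set and the `prod` namespace prefix are assumptions:

```python
import json

# (verb, resource) pairs that should page immediately, not wait for a batch query.
HIGH_VALUE = {
    ("create", "clusterrolebindings"),
    ("update", "clusterrolebindings"),
    ("get", "secrets"),
    ("create", "pods"),
}

def is_high_value(event: dict) -> bool:
    """Return True if an audit event matches a high-value alerting rule."""
    verb = event.get("verb", "")
    ref = event.get("objectRef") or {}
    resource = ref.get("resource", "")
    if (verb, resource) not in HIGH_VALUE:
        return False
    # Secret reads only alert in production namespaces in this sketch.
    if resource == "secrets":
        return ref.get("namespace", "").startswith("prod")
    # Pod creates only alert when the spec requests privileged mode
    # (requires Request level so requestObject is present).
    if resource == "pods":
        spec = (event.get("requestObject") or {}).get("spec", {})
        return any(
            c.get("securityContext", {}).get("privileged", False)
            for c in spec.get("containers", [])
        )
    return True

line = '{"verb": "get", "objectRef": {"resource": "secrets", "namespace": "prod"}}'
print(is_high_value(json.loads(line)))  # -> True: a prod Secret read trips the rule
```

Note that the privileged-pod rule only works at Request level or above, since Metadata events carry no `requestObject`.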
The System View: Audit Log Flow

```
API request: GET /api/v1/namespaces/prod/secrets/db-password
        |
        v
kube-apiserver audit filter
  AuditPolicy rule: secrets -> level: Request
        |
        v
Audit event generated:
  {
    "verb": "get",
    "user": {"username": "system:serviceaccount:prod:attacker-sa"},
    "objectRef": {"resource": "secrets", "name": "db-password", "namespace": "prod"},
    "sourceIPs": ["10.0.1.5"],
    "responseStatus": {"code": 200},
    "requestReceivedTimestamp": "2023-03-15T02:14:33Z",
    "stageTimestamp": "2023-03-15T02:14:33.001Z",
    "stage": "ResponseComplete"
  }
        |
        v
Fluent Bit DaemonSet on the API server node
        |--> Elasticsearch (SIEM)
        |--> S3 (immutable archive, 1-year retention)
        |
        v
Alert: Secret read in prod namespace -> PagerDuty
```

Every API request generates an audit event; ship it immediately to an out-of-band immutable store before an attacker can delete the on-node logs.
Audit Policy Design
Anti-pattern: full RequestResponse on all resources. Secret values appear in the response body of the logs; encryption at rest is moot if the logs contain plaintext; expect ~10 TB/day of log volume.
Better: Metadata for most resources; Request for RBAC changes and pod creates; RequestResponse for nothing (Secret values are never logged); ~50 GB/day is manageable.
Anti-pattern: audit logs stored only on the API server node. An attacker with node access deletes the audit logs; incident response is blind; you cannot prove the scope of the breach.
Better: Fluent Bit ships logs within 30 seconds to an external SIEM; S3 Object Lock prevents deletion; the forensic trail is preserved even if the node is compromised.
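One way to make the S3 archive immutable is Object Lock in compliance mode. A sketch using the AWS CLI; the bucket name and region are assumptions:

```shell
# Object Lock must be enabled at bucket creation time.
aws s3api create-bucket \
  --bucket k8s-audit-archive \
  --region us-east-1 \
  --object-lock-enabled-for-bucket

# COMPLIANCE mode: nobody, including the root account, can delete
# objects before the 365-day retention window expires.
aws s3api put-object-lock-configuration \
  --bucket k8s-audit-archive \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}}
  }'
```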
How AuditPolicy Works
Designing a tiered audit policy
1. Identify high-value resources: secrets, clusterrolebindings, pods, nodes, serviceaccounts
2. Identify noisy low-value paths: GET/WATCH on configmaps/status/leases by controllers
3. Write rules: None for controller watch loops, Metadata for normal CRUD, Request for RBAC changes
4. Apply the policy via the kube-apiserver --audit-policy-file flag
5. Monitor log volume; adjust None rules for paths generating > 10% of total audit events
6. Set up alerts on Secret reads, ClusterRoleBinding creates, and privileged pod specs
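Step 5 above can be sketched as a quick offline pass over a day of audit events, grouping by user, verb, and resource to find the paths worth a None rule. Events follow the audit.k8s.io/v1 schema; the 10% threshold and sample data are illustrative:

```python
import json
from collections import Counter

def noisy_paths(audit_lines, threshold=0.10):
    """Return (user, verb, resource) tuples generating more than
    `threshold` of total audit volume -- candidates for a None rule."""
    counts = Counter()
    for line in audit_lines:
        ev = json.loads(line)
        ref = ev.get("objectRef") or {}
        counts[(ev.get("user", {}).get("username", "?"),
                ev.get("verb", "?"),
                ref.get("resource", "?"))] += 1
    total = sum(counts.values())
    return [(key, n / total) for key, n in counts.most_common()
            if n / total > threshold]

# Illustrative sample: a controller's watch loop dominating the log.
sample = (
    ['{"verb": "watch", "user": {"username": "system:kube-controller-manager"}, "objectRef": {"resource": "endpoints"}}'] * 9
    + ['{"verb": "get", "user": {"username": "jane"}, "objectRef": {"resource": "secrets"}}']
)
for key, share in noisy_paths(sample):
    print(key, f"{share:.0%}")  # the watch loop accounts for 90% of volume
```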
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log secret access at Request level (no response body = no values).
  # Request level for secrets: logs who read them without exposing
  # values in the response body.
  - level: Request
    resources:
      - group: ""
        resources: ["secrets"]

  # Log RBAC changes at Request level.
  - level: Request
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterrolebindings", "rolebindings", "clusterroles", "roles"]

  # None for controller watch loops: kube-controller-manager polls
  # constantly and would dominate audit volume.
  - level: None
    users: ["system:kube-controller-manager"]
    verbs: ["watch", "list"]
    resources:
      - group: ""
        resources: ["pods", "services", "endpoints"]

  # Default: Metadata for everything else.
  - level: Metadata
```
What Breaks in Production: Blast Radius
Audit logging failure modes
- API server performance impact — RequestResponse level on all resources can slow API server response time by 15-20% and generate huge log volumes. Always profile log volume and API latency after changing audit policy. Start with Metadata-only, add Request for specific resources.
- Logs not shipped out-of-band — Audit logs stored only on API server nodes. Node compromise = log destruction. Ship within 30 seconds to external SIEM. Use Fluent Bit DaemonSet on control plane nodes with direct SIEM integration.
- Secret values in audit response body — RequestResponse level for secrets logs the decrypted Secret values in the response body. This defeats all encryption at rest. Never use RequestResponse for secrets. Use Request level (logs who reads it, not what value).
- No alerting -- logs collected but not monitored — Audit logs in SIEM but no alerts configured. Attack happens; logs record it faithfully; nobody sees it for 3 weeks (real incident timeline). Configure real-time alerts on ClusterRoleBinding creates, Secret reads, privileged pod creates.
RequestResponse level for secrets logs credential values in plaintext:

```yaml
# DANGEROUS: logs Secret VALUES in the response body
rules:
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]
# Every kubectl get secret now logs:
#   responseObject.data.password = "aHVudGVyMg=="
#   base64 decoded = "hunter2"
# All credential values end up in your SIEM/S3 logs
```

```yaml
# Safe: logs who accessed secrets, not their values
rules:
  - level: Request
    resources:
      - group: ""
        resources: ["secrets"]
# Logs: user, IP, verb, secret name, timestamp
# Does NOT log responseObject (no values exposed)
```

Request level logs the requestObject (what the user sent) but not the responseObject (what the API server returned, i.e. the secret values). Use Request level for secrets: forensically useful (who read it) without exposing credentials.
Decision Guide: Audit Policy Levels
Cost and Complexity: Audit Level Comparison
| Level | What is logged | Log volume | Forensic value | Use for |
|---|---|---|---|---|
| None | Nothing | Zero | None | High-frequency controller watches |
| Metadata | User, resource, verb, IP, timestamp | Low | High | Most resources (default) |
| Request | Metadata + request body | Medium | Very high | RBAC changes, pod creates |
| RequestResponse | Metadata + request + response | Very high | Highest (+ security risk for secrets) | Non-sensitive debugging only |
Exam Answer vs. Production Reality
What audit logs capture
📖 What the exam expects
Every API request: verb (get/create/delete), resource (pods/secrets/configmaps), namespace, user/serviceaccount, source IP, request/response body (depending on audit level), timestamp.
How this might come up in interviews
Security and compliance questions about forensic capability and incident response readiness.
Common questions:
- What does Kubernetes audit logging capture?
- What are the four audit log levels?
- How would you design an audit policy that is forensically useful without generating excessive noise?
- Why must audit logs be shipped out-of-band from the cluster?
Strong answer: Mentions tiered AuditPolicy (different levels for different resources), out-of-band log shipping to immutable store, and alerting on high-value events (cluster-admin bindings, Secret reads).
Red flags: Disabling audit logging to save storage, or not knowing the audit policy levels.