© 2026 TheSimplifiedTech. All rights reserved.
Interactive Explainer

Cluster Hardening & CIS Benchmark

The CIS Kubernetes Benchmark provides 200+ controls for hardening cluster infrastructure. Understanding which controls matter most, how to assess them with kube-bench, and which hardening flags have operational consequences is the difference between a compliant cluster and a usable one.

Relevant for: Mid-level, Senior, Staff
Why this matters at your level
Mid-level

Know the top 5 CIS controls and why they matter. Be able to run kube-bench and interpret its output. Understand that --anonymous-auth=true is a critical misconfiguration.

Senior

Write audit policies targeted at compliance requirements (SOC2, PCI, HIPAA). Implement hardening without breaking monitoring agents and existing workloads. Plan etcd TLS retrofit as a maintenance operation.

Staff

Define organizational CIS compliance targets and acceptable risk register. Automate kube-bench in CI/CD with defined thresholds. Own the cluster hardening runbook. Evaluate L2 controls against specific threat model rather than applying all controls indiscriminately.

Case study: Control Plane Exposure -- Anonymous API Access -- Production Cluster (2021)

  • Week 0 — Cluster deployed with default API server settings; anonymous auth enabled
  • Week 1 (WARNING) — Dashboard deployed with no network restrictions; accessible externally
  • Week 3 (WARNING) — Security scanner reveals unauthenticated API server and indexed Dashboard
  • Week 3+1d (CRITICAL) — kube-bench run: cluster scores 23% CIS compliance with 47 FAIL controls
  • Week 3+2d (CRITICAL) — Full incident response triggered; secrets rotated, cluster rebuilt
  • Week 4 (WARNING) — Rebuilt cluster scores 78% CIS compliance; anonymous auth disabled, audit logging enabled


The question this raises

Which CIS Benchmark controls prevent the highest-consequence exposures, and how do you enforce them without breaking cluster operations?

Test your assumption first

You run kube-bench on a new production cluster and get 47 FAIL results across L1 and L2 controls. Your security team asks you to fix everything immediately. What is the right approach?


What the CIS Benchmark covers

CIS Kubernetes Benchmark: 5 control families

The Center for Internet Security publishes a Kubernetes Benchmark with 200+ controls across 5 families: Control Plane Components (API server, scheduler, controller manager), etcd, Control Plane Configuration (authentication and logging settings), Worker Nodes (kubelet, config files), and Policies (RBAC, PSA, network policies). Separate CIS benchmarks cover managed services (EKS/GKE/AKS). Controls are graded L1 (minimum security, no operational impact) and L2 (defense-in-depth, may have operational trade-offs).

Highest-impact CIS controls (L1 -- apply immediately)

  • Disable anonymous auth (1.2.1) — API server --anonymous-auth=false. Prevents unauthenticated read access to cluster state including ConfigMaps, Secrets references, and namespace listings
  • Enable audit logging (1.2.22-1.2.25) — API server --audit-log-path, --audit-policy-file. Without audit logs, you cannot detect breach or reconstruct what was accessed. Required for SOC2/PCI compliance.
  • Restrict kubelet API (4.2.1) — kubelet --anonymous-auth=false, --authorization-mode=Webhook. Unauthenticated kubelet API allows pod exec, log access, and node enumeration
  • etcd TLS (2.1) — etcd --cert-file, --key-file, --peer-cert-file. etcd stores all cluster state -- unencrypted etcd is full cluster compromise
  • Disable profiling endpoints (1.2.21) — API server and scheduler --profiling=false. Profiling endpoints leak performance and stack data useful for exploitation
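
On kubeadm-style clusters, these flags live in the API server's static pod manifest. A sketch of the relevant fragment -- the audit paths and rotation numbers are illustrative assumptions, not prescriptions:

```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout).
# Only the hardening-relevant flags are shown; values are illustrative.
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --anonymous-auth=false                                # CIS 1.2.1
        - --profiling=false                                     # CIS 1.2.21
        - --audit-log-path=/var/log/kubernetes/audit.log
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-maxage=30      # days of retention (example value)
        - --audit-log-maxbackup=10   # rotated files kept (example value)
        - --audit-log-maxsize=100    # MB before rotation (example value)
        - --authorization-mode=Node,RBAC
        - --tls-min-version=VersionTLS12
```

The kubelet watches the manifests directory and restarts the API server pod when this file changes, so edits take effect without a manual restart.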

The system view: hardening layers

Kubernetes security hardening layers:

Internet/Internal Users
    |
    v
[Network boundary]  --> Restrict API server to VPC/private subnet
    |                   Kubernetes Dashboard: never expose publicly
    v
[API Server]        --> --anonymous-auth=false
    |                   --audit-log-path=/var/log/kubernetes/audit.log
    |                   --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    |                   --tls-min-version=VersionTLS12
    v
[Authorization]     --> --authorization-mode=Node,RBAC  (never AlwaysAllow)
    |                   Least-privilege RBAC for all ServiceAccounts
    v
[Admission]         --> OPA/Gatekeeper or Kyverno policies
    |                   Pod Security Standards enforcement
    v
[etcd]              --> --cert-file + --key-file (TLS)
    |                   --peer-cert-file (peer TLS)
    |                   Access restricted to API server only (firewall)
    v
[Worker Nodes]      --> kubelet --anonymous-auth=false
    |                   kubelet --authorization-mode=Webhook
    |                   AppArmor/seccomp profiles on pods
    v
[Runtime]           --> Pod Security Standards (Restricted profile)
                        Falco runtime detection
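
The flag layers in the diagram can be spot-checked mechanically. A hedged sketch: the manifest fragment below is inlined sample data purely for illustration -- on a real control-plane node you would grep /etc/kubernetes/manifests/kube-apiserver.yaml instead.

```shell
# Spot-check an API server manifest for the hardening flags shown above.
# Sample flags are written to a temp file so the check is self-contained.
cat > /tmp/sample-apiserver-flags.txt <<'EOF'
- --anonymous-auth=false
- --authorization-mode=Node,RBAC
- --audit-log-path=/var/log/kubernetes/audit.log
- --tls-min-version=VersionTLS12
EOF

for flag in 'anonymous-auth=false' 'authorization-mode=Node,RBAC' \
            'audit-log-path' 'tls-min-version=VersionTLS12'; do
  if grep -q -- "$flag" /tmp/sample-apiserver-flags.txt; then
    echo "OK      $flag"
  else
    echo "MISSING $flag"
  fi
done
```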

How this concept changes your thinking

Setting up a new cluster for a production workload
  • Before: "Deploy with kubeadm defaults, enable Dashboard for visibility, configure RBAC later when teams are onboarded."
  • After: "Run kube-bench immediately after cluster creation. Fix all L1 failures before any workload deployment. Dashboard: deploy with RBAC authentication and no external exposure from day one."

Enabling audit logging on a production cluster
  • Before: "Audit logging will flood disk space and slow the API server -- we will enable it before the next audit."
  • After: "A targeted audit policy (log Secrets reads, privileged pod creations, RBAC changes) adds <5% API server overhead. Without audit logs, you have no breach detection. Enable at cluster creation with log rotation and ship to a SIEM."

How to assess and remediate with kube-bench

CIS hardening assessment and remediation workflow

1. Run kube-bench as a Job in the cluster to assess all control plane and worker node controls: kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml

2. Review FAIL results by section. Prioritize L1 controls first -- these are minimum security with no operational trade-offs.

3. For API server controls: modify the kube-apiserver static pod manifest at /etc/kubernetes/manifests/kube-apiserver.yaml. The kubelet restarts the pod automatically when the manifest changes. Key flags: --anonymous-auth=false, --audit-log-path, --profiling=false.

4. For kubelet controls: modify the kubelet config at /var/lib/kubelet/config.yaml (or /etc/kubernetes/kubelet.conf), then restart the kubelet service. Key settings: authentication.anonymous.enabled: false, authorization.mode: Webhook.

5. For etcd controls: etcd peer and client TLS is configured at cluster creation. Retrofitting TLS on a running cluster requires an etcd restart -- plan a maintenance window.

6. Run kube-bench again after the changes to confirm PASS. Target 80%+ L1 compliance before any production workload.

7. Integrate kube-bench into CI/CD as a gate: run it on every node image build to catch regressions before they reach production.
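
Steps 1-2 can be sketched as a small triage script. The sample lines below mimic kube-bench's `[FAIL] 1.2.1 ...` output format (a real run's output is read with `kubectl logs job/kube-bench`); the specific results here are invented for illustration.

```shell
# Triage kube-bench results: count outcomes by status, then list the
# API server (section 1.2) FAILs to remediate first.
cat > /tmp/kube-bench.out <<'EOF'
[PASS] 1.1.1 Ensure that the API server pod specification file permissions are set
[FAIL] 1.2.1 Ensure that the --anonymous-auth argument is set to false
[FAIL] 1.2.22 Ensure that the --audit-log-path argument is set
[WARN] 4.2.9 Ensure that the --event-qps argument is set appropriately
[FAIL] 2.1 Ensure that the --cert-file and --key-file arguments are set
EOF

# Count results per status ([PASS]/[FAIL]/[WARN])
grep -o '^\[[A-Z]*\]' /tmp/kube-bench.out | sort | uniq -c

# List FAIL controls in section 1.2 (API server) -- fix these first
grep '^\[FAIL\] 1\.2\.' /tmp/kube-bench.out
```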


audit-policy.yaml

# Targeted audit policy -- log high-value events without flooding disk
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all Secret reads (credential access detection).
  # Metadata level logs who accessed what, not the response body --
  # lower overhead than RequestResponse.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
    verbs: ["get", "list", "watch"]

  # Log RBAC changes at RequestResponse level to catch privilege
  # escalation attempts with full detail.
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "clusterroles", "rolebindings", "clusterrolebindings"]

  # Log privileged pod creations.
  - level: Request
    resources:
      - group: ""
        resources: ["pods"]
    verbs: ["create"]

  # Skip high-volume read-only calls that generate noise without
  # security value.
  - level: None
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services", "configmaps"]

  # Default: log metadata for everything else.
  - level: Metadata
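
One ordering detail worth knowing: audit policy rules are evaluated top-down and the first match wins, so the Secrets rule must appear before the broad level: None rule, or Secret reads would be silently skipped. A sketch check against an abbreviated copy of the policy (the file path is hypothetical):

```shell
# First-match-wins ordering check: the "secrets" Metadata rule must come
# before the catch-all "level: None" read-skip rule.
cat > /tmp/audit-policy-check.yaml <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
    verbs: ["get", "list", "watch"]
  - level: None
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services", "configmaps"]
  - level: Metadata
EOF

secrets_line=$(grep -n '"secrets"' /tmp/audit-policy-check.yaml | head -1 | cut -d: -f1)
none_line=$(grep -n 'level: None' /tmp/audit-policy-check.yaml | head -1 | cut -d: -f1)
if [ "$secrets_line" -lt "$none_line" ]; then
  echo "rule order OK"
else
  echo "rule order WRONG: secrets rule is shadowed by level: None"
fi
```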

What breaks -- and operational trade-offs of hardening

Blast radius: what hardening controls can break

  • --anonymous-auth=false — Breaks health checks configured to hit /healthz without a token. Fix: configure health check clients with a kubeconfig or use localhost health probes
  • kubelet --anonymous-auth=false — Breaks node-level monitoring agents that hit the kubelet metrics endpoint without authentication. Fix: configure Prometheus node exporter with kubelet TLS certificates
  • etcd peer TLS retrofit — Requires cluster downtime to add peer TLS to a running etcd cluster. Must be planned as a maintenance operation with etcd snapshot backup before proceeding
  • Pod Security Restricted profile — Blocks containers running as root, with privileged: true, or with host namespaces. Many Helm charts fail Restricted -- requires chart updates or namespace exemptions
  • Disabling service account token automount — Pods that need API server access break if automountServiceAccountToken: false is set globally. Requires explicit opt-in per workload
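
The kubelet-side fixes referenced above live in the KubeletConfiguration file. A hedged sketch of the hardening-relevant fields (the CA file path is illustrative and cluster-specific):

```yaml
# Fragment of /var/lib/kubelet/config.yaml -- hardening-relevant fields only.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false                 # reject unauthenticated kubelet API calls
  webhook:
    enabled: true                  # delegate authn to the API server
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt   # illustrative path
authorization:
  mode: Webhook                    # never AlwaysAllow
readOnlyPort: 0                    # disable the unauthenticated read-only port
```

After editing, restart the kubelet (e.g. systemctl restart kubelet) for the change to take effect -- and reconfigure monitoring agents with kubelet TLS certificates as noted above.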

Test L2 controls in staging before production

L1 CIS controls are safe to apply without testing. L2 controls (encryption at rest for all data, read-only root filesystem requirement, restricting service account tokens globally) have operational side effects. Always test L2 controls in staging first, with a representative sample of your production workloads.

Decision guide: which controls to prioritize first

CIS Benchmark prioritization

Is anonymous auth enabled on the API server or kubelet?
  • Yes → Fix immediately (L1). This is the highest-blast-radius control -- unauthenticated API access allows full cluster enumeration.
  • No → Continue to the next gate.

Is audit logging enabled with at least Metadata-level Secret access logging?
  • Yes → Continue to the next gate.
  • No → Enable audit logging before any other hardening. Without audit logs you cannot detect a breach or prove compliance. Required for SOC2/PCI.

Are etcd and the kubelet running with TLS?
  • Yes → Continue to the next gate.
  • No → Critical: etcd without TLS means all cluster state is readable on the network, and kubelet without TLS exposes pod exec and logs. Address in a maintenance window.

Is RBAC the only authorization mode (no AlwaysAllow)?
  • Yes → Run kube-bench to assess the remaining controls and target 80%+ L1 compliance.
  • No → Remove AlwaysAllow immediately. Authorization modes are evaluated as a union, so AlwaysAllow in the list bypasses RBAC entirely.

Cost and complexity: compliance vs operability

  • --anonymous-auth=false — Risk if skipped: full unauthenticated API enumeration. Cost: low (one flag, no operational impact).
  • Audit logging — Risk if skipped: zero breach detection, compliance failure. Cost: medium (audit policy design, log storage, SIEM integration).
  • etcd TLS — Risk if skipped: all cluster state readable on the internal network. Cost: high (requires a maintenance window for retrofit).
  • Kubelet auth — Risk if skipped: node-level pod exec/log access without auth. Cost: medium (breaks unauthenticated monitoring agents).
  • Pod Security Restricted — Risk if skipped: privilege escalation via root containers. Cost: high (breaks root-running Helm charts, requires app changes).
  • Secrets encryption at rest — Risk if skipped: etcd backup readable without the key. Cost: low for new clusters, medium for retrofit (key management).

Exam Answer vs. Production Reality


CIS Benchmark assessment

📖 What the exam expects

The CIS Kubernetes Benchmark provides prescriptive security recommendations. kube-bench is an open-source tool that checks cluster configuration against CIS controls and reports PASS/FAIL/WARN.


How this might come up in interviews

Asked in security-focused platform interviews: "How would you harden a new Kubernetes cluster?" Also in post-mortem debrief style questions about cluster exposures.

Common questions:

  • Walk me through how you would harden a new Kubernetes cluster against the CIS Benchmark.
  • What does kube-bench do and how would you use it?
  • Which CIS control would you apply first and why?
  • How do you enable audit logging without degrading API server performance?
  • What is the risk of leaving --anonymous-auth=true on a production cluster?

Strong answer: candidates who have written audit policies for specific compliance frameworks, have retrofitted hardening controls on running clusters, and can articulate the operational impact of each control category.

Red flags: candidates who have never run kube-bench, who treat 100% CIS compliance as the goal without weighing operational trade-offs, or who do not know that anonymous auth defaults to enabled.

Related concepts

Explore topics that connect to this one.

  • Kubernetes Audit Logging: Who Did What, When
  • Pod Security Standards: Hardening Workload Configurations
  • Kubernetes Multi-Tenancy: Sharing Clusters Safely

Suggested next

Often learned after this topic.

Kubernetes Multi-Tenancy: Sharing Clusters Safely
