Automated CIS Hardening of Kubernetes Clusters

A freshly bootstrapped Kubernetes cluster is functional, but it is not secure by default. The CIS Kubernetes Benchmark from the Center for Internet Security is the most widely used baseline for locking a cluster down. It is a long checklist of concrete settings covering the API server, the controller manager, the scheduler, etcd, the kubelet, and general cluster policies. Going through it by hand is slow and error-prone, and the result drifts the moment someone changes a flag. This post shows how to assess and enforce the benchmark automatically.

Why automate it

The benchmark has well over a hundred individual controls, and most clusters have more than one node. Checking each control manually on every node does not scale, and a one-time manual pass tells you nothing about the state of the cluster next week. Automation gives you three things that a manual review cannot: a repeatable assessment that produces the same report every time, continuous verification so configuration drift is caught early, and remediation as code so the hardened state is reproducible on a new cluster.

The usual approach is to split the job into two halves. Assessment answers the question of where the cluster fails the benchmark, and remediation changes the configuration to fix those failures. It is good practice to keep them apart so you can audit first, understand the impact, and only then enforce.

Step 1: Assess with kube-bench

kube-bench from Aqua Security is the de facto tool for checking a cluster against the CIS benchmark. It inspects the running configuration of each component, compares it against the benchmark, and reports every control as pass, fail, warn, or info. It tries to detect the Kubernetes version and select the matching benchmark automatically.

The cleanest way to run it is as a Job on the cluster itself, because it needs access to the host filesystem to read the component manifests and config files.

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml

# wait for the job to finish, then read the report
kubectl wait --for=condition=complete job/kube-bench --timeout=120s
kubectl logs job/kube-bench

On a control plane node you can also run it directly against the host. This is handy in a CI pipeline or during the initial build before the cluster serves traffic.

# run the control plane checks and emit JSON for machine processing
kube-bench run --targets master --json > kube-bench-report.json

The JSON output is the important part for automation. Every control carries a test number, a description, the result, and a remediation hint. You can feed this into a pipeline and fail the build when any control regresses.

# fail the pipeline if kube-bench reports any failed control
kube-bench run --targets master --json \
  | jq -e '.Totals.total_fail == 0' > /dev/null \
  || { echo "CIS benchmark failures found"; exit 1; }
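
A pass/fail gate tells you that something broke, but not what. The following is a minimal Python sketch that lists the failed controls from a report. It assumes the report has a top-level Controls list whose entries hold groups under tests, each with a results list carrying test_number, test_desc, remediation, and status; check that against the output of your kube-bench version. The function name failed_controls and the inline sample data are illustrative only.

```python
import json

def failed_controls(report):
    """Return (test_number, description, remediation) for every FAIL result."""
    failures = []
    for control in report.get("Controls", []):
        for group in control.get("tests", []):
            for result in group.get("results", []):
                if result.get("status") == "FAIL":
                    failures.append((
                        result.get("test_number", "?"),
                        result.get("test_desc", ""),
                        result.get("remediation", ""),
                    ))
    return failures

# Hand-written sample in the assumed layout, not real kube-bench output.
SAMPLE = json.loads("""
{"Controls": [{"tests": [{"results": [
  {"test_number": "4.2.1", "test_desc": "anonymous auth disabled",
   "remediation": "set authentication.anonymous.enabled to false",
   "status": "FAIL"},
  {"test_number": "4.2.2", "test_desc": "authorization mode",
   "remediation": "", "status": "PASS"}
]}]}]}
""")

for number, description, remediation in failed_controls(SAMPLE):
    print(f"{number}\t{description}\t{remediation}")
```

To use it on a real report, load kube-bench-report.json with json.load and pass the result to failed_controls.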

Step 2: Understand a typical finding

It helps to look at what a single control actually means before automating the fix. A common failure on a fresh cluster is the kubelet anonymous authentication setting. By default the kubelet may accept unauthenticated requests on its API, which lets anyone who can reach the node read pod information or trigger actions. The benchmark requires anonymous access to be turned off.

The fix lives in the kubelet configuration file, usually /var/lib/kubelet/config.yaml.

# /var/lib/kubelet/config.yaml
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
authorization:
  mode: Webhook

After changing the config you restart the kubelet so the new settings take effect.

sudo systemctl restart kubelet
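
It is worth confirming the change from the outside as well. The sketch below sends an unauthenticated request to the kubelet API and reports the HTTP status. It assumes the kubelet listens on its default secure port 10250 and that a 401 means anonymous requests are rejected, a 403 means they are accepted as anonymous but not authorized, and a 200 means they are allowed; treat that mapping as an assumption to verify in your environment. The function names are hypothetical.

```python
import ssl
import urllib.error
import urllib.request

def kubelet_anonymous_status(node, port=10250, timeout=5):
    """Return the HTTP status the kubelet gives an unauthenticated /pods request."""
    context = ssl.create_default_context()
    # Kubelet serving certificates are often self-signed, so skip verification
    # for this probe only. check_hostname must be disabled before verify_mode.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = f"https://{node}:{port}/pods"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return error.code

def describe(status):
    """Translate a status code into a finding (assumed mapping, see lead-in)."""
    if status == 401:
        return "anonymous requests rejected"
    if status == 403:
        return "anonymous requests accepted but not authorized"
    if status == 200:
        return "anonymous requests allowed"
    return f"unexpected status {status}"

# Example call against a node you control (not executed here):
#   print(describe(kubelet_anonymous_status("10.0.0.12")))
print(describe(401))
```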

Most control plane controls work the same way. They are flags on the static pod manifests in /etc/kubernetes/manifests, such as kube-apiserver.yaml, and editing the manifest makes the kubelet recreate the pod with the new settings.
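
To make that concrete, here is a small sketch that reads the flags out of a static pod manifest and reports which required settings are missing or different. It uses a line-based match rather than a YAML parser to stay within the standard library, so it assumes each flag sits on its own list line in the form "- --name=value". The REQUIRED mapping shows two flags the benchmark commonly checks on the API server and is illustrative, not a complete list.

```python
import re

# Matches list items such as "    - --anonymous-auth=false".
FLAG_LINE = re.compile(r"^\s*-\s+(--[A-Za-z0-9-]+)(?:=(.*))?$")

def manifest_flags(manifest_text):
    """Map each --flag in a manifest's command list to its value."""
    flags = {}
    for line in manifest_text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            name, value = match.groups()
            # A bare flag without "=value" is treated as boolean true.
            flags[name] = value.strip().strip("\"'") if value is not None else "true"
    return flags

def missing_settings(flags, required):
    """Return the required flag/value pairs the manifest does not satisfy."""
    return {name: want for name, want in required.items() if flags.get(name) != want}

# Shortened, hand-written manifest fragment for illustration.
SAMPLE = """\
spec:
  containers:
  - command:
    - kube-apiserver
    - --anonymous-auth=true
    - --authorization-mode=Node,RBAC
"""

REQUIRED = {"--anonymous-auth": "false", "--profiling": "false"}
print(missing_settings(manifest_flags(SAMPLE), REQUIRED))
```

On a control plane node you would read /etc/kubernetes/manifests/kube-apiserver.yaml instead of the inline sample.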

Step 3: Enforce with Ansible

Hand-editing files does not scale across nodes, so the remediation belongs in a configuration management tool. Ansible is a natural fit because it is agentless and its modules are designed to be idempotent, which means you can run the same playbook repeatedly and it only changes what is not already correct. The pattern is one task per benchmark control, so the playbook reads like a hardened version of the checklist itself.

# harden-kubelet.yml
- name: Harden kubelet according to CIS benchmark
  hosts: all
  become: true
  tasks:
    - name: Disable anonymous auth on the kubelet
      ansible.builtin.replace:
        path: /var/lib/kubelet/config.yaml
        # capture the existing indentation so the rewrite preserves it
        regexp: '(anonymous:\n\s*)enabled: true'
        replace: '\1enabled: false'
      notify: restart kubelet

    - name: Set kubelet config file permissions to 0600
      ansible.builtin.file:
        path: /var/lib/kubelet/config.yaml
        owner: root
        group: root
        mode: '0600'

  handlers:
    - name: restart kubelet
      ansible.builtin.systemd:
        name: kubelet
        state: restarted
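
After the playbook has run, the permission control can be verified independently of Ansible. This is a minimal sketch, assuming the control means "0600 or more restrictive"; the helper name is_at_most is hypothetical.

```python
import os
import stat

def is_at_most(path, max_mode=0o600):
    """True if the file grants no permission bits beyond max_mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any bit set in mode but not in max_mode is an extra permission.
    return (mode & ~max_mode) == 0

KUBELET_CONFIG = "/var/lib/kubelet/config.yaml"  # path used in this post

# Only report when the file is visible, so the sketch also runs off-node.
if os.path.exists(KUBELET_CONFIG):
    verdict = "ok" if is_at_most(KUBELET_CONFIG) else "too permissive"
    print(KUBELET_CONFIG, verdict)
```

A file with mode 0400 passes this check and a file with mode 0644 fails it, which matches the "or more restrictive" reading.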

If you would rather not write every control yourself, there are maintained roles that already encode the full benchmark, such as the ansible-lockdown Kubernetes role. These let you toggle individual controls through variables, which is useful because some controls do not fit every environment and need to be reviewed before they are enforced.
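
The same idea of per-control toggles applies to the assessment gate: a control you have consciously decided not to enforce should not fail the pipeline, but it should stay visible. A small sketch, assuming you keep a hypothetical mapping from test number to the reason it is waived (the test numbers and reasons below are made up for illustration):

```python
def split_by_waiver(failed_ids, waivers):
    """Split failed test numbers into blocking ones and waived ones."""
    blocking = [number for number in failed_ids if number not in waivers]
    waived = {number: waivers[number] for number in failed_ids if number in waivers}
    return blocking, waived

# Illustrative data: test numbers and reasons are not taken from a real report.
WAIVERS = {"4.2.6": "managed node image sets this flag differently"}
blocking, waived = split_by_waiver(["4.2.1", "4.2.6"], WAIVERS)

for number, reason in waived.items():
    print(f"waived  {number}: {reason}")
for number in blocking:
    print(f"failing {number}")
```

Keeping the waiver file in version control gives each exception a reviewable history.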

Step 4: Keep it from drifting

A hardened cluster slowly drifts as people debug issues and forget to revert a flag, so the assessment has to run on a schedule rather than once. A simple and effective approach is a CronJob that runs kube-bench inside the cluster and ships the result to wherever you collect logs or alerts.

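# kube-bench-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kube-bench
  namespace: security       # the namespace must already exist
spec:
  schedule: "0 3 * * *"   # every night at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          hostPID: true     # lets kube-bench inspect host processes
          containers:
            - name: kube-bench
              image: aquasec/kube-bench:latest   # pin a specific tag in production
              command: ["kube-bench", "run", "--targets", "node", "--json"]
              volumeMounts:   # a subset of the read-only mounts in the upstream job.yaml
                - name: var-lib-kubelet
                  mountPath: /var/lib/kubelet
                  readOnly: true
                - name: etc-kubernetes
                  mountPath: /etc/kubernetes
                  readOnly: true
          volumes:
            - name: var-lib-kubelet
              hostPath:
                path: /var/lib/kubelet
            - name: etc-kubernetes
              hostPath:
                path: /etc/kubernetes
          restartPolicy: Never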

For a stronger guarantee you can stop bad configuration before it ever reaches the cluster. An admission policy engine such as Kyverno or OPA Gatekeeper enforces many of the workload-related benchmark controls at admission time, for example by rejecting privileged containers or pods that mount the host filesystem. This pairs well with kube-bench, because kube-bench audits the cluster components while the admission policies guard the workloads running on top.
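
The two example rules are simple to state as code. The following Python sketch only illustrates the logic such a policy applies to a pod spec; it is not how Kyverno or Gatekeeper are implemented or configured, and the function name pod_violations is hypothetical.

```python
def pod_violations(pod_spec):
    """Return human-readable reasons a pod spec would be rejected."""
    reasons = []
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        security = container.get("securityContext") or {}
        if security.get("privileged") is True:
            reasons.append(f"container {container.get('name', '?')} is privileged")
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            reasons.append(f"volume {volume.get('name', '?')} mounts a host path")
    return reasons

# Hand-written pod spec for illustration.
spec = {
    "containers": [{"name": "app", "securityContext": {"privileged": True}}],
    "volumes": [{"name": "root", "hostPath": {"path": "/"}}],
}
for reason in pod_violations(spec):
    print(reason)
```

Because kube-bench itself needs access to the host filesystem, a rule like this usually needs an explicit exemption for wherever the scanner runs.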

A practical workflow

Putting the pieces together gives a clear and repeatable process. You assess the cluster with kube-bench and capture the JSON report, you review the failures and decide which controls apply to your environment, you encode the fixes as an Ansible playbook and run it across all nodes, and you schedule kube-bench to run nightly so any drift shows up the next morning. Workload controls are handled by admission policies so insecure pods never start in the first place.

The result is a cluster that is not only hardened once, but stays hardened, with the security posture living in version control next to the rest of your infrastructure code.
