
Enable cluster and search audit logging

One capability across EKS clusters and Elasticsearch search domains: capture and watch the control-plane and search activity that records who called the API, what they queried, and what failed.

14 min·10 sections·AWS

Remediates AWS Security Hub: EKS.8, ES.4, ES.5, GuardDuty.5

Cluster and search audit logging: the basics

What does a quiet control plane actually hide?

This capability covers the cluster control planes and search domains that broker access to your workloads: the Amazon EKS Kubernetes API server, and Amazon Elasticsearch Service search domains (Elasticsearch Service is the former name of Amazon OpenSearch Service). Each runs a front door that every action passes through, and by default that front door keeps no durable, watched record of who called it or what they asked for. Cluster and search audit logging is about turning that recorder on and, for EKS, having a managed detector read the tape.

AWS Security Hub turns each layer into its own control, which is why a single estate can fail several at once. EKS.8 checks that an EKS cluster has the audit control-plane log type exporting to CloudWatch Logs. GuardDuty.5 checks that GuardDuty EKS Audit Log Monitoring is analysing that Kubernetes audit stream for threats. ES.4 checks that an Elasticsearch domain publishes its error logs, and ES.5 checks that it publishes its audit logs. They look like separate problems on the report, but they are one capability: make sure the cluster and search layers record what happens to them, and that something is watching.

These surfaces are flagged because they are the most attacker-active and audit-relevant in their respective stacks. The Kubernetes API is where an attacker who lands in a cluster enumerates, escalates, and reads secrets; a search domain holds the most queryable copy of an organisation's sensitive data. Without the audit log there is no answer to "who deleted that deployment?" or "who ran that query at 2am?": the events were never recorded, so no later investigation can recover them. Audit logging is cheap to leave on and impossible to retrofit onto an incident that has already happened.

In this lesson you will learn what the EKS control plane and Elasticsearch search domains actually log, the difference between the agentless control-plane audit log and the managed detector that reads it, and how the Elasticsearch error and audit logs differ and depend on each other. The Controls this lesson covers section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.

Fun fact

The audit log was already there

When GuardDuty EKS Audit Log Monitoring first shipped, a surprising number of teams found their clusters had been writing Kubernetes audit logs the whole time, with nobody analysing them. GuardDuty does not even require you to enable EKS control-plane logging to CloudWatch first; it consumes the audit stream directly at no extra EKS logging cost. One platform team turned the feature on across 40 clusters in an afternoon with a single organization-level setting and had their first finding (an over-permissive service account binding created months earlier) within hours. The data had been flowing past unwatched for the better part of a year. The same lesson holds for Elasticsearch: the moment someone asks for the access trail on a customer-data domain is the moment they discover audit logging was never on.

Finding unmonitored clusters and domains

Diego runs platform security at a healthcare SaaS company. Security Hub flags EKS.8 on three of seven clusters, GuardDuty.5 as failed across the organization, and ES.5 on a legacy Elasticsearch domain backing customer-record search. None of these surfaces is recording or being watched.

Rather than work the findings one by one, he starts by confirming which clusters have the audit log type enabled, so he can see the scope of the gap before changing anything.

Sweep the EKS fleet for the audit control-plane log type. A cluster missing it fails EKS.8 and is running its API server unrecorded.

$ aws eks list-clusters --query 'clusters[]' --output text
prod-platform staging-a staging-b data-eks sandbox
# prod-platform: audit=true
# staging-a: audit=false
# staging-b: audit=false
# Three clusters logging their control plane; two going unrecorded.

Confirm the audit log type across the fleet before enabling, so the change is deliberate per cluster rather than a blind toggle.
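
The sweep above can be scripted roughly as follows. This is a minimal sketch, assuming the AWS CLI is already configured for the account and region being checked; the function names audit_status and sweep_audit are illustrative and not part of any AWS tooling.

```shell
# Hypothetical helper: print "audit=true" or "audit=false" for one cluster.
# Only log types inside an enabled clusterLogging entry are considered.
audit_status() {
  enabled_types=$(aws eks describe-cluster --name "$1" \
    --query 'cluster.logging.clusterLogging[?enabled].types[]' \
    --output text)
  # Normalise tabs/newlines to single spaces, then look for the audit type.
  case " $(echo $enabled_types) " in
    *" audit "*) echo "audit=true" ;;
    *)           echo "audit=false" ;;
  esac
}

# Hypothetical helper: one pass/fail line per cluster in the current region.
sweep_audit() {
  for cluster in $(aws eks list-clusters --query 'clusters[]' --output text); do
    echo "$cluster: $(audit_status "$cluster")"
  done
}
```

Calling sweep_audit once per region gives the per-cluster scope before any change is made; it only reads configuration and modifies nothing.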

How clusters and search domains record activity (deep dive)

EKS exposes five independently toggleable control-plane log types: api, audit, authenticator, controllerManager and scheduler. EKS.8 evaluates only audit (the Kubernetes audit log of every API request, the authenticated identity, the verb, the resource and the response), and its Config rule eks-cluster-log-enabled is parameterised with logTypes: audit. When enabled, EKS ships these records to a CloudWatch Logs group named /aws/eks/<cluster-name>/cluster. GuardDuty.5 is distinct: it is the EKS_AUDIT_LOGS detector feature that reads the Kubernetes audit stream directly from the control plane (agentless, no node software, and no requirement to enable CloudWatch export first) and raises Kubernetes/* threat findings. One gives you the durable log in your account; the other is a managed detector.
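
Because the detector feature is separate from the CloudWatch export, it is worth checking on its own. The following is a sketch under the assumption that the account has at most one GuardDuty detector in the current region; eks_audit_feature_status is a hypothetical helper name.

```shell
# Hypothetical helper: print the Status of the EKS_AUDIT_LOGS feature
# (for example ENABLED or DISABLED) on the first detector in this region,
# or NO_DETECTOR when GuardDuty has no detector here.
eks_audit_feature_status() {
  detector=$(aws guardduty list-detectors \
    --query 'DetectorIds[0]' --output text)
  if [ -z "$detector" ] || [ "$detector" = "None" ]; then
    echo "NO_DETECTOR"
    return 0
  fi
  aws guardduty get-detector --detector-id "$detector" \
    --query "Features[?Name=='EKS_AUDIT_LOGS'].Status" --output text
}
```

A cluster can pass EKS.8 while this prints DISABLED, and the reverse, which is why the two controls are evaluated independently.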

Elasticsearch domains configure logging through the LogPublishingOptions map, keyed by log type: ES_APPLICATION_LOGS (error logs, the ES.4 control), the slow-log types, and AUDIT_LOGS (the security trail, the ES.5 control). ES.4 fails when ES_APPLICATION_LOGS is absent or disabled; ES.5 fails when AUDIT_LOGS is. The hard dependency is that audit logs require fine-grained access control, which in turn requires node-to-node encryption, encryption at rest, and HTTPS enforcement, so enabling AUDIT_LOGS on a domain without FGAC is rejected. The error log has no such prerequisite, which is why ES.4 is usually the quicker fix.
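
The dependency described above can be checked before any change is attempted. This is a minimal sketch, assuming the legacy aws es command set applies to the domain; domain_logging_status is a hypothetical helper, and the True/False/None values are how the CLI renders booleans and missing keys with text output.

```shell
# Hypothetical helper: show error-log, audit-log and FGAC state for a domain,
# and warn when AUDIT_LOGS would be rejected for lack of FGAC.
domain_logging_status() {
  aws es describe-elasticsearch-domain-config --domain-name "$1" \
    --query 'DomainConfig.[LogPublishingOptions.Options.ES_APPLICATION_LOGS.Enabled, LogPublishingOptions.Options.AUDIT_LOGS.Enabled, AdvancedSecurityOptions.Options.Enabled]' \
    --output text | {
      # Text output is one tab-separated line: error logs, audit logs, FGAC.
      read -r error_logs audit_logs fgac
      echo "error_logs=$error_logs audit_logs=$audit_logs fgac=$fgac"
      if [ "$audit_logs" != "True" ] && [ "$fgac" != "True" ]; then
        echo "enable fine-grained access control before AUDIT_LOGS"
      fi
    }
}
```

A result of error_logs=None means the ES_APPLICATION_LOGS key is absent from the map, which fails ES.4 just as a disabled entry does.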

Both layers have a permission wrinkle and an evaluation lag. Elasticsearch writes as the es.amazonaws.com service principal and needs a CloudWatch Logs resource policy on the target log group (not an IAM role on the domain); skip it and the domain reconfigures cleanly while no events arrive. In a GuardDuty organization, only the delegated administrator can enable EKS_AUDIT_LOGS and the cleanest pattern is auto-enable so new members inherit it, with the notorious edge case that a suspended member lacking the feature keeps GuardDuty.5 red until it is disassociated. Security Hub re-evaluates on a periodic or change-triggered cycle, so a fix can lag the change by a short window.
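
The resource policy is the piece most often missed, so a sketch may help. It assumes the log group already exists and that its ARN is passed without a trailing ":*"; the helper names and the policy name used in the usage note are illustrative.

```shell
# Hypothetical helper: print a CloudWatch Logs resource policy that lets the
# Elasticsearch service principal create streams and write events in one
# log group. $1 is the log group ARN without a trailing ":*".
es_log_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "es.amazonaws.com"},
    "Action": ["logs:PutLogEvents", "logs:CreateLogStream"],
    "Resource": "$1:*"
  }]
}
EOF
}

# Hypothetical helper: attach that policy under the given policy name.
# Usage: grant_es_logging <policy-name> <log-group-arn>
grant_es_logging() {
  aws logs put-resource-policy --policy-name "$1" \
    --policy-document "$(es_log_policy "$2")"
}
```

For example, grant_es_logging es-log-publishing "$LOG_GROUP_ARN", where both the policy name and the variable are placeholders. The policy lives on CloudWatch Logs rather than on the domain, which is why a domain update can succeed while nothing is delivered.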

What is the impact of leaving these surfaces unmonitored?

The primary impact is investigative blindness on the highest-value targets. The Kubernetes API brokers every meaningful action in a cluster, and escaping one container can mean owning all of them; a search domain holds the most queryable copy of sensitive data. With audit logging off, a security incident on either has no trail: you cannot determine which identity escalated privileges, read a secret, or ran a bulk export, because the events were never recorded. For GuardDuty.5 specifically, the events are being logged by the control plane and analysed by no one, so the precursors to a breach pass by unseen.

The second impact is operational. Elasticsearch error logs are where the domain records circuit-breaker trips, shard allocation failures, and mapping conflicts; without them in CloudWatch, an engineer responding to a degraded search domain restarts blind and stretches a short incident into a long one. Cluster and search audit logs are routinely the first place teams look during the next reliability incident too, not just the next security one.

On the compliance side, EKS.8, ES.4 and ES.5 map to the NIST 800-53 audit family (AU-2, AU-3, AU-12) and to PCI DSS requirement 10.2.1. GuardDuty.5 is a High-severity control, so a persistent failure drags down the overall security score and surfaces in audits and customer security questionnaires. In a GuardDuty organization there is also no partial credit: the control only clears when the delegated administrator and every active member have the feature on, so covering 28 of 30 accounts reads the same as covering none.
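
Since one uncovered member fails the control for everyone, it helps to list exactly which members are missing the feature. This is a sketch, assuming it runs as the GuardDuty delegated administrator and that the member account IDs are supplied by the caller in a batch small enough for one API call; both helper names are hypothetical, and the delegated administrator's own detector still needs a separate check.

```shell
# Hypothetical helper: print member account IDs whose EKS_AUDIT_LOGS feature
# is not ENABLED.
# Usage: members_missing_eks_audit <detector-id> <account-id>...
members_missing_eks_audit() {
  detector=$1
  shift
  aws guardduty get-member-detectors --detector-id "$detector" \
    --account-ids "$@" \
    --query "MemberDataSourceConfigurations[?Features[?Name=='EKS_AUDIT_LOGS' && Status!='ENABLED']].AccountId" \
    --output text
}

# Hypothetical helper: score coverage all-or-nothing, as the control does.
coverage_verdict() {
  missing=$(members_missing_eks_audit "$@")
  if [ -z "$missing" ]; then
    echo "PASS"
  else
    echo "FAIL: $(echo $missing)"
  fi
}
```

The verdict is deliberately binary: any account ID in the output is enough to keep the finding open.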

How do you enable cluster and search audit logging safely?

Work the capability as one loop rather than chasing individual findings. The order matters: confirm prerequisites and scope before flipping switches, and set retention before logs start piling up.

1. Inventory which clusters and domains record and which are watched

Across every region and account, check which EKS clusters have the audit log type enabled, whether GuardDuty EKS Audit Log Monitoring is on at the delegated administrator, and which Elasticsearch domains publish error and audit logs. EKS.8 and the GuardDuty control are change-triggered or periodic, so a cluster or domain created before the standard was enabled can fail without anyone noticing until someone touches it. Produce a one-line pass/fail per resource and rank by data sensitivity.

2. Enable the EKS audit log type and turn on the detector

Enable the audit log type with update-cluster-config; it is a non-disruptive control-plane change that does not restart the API server or evict pods, and EKS creates the /aws/eks/<cluster-name>/cluster log group on first enable. Separately, enable EKS_AUDIT_LOGS on the GuardDuty detector (the delegated administrator in an organization) and auto-enable it for all members so new accounts inherit it. If GuardDuty.5 stays red, disassociate any suspended member that lacks the feature.

3. Confirm FGAC, then enable the Elasticsearch error and audit logs

Error logging (ES.4) has no prerequisite: prepare a CloudWatch Logs group with the es.amazonaws.com resource policy and set ES_APPLICATION_LOGS. Audit logging (ES.5) requires fine-grained access control first; if AdvancedSecurityOptions is disabled, enabling it triggers a blue/green deployment, so schedule a window before setting AUDIT_LOGS. After each change, run a test request and confirm events actually land in the log group, because a clean reconfigure with a missing resource policy produces zero events silently.

4. Cap retention and prevent recurrence

Set a retention policy on every CloudWatch log group; the audit stream is the highest-volume EKS log type and the default is never-expire. Match the window to the strictest compliance obligation in scope. Then bake the audit log type, FGAC, the log publishing options, GuardDuty auto-enable, and retention into the provisioning template and Config rules (eks-cluster-log-enabled, opensearch-audit-logging-enabled) so new clusters and domains arrive compliant and the controls stay green by construction.
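
Step 4 can be swept across existing log groups in one pass. This is a minimal sketch, assuming every EKS control-plane log group sits under the /aws/eks/ prefix in the current region; the helper names are illustrative, and the retention value must be one that CloudWatch Logs accepts (90 is used in this lesson).

```shell
# Hypothetical helper: list EKS log groups that still use the never-expire
# default, that is, groups with no retentionInDays set.
eks_groups_without_retention() {
  aws logs describe-log-groups --log-group-name-prefix /aws/eks/ \
    --query 'logGroups[?!retentionInDays].logGroupName' --output text
}

# Hypothetical helper: apply a retention window, in days, to each of them.
# Usage: cap_eks_retention <days>
cap_eks_retention() {
  for group in $(eks_groups_without_retention); do
    aws logs put-retention-policy --log-group-name "$group" \
      --retention-in-days "$1"
    echo "set $group to $1 days"
  done
}
```

Running eks_groups_without_retention alone first shows what would change, which keeps the retention cap as deliberate as the logging change itself.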

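# Enable the EKS audit log type (non-disruptive), then bound the cost with retention.
aws eks update-cluster-config \
  --name prod-platform \
  --logging '{"clusterLogging":[{"types":["audit"],"enabled":true}]}'

# The update is asynchronous. Wait for the cluster to return to ACTIVE before
# setting retention, since the log group may not exist until the update completes.
aws eks wait cluster-active --name prod-platform

aws logs put-retention-policy \
  --log-group-name /aws/eks/prod-platform/cluster \
  --retention-in-days 90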

# Turn on GuardDuty EKS Audit Log Monitoring and auto-enable for the whole org.
DETECTOR=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)
aws guardduty update-detector --detector-id "$DETECTOR" \
  --features '[{"Name":"EKS_AUDIT_LOGS","Status":"ENABLED"}]'
aws guardduty update-organization-configuration --detector-id "$DETECTOR" \
  --features '[{"Name":"EKS_AUDIT_LOGS","AutoEnable":"ALL"}]'
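
Step 3 for the Elasticsearch error log can be sketched the same way. It assumes the log group and its es.amazonaws.com resource policy are already in place and that the legacy aws es command set applies to the domain; the helper names are hypothetical. The second helper treats the presence of a log stream as a rough signal that delivery works, not as proof that every event arrives.

```shell
# Hypothetical helper: publish error logs (ES.4) to an existing log group.
# Usage: enable_es_error_logs <domain-name> <log-group-arn>
enable_es_error_logs() {
  aws es update-elasticsearch-domain-config --domain-name "$1" \
    --log-publishing-options \
    "ES_APPLICATION_LOGS={CloudWatchLogsLogGroupArn=$2,Enabled=true}"
}

# Hypothetical helper: succeed only if the log group holds at least one
# log stream. Usage: es_events_arriving <log-group-name>
es_events_arriving() {
  first=$(aws logs describe-log-streams --log-group-name "$1" \
    --max-items 1 --query 'logStreams[0].logStreamName' --output text)
  [ -n "$first" ] && [ "$first" != "None" ]
}
```

Running es_events_arriving after a test request catches the silent case where the domain reconfigures cleanly but the resource policy is missing. AUDIT_LOGS follows the same pattern once fine-grained access control is confirmed.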

You can now treat cluster and search audit logging as one capability rather than a scatter of findings: inventory which clusters and domains record and which are watched, enable the EKS audit log type and the GuardDuty detector, confirm the FGAC prerequisite before turning on Elasticsearch audit logs, and cap retention so the cost stays bounded. The Controls this lesson covers section below links every control in this group to its deep page and fix.

Controls this lesson covers

One capability, many AWS Security Hub controls. This lesson is the shared playbook; each control below keeps its own deep page with the exact check, severity and a copy-and-paste fix.