
Manage KMS encryption keys

One capability across rotation, deletion protection, key-policy scope and decrypt permissions: keep the KMS keys that protect everything you encrypt rotating, recoverable, private and reachable only by the principals that genuinely need them.



Managing KMS keys: the basics

What does it actually mean to manage a key well across its whole lifecycle?

AWS KMS keys are the root of trust for almost everything you encrypt: S3 objects, EBS volumes, RDS databases, Secrets Manager secrets, CloudTrail logs, Lambda environment variables. A key is not a one-time setting you switch on and forget. It has a lifecycle, and Security Hub turns each part of that lifecycle into its own control. KMS.4 checks that the backing material rotates on a schedule. KMS.3 checks that no key is sitting in a pending-deletion countdown. KMS.5 checks that no key policy grants a wildcard principal. KMS.1 and KMS.2 check that customer managed IAM policies, and inline IAM policies, do not allow the decrypt-family actions against every key. The estate can fail several of these at once, but they are one capability: keep your keys healthy.

Two more controls in this group sit one layer out from KMS itself. CloudTrail.2 and CloudTrail.10 check that your audit trail and CloudTrail Lake event data store are encrypted with a customer managed KMS key rather than the default service-managed key, so that reading the audit history requires a separate kms:Decrypt you control and audit. IAM.3 checks that long-lived IAM access keys are rotated within 90 days. Different services, same underlying discipline: the keys and key-shaped credentials that gate your data should rotate, stay recoverable, stay private, and be reachable only by named principals.

The unifying idea is that a key compromise or a key mistake has an outsized blast radius, because a key sits in front of so much data. A key that never rotates means one stolen backing key is valid forever. A key scheduled for deletion takes every byte it ever encrypted with it. A key policy with a wildcard principal hands any AWS account the power to encrypt and decrypt. A decrypt-on-all IAM policy turns one stolen credential into a master pass. Managing keys well is about closing each of these doors before it becomes the door an attacker walks through.

In this lesson you will learn how AWS expresses key health across rotation, deletion protection, key policies, IAM decrypt permissions and audit-log encryption, how to inventory the keys in an account and read their state, and how to remediate each failure without locking yourself out or breaking a workload. The Controls this lesson covers section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.

Fun fact

The Capital One key that wasn't

The 2019 Capital One breach is remembered as a server-side request forgery story, but the second act was KMS. The compromised role held kms:Decrypt permissions broad enough to decrypt the S3 object-encryption keys protecting more than a hundred million customer records, including SSNs and bank account numbers. Tighter IAM scoping on the decrypt action would not have stopped the initial intrusion, but it would have stopped the data exfiltration cold. The attacker pulled ciphertext and got AWS to decrypt it for them, using credentials that should never have had that breadth. It is the textbook case for why decrypt should be scoped to specific keys, not granted on every key in the account.

Auditing key health across an account

Devon runs cloud security at a healthcare SaaS company. Security Hub shows a scatter of key findings: a few KMS.4 rotation failures, one KMS.5 publicly accessible key, and a KMS.3 finding where a decommissioning role has just scheduled a production key for deletion.

Rather than work the findings one at a time, he starts by listing every customer managed key with the three facts that decide its health: who manages it, whether rotation is on, and what state it is in. The deletion countdown is the one with a clock running, so he triages that first.

List every customer managed key with its rotation status and key state. The PendingDeletion row is the one with a clock running.

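$ for k in $(aws kms list-keys --query 'Keys[].KeyId' --output text); do
    read -r mgr state <<<"$(aws kms describe-key --key-id "$k" --query 'KeyMetadata.[KeyManager,KeyState]' --output text)"
    [ "$mgr" = "CUSTOMER" ] || continue   # list-keys also returns AWS managed keys; skip them
    rot=$(aws kms get-key-rotation-status --key-id "$k" --query KeyRotationEnabled --output text 2>/dev/null || echo "n/a")   # n/a: key type has no rotation status
    echo "$k $state $rot"
  done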
1234abcd-... Enabled True
2345bcde-... Enabled False
3456cdef-... PendingDeletion False
4567defa-... Enabled True
# One key in PendingDeletion (cancel now) and one with rotation off (KMS.4).

Key state and rotation status in one pass. PendingDeletion is the irreversible clock; cancel it before anything else.

How AWS evaluates key health (deep dive)

The KMS controls resolve to a few distinct mechanisms. Rotation (KMS.4) is a property on the key: when on, KMS generates fresh backing material on a schedule, 365 days by default and configurable from 90 to 2,560 days, while retaining every previous backing key so existing ciphertext decrypts transparently with no re-encryption. Deletion (KMS.3) is a key state: ScheduleKeyDeletion moves a key to PendingDeletion with a 7-to-30-day waiting period, the only window in which CancelKeyDeletion can save it. The access controls are policy documents: KMS.5 inspects the key policy for a wildcard principal, while KMS.1 and KMS.2 inspect customer managed and inline IAM policies for the decrypt-family actions against a wildcard resource.
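The two numeric windows above are easy to get wrong in automation, so here is a minimal sketch of range checks you might run before calling KMS. The function names are hypothetical; the limits are the ones stated above, and the sketch assumes a recent AWS CLI in which enable-key-rotation accepts --rotation-period-in-days.

```shell
# Hypothetical helpers: validate the two numeric windows before calling KMS.

# Rotation period must fall within 90..2560 days (365 is the default).
valid_rotation_period() {
  case "$1" in ''|*[!0-9]*) return 1 ;; esac   # reject empty or non-integer input
  [ "$1" -ge 90 ] && [ "$1" -le 2560 ]
}

# Deletion waiting period must fall within 7..30 days.
valid_deletion_window() {
  case "$1" in ''|*[!0-9]*) return 1 ;; esac
  [ "$1" -ge 7 ] && [ "$1" -le 30 ]
}

# Enable rotation with a custom period, refusing out-of-range values.
set_rotation() {   # usage: set_rotation <key-id> <days>
  if ! valid_rotation_period "$2"; then
    echo "rotation period must be 90-2560 days, got: $2" >&2
    return 1
  fi
  aws kms enable-key-rotation --key-id "$1" --rotation-period-in-days "$2"
}
```

For example, set_rotation 2345bcde-... 180 would ask for a 180-day schedule, while set_rotation 2345bcde-... 30 is rejected locally before any API call is made.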

The KMS authorisation model is key-policy-first: every call is evaluated against the key policy on the resource, and an IAM policy on the principal counts only where the key policy delegates to IAM, typically through the account-root statement. That is why a wildcard decrypt IAM policy is bounded by what key policies permit, and also why a wildcard principal in a key policy is so dangerous: the key policy grants access directly, with no IAM policy in your account standing in the way. The actions KMS.1 and KMS.2 check for are kms:Decrypt and kms:ReEncryptFrom, the two that read existing ciphertext; kms:GenerateDataKey and kms:GenerateDataKeyWithoutPlaintext mint new data keys and deserve the same key-level scoping.
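To make the wildcard-principal check concrete, the sketch below does a first-pass text scan of a key policy. It is a deliberately coarse heuristic and not how Security Hub evaluates KMS.5: it ignores Effect and Condition, so it also flags Deny statements and statements that a condition already narrows. The function name is hypothetical, and the result should be treated as a shortlist for manual review.

```shell
# Hypothetical first-pass check: does the key policy mention a wildcard principal?
# Coarse heuristic only: ignores Effect and Condition, so review every hit by hand.
has_wildcard_principal() {   # usage: has_wildcard_principal <key-id>
  aws kms get-key-policy --key-id "$1" --policy-name default \
      --query Policy --output text |
    tr -d '[:space:]' |                                   # normalise formatting
    grep -Eq '"Principal":"\*"|"AWS":(\[[^]]*)?"\*"'      # "*" alone or inside an AWS list
}
```

A loop such as `for k in $(aws kms list-keys --query 'Keys[].KeyId' --output text); do has_wildcard_principal "$k" && echo "$k: review key policy"; done` would print the candidates.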

Most of these controls are evaluated by AWS Config on a periodic cycle, so a fix does not flip the finding to PASSED instantly; the control plane change is immediate but the report catches up on the next evaluation. The exception worth knowing is KMS.3, which is change-triggered, so a scheduled deletion surfaces within minutes, which is exactly the window in which you can still cancel it. CloudTrail.2 and CloudTrail.10 extend the same key-management discipline to the audit trail itself, and IAM.3 applies it to long-lived access keys, which never expire on their own and must be rotated by policy.

What is the impact of poorly managed keys?

The blast radius is the whole point. A key sits in front of so much data that a single key problem cascades. A never-rotated key means one stolen backing key decrypts the entire history of data under it rather than a bounded window. A deletion countdown that runs to completion means permanent, total loss of everything the key encrypted, including any backups encrypted under the same key. A wildcard key policy lets any AWS account encrypt and decrypt your data. A decrypt-on-all IAM policy turns one compromised credential into access to every key the key policies permit.

There is a second-order integrity risk too. If an outsider can encrypt with your key, they can plant ciphertext into a pipeline that later decrypts and trusts anything encrypted under that key. The wildcard turns an integrity assumption into a vulnerability, which is why the fix is always to name principals and scope keys explicitly rather than trusting the key boundary alone.

On the compliance side, every modern framework (CIS, PCI DSS, NIST 800-53, SOC 2 and ISO 27001) expects evidence that encryption keys are rotated, protected and scoped, and that audit logs are encrypted with a customer-controlled key. A passing set of KMS and CloudTrail key controls across every account is among the cheapest and most defensible artefacts you can hand an auditor. A persistent failure on a free, well-known control reads as weak hygiene and raises questions about which other basics have been skipped.

How do you manage keys safely?

Work the capability as one loop, ordered by reversibility. Cancel the irreversible thing first (a scheduled deletion), then close the open doors, then ratchet the whole estate shut so the findings cannot recur.

1. Stop any irreversible clock first

If any key is in PendingDeletion (KMS.3), nothing else matters until the clock is stopped. Confirm via CloudTrail who scheduled it and whether the key is still in use, then run cancel-key-deletion. Cancelling returns the key to Disabled, not Enabled, so follow with enable-key and confirm dependent services can decrypt again. When you do retire a key on purpose, disable it first and observe before scheduling deletion with the full 30-day window.
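As a sketch of the "confirm via CloudTrail who scheduled it" step, the wrapper below looks up recent ScheduleKeyDeletion events. The function name is hypothetical, and the query assumes the event lists the affected key as its first resource, which is worth verifying against a real event in your account.

```shell
# Hypothetical helper: list recent ScheduleKeyDeletion events from CloudTrail
# event history, printing time, caller and (assumed) first resource per event.
who_scheduled_deletion() {
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=ScheduleKeyDeletion \
    --query 'Events[].[EventTime,Username,Resources[0].ResourceName]' \
    --output text
}
```

Running who_scheduled_deletion in the key's Region should show which principal started the countdown, which tells you whether the cancellation is undoing a mistake or interrupting a planned decommission.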

2. Close the open doors: public keys and broad decrypt

For a publicly accessible key (KMS.5), rewrite the wildcard principal to named account or role ARNs and keep the root administrative statement intact; never simply delete the statement, or you risk locking yourself out of the key. For broad decrypt (KMS.1, KMS.2), read CloudTrail to learn which keys a principal actually uses, then replace Resource: "*" with specific key ARNs or narrow with fixed-value conditions such as kms:ViaService or kms:CallerAccount.
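The broad-decrypt fix can be sketched as a small generator for a scoped policy document. The function name, the Sid and the example values are hypothetical; the sketch assumes the principal needs decrypt on one key through one service, and it combines a specific key ARN with a kms:ViaService condition rather than relying on the condition alone.

```shell
# Hypothetical generator: emit an IAM policy that allows decrypt on ONE named key,
# and only when the call arrives via one AWS service.
render_scoped_decrypt_policy() {   # usage: render_scoped_decrypt_policy <key-arn> <via-service>
  case "$1" in
    arn:*:kms:*:key/*) ;;                                  # looks like a key ARN
    *) echo "expected a specific KMS key ARN, got: $1" >&2; return 1 ;;
  esac
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DecryptOnNamedKeyOnly",
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:ReEncryptFrom"],
      "Resource": "$1",
      "Condition": {
        "StringEquals": { "kms:ViaService": "$2" }
      }
    }
  ]
}
EOF
}
```

For instance, render_scoped_decrypt_policy arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE s3.eu-west-1.amazonaws.com > scoped-decrypt.json writes a document you can review before attaching it in place of the wildcard policy.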

3. Turn rotation on and encrypt the audit trail

Enable rotation (KMS.4) on every eligible customer managed symmetric key; it is non-destructive, requires no re-encryption, and the key ID, aliases and policies are unchanged. Asymmetric, HMAC, imported-material and custom-key-store keys cannot auto-rotate and need a documented suppression. Point CloudTrail and the CloudTrail Lake event data store at a dedicated customer managed key (CloudTrail.2, CloudTrail.10), and rotate or retire long-lived IAM access keys (IAM.3) toward short-lived role credentials.
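For the audit-trail half of this step, a minimal sketch might look like the following. The function names are hypothetical, and the sketch assumes the key policy of the chosen key already lets CloudTrail use it; without that, the update call is expected to fail.

```shell
# Hypothetical helpers for CloudTrail.2.

# Print the names of trails that have no KMS key configured.
trails_without_cmk() {
  aws cloudtrail describe-trails \
    --query 'trailList[?!KmsKeyId].Name' --output text
}

# Point one trail at a customer managed key (ARN, key ID or alias).
encrypt_trail() {   # usage: encrypt_trail <trail-name> <key-arn-or-alias>
  aws cloudtrail update-trail --name "$1" --kms-key-id "$2"
}
```

A typical pass would run trails_without_cmk, then call encrypt_trail for each name it prints, using a key dedicated to audit logs rather than one shared with application data.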

4. Ratchet it shut with guardrails

Make each fix a default rather than a one-time cleanup. Deploy the backing AWS Config rules org-wide, bake EnableKeyRotation and customer managed key encryption into the Terraform and CloudFormation modules teams use, and add Service Control Policies that deny kms:ScheduleKeyDeletion on production-tagged keys. SCPs cannot inspect the contents of a policy document, so reject inline policies that reintroduce a wildcard decrypt or a wildcard principal with policy checks in the pipeline that merges them. The misconfiguration should be impossible to merge, not merely detectable after the fact.
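The deletion guardrail can be sketched as a generated Service Control Policy. The function name, the Sid and the tag key and value are hypothetical; the sketch assumes production keys carry a consistent tag and that nobody in the member accounts should schedule their deletion without an exception process.

```shell
# Hypothetical generator: emit an SCP that denies kms:ScheduleKeyDeletion
# on any key carrying the given tag (tag key and value are assumptions).
render_deletion_guard_scp() {   # usage: render_deletion_guard_scp <tag-key> <tag-value>
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDeletionOfTaggedKeys",
      "Effect": "Deny",
      "Action": "kms:ScheduleKeyDeletion",
      "Resource": "*",
      "Condition": {
        "StringEquals": { "aws:ResourceTag/$1": "$2" }
      }
    }
  ]
}
EOF
}
```

For example, render_deletion_guard_scp Environment production > scp.json produces a document that can be reviewed and then registered with aws organizations create-policy --type SERVICE_CONTROL_POLICY, passing the file through --content together with a name and description.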

# Stop the irreversible clock first: cancel any scheduled deletion, then re-enable.
for k in $(aws kms list-keys --query 'Keys[].KeyId' --output text); do
  state=$(aws kms describe-key --key-id "$k" \
    --query 'KeyMetadata.KeyState' --output text)
  if [ "$state" = "PendingDeletion" ]; then
    aws kms cancel-key-deletion --key-id "$k"
    aws kms enable-key --key-id "$k"   # cancel leaves it Disabled
    echo "$k: deletion cancelled and re-enabled"
  fi
done

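# Turn rotation on for eligible customer managed symmetric keys.
# Imported-material and custom-key-store keys (Origin other than AWS_KMS)
# cannot auto-rotate, so skip them instead of letting the call fail.
for k in $(aws kms list-keys --query 'Keys[].KeyId' --output text); do
  read -r mgr spec origin <<<"$(aws kms describe-key --key-id "$k" \
    --query 'KeyMetadata.[KeyManager,KeySpec,Origin]' --output text)"
  if [ "$mgr" = "CUSTOMER" ] && [ "$spec" = "SYMMETRIC_DEFAULT" ] \
     && [ "$origin" = "AWS_KMS" ]; then
    aws kms enable-key-rotation --key-id "$k"
  fi
done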


You can now treat KMS keys as one capability rather than a scatter of findings: stop any deletion countdown first, close public key policies and broad decrypt permissions by naming principals and scoping keys, turn rotation on and encrypt the audit trail, then ratchet the estate shut with Config rules, infrastructure-as-code defaults and Service Control Policies. The Controls this lesson covers section below links every control in this group to its deep page and fix.


Controls this lesson covers

One capability, many AWS Security Hub controls. This lesson is the shared playbook; each control below keeps its own deep page with the exact check, severity and a copy-and-paste fix.