Encrypt other services at rest (queues, streams, logs, ML)

One capability across the long tail of stateful services (SQS, Kinesis, EMR, OpenSearch, AWS Backup, API Gateway caches, Glue ML, CodeBuild reports and load-balancer backends), where data lands on disk or crosses a backend hop and needs to be encrypted rather than left in the clear.

14 min·10 sections·AWS

Encrypting the long tail at rest: the basics

Why a dozen unrelated-looking services share one finding

Beyond the obvious databases and disks, a long tail of AWS services quietly writes data to disk or moves it across a backend hop, and each has its own at-rest control. SQS.1 covers message queues, Kinesis.1 covers data streams, EMR.3 covers the big-data fleet's scratch and EMRFS storage, ES.1 covers OpenSearch domain storage, Backup.1 covers AWS Backup recovery points, APIGateway.5 covers the per-stage REST API cache, Glue.3 covers a machine-learning transform's learned state, and CodeBuild.7 covers test-report exports. Two more in this group, ELB.21 and ELB.22, are about the load-balancer-to-target hop rather than storage, and ES.3 covers traffic between OpenSearch nodes. Different services, one theme: data that lands somewhere readable unless you turn encryption on.

They look like a scatter of unrelated findings, but they are one capability: prove that the queues, streams, caches, logs, recovery points and ML state across the estate are encrypted under a key you can govern, and that the backend hops these services rely on are not silently downgraded to plaintext. The bytes are protected with AES-256 envelope encryption keyed by KMS; the value is the key boundary, which lets you authorise, log and revoke access without touching the workload. Message bodies, stream records, cached responses and recovery points routinely carry PII, tokens and payment data that you would never knowingly leave readable on disk.

What makes this group easy in most cases is that the fix is a flip, not a migration: SQS, Kinesis, API Gateway caching, CodeBuild reports and Glue transforms enable encryption with a single attribute change and no downtime. Two members behave like the immutable database engines and need a recreate-and-migrate: OpenSearch (ES.1) and the snapshot-inheriting recovery points behind Backup.1, where an unencrypted source produces unencrypted backups. Knowing which member is a flip and which is a migration is half the work.

In this lesson you will learn how encryption at rest is expressed across the long tail of stateful AWS services, which members are a free single-flip fix and which two behave like immutable databases and need a recreate-and-migrate, the KMS request-cost trap on high-throughput SQS and Kinesis and the data-key reuse window that defuses it, and how to stop new unencrypted resources from ever appearing. The Controls this lesson covers section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.

Fun fact

The default that arrived in 2023, for new resources only

In June 2023 AWS quietly switched SQS so that new queues are created with server-side encryption enabled by default. The catch: only new queues. Every queue created before that date, and there are hundreds of millions across AWS, kept its original setting. Most teams assume AWS encrypts everything by default these days, run a Security Hub scan, and discover their oldest, most business-critical, most data-bearing queues are the exact ones still running unencrypted. The same long-tail pattern repeats across this whole group: Kinesis streams, OpenSearch domains and EMR clusters spun up before encryption was a default habit are precisely the ones the report flags, because the resource that predates the default never gets revisited until something forces it.

Finding unencrypted stateful services across an estate

Marco runs platform engineering at a fintech. A quarterly Security Hub scan fires findings across SQS, Kinesis and OpenSearch, mostly resources created before the team's encryption-by-default policy, with the SOC 2 evidence package due Friday.

Rather than blindly flip everything to a customer-managed key (KMS charges per request, and the payment-events queue alone handles 40 million messages a day), he starts by listing which resources are unencrypted. That lets him sort the free SSE flips from the high-throughput queues that need a tuned reuse window and the OpenSearch domains that need a migration.

Start with the queues. A queue with neither encryption attribute set is the SQS.1 failure; the high-throughput ones are the cost-sensitive ones.

$ for q in $(aws sqs list-queues --query 'QueueUrls[]' --output text); do echo "$q"; aws sqs get-queue-attributes --queue-url "$q" --attribute-names KmsMasterKeyId SqsManagedSseEnabled --query 'Attributes' --output json; done
.../payment-events
null
.../order-pipeline
{ "SqsManagedSseEnabled": "true" }
# payment-events is unencrypted and carries cardholder data: SSE-KMS with a tuned reuse window.

Inventory first: free SSE-SQS for the bulk, customer-managed keys only where regulated data justifies the audit trail and cost.
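The same inventory pass extends to streams and search domains. The following is a minimal sketch, assuming the AWS CLI is configured for the target account and region; the function names are illustrative and not part of the lesson.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print one line per resource with its at-rest encryption state.

# Kinesis: EncryptionType is NONE or KMS on each stream summary.
list_stream_encryption() {
  local s
  for s in $(aws kinesis list-streams --query 'StreamNames[]' --output text); do
    printf '%s\t%s\n' "$s" "$(aws kinesis describe-stream-summary --stream-name "$s" \
      --query 'StreamDescriptionSummary.EncryptionType' --output text)"
  done
}

# OpenSearch: EncryptionAtRestOptions.Enabled is a boolean on each domain.
list_domain_encryption() {
  local d
  for d in $(aws opensearch list-domain-names --query 'DomainNames[].DomainName' --output text); do
    printf '%s\t%s\n' "$d" "$(aws opensearch describe-domain --domain-name "$d" \
      --query 'DomainStatus.EncryptionAtRestOptions.Enabled' --output text)"
  done
}

# Usage: list_stream_encryption | awk '$2 == "NONE"'
```

Printing one tab-separated line per resource keeps the output easy to filter and to paste into an evidence spreadsheet.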

How these services encrypt at rest: deep dive

Each service uses AES-256 envelope encryption keyed by KMS, but the shape of the setting differs. SQS and Kinesis encrypt each message or record under a data key that is itself wrapped by a KMS key, with a choice between AWS-managed encryption (SSE-SQS, or the aws/kinesis key) and a customer-managed key that adds CloudTrail visibility and key control. EMR.3 is governed by a reusable EMR security configuration, the one template that decides at-rest encryption (EMRFS S3, local-disk LUKS and HDFS) for a whole fleet of clusters. API Gateway uses a per-method cacheDataEncrypted flag on a cache-enabled stage; Glue.3 encrypts a transform's learned state; CodeBuild.7 encrypts report-group exports to S3; Backup.1 reads the IsEncrypted flag on each recovery point. In every case, revoking access to the KMS key makes the data unreadable, which is the property the controls are really about.
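As one illustration of how differently the setting is shaped, the API Gateway flag can be read from the stage description. This is a sketch, assuming the stage uses the stage-wide "*/*" method setting; the function name and the example IDs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show whether a stage's cache cluster is on and whether
# cached data is encrypted. Per-method overrides live under their own keys in
# methodSettings and would need checking separately.
stage_cache_encryption() {
  local api_id="$1" stage="$2"
  aws apigateway get-stage --rest-api-id "$api_id" --stage-name "$stage" \
    --query '[cacheClusterEnabled, methodSettings."*/*".cacheDataEncrypted]' --output text
}

# Usage: stage_cache_encryption a1b2c3d4e5 prod
```

A stage with the cache enabled and the second column not true is the likely APIGateway.5 candidate.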

Two behaviours catch teams out. First, encryption is forward-only on streams and queues: Kinesis StartStreamEncryption only encrypts records written after the call, so existing records stay plaintext until they age out of the retention window (a 7-day stream is not fully encrypted until a week later). SQS behaves the same way: enabling encryption on a queue does not encrypt the backlog, so messages already in the queue stay unencrypted until they are consumed or expire. Second, AWS Backup recovery points inherit the source's encryption state for EBS and RDS, so a KMS-protected vault is necessary but not sufficient: an unencrypted source volume produces an unencrypted recovery point regardless of the vault key, which is why Backup.1 can fail even on a perfectly configured vault.
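The forward-only window is simple arithmetic: the stream is fully encrypted once the last pre-encryption record has aged out. A small sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: epoch second at which the last record written before
# encryption was enabled falls out of the retention window.
#   $1 = epoch second when encryption was enabled
#   $2 = stream retention in hours
fully_encrypted_epoch() {
  local enabled_epoch="$1" retention_hours="$2"
  echo $(( enabled_epoch + retention_hours * 3600 ))
}

# The retention value can be read with:
#   aws kinesis describe-stream-summary --stream-name <name> \
#     --query 'StreamDescriptionSummary.RetentionPeriodHours' --output text
# Usage (GNU date): date -u -d "@$(fully_encrypted_epoch "$(date +%s)" 168)"
```

Recording that date alongside the change gives the auditor a concrete point after which the stream holds no plaintext records.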

The KMS request-cost trap lives with SSE-KMS at scale. Naively, every send and receive would trigger a KMS call, so the services cache the data key instead: SQS reuses it for KmsDataKeyReusePeriodSeconds (default five minutes, configurable from 60 seconds up to 24 hours), and Kinesis rotates its data key on a fixed schedule of roughly five minutes. Within that window producers and consumers reuse the cached key with no KMS round-trip. A 40-million-message-a-day queue with a five-minute window makes on the order of 288 KMS calls a day for each sending or receiving principal instead of 40 million, the difference between a few cents and thousands of dollars over a month. With a customer-managed key, both producers and consumers also need kms:GenerateDataKey and kms:Decrypt on the key, in addition to their service permissions; forgetting that is the most common cause of the "queue worked yesterday" failure after enabling SSE-KMS. Three controls in this group sit slightly apart: ELB.21 and ELB.22 cover the load-balancer-to-target transport and health-check hop, and ES.3 covers traffic between OpenSearch nodes. They are about not downgrading a backend connection to plaintext rather than about disk storage, but they belong to the same encrypt-everything-this-service-touches discipline.
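The arithmetic behind the reuse window can be sketched in a few lines. The per-request price below is an assumption (0.03 USD per 10,000 requests); check current KMS pricing for your region before relying on it. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Number of data-key windows in a day for a reuse period given in seconds.
windows_per_day() {
  echo $(( 86400 / $1 ))
}

# Daily KMS charge in dollars for a number of requests.
#   $1 = requests per day
#   $2 = price per 10,000 requests (ASSUMED default 0.03; verify against current pricing)
kms_daily_cost() {
  awk -v n="$1" -v p="${2:-0.03}" 'BEGIN { printf "%.2f\n", n / 10000 * p }'
}

# windows_per_day 300        -> 288
# kms_daily_cost 40000000    -> 120.00
# kms_daily_cost 288         -> 0.00
```

At the assumed price, one KMS call per message on a 40-million-message queue is about 120 dollars a day, while 288 calls round to zero, which is the gap the reuse window closes.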

What is the impact of leaving these services unencrypted?

The direct impact is data-at-rest exposure. Message bodies, stream records, cached API responses, ML state and recovery points sit on AWS storage protected by IAM but, without encryption, not at the storage layer. Any chain of failures that exposes that storage (a misconfigured cross-account policy, an internal AWS incident, a forensic capture during a legal hold, a CI runner with backup permissions) makes the contents readable. A single recovery point of a production database is an entire copy of customer data; an unencrypted stream of payment events is the same exposure in motion. Encryption removes the worst case.

The compliance impact is concrete and immediate. SOC 2, ISO 27001, HIPAA, PCI DSS and GDPR Article 32 all call for encryption of data at rest, either as an explicit requirement or as an expected safeguard, and auditors rarely accept that IAM compensates: these are among the most common audit citations precisely because they are easy to verify and hard to argue. There is a breach-disclosure dimension too: many regimes reduce or waive mandatory notification when exposed data was encrypted, so turning encryption on materially shrinks the legal blast radius of any future incident.

The cost impact is small but asymmetric in two places. Most members are free to fix, so the real cost of leaving them open is reputational and procedural: a lingering encryption finding signals that the org's secure-by-default conventions are not enforced, the pattern auditors and security-conscious customers read as immaturity. The two exceptions are the KMS request charge on high-throughput SQS and Kinesis (bounded by the reuse window) and the migration effort for OpenSearch and the recovery points behind Backup.1, which grows with data volume the longer it is deferred.

How do you encrypt the long tail safely?

Work the capability as one loop rather than chasing individual service findings. The order matters: inventory and tier by data sensitivity, align IAM before any SSE-KMS flip, enable encryption (a flip for most, a migration for OpenSearch and Backup), then enforce defaults with Config rules and SCPs so the gap cannot reopen.

1. Inventory every resource and tier by data sensitivity

List queues, streams, EMR security configs, OpenSearch domains, recovery points, cache-enabled API stages, Glue transforms and CodeBuild report groups, with their current encryption state. Tier them: anything carrying PII, payment data or auth tokens goes to a customer-managed KMS key for the audit trail; everything else can take the free AWS-managed option. The tier should follow what a resource carries, not who happens to be running the sprint, and the classification doubles as the audit evidence.

2. Align IAM before any SSE-KMS flip on a busy resource

For SQS and Kinesis using a customer-managed key, update every producer and consumer role to include kms:GenerateDataKey and kms:Decrypt on the target key, and update the key policy to allow those principals. Deploy the IAM changes first, wait an IAM-propagation cycle, then enable encryption. Skipping this order means producers start failing with KMS.AccessDeniedException the moment you set the attribute.

3. Enable encryption: a flip for most, a migration for two

For SQS, Kinesis, API Gateway caching, Glue transforms, CodeBuild reports and the EMR security config, enable encryption with a single attribute change (set a sane KmsDataKeyReusePeriodSeconds on high-throughput queues to keep KMS cost flat). For OpenSearch (ES.1), encryption is create-only, so snapshot to S3, create a new domain with encryption on, restore and cut over. For Backup.1, fix the unencrypted source (an EBS volume or RDS instance) so future backups inherit encryption, and copy any existing unencrypted recovery point to a KMS-protected vault with start-copy-job, then delete the original. For ELB.21, ELB.22 and ES.3, switch the backend or inter-node protocol to its encrypted variant.

4. Prevent recurrence with Config rules and SCPs

Enable the AWS Config managed rules (sqs-queue-encrypted, kinesis-stream-encrypted, elasticsearch-encrypted-at-rest and the rest) so any new unencrypted resource is flagged within minutes. For prevention, attach an SCP that denies creating a queue or stream without an encryption attribute, and bake encryption defaults into your Terraform and CloudFormation modules. Engineers can still create resources, just not unencrypted ones, which converts a recurring cleanup into a one-time guardrail.
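Step 2 can be checked before the flip rather than discovered after it. The sketch below uses the IAM policy simulator to ask whether a role's identity policies allow the two KMS actions; it does not evaluate the key policy, which still has to allow the principal. The function name and example ARNs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: does a role's identity policy allow the KMS actions
# that SSE-KMS needs? This simulates IAM policies only, not the key policy.
check_kms_access() {
  local role_arn="$1" key_arn="$2"
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names kms:GenerateDataKey kms:Decrypt \
    --resource-arns "$key_arn" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' --output text
}

# Usage: check_kms_access arn:aws:iam::111122223333:role/producer "$KEY_ARN"
# Any line whose decision is not "allowed" means the role needs updating first.
```

Running this for every producer and consumer role turns the ordering rule in step 2 into a gate that can sit in the change runbook.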

# 1. Bulk-enable free SSE-SQS on every unencrypted queue in the region.
for q in $(aws sqs list-queues --query 'QueueUrls[]' --output text); do
  read -r kms sse <<< "$(aws sqs get-queue-attributes --queue-url "$q" \
    --attribute-names KmsMasterKeyId SqsManagedSseEnabled \
    --query '[Attributes.KmsMasterKeyId, Attributes.SqsManagedSseEnabled]' --output text)"
  # --output text prints "None" for a missing attribute, so test for that rather than an empty string.
  if [ "$kms" = "None" ] && [ "$sse" != "true" ]; then
    aws sqs set-queue-attributes --queue-url "$q" \
      --attributes '{"SqsManagedSseEnabled":"true"}' && echo "encrypted $q"
  fi
done

# 2. High-throughput stream: SSE-KMS. Kinesis rotates the data key about every five minutes on its own, so there is no reuse-period flag to set here.
aws kinesis start-stream-encryption --stream-name payment-events \
  --encryption-type KMS \
  --key-id arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab

# 3. Find unencrypted recovery points (Backup.1 reads IsEncrypted per recovery point, not per vault).
aws backup list-recovery-points-by-backup-vault --backup-vault-name prod-backups \
  --query 'RecoveryPoints[?IsEncrypted==`false`].[RecoveryPointArn,ResourceType]' --output table

# 4. Confirm an at-rest Config rule is evaluating so regressions are caught automatically.
aws configservice describe-compliance-by-config-rule --config-rule-names sqs-queue-encrypted \
  --query 'ComplianceByConfigRules[].Compliance.ComplianceType'
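For a high-throughput queue that warrants a customer-managed key, the flip in step 3 sets the key and the reuse window together. A minimal sketch, assuming the IAM and key-policy changes from step 2 are already deployed; the function name and the key alias in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch a queue to SSE-KMS with an explicit data-key reuse period.
#   $1 = queue URL, $2 = KMS key ID, ARN or alias, $3 = reuse period in seconds (default 300)
enable_sqs_kms() {
  local queue_url="$1" key_id="$2" period="${3:-300}"
  # SQS accepts 60 to 86400 seconds for KmsDataKeyReusePeriodSeconds.
  if [ "$period" -lt 60 ] || [ "$period" -gt 86400 ]; then
    echo "reuse period must be between 60 and 86400 seconds" >&2
    return 1
  fi
  aws sqs set-queue-attributes --queue-url "$queue_url" \
    --attributes "KmsMasterKeyId=$key_id,KmsDataKeyReusePeriodSeconds=$period"
}

# Usage: enable_sqs_kms "$QUEUE_URL" alias/payments-sqs 300
```

A longer period lowers KMS cost further but lengthens the time a cached data key stays in use, so the value is a trade-off to agree with whoever owns the key.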

You can now treat the long tail of stateful services as one encryption capability rather than a scatter of findings: inventory and tier by data sensitivity, align IAM before any customer-managed-key flip, enable encryption (a free flip for most, a migration for OpenSearch and the recovery points behind AWS Backup), tune the data-key reuse window so high-throughput streams stay cheap, and lock it with Config rules and SCPs so nothing regresses. The Controls this lesson covers section below links every control in this group to its deep page and fix.

Controls this lesson covers

One capability, many AWS Security Hub controls. This lesson is the shared playbook; each control below keeps its own deep page with the exact check, severity and a copy-and-paste fix.