Configure backups and retention

One capability across databases, tables, streams, file systems and snapshots: make sure every data store can be recovered to a recent point, and that no backup is shared with the public internet.

14 min·10 sections·AWS

Last reviewed 16 June 2026

Remediates AWS Security Hub: DocumentDB.2 DocumentDB.3 DynamoDB.2 DynamoDB.4 EC2.1 EC2.182 EFS.2 EFS.7 ElastiCache.1 Kinesis.3 Neptune.3 Neptune.5 RDS.1 RDS.11 RDS.26 RDS.50 Redshift.3

Backups and retention: the basics

What does "recoverable" actually mean across AWS data stores?

Recoverability is not one setting. Every AWS data store expresses it differently: RDS has a backup retention period in days that enables daily snapshots plus continuous transaction-log archiving for point-in-time recovery, DynamoDB has point-in-time recovery (PITR) that captures a rolling 35-day change log, DocumentDB and Neptune have their own retention windows, EFS has automatic backups, ElastiCache for Redis has automatic snapshots, Redshift has automated snapshots, and Kinesis has a stream retention period that defines how long records can be replayed. Each is the same idea wearing a different name: if something goes wrong, can you get the data back?

AWS Security Hub turns each of these into its own control, which is why a single estate can fail a dozen backup checks at once. RDS.11, RDS.26 and RDS.50 cover database and cluster backups, DynamoDB.2 and DynamoDB.4 cover PITR and backup plans, DocumentDB.2 covers cluster retention, Neptune.5 covers automated backups, EFS.2 and EFS.7 cover file-system backups, ElastiCache.1 covers Redis snapshots, Redshift.3 covers automated snapshots, and Kinesis.3 covers stream retention. A second, related set covers the other half of this capability: EC2.1 and EC2.182 for EBS snapshots, RDS.1 for RDS snapshots, and DocumentDB.3 and Neptune.3 for cluster snapshots. A backup that is shared publicly is a data leak, so snapshots must not be exposed to all accounts.

The good news is that most of this is one decision repeated: turn recovery on, set a sensible window, and keep the resulting backups private. Most failures are drift: a database launched from a template with retention set to zero, a table created with PITR off (the default), a stream left at the 24-hour default, or a snapshot copied with the wrong sharing flag. The job is to find every data store that cannot be recovered or whose backups are exposed, fix the production ones, and enforce a retention floor so new resources arrive protected.

In this lesson you will learn how AWS expresses recoverability across databases, tables, streams, file systems and snapshots, how to find every production data store that cannot be recovered or whose backups are exposed publicly, and how to fix them without spending on coverage that genuinely is not needed. The "Controls this lesson covers" section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.

Fun fact

The retention period that quietly read zero

A team launched an RDS instance from a template they had copied from a tutorial, which set the backup retention period to zero so demo environments tore down instantly and for free. The instance graduated to staging, then quietly to production, carrying the zero the whole way. Eighteen months later a migration script dropped the wrong table, and there was no automated backup and no point-in-time recovery to fall back on: the most recent restore point was a manual snapshot someone had taken, by luck, six weeks earlier. The same trap shows up across the capability: DynamoDB PITR is off by default and only captures forward from the moment you enable it, and Kinesis streams default to 24 hours, so a consumer that fails on a Friday can outrun the buffer before anyone is back on Monday. Recovery has to be turned on before you need it, never after.

Finding unrecoverable data across an estate

Marco picks up a batch of backup findings during the weekly compliance triage. Security Hub shows failures spread across RDS, DynamoDB and Kinesis in three accounts, plus a couple of EBS snapshots flagged as shared publicly.

Rather than work the findings one by one, he starts with the data stores whose loss is most expensive, listing which RDS instances have backups disabled so he can separate the production databases that must be fixed from the dev instances that can be documented as exceptions, before changing anything.

Start with the resources whose loss is most expensive. RDS instances with a retention period of zero have no automated backups and no point-in-time recovery at all, and instances below the floor can only be restored within a window that is too short.

$ aws rds describe-db-instances --query 'DBInstances[?BackupRetentionPeriod<`7`].[DBInstanceIdentifier,BackupRetentionPeriod,Engine]' --output table
----------------------------------------------
| prod-orders  | 0 | postgres |
| prod-billing | 3 | mysql    |
| dev-scratch  | 0 | mysql    |
----------------------------------------------
# prod-orders has NO backups; prod-billing is below the 7-day floor; dev-scratch can stay.

Retention of 0 disables backups entirely; anything 1 to 6 fails the default 7-day floor. Fix the production databases first, then document the dev ones as exceptions.
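The same inventory extends to tables and streams. What follows is a minimal sketch, assuming the AWS CLI is already configured for the account and Region being checked. The function names and the 168-hour (seven-day) default floor are illustrative assumptions, not part of any AWS tooling.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the 168-hour default floor are assumptions.

# Print every DynamoDB table whose point-in-time recovery is not ENABLED.
tables_without_pitr() {
  local table status
  for table in $(aws dynamodb list-tables --query 'TableNames[]' --output text); do
    status=$(aws dynamodb describe-continuous-backups --table-name "$table" \
      --query 'ContinuousBackupsDescription.PointInTimeRecoveryDescription.PointInTimeRecoveryStatus' \
      --output text)
    if [ "$status" != "ENABLED" ]; then
      printf '%s\t%s\n' "$table" "$status"
    fi
  done
}

# Print every Kinesis stream whose retention is below the floor (in hours).
streams_below_floor() {
  local floor_hours="${1:-168}" stream hours
  for stream in $(aws kinesis list-streams --query 'StreamNames[]' --output text); do
    hours=$(aws kinesis describe-stream-summary --stream-name "$stream" \
      --query 'StreamDescriptionSummary.RetentionPeriodHours' --output text)
    if [ "$hours" -lt "$floor_hours" ]; then
      printf '%s\t%s\n' "$stream" "$hours"
    fi
  done
}
```

Calling `tables_without_pitr` and `streams_below_floor 168` prints one tab-separated line per resource that needs attention, which can be triaged into production fixes and documented exceptions in the same way as the RDS list.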

How AWS keeps a data store recoverable (deep dive)

Most recovery controls resolve to one of two mechanisms. The first is a retention window measured in time: RDS takes a daily snapshot and continuously archives transaction logs so it can replay to any second within the retention period, DynamoDB PITR captures a rolling 35-day change log, and Kinesis keeps records readable for a configurable window (24 hours by default, up to 8,760 hours). Set the window to zero or leave it at a low default and the safety net simply is not there. The second is an automatic backup job: EFS, ElastiCache for Redis, Redshift, DocumentDB and Neptune each take scheduled backups when the relevant setting is on. Security Hub reads the configured value directly (for example BackupRetentionPeriod on RDS, PointInTimeRecoveryStatus on DynamoDB, RetentionPeriodHours on Kinesis) and fails anything below the floor.
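For the automatic-backup services, the configured value can be read in much the same way. The sketch below assumes a configured AWS CLI; the function name is hypothetical. It prints the snapshot or backup retention setting for ElastiCache replication groups, Redshift clusters, and DocumentDB and Neptune clusters. For ElastiCache and Redshift, a value of 0 means automatic snapshots are off.

```shell
#!/usr/bin/env bash
# Sketch only: backup_settings_report is an illustrative name.
# Each command prints "<identifier> <retention value>" per resource.
backup_settings_report() {
  aws elasticache describe-replication-groups \
    --query 'ReplicationGroups[].[ReplicationGroupId,SnapshotRetentionLimit]' --output text
  aws redshift describe-clusters \
    --query 'Clusters[].[ClusterIdentifier,AutomatedSnapshotRetentionPeriod]' --output text
  # The engine filter keeps each listing to its own cluster type.
  aws docdb describe-db-clusters --filters Name=engine,Values=docdb \
    --query 'DBClusters[].[DBClusterIdentifier,BackupRetentionPeriod]' --output text
  aws neptune describe-db-clusters --filters Name=engine,Values=neptune \
    --query 'DBClusters[].[DBClusterIdentifier,BackupRetentionPeriod]' --output text
}
```

EFS is left out of this sketch because its backup policy is read per file system rather than in a single listing call.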

Two behaviours catch teams out. Backups only ever protect forward from the moment they are enabled: turning on DynamoDB PITR the day after an accident gives you one day of history, not 35, so the only safe time to enable recovery is at creation. And a restore is usually not an in-place rollback: DynamoDB and RDS point-in-time restores create a new resource from the chosen timestamp, with settings like auto scaling, TTL and tags not carried over, so the restore runbook matters as much as the setting.
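A restore runbook step might look like the sketch below, which makes the "new resource" behaviour explicit: both commands take a source and a separate target name. The wrapper function names and any resource names passed to them are hypothetical, and the timestamp must fall inside the resource's recovery window.

```shell
#!/usr/bin/env bash
# Sketch only: wrapper names are illustrative. Both restores create a NEW
# resource; the source table or instance is left untouched.

# Usage: restore_table_to_time <source-table> <new-table> <ISO-8601 timestamp>
restore_table_to_time() {
  local src="$1" target="$2" when="$3"
  aws dynamodb restore-table-to-point-in-time \
    --source-table-name "$src" \
    --target-table-name "$target" \
    --restore-date-time "$when"
}

# Usage: restore_db_to_time <source-instance> <new-instance> <ISO-8601 timestamp>
restore_db_to_time() {
  local src="$1" target="$2" when="$3"
  aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$src" \
    --target-db-instance-identifier "$target" \
    --restore-time "$when"
}
```

After either restore completes, the runbook still has to reapply the settings that are not carried over and repoint the application at the new resource.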

The other half of this capability is keeping the backups themselves private. EBS and RDS snapshots can be shared with other accounts or with all accounts, and a snapshot shared with all accounts is a public copy of your data. The public-snapshot controls evaluate snapshot sharing attributes and fail any snapshot exposed to the public, which is why EBS snapshot block-public-access (EC2.182) is the strongest backstop: it prevents any snapshot in the account from being made public regardless of an individual sharing flag.
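To see what those controls see, the sharing attributes can be queried directly. This is a sketch under the assumption of a configured AWS CLI; the function names are illustrative. It lists EBS snapshots owned by the account that are restorable by everyone, tests whether a given RDS snapshot has `all` in its `restore` attribute, and reads the current block-public-access state for the Region.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative assumptions.

# List this account's EBS snapshots that anyone can restore.
public_ebs_snapshots() {
  aws ec2 describe-snapshots --owner-ids self --restorable-by-user-ids all \
    --query 'Snapshots[].SnapshotId' --output text
}

# Succeed (exit 0) if the given RDS snapshot is shared with all accounts.
rds_snapshot_is_public() {
  local values
  values=$(aws rds describe-db-snapshot-attributes --db-snapshot-identifier "$1" \
    --query "DBSnapshotAttributesResult.DBSnapshotAttributes[?AttributeName=='restore'].AttributeValues[]" \
    --output text)
  # Normalise tabs and newlines to spaces, then look for the literal value "all".
  case " $(printf '%s' "$values" | tr '\t\n' '  ') " in
    *" all "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the snapshot block-public-access state for the current Region.
snapshot_block_public_access_state() {
  aws ec2 get-snapshot-block-public-access-state --query 'State' --output text
}
```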

What is the impact of leaving data unrecoverable?

The direct impact is the absence of a rewind. With backups disabled or retention too short, there is no automated way to undo a bad migration, a buggy batch job, an accidental delete or a ransomware encryption event. The failure mode is silent and total: the data store works perfectly until the day someone needs to restore and discovers they cannot. For a production system this is the difference between a five-minute restore to just before the incident and an open-ended reconstruction effort with permanent gaps.

The second-order impact is blast radius across everything downstream. When events or records are lost, every system that derives from them (analytics dashboards, billing reconciliation, audit trails, ML training sets) inherits a gap that is often invisible until someone asks a question the data can no longer answer. A short retention window quietly raises the severity of every incident upstream of it.

On the compliance side, backup and recovery map directly to recognised frameworks: NIST 800-53 (the contingency-planning controls CP-9 and CP-10, plus SI-12 on information retention), SOC 2, ISO 27001 and PCI DSS all expect production data to be recoverable and backups to be protected. A failing finding is documented audit evidence, and a publicly shared snapshot is a data-exposure incident in its own right. A clean, complete set of backup controls across every account is among the cheapest and most defensible artefacts you can hand an auditor.

How do you make data recoverable safely?

Work the capability as one loop rather than chasing individual findings. The order matters: fix the highest-impact production data stores first, mind the operational gotchas on the way, and enforce a retention floor so new resources arrive protected.

1. Inventory every data store and its recovery state

Across services, list each data store with its recovery setting: RDS BackupRetentionPeriod, DynamoDB PITR status, Kinesis RetentionPeriodHours, plus the automatic-backup flags on EFS, ElastiCache, Redshift, DocumentDB and Neptune. Separately, list every EBS and RDS snapshot shared publicly. Capture the environment tag for each so you can separate production (must fix) from genuinely disposable resources (document as exceptions). Read replicas are excluded from RDS.11 because their recovery follows the source instance, so do not waste effort on them.

2. Set a retention floor that matches the recovery requirement

Seven days is a sensible minimum for RDS and Kinesis, but match it to your real recovery point objective and any regulation: 35 days is the RDS automated maximum, 8,760 hours the Kinesis maximum, and longer horizons layer on manual snapshots or AWS Backup. For DynamoDB, PITR is simply on or off. Set the window once, correctly, per data classification rather than blindly applying one number everywhere.

3. Apply changes at the right time and un-share public snapshots

Mind the gotchas. Enabling RDS backups from a retention of zero can trigger a brief I/O pause for the first base snapshot, so apply it in the maintenance window unless the instance carries no traffic. DynamoDB PITR and Kinesis retention increases are instant and non-disruptive. For exposed backups, remove the public sharing from each snapshot. And remember that recovery only protects forward, so enable it early rather than during an incident.

4. Ratchet it in with defaults and guardrails

Fix the source, not just the symptom. Set the retention floor and PITR-on in your CloudFormation and Terraform modules so new resources arrive protected, enable account-level EBS snapshot block-public-access so no snapshot can be made public again, and back the lot with AWS Config rules (for example db-instance-backup-enabled, dynamodb-pitr-enabled) so the posture cannot drift. For the resources you intentionally leave unprotected, record a documented exception rather than ignoring the finding.
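The Config guardrails named above can be switched on from the CLI. This is a sketch that assumes AWS Config is already recording in the account and Region; the function name is hypothetical, and the two managed-rule identifiers are the ones that correspond to the rule names in the text. Rule parameters, such as a minimum retention value, are left at their defaults here.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an AWS Config recorder is already running.
# enable_backup_config_rules is an illustrative name.
enable_backup_config_rules() {
  # Flags RDS instances without automated backups.
  aws configservice put-config-rule --config-rule \
    '{"ConfigRuleName":"db-instance-backup-enabled","Source":{"Owner":"AWS","SourceIdentifier":"DB_INSTANCE_BACKUP_ENABLED"}}'
  # Flags DynamoDB tables without point-in-time recovery.
  aws configservice put-config-rule --config-rule \
    '{"ConfigRuleName":"dynamodb-pitr-enabled","Source":{"Owner":"AWS","SourceIdentifier":"DYNAMODB_PITR_ENABLED"}}'
}
```

In a multi-account estate the same rules would more likely be deployed through the infrastructure-as-code modules mentioned above, so that they arrive with every new account rather than being applied by hand.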

# Set a 7-day backup floor on every non-replica instance below it (skip read replicas).
# The query does not filter by environment, so exclude documented dev exceptions first.
for db in $(aws rds describe-db-instances \
    --query 'DBInstances[?ReadReplicaSourceDBInstanceIdentifier==`null` && BackupRetentionPeriod<`7`].DBInstanceIdentifier' --output text); do
  aws rds modify-db-instance --db-instance-identifier "$db" \
    --backup-retention-period 7 --no-apply-immediately
done

# Turn on DynamoDB point-in-time recovery (instant, no downtime).
aws dynamodb update-continuous-backups --table-name prod-orders \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

# Stop any snapshot in the account from being shared publicly (applies per Region, so repeat in each one).
aws ec2 enable-snapshot-block-public-access --state block-all-sharing
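Two fixes from step 3 are not covered by the commands above: raising a Kinesis stream's retention and removing public sharing from an existing snapshot. A minimal sketch follows, with hypothetical function names and an assumed 168-hour default.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the 168-hour default are assumptions.

# Raise a stream's retention period (in hours); takes effect without downtime.
raise_stream_retention() {
  local stream="$1" hours="${2:-168}"
  aws kinesis increase-stream-retention-period \
    --stream-name "$stream" --retention-period-hours "$hours"
}

# Remove the "all accounts" permission from one EBS snapshot.
unshare_ebs_snapshot() {
  aws ec2 modify-snapshot-attribute --snapshot-id "$1" \
    --attribute createVolumePermission --operation-type remove --group-names all
}

# Remove the "all accounts" restore permission from one RDS snapshot.
unshare_rds_snapshot() {
  aws rds modify-db-snapshot-attribute --db-snapshot-identifier "$1" \
    --attribute-name restore --values-to-remove all
}
```

Un-sharing stops further access, but it cannot recall copies already taken, so a snapshot that was public should also be reviewed as a potential exposure.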

Quick quiz

Security Hub shows backup failures across RDS, DynamoDB and Kinesis plus a publicly shared EBS snapshot. What is the most efficient way to think about them?

Keep learning

Go deeper on how recovery works across the services in this capability.

You can now treat backups and retention as one capability rather than a scatter of findings: inventory every data store's recovery state, set a retention floor that matches each system's recovery requirement, fix the production resources highest-impact first while keeping every snapshot private, and ratchet the estate shut with retention defaults, snapshot block-public-access and Config guardrails. The "Controls this lesson covers" section below links every control in this group to its deep page and fix.

Backups and retention: the cost and risk view

A near-free safety net that underwrites a very large recovery bill

Most of this capability is effectively free. RDS includes backup storage up to the size of the database at no charge and bills only the excess at a low rate, DynamoDB PITR adds a small per-gigabyte charge, and turning a public snapshot private costs nothing. The one place with a real, predictable price tag is extended Kinesis retention, which bills per shard-hour beyond the included 24 hours, but even there the number is usually single-digit to low-hundreds of dollars a month and is easy to forecast up front.

Frame each failing control as a risk position, not a cost saving. A production database with backups disabled, or a table with no PITR, is the cloud equivalent of cancelling the insurance to lower the premium: the line item goes down and the exposure goes up by orders of magnitude. The right response to a backup finding is almost never to optimise the spend down; it is to confirm the coverage exists everywhere it matters and to record any genuine exceptions (a disposable dev store) as deliberate, owned decisions.

The downside it protects against is the kind of event that does not show up on the cloud bill until it happens: engineering hours reconstructing records by hand, customer remediation, regulatory penalties for losing data you were obligated to retain, and audit findings that delay enterprise deals. Backup and recovery controls are table stakes in SOC 2, ISO 27001, PCI DSS and most customer security questionnaires, so a clean set across the estate protects the revenue pipeline as much as the data.

This lesson is for the finance partner who sees a cluster of backup findings on the security report, or a backup-storage line on the invoice, and wants to know what the right response is and what it costs. It covers why most of these controls are nearly free to fix, which one (Kinesis extended retention) carries a real but predictable charge, why backup storage is the wrong place to optimise, and how to turn a list of red findings into a risk-ordered plan with documented exceptions.

How a finance partner frames the backup decision

Priya is the finance partner reviewing the weekly compliance pack. Security Hub shows a dozen backup failures across RDS, DynamoDB and Kinesis in three accounts, plus a couple of EBS snapshots flagged as shared publicly. Her instinct is the opposite of her usual cost reflex: this is not a line to optimise down, it is coverage to confirm exists. Enabling recovery is nearly free on almost every service, so the spend question is narrow: only Kinesis extended retention carries a real per-shard-hour charge, and that is single-digit to low-hundreds of dollars a month and easy to forecast.

She asks the team to map each finding to the data behind it. A production orders database has backups disabled outright, a billing database sits below the seven-day floor, an analytics table has PITR off, and the dev-scratch instances can be documented as exceptions. The public snapshots are a separate, urgent class: a backup shared with all accounts is a public copy of the data, not a recoverability gap. Priya reframes the pack as a coverage position rather than a cost saving: 'Recovery is effectively free to switch on, and the dollars saved by leaving it off are trivial against an open-ended data-loss bill: engineering hours reconstructing records, customer remediation, regulatory penalties and stalled enterprise deals. The right response is 100% coverage on production data stores, zero public snapshots, and a recorded owner for any genuine dev exception.'

Why backups belong on the risk register

The cost model here is asymmetric in a way that is rare for security work. Enabling recovery is nearly free for almost every service (RDS includes backup storage up to the database size, DynamoDB PITR is a small per-gigabyte charge, making a snapshot private costs nothing), and the one charge with a real price tag, Kinesis extended retention, is predictable per shard-hour and easy to scope to the streams that matter. The dollars saved by leaving backups off are tiny; the downside they expose is open-ended.

The documented cost of a data-loss event includes engineering hours reconstructing records by hand, customer remediation and goodwill, regulatory penalties for losing data you were obligated to retain, and audit findings that delay certifications and stall enterprise deals. Against that, remediation is near-zero cost. The finance role is to take backup storage off the cost-cutting table, track recoverability as a coverage percentage rather than a dollar figure (the target is 100% of production data stores), and require a recorded, owned reason for any genuine exception rather than a silently suppressed finding.
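The coverage percentage is simple to compute once the retention values are listed. The sketch below does it for RDS only, as an illustration: it counts non-replica instances whose retention meets the floor and divides by the total. The function name and the seven-day default are assumptions, and the query does not filter to production, so a real metric would first narrow the list by environment tag.

```shell
#!/usr/bin/env bash
# Sketch only: rds_coverage_percent is an illustrative name, and this
# version counts every non-replica instance rather than production only.
rds_coverage_percent() {
  local floor_days="${1:-7}"
  aws rds describe-db-instances \
    --query 'DBInstances[?ReadReplicaSourceDBInstanceIdentifier==`null`].BackupRetentionPeriod' \
    --output text |
    tr '\t' '\n' |
    awk -v floor="$floor_days" '
      NF  { total++; if (($1 + 0) >= (floor + 0)) ok++ }
      END { if (total == 0) print "n/a"; else printf "%.1f\n", 100 * ok / total }'
}
```

The same pattern (count the compliant resources, divide by the total) applies to tables with PITR enabled and streams at or above the retention floor.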

What finance can actually do about backups

Finance cannot change a retention setting, but it owns the framing that makes recoverability a non-negotiable rather than a line to be optimised. There are three levers, applied on a regular cadence.

1. Take backup storage off the cost-cutting table

State explicitly that backup storage and recovery settings are not candidates for cost reduction. The free allowance covers most RDS backup storage anyway, and the small charges that remain buy recoverability no other line can replace. Removing it from the optimisation conversation prevents anyone disabling it to make a budget look tidier.

2. Track recoverability as a coverage percentage

Put a single recurring metric on the governance pack: the percentage of production data stores that are recoverable to the defined floor, with the target at 100%, plus a separate count of snapshots shared publicly (target zero). Because remediation cost is negligible, any gap is a pure governance issue, not a budget trade-off, which makes it an easy, unambiguous ask of engineering.

3. Scope the one real charge and tie it to compliance

For Kinesis extended retention, the only line with a real price tag, ask that the spend be concentrated on the streams that matter and forecast precisely (per-shard-hour rate times shards times extra hours). Then tie the whole capability to audit and deal readiness: a clean backup posture is part of substantiating the data-retention and resilience claims that gate enterprise deals, so the value is the revenue protected, not just the storage cost.
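The forecast in lever 3 is a single multiplication, sketched below. The function name is hypothetical and the rate is deliberately a parameter: take it from the current Kinesis pricing page for the Region, and confirm there how billable hours are counted for the stream's capacity mode.

```shell
#!/usr/bin/env bash
# Sketch only: forecast_retention_cost is an illustrative name.
# Usage: forecast_retention_cost <rate per shard-hour> <shards> <billable hours>
forecast_retention_cost() {
  local rate="$1" shards="$2" hours="$3"
  awk -v r="$rate" -v s="$shards" -v h="$hours" \
    'BEGIN { printf "%.2f\n", r * s * h }'
}
```

For example, with a purely hypothetical rate of 0.02 per shard-hour, `forecast_retention_cost 0.02 4 730` prints 58.40 for four shards over a 730-hour month.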

Quick quiz

A remediation push leaves a small backup-storage line on the next invoice. As the finance partner, how should you treat it?

Keep learning

You have finished the finance view of backups and retention. You know the cost model is asymmetric: recovery is nearly free to enable, the only real charge (Kinesis extended retention) is predictable per shard-hour, and the downside of leaving backups off is an open-ended data-loss bill that never shows up on the invoice until it happens. Next time a backup finding lands, you will take backup storage off the cost-cutting table, track recoverability as a coverage percentage with a 100% target, and treat the spend as cheap insurance rather than a line to optimise down.

Backups and retention: the headline

Whether the business can undo a data-loss accident on any system that matters

Every data store in the estate can be configured to back itself up and keep a rolling window of history, so an accidental delete, a bad migration or a ransomware event becomes a quick restore instead of a permanent loss. The report shows this as a scatter of separate findings across databases, tables, streams and file systems, but the underlying question is one: if data is lost or corrupted tomorrow, can we get it back?

This is overwhelmingly a resilience and compliance issue, not a cost one. The cost of keeping recovery on is trivial for almost every service, and the cost of not having it is a data-loss or downtime event the business may not survive intact, plus a control failure auditors and customers will ask about. The healthy end state is simple to state: every production data store is recoverable to a defined window, no backup is shared publicly, and any exception is signed off in writing.

The leadership question is binary and one-minute: can we restore every production data store? A standing yes is cheap insurance. The first no is the most important thing on the page that day.

A short read for the leader who needs to know what missing backups expose, why fixing them is a governance decision rather than a budget one, and what a defensible end state looks like: every production data store recoverable to a defined window, no backup shared publicly, and exceptions on the record.

What it looks like when recoverability is a stated standard, not luck

After a migration script dropped the wrong table on a production database that, it turned out, had been carrying a backup retention period of zero inherited from a tutorial template, the only restore point was a manual snapshot someone had taken by luck six weeks earlier. The reconstruction effort ran for days. At the next review the CEO asked the binary question the incident had made unavoidable: can we restore every production data store, and is any backup shared publicly?

The honest answer at the time was no on both counts, because recoverability had been inherited from copied templates rather than set as a standard. The team's response was to declare a recoverability floor as policy: every production data store recoverable to a defined window, no backup shared publicly, and any exception signed off in writing by a named owner. They set the floor and PITR-on in the infrastructure-as-code modules so new resources arrive protected, turned on account-level EBS snapshot block-public-access as a backstop, and backed it with Config rules so the posture cannot drift. The next time the question came up, the answer was a standing yes on recoverability and a standing no on public snapshots: cheap insurance against the kind of event that does not show up on the bill until it happens. The shift from relying on whoever copied the template to a stated, owned standard is the one-line confidence signal this control delivers.

Why this is a board-level risk

A production data store that cannot be restored is a single point of catastrophic failure, and these controls surface every one of them. The reason it is tracked at the leadership level is not its cost, which is trivial, but its consequence: on a bad day an unrecoverable data store is the difference between a brief incident and a business-threatening data-loss event, plus a control failure that customers and auditors will hold against you.

It also reads as a discipline signal. A failing backup control almost always traces back to a copied template, an unowned resource, or a gap between the team that flags risks and the team that owns the fix. These are the same root causes behind larger resilience gaps that do not surface as cleanly. A clean set of backup controls, including no publicly shared snapshots, is cheap evidence that the basics of recoverability are taken seriously across the estate.

The leadership move on backups

The executive handle is not to chase individual findings. It is to set recoverability as a stated, owned standard with no silent exceptions.

1. Set a recoverability floor as policy

Declare that every production data store must be recoverable to at least a defined window, more where regulation demands, and that no backup may be shared publicly. A clear standard turns a scattered set of findings into a single, measurable expectation everyone understands.

2. Require written sign-off for any exception

Any production data store left unprotected needs a named owner accepting the risk in writing. Exceptions should be rare and reviewed. Most disappear the moment someone has to put their name to the decision rather than inherit it from a copied template.

3. Ask the binary question at the review

Can we restore every production data store, and is any backup shared publicly? That one-minute item tells you whether the data is protected without any technical depth. A standing yes (and no) is cheap insurance; the first failure is the most important thing on the page.

Quick quiz

What is the one-minute binary question leadership should ask about this capability at a review?

Keep learning

Two takeaways. Backups are overwhelmingly a resilience and compliance issue, not a cost one: the cost of keeping recovery on is trivial, and the cost of not having it is a data-loss event the business may not survive intact. And the defensible end state is a stated, owned standard (every production data store recoverable to a defined window, no backup shared publicly, every exception signed off in writing), answerable by a one-minute binary question at the review.

Controls this lesson covers

One capability, many AWS Security Hub controls. This lesson is the shared playbook; each control below keeps its own deep page with the exact check, severity and a copy-and-paste fix.

DocumentDB

DynamoDB

EC2

EFS

ElastiCache

Kinesis

Neptune

RDS

Redshift
