Site Reliability

Extend RDS backup retention

Automated backups are on, but the retention window is too short — corruption or a bad change discovered after the window rolls off is unrecoverable.

11 min · 10 sections · AWS

Last reviewed 27 May 2026

RDS backup retention: the basics

Why a short retention window is a recovery you don't have

Amazon RDS automated backups give you point-in-time recovery (PITR): the ability to restore a database to any second within the retention window, up to roughly five minutes before the present. RDS does this by taking a daily storage-level snapshot and continuously shipping transaction logs to S3. The retention period is the length of that window, configurable from 1 to 35 days. This lesson assumes automated backups are already enabled (a separate lesson covers turning them on); the issue here is that the window is set too short.

The check flags instances whose BackupRetentionPeriod is below a recommended floor — typically 7 days, with 30–35 days expected for regulated data. A retention of 1 day means you can only rewind 24 hours. That sounds fine until you realise the failures that actually hurt — a bad migration, a silent corruption, an accidental bulk delete, a logic bug that slowly poisons data — are often discovered days after they happen. If the damage landed outside the window, the clean version of the data has already rolled off and there is nothing left to restore to.
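The rule the check applies can be sketched as a small shell function. This is a minimal illustration, not the scanner's actual logic: the function name, the label strings, and the default floor of 7 are assumptions chosen to mirror the paragraph above.

```shell
# Classify a BackupRetentionPeriod value against a retention floor.
# Assumption: the function name, labels, and 7-day default are illustrative.
classify_retention() (
  days=$1
  floor=${2:-7}
  if [ "$days" -eq 0 ]; then
    echo "backups-disabled"    # a separate finding: automated backups are off
  elif [ "$days" -lt "$floor" ]; then
    echo "short-retention"     # the finding this lesson covers
  else
    echo "ok"
  fi
)

classify_retention 1       # a 1-day window against the 7-day floor
classify_retention 7       # exactly at the floor
classify_retention 14 30   # 14 days against a regulated 30-day floor
```

Passing the floor as a second argument keeps the same rule usable for both the 7-day production floor and the 30-day regulated tier.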

It's flagged because retention length is the single dial that decides how far back you can recover, and the default people inherit from the launch wizard or a quick script is usually too small. The cost of widening it is near zero (backup storage is free up to your allocated database size), while the cost of discovering you can't recover is total. The fix is simply lengthening the window — and the change takes effect on the next backup, so it's worth doing before you need it.

In this lesson you'll learn how RDS automated backups and point-in-time recovery actually work — the daily snapshot, the continuous transaction-log shipping, and the 1-to-35-day retention window that bounds how far back you can restore. You'll see how to inventory which instances have a dangerously short window, the single AWS CLI flag that widens it, why the change applies on the next backup rather than retroactively, how automated backups differ from manual snapshots and from AWS Backup for retention beyond 35 days, and why production should default to at least 7 days and regulated data to 30–35.

Fun fact

The corruption that hid for nine days

A SaaS team shipped a migration with a subtle off-by-one that mis-linked a fraction of records on write. Nothing alerted — the app ran fine. Nine days later a customer report surfaced the corruption. The team reached for point-in-time recovery to restore to the moment before the migration, and discovered their prod database had been left at the 1-day default retention the launch wizard set years earlier. The clean state had rolled off eight days prior. There was no point in time to recover to. The fix afterward took one command and cost almost nothing in storage — --backup-retention-period 30 — but the data reconstruction took three engineers two weeks. The window you didn't widen is the recovery you don't have.

Extending a too-short retention window in action

Lena runs the reliability review at a fintech. The dashboard flags prod-ledger-db, a db.r6g.xlarge Postgres instance, with BackupRetentionPeriod: 1. Automated backups are on, but a one-day window means a problem found on a Friday that started Tuesday is unrecoverable. For a ledger database, that's an unacceptable exposure.

She confirms the scope first: describe-db-instances shows the retention at 1 day and the backup window set, so this is genuinely a short-retention finding, not a backups-disabled one. She checks the data classification — the Compliance=pci tag means this database is subject to a retention requirement well beyond a week, so 30 days is the target, not the 7-day floor.

Lena runs modify-db-instance with --backup-retention-period 30 --apply-immediately. There's no rebuild and no downtime — RDS simply begins retaining logs and snapshots for the longer window; the change takes effect from the next backup forward, so the window grows out to 30 days over the coming month. A re-scan clears the finding, and the ledger now has a month of point-in-time recovery for essentially no added cost.

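First, find every RDS DB instance with a retention window below the 7-day floor. These are the short-retention findings; a row showing 0 means automated backups are disabled, which the separate enable-backups lesson covers.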

$ aws rds describe-db-instances --query "DBInstances[?BackupRetentionPeriod < \`7\`].{Id:DBInstanceIdentifier,Class:DBInstanceClass,Engine:Engine,Retention:BackupRetentionPeriod}" --output table

[Table output: DescribeDBInstances, with columns Id, Class, Engine, and Retention. The rows were not preserved.]

# Two production databases can only rewind 1-3 days — too short to catch late-found corruption.

Short-retention inventory; cross-reference each Id against its data classification before choosing a target window.

For a production instance, widen the retention window. There's no rebuild and no downtime — the longer window takes effect from the next backup forward.

$ aws rds modify-db-instance --db-instance-identifier prod-ledger-db --backup-retention-period 14 --apply-immediately

{
    "DBInstance": {
        "DBInstanceIdentifier": "prod-ledger-db",
        "DBInstanceStatus": "available",
        "BackupRetentionPeriod": 1,
        "PendingModifiedValues": {
            "BackupRetentionPeriod": 14
        }
    }
}

# Retention flips to 14; the window grows out over the next two weeks as logs accumulate.

The --backup-retention-period flag widens the PITR window; it applies going forward, not retroactively, so do it before you need it.
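If you widen several instances, a thin wrapper that validates the value and supports a dry run can reduce the chance of a typo disabling backups. The sketch below is hypothetical: the widen_retention name and the DRY_RUN variable are assumptions, and only the aws rds modify-db-instance command and its flags come from the lesson.

```shell
# Hypothetical wrapper around the modify-db-instance call shown above.
# DRY_RUN defaults to 1, so nothing changes unless you opt in with DRY_RUN=0.
widen_retention() (
  id=$1
  days=$2
  # Reject anything that is not a whole number.
  case "$days" in
    ''|*[!0-9]*) echo "retention must be a whole number of days" >&2; exit 2 ;;
  esac
  # Automated backups accept 1-35 days; 0 would disable them entirely.
  if [ "$days" -lt 1 ] || [ "$days" -gt 35 ]; then
    echo "retention must be between 1 and 35 days" >&2
    exit 2
  fi
  set -- aws rds modify-db-instance \
    --db-instance-identifier "$id" \
    --backup-retention-period "$days" \
    --apply-immediately
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"    # print the command instead of running it
  else
    "$@"         # run the real command
  fi
)

DRY_RUN=1 widen_retention prod-ledger-db 14
```

Defaulting to a dry run is a deliberate choice here: the command is safe, but printing it first makes a review step cheap.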

RDS backup retention under the hood (deep dive)

RDS automated backups are two mechanisms working together. Once a day, during the configured backup window, RDS takes a storage-volume snapshot of the database. Continuously, it streams the database's transaction logs to Amazon S3. Point-in-time recovery works by taking the most recent snapshot at or before your target time and replaying transaction logs forward to the exact second you ask for — typically restoring to any point from the start of the retention window up to about the last five minutes. The BackupRetentionPeriod (1–35 days) is simply how long both the snapshots and the logs are kept; set it to 0 and automated backups are disabled entirely.
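The snapshot-plus-replay idea can be illustrated with a toy model. This is a conceptual sketch only, not how RDS implements recovery internally: the function name is hypothetical, and times are plain epoch seconds with made-up values to keep the comparison simple.

```shell
# Pick the newest snapshot taken at or before the target time, then report
# how much transaction log would need replaying to reach the target.
# Assumption: times are epoch seconds; the values below are invented.
pick_base_snapshot() (
  target=$1
  shift
  best=""
  for snap in "$@"; do
    if [ "$snap" -le "$target" ]; then
      if [ -z "$best" ] || [ "$snap" -gt "$best" ]; then
        best=$snap
      fi
    fi
  done
  if [ -z "$best" ]; then
    echo "no snapshot at or before target" >&2
    exit 1
  fi
  echo "snapshot=$best replay_seconds=$((target - best))"
)

# Three daily snapshots (86400 s apart); the target is 10 hours after the second.
pick_base_snapshot 122400 0 86400 172800
```

The failure branch is the interesting one: if every retained snapshot is newer than the target, there is no base to replay from, which is exactly what a too-short window produces.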

A critical detail: changing the retention period applies going forward, not retroactively. Lengthening it from 1 to 14 days does not magically resurrect the 13 days you already discarded — RDS begins retaining new backups for 14 days, and the recoverable window grows to its full length over the next two weeks. This is why a short window is dangerous in a quiet way: by the time you realise you need more history, it's too late to add it. The widen-it-before-you-need-it ordering is non-negotiable.
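The forward-only growth is simple arithmetic: after lengthening, nothing new is discarded until it reaches the new age limit, so the recoverable window is the old window plus the days elapsed, capped at the new setting. A minimal sketch, assuming retention is being lengthened (the function name is hypothetical):

```shell
# Recoverable window, in days, some days after lengthening retention.
# Assumption: new >= old; the model ignores the few-minute log-shipping lag.
recoverable_days() (
  old=$1
  new=$2
  elapsed=$3
  grown=$((old + elapsed))
  if [ "$grown" -lt "$new" ]; then
    echo "$grown"
  else
    echo "$new"
  fi
)

# Going from 1 day to 14 days, checked at several points after the change.
for day in 0 5 13 20; do
  echo "day $day: $(recoverable_days 1 14 "$day") days recoverable"
done
```

On the day of the change the window is still 1 day; it reaches the full 14 days only after 13 more days have passed.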

Automated backups are distinct from two related tools. Manual snapshots are user-triggered, kept until you explicitly delete them, and are point-in-time copies — not a continuous PITR window. AWS Backup is a separate service for centralized, policy-driven retention that can hold RDS backups well beyond the 35-day automated-backup ceiling (months or years) for long-term and compliance archival. So the layering is: automated backups for the rolling PITR window (≤35 days), manual snapshots for ad-hoc checkpoints, and AWS Backup for retention beyond 35 days — see the companion lesson on protecting RDS with AWS Backup for the long-term tier.
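The layering reduces to one decision on the required retention length. A small sketch, with a hypothetical function name and labels, that routes a requirement to the tier described above:

```shell
# Map a required retention length (in days) to the tier that can provide it.
# Assumption: the labels are illustrative; the 35-day ceiling is from the lesson.
backup_tier() (
  days=$1
  if [ "$days" -lt 1 ]; then
    echo "invalid"
  elif [ "$days" -le 35 ]; then
    echo "automated-backups"   # rolling PITR window
  else
    echo "aws-backup"          # long-term, policy-driven retention
  fi
)

backup_tier 30     # fits inside the automated-backup ceiling
backup_tier 365    # needs the long-term tier
```

Manual snapshots sit outside this decision because they are ad-hoc checkpoints rather than a retention length.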

# Inspect an instance's retention window and backup configuration.
aws rds describe-db-instances \
  --db-instance-identifier prod-ledger-db \
  --query 'DBInstances[0].{Retention:BackupRetentionPeriod,Window:PreferredBackupWindow,LatestRestorable:LatestRestorableTime}' \
  --output json

# Widen retention to 14 days; applies to backups taken from now forward.
aws rds modify-db-instance \
  --db-instance-identifier prod-ledger-db \
  --backup-retention-period 14 \
  --apply-immediately

# Confirm the new retention and the earliest and latest points you can currently restore to.
aws rds describe-db-instance-automated-backups \
  --db-instance-identifier prod-ledger-db \
  --query 'DBInstanceAutomatedBackups[0].[BackupRetentionPeriod,RestoreWindow.EarliestTime,RestoreWindow.LatestTime]' \
  --output text

What is the impact of a too-short retention window?

The headline impact is recoverability. A retention window bounds how far back point-in-time recovery can reach. At 1 day you can only rewind 24 hours; at 7 days, a week; at 35 days, the maximum. The failures that cause the worst damage — a bad migration, a slow data-corrupting bug, an accidental bulk delete, a compromised credential quietly altering rows — are frequently discovered days after they begin, not in real time. If the corruption landed outside the window, the last clean copy of the data has already rolled off and there is nothing to restore to. The recovery simply does not exist.

The second impact is that the gap is invisible until the worst possible moment. A short window produces no alarm and no degraded behaviour in normal operation — the database runs perfectly. The exposure only materialises during an incident, when the team reaches for PITR and finds the target time is before the earliest restorable point. And because lengthening retention applies only going forward, you cannot fix it reactively: the history you needed is gone the instant you try to add it.
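The moment of discovery can be sketched as a pre-flight check before a restore. The helper names, the target instance name, and the timestamps below are assumptions for illustration; the restore-db-instance-to-point-in-time command and its flags are real, and it restores into a new instance rather than overwriting the source. The function only prints the command.

```shell
# Succeeds if the target time lies inside the restorable window.
# ISO 8601 UTC timestamps in one format compare correctly as strings.
in_restore_window() (
  awk -v t="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !((t "" >= lo "") && (t "" <= hi "")) }'
)

# Print the PITR command, or fail if the target has already rolled off.
pitr_command() (
  src=$1; dst=$2; target=$3; earliest=$4; latest=$5
  if ! in_restore_window "$target" "$earliest" "$latest"; then
    echo "target $target is outside the restorable window" >&2
    exit 1
  fi
  echo "aws rds restore-db-instance-to-point-in-time" \
       "--source-db-instance-identifier $src" \
       "--target-db-instance-identifier $dst" \
       "--restore-time $target"
)

# A 1-day window and a problem that started a week earlier (made-up times).
pitr_command prod-ledger-db prod-ledger-db-restored \
  2026-05-20T09:00:00Z 2026-05-26T03:00:00Z 2026-05-27T11:55:00Z \
  || echo "no restore possible"
```

Running the window check on a schedule, rather than during an incident, is one way to surface the gap before it matters.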

The third impact is compliance. Many regulatory regimes — PCI-DSS, HIPAA, SOC 2, and various financial and data-protection rules — require that recoverable data be retained for a defined minimum, frequently 30 days or more. A database carrying regulated data with a 1- or 7-day window can be a control failure in its own right, independent of any actual data loss, surfacing as an audit finding. For these systems the retention number isn't just a reliability choice; it's a documented obligation.

The cost impact runs the other way and is why this is almost always worth fixing. RDS includes backup storage free up to 100% of the total allocated storage of the database; you only pay (roughly $0.095 per GB-month in US-East) for backup storage that exceeds the database's own size. For most databases, extending retention from 1 day to 7, 14, or even 30 days stays within the free allotment or adds a trivial amount. Unlike most reliability improvements, this one has essentially no price tag — the only thing standing between you and a longer recovery window is the decision to widen it.
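The pricing rule above is a one-line calculation. The sketch below uses the lesson's simplified per-database model and its approximate $0.095 rate; the function name and the storage figures are made up for illustration, and your actual bill depends on region and current pricing.

```shell
# Estimate monthly backup-storage cost under the simplified model:
# storage up to the allocated size is free, the excess is billed per GB-month.
# Assumption: the 0.095 default rate is the lesson's approximate figure.
backup_cost() (
  allocated_gb=$1
  backup_gb=$2
  rate=${3:-0.095}
  awk -v a="$allocated_gb" -v b="$backup_gb" -v r="$rate" 'BEGIN {
    billable = b - a
    if (billable < 0) billable = 0
    printf "billable_gb=%d monthly_usd=%.2f\n", billable, billable * r
  }'
)

backup_cost 500 350   # backups fit inside the free allotment
backup_cost 500 620   # 120 GB over the allotment
```

Even the second case, with backups 120 GB over a 500 GB database, comes to roughly eleven dollars a month.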

How do you extend retention safely?

Remediation is a four-step loop: inventory the short-retention instances, set a target window per database against risk and compliance, widen the window, and prevent regressions so new databases launch correctly configured.

1. Inventory instances below the retention floor

Pull every RDS DB instance where BackupRetentionPeriod is below your floor (7 days is a common minimum; 0 means automated backups are off entirely — that's the separate enable-backups lesson, not this one). For each, capture the engine, allocated storage, current monthly cost, and most importantly the data-classification tag, so the next step can set the right target. Anything carrying regulated data needs special attention regardless of its current setting.

2. Set a target window per database

Decide per database against two inputs: how quickly you'd realistically detect a problem, and any compliance retention requirement. Production databases should clear at least 7 days; align it upward to your real detection lag if incidents tend to surface slowly. Regulated data (PCI, HIPAA, SOC 2, financial records) typically needs 30–35 days. Record the chosen window and its rationale (a BackupRetentionDays or RetentionTier tag works) so it's auditable and not re-litigated each scan.

3. Widen the window — and remember it's forward-only

Use modify-db-instance --backup-retention-period N. There's no downtime and no rebuild; RDS simply starts retaining backups for the longer period, and the recoverable window grows to full length over the following N days. Use --apply-immediately to start the clock now, or --no-apply-immediately to land it in the next maintenance window. The key discipline: do this before you need it, because lengthening retention never recovers history you've already discarded.

4. Prevent regressions and layer in AWS Backup for long-term needs

Make the floor a default: set backup_retention_period in your Terraform modules (BackupRetentionPeriod in CloudFormation) and add an AWS Config rule (the managed db-instance-backup-enabled rule can also enforce a minimum retention period) or SCP so new production databases can't launch below it. For retention beyond the 35-day automated-backup ceiling, such as long-term archival or multi-year compliance, use AWS Backup with a policy-driven plan rather than trying to stretch automated backups, which cannot exceed 35 days. See the companion lesson on protecting RDS with AWS Backup.

# 1. List instances below the 7-day floor (and not already disabled at 0).
aws rds describe-db-instances \
  --query "DBInstances[?BackupRetentionPeriod > \`0\` && BackupRetentionPeriod < \`7\`].DBInstanceIdentifier" \
  --output text

# 3. Widen a production instance to 14 days immediately (no downtime, forward-only).
aws rds modify-db-instance \
  --db-instance-identifier prod-ledger-db \
  --backup-retention-period 14 \
  --apply-immediately

# 3. Confirm the new retention took effect.
aws rds wait db-instance-available \
  --db-instance-identifier prod-ledger-db
aws rds describe-db-instances \
  --db-instance-identifier prod-ledger-db \
  --query 'DBInstances[0].BackupRetentionPeriod'
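Steps 2 and 4 can be combined into a small guard for a scheduled job or CI pipeline. This is a sketch under stated assumptions: the check_floor name, the three-column input format (identifier, retention days, classification), and the instance names other than prod-ledger-db are hypothetical. You would build the input from your own inventory and tags.

```shell
# Read rows of "<id> <retention_days> <classification>" on stdin and report
# any instance below its floor. Exits non-zero if at least one is found.
# Floors default to 7 days for production and 30 for regulated data.
check_floor() (
  awk -v prod="${1:-7}" -v reg="${2:-30}" '
    {
      need = ($3 == "regulated") ? reg : prod
      # Retention 0 means backups are disabled, which is a separate finding.
      if ($2 + 0 > 0 && $2 + 0 < need) {
        printf "%s retention=%d floor=%d\n", $1, $2, need
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
)

printf '%s\n' \
  'prod-ledger-db 1 regulated' \
  'prod-orders-db 14 production' \
  'prod-audit-db 14 regulated' | check_floor || echo "retention floor violated"
```

Because the function returns a non-zero status on any violation, it can fail a pipeline directly without extra parsing.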

Quick quiz

prod-ledger-db is flagged with BackupRetentionPeriod=1. A migration bug nine days ago corrupted some rows, just discovered today. What's the situation and the right next move?

Keep learning

Go deeper on RDS automated backups, point-in-time recovery, the retention dial, and the AWS Backup tier for retention beyond 35 days.

You've completed Extend RDS backup retention. You now know how automated backups and point-in-time recovery work, why the 1-to-35-day retention window decides how far back you can recover, why a short default leaves you unable to undo problems found late, the exact modify-db-instance --backup-retention-period flag that widens it with no downtime, why the change is forward-only so you must do it before you need it, and where AWS Backup picks up beyond 35 days. Next time the check fires, you'll know to set a deliberate window matched to risk and compliance.

RDS backup retention: what it means for exposure

How far back you can recover, for a cost that rounds to zero

Amazon RDS automatically backs up production databases so the team can rewind to an earlier moment if something goes wrong. The retention period is how far back that rewind goes — anywhere from 1 to 35 days. This control flags databases where that window is set too short, commonly just a single day. Backups are on; the problem is they don't reach back far enough to be useful.

Why does the length matter? Because the kinds of failures that cause real damage — a faulty software change, data quietly corrupting, an accidental mass deletion — are frequently noticed days after they occur, not minutes. If the database can only rewind 24 hours, the last good copy of the data is already gone by the time anyone realises there's a problem. A longer window is the difference between "we restored and lost nothing" and "that data is unrecoverable."

The unusual thing here is the cost. Backup storage is free up to the size of the database itself, and only charged at roughly $0.095 per GB per month beyond that — so extending retention from 1 day to 7, 14, or 30 days typically costs little or nothing. This is one of the rare risk-reduction levers with almost no price tag. The right framing is exposure, not spend: a too-short window is an uninsured gap, and closing it is nearly free.

This lesson is for the finance partner who sees a small "backup storage" line on the RDS bill and needs to understand what the retention window buys and what it costs. It explains why retention length is really a recovery-exposure dial, why widening it is nearly free (backups are free up to the database size), how to think about the right window for production versus regulated data, and the governance levers — a documented retention standard, tracking instances below the floor, and a standing review — that keep the exposure closed. No commands required.

How a finance partner frames the retention decision

Dana is the finance partner working with the platform team. At the monthly review the reliability dashboard shows nine databases below the 7-day retention floor — nine recovery windows too short to be useful. Rather than treat it as a cost question, Dana asks the exposure question: how far back can we actually recover each of these, and is that enough given how quickly we'd realistically notice a problem?

The team sorts them by data classification. Three carry regulated data with a documented retention requirement, so those go straight to 30 days. The rest are production databases that should sit at the 7-day floor or a little more. Dana confirms the key fact that makes this easy to approve: backup storage is free up to the size of each database, so widening every one of these windows costs essentially nothing — there's no 2x premium, no variable usage surprise, just a near-zero line item.

Dana's takeaway for the finance pack is one line: "Every production database now has a recovery window matched to its risk and its compliance obligations, at no meaningful cost." The next time someone asks "if we corrupt data, how far back can we recover?", the answer is a recorded number per database — not a guess, and not the launch-wizard default nobody chose.

Why this matters to the risk register, not the bill

The cost side of this control is almost a non-event, which is what makes it unusual. RDS bundles backup storage free up to the size of the database itself, and only charges (around $0.095 per GB-month) for backup storage beyond that. Extending a retention window from 1 day to 7, 14, or 30 days usually stays inside the free allotment, so for most databases the spend impact rounds to zero. There's no 2x premium and no variable usage component to forecast.

Because the cost is negligible and the benefit is binary — either you can recover to the moment before the problem or you can't — this isn't a tiering-by-spend decision the way some resilience controls are. The finance contribution is to insist the window is set deliberately against business risk and compliance obligation, not left at a default. A simple standard — production at least 7 days, regulated data 30–35 — closes the exposure at no meaningful cost.

On the risk register, a too-short window is a quantifiable, uninsured exposure: the value of the data that becomes unrecoverable if a problem is found late, against a backup-storage cost that is usually zero. Framed that way, there is no trade-off to weigh — widening the window is free insurance against an event whose cost can be catastrophic. The finding is simply the trigger to set the number on purpose.

Finally, treat the compliance dimension as first-class. For databases subject to a retention requirement, a short window can be an audit finding even if no data is ever lost. A clean picture is one where every regulated database meets or exceeds its mandated retention and every production database clears the floor — both verifiable, both defensible at audit, and both achieved without a budget conversation.

What finance can actually do about retention

Finance can't run modify-db-instance, but it owns the standard that turns retention from an inherited default into a deliberate, near-free risk decision. Four levers, used at the regular cadence.

1. Set a documented retention standard

Agree a simple rule with engineering: production databases retain at least 7 days, regulated data 30–35. Because backup storage is free up to the database size, this is a standard you can mandate without a budget impact — making it the default removes the per-database debate and closes the exposure by policy rather than by chance.

2. Track instances below the floor as an exposure, not a cost

Put the count of databases below the retention floor on the reliability review, framed as recovery exposure rather than spend, since the spend is near zero. The number that matters is production and regulated databases under the floor; a short window on a throwaway dev box is fine. This keeps the conversation on uninsured risk.

3. Require a recorded reason for any exception

Any production or regulated database left below the standard should carry a documented, finance-visible justification, not a silently short window. That converts "nobody set it" into "we made a decision," which is what survives an audit or a post-incident review — and given the near-zero cost, genuine exceptions should be rare.

4. Tie the regulated tier to the compliance obligation

For databases under PCI, HIPAA, SOC 2, or financial-records rules, the retention window is often a stated control requirement, not a judgment call. Treat meeting it as mandatory and verify it on the same cadence as other compliance evidence, so a short window never becomes an audit finding for the sake of a free configuration change.

Quick quiz

A scan shows nine databases below the 7-day retention floor: three carry regulated data, six are production. As the finance partner, what's the right approach?

You've finished the finance partner's view of RDS backup retention. You know the window length is really a recovery-exposure dial, that widening it is essentially free because backups are free up to the database size, why it's a deliberate decision rather than a spend trade-off, and the four levers — a documented standard, tracking instances below the floor as exposure, recorded exceptions, and tying the regulated tier to compliance — that close the gap at no meaningful cost. Next time it shows up, you'll have a sharper question than "what does this add to the bill?"

RDS backup retention: the headline

Whether you can actually recover when a problem is found late

Databases are backed up automatically, but the backups only reach back as far as the retention window is set — and that window is often left at its short default. If a data problem is discovered a few days after it started, a short window means the clean version is already gone and the loss is permanent. This control flags databases whose recovery window is too short to be safe.

This is a business-continuity decision with almost no cost attached. Extending the window is essentially free, while a too-short window signals that recovery capability has been left to a default rather than chosen deliberately. The healthy end state is a retention period matched to how quickly problems are realistically detected — at least a week for production, longer where compliance requires it — recorded as a decision, not an accident.

A short read for the executive who wants to know what this control protects and the one question to ask. You'll get the plain-English version of why a backup window that's too short means problems found late can't be undone, why extending it is essentially free, and what "good" looks like: a retention period chosen to match how fast problems are detected, longer where regulation demands it, and recorded as a deliberate decision. No implementation detail.

What it looks like when the org gets this right

At one company the CTO, Priya, used to get a hand-wavy "yes, we have backups" whenever she asked about data recovery. The honest follow-up question — "how far back?" — had no clear answer, because retention had been left at whatever default each database was created with. Some could rewind a month; some could only rewind a day.

After adopting backup retention as a tracked control, the answer changed shape: a one-page view showing every production database's recovery window, with regulated systems at 30 days and the rest at a deliberate floor. The point Priya cared about wasn't a green checkmark — it was that the recovery window had been chosen to match how long a problem might hide before anyone noticed, and written down.

That's the right end state for this control: not "maximum retention everywhere," but "every database's recovery window is a recorded decision matched to its risk." And because extending the window is essentially free, there was no cost trade-off to debate — just a default to fix and an exception list to keep honest.

Why this is on the report at all

This control answers a question every executive eventually faces after an incident: "if our data was corrupted or deleted, how far back can we recover?" With a too-short window, the honest answer is "only if the problem was caught quickly", and the most damaging problems are exactly the ones found slowly. RDS backup retention turns that abstract recoverability question into a concrete, tracked number per database.

The reason it's a leadership-visible item despite being nearly free is precisely that it's nearly free. There is no cost trade-off to debate, so a too-short window signals only one thing: recovery capability was left to a default rather than chosen. The healthy signal is that every production database clears a deliberate floor, every regulated database meets its mandated retention, and the window length is a recorded decision matched to how quickly problems are realistically detected — resilience by policy, not by accident.

The leadership move on retention

The executive handle isn't about cost — it's about ensuring every database's recovery window is a deliberate decision matched to its risk, given the change is essentially free.

1. Set a default floor for production

Make it policy that any production database retains backups for at least a week, longer where data is regulated. Because extending the window costs almost nothing, there's no reason for the floor to be optional — a clear default ensures the systems that matter can be recovered even when a problem is found late.

2. Match the window to detection lag

The right window isn't "as long as possible" — it's "longer than the time it realistically takes to notice a problem." Ask the team how long a data issue might hide before anyone catches it, and set retention comfortably beyond that. That reasoning, recorded, is what turns the number into a decision.

3. Treat a short window as a signal, not a spend item

Because widening is free, a too-short window on a critical database signals only that recovery was left to a default. At the leadership review the one-line question is "does every production-critical and regulated database have a recovery window matched to its risk?" A consistent yes is a one-minute confidence signal.

Quick quiz

Your reliability review shows every production-critical database with a recovery window matched to its detection lag, and regulated databases at 30 days, all recorded. What's the right read?

That's the lesson. Two takeaways: a too-short retention window means problems found late can't be undone, and extending it is essentially free, so a short window reflects a default nobody chose rather than a cost trade-off. The leadership question is whether every production-critical and regulated database has a recovery window matched to its risk and recorded as a choice, not whether the number is at maximum.

Part of the learning path Build in resilience
