Extend RDS backup retention

Automated backups are on, but the retention window is too short — corruption or a bad change discovered after the window rolls off is unrecoverable.

RDS backup retention: the basics

Why a short retention window is a recovery you don't have

Amazon RDS automated backups give you point-in-time recovery (PITR): the ability to restore a database to any second within a retention window, up to the latest restorable time, which typically trails the present by about five minutes. RDS does this by taking a daily storage-level snapshot and continuously shipping transaction logs to S3. The retention period is the length of that window, configurable from 1 to 35 days (a value of 0 turns automated backups off). This lesson assumes automated backups are already enabled (a separate lesson covers turning them on); the issue here is that the window is set too short.

The check flags instances whose BackupRetentionPeriod is below a recommended floor — typically 7 days, with 30–35 days expected for regulated data. A retention of 1 day means you can only rewind 24 hours. That sounds fine until you realise the failures that actually hurt — a bad migration, a silent corruption, an accidental bulk delete, a logic bug that slowly poisons data — are often discovered days after they happen. If the damage landed outside the window, the clean version of the data has already rolled off and there is nothing left to restore to.
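
As a rough illustration of the check described above, the flagging logic reduces to comparing one number against a floor. This is a minimal sketch, not the scanner's actual implementation: the function name classify_retention and its labels are hypothetical, and the floor defaults to the 7 days mentioned in the text.

```shell
# Hypothetical helper (not part of the AWS CLI): classify a
# BackupRetentionPeriod value against a retention floor.
classify_retention() {
  local days="$1"
  local floor="${2:-7}"   # assumed default floor of 7 days, per the lesson
  if [ "$days" -eq 0 ]; then
    echo "disabled"       # automated backups are off: a separate finding
  elif [ "$days" -lt "$floor" ]; then
    echo "short"          # backups on, but the window is below the floor
  else
    echo "ok"
  fi
}

classify_retention 1      # prints: short
classify_retention 7      # prints: ok
classify_retention 7 30   # prints: short (regulated floor of 30 days)
```

Passing the floor as a second argument lets the same check serve both the 7-day production minimum and a stricter floor for regulated data.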

It's flagged because retention length is the single dial that decides how far back you can recover, and the default people inherit from the launch wizard or a quick script is usually too small (the RDS API and CLI fall back to 1 day when the parameter is omitted). The cost of widening it is near zero (backup storage is free up to your allocated database size), while the cost of discovering you can't recover is total. The fix is simply lengthening the window. Because the longer window only fills with history from the change forward, it's worth doing before you need it.

In this lesson you'll learn how RDS automated backups and point-in-time recovery actually work — the daily snapshot, the continuous transaction-log shipping, and the 1-to-35-day retention window that bounds how far back you can restore. You'll see how to inventory which instances have a dangerously short window, the single AWS CLI flag that widens it, why the change applies on the next backup rather than retroactively, how automated backups differ from manual snapshots and from AWS Backup for retention beyond 35 days, and why production should default to at least 7 days and regulated data to 30–35.

Fun fact

The corruption that hid for nine days

A SaaS team shipped a migration with a subtle off-by-one that mis-linked a fraction of records on write. Nothing alerted — the app ran fine. Nine days later a customer report surfaced the corruption. The team reached for point-in-time recovery to restore to the moment before the migration, and discovered their prod database had been left at the 1-day default retention the launch wizard set years earlier. The clean state had rolled off eight days prior. There was no point in time to recover to. The fix afterward took one command and cost almost nothing in storage — --backup-retention-period 30 — but the data reconstruction took three engineers two weeks. The window you didn't widen is the recovery you don't have.

Extending a too-short retention window in action

Lena runs the reliability review at a fintech. The dashboard flags prod-ledger-db, a db.r6g.xlarge Postgres instance, with BackupRetentionPeriod: 1. Automated backups are on, but a one-day window means a problem found on a Friday that started Tuesday is unrecoverable. For a ledger database, that's an unacceptable exposure.

She confirms the scope first: describe-db-instances shows the retention at 1 day and the backup window set, so this is genuinely a short-retention finding, not a backups-disabled one. She checks the data classification — the Compliance=pci tag means this database is subject to a retention requirement well beyond a week, so 30 days is the target, not the 7-day floor.

Lena runs modify-db-instance with --backup-retention-period 30 --apply-immediately. There's no rebuild and no downtime — RDS simply begins retaining logs and snapshots for the longer window; the change takes effect from the next backup forward, so the window grows out to 30 days over the coming month. A re-scan clears the finding, and the ledger now has a month of point-in-time recovery for essentially no added cost.

First, find every RDS DB instance with a retention window below the 7-day floor — these are the short-retention findings.

$ aws rds describe-db-instances --query "DBInstances[?BackupRetentionPeriod < \`7\`].{Id:DBInstanceIdentifier,Class:DBInstanceClass,Engine:Engine,Retention:BackupRetentionPeriod}" --output table
------------------------------------------------------------------
|                      DescribeDBInstances                       |
+------------------+----------------+-----------+----------------+
| Id               | Class          | Engine    | Retention      |
+------------------+----------------+-----------+----------------+
| prod-ledger-db   | db.r6g.xlarge  | postgres  | 1              |
| prod-auth-db     | db.r6g.large   | postgres  | 3              |
| dev-sandbox-db   | db.t3.medium   | mysql     | 1              |
+------------------+----------------+-----------+----------------+
# Two production databases can only rewind 1-3 days — too short to catch late-found corruption.

Short-retention inventory; cross-reference each Id against its data classification before choosing a target window.

For a production instance, widen the retention window. There's no rebuild and no downtime — the longer window takes effect from the next backup forward.

$ aws rds modify-db-instance --db-instance-identifier prod-ledger-db --backup-retention-period 30 --apply-immediately
{
  "DBInstance": {
    "DBInstanceIdentifier": "prod-ledger-db",
    "DBInstanceStatus": "available",
    "BackupRetentionPeriod": 1,
    "PendingModifiedValues": {
      "BackupRetentionPeriod": 30
    }
  }
}
# Retention flips to 30 once the pending change applies; the window grows out over the following month as backups accumulate.

The --backup-retention-period flag widens the PITR window; it applies going forward, not retroactively, so do it before you need it.

RDS backup retention under the hood (deep dive)

RDS automated backups are two mechanisms working together. Once a day, during the configured backup window, RDS takes a storage-volume snapshot of the database. Continuously, it streams the database's transaction logs to Amazon S3. Point-in-time recovery works by taking the most recent snapshot at or before your target time and replaying transaction logs forward to the exact second you ask for — typically restoring to any point from the start of the retention window up to about the last five minutes. The BackupRetentionPeriod (1–35 days) is simply how long both the snapshots and the logs are kept; set it to 0 and automated backups are disabled entirely.

A critical detail: changing the retention period applies going forward, not retroactively. Lengthening it from 1 to 14 days does not magically resurrect the 13 days you already discarded — RDS begins retaining new backups for 14 days, and the recoverable window grows to its full length over the next two weeks. This is why a short window is dangerous in a quiet way: by the time you realise you need more history, it's too late to add it. The widen-it-before-you-need-it ordering is non-negotiable.
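
The forward-only behaviour can be sketched as simple arithmetic. The model below is a simplification: it assumes the history available at the moment of the change equals the old retention period, and that one more day becomes recoverable each day afterwards. The helper recoverable_days is hypothetical, not an AWS command.

```shell
# Hypothetical helper: days of history you can restore to, ELAPSED days
# after lengthening retention from OLD to NEW (simplified model).
recoverable_days() {
  local old="$1" new="$2" elapsed="$3"
  local reach=$(( old + elapsed ))
  if [ "$reach" -gt "$new" ]; then
    reach="$new"          # the window never exceeds the configured period
  fi
  echo "$reach"
}

# Lengthening from 1 to 14 days: the window fills in gradually.
for day in 0 5 13 20; do
  echo "day $day: $(recoverable_days 1 14 "$day") days recoverable"
done
# day 0: 1 days recoverable
# day 5: 6 days recoverable
# day 13: 14 days recoverable
# day 20: 14 days recoverable
```

On day 0 the reach is still the old value, which is the whole point: the change buys nothing for an incident that has already happened.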

Automated backups are distinct from two related tools. Manual snapshots are user-triggered, kept until you explicitly delete them, and are point-in-time copies — not a continuous PITR window. AWS Backup is a separate service for centralized, policy-driven retention that can hold RDS backups well beyond the 35-day automated-backup ceiling (months or years) for long-term and compliance archival. So the layering is: automated backups for the rolling PITR window (≤35 days), manual snapshots for ad-hoc checkpoints, and AWS Backup for retention beyond 35 days — see the companion lesson on protecting RDS with AWS Backup for the long-term tier.

# Inspect an instance's retention window and backup configuration.
aws rds describe-db-instances \
  --db-instance-identifier prod-ledger-db \
  --query 'DBInstances[0].{Retention:BackupRetentionPeriod,Window:PreferredBackupWindow,LatestRestorable:LatestRestorableTime}' \
  --output json

# Widen retention to 14 days; applies to backups taken from now forward.
aws rds modify-db-instance \
  --db-instance-identifier prod-ledger-db \
  --backup-retention-period 14 \
  --apply-immediately

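# Confirm the new retention and the latest point you can currently restore to.
aws rds describe-db-instances \
  --db-instance-identifier prod-ledger-db \
  --query 'DBInstances[0].[BackupRetentionPeriod,LatestRestorableTime]' \
  --output text

# The earliest restorable point is reported on the automated-backup record, not on the instance.
aws rds describe-db-instance-automated-backups \
  --db-instance-identifier prod-ledger-db \
  --query 'DBInstanceAutomatedBackups[0].RestoreWindow' \
  --output json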

What is the impact of a too-short retention window?

The headline impact is recoverability. A retention window bounds how far back point-in-time recovery can reach. At 1 day you can only rewind 24 hours; at 7 days, a week; at 35 days, the maximum. The failures that cause the worst damage — a bad migration, a slow data-corrupting bug, an accidental bulk delete, a compromised credential quietly altering rows — are frequently discovered days after they begin, not in real time. If the corruption landed outside the window, the last clean copy of the data has already rolled off and there is nothing to restore to. The recovery simply does not exist.

The second impact is that the gap is invisible until the worst possible moment. A short window produces no alarm and no degraded behaviour in normal operation; the database runs perfectly. The exposure only materialises during an incident, when the team reaches for PITR and finds the target time is before the earliest restorable point. And because lengthening retention applies only going forward, you cannot fix it reactively: the history you needed was discarded before you tried to extend the window.
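
The incident-time discovery described above amounts to a bounds check. A minimal sketch, assuming all three timestamps are UTC and share the ISO 8601 form YYYY-MM-DDTHH:MM:SSZ; within_restore_window and to_number are hypothetical helpers, the sample dates are illustrative, and the earliest and latest bounds are passed in as plain arguments rather than fetched from AWS.

```shell
# Strip everything but digits so 2024-05-10T09:30:00Z becomes 20240510093000,
# which compares correctly as an integer when the formats are identical.
to_number() {
  printf '%s' "$1" | tr -cd '0-9'
}

# Hypothetical helper: is TARGET inside [EARLIEST, LATEST]?
within_restore_window() {
  local target earliest latest
  target=$(to_number "$1")
  earliest=$(to_number "$2")
  latest=$(to_number "$3")
  if [ "$target" -lt "$earliest" ]; then
    echo "too-old"        # the clean state has already rolled off
  elif [ "$target" -gt "$latest" ]; then
    echo "too-recent"     # this moment is not yet restorable
  else
    echo "restorable"
  fi
}

# A 1-day window cannot reach a target nine days back.
within_restore_window 2024-05-01T08:00:00Z 2024-05-09T09:00:00Z 2024-05-10T08:55:00Z
# prints: too-old
```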

The third impact is compliance. Many regulatory and audit regimes (PCI DSS, HIPAA, SOC 2, and various financial and data-protection rules), or the internal policies written to satisfy them, expect recoverable data to be retained for a defined minimum, frequently 30 days or more. A database carrying regulated data with a 1- or 7-day window can be a control failure in its own right, independent of any actual data loss, surfacing as an audit finding. For these systems the retention number isn't just a reliability choice; it's a documented obligation.

The cost impact runs the other way and is why this is almost always worth fixing. RDS includes backup storage at no additional charge up to 100% of your total provisioned database storage in a Region; you only pay (roughly $0.095 per GB-month in US-East) for backup storage beyond that allotment. For most databases, extending retention from 1 day to 7, 14, or even 30 days stays within the free allotment or adds a trivial amount. Unlike most reliability improvements, this one has essentially no price tag: the only thing standing between you and a longer recovery window is the decision to widen it.
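
To make the pricing rule concrete, here is a back-of-the-envelope sketch. It treats a single database in isolation and uses the approximate US-East rate quoted above; the helper name backup_overage_cost and the sample sizes are hypothetical, and the backup storage actually consumed depends on how much data changes within the window.

```shell
# Hypothetical helper: monthly charge for backup storage beyond the free
# allotment. Arguments: allocated GB, backup GB, optional rate per GB-month.
backup_overage_cost() {
  LC_ALL=C awk -v alloc="$1" -v used="$2" -v rate="${3:-0.095}" 'BEGIN {
    over = used - alloc
    if (over < 0) over = 0          # inside the free allotment
    printf "%.2f\n", over * rate
  }'
}

backup_overage_cost 500 400   # prints: 0.00  (backups fit in the free allotment)
backup_overage_cost 500 650   # prints: 14.25 (150 GB over at $0.095)
```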

How do you extend retention safely?

Remediation is a four-step loop: inventory the short-retention instances, set a target window per database against risk and compliance, widen the window, and prevent regressions so new databases launch correctly configured.

1. Inventory instances below the retention floor

Pull every RDS DB instance where BackupRetentionPeriod is below your floor (7 days is a common minimum; 0 means automated backups are off entirely — that's the separate enable-backups lesson, not this one). For each, capture the engine, allocated storage, current monthly cost, and most importantly the data-classification tag, so the next step can set the right target. Anything carrying regulated data needs special attention regardless of its current setting.

2. Set a target window per database

Decide per database against two inputs: how quickly you'd realistically detect a problem, and any compliance retention requirement. Production databases should clear at least 7 days; align it upward to your real detection lag if incidents tend to surface slowly. Regulated data (PCI, HIPAA, SOC 2, financial records) typically needs 30–35 days. Record the chosen window and its rationale (a BackupRetentionDays or RetentionTier tag works) so it's auditable and not re-litigated each scan.

3. Widen the window — and remember it's forward-only

Use modify-db-instance --backup-retention-period N. There's no downtime and no rebuild; RDS simply starts retaining backups for the longer period, and the recoverable window grows to full length over the following N days. Use --apply-immediately to start the clock now, or --no-apply-immediately to land it in the next maintenance window. The key discipline: do this before you need it, because lengthening retention never recovers history you've already discarded.

4. Prevent regressions and layer in AWS Backup for long-term needs

Make the floor a default: set the retention property in your infrastructure-as-code modules (BackupRetentionPeriod in CloudFormation, backup_retention_period in Terraform) and add a guardrail such as an AWS Config rule (the managed db-instance-backup-enabled rule can optionally check the retention period as well) or an SCP, so new production databases below the floor are caught. For retention beyond the 35-day automated-backup ceiling, such as long-term archival or multi-year compliance, use AWS Backup with a policy-driven plan rather than trying to stretch automated backups, which cannot exceed 35 days. See the companion lesson on protecting RDS with AWS Backup.

# 1. List instances below the 7-day floor (and not already disabled at 0).
aws rds describe-db-instances \
  --query "DBInstances[?BackupRetentionPeriod > \`0\` && BackupRetentionPeriod < \`7\`].DBInstanceIdentifier" \
  --output text

# 3. Widen a production instance to 14 days immediately (no downtime, forward-only).
aws rds modify-db-instance \
  --db-instance-identifier prod-ledger-db \
  --backup-retention-period 14 \
  --apply-immediately

# 3. Confirm the new retention took effect.
aws rds wait db-instance-available \
  --db-instance-identifier prod-ledger-db
aws rds describe-db-instances \
  --db-instance-identifier prod-ledger-db \
  --query 'DBInstances[0].BackupRetentionPeriod'
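
Step 2 has no command of its own, so here is one way the target-window decision could be expressed. This is a sketch under stated assumptions: target_retention is a hypothetical helper, the regulated floor is taken as 30 days (the low end of the 30 to 35 range above), and the detection lag is whatever your team estimates.

```shell
# Hypothetical helper: choose a retention target in days.
# Arguments: estimated detection lag in days, optional "yes" for regulated data.
target_retention() {
  local lag="$1"
  local regulated="${2:-no}"
  local target=7                        # production floor from step 2
  if [ "$lag" -gt "$target" ]; then
    target="$lag"                       # align upward to real detection lag
  fi
  if [ "$regulated" = "yes" ] && [ "$target" -lt 30 ]; then
    target=30                           # assumed floor for regulated data
  fi
  if [ "$target" -gt 35 ]; then
    target=35                           # automated backups cap at 35 days
  fi
  echo "$target"
}

target_retention 3        # prints: 7
target_retention 10       # prints: 10
target_retention 3 yes    # prints: 30
target_retention 60       # prints: 35
```

The result can be passed to the flag used in step 3, for example --backup-retention-period "$(target_retention 3 yes)". A detection lag above 35 days is a signal to layer in AWS Backup rather than stretch automated backups.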

Quick quiz

prod-ledger-db is flagged with BackupRetentionPeriod=1. A migration bug nine days ago corrupted some rows, just discovered today. What's the situation and the right next move?

You've completed Extend RDS backup retention. You now know how automated backups and point-in-time recovery work, why the 1-to-35-day retention window decides how far back you can recover, why a short default leaves you unable to undo problems found late, the exact modify-db-instance --backup-retention-period flag that widens it with no downtime, why the change is forward-only so you must do it before you need it, and where AWS Backup picks up beyond 35 days. Next time the check fires, you'll know to set a deliberate window matched to risk and compliance.
