Site Reliability

Enable automated backups on RDS

A retention period of 0 disables automated backups entirely — and with them point-in-time recovery, leaving the database one bad query or deploy away from unrecoverable data loss.

12 min·10 sections·AWS

Last reviewed 27 May 2026

RDS automated backups: the basics

Why a retention period of 0 is a silent recovery gap

Amazon RDS has a built-in safety net called automated backups. When it's on, RDS takes a daily snapshot during a configured backup window and continuously streams the database's transaction logs to S3-backed storage. Together those two things let you restore the database to any second within the retention window, up to the latest restorable time (typically within the last five minutes). That capability is point-in-time recovery (PITR). The retention window is controlled by one setting, BackupRetentionPeriod, which can be anywhere from 1 to 35 days.

Setting BackupRetentionPeriod to 0 turns the entire mechanism off. No daily snapshot, no transaction-log capture, and — the part people miss — no point-in-time recovery at all. The check flags any RDS DB instance running with a retention period of 0. It's almost always an accident: a value left at zero in a Terraform module, a copy-pasted launch config, or a deliberate choice for a throwaway database that quietly graduated into something production depends on.

It's flagged because the failure mode is invisible until the worst possible moment. The database runs fine for months. Then a bad migration, a DELETE without a WHERE, a corrupted write, or a fat-fingered deploy mangles the data — and there is nothing to roll back to. With backups off, recovery isn't slow; it's impossible. The protection costs almost nothing, which makes leaving it off one of the cheapest mistakes to fix and most expensive to discover.

In this lesson you'll learn how RDS automated backups actually work — the daily snapshot plus continuous transaction-log capture that together enable point-in-time recovery to any second in the window — and why a retention period of 0 silently removes that capability. You'll see how to inventory which instances have backups disabled, the exact AWS CLI flag to enable them, what the change does to a running instance (no downtime, with a one-time backup), why the backup window has negligible performance impact (especially on Multi-AZ, where the backup runs against the standby), the 1–35 day retention range and what drives the choice, the near-zero cost (free up to the DB's allocated size), and how this differs from a centralized AWS Backup plan.

Fun fact

The 30-second rewind that saved a Friday deploy

A growth-stage SaaS team shipped a migration on a Friday afternoon that accidentally ran an UPDATE against the wrong table, blanking a column on two million customer rows. Panic, until someone remembered the production database had a 14-day backup retention. They spun up a point-in-time restore to the second just before the migration ran, copied the good column back, and were whole again inside an hour. The total backup storage bill that had bought them that escape hatch: $0. The backups were smaller than the database's allocated storage, so they fit entirely within the free backup allowance. A sibling service that had been launched with retention set to 0 wasn't so lucky six months later.

Enabling RDS backups in action

Priya runs the reliability cadence at a mid-sized fintech. The dashboard flags payments-ledger-db, a db.r6g.large Postgres instance, with BackupRetentionPeriod: 0 — automated backups completely off. It's been that way since launch; the Terraform module it was created from had the retention default left at zero and nobody noticed. This is the database behind settlement, so a corrupting write with no recovery path would be a reportable incident.

She confirms the scope first: describe-db-instances shows BackupRetentionPeriod: 0 and a Multi-AZ deployment, which means enabling backups will take the daily snapshot from the standby with zero impact on the primary. She picks a 7-day retention to start (the team's standard for transactional data) and a 03:00-04:00 UTC backup window that sits in the service's quiet hours.

Priya runs modify-db-instance with --backup-retention-period 7, the window, and --apply-immediately. RDS takes one immediate full backup and begins streaming transaction logs; the instance stays online and serving the whole time. A few minutes later the change settles, point-in-time recovery is live to any second in the trailing week, and the next scan flips the finding to resolved. The ledger went from zero recovery options to a 7-day rewind window — at no measurable cost, because the backup storage fits inside the database's free allocation.

First, find every RDS DB instance with automated backups disabled — these are the instances running with no point-in-time recovery.

$ aws rds describe-db-instances --query "DBInstances[?BackupRetentionPeriod==\`0\`].{Id:DBInstanceIdentifier,Class:DBInstanceClass,Engine:Engine,Retention:BackupRetentionPeriod,MultiAZ:MultiAZ}" --output table

-----------------------------------------------------------------------------
|                            DescribeDBInstances                            |
+--------------------+---------------+----------+-----------+--------------+
[table header and instance rows not preserved in the source]
+--------------------+---------------+----------+-----------+--------------+

# Two production databases with no recovery path; the scratch DB is genuinely throwaway.

Retention-zero inventory; cross-reference each Id against its data class before deciding which must be enabled.

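For a data-bearing instance, enable automated backups with a sensible retention window and an off-peak backup window. The instance stays online throughout, though AWS notes that enabling or disabling backups can cause a brief I/O suspension (a few seconds to a few minutes, depending on instance size and class).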

$ aws rds modify-db-instance --db-instance-identifier payments-ledger-db --backup-retention-period 7 --preferred-backup-window 03:00-04:00 --apply-immediately

{
    "DBInstance": {
        "DBInstanceIdentifier": "payments-ledger-db",
        "DBInstanceStatus": "available",
        "BackupRetentionPeriod": 0,
        "PreferredBackupWindow": "03:00-04:00",
        "PendingModifiedValues": {
            "BackupRetentionPeriod": 7
        }
    }
}

# RDS takes one immediate backup, then continuous log capture begins; PITR live within minutes.

The --backup-retention-period flag turns backups on; the first enable triggers a one-time backup, with no planned downtime beyond a possible brief I/O pause.

RDS automated backups under the hood (deep dive)

An automated backup is two cooperating mechanisms. Once a day, during the PreferredBackupWindow, RDS takes a storage-level snapshot of the volume — incremental after the first, so only changed blocks are copied. Separately and continuously, RDS uploads the database's write-ahead/transaction logs to Amazon S3 every few minutes. To restore to a specific moment, RDS provisions a new instance from the most recent daily snapshot before that moment and then replays the transaction logs forward to the exact second you asked for. That replay is what makes point-in-time recovery possible, and it's why setting BackupRetentionPeriod to 0 — which stops the log capture — removes PITR entirely, not just the daily snapshot.
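The restore path described above can be sketched with the AWS CLI's restore-db-instance-to-point-in-time command. This is a minimal illustration, not a runbook: the wrapper function name, the target identifier, and the timestamp in the usage comment are hypothetical, and a real restore would usually also set the instance class, subnet group, and security groups for the new instance.

```shell
# Sketch: restore a NEW instance from automated backups to a chosen second.
# restore_to_point_in_time is a hypothetical helper name, not an AWS command.
restore_to_point_in_time() {
  local source_id="$1" target_id="$2" restore_time="$3"
  # RDS restores into a new instance; the source instance is left untouched.
  # restore_time is a UTC timestamp and must not be later than the source's
  # LatestRestorableTime.
  aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$source_id" \
    --target-db-instance-identifier "$target_id" \
    --restore-time "$restore_time"
}

# Example call (hypothetical target name and timestamp, not run here):
# restore_to_point_in_time payments-ledger-db payments-ledger-db-restored 2026-05-22T14:42:09Z
```

To restore to the most recent recoverable moment instead of a specific second, the same command accepts --use-latest-restorable-time in place of --restore-time.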

The performance cost of the backup window is small and bounded. On a single-AZ instance there can be a brief I/O suspension, typically a few seconds, at the instant the daily snapshot is initiated; on a Multi-AZ instance running MariaDB, MySQL, Oracle, or PostgreSQL, RDS takes the snapshot from the standby, so the primary's I/O is not suspended. Continuous log shipping runs in the background and is negligible. This is why the standard advice is to set a backup window in off-peak hours rather than to avoid backups: the protection is effectively free in both dollars and performance.

On cost: AWS provides backup storage free of charge up to the total allocated storage of the source database. Only backup storage that exceeds the database's allocated size is billed, at roughly $0.095 per GB-month in us-east-1. So a 200 GB database with 7-day retention whose backups stay under 200 GB pays nothing; retention only starts costing money once the accumulated snapshots plus logs exceed the allocated size. Note that automated backups are deleted by default when the instance is deleted, unless you choose to retain them; a final manual snapshot survives deletion but restores only to the moment it was taken. Automated backups are also distinct from, and complementary to, a centralized AWS Backup plan, which manages cross-service, cross-account, and longer-term retention policies on top of the per-instance setting.
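The billing rule above reduces to simple arithmetic: only the backup storage above the allocated size is charged. The sketch below assumes the single-instance model and the approximate $0.095 per GB-month rate quoted in this lesson; the function name billable_backup_cost is hypothetical, and actual pricing varies by region and engine.

```shell
# billable_backup_cost ALLOCATED_GB BACKUP_GB [PRICE_PER_GB_MONTH]
# Prints the estimated monthly backup storage charge in dollars.
# Assumption: the free allowance equals the instance's allocated storage.
billable_backup_cost() {
  awk -v alloc="$1" -v used="$2" -v price="${3:-0.095}" 'BEGIN {
    over = used - alloc          # GB of backup storage above the free allowance
    if (over < 0) over = 0       # backups within the allowance cost nothing
    printf "%.2f\n", over * price
  }'
}

billable_backup_cost 200 150   # backups fit inside the 200 GB allowance
billable_backup_cost 200 260   # 60 GB over the allowance is billed
```

The first call prints 0.00 and the second prints 5.70, which is 60 GB of excess at the assumed rate.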

# Inspect a single instance's backup posture and the window it uses.
aws rds describe-db-instances \
  --db-instance-identifier payments-ledger-db \
  --query 'DBInstances[0].{Retention:BackupRetentionPeriod,Window:PreferredBackupWindow,MultiAZ:MultiAZ,LatestRestorable:LatestRestorableTime}' \
  --output json

# Enable backups during the next maintenance window instead of immediately.
aws rds modify-db-instance \
  --db-instance-identifier payments-ledger-db \
  --backup-retention-period 7 \
  --preferred-backup-window 03:00-04:00 \
  --no-apply-immediately

# After enabling, confirm point-in-time recovery is live: LatestRestorableTime
# advances continuously once transaction-log capture is running.
aws rds describe-db-instances \
  --db-instance-identifier payments-ledger-db \
  --query 'DBInstances[0].[BackupRetentionPeriod,LatestRestorableTime]' \
  --output text

What is the impact of disabled RDS backups?

The headline impact is unrecoverable data loss. With BackupRetentionPeriod at 0 there is no daily snapshot and no transaction-log capture, so there is nothing to restore from. A bad migration, an application bug that corrupts rows, a DELETE or UPDATE run against production by mistake, or storage-level corruption all become terminal events: the data is simply gone. This is categorically different from a slow recovery — it's the absence of any recovery path, and it's invisible right up until the moment you need it.

The second impact is the loss of point-in-time recovery specifically. Even teams that keep occasional manual snapshots lose the ability to rewind to an arbitrary second when backups are off. Manual snapshots are point-in-time photographs taken whenever someone remembers; automated backups with PITR let you restore to the instant just before a known-bad event. The difference between "restore to last night's snapshot" and "restore to 14:42:09 just before the bad deploy" is often the difference between losing a day of customer transactions and losing none.

The third impact is compliance, audit, and insurance exposure. Most data-protection frameworks, customer contracts, and cyber-insurance policies assume recoverable backups exist for systems holding regulated or business-critical data. A production database with backups disabled is a finding waiting to happen: it can fail an audit, breach a data-handling commitment, or void an insurance claim at exactly the moment a payout matters. The technical gap and the contractual gap are the same gap.

The cost impact, unusually, runs almost to zero — which is what makes a disabled backup so hard to justify. Backup storage is free up to the database's own allocated size and only a few cents per GB-month beyond that, and the backup window has negligible performance cost (none at all on Multi-AZ). There is no meaningful saving from leaving backups off; the only outcomes are an uninsured liability if the database holds real data, or a trivial non-issue if it's a genuine throwaway. That asymmetry — unbounded downside, near-zero cost to remove it — is why disabling backups is almost never the right call outside truly ephemeral databases.

How do you enable RDS backups safely?

Remediation is a four-step loop: inventory the instances with backups off, set a retention window matched to each database's data class, enable backups with an off-peak window, and prevent regressions so new databases launch protected.

1. Inventory every instance with BackupRetentionPeriod at 0

Pull every RDS DB instance where BackupRetentionPeriod equals 0, across every region and account. For each, capture engine, instance class, Multi-AZ status, allocated storage, and an environment or data-class tag. Anything holding customer, financial, or otherwise real data is a must-fix; only genuinely disposable test, CI, or scratch databases are candidates to stay off. The Multi-AZ flag tells you whether enabling will have any primary-side impact (it won't, if Multi-AZ is true).

2. Choose a retention window matched to the data class

Retention can be 1 to 35 days. Pick it from how far back the business might realistically need to recover, not from cost (which is negligible either way). Transactional and customer-facing databases commonly sit at 7–14 days; systems with stricter recovery or compliance needs go higher, up to the 35-day maximum. Record the chosen window against each instance (a BackupRetentionDays or data-class tag works well) so the standard is auditable and not re-decided every scan.

3. Enable backups with an off-peak backup window

Use modify-db-instance --backup-retention-period N --preferred-backup-window HH:MM-HH:MM. The instance stays online and serving; the side effects are one immediate backup when the feature is first turned on and, per AWS, a possible brief I/O suspension lasting from a few seconds to a few minutes depending on instance size and class. Put the window in the service's quiet hours: on Multi-AZ instances the daily backup runs against the standby so the primary is untouched, and on single-AZ instances the brief I/O pause lands outside peak. Use --apply-immediately for urgent production gaps, or --no-apply-immediately to let it settle in the next maintenance window.

4. Prevent regressions and layer in centralized backup where needed

Make non-zero retention the default in provisioning templates (CloudFormation/Terraform module defaults). Back that up with the AWS Config managed rule db-instance-backup-enabled, which flags any instance running with backups off, or with a preventive guardrail such as an SCP, so new production databases can't quietly launch unprotected. For cross-account, cross-region, or longer-than-35-day retention, add a centralized AWS Backup plan on top; it complements per-instance automated backups rather than replacing them. For the rare database you intentionally leave unbacked, record a documented exception so the dashboard reflects deliberate decisions, not blind spots.
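Step 1 asks for coverage across every region, while a single describe-db-instances call only sees one. A minimal sketch of a per-region sweep follows; the function name list_unprotected_instances is hypothetical, and it covers only the account behind the current credentials, so it would need to be repeated per account or profile.

```shell
# Sketch: list retention-zero instances in every region enabled for the
# current account. Output columns: region, instance id, engine, Multi-AZ.
list_unprotected_instances() {
  local region
  for region in $(aws ec2 describe-regions \
      --query 'Regions[].RegionName' --output text); do
    aws rds describe-db-instances \
      --region "$region" \
      --query 'DBInstances[?BackupRetentionPeriod==`0`].[DBInstanceIdentifier,Engine,MultiAZ]' \
      --output text |
      while read -r id engine multi_az; do
        # Skip empty lines so regions with no findings print nothing.
        if [ -n "$id" ]; then
          printf '%s\t%s\t%s\t%s\n' "$region" "$id" "$engine" "$multi_az"
        fi
      done
  done
}

# Example call (requires AWS credentials, not run here):
# list_unprotected_instances
```

Prefixing each row with its region keeps the inventory unambiguous when the same identifier exists in more than one region.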

# 1. List every instance running with automated backups disabled.
aws rds describe-db-instances \
  --query 'DBInstances[?BackupRetentionPeriod==`0`].DBInstanceIdentifier' \
  --output text

# 3. Enable a 7-day window on a production instance, off-peak, immediately (stays online; a brief I/O pause is possible).
aws rds modify-db-instance \
  --db-instance-identifier payments-ledger-db \
  --backup-retention-period 7 \
  --preferred-backup-window 03:00-04:00 \
  --apply-immediately

# 3. Confirm point-in-time recovery is live (LatestRestorableTime advances).
aws rds wait db-instance-available \
  --db-instance-identifier payments-ledger-db
aws rds describe-db-instances \
  --db-instance-identifier payments-ledger-db \
  --query 'DBInstances[0].[BackupRetentionPeriod,LatestRestorableTime]' \
  --output text
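For step 4, the managed Config rule can be deployed from the CLI. The sketch below is illustrative: the helper names backup_rule_json and deploy_backup_rule are hypothetical, it assumes the AWS Config recorder is already enabled in the account and region, and it leaves the rule's optional parameters at their defaults.

```shell
# Sketch: define the AWS managed rule that flags RDS instances with
# automated backups disabled.
backup_rule_json() {
  cat <<'EOF'
{
  "ConfigRuleName": "db-instance-backup-enabled",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "DB_INSTANCE_BACKUP_ENABLED"
  },
  "Scope": {
    "ComplianceResourceTypes": ["AWS::RDS::DBInstance"]
  }
}
EOF
}

# Registers (or updates) the rule in the current account and region.
deploy_backup_rule() {
  aws configservice put-config-rule --config-rule "$(backup_rule_json)"
}

# Example call (requires AWS credentials and an active Config recorder):
# deploy_backup_rule
```

A Config rule is a detective control: it reports non-compliant instances after the fact, so it works best alongside safe defaults in the provisioning templates.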

Quick quiz

Question 1 of 5

A scan flags payments-ledger-db, a standalone Postgres instance backing live settlement, with BackupRetentionPeriod=0. What's the right next move?

Keep learning

Go deeper on RDS automated backups, point-in-time recovery, the pricing that makes them nearly free, and how centralized AWS Backup layers on top.

You've completed Enable automated backups on RDS. You now know how the daily snapshot and continuous transaction-log capture combine to give you point-in-time recovery to any second in the window, why a BackupRetentionPeriod of 0 silently removes that capability entirely, the exact modify-db-instance --backup-retention-period flag that fixes it with no downtime, why the backup window has negligible performance cost (none on Multi-AZ), and why the near-zero cost makes leaving backups off almost never the right call outside true throwaways. Next time the check fires, you'll know whether to enable the window or record a deliberate exception.

RDS automated backups: what it means for risk and exposure

Near-zero cost protection against an unbounded data-loss liability

Amazon RDS is the managed database behind many applications. It offers a built-in backup feature that lets the team rewind the database to any moment in the recent past — useful when a bad change deletes or corrupts data. That feature is controlled by a single "retention" setting measured in days. When it's set to zero, the feature is switched off entirely, and there is no way to recover the data if something goes wrong.

The important thing for finance to understand is the cost shape: this protection is essentially free. AWS includes backup storage at no charge up to the size of the database itself, and only bills a few cents per gigabyte per month for anything beyond that. So unlike most resilience controls, there is no meaningful trade-off to weigh — the cost of the protection is near zero, while the exposure it covers (losing production data with no way to get it back) is potentially unbounded: lost revenue, breached contracts, regulatory and audit findings, and in many cases a voided cyber-insurance claim.

The right framing is exposure, not spend. A database with backups disabled is an uninsured liability sitting on the books — and because the premium is effectively zero, there is no defensible reason to carry it on anything that holds real data. The value of this control is that it surfaces those uninsured databases before an incident does, while the fix is still a one-line change rather than a board-level conversation.

This lesson is for the finance partner who sees an RDS line on the cloud bill and wants to understand what "backups" cost and what they protect. It explains why this is a near-free protection rather than a cost trade-off, how the storage is billed (free up to the database's own size), what data-loss exposure a disabled backup represents, and the governance levers — coverage tracking, retention standards by data class, and a recorded exception process — that keep the risk visible at the operational review. No commands required.

How a finance partner frames the backup gap

Dana is the finance partner working with the platform team. At the operational review the reliability dashboard shows nine databases with automated backups disabled. Dana's instinct is to ask about the cost of fixing them — but the engineering lead explains there essentially isn't one: backup storage is free up to each database's own size, so turning it on adds nothing material to the bill.

That reframes the conversation entirely. There's no trade-off to weigh, so the only question is exposure. Dana asks which of the nine hold real customer or financial data, and seven do. Those seven are uninsured liabilities: if any of them takes a bad write, the data is gone, and the firm's audit and cyber-insurance obligations both assume recoverable backups exist. The team agrees to enable backups on all seven immediately with retention matched to data class, and to confirm that the remaining two (genuinely disposable test databases) can stay off with a recorded reason.

Dana's line for the finance pack is one sentence: "Every data-bearing database now has a recovery window at effectively zero cost; the two exceptions are throwaway test systems with sign-off." The next time an auditor or the insurer asks whether production data is recoverable, the answer is a documented yes — and it cost nothing to be able to say so.

Why this matters to the risk register, not the budget

Unlike most resilience controls, this one barely touches the budget. Enabling automated backups adds nothing to the bill until the accumulated backups exceed the database's own allocated storage, and even then it's a few cents per gigabyte per month. So there is no premium to weigh, no tiering trade-off, no forecast to build — the cost of the protection is, for practical purposes, zero. That removes the usual finance objection before it's raised.

Because the cost is negligible, this belongs on the risk register rather than the cost report. A production database with backups disabled is a quantifiable, unbounded exposure: the value of the data at risk, plus the revenue lost during an unrecoverable incident, plus the contractual and regulatory penalties — against an annual cost of protection that rounds to nothing. There is no scenario where carrying that exposure on a data-bearing system is the financially rational choice.

The compliance and insurance dimension is where this hits finance directly. Auditors and cyber-insurers routinely require evidence that critical data is recoverable; a disabled backup is a control failure that can fail an audit or void a claim. The dollar value of an insurance payout that doesn't pay out because a precondition wasn't met dwarfs any conceivable saving from the setting being off. Treat disabled backups on regulated data as a finding to clear before the next audit cycle, not a cost line.

Finally, treat the few legitimate exceptions as first-class. A genuinely disposable test or CI database can stay unbacked, but that should be a recorded, reviewed decision with a reason — not a silent default that someone later mistakes for a production system. A clean picture is one where every disabled backup is either fixed or carries a documented, finance-visible justification, so both the recovery posture and its (near-zero) cost are defensible at audit.

What finance can actually do about disabled backups

Finance can't run modify-db-instance, but it owns the framing that makes recoverability a tracked obligation rather than a setting nobody owns. Four levers, used at the regular cadence.

1. Track backup coverage as a standing operational metric

Put "percentage of data-bearing databases with automated backups enabled" on the operational review pack, alongside the count of any that aren't. Because the cost is negligible, the only acceptable target for data-bearing systems is 100% — so any non-zero gap is the prompt to act, not a number to optimize against a budget.

2. Set retention standards by data class, not by cost

Agree a simple standard with engineering — for example, transactional data gets at least 7 days, regulated data more — and make it explicit that the choice is driven by recovery needs, since backup storage is effectively free. That converts retention from an ad-hoc per-database guess into a documented policy that auditors and insurers can be shown.

3. Require a recorded reason for every exception

Any database left with backups off should carry a documented, finance-visible justification — typically "throwaway test/CI, no real data" — not a silently ignored finding. That converts "we missed it" into "we decided," which is what survives an audit, an insurer's questionnaire, or a post-incident review.

4. Frame the gap as uninsured liability, not deferred cost

Because there's no premium to weigh, the framing for leadership is exposure: a data-bearing database without backups is an uninsured liability whose downside is the value of the data plus the cost of an unrecoverable incident. Saying so explicitly makes enabling backups an obvious, no-cost risk reduction rather than a discretionary spend.

Quick quiz

Question 1 of 5

A scan shows nine databases with automated backups disabled: seven hold customer or financial data, two are throwaway CI databases. As the finance partner, what's the right approach?

You've finished the finance partner's view of RDS automated backups. You know the protection is effectively free — backup storage is included up to the database's own size — so this belongs on the risk register, not the cost report, and the four levers are: tracking coverage as a standing metric, setting retention standards by data class, recording every exception, and framing a disabled backup as uninsured liability rather than a saving. Next time it shows up, you'll have a sharper question than "what does turning this on cost us?"

RDS automated backups: the headline

Whether the business can recover its data after a bad change

Cloud databases can be configured to keep a rolling backup that lets the team rewind to any recent moment after a mistake corrupts or deletes data. When that setting is turned off — as it sometimes is by accident — there is no recovery path at all: the data is simply gone. This protection costs almost nothing, so a database running without it is an uninsured risk with no offsetting saving.

This is a business-continuity issue, not a cost one. The healthy state is simple: every database holding real data has backups enabled with a retention window that matches how far back the business might need to recover. A database found with backups off isn't just a technical gap — it's a signal about whether the organization has basic operational discipline around the data it depends on.

A short read for the executive who wants to know what this control protects and the one question to ask. You'll get the plain-English version of automated backups and point-in-time recovery, why a disabled backup is an uninsured liability rather than a saving, and what "good" looks like: every data-bearing database backed up with a retention window that matches recovery needs, and any exception on the record. No implementation detail.

What it looks like when the org gets this right

At one company the CFO asked a pointed question after reading about a competitor's data-loss incident: "If one of our engineers ran a bad change tonight, could we get the data back?" The honest first answer was "for most databases, yes — but we're not sure about all of them." That uncertainty was the real finding.

The team turned it into a one-line operational metric: percentage of data-bearing databases with automated backups enabled and a retention window matched to their data class. It started at 84%. Within a sprint it was 100%, with a short, named list of intentionally-unbacked throwaway databases recorded as deliberate exceptions. Because the protection is effectively free, there was no budget conversation — just a discipline one.

That's the right end state for this control: not a heroic recovery story, but a boring confidence signal. "100% of databases that hold real data are recoverable, exceptions on the record" is a one-minute answer that tells leadership the organization treats its own data as something worth protecting.

Why this is on the report at all

This control answers a question every executive eventually faces after a competitor's incident or an auditor's request: "if someone makes a bad change to one of our databases, can we get the data back?" A database with backups disabled means the honest answer is no — the data would be gone, with no recovery path. Because the protection costs essentially nothing, there is no defensible reason to carry that exposure on any system holding real data.

The reason it's a leadership item is what a gap signals about operational maturity. A database found running without backups isn't usually a deliberate choice. It's a default that slipped through, which means the organization's guardrails didn't catch a basic, free, obvious protection on something it depends on. The healthy signal isn't a complicated dashboard; it's a flat, near-100% coverage number with a short list of recorded exceptions. A gap there is a cheap problem to fix and a revealing one to find.

The leadership move on disabled backups

The executive handle isn't to micromanage retention settings — it's to require that recoverability is a guaranteed default for anything holding real data, with exceptions on the record.

1. Make backups a non-negotiable default for data-bearing systems

Because the protection is effectively free, set the policy that any database holding real data launches with backups enabled — no per-database debate, no cost justification needed. A clear default ensures the systems that matter are recoverable by design rather than by whoever configured them.

2. Accept intentional exceptions only for genuine throwaways

A disposable test, CI, or scratch database can run without backups, but that should be a deliberate, recorded choice with a stated reason. The goal isn't a perfect dashboard for its own sake — it's that every gap is known and intentional, never a surprise discovered during an incident.

3. Ask for the one coverage number at the review

The one-line question is "are all our data-bearing databases recoverable, with every exception documented?" A consistent yes means the organization treats its own data as something worth protecting and its guardrails are working. That's a one-minute confidence signal that needs no technical depth.

Quick quiz

Question 1 of 5

Your operational review shows 100% of data-bearing databases have automated backups enabled, with two throwaway CI databases recorded as intentional exceptions. What's the right read?

That's the lesson. Two takeaways: a database with backups disabled means the honest answer to "can we recover after a bad change?" is no, and the protection is essentially free, so there's no cost trade-off to weigh. The leadership question is whether every data-bearing database is recoverable with exceptions on the record — not whether anyone has reviewed the storage line.

Part of the learning path Build in resilience
