Protect RDS instances with AWS Backup

Native RDS backups die with the database — bring your RDS instances and Aurora clusters under a centralized AWS Backup plan so they're protected by policy, not per-DB settings.

13 min·10 sections·AWS

Unprotected RDS instances: the basics

Why native automated backups aren't the whole story

Every RDS instance can take native automated backups: daily snapshots plus transaction logs that give you point-in-time recovery (PITR) anywhere in a retention window of 1 to 35 days. That's excellent for fast operational recovery: a fat-fingered DELETE, a bad migration, a corrupted table at 2pm. But native backups have one fatal property: they live with the instance. They sit in the same account, can't be retained beyond 35 days, and by default are deleted when the database is deleted, unless whoever runs the delete chooses to retain them or take a final snapshot. A rogue admin, a compromised root credential, or terraform destroy against the wrong workspace takes the database and every native backup in a single action.

AWS Backup is the centralized alternative. Instead of each DB owning its own retention setting, a backup plan defines policy once — schedule, retention, lifecycle, copy rules — and selects resources by tag. Recovery points land in a separate backup vault that can live in another Region and another AWS account entirely. Crucially, they survive deletion of the source database. Delete the RDS instance and the AWS Backup recovery point is still sitting in the vault, ready to restore. For ransomware resilience, rogue-admin protection, and any compliance regime that demands isolated, long-retained, immutable copies, this is the difference between a recoverable incident and a resume-generating one.

Continuity check COV-002 ("Unprotected RDS Instances") cross-references every RDS instance and Aurora cluster against the resources covered by an AWS Backup plan. A database with only native automated backups and no AWS Backup coverage fails the check — because native backups alone do not survive the deletion of the thing they protect. Severity is HIGH: the gap is invisible until the worst possible moment, and by then there's nothing left to restore from.

In this lesson you'll learn the difference between native RDS automated backups and centralized AWS Backup, why the two are complementary rather than competing, and how to detect RDS databases that have no AWS Backup coverage. You'll see how tag-based resource selection auto-protects new databases, how backup vaults plus Vault Lock give you immutable WORM copies, how lifecycle rules tier old recovery points into cold storage, and how cross-Region and cross-Account copy give you a backup an attacker in the production account can't reach. You'll get the exact CLI calls to find unprotected databases, attach them to a plan by tag, and kick off an on-demand backup of a specific RDS ARN.

Fun fact

The deletion that took the backups with it

In a widely discussed 2014 incident, a code-hosting startup was hit by an attacker who gained access to its AWS console and deleted the production database, the EC2 instances, and, critically, the backups, all from the same account in a matter of minutes. The native backups lived in the same blast radius as production, so a single set of stolen credentials was enough to erase everything. The company never recovered and shut down within days. The lesson the whole industry took away: a backup that an attacker in your production account can delete is not a backup. AWS Backup's cross-account copy into a separate, Vault-Locked account exists precisely so the same set of credentials can't reach both production and its recovery points.

Closing the RDS coverage gap in action

Marco is the SRE on call when COV-002 fires for the production account: 9 RDS instances and 2 Aurora clusters with no AWS Backup coverage, 4 of them flagged HIGH because they're tagged Environment=prod and DataClass=pii. The flagged set includes the primary orders database — exactly the system of record that has to survive a worst-case event, not just a Tuesday-afternoon bad migration.

He doesn't assume the absence of AWS Backup means no backups at all. Native automated backups are almost certainly on with a 7-day window, which is fine for fast PITR. What's missing is the isolated, long-retained, deletion-proof copy — and AWS Backup's list-protected-resources endpoint is the source of truth for who has one. Anything not in that list has nothing that survives deletion of the source database, regardless of what the native retention setting says.

He starts by cross-referencing the live RDS inventory against the set of resources AWS Backup currently protects, scoped to RDS and Aurora.

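First, build the coverage gap query: every RDS instance minus everything AWS Backup currently considers protected. This pass covers DB instances; Aurora clusters need the same comparison against describe-db-clusters.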

$ comm -23 \
    <(aws rds describe-db-instances --query 'DBInstances[].DBInstanceArn' --output text | tr '\t' '\n' | sort) \
    <(aws backup list-protected-resources --query "Results[?ResourceType=='RDS'].ResourceArn" --output text | tr '\t' '\n' | sort)
arn:aws:rds:us-east-1:123456789012:db:orders-prod
arn:aws:rds:us-east-1:123456789012:db:billing-prod
arn:aws:rds:us-east-1:123456789012:db:identity-prod
arn:aws:rds:us-east-1:123456789012:db:analytics-stage
# ... 5 more ...
# 9 databases have native backups only — nothing survives a delete of the instance.

The set-difference between live RDS instances and AWS Backup protected resources is the coverage gap.
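The query above looks at DB instances only. Aurora is protected at the cluster level, and AWS Backup reports it under its own resource type, so the same set difference has to be run for clusters. The following is a minimal sketch, not the check's actual implementation: the function names are hypothetical, and the Engine filter is an assumption added because describe-db-clusters can also return non-Aurora clusters that share the RDS API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every non-empty line of the inventory file that is
# absent from the protected file. Works even when the protected file is empty.
coverage_gap() {
  local inventory_file=$1 protected_file=$2
  awk 'FILENAME == ARGV[1] { protected[$0] = 1; next }
       NF && !($0 in protected)' "$protected_file" "$inventory_file" | sort -u
}

# Read-only: list Aurora cluster ARNs, list what AWS Backup already protects,
# and print the difference. Nothing here modifies any resource.
aurora_coverage_gap() {
  local workdir
  workdir=$(mktemp -d) || return 1
  aws rds describe-db-clusters \
    --query "DBClusters[?starts_with(Engine, 'aurora')].DBClusterArn" \
    --output text | tr '\t' '\n' > "$workdir/inventory"
  aws backup list-protected-resources \
    --query "Results[?ResourceType=='Aurora'].ResourceArn" \
    --output text | tr '\t' '\n' > "$workdir/protected"
  coverage_gap "$workdir/inventory" "$workdir/protected"
  rm -rf "$workdir"
}
```

Calling aurora_coverage_gap with credentials for the production account prints one unprotected cluster ARN per line, in the same shape as the instance output above.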

Create a tag-based selection on the backup plan so every database tagged BackupRequired=true is picked up automatically at the next plan run — including databases that don't exist yet.

$ aws backup create-backup-selection \
    --backup-plan-id 8a2c5e9f-prod-daily-35d \
    --backup-selection 'SelectionName=tag-based-rds,IamRoleArn=arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole,ListOfTags=[{ConditionType=STRINGEQUALS,ConditionKey=BackupRequired,ConditionValue=true}]'
{
"SelectionId": "b1d4...selection",
"BackupPlanId": "8a2c5e9f-prod-daily-35d",
"CreationDate": "2026-05-26T10:14:22.000Z"
}
# Now tag the unprotected databases for inclusion:
$ aws rds add-tags-to-resource --resource-name arn:aws:rds:us-east-1:123456789012:db:orders-prod \
--tags Key=BackupRequired,Value=true Key=BackupTier,Value=daily-35d
# Tag-based selection means coverage scales with the fleet, not with engineering hours.

One selection rule, tag-driven — new databases inherit protection the moment they're tagged.
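A ListOfTags selection matches any supported resource type that carries the tag, not just databases. If the plan should pick up only RDS instances and clusters, one option is a selection document that combines ARN patterns in Resources with a tag condition in Conditions. This is a sketch under assumptions: the file name, selection name, and PLAN_ID are placeholders, and the role path should be adjusted to wherever your backup role actually lives.

```shell
#!/usr/bin/env bash
# Write an illustrative selection document: resources must match one of the
# RDS ARN patterns AND carry the tag BackupRequired=true.
cat > selection.json <<'EOF'
{
  "SelectionName": "tag-based-rds-only",
  "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
  "Resources": [
    "arn:aws:rds:*:*:db:*",
    "arn:aws:rds:*:*:cluster:*"
  ],
  "Conditions": {
    "StringEquals": [
      {
        "ConditionKey": "aws:ResourceTag/BackupRequired",
        "ConditionValue": "true"
      }
    ]
  }
}
EOF

# Attach it to an existing plan by hand once reviewed (PLAN_ID is a placeholder):
#   aws backup create-backup-selection \
#     --backup-plan-id "$PLAN_ID" \
#     --backup-selection file://selection.json
```

Keeping the selection in a file also makes it reviewable in a pull request, which the inline shorthand form is not.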

For an immediate isolated copy of a flagged database, kick off an on-demand backup job against the RDS ARN. This lands a recovery point in the vault now, before the next scheduled window.

$ aws backup start-backup-job \
    --backup-vault-name prod-isolated-vault \
    --resource-arn arn:aws:rds:us-east-1:123456789012:db:orders-prod \
    --iam-role-arn arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole \
    --lifecycle MoveToColdStorageAfterDays=90,DeleteAfterDays=2555
{
"BackupJobId": "3f7c-8a91-orders-prod",
"CreationDate": "2026-05-26T10:18:44.000Z"
}
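# Recovery point lands in prod-isolated-vault. An on-demand job is not driven by a plan rule,
# so the plan's copy action does not apply; use aws backup start-copy-job for an immediate off-account copy.
# DeleteAfterDays=2555 = 7-year retention, far beyond native's 35-day ceiling.
# MoveToColdStorageAfterDays is ignored for resource types without cold-storage support; check the feature availability table for RDS.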

On-demand backup gives you an isolated, long-retained recovery point immediately — no waiting for the schedule.
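start-backup-job returns as soon as the job is queued, so the recovery point does not exist yet when the command exits. A small polling helper can confirm the job actually finished before anyone relies on it. This is a sketch: the function name, default interval, and return codes are illustrative choices, not part of the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll an AWS Backup job until it reaches a terminal state.
# Usage: wait_for_backup_job <backup-job-id> [poll-seconds] [max-polls]
# Returns 0 on COMPLETED, 1 on a failed terminal state, 2 if the CLI call
# fails, 3 if the job is still in progress after max-polls checks.
wait_for_backup_job() {
  local job_id=$1 interval=${2:-30} max_polls=${3:-120} polls=0 state
  while [ "$polls" -lt "$max_polls" ]; do
    state=$(aws backup describe-backup-job \
      --backup-job-id "$job_id" --query 'State' --output text) || return 2
    case "$state" in
      COMPLETED) echo "$job_id: COMPLETED"; return 0 ;;
      FAILED|ABORTED|EXPIRED|PARTIAL) echo "$job_id: $state" >&2; return 1 ;;
    esac
    # Any other state (for example CREATED, PENDING, RUNNING) means keep waiting.
    polls=$((polls + 1))
    sleep "$interval"
  done
  echo "$job_id: not finished after $max_polls polls" >&2
  return 3
}
```

For the job started above, the call would be wait_for_backup_job 3f7c-8a91-orders-prod, using the BackupJobId from the command's output.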

Native backups vs AWS Backup under the hood (deep dive)

Native RDS automated backups are storage-level snapshots plus a continuous stream of transaction logs, both retained for a window you set between 1 and 35 days. The transaction logs are what enable point-in-time recovery to any second in that window, up to the latest restorable time (typically within the last five minutes). Backup storage is free up to the total provisioned database storage in the Region, takes no operational effort, and is perfect for fast operational recovery. The constraint is architectural: automated backups are a property of the DB instance, kept in AWS-managed storage tied to your account, capped at 35 days, and by default reaped when the instance is deleted. Automated backups cannot be kept beyond 35 days or placed somewhere the source account can't reach; manual snapshots can outlive the instance, but someone has to take, copy, and expire each one by hand.

AWS Backup turns backup into a policy object that lives independently of the database. A backup plan defines schedule, retention (up to indefinite), lifecycle transition to cold storage, and copy rules; a backup selection binds resources to the plan by tag or ARN. Recovery points land in a backup vault — a logical container with its own access policy and KMS key. Two vault features change the risk profile entirely: Vault Lock enforces WORM (write-once-read-many) immutability so that, in compliance mode, not even the AWS account root can delete a recovery point before its retention expires; and cross-Region / cross-Account copy rules replicate recovery points into a vault in a different Region and a different AWS account. That isolated copy is the one that survives ransomware, a compromised root credential, or a delete-db-instance against the wrong account.
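To make the policy object concrete, here is roughly what a plan with one daily rule and a cross-account copy action looks like, followed by the Vault Lock call for the destination vault. Treat it as a sketch under assumptions: the vault names, destination account ID, schedule, retention values, and function names are all placeholders. Cross-account copy also depends on setup not shown here (the accounts generally need to be in the same AWS Organization, and the destination vault's access policy has to allow the copy), and it is worth checking the AWS Backup documentation on whether a single copy action may cross both Region and account for RDS.

```shell
#!/usr/bin/env bash
# Write an illustrative plan: daily backup, 35-day retention in the source
# vault, and a copy into a vault owned by a separate recovery account.
cat > plan.json <<'EOF'
{
  "BackupPlanName": "daily-35d",
  "Rules": [
    {
      "RuleName": "daily",
      "TargetBackupVaultName": "prod-isolated-vault",
      "ScheduleExpression": "cron(0 5 ? * * *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 480,
      "Lifecycle": { "DeleteAfterDays": 35 },
      "CopyActions": [
        {
          "DestinationBackupVaultArn": "arn:aws:backup:us-east-1:210987654321:backup-vault:recovery-account-vault",
          "Lifecycle": { "DeleteAfterDays": 35 }
        }
      ]
    }
  ]
}
EOF

# Run in the production account once the file has been reviewed.
create_plan() {
  aws backup create-backup-plan --backup-plan file://plan.json
}

# Run in the destination account. Passing --changeable-for-days selects
# compliance mode: once that grace period ends, the lock can no longer be
# changed or removed. Omitting it leaves the lock in governance mode.
lock_destination_vault() {
  aws backup put-backup-vault-lock-configuration \
    --backup-vault-name recovery-account-vault \
    --min-retention-days 7 \
    --max-retention-days 2555 \
    --changeable-for-days 3
}
```

The min and max retention bounds need to bracket every retention period that will be written into the vault, because jobs whose retention falls outside them are rejected.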

The two are complementary, not competing, and a mature setup runs both. Native automated backups handle the high-frequency, low-latency case: "a migration corrupted a table ten minutes ago, rewind PITR." AWS Backup handles the low-frequency, high-stakes case: "the account was compromised / we need a 7-year retained copy for the auditor / the Region is down." Lifecycle rules in the plan can transition older recovery points from warm to cold (Glacier-class) storage automatically, typically after 90 days, which drops the per-GB cost by roughly 75% where it applies. AWS Backup ignores the cold-storage setting for resource types that don't support it, so confirm RDS and Aurora against the feature availability table before budgeting on cold-tier pricing. RDS snapshots are incremental, so the marginal cost of each additional recovery point is only the blocks changed since the previous one.

# Confirm a database has ONLY native backups and no AWS Backup recovery points.
# 1. Native automated backup window (lives with the instance, max 35 days):
aws rds describe-db-instances \
  --db-instance-identifier orders-prod \
  --query 'DBInstances[0].{Retention:BackupRetentionPeriod,Window:PreferredBackupWindow}'

# 2. Any AWS Backup recovery points for the same ARN (survive instance deletion):
aws backup list-recovery-points-by-resource \
  --resource-arn arn:aws:rds:us-east-1:123456789012:db:orders-prod \
  --query 'RecoveryPoints[].{Arn:RecoveryPointArn,Vault:BackupVaultName,Status:Status,Created:CreationDate}' \
  --output table

# Empty result from step 2 = native-only = fails COV-002.
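The same two-step check can be run across the whole fleet rather than one database at a time. The sketch below is one way to do it, assuming read-only credentials; the function name and output format are hypothetical. It asks for JSON output when counting recovery points so that the count is computed over the full result rather than per page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every DB instance that has zero AWS Backup
# recovery points, together with its native retention setting.
native_only_databases() {
  local arn retention points
  aws rds describe-db-instances \
    --query 'DBInstances[].[DBInstanceArn,BackupRetentionPeriod]' \
    --output text |
  while read -r arn retention; do
    [ -n "$arn" ] || continue
    points=$(aws backup list-recovery-points-by-resource \
      --resource-arn "$arn" \
      --query 'length(RecoveryPoints)' --output json) || continue
    if [ "${points:-0}" -eq 0 ]; then
      printf '%s\tnative_retention_days=%s\taws_backup_points=0\n' \
        "$arn" "$retention"
    fi
  done
}
```

Each line of output is a database that would fail COV-002, with its native window alongside so it is obvious that "has native backups" and "is protected" are different questions.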

What is the impact of leaving an RDS instance unprotected?

The direct impact is binary, exactly like EC2: when a recovery event happens, either you have a copy that survives the event or you don't. For a database the failure modes that matter most are the ones native backups can't help with — a deleted instance (and with it every native snapshot), a compromised account where an attacker wipes everything reachable, or a need to restore data older than the 35-day native ceiling. In all three cases a database relying on native backups alone has nothing to restore from. The recovery path collapses to "reconstruct from application logs and replicas if any survived," which for a system of record means hours to days of downtime and probable permanent data loss.

The second-order impact is decision pressure during the incident, magnified because it's data rather than compute. Lose a stateless app server and you rebuild it. Lose the orders database and its only backups together, and the incident commander is choosing between "restore from a manual snapshot someone took for a migration four months ago," "reconstruct balances from event logs and accept the gaps," or "tell customers their data is gone." A current, isolated AWS Backup recovery point reduces that entire decision tree to "restore the recovery point into a new instance and verify."

On the regulatory side the bar is higher for databases than almost anything else, because they hold the regulated data itself. SOC 2 CC9.1, ISO 27001 A.12.3 (renumbered A.8.13 in the 2022 edition), PCI-DSS requirement 12.10, and HIPAA's contingency-planning rule all expect demonstrable, tested, and crucially isolated backups for systems holding regulated records. A database flagged unprotected by your own continuity check is documented awareness of a gap, and once the gap is a known, recorded finding, leaving it open is far worse in an audit than never having checked. Immutable Vault-Locked copies are increasingly the explicit control auditors look for as ransomware-recovery expectations harden.

The cost side is real but modest, and the asymmetry is the whole point. AWS Backup for RDS is billed as snapshot storage: roughly $0.095/GB-month warm in US-East, with a cold tier at about $0.02/GB-month for resource types that support the lifecycle transition (confirm RDS against the feature availability table before budgeting on it). Snapshots are incremental, so each new point bills only changed blocks. For a 200 GB database with daily backups, 35-day warm retention, and a cross-account copy, expect somewhere in the range of $15-40/month all-in, depending mostly on how much data changes each day. Against that you're insuring the system of record: the single most expensive thing on the bill to lose, and the only one you genuinely cannot rebuild. This is the cheapest insurance policy you'll buy and the one you'll be most grateful for exactly once.
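The arithmetic behind that kind of estimate is simple enough to sketch. The model below is deliberately crude and rests on assumptions that are not from AWS: one full copy plus one increment per retained day, a 1% daily change rate (hypothetical, and the number the result is most sensitive to), and no free allowance. It covers a single vault; a copy in another account is billed separately in that account.

```shell
#!/usr/bin/env bash
# Rough warm-storage estimate for one vault.
# Args: size_gb  daily_change_fraction  retention_days  price_per_gb_month
estimate_warm_cost() {
  awk -v size="$1" -v change="$2" -v days="$3" -v price="$4" 'BEGIN {
    # One full copy plus one incremental per retained day.
    stored_gb = size + size * change * days
    printf "%.0f GB stored, about $%.2f/month\n", stored_gb, stored_gb * price
  }'
}

# 200 GB database, 1% daily change (assumed), 35 days, $0.095/GB-month.
estimate_warm_cost 200 0.01 35 0.095
```

With those inputs the script prints "270 GB stored, about $25.65/month". Doubling the assumed change rate adds another 70 GB, which is why measuring the real change rate matters more than the list price.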

How do you bring an RDS instance under protection?

Closing the gap is a four-step loop: find what's exposed, decide the protection tier each database needs, bring it under a centralized plan with isolation, and make sure new databases can't slip through uncovered.

1. Inventory the coverage gap and confirm it's native-only

Cross-reference describe-db-instances (and describe-db-clusters for Aurora) against backup list-protected-resources scoped to RDS. The set difference is your gap. For each database in the gap, confirm what protection actually exists: native automated backups are almost always on, but they're capped at 35 days and die with the instance. A database with native backups only still fails the check, because the failure modes that matter — deletion, account compromise, long-retention compliance — are exactly the ones native backups can't cover. Prioritize by data class: production and regulated databases first.

2. Define plans by tier and select resources by tag

Create one or two backup plans (e.g. daily-35d for general production, daily-7y-isolated for regulated data) with tag-based resource selection on BackupRequired=true and a BackupTier tag that routes to the right plan. Tagging for inclusion makes coverage scale with the fleet — every new database gets the tag in its Terraform module or launch process and AWS Backup picks it up at the next plan run. Native automated backups stay on alongside this: they're complementary. Native handles fast PITR; the centralized plan handles isolated long-term recovery.

3. Isolate and immutabilize: separate vault, cross-account copy, Vault Lock

The recovery point has to land somewhere an attacker in the production account can't reach. Configure the plan's copy rule to replicate into a vault in a separate AWS account (and ideally a different Region); that's what survives a compromised root credential or a delete-db-instance against the wrong account. Apply AWS Backup Vault Lock in compliance mode on the destination vault so that, once the lock's cooling-off period has passed, recovery points are WORM-immutable and cannot be deleted before retention expires, even by the account root. Add a lifecycle rule (MoveToColdStorageAfterDays=90) to keep multi-year retention affordable where the resource type supports cold storage; AWS Backup ignores the setting for types that don't, so check the feature availability table for RDS and Aurora.

4. Prevent recurrence with AWS Config and IaC defaults

Enable the AWS Config managed rule rds-resources-protected-by-backup-plan to alert on any RDS instance that isn't covered by a backup plan, and its sibling aurora-resources-protected-by-backup-plan to do the same for Aurora clusters. For prevention, bake BackupRequired=true and the appropriate BackupTier into your Terraform/CloudFormation modules for every database pattern, and lint pull requests to flag new RDS resources without a backup tag. The goal is that an unprotected production database simply can't be created: protection is a property of the module, not a step someone has to remember.
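Turning on the managed rule from the CLI might look like the sketch below. It assumes AWS Config is already recording in the account and Region, and it derives the rule's source identifier by upper-casing the rule name, which matches how these managed rules are named; both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable an AWS-managed Config rule by its rule name.
# Example: rds-resources-protected-by-backup-plan
#       -> RDS_RESOURCES_PROTECTED_BY_BACKUP_PLAN
enable_managed_rule() {
  local rule_name=$1 identifier
  identifier=$(printf '%s' "$rule_name" | tr 'a-z-' 'A-Z_')
  aws configservice put-config-rule --config-rule \
    "{\"ConfigRuleName\":\"$rule_name\",\"Source\":{\"Owner\":\"AWS\",\"SourceIdentifier\":\"$identifier\"}}"
}

# Hypothetical helper: list resource IDs the rule currently marks non-compliant.
list_noncompliant() {
  aws configservice get-compliance-details-by-config-rule \
    --config-rule-name "$1" \
    --compliance-types NON_COMPLIANT \
    --query 'EvaluationResults[].EvaluationResultIdentifier.EvaluationResultQualifier.ResourceId' \
    --output text
}
```

A typical sequence is enable_managed_rule rds-resources-protected-by-backup-plan, wait for the first evaluation, then list_noncompliant with the same rule name.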

# Bulk-tag every RDS instance in the account that isn't already protected by AWS Backup.
# Requires bash (process substitution and here-strings).
UNPROTECTED=$(comm -23 \
  <(aws rds describe-db-instances \
      --query 'DBInstances[].DBInstanceArn' --output text | tr '\t' '\n' | sort) \
  <(aws backup list-protected-resources \
      --query "Results[?ResourceType=='RDS'].ResourceArn" --output text | tr '\t' '\n' | sort))

# Read one ARN per line and skip blanks, so an empty gap tags nothing.
while IFS= read -r arn; do
  [ -n "$arn" ] || continue
  aws rds add-tags-to-resource --resource-name "$arn" \
    --tags Key=BackupRequired,Value=true Key=BackupTier,Value=daily-35d
done <<< "$UNPROTECTED"

# Verify coverage after the next plan run.
# GNU date syntax; on macOS use: date -u -v-24H +%FT%TZ
aws backup list-backup-jobs \
  --by-state COMPLETED --by-created-after "$(date -u -d '24 hours ago' +%FT%TZ)" \
  --query "BackupJobs[?ResourceType=='RDS'].ResourceArn"
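The pull-request lint mentioned in step 4 can start as a very small heuristic. The sketch below scans Terraform files for aws_db_instance and aws_rds_cluster blocks that never mention BackupRequired. It is a text match, not an HCL parser: it assumes the tag is written inside the resource block, so tags injected through provider default_tags, locals, or modules will be reported as missing, and braces inside strings or comments can confuse the depth count. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical lint: exit non-zero if any RDS resource block in the given
# .tf files lacks a BackupRequired tag. Prints file:line for each offender.
lint_rds_backup_tags() {
  awk '
    /^[ \t]*resource[ \t]+"aws_(db_instance|rds_cluster)"/ {
      in_block = 1; depth = 0; tagged = 0; name = $3; start = FNR
    }
    in_block {
      if ($0 ~ /BackupRequired/) tagged = 1
      # gsub returns the match count; replacing a brace with itself
      # leaves the line unchanged.
      opens = gsub(/[{]/, "{"); closes = gsub(/[}]/, "}")
      depth += opens - closes
      if (depth == 0 && opens + closes > 0) {
        if (!tagged) {
          printf "%s:%d: %s has no BackupRequired tag\n", FILENAME, start, name
          missing = 1
        }
        in_block = 0
      }
    }
    END { exit (missing ? 1 : 0) }
  ' "$@"
}
```

In CI this could run as lint_rds_backup_tags over the .tf files changed in the pull request, failing the build when it exits non-zero.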

You've completed Protect RDS instances with AWS Backup. You now know why native automated backups — capped at 35 days and deleted with the instance — aren't enough on their own, how a centralized AWS Backup plan adds isolated, immutable, long-retained copies via tag-based selection, Vault Lock, and cross-account/cross-Region copy, and that the two are complementary. The next time COV-002 fires on a production database, you'll have a four-step loop ready: inventory the gap, tier the protection, isolate the copy, and prevent recurrence.
