Site Reliability

Establish AWS Backup Plans

Without a Backup Plan there is no policy — recovery becomes whatever someone hopes is there. Wire up a plan that covers resources by tag.

16 min · 10 sections · AWS

Last reviewed 27 May 2026

Backup Plans: the basics

What is an AWS Backup Plan and why does its absence get flagged?

An AWS Backup Plan is a single piece of configuration that bundles three things: a schedule (when to take a recovery point), a retention rule (how long to keep it before AWS deletes it), and a resource selection (which resources the plan applies to). Once a plan exists and a selection matches a resource, AWS Backup takes recovery points automatically — no Lambda, no cron, no team owning a homegrown snapshot script.

Without a plan, snapshots are ad hoc. Engineers create RDS snapshots manually before a release, EBS snapshots get taken once and never rotated, DynamoDB point-in-time recovery is enabled on some tables and not others. The fleet ends up in a state where nobody can answer the question "is this resource backed up?" without opening each console and squinting at it — and the answer to "can we restore yesterday's 14:00 state of this database?" is usually "let's see."

Continuity check BKP-003 ("No Backup Plans") fires on any account where there is no AWS::Backup::BackupPlan resource in a given region. The check is intentionally blunt: if no plan exists, no policy exists, and recovery is whatever someone happens to remember to do manually. Severity is CRITICAL because a region with zero backup plans is one accidental DeleteDBInstance away from a permanent data loss event.

In this lesson you'll learn how AWS Backup Plans are structured, how to use tag-based selection so coverage scales without per-resource toggles, how to layer multiple retention rules (daily/weekly/monthly) in a single plan, and how to verify coverage with AWS Config and Backup audit reports. You'll also see the KMS and vault setup that prevents a compromised account from also deleting its own recovery points.

Fun fact

The snapshot graveyard

An AWS field survey of mid-sized accounts found a median of 1,200 untagged manual EBS snapshots per account, sitting at $0.05/GB-month, with no record of which volume they came from or whether they were still needed. The average account was spending around $9k/year on snapshots nobody could identify. The fix wasn't "take fewer snapshots" — it was "replace manual snapshots with a Backup Plan that has explicit retention." The graveyard cleared itself within 35 days once the plan owned the lifecycle.

Establishing a Backup Plan in action

Marco runs platform reliability at a healthcare SaaS. A continuity scan flags BKP-003 (CRITICAL) across all four of their active regions — there are zero AWS::Backup::BackupPlan resources anywhere in the account. He has roughly 300 EBS volumes, 40 RDS instances, and a dozen DynamoDB tables that he believes are "backed up," but he can't prove it.

His goal isn't to back up everything — that's expensive and most of the fleet is ephemeral. He wants a single plan, applied by tag (BackupRequired=true), with three tiered rules: daily-7days, weekly-1month, monthly-1year. Anything tagged is in. Anything not tagged is intentionally out.

He starts by confirming the gap — no plans, no selections — and then drops the plan in.

First, confirm the finding. List every Backup Plan in the region. An empty list is the failure mode.

$ aws backup list-backup-plans --region eu-west-2 --query 'BackupPlansList[*].{Name:BackupPlanName,Id:BackupPlanId,Last:LastExecutionDate}' --output table
┌───────┬─────┬──────┐
│ Name  │ Id  │ Last │
├───────┼─────┼──────┤
└───────┴─────┴──────┘
# Zero plans in eu-west-2. Same result in us-east-1, eu-west-1, ap-southeast-2.
# Every protected-data resource in this region is relying on ad hoc snapshots.

Empty BackupPlansList — BKP-003 confirmed across every region with workloads.

Now create the plan. Three rules in one document — daily-7days, weekly-1month, monthly-1year — all writing to a dedicated vault with its own KMS key.

$ aws backup create-backup-plan --region eu-west-2 --backup-plan file://prod-tiered-plan.json
{
    "BackupPlanId": "a1b2c3d4-5e6f-7890-abcd-ef0123456789",
    "BackupPlanArn": "arn:aws:backup:eu-west-2:123456789012:backup-plan:a1b2c3d4-5e6f-7890-abcd-ef0123456789",
    "CreationDate": "2026-05-15T09:42:11.103000+00:00",
    "VersionId": "NDQyZmJmMWEt..."
}
# Plan created. Next: attach a selection so it actually picks up resources.
# Selection rule: any resource tagged BackupRequired=true, across EBS, RDS, DynamoDB, EFS, FSx.

Tiered plan landed. The plan is inert until a selection is attached — that's the next call (create-backup-selection).

AWS Backup under the hood (deep dive)

A Backup Plan is a JSON document. Each rule inside it specifies a ScheduleExpression (a cron expression, evaluated in UTC), a TargetBackupVaultName, a Lifecycle (move-to-cold-after / delete-after), and a CompletionWindowMinutes after which an in-flight backup is cancelled. The plan itself does nothing until you create a Backup Selection that maps it to resources: by ARN (wildcard patterns such as arn:aws:rds:*:*:db:* select a whole resource type) or by tag (StringEquals in ListOfTags; StringEquals, StringLike, and their negations in Conditions). Tag-based selection is the right default: tag a resource, it's covered; remove the tag, it's not.
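As a sketch of how a resource-type filter and a tag condition can combine in one selection, the document below limits the selection to RDS and DynamoDB ARN patterns and then requires the tag. The file name, selection name, and ARN patterns are illustrative assumptions; the account ID and plan ID are the placeholders used elsewhere in this lesson. The `run` helper is a hypothetical dry-run wrapper, not part of the AWS CLI: it prints the call unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print the AWS call instead of executing it.
# Set DRY_RUN=0 to run it against a real account.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Hypothetical selection: only RDS instances and DynamoDB tables (ARN wildcards),
# and of those, only resources tagged BackupRequired=true.
cat > typed-tag-selection.json <<'JSON'
{
  "SelectionName": "databases-backup-required",
  "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
  "Resources": [
    "arn:aws:rds:*:*:db:*",
    "arn:aws:dynamodb:*:*:table/*"
  ],
  "Conditions": {
    "StringEquals": [
      {
        "ConditionKey": "aws:ResourceTag/BackupRequired",
        "ConditionValue": "true"
      }
    ]
  }
}
JSON

run aws backup create-backup-selection \
  --region eu-west-2 \
  --backup-plan-id a1b2c3d4-5e6f-7890-abcd-ef0123456789 \
  --backup-selection file://typed-tag-selection.json
```

Note the key format: inside Conditions the tag key is written as aws:ResourceTag/<key>, whereas ListOfTags takes the bare tag key.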

Recovery points land in a Backup Vault. The vault is the storage container and also the access-control boundary: it has its own KMS key and its own resource-based policy. Best practice is one vault per environment with a dedicated customer managed key (CMK), and a vault access policy that denies backup:DeleteRecoveryPoint and backup:DeleteBackupVault to everyone except a small break-glass role. Deny changes to the vault policy itself as well (backup:PutBackupVaultAccessPolicy, backup:DeleteBackupVaultAccessPolicy), otherwise an admin can simply remove the deny first. With that in place, a compromised admin in the source account cannot wipe the recovery points through the normal delete path, by accident or by design.

Pricing is per-recovery-point storage, not per-plan or per-rule. EBS snapshots in AWS Backup are billed at standard EBS snapshot rates (~$0.05/GB-month for the source-region copy); RDS, DynamoDB, EFS, and FSx each have their own per-GB-month rate, generally higher than EBS. Cross-region copies roughly double the storage cost and add inter-region data transfer charges. A plan with 35-day retention against a 10 TB fleet typically costs $500-$800/month, so budget for it before turning on monthly-1year retention across the whole org.
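The pricing paragraph above can be turned into a back-of-envelope estimate. This is a rough model, not AWS's billing formula: it assumes incremental recovery points (one full copy plus the blocks changed each day), a 1% daily change rate, and the flat ~$0.05/GB-month EBS rate quoted above. The change rate in particular is an assumption to replace with a measured value.

```shell
# Rough monthly storage estimate for incremental, EBS-style recovery points.
# Assumptions (not taken from AWS pricing pages): one full copy plus daily
# changed blocks, a 1% daily change rate, and a flat $0.05/GB-month rate.
FLEET_GB=10240        # 10 TB fleet
RATE_PER_GB=0.05      # USD per GB-month
DAILY_CHANGE=0.01     # assumed fraction of data changed per day
RETENTION_DAYS=35

awk -v fleet="$FLEET_GB" -v rate="$RATE_PER_GB" \
    -v change="$DAILY_CHANGE" -v days="$RETENTION_DAYS" 'BEGIN {
  full        = fleet * rate                  # baseline full copy
  incremental = fleet * change * days * rate  # changed blocks kept for the window
  single      = full + incremental
  printf "full copy: $%.2f/month\n", full
  printf "incrementals: $%.2f/month\n", incremental
  printf "single region: $%.2f/month\n", single
  printf "with one cross-region copy: $%.2f/month\n", single * 2
}' | tee backup-cost-estimate.txt
```

With these inputs the single-region figure comes to $691.20/month, inside the $500-$800 range quoted above. Raising the assumed change rate to 2% moves it to about $870, which is why the change rate is worth measuring before the budget is signed off.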

# The plan document referenced above — three tiered rules into a dedicated vault.
cat > prod-tiered-plan.json <<'JSON'
{
  "BackupPlanName": "prod-tiered",
  "Rules": [
    {
      "RuleName": "daily-7days",
      "TargetBackupVaultName": "prod-backup-vault",
      "ScheduleExpression": "cron(0 5 ? * * *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 180,
      "Lifecycle": { "DeleteAfterDays": 7 }
    },
    {
      "RuleName": "weekly-1month",
      "TargetBackupVaultName": "prod-backup-vault",
      "ScheduleExpression": "cron(0 5 ? * SUN *)",
      "Lifecycle": { "DeleteAfterDays": 30 }
    },
    {
      "RuleName": "monthly-1year",
      "TargetBackupVaultName": "prod-backup-vault",
      "ScheduleExpression": "cron(0 5 1 * ? *)",
      "Lifecycle": { "MoveToColdStorageAfterDays": 30, "DeleteAfterDays": 365 }
    }
  ]
}
JSON

aws backup create-backup-plan --region eu-west-2 --backup-plan file://prod-tiered-plan.json

What is the impact of running without a Backup Plan?

The headline impact is permanent data loss. A DROP TABLE in production, a terraform destroy against the wrong workspace, a ransomware event, or an IAM compromise that walks the account looking for things to delete — all of these are recoverable if a Backup Plan with vault-level deletion protection exists, and unrecoverable if it doesn't. Most data-loss incidents that make the news aren't infrastructure failures; they're operator errors or compromised credentials against accounts that had no enforced backup policy.

The second-order impact is the recovery objective gap. Even when teams do take ad hoc snapshots, the snapshots almost never align with the business's stated RPO (recovery point objective) or RTO (recovery time objective). Marketing tells customers "we recover within 4 hours and lose at most 1 hour of data"; ops have hourly snapshots on two databases and weekly snapshots on the rest. A Backup Plan makes the RPO a single piece of configuration that you can show an auditor: for an hourly schedule, ScheduleExpression: cron(0 * ? * * *). AWS Backup schedules are written as cron expressions.
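A minimal sketch of what an hourly schedule could look like as its own plan, assuming the same vault name used in this lesson. The plan name, rule name, window sizes, and the 2-day retention are hypothetical choices, and `run` is the same illustrative dry-run wrapper rather than an AWS CLI feature.

```shell
# Dry-run by default: print the AWS call instead of executing it.
# Set DRY_RUN=0 to run it against a real account.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Hypothetical hourly plan for the tier-1 databases.
cat > prod-hourly-plan.json <<'JSON'
{
  "BackupPlanName": "prod-hourly",
  "Rules": [
    {
      "RuleName": "hourly-2days",
      "TargetBackupVaultName": "prod-backup-vault",
      "ScheduleExpression": "cron(0 * ? * * *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 120,
      "Lifecycle": { "DeleteAfterDays": 2 }
    }
  ]
}
JSON

run aws backup create-backup-plan \
  --region eu-west-2 \
  --backup-plan file://prod-hourly-plan.json
```

One caveat on reading this as a strict 1-hour RPO: a job may start any time inside its start window, so the gap between two recovery points can exceed an hour. Where that matters, continuous backup (point-in-time recovery) on the services that support it is the tighter option.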

On the compliance side, SOC 2 CC7.5, ISO 27001 A.12.3 (2013 numbering; A.8.13 in the 2022 revision), HIPAA §164.308(a)(7)(ii)(A), and PCI DSS Requirement 9.5.1 (v3.2.1 numbering) all expect a documented, tested, automated backup policy. "We have some snapshots" doesn't pass any of these. A Backup Plan with an audit report from AWS Backup Audit Manager is the cheapest possible answer to a control owner asking for evidence: a single artifact that shows coverage and retention, to be paired with periodic restore tests as the evidence of recoverability.

There is also a cost impact in the wrong direction: turning on backups without budgeting for them. A 10 TB fleet with 35-day daily retention is somewhere around $500-$800/month in standard storage; add cross-region copies and monthly-1year cold storage and the number can climb past $2k. The cost is fine as long as it's a deliberate choice — but teams that flip the switch organisation-wide without running the math get a billing surprise the following month.

How do you establish backup coverage that actually holds?

Standing up a Backup Plan is a four-step loop. Skip any step and you end up with either uncovered resources, surprise bills, or recovery points that can be deleted by the same compromise that triggered the disaster.

1. Tag the fleet, then select by tag

Pick one tag — BackupRequired=true is the convention. Apply it to every resource that should be in the policy: RDS instances, EBS volumes attached to stateful workloads, DynamoDB tables, EFS file systems, FSx file systems. Then create the Backup Selection on that single tag. New resources tagged the same way are picked up automatically — no plan edits required. Resources that genuinely don't need backups (ephemeral autoscaling nodes, scratch volumes) stay untagged and stay out.
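One way to apply the tag in bulk is the Resource Groups Tagging API, sketched below. The ARNs are made up for illustration; in practice the list would come from an inventory export that the service owners have reviewed. `run` is a hypothetical dry-run wrapper, not part of the AWS CLI.

```shell
# Dry-run by default: print the AWS call instead of executing it.
# Set DRY_RUN=0 to run it against a real account.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Hypothetical ARNs for resources that should be in the backup policy.
cat > backup-required-arns.txt <<'EOF'
arn:aws:rds:eu-west-2:123456789012:db:orders-primary
arn:aws:dynamodb:eu-west-2:123456789012:table/sessions
arn:aws:elasticfilesystem:eu-west-2:123456789012:file-system/fs-0123456789abcdef0
EOF

# Tag each resource; a tag-based selection covers it from the next scheduled run.
while IFS= read -r arn; do
  [ -n "$arn" ] || continue
  run aws resourcegroupstaggingapi tag-resources \
    --region eu-west-2 \
    --resource-arn-list "$arn" \
    --tags BackupRequired=true
done < backup-required-arns.txt
```

Keeping the list in a file makes the coverage scope reviewable before anything is tagged, which matters because the tag is what drives both coverage and cost.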

2. Run multiple rules in one plan for tiered retention

A single rule (daily, 35-day) is what the AWS Backup Daily-35day-Retention template gives you: fine as a starter, not enough for compliance. Add a weekly rule retained 1 month and a monthly rule retained 1 year (with cold-storage transition after 30 days). Because only about a dozen monthly recovery points exist at any time, the extra storage is modest next to keeping every daily for a year, and you still get long-horizon recovery.
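One constraint is worth checking before a tiered plan is submitted: AWS Backup keeps a recovery point in cold storage for at least 90 days, so DeleteAfterDays has to be at least MoveToColdStorageAfterDays plus 90. The helper below is a hypothetical pre-flight check, not an AWS tool, applied to the three tiers used in this lesson. Cold-tier support also varies by resource type, which this sketch does not model.

```shell
# Pre-flight check for the cold-storage rule:
#   DeleteAfterDays >= MoveToColdStorageAfterDays + 90
# check_lifecycle is a hypothetical helper, not part of the AWS CLI.
check_lifecycle() {
  rule="$1"; cold="$2"; delete="$3"
  if [ "$cold" -gt 0 ] && [ "$delete" -lt $((cold + 90)) ]; then
    echo "INVALID $rule: delete-after $delete < cold-after $cold + 90"
    return 1
  fi
  echo "OK $rule"
}

# Arguments: rule name, MoveToColdStorageAfterDays (0 = no cold tier), DeleteAfterDays
check_lifecycle daily-7days   0  7
check_lifecycle weekly-1month 0  30
check_lifecycle monthly-1year 30 365
```

All three tiers pass. A rule that moved to cold storage after 30 days but deleted after 100 would be rejected by the check, since 100 is less than 120.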

3. Use a dedicated vault with its own KMS key and a deletion-deny policy

Never write to the default vault. Create a prod-backup-vault with a dedicated CMK, then attach a vault access policy that denies backup:DeleteRecoveryPoint and backup:DeleteBackupVault to every principal except a single break-glass role that requires MFA. This is the difference between "backups exist" and "backups survive a compromise." For the highest-risk workloads, enable AWS Backup Vault Lock in compliance mode — recovery points become immutable for the retention period even to AWS support.
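The vault setup described above might look like the following sketch. The KMS key ARN and the break-glass role name are placeholders to substitute, the policy file name is arbitrary, and `run` is a hypothetical dry-run wrapper rather than an AWS CLI feature. The policy also denies changes to the vault policy itself, on the assumption that a deny which any admin can remove is not much of a control. MFA is not expressed here; it would be enforced on the break-glass role's own trust policy.

```shell
# Dry-run by default: print the AWS call instead of executing it.
# Set DRY_RUN=0 to run it against a real account.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

VAULT=prod-backup-vault
# Placeholder ARNs: substitute the real KMS key and break-glass role.
KMS_KEY_ARN="arn:aws:kms:eu-west-2:123456789012:key/EXAMPLE-KEY-ID"
BREAK_GLASS_ROLE="arn:aws:iam::123456789012:role/backup-break-glass"

# Deny destructive vault actions to every principal except the break-glass role.
cat > vault-deny-delete.json <<JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDeleteExceptBreakGlass",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "backup:DeleteRecoveryPoint",
        "backup:DeleteBackupVault",
        "backup:PutBackupVaultAccessPolicy",
        "backup:DeleteBackupVaultAccessPolicy"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": { "aws:PrincipalArn": "${BREAK_GLASS_ROLE}" }
      }
    }
  ]
}
JSON

run aws backup create-backup-vault --region eu-west-2 \
  --backup-vault-name "$VAULT" --encryption-key-arn "$KMS_KEY_ARN"

run aws backup put-backup-vault-access-policy --region eu-west-2 \
  --backup-vault-name "$VAULT" --policy file://vault-deny-delete.json

# Optional, for the highest-risk workloads: Vault Lock. Passing
# --changeable-for-days starts a cooling-off period, after which the lock
# can no longer be changed or removed (compliance mode).
run aws backup put-backup-vault-lock-configuration --region eu-west-2 \
  --backup-vault-name "$VAULT" \
  --min-retention-days 7 --max-retention-days 365 --changeable-for-days 3
```

The lock's 7 to 365 day bounds are chosen to bracket the daily-7days and monthly-1year rules in this lesson; a rule whose retention falls outside the bounds would have its backup jobs fail against the locked vault.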

4. Verify coverage continuously and roll plans org-wide

Enable the AWS Config managed rules for backup coverage (for example rds-resources-protected-by-backup-plan, ebs-resources-protected-by-backup-plan, and dynamodb-resources-protected-by-backup-plan) so a finding fires whenever a resource that should be covered isn't. Pair them with AWS Backup Audit Manager's built-in controls (Backup Resources Protected by Backup Plan, Backup Plan Min Frequency and Min Retention) for a nightly compliance report. For multi-account orgs, use AWS Backup's integration with AWS Organizations (backup policies) to push the same plan from the management account to every member account; the alternative is hand-rolling plans per account and inevitably missing one.
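As an illustration of the Config half of this step, the sketch below registers the managed rule that checks RDS instances for backup-plan coverage. It assumes AWS Config recording is already enabled in the region; the rule file name is arbitrary, and `run` is a hypothetical dry-run wrapper, not an AWS CLI feature. The same pattern would apply to the EBS, DynamoDB, and EFS variants of the rule.

```shell
# Dry-run by default: print the AWS call instead of executing it.
# Set DRY_RUN=0 to run it against a real account.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# AWS-managed Config rule: flags RDS instances not protected by a backup plan.
cat > rds-backup-rule.json <<'JSON'
{
  "ConfigRuleName": "rds-resources-protected-by-backup-plan",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "RDS_RESOURCES_PROTECTED_BY_BACKUP_PLAN"
  }
}
JSON

run aws configservice put-config-rule --region eu-west-2 \
  --config-rule file://rds-backup-rule.json

# Once the rule has evaluated, list the RDS instances that are not covered.
run aws configservice get-compliance-details-by-config-rule --region eu-west-2 \
  --config-rule-name rds-resources-protected-by-backup-plan \
  --compliance-types NON_COMPLIANT
```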

# Step 1 of the loop: attach a tag-based selection so the plan actually picks up resources.
cat > prod-selection.json <<'JSON'
{
  "SelectionName": "backup-required-tag",
  "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
  "ListOfTags": [
    {
      "ConditionType": "STRINGEQUALS",
      "ConditionKey": "BackupRequired",
      "ConditionValue": "true"
    }
  ]
}
JSON

aws backup create-backup-selection \
  --region eu-west-2 \
  --backup-plan-id a1b2c3d4-5e6f-7890-abcd-ef0123456789 \
  --backup-selection file://prod-selection.json

# List resources that have at least one recovery point. Newly tagged resources
# appear here only after their first backup job completes.
aws backup list-protected-resources \
  --region eu-west-2 \
  --query 'Results[*].{Arn:ResourceArn,Type:ResourceType,Last:LastBackupTime}' \
  --output table

Quick quiz

Question 1 of 5

Continuity check BKP-003 fires with severity CRITICAL because no Backup Plans exist in the region. You have ~300 EBS volumes, ~40 RDS instances, and a dozen DynamoDB tables — most are ephemeral, but the stateful ones must hit a 1-hour RPO and 1-year retention for compliance. What's the right first move?

Keep learning

Dig deeper into AWS Backup, vault hardening, and org-wide policy enforcement.

You've completed Establish AWS Backup Plans. You can now build a tag-driven Backup Plan with tiered retention, land recovery points in a vault that survives a compromised account, verify coverage with Config and Audit Manager, and roll the same policy to every account in the org. The next time a continuity scan flags BKP-003, you'll have a four-step loop ready to run — and recovery stops being whatever someone hopes is there.

Backup Plans: the cost and compliance framing

What zero backup plans means for spend predictability, recovery cost, and audit evidence

An AWS Backup Plan is a configuration document that answers three questions at once: when do recovery points get taken (the schedule), how long do they get kept (the retention rule), and which resources do they apply to (the selection). Once wired up, backup cost becomes a predictable, budgeted line: storage billed per GB at rates ranging from ~$0.05/GB-month for EBS to higher per-GB rates for RDS and DynamoDB, multiplied by the fleet size and the retention window. Without a plan, that cost doesn't disappear — it becomes invisible and uncontrolled, spread across manual snapshots nobody tracks.

The finance risk in having no plan isn't just data loss — it's the unmodelled recovery cost that materialises when something goes wrong. A database rebuild from a month-old manual snapshot can take hours of engineering time plus potential data re-entry labour, customer credits, and SLA breach penalties. None of that shows up in a budget until it happens. A Backup Plan caps the worst-case recovery scenario at a known retention tier and makes the tradeoff explicit: pay a predictable monthly storage cost, or carry the open-ended liability of ad hoc recovery.

Continuity check BKP-003 fires at CRITICAL severity when a region has zero AWS::Backup::BackupPlan resources. From a finance and audit standpoint, CRITICAL means this is not a best-practice gap — it is a documented uninsured position. SOC 2, HIPAA, and ISO 27001 all expect a formal, automated backup policy with evidence; 'we have some snapshots somewhere' is not evidence. Establishing even a minimal plan with tiered retention resolves the finding and produces the audit artefact at the same time.

This lesson is for the finance partner who sees a backup storage line on the AWS bill and needs to understand what it buys, what it costs to do properly, and what it costs to ignore. You'll learn how retention tiers translate directly into predictable monthly storage spend, why a tag-based selection model makes coverage auditable without per-resource overhead, and how to frame the backup premium as planned insurance rather than a surprise cost. No CLI commands required — focus is on the spend model, the tiering decision, and the audit artefact a compliant plan produces.

How a finance partner frames the backup coverage decision

Priya is the finance partner for a healthcare SaaS running 300 EBS volumes, 40 RDS instances, and a dozen DynamoDB tables. The continuity dashboard shows BKP-003 at CRITICAL across all four active regions — no Backup Plans anywhere. Rather than ask engineering to 'just fix it,' Priya starts by scoping the spend impact of doing it right.

She asks the tiering question first: which resources actually need formal backup coverage, and at what retention level? Engineering tags production databases and EFS file systems as BackupRequired=true. Ephemeral autoscaling nodes, scratch EBS volumes, and dev instances stay untagged and intentionally out. The tag-based model means Priya can see exactly what's in scope and cost-model it before any plan is created.

With the in-scope fleet identified, she runs the storage math: daily recovery points at 7-day retention, weekly at 1 month, monthly at 1 year with cold-storage transition after 30 days. The monthly storage cost comes out to roughly $600 — a planned, predictable line item. Priya puts it in the budget with the note 'backup coverage for 40 RDS + 12 DynamoDB + stateful EBS, tiered retention, compliant with HIPAA §164.308(a)(7)(ii)(A).' The CRITICAL finding clears, the audit artefact exists, and the cost was decided in advance rather than discovered on the next bill.

The three costs a missing Backup Plan exposes, and the one it creates

The primary exposure is unmodelled recovery cost. When a production database is lost without a formal backup, the cost of recovery isn't zero — it's engineering hours to rebuild, data re-entry labour for anything not captured in the last manual snapshot, customer credits, potential SLA breach payments, and regulatory notification if the data was personal or healthcare-related. None of that is budgeted before the incident. A Backup Plan caps the worst-case recovery scope at the retention window and makes the remaining exposure an explicit, finite number.

The second exposure is audit and compliance cost. SOC 2 CC7.5, HIPAA §164.308(a)(7)(ii)(A), and ISO 27001 A.12.3 all require a documented, tested backup policy. Without a formal Backup Plan, the evidence collection before an audit is manual and unreliable — engineers pull together screenshots and spreadsheets of 'what we think we have.' With a plan and AWS Backup Audit Manager enabled, the evidence is a single automated report. The time saved per audit cycle is typically measured in days.

The third exposure is hidden snapshot cost without the lifecycle control. Manual snapshots taken outside a Backup Plan don't expire unless someone explicitly deletes them. AWS field data shows a median of 1,200 untagged manual EBS snapshots per mid-size account, costing around $9k/year with no value. A Backup Plan with explicit retention rules replaces this graveyard with a self-managing lifecycle.

The one cost a Backup Plan does create is storage: recovery points billed at per-GB-month rates (~$0.05/GB for EBS, higher for RDS and DynamoDB). For a 10 TB stateful fleet with tiered retention, budget $500–$800/month as the baseline; add cross-region copies or extend monthly retention to a year and that can climb past $2k. This cost is predictable and should be modelled before enabling the plan, not discovered on the next bill. The finance contribution is to run that model in advance and treat backup storage as a planned infrastructure line, not an afterthought.

What finance can actually drive on backup coverage

Finance can't create backup plans, but it owns the framing that makes backup a deliberate spend rather than either an unchecked cost or an uninsured gap. Four levers, applied at a regular cadence.

1. Scope the fleet before the plan goes live

The tag-based selection model (BackupRequired=true) is also a cost model: every tagged resource is a predictable line item at known per-GB-month rates. Before the plan is enabled, run the storage math on the tagged fleet across all three retention tiers (daily-7days, weekly-1month, monthly-1year with cold transition). This converts backup from a bill surprise into a planned infrastructure cost. Finance's job is to insist on this modelling step, not to approve a plan without knowing what it will cost.

2. Treat cross-region copies and long retention as a separate budget decision

The base plan (single-region, tiered retention) is the baseline cost. Cross-region copies roughly double the storage cost for the copied data; extending monthly retention past 1 year adds proportionally. These are not automatic — they require explicit rules in the plan. Finance should review them as separate line items rather than accepting them as defaults. The question for each is: what regulatory or business requirement justifies this incremental cost, and is that requirement documented?

3. Require a tag audit before tagging becomes the selection mechanism

If the plan selects by tag, the cost is determined by what gets tagged. A tagging audit before the plan goes live — which resources are tagged BackupRequired=true, which are intentionally untagged, and which are untagged by omission — is the only way to control scope. Resources backed up by accident (ephemeral nodes, scratch volumes) add cost with no benefit. Finance should ask for the tagged-resource inventory and sign off on it as the coverage scope before the plan is created.

4. Track untagged manual snapshots as waste to reclaim

Once the Backup Plan is managing the lifecycle, manual snapshots taken outside the plan are either redundant or orphaned. They don't expire automatically and can accumulate at meaningful cost. A monthly check on manual snapshots older than the plan's longest retention tier is a practical waste-reclamation step — those snapshots can be deleted once the plan has taken over. Finance should put this on the regular cloud cost review agenda until the graveyard clears.

Quick quiz

Question 1 of 5

Before enabling a new tiered Backup Plan (BackupRequired=true tag selection, daily-7days, weekly-1month, monthly-1year), the finance partner is asked to approve the recurring spend. What is the right process?

Keep learning

Dig deeper into AWS Backup, vault hardening, and org-wide policy enforcement.

You've finished the finance partner's view of establishing AWS Backup Plans. You know the storage cost model — per-GB-month by resource type, multiplied by fleet size and retention tier — and why that model needs to run before the plan is created, not after. You understand the tag-based selection as both a coverage mechanism and a cost-scope control, the case for treating cross-region copies and extended retention as separate budget decisions, and why the vault deletion-deny policy is non-negotiable rather than optional. Next time BKP-003 appears, your contribution is the pre-approval cost model and the tag-fleet audit that makes backup spend predictable and defensible.

Backup Plans: the one thing leadership needs to know

No plan means no policy — and no defensible answer if something is lost

An AWS Backup Plan is the mechanism that turns 'we think our data is backed up' into 'we have a policy that says what gets backed up, how often, and for how long.' Without one, backup happens — if it happens at all — by individual engineers remembering to do it manually. That is not a policy; it is a hope.

Continuity check BKP-003 fires at CRITICAL severity when there are zero Backup Plans in a region. CRITICAL here means a single accidental deletion, a ransomware event, or a compromised credential walking the account could result in permanent, unrecoverable data loss. The control is blunt by design: if no plan exists, no policy exists.

The leadership question is not whether the team is trying hard — it is whether recovery is governed by policy or by accident. A Backup Plan is the answer to that question. It is also the cheapest possible defence against the kind of incident that generates customer notification letters, regulatory inquiries, and board-level post-mortems.

A short read for leaders who want to know what this control protects, what the accountability question is, and what 'good' looks like. You'll get the plain-English version of why zero backup plans is a CRITICAL finding, why recovery 'by policy not by memory' is the right frame, and what the minimum bar looks like — a single tag-driven plan covering the resources that matter, with vault protection that survives a compromised account. No implementation detail.

What it looks like when the org gets backup right

At one company the CISO used to get 'we have some snapshots' as the answer to 'can we recover from a data loss event?' After the team established a formal Backup Plan with tag-based selection and vault deletion protection, the answer changed shape: a one-page coverage report showing exactly which resources are backed up, at what frequency, for how long, and that the recovery points cannot be deleted even by a compromised admin account.

The point leadership cared about wasn't the technical mechanism — it was that recovery was now a policy, not a memory. The question 'if an engineer accidentally drops the wrong database at 2am, can we recover to within the last hour?' had a documented, auditable yes as the answer. The backup cost was a small, planned line on the cloud bill rather than a variable that materialised after an incident.

That's the right end state for this control: not 'we take some snapshots,' but 'we have a policy that defines what is covered, how long it's kept, and who can delete it — and none of those answers are accidental.'

Why a missing Backup Plan is a board-level risk item

The headline risk is simple: without a Backup Plan, a single bad event — a wrong deletion, a ransomware payload, a compromised credential walking the account — can result in permanent, unrecoverable data loss. The question 'can we recover?' has no confident answer. That is the class of incident that generates customer notification letters, regulatory inquiries, and post-mortems at the board level.

The second risk is that ad hoc backup is invisible to governance. When engineers take manual snapshots without a policy, there is no single place to verify coverage, no audit trail that satisfies a control owner, and no mechanism that prevents the snapshots from being quietly deleted. A Backup Plan makes coverage a policy artefact — something that can be shown to an auditor, a regulator, or a customer security questionnaire as evidence of a deliberate, maintained commitment.

The right frame for leadership is not 'how much does backup cost?' It is 'are we governing recovery by policy or by memory?' A Backup Plan is the answer. The cost — a predictable monthly storage line proportional to the fleet — is the price of being able to answer that question with a yes.

The leadership move on backup coverage

The executive handle here isn't to mandate backup on every resource — it's to require that every resource's backup status is a deliberate, recorded decision matched to its business importance.

1. Set a default: production data resources are backed up by policy

Make it a standing requirement that any resource holding customer data, transaction records, or regulated information carries a formal backup policy. A tag-based plan makes this simple to enforce: tag the resource, it's covered. The default removes per-resource debate and ensures coverage is governed by policy, not by whoever happened to be on call the night the database was created.

2. Accept intentional exclusions for ephemeral resources

Not everything needs a backup. Ephemeral autoscaling nodes, scratch volumes, and throwaway dev instances are correctly untagged and correctly uncovered. Don't drive the backup bill to infinity by covering everything — drive it to the right level by ensuring every exclusion is a deliberate decision. The goal is 'every production data resource covered and every exclusion documented,' not 'maximum coverage.'

3. Ask for vault deletion protection as a non-negotiable

A backup that can be deleted by the same account compromise that triggered the disaster is not a backup — it is a comfort. Vault deletion protection (deny-delete policy on the vault, plus Vault Lock for highest-risk workloads) is the difference between 'we have backups' and 'our backups survive a bad day.' This is the one technical detail leadership should ask about: not whether backups exist, but whether they are protected from deletion by a compromised admin.

Quick quiz

Question 1 of 5

A continuity review shows BKP-003 CRITICAL in two regions: zero Backup Plans. Engineering proposes a tag-driven plan covering production databases and stateful storage, with a vault policy that blocks recovery-point deletion. What is the right leadership response?

Keep learning

Dig deeper into AWS Backup, vault hardening, and org-wide policy enforcement.

That's the lesson. The one question that matters: is recovery governed by policy or by memory? A Backup Plan with tag-based selection and vault deletion protection is the answer — it defines what's covered, at what tier, and ensures recovery points survive even a compromised account. The leadership signal to look for isn't a zero finding count; it's that every production data resource is covered by design, every exclusion is documented, and the vault cannot be wiped in a bad moment. That's resilience by policy.

Part of the learning path Build in resilience
