Cost

Delete idle ElastiCache clusters

An ElastiCache cluster serving no traffic still bills per node-hour for its full reserved capacity — find the idle ones, take a final snapshot, and tear them down cleanly.

13 min·10 sections·AWS

Last reviewed 27 May 2026

Idle ElastiCache clusters: the basics

Why a cache with no traffic still costs full price

Amazon ElastiCache (Redis OSS, Valkey, or Memcached) is billed per node-hour for the full provisioned capacity of every node in the cluster — not for how much of the cache you actually read or write. A cache.t3.medium node runs about $0.068/hour, or roughly $50 a month, and that meter keeps ticking whether the cache serves a million requests a second or none at all. Multi-node clusters scale the bill linearly: a three-node replication group is three times the node-hours, every hour, forever, until you delete it.

The check flags a cluster that shows effectively no activity: below 1% memory utilization (BytesUsedForCache near zero), near-zero CurrConnections, and flat read/write command counts (GetTypeCmds/SetTypeCmds on Redis OSS and Valkey, CmdGet/CmdSet on Memcached) over a 7-to-14-day window. That pattern almost always means one of two things: the cluster was over-provisioned at launch "to be safe" and the workload never grew into it, or the application that depended on it was retired and the cache was left behind. Either way it's reserved capacity nobody is using.

It's flagged because caches are easy to forget. They sit one layer behind the application, they don't appear in the user-facing architecture, and "is anything still pointing at this Redis?" is a harder question than it looks. A single idle cache.r6g.large costs about $115 a month; a forgotten three-node cluster is several hundred. The longer it runs unexamined, the more it costs and the less anyone remembers why it exists.

In this lesson you'll learn the node-hour billing model that makes an idle cache cost full price, how to tell a genuinely abandoned cluster from a legitimate warm-standby, staging, or batch cache, and the safe teardown pattern — confirm idleness over 14 days, identify the owning app, take a final snapshot, delete the replication group, then clean up the orphaned subnet group, parameter group, and security-group rules. You'll see the CloudWatch metric pulls that prove a cache is idle and the CLI sequence to snapshot-then-delete without leaving plumbing behind for someone to wire up by mistake.

Fun fact

The cache that outlived its app by a year

A platform team auditing a long-lived staging account found a three-node cache.r5.large Redis replication group humming along with CurrConnections flat at zero for 380 days straight. The session-store service it had been built for was decommissioned the previous spring — but the cache, its subnet group, and its security group were never touched. At roughly $0.226/node-hour it had quietly billed over $5,900 across the year for a cache nothing had connected to since the service shipped its last release.

Retiring an idle cache in action

Devi runs the FinOps cadence at a mid-sized SaaS company. The dashboard flags a Redis OSS replication group, sessions-stg, a single cache.r6g.large node, with BytesUsedForCache under 1% and CurrConnections flat at zero for the last 14 days — about $115/month on the bill, provisioned 11 months ago.

She doesn't delete it on the metric alone. A cache can look idle and still be a warm standby, a staging cache that only sees traffic during release windows, or a batch cache that's hot for two hours once a week. So she pulls 14 days of CurrConnections, CacheHits, and NetworkBytesIn at hourly granularity to rule out a periodic pattern, then chases the owning app via the cluster's security-group references and its Owner tag.

The metrics are dead flat — no weekly spike, no release-window blip — and the security group is referenced by nothing. The Owner=raj.p tag leads to a Slack message: "sessions-stg Redis still needed?" Raj replies in minutes: "That was for the old session service, we moved to DynamoDB last year — kill it." Devi takes a final snapshot to S3 as insurance, waits for available, deletes the replication group, then cleans up the now-orphaned subnet group and parameter group so nothing reattaches to them by accident.

First, confirm the cache is genuinely idle: pull 14 days of connection counts at hourly granularity so a weekly batch or release-window spike can't hide in a daily average.

$ aws cloudwatch get-metric-statistics --namespace AWS/ElastiCache --metric-name CurrConnections --dimensions Name=CacheClusterId,Value=sessions-stg-001 --start-time $(date -u -d '14 days ago' +%FT%TZ) --end-time $(date -u +%FT%TZ) --period 3600 --statistics Maximum

{
  "Datapoints": [
    { "Timestamp": "2026-05-12T00:00:00Z", "Maximum": 0.0, "Unit": "Count" },
    { "Timestamp": "2026-05-12T01:00:00Z", "Maximum": 0.0, "Unit": "Count" },
    { "Timestamp": "2026-05-12T02:00:00Z", "Maximum": 0.0, "Unit": "Count" },
    { "Timestamp": "2026-05-12T03:00:00Z", "Maximum": 0.0, "Unit": "Count" }
  ]
}

# Peak connections never rose above 0 in 14 days — no client, no batch, no warm standby.

Maximum (not Sum) at hourly granularity exposes any periodic client — a weekly batch would show a non-zero peak.
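To turn that reading into a yes/no answer, the same query can be wrapped in two small shell functions. This is a minimal sketch rather than part of the original walkthrough: `peak_metric` and `looks_idle` are hypothetical helper names, the `max()` reduction is a JMESPath expression evaluated client-side by the AWS CLI, and `date -d` assumes GNU date.

```shell
# peak_metric CLUSTER_ID METRIC [DAYS]
# Prints the highest hourly Maximum in the window ("None" if there are no datapoints).
peak_metric() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ElastiCache --metric-name "$2" \
    --dimensions "Name=CacheClusterId,Value=$1" \
    --start-time "$(date -u -d "${3:-14} days ago" +%FT%TZ)" \
    --end-time "$(date -u +%FT%TZ)" \
    --period 3600 --statistics Maximum \
    --query 'max(Datapoints[].Maximum)' --output text
}

# looks_idle PEAK [FLOOR]
# Succeeds only when PEAK is a plain number no greater than FLOOR (default 0).
# A non-numeric value such as "None" counts as "not proven idle".
looks_idle() {
  awk -v p="$1" -v f="${2:-0}" \
    'BEGIN { exit !(p ~ /^[0-9]+(\.[0-9]+)?$/ && p + 0 <= f + 0) }'
}

# Example (hypothetical cluster ID taken from the walkthrough):
#   peak=$(peak_metric sessions-stg-001 CurrConnections)
#   if looks_idle "$peak"; then echo "no client in 14 days"; fi
```

Treating missing data as "not idle" is a deliberate choice here: an empty result more often means a wrong cluster ID or dimension than a quiet cache.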

Confirmed idle and owner signed off: delete the whole replication group and request the final snapshot in the same call. ElastiCache takes the snapshot first, then removes every node in the group.

$ aws elasticache delete-replication-group --replication-group-id sessions-stg --final-snapshot-identifier sessions-stg-final-2026-05-26

{
  "ReplicationGroup": {
    "ReplicationGroupId": "sessions-stg",
    "Status": "deleting",
    "MemberClusters": ["sessions-stg-001"],
    "SnapshottingClusterId": "sessions-stg-001"
  }
}

# Final snapshot is taken before teardown. It is a manual snapshot, so it stays restorable until you delete it.

Delete the replication group, not individual nodes — and Redis/Valkey can snapshot first; Memcached cannot, so confirm extra-hard before deleting one.
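Because the final snapshot is the only way back, it may be worth confirming that it exists and is usable before removing anything else. A minimal sketch, assuming the snapshot name from the walkthrough: `snapshot_status` and `snapshot_ready` are hypothetical helper names, while `describe-snapshots` and its `SnapshotStatus` field come from the ElastiCache API.

```shell
# snapshot_status SNAPSHOT_NAME
# Prints the snapshot's status, for example "creating" or "available".
# If no snapshot has that name the call may fail or print "None".
snapshot_status() {
  aws elasticache describe-snapshots \
    --snapshot-name "$1" \
    --query 'Snapshots[0].SnapshotStatus' --output text
}

# snapshot_ready SNAPSHOT_NAME
# Succeeds only when the snapshot reports "available", i.e. it can be restored.
snapshot_ready() {
  [ "$(snapshot_status "$1")" = "available" ]
}

# Example:
#   snapshot_ready sessions-stg-final-2026-05-26 && echo "safe to clean up the plumbing"
```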

How ElastiCache billing actually works (deep dive)

ElastiCache bills per node-hour for every node in the cluster, on demand, regardless of utilisation. Rough US-East on-demand rates: cache.t3.micro ≈ $0.017/hr (~$12/mo), cache.t3.medium ≈ $0.068/hr (~$50/mo), cache.r6g.large ≈ $0.156/hr (~$115/mo), cache.r6g.xlarge ≈ $0.312/hr (~$230/mo). A Redis OSS or Valkey cluster-mode replication group multiplies that by its node count (primaries and replicas alike bill at the node rate), so a three-node r6g.large group is ~$345/month whether it serves a request or not. There is no "stopped" state for ElastiCache the way there is for EC2: the only way to stop the meter is to delete the cluster.
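The arithmetic behind those monthly figures is simple enough to script. A rough sketch, assuming a 730-hour average month and the approximate rates quoted in this lesson; `monthly_cost` is a hypothetical helper, and real rates vary by region and change over time.

```shell
# monthly_cost HOURLY_RATE [NODE_COUNT]
# Node-hour billing: hourly rate x number of nodes x 730 hours in an average month.
monthly_cost() {
  awk -v rate="$1" -v nodes="${2:-1}" 'BEGIN { printf "%.2f\n", rate * nodes * 730 }'
}

monthly_cost 0.068      # one cache.t3.medium
monthly_cost 0.156      # one cache.r6g.large
monthly_cost 0.156 3    # three-node r6g.large replication group
```

These print 49.64, 113.88, and 341.64, which the lesson rounds to about $50, $115, and $345 a month.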

The metrics that prove idleness live in the AWS/ElastiCache CloudWatch namespace. BytesUsedForCache under ~1% of node capacity means almost nothing is stored; CurrConnections flat at its floor means no client is connected (AWS documents that ElastiCache holds a few connections of its own for monitoring, so the floor can be a small constant rather than literally zero); and GetTypeCmds/SetTypeCmds (Redis OSS and Valkey) or CmdGet/CmdSet (Memcached) flat at zero means no reads or writes are happening. Pull Maximum at hourly granularity over 14 days rather than a daily Average: a daily average can flatten a once-a-week batch spike into apparent silence, and a warm-standby or release-window cache is exactly the false positive you want to avoid deleting.
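To see why the choice of statistic matters, consider a synthetic week of hourly connection counts. The numbers below are made up for illustration (a two-hour batch holding 40 connections once a week, zero otherwise) and the helper names are hypothetical; this is not real CloudWatch data.

```shell
# synthetic_week: 168 hourly samples, all zero except a 2-hour batch at hours 50 and 51.
synthetic_week() {
  awk 'BEGIN { for (h = 0; h < 168; h++) { v = (h == 50 || h == 51) ? 40 : 0; print v } }'
}

# hourly_summary: read one hourly value per line on stdin,
# print the peak and the mean across the whole window.
hourly_summary() {
  awk '{ sum += $1; if ($1 > max) max = $1 }
       END {
         if (NR == 0) { print "max=NA avg=NA"; exit }
         printf "max=%d avg=%.2f\n", max, sum / NR
       }'
}

synthetic_week | hourly_summary
```

The peak shows the 40-connection batch clearly, while the average across the week comes out below 0.5, which on a dashboard is easy to misread as an idle cache.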

Teardown differs by engine. Redis OSS and Valkey support a final snapshot — the cache serialises to an .rdb file stored in S3-backed snapshot storage (billed at standard backup rates, far cheaper than the live node-hours), so you get a restore path. Memcached has no persistence and cannot snapshot; deleting it is irreversible, so the connection and command-rate checks have to carry the full weight. For cluster-mode replication groups, delete the replication group rather than individual cache clusters — deleting nodes one at a time can trigger failovers and leaves a partial cluster billing on the way down.

# Describe the replication group to capture status, node type, member clusters, and failover setting.
aws elasticache describe-replication-groups \
  --replication-group-id sessions-stg \
  --query 'ReplicationGroups[0].{Status:Status, NodeType:CacheNodeType, Members:MemberClusters, AutomaticFailover:AutomaticFailover}'

# Confirm memory is effectively empty before deletion (daily Average and Maximum over 14 days, in bytes).
# `date -d` is GNU date syntax; on macOS/BSD use `date -u -v-14d +%FT%TZ` for the start time.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ElastiCache --metric-name BytesUsedForCache \
  --dimensions Name=CacheClusterId,Value=sessions-stg-001 \
  --start-time $(date -u -d '14 days ago' +%FT%TZ) \
  --end-time $(date -u +%FT%TZ) \
  --period 86400 --statistics Average Maximum

What's the impact of leaving idle ElastiCache clusters running?

The direct cost is node-hours billed for capacity nothing is using. A single idle cache.t3.medium is ~$50/month; a cache.r6g.large is ~$115; a three-node r6g.large replication group provisioned for HA is ~$345. Across a long-lived org with staging accounts, decommissioned services, and over-provisioned launches, idle caches routinely add up to thousands of dollars a month — and unlike EC2 there's no stop button, so the meter runs every hour until someone deletes the cluster outright.

There's a sizing trap layered on top of the idle-cache problem. Teams pick a node type at launch "to be safe," and a cache that genuinely needed a t3.micro (~$12/mo) ends up on an r6g.large (~$115/mo): a roughly 10x overspend that looks like normal usage on the bill because the cache isn't idle, just oversized. The idle check catches the clusters with no traffic; a right-sizing review catches the ones that serve real traffic but sit on far more memory than they'll ever fill.

Cluster topology multiplies any mistake. A multi-AZ replication group bills every replica node at the full node rate, so an over-provisioned or abandoned cluster-mode cache compounds three- or six-fold versus a single node. And because ElastiCache has no stopped state, the usual "pause it for a sprint" instinct doesn't exist here — the only lever is delete-and-restore-from-snapshot, which means idle caches tend to just sit there indefinitely rather than being parked.

Finally, abandoned caches are a hygiene and security drag, not only a cost one. They hold network plumbing — subnet groups, parameter groups, security-group rules — that clutters the VPC, and a forgotten Redis with an open security group is exactly the kind of stale, unowned data store that shows up as an audit or pentest finding. A short list of caches with clear owners is far easier to reason about, and to defend at audit, than a long one nobody remembers provisioning.

How do you retire idle ElastiCache clusters safely?

Deleting a cache is cheap; deleting one that a weekly batch quietly depends on is how you cause a 2 a.m. incident. Run this four-step loop for every flagged cluster.

1. Confirm idleness over a real 14-day window

Pull CurrConnections, CacheHits (Redis OSS and Valkey) or CmdGet (Memcached), and NetworkBytesIn at hourly granularity for at least 14 days, using Maximum not Average. A daily average flattens a once-a-week batch or a release-window cache into apparent silence; hourly peaks won't. If any hour is non-zero, treat the cluster as in use until you've identified the client: a warm standby, staging cache, or periodic compliance job is exactly the false positive you don't want to delete. A 14-day window cannot see a monthly job, so widen it to five weeks or more if the owner mentions anything that runs monthly.

2. Identify the owning application before touching anything

Trace who depends on the cache via its security-group rules (which security groups are allowed to reach it, and what's attached to them), its parameter group, and its tags, starting with Owner. A short message to the owner ("this cache has been idle 14 days, OK to retire?") resolves most decisions in a day. Never delete on the metric alone on a first pass; build trust with the team before automating the destruction path.

3. Snapshot (if you can), then delete the replication group

For Redis OSS or Valkey, take a final snapshot to S3-backed storage: it's far cheaper than the live node-hours and, because it is a manual snapshot, it gives you a restore path until you delete it. For cluster-mode, delete the whole replication group rather than individual nodes; deleting nodes one at a time can trigger failovers and leaves a partial cluster billing. Memcached cannot snapshot at all, so its deletion is irreversible. Make the connection and command checks carry the full weight before you pull it.

4. Clean up the orphaned plumbing so it can't be reused by mistake

Deleting the cluster leaves its subnet group, parameter group, and security-group rules behind. Remove or clearly tag them so nobody wires a new cache to a stale config — and so the next audit doesn't have to puzzle over networking that points at nothing. Adopt a tag convention (Owner, ExpiresAt, Lifecycle=ephemeral|persistent) on new caches and enforce it with AWS Config, so the next month's idle-cache report is shorter by default.
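The owner trace in step 2 can be partly scripted. A sketch under assumptions: the ARN and security-group IDs in the example are placeholders, and the three helper names are hypothetical. The underlying AWS CLI calls (`list-tags-for-resource`, `describe-security-groups`, `describe-network-interfaces`) are standard, but your tags and rules may be shaped differently.

```shell
# owner_tag RESOURCE_ARN
# Prints the value of the Owner tag ("None" if the tag is missing).
owner_tag() {
  aws elasticache list-tags-for-resource \
    --resource-name "$1" \
    --query "TagList[?Key=='Owner'].Value | [0]" --output text
}

# allowed_source_groups CACHE_SG_ID
# Lists the security groups that the cache's own group allows inbound.
allowed_source_groups() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissions[].UserIdGroupPairs[].GroupId' --output text
}

# attached_interfaces SOURCE_SG_ID
# Lists network interfaces still using a source group, i.e. potential clients.
attached_interfaces() {
  aws ec2 describe-network-interfaces \
    --filters "Name=group-id,Values=$1" \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,Description]' --output text
}

# Example (placeholder identifiers):
#   owner_tag arn:aws:elasticache:us-east-1:123456789012:replicationgroup:sessions-stg
#   for sg in $(allowed_source_groups sg-0123456789abcdef0); do attached_interfaces "$sg"; done
```

An empty result from `attached_interfaces` for every source group is consistent with "referenced by nothing", though it does not rule out clients admitted by CIDR-based rules.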

# 1. Final snapshot (Redis/Valkey only) and delete the replication group in one go.
aws elasticache delete-replication-group \
  --replication-group-id sessions-stg \
  --final-snapshot-identifier sessions-stg-final-2026-05-26

# 2. Wait for full deletion before cleaning up dependent resources.
aws elasticache wait replication-group-deleted \
  --replication-group-id sessions-stg

# 3. Remove the now-orphaned subnet group and custom parameter group.
aws elasticache delete-cache-subnet-group --cache-subnet-group-name sessions-stg-subnets
aws elasticache delete-cache-parameter-group --cache-parameter-group-name sessions-stg-params

# 4. Memcached has no snapshot, so its cluster is deleted directly.
#    (A standalone Redis OSS/Valkey cache cluster can still pass --final-snapshot-identifier to this command.)
# aws elasticache delete-cache-cluster --cache-cluster-id legacy-memcached-001
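
The tag convention from step 4 could be applied and checked along these lines. A sketch only: the ARN and dates are placeholders, `tag_cache` and `is_expired` are hypothetical helpers, and `ExpiresAt` is assumed to hold an ISO 8601 date (YYYY-MM-DD) so that plain string comparison orders dates correctly.

```shell
# tag_cache RESOURCE_ARN OWNER EXPIRES_AT LIFECYCLE
# Applies the Owner / ExpiresAt / Lifecycle convention to an ElastiCache resource.
tag_cache() {
  aws elasticache add-tags-to-resource \
    --resource-name "$1" \
    --tags "Key=Owner,Value=$2" "Key=ExpiresAt,Value=$3" "Key=Lifecycle,Value=$4"
}

# is_expired EXPIRES_AT [TODAY]
# Succeeds when the expiry date is strictly before TODAY (default: today in UTC).
# Dates are compared as strings, which is valid for YYYY-MM-DD.
is_expired() {
  awk -v a="$1" -v b="${2:-$(date -u +%F)}" 'BEGIN { exit !(a < b) }'
}

# Example (placeholder ARN):
#   tag_cache arn:aws:elasticache:us-east-1:123456789012:replicationgroup:sessions-stg \
#     raj.p 2026-12-31 ephemeral
#   is_expired 2026-01-01 && echo "past its review date"
```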

Quick quiz

Question 1 of 5

You find a single-node Redis OSS cluster with CurrConnections flat at zero and BytesUsedForCache under 1% for 14 days. The owner confirms the dependent service was retired last year. What's the right next move?

Keep learning

Go deeper into ElastiCache economics, the metrics that prove idleness, and the cleanup tooling around it.

You've completed Delete idle ElastiCache clusters. You now know why a cache with no traffic still bills full price per node-hour, how to prove idleness over a real 14-day window without nuking a warm standby or batch cache, and the safe teardown loop — confirm, identify the owner, snapshot-then-delete the replication group, clean up the plumbing. The next time the wastage report flags an idle cache, you'll have a defensible path from "flagged" to "resolved" with a snapshot in your back pocket.


Idle ElastiCache clusters: what it means for the bill

Reserved capacity that bills in full whether it's used or not

A managed cache like ElastiCache is charged for the capacity you reserve, not the work it does. AWS bills it by the node-hour — a fixed amount per node per hour — and that charge is identical whether the cache is handling heavy traffic or sitting completely idle. So a team can switch off the application that used a cache, leave the cache running, and keep paying full price for it every hour with nothing flowing through it. A single mid-sized node is roughly $50 a month; a multi-node cluster is a multiple of that.

This finding flags caches that are demonstrably idle — almost no memory in use, almost no connections, almost no read or write commands over a two-week window. In practice these are caches that were sized generously "to be safe" at launch and never grew into it, or caches stranded after the service that needed them was decommissioned. Each one is a fixed monthly charge that recurs on zero business activity, and because caches live behind the application they're easy to miss in a manual review.

From a budgeting standpoint this category behaves like other orphaned infrastructure: predictable in cost but sticky on the way down, because nobody wants to be the person who deleted the cache that turned out to be load-bearing. The right framing is that an idle cache is an ownership signal as much as a cost signal — it means a provisioned, billing resource has no clear owner who can confirm it's safe to remove. The dollar amount is small per cluster; the discipline gap it exposes is the thing worth tracking.

This lesson is for the finance partner who sees "ElastiCache" or "cache" on the cloud invoice and assumes a quiet cache is a cheap one. It walks through why a cache bills the same idle as busy, how the recurring cost behaves under your budgets, what a reasonable size for this category looks like, and the four levers you actually control: reporting cadence, ownership-as-a-budget-condition, a standing cleanup cadence, and trend-over-absolute. By the end you'll know what number to ask for at the monthly review and what to push engineering on when it doesn't move.


How a finance partner closes the loop

Sam is the finance partner embedded with the platform team. At the monthly cost review the engineering lead is walking through a flat data-services line, and Sam asks the question that's now standard on the agenda: "How many caches are sitting idle, and what's that costing us a month?" The dashboard says $290 across four ElastiCache clusters — small in absolute terms, but it's been roughly that every month for most of the year, so the cumulative is into the low thousands.

The conversation isn't technical. Sam doesn't ask about node types, Redis versus Valkey, or memory utilisation curves. She asks three things: who owns each cluster, how long it's been idle, and whether anything has connected to it in the last two weeks. The lead pulls the report — every cluster has an owner tag, the oldest has been idle eleven months, and connection counts are flat. That's enough to act. Engineering commits to a cleanup sprint with a one-week deadline; Sam adds a recurring line to the finance pack so the number can't drift back up unnoticed.

Two months later the same line reads $50 — one legitimate staging cache that's idle between releases, with sign-off. Sam knows that's the right floor, and she knows that if it creeps back above $200, ownership tagging is slipping somewhere and that's the conversation, not the dollar amount. The dollars are a leading indicator of a governance gap; the cleanup itself is almost incidental.

Why this matters to the budget, not just the bill

The per-resource impact is small and the aggregate is material. Idle caches typically sit in the hundreds-to-low-thousands of dollars a month for a mid-sized estate — call it well under 1% of total cloud spend. Not enough to move quarterly numbers alone, but enough that it keeps landing in the "unexplained variance" bucket every month and finance ends up chasing it across teams instead of focusing on the larger commitments.

The more important point is that this category has no natural floor unless someone enforces one. Unlike compute, there's no stopped state for a cache — the meter only stops on deletion — so an idle cache doesn't decay, it persists at full cost until a human acts. That makes it a pure function of operating discipline: the number is exactly as large as the gap between "service retired" and "someone remembered to delete its cache." When you see this line growing, you're not looking at a usage problem, you're looking at a cleanup-cadence problem.

The third impact is on chargeback credibility. An untagged idle cache is a real cost that can't be assigned to a business unit because the resource has no clear owner — so it gets absorbed centrally or argued about in meetings. Every untagged dollar here weakens the chargeback model. Cleaning these up is less about the dollar saving and more about keeping the cost-allocation report defensible.

Finally, treat it as a leading indicator. Sustained month-over-month growth in idle caches almost always means tagging or lifecycle conventions are slipping — the same drift that predicts bigger waste categories (oversized databases, idle compute, unused load balancers) trending the same way a few months later. Watch it as a signal, not a target to zero out.

What finance can actually do about this

Finance can't delete a cache, but it can set the conditions that keep the number small. Four levers, used together at the monthly cost cadence.

1. Put it on the monthly report as a standing line

Add "idle caches still billing" as a recurring line on the cost-review pack, with both the dollar amount and the count idle more than 14 days. The dollars are the headline; the count is the leading indicator. If either trends up two months running, that's the prompt to escalate — not the absolute size.

2. Make ownership tagging a precondition for budget approval

Agree that any untagged or unowned cache counts against the responsible team's budget regardless of who created it. That single rule does more for this category than anything else — it turns "clean up old caches" from an engineering chore into a budget-protection activity teams pursue on their own.

3. Bake a cleanup cadence into the FinOps operating model

Ask for sustained discipline, not heroic one-off purges. A short monthly review per team — walk through any cache idle more than 14 days and either confirm or retire it — keeps the number flat indefinitely. The cadence is cheap; the annual scramble that replaces it is expensive and rarely actually happens twice.

4. Treat the trend as the metric, not the absolute number

A residual is normal — some staging caches are legitimately idle between releases. The right question isn't "why isn't this zero?" but "is it staying flat?" A flat $200 a month with clean ownership is healthy; a growing $50-a-month is worse than a static $500, because it means the underlying discipline is slipping.

Quick quiz

Question 1 of 5

The idle-cache line has grown from $200 to $480 a month over six months. The total is small but the trend is up. As the finance partner, what's the right next move?


You've finished the finance partner's view of idle ElastiCache clusters. You know why a quiet cache still bills in full, why this category has no natural floor without operating discipline, and what the four finance levers are: reporting cadence, ownership-as-a-budget-condition, a standing cleanup cadence, and trend-not-absolute as the metric. Next time the line shows up at the monthly review, you'll have a sharper question to ask than "can we delete some of these?"


Idle ElastiCache clusters: the headline

Paying full price for reserved capacity nobody is using

Managed caches bill for the capacity you reserve, not the traffic they serve. An idle cache costs exactly the same as a busy one — so a cluster left running after its application retired keeps charging in full, month after month, for doing nothing. At scale across an organisation, that's a recurring drag on the cloud bill from infrastructure the business has already stopped using.

This is a hygiene and accountability issue, not a pricing one. Each flagged cluster is money spent on something with no clear owner and no active value. Cleaning them up is low-risk, immediate-saving work; the more durable outcome is the discipline it sets — every provisioned resource has an owner, a purpose, and a review date, not just a launch date.

A short read on a small but stubborn category of cloud waste, written for the exec who wants the headline and the one question to ask. You'll get the rule-of-thumb framing, what an idle-cache trend signals about wider cloud hygiene, and what "good" looks like at an org level — no commands, no internals.


What it looks like when the org gets this right

At one company the quarterly cloud-cost review used to carry a recurring slide: "$290/month of idle cache infrastructure across 4 clusters." Small number, but it sat there every quarter — sometimes growing, never shrinking. The exec sponsor stopped asking about the dollar figure and started asking about the shape: "Why do we have any of these? Who owns this cache? Why is there no review date?"

Within two quarters the slide changed. The number was no longer the headline; the headline was "every data service has a named owner and a documented review date." The $290 dropped to a residual $50 of intentionally-idle staging caches with sign-off, and the cleanup stopped being an agenda item. The exec hadn't asked engineering to chase dollars — she'd asked them to fix the absence of ownership the dollars exposed. "Zero idle caches" was never the goal; "every resource has an owner and a date" was.

Why this is on the report at all

The dollar amount in this category is usually small. It's tracked because its size and trend say something about the discipline underneath. A small, flat, fully-tagged number means working ownership conventions and a normal cleanup cadence. A growing or large number means the inverse — resources provisioned without clear owners or review dates, nobody closing the loop, and almost certainly the same pattern in bigger spend categories that don't show up on the dashboard as cleanly.

There's a second-order risk too. A forgotten data store with stale ownership is as much an audit and security finding as a cost one — an idle Redis still has a network footprint and may still hold data. So this category sits at the intersection of cost, security posture, and operational hygiene. Most CFOs care about the first and most CIOs about the second; both should care about the underlying pattern the trend reveals.

The leadership move on this category

The handle for an executive isn't to drive the dollar number — it's to set the operating norms that make the number a non-issue.

1. Insist that every resource has a named owner

Not a team, not a system — a person, recorded at creation and reviewed when people change roles. Cloud waste compounds because ownership goes ambiguous; making it unambiguous is the highest-leverage policy you can set.

2. Require a review or expiry date on creation

Every cache should carry a date by which someone confirms it's still needed or lets it be cleaned up. This converts the question from "is this safe to delete?" (which everyone avoids) to "is this still needed?" (which is easy to answer), and it's especially important for caches, which have no stopped state to fall back on.

3. Make this a confidence signal at the leadership review

Ask for the trend, not the dollar: "Is the idle-cache line flat or growing?" is a one-minute item that tells you whether cloud governance is working without any technical depth. Flat for three quarters running means the hygiene is healthy and the team can spend its attention elsewhere.

Quick quiz

Question 1 of 5

You're reviewing the cloud cost pack and see the idle-cache category has been roughly flat for three quarters at around $250/month, with every cache carrying an owner tag. What's the right read?


That's the lesson. Two takeaways worth holding onto: a cache bills the same idle as busy, and the size of this category is a hygiene signal, not the headline. The leadership question is about ownership and review dates — not about the dollar amount.


Part of the learning path Kill idle waste
