Cost

Consolidate redundant NAT Gateways

Many VPCs run more NAT Gateways than they need — each one bills ~$32/month just to exist, so collapse the duplicates without breaking your AZ-resilience or cross-AZ data-transfer math.

12 min·10 sections·AWS

Last reviewed 27 May 2026

Redundant NAT Gateways: the basics

When does a VPC have more NAT Gateways than it needs?

A NAT Gateway lets resources in a private subnet reach the internet outbound without being reachable inbound. At us-east-1 rates it bills in two parts: a flat $0.045 per hour (about $32.40 per gateway per month) just for being provisioned, plus $0.045 per GB of traffic processed. The hourly charge accrues whether or not the VPC actually needs that gateway. This lesson isn't about a NAT with zero traffic; it's about a VPC carrying more NATs than its design actually requires, each one quietly adding $32 to the bill.
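To see how the two parts add up, here is a minimal shell sketch that uses awk for the arithmetic. It assumes the us-east-1 rates quoted above and the 720-hour month behind the $32.40 figure (some estimates use 730). `nat_monthly_bill` is a hypothetical helper, not an AWS tool. The point it illustrates is that the flat part scales with gateway count, not with traffic.

```shell
# Hypothetical helper: rough monthly NAT Gateway bill for one VPC.
# Assumed rates: the us-east-1 list prices quoted in this lesson; check your region.
NAT_HOURLY=0.045   # USD per gateway-hour
NAT_PER_GB=0.045   # USD per GB processed
HOURS=720          # hours per month behind the lesson's $32.40 figure

# nat_monthly_bill GATEWAYS GB_PROCESSED -> prints USD with two decimals
nat_monthly_bill() {
  awk -v n="$1" -v gb="$2" -v h="$NAT_HOURLY" -v p="$NAT_PER_GB" -v hrs="$HOURS" \
    'BEGIN { printf "%.2f\n", n * h * hrs + gb * p }'
}

nat_monthly_bill 1 0     # one gateway, no traffic: prints 32.40
nat_monthly_bill 3 0     # three gateways, no traffic: prints 97.20
nat_monthly_bill 3 100   # same three plus 100 GB processed: prints 101.70
```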

The over-provisioning patterns are predictable. A NAT per private subnet when one per AZ would do. Two or more NATs in the same Availability Zone — pure duplication, zero added resilience. A dev or test VPC built from the production template that inherited a NAT-per-AZ layout it never needed. Or leftover NATs from an old subnet design where the routes were re-pointed but the original gateways were never torn down. In every case multiple gateways exist where fewer would carry the same traffic with the same availability.

It's flagged because the redundancy is invisible until someone groups the gateways by VPC and AZ and reads the route tables. The classic best practice — one NAT per AZ — exists for a reason, so consolidation is a genuine tradeoff rather than free money: it isn't always safe to collapse to a single gateway. The skill is telling true duplication from legitimate per-AZ redundancy, and knowing which environments tolerate which.

In this lesson you'll learn how to spot a VPC carrying more NAT Gateways than its design needs (duplicates within an AZ, NAT-per-subnet sprawl, non-prod VPCs inheriting prod-grade redundancy) and how to consolidate them without breaking egress. You'll see how to group gateways by VPC and AZ to find duplication, how to read route tables to learn which subnets depend on which NAT, and the central tradeoff: collapsing to one NAT saves ~$32/month per removed gateway but can add cross-AZ data-transfer charges (~$0.01/GB each way) and makes egress for the whole VPC depend on a single AZ.

Fun fact

Two NATs, one zone, zero extra resilience

An audit of a media company's main account found a VPC running four NAT Gateways: one each in us-east-1a and us-east-1b — and two in us-east-1c. The second us-east-1c gateway was created during a failed Terraform apply 18 months earlier; nothing routed to it, and even if something had, a second gateway in the same AZ adds no availability — an AZ outage takes both down together. It had billed roughly $580 in pure provisioning charges for redundancy that, by definition, could never exist. The fix was a one-line route check and a single delete-nat-gateway call.

Consolidating NAT Gateways in action

Lena runs quarterly VPC hygiene for a logistics company with 60-odd AWS accounts. Her tooling flags the orders-dev VPC: three NAT Gateways, one per AZ, on a development environment that only ever needs egress to pull packages and hit a couple of APIs. Three gateways is $97/month of provisioning for a sandbox that has no uptime SLA at all.

Before touching anything she groups every NAT in the VPC by AZ, then walks the route tables to learn which private subnets point at which gateway. Two findings: the per-AZ pattern was copied straight from the production template, and one of the three NATs is an orphan whose route table serves only empty subnets, a leftover from a subnet redesign last year.

The plan splits by environment. For orders-dev she keeps a single NAT, re-points the other two AZs' route tables at it, and deletes the other two gateways, orphan included. She accepts the small cross-AZ transfer charge because dev egress volume is tiny and there's no resilience requirement. In production she does the opposite: she leaves the legitimate one-per-AZ layout alone and removes only a genuine same-AZ duplicate. Net: three gateways removed across the two VPCs, ~$97/month saved, and no production failure domain touched.

First, list every NAT Gateway in the VPC and group by AZ — the fastest way to spot same-AZ duplicates and over-provisioning.

$ aws ec2 describe-nat-gateways --filter Name=vpc-id,Values=vpc-0orders123dev --query 'NatGateways[?State==`available`].{Nat:NatGatewayId,Subnet:SubnetId,Created:CreateTime}' --output table
----------------------------------------------------------------
|                     DescribeNatGateways                      |
+--------------+---------------+-------------------------------+
|    Created   |      Nat      |            Subnet             |
+--------------+---------------+-------------------------------+
|  2024-11-02  |  nat-0aaa111  |  subnet-0aaa (us-east-1a)     |
|  2024-11-02  |  nat-0bbb222  |  subnet-0bbb (us-east-1b)     |
|  2024-11-02  |  nat-0ccc333  |  subnet-0ccc (us-east-1c)     |
+--------------+---------------+-------------------------------+
# CreateTime truncated to the date; AZs added in parentheses for readability (the API returns only subnet IDs).
# 3 NATs, one per AZ, on a DEV VPC with no uptime SLA: over-provisioned.

Grouping by AZ exposes both same-AZ duplicates and per-AZ sprawl on environments that don't need it.

Now find which subnets route through each NAT — you must re-point these before deleting, or egress breaks for anything that subnet hosts.

$ aws ec2 describe-route-tables --filters Name=route.nat-gateway-id,Values=nat-0ccc333 --query 'RouteTables[].{RT:RouteTableId,Subnets:Associations[?SubnetId!=`null`].SubnetId}'
[
    {
        "RT": "rtb-0c1c1c1c1c1",
        "Subnets": ["subnet-0ccc"]
    }
]

# subnet-0ccc routes through nat-0ccc333; re-point it to nat-0aaa111 first.

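Read the route tables before deleting: re-point dependent subnets to the surviving NAT, then tear the redundant one down. One catch: if the route pointing at the NAT lives in the VPC's main route table, every subnet without an explicit association uses it too, and those subnets won't appear in the Associations list.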

How NAT redundancy and the cross-AZ trap actually work (deep dive)

A NAT Gateway is a managed service that AWS places in one subnet, and therefore in a single Availability Zone; AWS builds redundancy into it within that zone, but not across zones. That single-AZ scope is the whole reason the per-AZ pattern exists: if the AZ hosting your only NAT goes down, every private subnet that routes through it loses egress, even subnets in healthy AZs. Running one NAT per AZ, with each AZ's private subnets routing to the NAT in their own AZ, means a single AZ failure only takes out that AZ's workloads, which were already affected. This is genuine resilience, not waste, which is why blindly collapsing to one gateway is wrong for production.

The second reason same-AZ routing matters is money, and it's the trap in careless consolidation. When a private subnet in us-east-1b routes to a NAT in us-east-1a, every byte crosses an AZ boundary and incurs cross-AZ data-transfer charges of ~$0.01/GB in each direction, so roughly $0.02 for each GB once both sides of the crossing are billed. Collapse three NATs to one and you save 2 × $32.40 = $64.80/month in provisioning, but you've just routed two AZs' worth of traffic across zone boundaries. If those two AZs push 4 TB/month through the surviving NAT, that's roughly 4,000 GB × $0.02 = $80/month in cross-AZ transfer the VPC didn't pay before, more than the provisioning you saved.
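The break-even is worth computing before any consolidation. This sketch assumes the figures above: $32.40/month saved per removed gateway and ~$0.02 per GB that crosses an AZ boundary ($0.01 billed on each side). `net_saving` and `breakeven_gb` are hypothetical helpers; substitute your region's rates and your measured cross-AZ volume.

```shell
# Hypothetical helpers: net monthly effect of removing NAT Gateways when the
# re-pointed subnets' traffic starts crossing AZ boundaries.
# Assumed rates (from this lesson; check current pricing for your region):
SAVE_PER_NAT=32.40   # USD/month provisioning saved per removed gateway
XAZ_PER_GB=0.02      # USD per GB crossing AZs ($0.01 billed on each side)

# net_saving REMOVED_NATS CROSS_AZ_GB -> positive means consolidation pays off
net_saving() {
  awk -v n="$1" -v gb="$2" -v s="$SAVE_PER_NAT" -v x="$XAZ_PER_GB" \
    'BEGIN { printf "%.2f\n", n * s - gb * x }'
}

# breakeven_gb REMOVED_NATS -> cross-AZ GB/month at which the saving reaches zero
breakeven_gb() {
  awk -v n="$1" -v s="$SAVE_PER_NAT" -v x="$XAZ_PER_GB" \
    'BEGIN { printf "%.0f\n", n * s / x }'
}

net_saving 2 50     # low-volume dev VPC: prints 63.80
net_saving 2 4000   # ~4 TB/month re-pointed: prints -15.20
breakeven_gb 2      # prints 3240
```

Under these assumptions, removing two gateways only pays off while the re-pointed traffic stays below roughly 3.2 TB/month.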

So the decision is data-volume- and environment-dependent. Same-AZ duplicates (two NATs in one AZ) are always safe to consolidate — the second adds neither resilience nor different routing. NATs that no subnet routes to are always safe to delete. Per-AZ NATs in production should usually stay. Per-AZ NATs in low-volume non-prod are the sweet spot for consolidation: the cross-AZ charge is negligible at low traffic and there's no resilience requirement to protect.
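The same rules can be written as a small lookup, which helps when an audit produces dozens of candidates. `nat_action` is a hypothetical helper: it takes judgements you've already made (environment tier, whether any route table points at the NAT, and whether it's an extra gateway in an AZ that already has one) and returns a label. It assumes non-prod egress is low-volume; price chatty non-prod separately before consolidating.

```shell
# Hypothetical helper encoding the decision rules, one NAT at a time.
#   ENV:    prod | nonprod
#   ROUTED: yes if any route table points at this NAT, else no
#   DUP:    yes if this is an extra NAT in an AZ that already has one
#           (mark only the extras, never the gateway you intend to keep)
nat_action() {
  local env_tier=$1 routed=$2 dup=$3
  if [ "$routed" = no ]; then
    echo "delete"                # nothing routes to it: always safe
  elif [ "$dup" = yes ]; then
    echo "repoint-then-delete"   # a second NAT in the same AZ adds no resilience
  elif [ "$env_tier" = prod ]; then
    echo "keep"                  # per-AZ NAT in production: the redundancy is the point
  else
    echo "consolidate"           # per-AZ NAT in low-volume non-prod
  fi
}

nat_action nonprod no no    # prints delete
nat_action prod yes yes     # prints repoint-then-delete
nat_action prod yes no      # prints keep
nat_action nonprod yes no   # prints consolidate
```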

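# Group every NAT Gateway in the current region by VPC and subnet to spot duplicates fast.
# A list projection ([...]) keeps the columns in VPC, subnet, NAT order so sort groups by VPC;
# a hash projection ({...}) may reorder its columns in text output.
# The API reports each NAT's subnet, not its AZ; map subnets to AZs with
#   aws ec2 describe-subnets --query 'Subnets[].[SubnetId,AvailabilityZone]' --output text
aws ec2 describe-nat-gateways \
  --filter Name=state,Values=available \
  --query 'NatGateways[].[VpcId, SubnetId, NatGatewayId]' \
  --output text | sort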

# For a candidate NAT, list every subnet whose route table points at it.
# These must be re-pointed to the surviving NAT BEFORE deletion, or egress breaks.
aws ec2 describe-route-tables \
  --filters Name=route.nat-gateway-id,Values=nat-0ccc333 \
  --query 'RouteTables[].Associations[?SubnetId!=`null`].SubnetId'
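describe-nat-gateways reports each gateway's subnet, not its AZ, so grouping by AZ needs a join against subnet data. A minimal sketch, assuming you saved tab-separated `VpcId SubnetId NatGatewayId` rows (for example from a `NatGateways[].[VpcId,SubnetId,NatGatewayId]` query with `--output text`) and `SubnetId AvailabilityZone` rows (from `aws ec2 describe-subnets --query 'Subnets[].[SubnetId,AvailabilityZone]' --output text`). `find_same_az_duplicates` is a hypothetical helper that prints every VPC and AZ holding more than one NAT; the IDs in the demo are made up.

```shell
# Hypothetical helper: join NAT rows with subnet->AZ rows and report every
# VPC+AZ pair that holds more than one NAT Gateway.
#   $1: file of "SubnetId AvailabilityZone" rows
#   $2: file of "VpcId SubnetId NatGatewayId" rows
find_same_az_duplicates() {
  awk '
    FILENAME == ARGV[1] { az[$1] = $2; next }   # first file: subnet -> AZ
    {
      key = $1 " " (($2 in az) ? az[$2] : "unknown-az")
      count[key]++
      nats[key] = nats[key] " " $3              # collect NAT IDs per VPC+AZ
    }
    END { for (k in count) if (count[k] > 1) print k nats[k] }
  ' "$1" "$2" | sort
}

# Demo with hypothetical IDs: two NATs share us-east-1c.
subnet_file=$(mktemp); nat_file=$(mktemp)
printf 'subnet-0aaa\tus-east-1a\nsubnet-0ccc\tus-east-1c\nsubnet-0ddd\tus-east-1c\n' > "$subnet_file"
printf 'vpc-0main\tsubnet-0aaa\tnat-0aaa111\nvpc-0main\tsubnet-0ccc\tnat-0ccc333\nvpc-0main\tsubnet-0ddd\tnat-0ddd444\n' > "$nat_file"
find_same_az_duplicates "$subnet_file" "$nat_file"   # prints: vpc-0main us-east-1c nat-0ccc333 nat-0ddd444
rm -f "$subnet_file" "$nat_file"
```

An empty result means no AZ holds more than one gateway; per-AZ sprawl on non-prod still has to be judged separately.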

What's the impact of running redundant NAT Gateways?

The direct cost is $32.40 per redundant gateway per month. One duplicate is rounding error; a fleet-wide pattern is not. A 60-account org that copies a 3-NAT-per-VPC template onto every environment — including dozens of dev, staging, and sandbox VPCs that need no AZ resilience — is carrying hundreds of gateways, a large fraction of them redundant by design. At $32/month each, trimming non-prod from three gateways to one across 40 VPCs is ~$2,600/month of pure provisioning saved with no functional change.

The most expensive mistake here is over-consolidating and triggering cross-AZ data transfer. If a team collapses three NATs to one to chase the provisioning saving, every byte from the other two AZs' subnets now crosses zone boundaries at ~$0.01/GB each way. For a chatty, high-egress workload this variable charge can dwarf the $64/month they saved — turning a tidy win into a net loss that only shows up on next month's data-transfer line, where nobody's looking.

There's a resilience cost that doesn't appear on the bill at all. Collapse production to a single NAT and you've created a single-AZ failure domain for egress: an AZ outage now takes down outbound connectivity for every private subnet, including those in AZs that are otherwise healthy. The $64/month saved is invisible against the cost of an avoidable multi-AZ egress outage during an incident. This is why production per-AZ NATs are usually left alone — the redundancy is the point.

Finally, NAT sprawl clutters the network the way any unmanaged primitive does. Extra gateways mean extra Elastic IPs, extra route-table entries, and a more complicated diagram to reason about during an incident or a PCI/SOC 2 audit. Consolidating the genuinely redundant ones isn't only a cost story — it's a simpler, more defensible network.

How do you consolidate NAT Gateways safely?

Consolidation is a four-step loop per VPC: map the gateways by AZ, decide the target topology for that environment, re-point routes to the survivor, then delete the redundant gateways and release their EIPs. The decision is environment-dependent — aggressive in non-prod, surgical in prod.

1. Map every NAT by VPC and AZ to find true redundancy

Group all gateways by VPC and Availability Zone. Two NATs in the same AZ are always redundant — the second adds no resilience and no different routing. A NAT-per-subnet layout where one-per-AZ would do is over-provisioned. Any NAT that no route table points at is dead. These three categories are the consolidation candidates; per-AZ NATs serving live subnets in their own AZ are not, until you've decided the target topology.

2. Pick the target topology per environment, not globally

Production usually keeps one NAT per AZ — the redundancy protects against a zone outage and keeps traffic same-AZ to avoid transfer charges. Non-prod (dev, staging, sandbox) can almost always collapse to a single NAT: there's no uptime SLA, and low egress volume makes the resulting cross-AZ transfer charge negligible. Decide the target per VPC before touching anything, and write it down so the next audit doesn't re-flag a deliberate choice.
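Writing the target down can be as simple as a per-VPC plan. This sketch assumes a hypothetical inventory with one `VPC ENV AZ_COUNT CURRENT_NATS` line per VPC (ENV is `prod` or `nonprod`) and applies the default above: one NAT per AZ in production, one NAT per VPC in non-prod. `plan_targets` is a hypothetical helper that prices provisioning only, at $32.40 per gateway unless you pass another rate; it does not price the cross-AZ transfer that consolidation can add.

```shell
# Hypothetical planner for the per-environment default:
# production keeps one NAT per AZ, non-prod keeps one NAT per VPC.
# stdin: one "VPC ENV AZ_COUNT CURRENT_NATS" line per VPC (ENV: prod | nonprod)
# $1 (optional): USD/month per gateway, default 32.40
plan_targets() {
  awk -v per_nat="${1:-32.40}" '
    {
      current = $4 + 0
      target = ($2 == "prod") ? $3 + 0 : 1   # prod: one per AZ; non-prod: one
      if (target > current) target = current  # never plan to add gateways
      saved = (current - target) * per_nat
      total += saved
      printf "%s %s %d -> %d %.2f\n", $1, $2, current, target, saved
    }
    END { printf "TOTAL %.2f\n", total }
  '
}

# Two hypothetical VPCs: dev with per-AZ NATs, prod with one same-AZ duplicate.
printf 'vpc-0orders123dev nonprod 3 3\nvpc-0orders123prd prod 3 4\n' | plan_targets
# prints:
# vpc-0orders123dev nonprod 3 -> 1 64.80
# vpc-0orders123prd prod 4 -> 3 32.40
# TOTAL 97.20
```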

3. Re-point dependent subnets to the surviving NAT first

For every subnet that routes through a NAT you intend to delete, update its route table's 0.0.0.0/0 route to point at the surviving gateway — before deletion, not after. Use replace-route, confirm egress still works from a test instance, then proceed. Re-pointing a subnet to a NAT in a different AZ is the moment cross-AZ transfer starts billing; that's fine in low-volume non-prod and worth re-checking in anything chatty.
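When several route tables point at the gateway you're removing, generating the commands first makes the change reviewable. A minimal sketch: `print_repoint_plan` is a hypothetical helper that prints, but does not run, one `aws ec2 replace-route` per route table ID you pass it. It assumes the 0.0.0.0/0 default is the only route through the NAT being removed; check each table for other destinations before running the output.

```shell
# Hypothetical helper: print (not run) one replace-route per route table,
# moving the 0.0.0.0/0 default route onto the surviving NAT.
# Route table IDs could come from, for example:
#   aws ec2 describe-route-tables --filters Name=route.nat-gateway-id,Values=nat-0ccc333 \
#     --query 'RouteTables[].RouteTableId' --output text
print_repoint_plan() {
  local survivor=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "# no route tables reference the NAT; nothing to re-point"
    return 0
  fi
  for rtb in "$@"; do
    printf 'aws ec2 replace-route --route-table-id %s --destination-cidr-block 0.0.0.0/0 --nat-gateway-id %s\n' \
      "$rtb" "$survivor"
  done
}

print_repoint_plan nat-0aaa111 rtb-0c1c1c1c1c1
# prints: aws ec2 replace-route --route-table-id rtb-0c1c1c1c1c1 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0aaa111
```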

4. Delete the redundant NAT, release its EIP, then watch

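Note the NAT's Elastic IP allocation ID, run delete-nat-gateway, and wait for the gateway to reach the deleted state. Deleting a NAT disassociates its Elastic IP but doesn't release it, so run release-address afterwards; an orphaned EIP keeps billing ~$3.65/month. Watch CloudWatch and app alerting for 24 to 48 hours in case a long-tail dependency surfaces. Make the same change in your Terraform (or other infrastructure-as-code) and record the target-topology decision, so the next apply doesn't quietly recreate the gateway.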

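# 0. Record the redundant NAT's Elastic IP allocation ID while the NAT still exists.
ALLOC_ID=$(aws ec2 describe-nat-gateways --nat-gateway-ids nat-0ccc333 \
  --query 'NatGateways[0].NatGatewayAddresses[0].AllocationId' --output text)

# 1. Re-point the dependent subnet's route to the SURVIVING NAT, before deleting.
aws ec2 replace-route \
  --route-table-id rtb-0c1c1c1c1c1 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0aaa111

# 2. Now the redundant NAT has no dependents: delete it and wait until it's gone.
aws ec2 delete-nat-gateway --nat-gateway-id nat-0ccc333
aws ec2 wait nat-gateway-deleted --nat-gateway-ids nat-0ccc333

# 3. Release its Elastic IP. Deleting the NAT disassociates the EIP but doesn't
#    release it, and a standalone EIP keeps billing ~$3.65/month.
aws ec2 release-address --allocation-id "$ALLOC_ID"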

Quick quiz


A production VPC runs three NAT Gateways — one each in us-east-1a, 1b, and 1c — plus a fourth in 1c left over from a failed deploy. Each AZ's private subnets route to the NAT in their own AZ; nothing routes to the fourth. The workload is high-egress. What's the right move?


You've completed Consolidate redundant NAT Gateways. You now know how to spot a VPC carrying more gateways than it needs — same-AZ duplicates, per-subnet sprawl, non-prod inheriting prod-grade redundancy, and orphans nothing routes to — and the four-step loop to consolidate safely: map by AZ, choose the target topology per environment, re-point routes, then delete and release. Most importantly, you can weigh the ~$32/month-per-gateway saving against the cross-AZ data-transfer charge and the single-AZ failure domain, so you consolidate aggressively in non-prod and surgically in prod.


Redundant NAT Gateways: what it means for the bill

Paying for resilience the environment doesn't actually need

A NAT Gateway is a piece of network plumbing that lets internal systems reach the internet. The relevant fact for finance is that each one carries a fixed charge of roughly $32 a month before a single byte of traffic flows through it. That charge appears whether the gateway is necessary or duplicated. This finding isn't about gateways doing nothing — it's about VPCs (the isolated networks workloads run in) that are carrying more of these gateways than their design needs, so you're paying the fixed fee several times for capacity one gateway could provide.

It happens because there's a sensible engineering default — run one gateway in each data-centre zone so a single zone failure can't take the whole network offline — and teams apply that default everywhere, including test environments and small workloads where the extra zones add cost but no real protection. So you end up paying for production-grade redundancy on a dev sandbox, or running two gateways in the same zone where one would do. Individually it's $32 a month; across an organisation with hundreds of these networks it lands in five figures a year for resilience nothing is using.

The important nuance — and the reason this isn't a simple delete — is that the redundancy is real in production: the per-zone gateways exist to keep critical systems online and, just as importantly, to avoid an internal data-transfer surcharge (~$0.01/GB) that gets added when traffic from one zone has to cross to a gateway in another. Consolidating too aggressively can replace a $32 fixed saving with a larger variable transfer bill, or weaken a production failure boundary. So the right framing for finance is environment-by-environment: aggressive consolidation in non-production, careful pruning of genuine duplicates in production.

This lesson is for the finance partner who sees a flat networking line and wants to know whether it reflects necessary resilience or duplicated defaults. It explains why each gateway is a fixed monthly charge, why the per-zone redundancy is genuinely worth paying for in production but wasteful in test, and how aggressive consolidation can backfire by adding a variable internal-transfer charge. By the end you'll know the question to ask at the monthly review — "are these gateways sized to the environment's actual importance?" — and how to read the answer.


How a finance partner frames the tradeoff

Raj is the finance partner for the platform org at a logistics company. At the monthly cost review the networking line is flat, and he asks the question that's now standard: "Of our NAT Gateways, how many are there for production resilience versus duplicated defaults we don't need?" The platform lead pulls the report: across the dev and staging accounts there are 40-odd gateways, and the per-zone pattern was applied uniformly — including on environments with no uptime requirement at all.

The conversation isn't about which AZ or which route. Raj asks three things: which of these environments actually need zone-level resilience, what the saving is if non-prod drops to one gateway each, and — critically — whether consolidating adds any internal data-transfer charge that eats the saving. The lead confirms dev/staging egress is low-volume, so the cross-zone surcharge is negligible there; production stays as-is. That's enough to act: a consolidation pass on non-prod, production left untouched.

The point Raj holds onto is that this is a resilience-sizing decision dressed as a cost line. He doesn't push to minimise gateways everywhere — that would trade a $32 fixed saving for a larger variable transfer bill or a weaker production boundary. He pushes to make the redundancy match the environment's importance, and he adds a recurring line to the finance pack so non-prod doesn't silently drift back to the expensive default.

Why this matters to the budget, not just the bill

Per resource the number is small and in aggregate it's material. An org applying a uniform per-AZ default across many environments typically carries a few thousand dollars a month of NAT provisioning, a meaningful slice of it redundant. It rarely moves a quarterly number alone, but it's the kind of steady, defensible saving that builds credibility for the FinOps function — and it recurs every month until someone acts.

The budgeting subtlety is that the saving is not risk-free, and finance needs to hold that distinction. Cutting a gateway saves a fixed ~$32/month, but if the consolidation pushes traffic across AZ boundaries it adds a variable charge that scales with usage. So a 'saving' booked in provisioning can quietly reappear, larger, on the data-transfer line. When engineering proposes a consolidation, the right finance question is "does this add any cross-AZ transfer, and at what volume?" — not just "how many gateways does it remove?"

The second impact is on how resilience spend is justified. Production per-AZ NATs are a deliberate availability investment; non-prod copies of them are an accident of templating. Treating both the same — either cutting both or sparing both — is the error. A defensible budget distinguishes resilience the business is choosing to pay for from redundancy it inherited by default, and only the second is a saving.

Finally, it's a leading indicator of templating discipline. If non-prod environments are quietly inheriting production-grade networking, the same uniform-default habit is almost certainly inflating other categories — oversized instances, prod-tier databases, multi-AZ everything — in environments that don't need them. Watch this line as a proxy for whether the org sizes infrastructure to purpose.

What finance can actually do about this

Finance can't re-point a route, but it can make sure consolidation is sized to each environment and that the saving is real and not just shifted to another line. Four levers at the monthly cadence.

1. Track NAT count split by prod vs non-prod

Add NAT Gateway count and cost to the cost pack, broken out by environment tier. The number that matters is gateways on non-production environments running the per-AZ pattern — that's the consolidation opportunity. Production count should be stable and is not a target; flag it only if it's growing without new workloads.

2. Always ask whether consolidation adds cross-AZ transfer

When engineering proposes removing gateways, the standing question is "does this push traffic across AZ boundaries, and at what volume?" A provisioning saving that reappears as a larger data-transfer charge isn't a saving. Ask for both lines — gateways removed and projected transfer impact — before counting the win.

3. Make environment-appropriate resilience a budget norm

Agree that non-prod environments default to minimal NAT (single gateway) unless a specific need is documented, and that production keeps per-AZ. Encoding the norm — lean by default in non-prod, deliberate in prod — stops the expensive template from silently propagating and turns consolidation into the steady state rather than a one-off.

4. Treat the trend, not the absolute, as the signal

A stable production NAT count with lean non-prod is healthy. A non-prod gateway count creeping up month over month means the per-AZ default is leaking back in through templating — that's the prompt to raise it, regardless of the small absolute dollars, because the same habit inflates bigger categories.

Quick quiz


Engineering proposes collapsing a chatty production VPC from three per-AZ NAT Gateways to one to save ~$65/month in provisioning. As the finance partner, what's the right response?


You've finished the finance partner's view of redundant NAT Gateways. You know each gateway is a fixed monthly charge, why the per-zone redundancy is worth paying for in production but wasteful when copied onto non-prod, and why a provisioning saving can quietly reappear as a larger cross-AZ transfer charge. Next time the networking line comes up, you'll ask whether resilience is sized to the environment and whether the saving is net rather than shifted.


Redundant NAT Gateways: the headline

Resilience spend applied where it isn't earning its keep

Cloud networks use small managed gateways to let internal systems reach the internet, and each one carries a fixed monthly charge. The default engineering pattern runs one per data-centre zone for resilience. When that pattern gets copied onto environments that don't need it — test systems, small workloads, old layouts never cleaned up — the business pays the resilience premium several times over with nothing protecting it.

This is a right-sizing-of-resilience question, not pure waste. The per-zone pattern is correct for production and protects against real failures, so the move isn't to strip it out everywhere — it's to make sure the level of redundancy matches the importance of each environment. Done well, it trims a recurring cost and signals that the org is matching its spend to actual risk rather than applying one expensive default to everything.

A short read on a recurring networking cost that's really a question about matching resilience spend to risk. You'll get the headline, the one question to ask, and what the answer signals about how disciplined the org is at applying expensive defaults — no commands, no internals.


What it looks like when the org gets this right

At one company the cloud review used to carry a flat networking number that nobody questioned — it was "just infrastructure." When an exec finally asked what was inside it, the answer was that a single expensive resilience pattern had been applied uniformly to every environment, production and throwaway alike, because it was the template default.

The leadership move wasn't "cut networking spend." It was "make resilience proportional to importance." Production kept its per-zone redundancy; everything non-critical dropped to the minimum that still worked. The recurring saving was real but modest — the bigger win was that the org stopped paying production prices for sandbox risk, and the same proportionality thinking spread to other duplicated defaults.

That's the right outcome state. "Fewest possible gateways" is the wrong goal — it would weaken production or add transfer cost. "Resilience matched to the value of what it protects" is the right one, and once it's the norm the line stops being a question.

Why this is on the report at all

The dollar amount is modest; the reason it's tracked is what it reveals about discipline. A network where resilience spend is proportional to importance — production protected, throwaway environments lean — signals an org that sizes infrastructure to purpose. A network where one expensive default is stamped everywhere signals the opposite, and that same habit is almost certainly inflating bigger categories that don't show up as cleanly.

There's a risk dimension too: this is explicitly not a 'cut to the minimum' exercise. The per-zone redundancy protects production from a zone failure, and over-consolidating to save a few dollars can create an avoidable outage or shift cost onto an internal-transfer line. So the leadership read is balance — resilience matched to risk — not minimisation. That balance is the thing worth confirming, not the dollar figure.

The leadership move on this category

The handle for an executive isn't to minimise gateways — it's to set the norm that resilience spend is proportional to what it protects.

1. Require resilience to match environment importance

Production gets per-zone redundancy; throwaway environments get the minimum that works. Make 'don't stamp the expensive default everywhere' an explicit expectation, not a thing teams discover during an audit. This single principle generalises well beyond NAT Gateways.

2. Insist a saving is net, not shifted

When a team reports a consolidation saving, ask whether it accounts for any new internal data-transfer charge. The lesson here — a fixed cost cut can reappear larger as a variable cost — is one worth your team internalising across cost work, not just networking.

3. Read the trend as a discipline signal

Ask one question at the review: "Is our resilience spend proportional to risk, and is that holding over time?" A stable, proportional answer means infrastructure is being sized to purpose. Drift back toward uniform defaults is the early warning that the same habit is inflating spend elsewhere.

Quick quiz


The cost pack shows production NAT count flat and matched to AZs, while non-prod has dropped to a single gateway per VPC after a consolidation pass. What's the right read?


That's the lesson. Two takeaways: cloud networks copy an expensive resilience default everywhere unless someone checks, and the goal isn't the fewest gateways — it's resilience proportional to what it protects. The leadership question is about that proportionality and whether it holds over time, not the dollar amount.


Part of the learning path Trim your network spend
