Cost

Right-size Lambda provisioned concurrency

Provisioned concurrency keeps execution environments warm to kill cold starts — but you pay for every warm second whether traffic uses it or not. Set it wrong and you're heating an empty room around the clock.

13 min·10 sections·AWS

Last reviewed 27 May 2026

Right-sizing provisioned concurrency: the basics

Why warm capacity is a second meter you opted into

On-demand Lambda is pay-per-use: an environment spins up when a request arrives, and a brand-new environment incurs a cold start — the time to download code, start the runtime, and run your init code. Provisioned concurrency (PC) is the fix for latency-sensitive functions: you tell Lambda to keep N execution environments initialised and warm at all times, so requests skip the cold start entirely. The catch is that this is a separate, always-on price dimension. In US-East-1 you pay roughly $0.0000041667 per GB-second of provisioned concurrency for every second the capacity is reserved — whether a single request ever touches it or not — and you still pay the normal per-invocation duration cost on top when requests do arrive.

That changes the cost model fundamentally. On-demand Lambda scales to zero: no traffic, no bill. PC does not scale to zero: it bills 86,400 seconds a day per reserved environment, times the function's memory in GB, times the PC rate. A 1024 MB function with 50 units of PC reserved 24/7 is about $540 a month in PC charges (50 x 1 GB x 2,592,000 seconds x $0.0000041667) before a single invocation. If that function only sees real concurrency of 5–8 during business hours and nothing overnight, you're paying to keep 40-plus environments warm that traffic never reaches.
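The arithmetic behind that floor can be sketched in a few lines of shell. This is a minimal sketch, assuming the us-east-1 x86 reservation rate quoted in this lesson and a 30-day month; `pc_monthly_cost` is a hypothetical helper name, not an AWS tool, and it covers only the reservation charge, not invocation duration or request fees.

```shell
# Assumed rate: the us-east-1 x86 PC reservation price quoted in this lesson.
PC_RATE_PER_GB_SECOND=0.0000041667
SECONDS_PER_MONTH=2592000   # 30 days x 86,400 seconds

# pc_monthly_cost UNITS MEMORY_MB
# Prints the monthly reservation cost in dollars, before any invocation.
pc_monthly_cost() {
  awk -v units="$1" -v mb="$2" \
      -v rate="$PC_RATE_PER_GB_SECOND" -v secs="$SECONDS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", units * (mb / 1024) * secs * rate }'
}

pc_monthly_cost 50 1024   # 50 units at 1024 MB
pc_monthly_cost 1 1536    # one unit at 1536 MB
```

With the assumed rate, the two calls print 540.00 and 16.20. Swap in your own region's rate before using the numbers in a cost review.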

Over-provisioned PC gets flagged because it is almost always sized for the peak and then forgotten. Someone sets it during a launch to guarantee p99 latency, picks a comfortable round number, and never revisits it as traffic patterns settle. The signal to watch is ProvisionedConcurrencyUtilization, the fraction of warm environments actually serving traffic. Persistently low utilization (say, under 40-50%) means most of the warm capacity is idle heat: you're carrying the cold-start insurance for concurrency you don't have, every second of every day.

In this lesson you'll learn how provisioned concurrency differs from on-demand Lambda, the separate per-GB-second price you pay for warm capacity around the clock, and how to read ProvisionedConcurrencyUtilization and ProvisionedConcurrencySpilloverInvocations to tell whether you're over- or under-provisioned. You'll see the real CLI to inspect utilization and set the PC count, how to use Application Auto Scaling — scheduled or target-tracking — to ramp warm capacity up only during peak windows instead of 24/7, and when to drop PC entirely: latency-tolerant workloads where cold starts are fine, or Java functions where SnapStart is the cheaper fix.

Fun fact

The reservation that worked weekends for free

A payments team set 100 units of provisioned concurrency on a checkout-validation function ahead of a Black Friday launch, sizing for the busiest expected minute. The launch went fine, and the reservation stayed at 100, 24/7, for the next eleven months. Their own dashboards later showed average ProvisionedConcurrencyUtilization of 11%: outside a two-hour weekday lunch spike, real concurrency rarely cleared 12. At 1536 MB the full reservation costs about $1,620 a month at the us-east-1 rate, so that idle 89% was burning roughly $1,400 a month to keep environments warm that essentially never took a request on evenings or weekends. The fix wasn't to delete the safety net. It was a scheduled scaling action that set PC to 15 overnight and at weekends and ramped to 60 for the lunch window, cutting the bill by more than two-thirds with no latency regression.

Right-sizing provisioned concurrency in action

Dana runs the platform team at a fintech. A cost review flags one Lambda, a real-time fraud-scoring function on the checkout path, carrying about $1,300 of monthly provisioned-concurrency reservation charges. It's configured with 80 units of PC at 1536 MB (80 x 1.5 GB x 2,592,000 seconds x $0.0000041667), reserved 24/7, and it was set during last year's product launch.

She pulls the ProvisionedConcurrencyUtilization metric for the last two weeks. The picture is stark: a weekday peak of about 55% concurrent usage between 9am and 6pm, dropping to under 8% overnight and barely 12% at weekends. Spillover invocations (requests that exceeded the warm pool and fell back to on-demand execution) are essentially zero, which confirms 80 is well above what even the peak needs.

Dana doesn't just delete the reservation; the latency guarantee matters on a checkout path. She sets the steady reservation to 45 (55% of 80 is 44 environments, so 45 just covers the weekday peak) and adds two Application Auto Scaling scheduled actions: ramp to 45 at 8am UTC and back to 12 at 7pm UTC on weekdays, with a flat 12 over the weekend. Utilization climbs from ~25% blended to ~70%, spillover stays at zero, and the PC reservation bill drops from about $1,300 to roughly $370 a month, with p99 latency on the critical path unchanged.

First, read how much of the reserved warm capacity is actually being used. ProvisionedConcurrencyUtilization is the fraction (0-1) of warm environments serving traffic — pull the average and peak over the last week.

$ aws cloudwatch get-metric-statistics --namespace AWS/Lambda --metric-name ProvisionedConcurrencyUtilization --dimensions Name=FunctionName,Value=fraud-scoring Name=Resource,Value=fraud-scoring:live --start-time $(date -u -d '7 days ago' +%FT%TZ) --end-time $(date -u +%FT%TZ) --period 3600 --statistics Average Maximum --query 'Datapoints | sort_by(@,&Timestamp)[-4:]'

[
  { "Timestamp": "2026-05-25T02:00:00Z", "Average": 0.07, "Maximum": 0.11 },
  { "Timestamp": "2026-05-25T03:00:00Z", "Average": 0.06, "Maximum": 0.09 },
  { "Timestamp": "2026-05-25T13:00:00Z", "Average": 0.55, "Maximum": 0.62 },
  { "Timestamp": "2026-05-25T14:00:00Z", "Average": 0.53, "Maximum": 0.60 }
]

# Peak ~55%, overnight ~6-7%. The reservation is sized for a peak it rarely reaches, and it never scales down off-peak.

Utilization is the ground truth: warm environments serving traffic versus warm environments billed. Low overnight = idle heat.
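To turn those ratios into something easier to reason about, multiply utilization by the reserved unit count. The snippet below is a small sketch under that assumption; `pc_idle_units` is a hypothetical helper, and the 80-unit reservation and the 0.55 and 0.07 readings are taken from the example above.

```shell
# pc_idle_units RESERVED_UNITS UTILIZATION
# Prints "<in use> <idle>" warm environments, rounded to one decimal place.
pc_idle_units() {
  awk -v reserved="$1" -v util="$2" \
    'BEGIN { used = reserved * util; printf "%.1f %.1f\n", used, reserved - used }'
}

pc_idle_units 80 0.55   # weekday afternoon
pc_idle_units 80 0.07   # overnight
```

For the readings above this prints "44.0 36.0" and "5.6 74.4": even at the afternoon peak 36 environments sit idle, and overnight almost the whole pool does.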

Confirm you're not under-provisioned before cutting. ProvisionedConcurrencySpilloverInvocations counts requests that arrived when every warm environment was busy and therefore ran on on-demand concurrency, where they can hit a cold start. Near-zero means there's room to trim.

$ aws cloudwatch get-metric-statistics --namespace AWS/Lambda --metric-name ProvisionedConcurrencySpilloverInvocations --dimensions Name=FunctionName,Value=fraud-scoring Name=Resource,Value=fraud-scoring:live --start-time $(date -u -d '7 days ago' +%FT%TZ) --end-time $(date -u +%FT%TZ) --period 86400 --statistics Sum

{
  "Datapoints": [
    { "Timestamp": "2026-05-24T00:00:00Z", "Sum": 0.0 },
    { "Timestamp": "2026-05-25T00:00:00Z", "Sum": 0.0 }
  ],
  "Label": "ProvisionedConcurrencySpilloverInvocations"
}

# Zero spillover at 80 units + low utilization = clear over-provisioning. Safe to right-size and schedule.

Spillover is the safety check: zero spillover plus low utilization confirms you can cut the reservation without forcing cold starts.
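The two checks combine into a simple decision rule, sketched below. The 0.7 utilization threshold and the function name `pc_verdict` are assumptions chosen for illustration, not AWS guidance; adjust the threshold to your own headroom policy.

```shell
# pc_verdict PEAK_UTILIZATION SPILLOVER_SUM
# Spillover is checked first: any overflow means the warm pool was too small.
pc_verdict() {
  awk -v util="$1" -v spill="$2" 'BEGIN {
    if (spill > 0)       print "under-provisioned: hold or raise the floor"
    else if (util < 0.7) print "over-provisioned: safe to trim or schedule"
    else                 print "right-sized: leave as is"
  }'
}

pc_verdict 0.62 0.0   # the fraud-scoring readings above
```

For the fraud-scoring readings (peak 0.62, zero spillover) this reports the function as over-provisioned.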

Right-sizing provisioned concurrency under the hood: deep dive

Provisioned concurrency is billed on its own price dimension, distinct from on-demand invocation cost. In US-East-1 you pay roughly $0.0000041667 per GB-second of PC for every second a unit is reserved, plus a reduced duration charge (about $0.0000097222 per GB-second on x86) for invocations that run on that warm capacity, plus the standard per-request fee. The reservation charge is the part that doesn't sleep: one PC unit on a 1024 MB function held for a full 30-day month is roughly 1 GB x 2,592,000 seconds x $0.0000041667 ≈ $10.80 a month before any traffic. Multiply by the unit count and you have a fixed monthly floor that on-demand Lambda never has, because on-demand scales to zero and PC does not.

The two metrics that tell you whether the floor is right-sized are ProvisionedConcurrencyUtilization and ProvisionedConcurrencySpilloverInvocations. Utilization is a 0-1 ratio: the number of warm environments actively running an invocation divided by the number reserved. Persistently low utilization means you're paying to keep environments warm that traffic never reaches. Spillover counts invocations that arrived when all warm environments were busy and therefore ran as on-demand cold starts — it's the under-provisioning signal. The sweet spot is high utilization (you're using what you pay for) with near-zero spillover (you're not forcing cold starts on the requests you provisioned to protect). Both are emitted per function version or alias, which is why the CloudWatch dimensions include the Resource qualifier alongside FunctionName.

Because demand is rarely flat, the cost-efficient move is usually to make the reservation move with traffic rather than sizing it for a 24/7 peak. Application Auto Scaling registers the function's PC as a scalable target and supports two policies: scheduled actions (set PC to a value at a cron time — ideal for predictable daily/weekly patterns) and target-tracking (Application Auto Scaling watches utilization and adds or removes warm capacity to hold it near a target like 0.7). Scheduled scaling is cheapest and most predictable when the traffic shape is known; target-tracking adapts to noisier patterns but reacts with a lag, so it's paired with a floor that covers baseline. For latency-tolerant workloads the right answer is often to drop PC entirely and accept cold starts; for Java specifically, SnapStart restores a snapshot of an initialised environment at no extra reservation charge, making PC unnecessary for most Java cold-start problems.

# Inspect the current provisioned-concurrency config on an alias.
aws lambda get-provisioned-concurrency-config \
  --function-name fraud-scoring \
  --qualifier live \
  --query '{Requested:RequestedProvisionedConcurrentExecutions,Available:AvailableProvisionedConcurrentExecutions,Status:Status}'

# Set the steady reservation to a right-sized value covering the weekday peak.
aws lambda put-provisioned-concurrency-config \
  --function-name fraud-scoring \
  --qualifier live \
  --provisioned-concurrent-executions 45

# Register PC as a scalable target so Application Auto Scaling can ramp it on a schedule.
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:fraud-scoring:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 12 --max-capacity 45
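For the target-tracking option described above, a policy can be attached to the same scalable target. The sketch below wraps the call in a hypothetical function, `put_pc_target_tracking`, so it can be reviewed before anything runs; the policy name and the 0.7 target are illustrative assumptions. It relies on the predefined metric type LambdaProvisionedConcurrencyUtilization and assumes the scalable target has already been registered.

```shell
# put_pc_target_tracking FUNCTION_NAME ALIAS TARGET_UTILIZATION
# Hypothetical wrapper: attaches a target-tracking policy to the alias's PC.
put_pc_target_tracking() {
  fn="$1"; alias_name="$2"; target="$3"
  aws application-autoscaling put-scaling-policy \
    --service-namespace lambda \
    --resource-id "function:${fn}:${alias_name}" \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --policy-name "${fn}-pc-utilization" \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration \
      "{\"TargetValue\": ${target}, \"PredefinedMetricSpecification\": {\"PredefinedMetricType\": \"LambdaProvisionedConcurrencyUtilization\"}}"
}
```

Calling `put_pc_target_tracking fraud-scoring live 0.7` would ask Application Auto Scaling to hold utilization near 0.7 within the registered min and max capacity. Target tracking reacts with a lag, so the min-capacity floor still needs to cover baseline traffic.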

What is the impact of over-provisioned concurrency?

The direct cost is a fixed monthly floor that runs whether or not traffic uses it. Unlike on-demand Lambda — where idle is free — every reserved PC unit bills around the clock at the reservation rate. At 1024 MB that's about $10.80 a month per unit before a single invocation; at 1536 MB it's roughly $16.20. A function over-provisioned at 80 units when 30 would cover peak is wasting 50 units — $540 to $810 a month on that one function — every month, all year, on capacity that never serves a request.

The waste concentrates exactly where on-demand thinking misleads people. Engineers reason about Lambda as pay-per-use and assume quiet hours are cheap; PC inverts that, so the most wasteful hours are the idle ones, when utilization collapses but the reservation keeps billing. A function reserved for a weekday lunch spike pays peak rates through every night and weekend (often two-thirds of the calendar) for capacity that sits warm but idle. Scheduling the reservation to match the traffic shape, rather than the peak, is where most of the saving lives.

There's a commitment angle too. Provisioned concurrency is covered by Compute Savings Plans, which discount the PC dimension alongside on-demand Lambda, Fargate, and EC2. If you commit a Savings Plan against an over-provisioned PC baseline, you lock a discount onto idle warm capacity — the same stranded-commitment trap as reserving an oversized EC2 fleet. The reservation should be right-sized and scheduled first, so any Savings Plan sits on the efficient warm-capacity run-rate rather than the inflated one.

Finally, over-provisioning hides the real latency question. When a function is slow, the reflexive fix is "add more provisioned concurrency," and a generous reservation papers over genuine init-time problems — heavy SDK initialisation, large deployment packages, slow VPC ENI attachment — that more warm environments only mask at increasing cost. Worse, PC is sometimes reached for on workloads that don't need it at all: latency-tolerant batch or async functions where a cold start is irrelevant, or Java functions where SnapStart would solve cold starts for free. Each of those is a reservation bill with no latency justification underneath it.

How do you right-size provisioned concurrency safely?

Right-sizing PC is a four-step loop that runs on the FinOps cadence: find the functions carrying warm-capacity charges, measure their utilization and spillover, right-size and schedule the reservation to the real traffic shape, and re-check whether the workload needs PC at all.

1. Find every function with provisioned concurrency and its cost

List the functions and aliases that carry PC configs across every region and account, with the reserved unit count, memory size, and estimated monthly reservation cost (units x memory-GB x ~$0.0000041667 x seconds in month). PC is opt-in and per alias, so the list is short — usually a handful of latency-critical functions. Rank by reservation cost, not invocation volume, because the bill follows reserved units x memory x time, independent of how much traffic actually hits them.

2. Measure utilization and spillover before changing anything

Pull ProvisionedConcurrencyUtilization (the fraction of warm capacity in use) and ProvisionedConcurrencySpilloverInvocations (requests that overflowed to cold starts) over at least one full traffic cycle — a week covering weekdays and weekend. Low utilization with zero spillover means you're over-provisioned and can cut safely. High utilization with rising spillover means you're under-provisioned and should hold or raise the floor. The daily and weekly shape of utilization is what tells you whether to schedule.

3. Right-size the steady value, then schedule with Application Auto Scaling

Set the steady reservation to cover the weekday peak with modest headroom (use the peak utilization, not the average). Then register the alias as a scalable target and add scheduled actions to ramp capacity up before the busy window and down for nights and weekends — or target-tracking to hold utilization near ~0.7 for noisier patterns, paired with a min-capacity floor for baseline. Scheduling against a known daily shape captures the bulk of the saving because it stops paying peak rates through the two-thirds of the week that is quiet.

4. Ask whether the workload needs PC at all

Provisioned concurrency only earns its cost on functions where cold-start latency is genuinely user-visible and SLA-bound. For latency-tolerant async, batch, or queue-driven functions, drop PC entirely and accept cold starts — the saving is the whole reservation. For Java functions, SnapStart restores a pre-initialised snapshot at no reservation charge and removes most cold-start pain for free, so it's almost always the better lever than PC. Re-confirm the justification on a cadence: a function that needed PC at launch may not need it once traffic and init time have settled.
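Step 1 of the loop above can be sketched as a short inventory script. This is a minimal sketch for one region, assuming credentials that can call `lambda list-functions` and `lambda list-provisioned-concurrency-configs`; `list_pc_configs` is a hypothetical helper name, and covering every region and account means running it once per region and profile.

```shell
# list_pc_configs REGION
# Prints "<qualified function ARN> <requested PC units>" for every alias or
# version in the region that has a provisioned-concurrency config.
list_pc_configs() {
  region="$1"
  aws lambda list-functions --region "$region" \
      --query 'Functions[].FunctionName' --output text |
    tr '\t' '\n' |
    while read -r fn; do
      [ -n "$fn" ] || continue
      # </dev/null keeps the inner call from consuming the loop's input.
      aws lambda list-provisioned-concurrency-configs \
        --region "$region" --function-name "$fn" \
        --query 'ProvisionedConcurrencyConfigs[].[FunctionArn,RequestedProvisionedConcurrentExecutions]' \
        --output text </dev/null
    done
}
```

Running `list_pc_configs us-east-1` yields one line per provisioned alias or version. Multiplying the unit count by the function's memory and the regional rate gives the ranking by reservation cost that step 1 asks for.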

# Add scheduled scaling: ramp warm capacity to 45 for the weekday peak, down to 12 overnight.
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:fraud-scoring:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name ramp-up-weekday-peak \
  --schedule 'cron(0 8 ? * MON-FRI *)' \
  --scalable-target-action MinCapacity=45,MaxCapacity=45

aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:fraud-scoring:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name ramp-down-overnight \
  --schedule 'cron(0 19 ? * MON-FRI *)' \
  --scalable-target-action MinCapacity=12,MaxCapacity=12
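To estimate what a schedule like this is worth, compare the average number of reserved units across a 168-hour week with the flat reservation. The sketch below assumes the two actions above (45 units from 08:00 to 19:00 UTC on weekdays, which is 55 hours, and 12 units the rest of the week); `pc_blended_units` is a hypothetical helper.

```shell
# pc_blended_units PEAK_UNITS PEAK_HOURS_PER_WEEK BASE_UNITS
# Prints the time-weighted average reserved units over a 168-hour week.
pc_blended_units() {
  awk -v peak="$1" -v hours="$2" -v base="$3" \
    'BEGIN { printf "%.1f\n", (peak * hours + base * (168 - hours)) / 168 }'
}

pc_blended_units 45 55 12
```

This prints 22.8, so the schedule reserves on average under 30% of the unit-hours that a flat 80-unit reservation does. The reservation charge scales in proportion, since it is billed per unit per second.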

Quick quiz

Question 1 of 5

A checkout function reserves 80 units of provisioned concurrency at 1536 MB, 24/7. Over a week, ProvisionedConcurrencyUtilization peaks at 55% on weekday afternoons and sits under 8% overnight, with zero spillover invocations. What's the right move?

Keep learning

Dig deeper into provisioned concurrency pricing, the scaling tooling, and the cold-start alternatives.

You've completed Right-size Lambda provisioned concurrency. You now know how PC differs from on-demand Lambda, why warm capacity bills around the clock on its own price dimension, how to read ProvisionedConcurrencyUtilization and ProvisionedConcurrencySpilloverInvocations, and the four-step loop — find the provisioned functions, measure utilization and spillover, right-size and schedule with Application Auto Scaling, then re-check whether the workload needs PC at all. The next time a cost review flags warm-capacity charges, you'll have a defensible path from "flagged" to "right-sized" without touching the latency guarantee that matters.


Right-sizing provisioned concurrency: what it means for the bill

A serverless charge that runs around the clock even at zero traffic

Normally the appeal of serverless is that it costs nothing when nothing is running — you pay per execution, and an idle function is free. Provisioned concurrency breaks that rule on purpose. To make a function respond instantly with no warm-up delay, the team reserves a fixed number of always-ready environments. Reserving them costs money every second they're held, around the clock, whether or not any customer request ever uses them. So a function can have light overnight traffic and still bill steadily through the night for capacity sitting idle.

The non-obvious part is that this is a deliberate trade the team made for speed, and it's easy to over-buy. Reserving for the busiest minute of the busiest day and leaving that reservation in place 24/7 means paying peak rates during quiet hours and weekends. The right setting isn't "as much as possible to be safe" and it isn't zero — it's matched to the real concurrent demand, and ideally ramped up only during the hours that actually need it. A function reserved for peak but mostly idle is the serverless version of running a full data centre overnight for a workload that only matters at 10am.

From a budgeting standpoint the number to watch is utilization: what fraction of the reserved warm capacity is actually serving traffic. If a function reserves enough warm environments for 50 simultaneous requests but rarely exceeds 8, the other 42 are pure waste billing every hour. The right question at the cost review is not "how much do we spend on Lambda" but "which functions have provisioned concurrency, what's their utilization, and could the reservation follow the traffic instead of standing still at peak all day?"

This lesson is for the finance partner who sees "Lambda" on the invoice and doesn't realise part of it can bill continuously like a reserved server. It explains provisioned concurrency without internals, why a serverless function can cost money at zero traffic, what good looks like as a number (high utilization, and reservations that ramp down off-hours), and the questions to ask at the monthly review: which functions reserve warm capacity, what's their utilization, and could the reservation follow the traffic. By the end you'll know what to push engineering on and what a healthy warm-capacity trend looks like.

How a finance partner frames the warm-capacity line

Raj is the finance partner embedded with a fintech platform team. At the monthly cost review the Lambda line has a chunk that behaves oddly — it doesn't fall on weekends the way the rest of usage-based spend does. Instead of asking "why is Lambda flat at weekends," he asks the sharper version: "Which functions reserve warm capacity, and what's the utilization on it?" The engineering lead pulls it up: one fraud-scoring function reserves 80 always-on environments and hasn't been revisited since launch.

The conversation isn't technical. Raj doesn't ask about GB-seconds or scaling policies. He asks for one number — the percentage of that reserved warm capacity actually serving traffic — and whether the reservation could follow the traffic instead of standing at peak all day. The answer comes back: blended utilization is about 25%, and yes, the daily pattern is obvious enough to schedule against. That's enough to act.

A month later utilization on that function is around 70% and the warm-capacity charge has dropped by roughly two-thirds, with no latency trade-off to debate because spillover stayed at zero. Raj now tracks utilization on provisioned functions as a standing line, because the raw dollar total alone won't tell him whether warm capacity is well-sized. High utilization with scheduled ramps is the signal it's being managed; a reservation that bills flat through quiet hours is the prompt to ask which function hasn't been revisited.

Why this matters to the budget, not just the bill

The per-function impact is modest and the aggregate is real. Provisioned concurrency is usually a slice of the Lambda line, but it behaves unlike the rest of serverless: it's a fixed floor that doesn't fall when usage falls. A function over-provisioned by 50 units is several hundred dollars a month of pure idle reservation, and an org with a dozen latency-critical functions all sized for peak can carry five figures a month of warm capacity that traffic never reaches.

The unit to budget against is utilization, not the raw dollar total. A high PC bill on a function running at 70% utilization with scheduled ramps is money well spent — you're buying exactly the speed insurance you use. The same dollar amount at 15% blended utilization is mostly waste. The variance to chase is low utilization, especially the gap between weekday-peak utilization and the round-the-clock reservation, because that gap is the schedulable saving sitting in plain sight.

There's a commitment dimension finance owns directly. Compute Savings Plans discount the provisioned-concurrency dimension alongside on-demand Lambda, Fargate, and EC2. Committing against an over-provisioned, unscheduled baseline locks a discount onto idle warm capacity and strands part of the commitment the moment the function is right-sized. The sequencing rule mirrors EC2 right-sizing: right-size and schedule the reservation first, commit second, so the Savings Plan sits on the efficient run-rate.

Finally, treat the presence of always-on, unscheduled PC as a leading indicator. If the answer to "is this reservation scheduled to match traffic, and what's its utilization?" is "it's flat 24/7 and we've never checked," that usually correlates with set-and-forget habits in other categories. A team that right-sizes and schedules its warm capacity is signalling a healthy operating cadence; one that can't name its provisioned functions or their utilization is signalling the opposite, and this line is just where it surfaces.

What finance can actually do about this

Finance can't change a reservation, but it can set the conditions that keep warm capacity sized to demand. Four levers, used at the monthly cost cadence.

1. Track utilization on provisioned functions, not the raw total

Put utilization for each function carrying provisioned concurrency on the standing cost-review pack, alongside the reservation cost. The dollar amount tells you little on its own — a high bill at 70% utilization is fine. Low utilization, especially a wide gap between weekday-peak and the 24/7 reservation, is the prompt to ask why warm capacity isn't following the traffic.

2. Ask 'is this reservation scheduled, and does it need PC at all?'

Make 'scheduled to traffic' and 'still latency-justified' known attributes of each provisioned function, the way 'last reviewed' is for a budget. A flat 24/7 reservation on a function with an obvious daily traffic shape is a quick scheduling win; a reservation on a latency-tolerant or Java function is often removable outright. The questions, asked routinely, keep the capacity on the team's radar.

3. Sequence right-sizing before any Savings Plan commitment

Compute Savings Plans discount the provisioned-concurrency dimension. Commit against an over-provisioned, unscheduled baseline and you lock a discount onto idle warm capacity, then strand part of the commitment when the reservation is right-sized. The rule mirrors EC2 right-sizing: right-size and schedule first, commit second, so the Savings Plan sits on the efficient warm-capacity run-rate.

4. Treat utilization as the metric, not the absolute dollar

The goal isn't the lowest possible PC bill — some warm capacity is genuinely worth paying for on critical paths. The goal is high utilization on what's reserved and reservations that ramp down when traffic does. A higher bill at 70% utilization with schedules is healthier than a lower one at 15% flat, because the first is buying speed it uses and the second is heating an empty room.

Quick quiz

Question 1 of 5

The provisioned-concurrency charge on a critical function has been flat all quarter, billing the same on weekends as on weekdays, and its utilization sits around 18%. The total is moderate. As the finance partner, what's the right next move?

You've finished the finance partner's view of provisioned concurrency. You know why a serverless function can bill at zero traffic, why utilization is the metric rather than the raw total, and the four levers: track utilization on provisioned functions, ask whether the reservation is scheduled and still PC-justified, sequence right-sizing before any Savings Plan, and judge the line by utilization rather than the absolute dollar. Next time warm-capacity charges show up at the monthly review, you'll have a sharper question than "why is this part of Lambda flat at weekends?"


Right-sizing provisioned concurrency: the headline

Pre-paid speed insurance that keeps billing when no one is asking

Provisioned concurrency is a feature teams buy to guarantee that a few critical functions respond instantly, with no start-up lag. It works by keeping capacity permanently warm — which means it bills continuously, even overnight and at weekends when traffic is low. Unlike normal serverless, which costs nothing at idle, this charge runs whether the capacity is used or not, so over-buying it is a quiet, recurring drain.

This is an efficiency-discipline issue, not a big-ticket savings item. The headline is that warm capacity should be sized to real demand and ideally scheduled to match traffic, rather than reserved at peak levels around the clock. Where the discipline exists, the team can show high utilization on the capacity they pay for and reservations that ramp down off-hours; where it doesn't, the business is paying premium rates to keep environments warm that customers never reach.

A short read for the exec who wants the headline and the one question. You'll get the rule-of-thumb — pre-paid speed insurance bills around the clock and should be sized to real demand — plus what this category signals about engineering's cost discipline and what "good" looks like at an org level. No commands, no internals.

What it looks like when the org gets this right

At one company the cloud-cost review used to show a serverless line that, oddly, didn't dip on weekends like the rest of usage-based spend. The standing explanation was "that's the latency guarantee on our critical paths." That was partly true and partly an excuse — nobody could say whether the always-on capacity matched real demand or was simply set high once and left.

The exec sponsor stopped accepting "it's for latency" and started asking for a number: what fraction of the reserved warm capacity is actually used, and does the reservation follow the traffic? Within a quarter the team had right-sized the heavy functions and put them on schedules that ramped capacity down overnight and at weekends. Utilization climbed, the weekend charge fell, and critical-path latency was unchanged.

That's the right outcome state for provisioned concurrency. "Turn off the latency guarantee" is the wrong goal; "warm capacity is sized to real demand and scheduled to match traffic" is the right one. The cost line stops being an argument about whether speed is worth it and becomes a confidence signal that the team is buying exactly as much insurance as it needs.

Why this is on the report at all

The dollar amount in this category is rarely the headline on its own. The reason it's tracked is what its shape says about engineering's cost discipline. Warm capacity that bills flat around the clock, with low utilization and no schedule, signals that a speed feature was switched on once and never revisited; high utilization with reservations that ramp down off-hours signals a team actively matching cost to demand. The difference between those two states, on the same critical paths, is often a two-thirds cost gap with identical latency.

The second-order point is that provisioned concurrency is where teams most readily confuse "we bought the latency guarantee" with "the latency guarantee is sized correctly." It's a useful canary: if the always-on capacity has never been checked against real demand, the same buy-for-peak-and-forget habit likely runs through other reserved-capacity decisions. Asking for utilization and schedule on the provisioned functions costs leadership nothing and surfaces a discipline pattern early, while it's still cheap to fix.

The leadership move on this category

The actionable handle for an executive isn't to cut the warm-capacity bill — it's to set the norm that makes the reservation demonstrably match demand.

1. Ask for utilization, not the total

"What fraction of our reserved warm capacity is actually used, and does it ramp down off-hours?" is a one-minute review item that tells you whether the latency insurance is sized right, without any technical depth. High utilization with off-hours ramp-down is exactly what good looks like.

2. Require right-sizing before committing

Before a Compute Savings Plan covers provisioned concurrency, ask whether the reservations have been right-sized and scheduled. Committing on an over-provisioned baseline locks idle warm capacity into a one- or three-year commitment, the same mistake as reserving capacity for an oversized fleet.

3. Use it as a discipline canary

Warm capacity is where teams most readily confuse 'we bought the guarantee' with 'the guarantee is sized correctly.' If the provisioned functions have never been checked against real demand, the buy-for-peak-and-forget habit likely runs deeper. A healthy utilization trend is a cheap, reliable signal that broader cost discipline is working.

Quick quiz

Question 1 of 5

You ask 'what's the utilization on our provisioned Lambda capacity, and does it ramp down off-hours?' and the answer is 'we set it for the launch peak and never changed it — it runs flat 24/7.' What's the right read?

That's the lesson. Two takeaways worth holding onto: provisioned concurrency is pre-paid speed insurance that bills around the clock even at idle, and the right metric is utilization with reservations that ramp down off-hours — not the raw dollar total. The leadership question is about whether the warm capacity is sized to real demand, not whether to keep the latency guarantee.

