Cost

Right-size Lambda function memory

Memory is the only dial on a Lambda — and it controls CPU and network too. Set it wrong and you either overpay on idle RAM or pay more on slow runs.

14 min·10 sections·AWS

Last reviewed 27 May 2026

Right-sizing Lambda memory: the basics

Why the one dial you have controls three things at once

AWS Lambda bills in GB-seconds: the memory you allocate to a function multiplied by how long each invocation runs, summed across every invocation. A 1024 MB function that runs for 2 seconds costs the same per call as a 2048 MB function that runs for 1 second — both are 2 GB-seconds. Memory is the only resource knob you get; you never pick a vCPU count directly.
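A minimal sketch of that arithmetic in shell. It assumes memory is given in MB and duration in milliseconds, with MB converted to GB by dividing by 1024. gb_seconds is a hypothetical helper written for this lesson, not an AWS tool.

```shell
# gb_seconds MEMORY_MB DURATION_MS -> GB-seconds billed for one invocation.
# Hypothetical helper: MB / 1024 gives GB, ms / 1000 gives seconds.
gb_seconds() {
  awk -v mb="$1" -v ms="$2" 'BEGIN { printf "%.4f\n", (mb / 1024) * (ms / 1000) }'
}

gb_seconds 1024 2000   # 1 GB for 2 s -> 2.0000
gb_seconds 2048 1000   # 2 GB for 1 s -> 2.0000
```

Both calls print the same figure. The bill only ever sees the product of memory and time.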

The catch that surprises everyone is that memory also scales CPU and network proportionally. At 1769 MB you get the equivalent of one full vCPU; below that you get a fraction of a core, above it you get more than one. So raising memory doesn't just give a function more RAM — it makes a CPU-bound function run faster. If doubling memory halves the duration, your GB-second cost is identical but the function is twice as responsive. If it more-than-halves the duration, you've actually made the function cheaper and faster by giving it more memory.
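A rough sketch of the proportional-CPU rule, assuming a straight line of one vCPU per 1769 MB. AWS documents the 1769 MB point and a 6-vCPU maximum at 10,240 MB, but not an exact formula. The straight line gives about 5.8 rather than 6 at the top end, so treat vcpu_share (a hypothetical helper) as an approximation.

```shell
# vcpu_share MEMORY_MB -> approximate vCPU equivalent, assuming 1769 MB = 1 vCPU.
vcpu_share() {
  awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 1769 }'
}

# Print the approximate CPU share for a few common memory settings.
for mb in 128 512 1024 1769 3008; do
  printf '%5s MB -> %s vCPU\n' "$mb" "$(vcpu_share "$mb")"
done
```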

Memory gets flagged in cost reviews because most functions are set once and never revisited. They are often copied from a template at the default 128 MB, or bumped to some round number during an incident. An I/O-bound function waiting on a database at 1024 MB is paying for allocated RAM it never touches. A CPU-bound image resizer stuck at 128 MB is paying for a long, slow, single-threaded grind. Both are mis-sized in opposite directions, and only measurement tells you which.

In this lesson you'll learn the GB-second billing model, why the memory dial silently controls CPU and network, and the counter-intuitive truth that more memory can cost less. You'll see how to read a function's actual Max Memory Used from the CloudWatch REPORT line, how to sweep memory settings with AWS Lambda Power Tuning to find the cost/speed sweet spot, the real CLI to inspect and update memory, and the edge cases — cold starts, I/O-bound vs CPU-bound workloads, and the separate compounding lever of ARM/Graviton.

Fun fact

The 128 MB tax

Because 128 MB is Lambda's default, an enormous share of production functions run there forever. Nobody measured it; nobody changed it. For CPU-bound work it is rarely the cheapest setting. At 128 MB a function gets a small fraction of a vCPU, so a task that would take 0.4s at 1024 MB grinds for 3s. You pay for one-eighth of the memory but run 7.5x longer, so the GB-seconds come out roughly the same (0.375 vs 0.4 per call) while latency is far worse. The lowest memory setting feels frugal and is frequently the worst of both worlds.
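The fun fact's arithmetic as a sketch. The 0.4 s and 3 s durations are the paragraph's illustrative numbers, not measurements, and compare_settings is a hypothetical helper.

```shell
# compare_settings MB_A MS_A MB_B MS_B -> GB-seconds per call for each setting
# and how many times longer setting B takes than setting A.
compare_settings() {
  awk -v ma="$1" -v ta="$2" -v mb="$3" -v tb="$4" 'BEGIN {
    gba = (ma / 1024) * (ta / 1000)
    gbb = (mb / 1024) * (tb / 1000)
    printf "%d MB: %.3f GB-s | %d MB: %.3f GB-s | %.1fx slower\n", ma, gba, mb, gbb, tb / ta
  }'
}

compare_settings 1024 400 128 3000
```

The 128 MB run is marginally cheaper in GB-seconds and 7.5 times slower.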

Right-sizing Lambda memory in action

Marco runs the platform team at a logistics company. A cost review flags one Lambda, an image-thumbnail generator triggered on every upload. It is responsible for about $1,370 of monthly spend across roughly 90 million invocations. It's configured at 512 MB and averages 1.8 seconds per run.

He pulls the CloudWatch REPORT lines and sees the function uses at most 180 MB of memory, so it's not memory-starved. But it's CPU-bound: image resizing is pure compute, and at 512 MB the function gets only about 0.29 of a vCPU (512/1769). He suspects more memory will buy more CPU and cut the duration enough to pay for itself.

He runs AWS Lambda Power Tuning across 512 / 1024 / 1769 / 3008 MB. The state machine reports that 1769 MB (one full vCPU) drops average duration to 0.55s. The math: 512 MB x 1.8s = 0.9 GB-s per call; 1769 MB x 0.55s = 0.95 GB-s per call — almost cost-neutral, but the function is now 3x faster. Stepping to 1024 MB lands at 0.78s for 0.78 GB-s — cheaper and faster than the original. He ships 1024 MB and the bill drops about 13% while p99 latency improves.
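Marco's comparison can be sketched in a few lines. The per-request charge is identical at every memory size, so ranking settings by GB-seconds per call also ranks them by cost. cheapest_setting is a hypothetical helper that stands in for reading Power Tuning's own output; the inputs are the averages quoted in the story.

```shell
# cheapest_setting: reads "MEMORY_MB AVG_DURATION_MS" lines, prints GB-seconds
# per call for each, then the setting with the lowest GB-seconds.
cheapest_setting() {
  awk '{
    gbs = ($1 / 1024) * ($2 / 1000)
    printf "%5d MB %5d ms %.3f GB-s/call\n", $1, $2, gbs
    if (NR == 1 || gbs < best) { best = gbs; mem = $1 }
  } END { printf "cheapest: %d MB\n", mem }'
}

# Averages from the scenario above.
printf '%s\n' "512 1800" "1024 780" "1769 550" | cheapest_setting
```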

First, check the function's current memory setting and architecture — the two things that set its per-invocation price.

$ aws lambda get-function-configuration --function-name thumbnail-generator --query '{Memory:MemorySize,Arch:Architectures[0],Timeout:Timeout}'

{
  "Memory": 512,
  "Arch": "x86_64",
  "Timeout": 30
}

# 512 MB on x86 — gets ~1/3 of a vCPU, and image resizing is pure compute.

Current config: the memory dial and architecture together determine GB-second price and CPU share.

Now find what the function actually uses. Every REPORT line in CloudWatch Logs records Max Memory Used; this Logs Insights query summarises it against the allocation.

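$ QUERY_ID=$(aws logs start-query --log-group-name /aws/lambda/thumbnail-generator --start-time $(date -u -d '7 days ago' +%s) --end-time $(date -u +%s) --query-string 'filter @type = "REPORT" | stats max(@maxMemoryUsed/1000/1000) as peakMB, avg(@billedDuration) as avgMs, max(@memorySize/1000/1000) as allocMB' --query queryId --output text)

$ sleep 10   # start-query only returns a query ID; wait until the query completes

$ aws logs get-query-results --query-id "$QUERY_ID"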

{
  "results": [
    [
      { "field": "peakMB", "value": "180.2" },
      { "field": "avgMs", "value": "1804.6" },
      { "field": "allocMB", "value": "512.0" }
    ]
  ]
}

# Peak 180 MB of 512 — not memory-starved. But ~1.8s avg = CPU-bound. More CPU = faster.

Logs Insights parses @maxMemoryUsed and @billedDuration from every REPORT line — the ground truth for tuning.

Right-sizing Lambda memory under the hood (deep dive)

Lambda bills GB-seconds plus a flat per-request charge. In US-East-1 on x86, compute is roughly $0.0000166667 per GB-second and $0.20 per million requests; on ARM/Graviton it's about $0.0000133334 per GB-second — roughly 20% cheaper for the same memory. The GB-second is allocated memory (in GB) times billed duration (rounded up to the nearest millisecond). Crucially you pay for allocated memory, not used memory — a 1024 MB function that touches 180 MB still bills at 1 GB, which is exactly why reading Max Memory Used matters: it tells you the floor you can't go below without OOM, but the price is set by what you allocate above it.
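A minimal sketch of that pricing formula, assuming the us-east-1 first-tier rates quoted above. Rates change over time and vary by region. The sketch ignores the free tier, volume tiers and any other charges. It also assumes the same duration on both architectures, which a real migration has to measure. monthly_cost is a hypothetical helper.

```shell
# monthly_cost MEMORY_MB DURATION_MS INVOCATIONS ARCH -> estimated monthly USD.
# Rates are the us-east-1 figures quoted above; free tier and volume tiers ignored.
monthly_cost() {
  awk -v mb="$1" -v ms="$2" -v n="$3" -v arch="$4" 'BEGIN {
    rate = (arch == "arm64") ? 0.0000133334 : 0.0000166667   # USD per GB-second
    gbs = (mb / 1024) * (ms / 1000) * n                       # total GB-seconds
    requests = n / 1000000 * 0.20                             # USD 0.20 per million requests
    printf "%.2f\n", gbs * rate + requests
  }'
}

monthly_cost 512 1800 90000000 x86_64
monthly_cost 512 1800 90000000 arm64
```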

CPU and network scale linearly with the memory setting. AWS gives one full vCPU at 1769 MB; below that you get a proportional fraction, above it more than one (up to 6 vCPUs at the 10,240 MB ceiling). This is why the cost curve isn't monotonic. For a CPU-bound function, raising memory shortens duration, and the GB-second product (memory x time) can stay flat or fall even as the per-millisecond rate rises. For an I/O-bound function blocked on a network call, extra CPU does nothing — duration stays fixed and you simply pay more per call. The only way to know which regime a function is in is to measure across several memory settings.

Cold starts complicate the picture. Initialisation code (imports, SDK clients, connection setup) runs once per new execution environment. That INIT phase used to be free for ZIP-packaged functions on managed runtimes, but since August 2025 AWS bills it for all function configurations, so slow initialisation now shows up in GB-seconds too. Higher memory also speeds up cold-start init because it brings more CPU. For latency-sensitive, low-volume functions, the memory choice therefore trades a slightly higher per-invocation cost for faster cold starts.

AWS Lambda Power Tuning automates the sweep. It is an open-source Step Functions state machine you deploy from the Serverless Application Repository. It invokes your function at a list of memory values, plots cost against speed, and recommends the setting that optimises for cost, speed, or a balance you choose.

# Read actual peak memory and billed duration from the REPORT lines (last 24h).
aws logs filter-log-events \
  --log-group-name /aws/lambda/thumbnail-generator \
  --filter-pattern 'REPORT' \
  --start-time $(date -u -d '1 day ago' +%s000) \
  --query 'events[].[message]' --output text | tail -5

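# Sweep memory settings with AWS Lambda Power Tuning (deployed as a Step Functions state machine).
# "num" is invocations per memory value; replace the empty "payload" with a representative test event.
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine \
  --input '{
    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:thumbnail-generator",
    "powerValues": [512, 1024, 1769, 3008],
    "num": 50,
    "payload": {},
    "strategy": "cost"
  }'
# Returns an executionArn. Once the run finishes, aws stepfunctions describe-execution --execution-arn <arn>
# shows its output: the recommended memory value plus a cost/speed visualisation URL.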
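The REPORT lines fetched by the filter-log-events command can also be summarised locally. This is a sketch that assumes the usual tab-separated "Key: value unit" layout of a Lambda REPORT line. summarise_reports is a hypothetical helper; the Logs Insights query shown earlier does the same job server-side.

```shell
# summarise_reports: reads CloudWatch REPORT lines on stdin and prints the peak
# "Max Memory Used" and the average "Billed Duration". Assumes the usual
# tab-separated "Key: value unit" fields of a Lambda REPORT line.
summarise_reports() {
  awk -F'\t' '/^REPORT/ {
    for (i = 1; i <= NF; i++) {
      split($i, kv, ": ")
      if (kv[1] == "Max Memory Used") { v = kv[2] + 0; if (v > peak) peak = v }
      if (kv[1] == "Billed Duration") { sum += kv[2] + 0; n++ }
    }
  } END { if (n) printf "peak %d MB, avg billed %.1f ms over %d calls\n", peak, sum / n, n }'
}

# Usage (one message per line thanks to the [message] projection):
#   aws logs filter-log-events --log-group-name /aws/lambda/thumbnail-generator \
#     --filter-pattern 'REPORT' --query 'events[].[message]' --output text | summarise_reports
```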

What is the impact of mis-sized Lambda memory?

The direct cost is paid one fraction-of-a-cent at a time, which is exactly why it goes unnoticed. A single invocation of an over-provisioned function might waste $0.000004; multiply by 90 million invocations a month and that's $360 on one function, and a large org runs hundreds of functions. The waste is invisible per-call and material in aggregate — the serverless equivalent of an oversized fleet, just spread across executions instead of instances.
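One way a per-call figure of that size arises, under assumed numbers: an I/O-bound function allocated 512 MB more than it needs, running 500 ms whatever the memory setting, 90 million times a month. The sketch prices only the surplus memory at the x86 rate quoted in the deep dive. overalloc_waste is a hypothetical helper.

```shell
# overalloc_waste SURPLUS_MB DURATION_MS INVOCATIONS -> waste per call and per month
# at the x86 rate quoted in the deep dive, assuming duration does not change with memory.
overalloc_waste() {
  awk -v surplus="$1" -v ms="$2" -v n="$3" 'BEGIN {
    per_call = (surplus / 1024) * (ms / 1000) * 0.0000166667
    printf "per call: $%.7f\n", per_call
    printf "per month: $%.0f\n", per_call * n
  }'
}

# Assumed: 512 MB more than needed, 500 ms per call, 90 million calls a month.
overalloc_waste 512 500 90000000
```

That lands at about $0.0000042 a call and $375 a month, the same order as the estimate above.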

The under-provisioned direction is the sneakier trap. A CPU-bound function pinned at 128 or 256 MB feels frugal but often costs nearly as much as a well-tuned one because the extra duration cancels the memory saving — and it ships worse latency at the same time. Teams "save money" by lowering memory and end up with slower functions for no real cost benefit, then add provisioned concurrency or caching to fix the latency, spending more to paper over a tuning miss.

There's a commitment angle too. Lambda compute can be covered by Compute Savings Plans, which discount GB-seconds across Lambda, Fargate, and EC2. If you commit based on an untuned, over-provisioned baseline, you've locked in a discount on waste — the same stranded-commitment trap as buying Reserved Instances for an oversized EC2 fleet. Tuning should come before committing, so the commitment sits on the efficient baseline.

Finally, mis-sized memory distorts the architecture conversation. When a function is slow, the reflexive fix is provisioned concurrency, a queue, or a rewrite — when the actual fix is often a one-line memory change that buys more CPU. Untuned functions hide which latency problems are real engineering problems and which are just the wrong number in a config file, and that misdirection costs engineering hours far beyond the Lambda bill.

How do you right-size Lambda memory safely?

Right-sizing Lambda is a four-step loop that runs on the FinOps cadence: find the functions that matter, measure what they actually use and how fast they run, sweep memory to find the sweet spot, and re-tune as the code evolves.

1. Rank functions by GB-seconds, not invocation count

Cost follows memory x duration x volume, so a low-volume 10 GB function can outweigh a high-volume tiny one. There are two ways to get GB-seconds per function:

- the Cost and Usage Report (with resource IDs enabled, Lambda usage lines carry the function ARN);
- an estimate from each function's CloudWatch Invocations and Duration metrics multiplied by its memory size.

Tune the top 10-20% first, since they almost always carry the overwhelming majority of the spend. Ignore the long tail of rarely-invoked functions; tuning them is effort that never pays back.
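A sketch of the ranking step. It assumes you have already exported one line per function with its memory size, average duration and monthly invocation count. Apart from thumbnail-generator, the function names and figures below are made up to show a low-volume heavy job outranking a high-volume tiny one. rank_by_gbs is a hypothetical helper.

```shell
# rank_by_gbs: reads "NAME MEMORY_MB AVG_DURATION_MS INVOCATIONS" lines and prints
# each function with its total GB-seconds, highest first.
rank_by_gbs() {
  awk '{ printf "%s %.0f\n", $1, ($2 / 1024) * ($3 / 1000) * $4 }' | sort -k2,2nr
}

# Hypothetical monthly figures: one hot function, one heavy low-volume job, one tiny hot path.
rank_by_gbs <<'EOF'
thumbnail-generator 512 1800 90000000
report-builder 10240 45000 20000
auth-check 128 40 400000000
EOF
```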

2. Measure peak memory and duration before changing anything

Every REPORT line carries Max Memory Used, Billed Duration, and the configured memory size. Read them via Logs Insights (or enable Lambda Insights for richer metrics). Peak memory tells you the floor you can't drop below without OOM; the duration profile tells you whether the function is CPU-bound (duration falls as memory rises) or I/O-bound (duration is flat regardless). The two regimes call for opposite moves.
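A rough way to read the regime from two measurements is to compare the speed-up you observed with the speed-up a purely CPU-bound function would get. That ideal speed-up is the memory ratio, and it only holds at or below about 1769 MB unless the code is multi-threaded. The 0.7 and 0.2 thresholds are illustrative assumptions, not an AWS rule, and classify is a hypothetical helper.

```shell
# classify MB_LOW MS_LOW MB_HIGH MS_HIGH -> cpu-bound | io-bound | mixed
# Heuristic (an assumption, not an AWS rule): compare the measured speed-up with
# the ideal speed-up if duration scaled with CPU share (the memory ratio).
# MB_HIGH must differ from MB_LOW; meaningful mainly at or below ~1769 MB.
classify() {
  awk -v m1="$1" -v t1="$2" -v m2="$3" -v t2="$4" 'BEGIN {
    ideal = m2 / m1                      # speed-up a purely CPU-bound function would get
    observed = t1 / t2                   # speed-up actually measured
    eff = (observed - 1) / (ideal - 1)   # about 1 = fully CPU-bound, about 0 = no benefit
    if (eff >= 0.7) print "cpu-bound"
    else if (eff <= 0.2) print "io-bound"
    else print "mixed"
  }'
}

classify 512 1800 1024 780   # the thumbnail averages from the scenario
classify 512 300 1024 295    # hypothetical function blocked on a database
```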

3. Sweep with AWS Lambda Power Tuning, don't guess

Deploy the open-source Power Tuning state machine and run each high-cost function across a range like 512 / 1024 / 1769 / 3008 MB with a representative payload. It plots cost against speed and recommends a setting for your chosen strategy — cost, speed, or balanced. Trust it for stateless functions; for ones with side effects, run the sweep in a non-production account or with a payload that's safe to repeat dozens of times.

4. Consider ARM/Graviton, then re-tune on a cadence

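Switching architecture to arm64 cuts the GB-second rate by about 20% for compatible runtimes, a compounding lever on top of memory tuning. For most pure-Node/Python functions it means redeploying the same package with the architecture flag. Functions with native dependencies (image-processing libraries are a common example) need arm64 builds first. Duration can shift on the new architecture, so re-run the Power Tuning sweep on arm64 before settling the memory value.

After tuning, fold a quarterly re-check into the cadence: code changes shift the memory/CPU profile, and a function tuned for last quarter's logic can drift. Wire Power Tuning into CI for the heaviest functions so regressions surface before they ship.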

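# Apply the recommended memory.
aws lambda update-function-configuration \
  --function-name thumbnail-generator \
  --memory-size 1024

# Architecture is set with update-function-code, not update-function-configuration.
# Wait for the previous update to finish, then redeploy an arm64 build (function.zip is a placeholder).
aws lambda wait function-updated --function-name thumbnail-generator
aws lambda update-function-code \
  --function-name thumbnail-generator \
  --architectures arm64 \
  --zip-file fileb://function.zip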

# Confirm the new settings took effect.
aws lambda get-function-configuration \
  --function-name thumbnail-generator \
  --query '{Memory:MemorySize,Arch:Architectures[0]}'

Quick quiz


A CPU-bound image function runs 90M times a month at 512 MB, averaging 1.8s, and peaks at 180 MB of memory used. Power Tuning shows 1024 MB averages 0.78s. What's the right move?

Keep learning

Dig deeper into Lambda pricing, tuning tooling, and the architecture choices around it.

You've completed Right-size Lambda function memory. You now know the GB-second model, why the memory dial controls CPU and network, the counter-intuitive truth that more memory can cost less on CPU-bound work, and the four-step loop — rank by GB-seconds, measure peak and duration, sweep with Power Tuning, then apply Graviton and re-tune on a cadence. The next time a cost review flags a high-spend function, you'll have a defensible path from "flagged" to "tuned" in an afternoon.


Right-sizing Lambda memory: what it means for the bill

A serverless line item priced on a unit most people never see

Lambda is the "serverless" part of the cloud bill — no instances, no servers, just functions that run when something triggers them. AWS doesn't charge you per hour of a machine; it charges per run, and the price of each run is the memory you assigned to the function times how long it ran. The industry unit is the GB-second. You can think of it as "how much RAM, for how long, how many times."

The non-obvious part is that the memory setting also controls how fast the function runs. Give a compute-heavy function more memory and it finishes faster, which can mean the bill stays flat or even drops while performance improves. Give a function that mostly sits waiting on a database more memory and you simply pay for RAM it never uses. So the right setting is not "as low as possible" and not "plenty of headroom" — it's the specific number a short tuning exercise finds, and it differs per function.

From a budgeting standpoint Lambda is easy to under-manage precisely because each invocation is tiny — fractions of a cent. The cost hides until a high-traffic function runs billions of times a month, at which point a 30-40% efficiency gap that nobody noticed becomes a five-figure monthly line. The right framing for the cost review is per-function unit economics: cost per million invocations, and whether the highest-volume functions have ever been tuned. An untuned high-traffic function is the serverless equivalent of an oversized instance you forgot to right-size.

This lesson is for the finance partner who sees "Lambda" on the invoice as one undifferentiated number and wants to know whether it's well-managed. It explains the GB-second unit without internals, why the cheapest setting isn't always the lowest memory, what good unit economics look like (cost per million invocations, trending flat as traffic grows), and the two questions to ask at the monthly review: which functions drive the spend, and when were they last tuned. By the end you'll know what to push engineering on and what a healthy serverless trend looks like as a number.


How a finance partner frames the Lambda line

Priya is the finance partner embedded with a logistics platform team. At the monthly cost review the Lambda line has crept up alongside traffic, and instead of asking "why is serverless growing" she asks the sharper version: "Which three functions drive most of this, and when were they last tuned?" The engineering lead pulls it up — one image-processing function is over half the spend, at about $4,200 a month, and it hasn't been touched since it was first deployed.

The conversation isn't technical. Priya doesn't ask about memory megabytes or vCPU fractions. She asks for one number — cost per million invocations for that function — and whether anyone has ever run a tuning pass on it. The answer is no. That's enough: she asks engineering to spend an afternoon on the top three functions and report the before/after cost-per-million at the next review.

A month later the same function's cost-per-million has dropped about 13% and latency improved, so there's no SLA trade-off to debate. Priya now tracks cost-per-million-invocations on the standing pack rather than the raw Lambda dollar total, because the raw total will keep rising with healthy traffic growth. A flat or falling cost-per-million as invocations climb is the signal that serverless is being managed; a rising one is the prompt to ask which functions slipped.

Why this matters to the budget, not just the bill

The per-resource impact is tiny and the aggregate is real. Lambda is usually a single-digit percentage of total cloud spend, but within it a 20-40% efficiency gap on the highest-traffic functions is common, and it scales directly with usage. As the product grows, an untuned serverless tier grows its waste in lockstep — so this is a category where a small one-time tuning effort compounds into avoided cost every month thereafter.

The unit to budget and forecast against is cost per million invocations, not the raw dollar total. The raw total should rise as traffic grows; that's healthy. What you want to see is the per-million unit holding flat or falling on the functions that drive spend. If the raw Lambda line is growing faster than invocation volume, efficiency is slipping somewhere, and that's the variance to chase — not the headline number, which growth alone will always inflate.

There's a commitment dimension finance owns directly. Compute Savings Plans discount Lambda GB-seconds alongside Fargate and EC2. Committing against an untuned baseline locks a discount onto waste and strands part of the commitment when the functions are later tuned. The sequencing rule is the same as for EC2 right-sizing: tune first, commit second, so the Savings Plan sits on the efficient run-rate rather than the inflated one.

Finally, treat tuning status as a leading indicator. If the answer to "when were the top functions last tuned?" is "never," that usually correlates with weak cost discipline in other serverless-adjacent categories too. A team that tunes its heavy functions quarterly is signalling a healthy operating cadence; one that can't name its top three functions is signalling the opposite, and the Lambda line is just where it shows up first.

What finance can actually do about this

Finance can't change a memory setting, but it can set the conditions that keep serverless efficient. Four levers, used at the monthly cost cadence.

1. Track cost per million invocations, not the raw total

Put cost-per-million-invocations for the top three to five functions on the standing cost-review pack. The raw Lambda total will rise with healthy traffic growth and tells you little; the per-unit number tells you whether each dollar of growth is efficient. A rising per-unit cost is the prompt to ask which function slipped.

2. Ask 'when was this last tuned?' for the heavy hitters

Make 'last tuned' a known attribute of the top-spend functions, the way 'last reviewed' is for a budget. If the answer is 'never,' that's a one-afternoon engineering task with a clear before/after number. The question itself, asked routinely, does most of the work — it keeps tuning on the team's radar without finance needing any internals.

3. Sequence tuning before any Savings Plan commitment

Compute Savings Plans discount Lambda GB-seconds. Commit against an untuned baseline and you lock a discount onto waste, then strand part of the commitment when the functions are tuned later. The rule mirrors EC2 right-sizing: tune first, commit second, so the Savings Plan sits on the efficient run-rate.

4. Treat the unit-cost trend as the metric

The goal is not the lowest possible Lambda bill — that fights traffic growth you want. The goal is a flat or falling cost-per-million on the functions that matter. A growing raw total with a falling per-unit cost is a healthy, growing product; a growing total with a flat-or-rising per-unit cost is the signal to dig in.

Quick quiz


The raw Lambda line has grown 25% this quarter while invocation volume grew 40%, and cost-per-million-invocations on the top functions has fallen. As the finance partner, what's the right read?


You've finished the finance partner's view of Lambda right-sizing. You know:

- the GB-second unit, without the internals;
- why the cheapest setting isn't always the lowest memory;
- why cost-per-million-invocations is the metric rather than the raw total;
- the four levers: track the unit, ask when functions were last tuned, sequence tuning before any Savings Plan, and judge the unit-cost trend rather than the total.

Next time the Lambda line shows up at the monthly review, you'll have a sharper question than "why is serverless growing?"


Right-sizing Lambda memory: the headline

Serverless spend that can get cheaper and faster at the same time

Lambda is the part of the cloud that charges per execution rather than per server. The single setting that governs both the cost and the speed of a function is its memory allocation, and counter-intuitively, raising it can lower the total bill when it makes the work finish proportionally faster. Most functions are configured once and never revisited, so the gap between what they cost and what they should cost compounds quietly with traffic.

This is an efficiency-discipline issue more than a big-ticket savings item. The headline is that serverless is not automatically optimal just because it scales to zero — the highest-traffic functions need periodic tuning the same way oversized instances need right-sizing. Where the discipline exists, the team can show cost-per-million-invocations trending flat or down as traffic grows; where it doesn't, serverless spend grows linearly with usage and nobody can say whether that's necessary.

A short read for the exec who wants the headline and the one question. You'll get the rule-of-thumb — serverless isn't automatically efficient, and the highest-traffic functions need periodic tuning — plus what this category signals about engineering's cost discipline and what "good" looks like at an org level. No commands, no internals.


What it looks like when the org gets this right

At one company the cloud-cost review used to show "Lambda" as a single line that grew every quarter, and the standing explanation was "traffic is up." That was partly true and partly an excuse — nobody could separate growth that came from more usage from growth that came from never tuning the functions carrying that usage.

The exec sponsor stopped accepting the raw dollar figure and started asking for a unit: cost per million invocations on the top few functions. Within a quarter the team had tuned the heavy hitters, the unit cost dropped on several of them, and the line on the pack changed from a raw dollar total to a per-unit trend. Spend still grew with traffic — that's expected and healthy — but now it was demonstrably efficient growth.

That's the right outcome state for serverless. "Spend less on Lambda" is the wrong goal when traffic is genuinely growing; "cost per million invocations is flat or falling on the functions that matter" is the right one. The cost line stops being an argument and becomes a confidence signal.

Why this is on the report at all

The Lambda dollar amount is rarely large enough to matter on its own. The reason it's tracked is what its trend says about engineering's cost discipline. A serverless tier where cost-per-million-invocations holds flat or falls as traffic grows is a sign the team is actively managing efficiency; one where the raw line grows faster than usage is a sign that "serverless scales automatically" has been mistaken for "serverless is automatically optimal."

The second-order point is what this predicts. Serverless is the area where teams most readily assume the platform handles efficiency for them, so it's a useful canary. If the highest-traffic functions have never been tuned, the same set-and-forget habit almost certainly applies to bigger categories. Watching cost-per-million as a confidence signal costs leadership nothing and surfaces a discipline problem early, while it's still cheap to fix.

The leadership move on this category

The actionable handle for an executive isn't to cut the Lambda bill — it's to set the norm that makes serverless demonstrably efficient.

1. Ask for the unit, not the total

"Is cost per million invocations flat or falling on our biggest functions?" is a one-minute review item that tells you whether serverless is being managed, without any technical depth. A flat or improving unit cost as traffic grows is exactly what good looks like.

2. Require tuning before committing

Before the team buys a Compute Savings Plan that covers Lambda, ask whether the heavy functions have been tuned. Committing on an inefficient baseline locks waste into a one- or three-year discount, the same mistake as reserving capacity for an oversized fleet.

3. Use it as a discipline canary

Serverless is where teams most assume the platform handles efficiency for them. If the top functions have never been tuned, the set-and-forget habit likely runs deeper. A healthy serverless trend is a cheap, reliable signal that broader cost discipline is working.

Quick quiz


You ask 'when were our top Lambda functions last tuned?' and the answer is 'we've never tuned them — serverless scales automatically.' What's the right read?


That's the lesson. Two takeaways worth holding onto: serverless that scales automatically is not the same as serverless that's optimal, and the right metric is cost per million invocations trending flat or down as traffic grows — not the raw dollar total. The leadership question is about tuning discipline and unit economics, not the headline number.


Part of the learning path Right-size your compute
