Right-sizing Lambda memory: the basics
Why the one dial you have controls three things at once
AWS Lambda bills in GB-seconds: the memory you allocate to a function multiplied by how long each invocation runs, summed across every invocation. A 1024 MB function that runs for 2 seconds costs the same per call as a 2048 MB function that runs for 1 second — both are 2 GB-seconds. Memory is the only resource knob you get; you never pick a vCPU count directly.
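The billing arithmetic can be checked with a few lines of shell. This is a minimal sketch; `gb_seconds` is a hypothetical helper name, not an AWS tool, and it only multiplies allocated memory (converted to GB) by duration.

```shell
# gb_seconds MEMORY_MB DURATION_SECONDS
# Hypothetical helper: allocated memory in GB multiplied by run time in seconds.
gb_seconds() {
  awk -v mb="$1" -v secs="$2" 'BEGIN { printf "%.3f\n", (mb / 1024) * secs }'
}

gb_seconds 1024 2   # 1 GB for 2 s
gb_seconds 2048 1   # 2 GB for 1 s
```

Both calls print 2.000, matching the two configurations described above.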
The catch that surprises everyone is that memory also scales CPU and network proportionally. At 1769 MB you get the equivalent of one full vCPU; below that you get a fraction of a core, above it you get more than one. So raising memory doesn't just give a function more RAM — it makes a CPU-bound function run faster. If doubling memory halves the duration, your GB-second cost is identical but the function is twice as responsive. If it more-than-halves the duration, you've actually made the function cheaper and faster by giving it more memory.
Memory settings get flagged in cost reviews because most functions are set once (often copied from a template at the default 128 MB, or bumped to some round number during an incident) and never revisited. An I/O-bound function waiting on a database at 1024 MB is paying for allocated RAM and CPU it never uses; a CPU-bound image resizer stuck at 128 MB is paying for a long, slow grind on a sliver of a core. Both are mis-sized in opposite directions, and only measurement tells you which.
In this lesson you'll learn the GB-second billing model, why the memory dial silently controls CPU and network, and the counter-intuitive truth that more memory can cost less. You'll see how to read a function's actual Max Memory Used from the CloudWatch REPORT line, how to sweep memory settings with AWS Lambda Power Tuning to find the cost/speed sweet spot, the real CLI to inspect and update memory, and the edge cases — cold starts, I/O-bound vs CPU-bound workloads, and the separate compounding lever of ARM/Graviton.
The 128 MB tax
Because 128 MB is Lambda's default, an enormous share of production functions run there forever, not because anyone measured it, but because nobody changed it. For CPU-bound work this is often the most expensive setting, not the cheapest: at 128 MB a function gets a small fraction of a vCPU, so a task that would take 0.4s at 1024 MB grinds for 3s. You pay for one-eighth of the memory but 7.5x the time (0.375 GB-s versus 0.4 GB-s per call), so you end up paying roughly the same GB-seconds while delivering far worse latency. The lowest memory setting feels frugal and is frequently the worst of both worlds.
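To make the trade-off concrete, the sketch below recomputes the two settings from the paragraph above. The durations are the illustrative figures from the text, not measurements, and `compare_settings` is a hypothetical name.

```shell
# Compare 128 MB for 3 s against 1024 MB for 0.4 s (figures from the text).
compare_settings() {
  awk 'BEGIN {
    low  = (128 / 1024)  * 3.0   # GB-seconds per call at 128 MB
    high = (1024 / 1024) * 0.4   # GB-seconds per call at 1024 MB
    printf "128 MB:  %.3f GB-s\n", low
    printf "1024 MB: %.3f GB-s\n", high
    printf "latency ratio: %.1fx\n", 3.0 / 0.4
  }'
}

compare_settings
```

The two GB-second figures differ by well under a tenth, while the latency differs by 7.5x.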
Right-sizing Lambda memory in action
Marco runs the platform team at a logistics company. A cost review flags that one Lambda, an image-thumbnail generator triggered on every upload, is responsible for about $1,370 of monthly spend across roughly 90 million invocations (90 million calls at 0.9 GB-s each, at the x86 rate, plus request charges). It's configured at 512 MB and averages 1.8 seconds per run.
He pulls the CloudWatch REPORT lines and sees the function uses at most 180 MB of memory — so it's not memory-starved. But it's CPU-bound: image resizing is pure compute, and at 512 MB the function only gets about a third of a vCPU. He suspects more memory will buy more CPU and cut the duration enough to pay for itself.
He runs AWS Lambda Power Tuning across 512 / 1024 / 1769 / 3008 MB. The state machine reports that 1769 MB (one full vCPU) drops average duration to 0.55s. The math: 512 MB × 1.8s = 0.9 GB-s per call; 1769 MB × 0.55s = 0.95 GB-s per call, so almost cost-neutral, but the function is now more than 3x faster. The 1024 MB run lands at 0.78s for 0.78 GB-s: cheaper and faster than the original. He ships 1024 MB and the compute bill drops about 13% while p99 latency improves.
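The comparison Marco does by hand can be sketched as a small table builder. `sweep_table` is a hypothetical helper; it takes "memory duration" pairs and treats the first row as the baseline. Only the three settings with durations stated in the example are included, since no duration is given for 3008 MB.

```shell
# sweep_table: reads "memory_mb duration_s" pairs on stdin and prints
# GB-seconds per call plus the change relative to the first row (the baseline).
sweep_table() {
  awk 'NR == 1 { base = ($1 / 1024) * $2 }
       { gbs = ($1 / 1024) * $2
         printf "%5d MB  %.2f s  %.2f GB-s  %+.0f%%\n", $1, $2, gbs, (gbs / base - 1) * 100 }'
}

printf '%s\n' '512 1.8' '1024 0.78' '1769 0.55' | sweep_table
```

The 1024 MB row comes out at -13% against the 512 MB baseline and the 1769 MB row at +6%, which is why 1024 MB is the setting that ships.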
First, check the function's current memory setting and architecture — the two things that set its per-invocation price.
Current config: the memory dial and architecture together determine GB-second price and CPU share.
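One way to run that check, assuming the AWS CLI is installed and configured, is the same `get-function-configuration` call this lesson uses later to confirm a change. The wrapper name `show_lambda_config` is hypothetical.

```shell
# show_lambda_config FUNCTION_NAME
# Prints the configured memory size (MB) and architecture of one function.
show_lambda_config() {
  aws lambda get-function-configuration \
    --function-name "$1" \
    --query '{Memory:MemorySize,Arch:Architectures[0]}'
}
```

For the example function this would be called as `show_lambda_config thumbnail-generator`.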
Now find what the function actually uses. Every REPORT line in CloudWatch Logs records Max Memory Used; this Logs Insights query summarises it against the allocation.
Logs Insights parses @maxMemoryUsed and @billedDuration from every REPORT line — the ground truth for tuning.
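The query itself is not reproduced here. The sketch below shows one plausible form, modeled on the AWS sample query for over-provisioned memory; the wrapper name `start_memory_query`, the 24-hour window, and the output column names are assumptions.

```shell
# start_memory_query LOG_GROUP
# Starts a Logs Insights query over the last 24 hours and prints the query ID.
start_memory_query() {
  now=$(date +%s)
  aws logs start-query \
    --log-group-name "$1" \
    --start-time "$((now - 86400))" \
    --end-time "$now" \
    --query-string 'filter @type = "REPORT"
      | stats max(@maxMemoryUsed / 1000 / 1000) as maxMemoryUsedMB,
              max(@memorySize / 1000 / 1000) as allocatedMB,
              avg(@billedDuration) as avgBilledMs,
              count(*) as invocations' \
    --query 'queryId' --output text
}
```

Logs Insights queries run asynchronously, so the returned ID is then passed to `aws logs get-query-results --query-id <id>` to read the summary.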
Right-sizing Lambda memory under the hood
Lambda bills GB-seconds plus a flat per-request charge. In US-East-1 on x86, compute is roughly $0.0000166667 per GB-second and $0.20 per million requests; on ARM/Graviton it's about $0.0000133334 per GB-second — roughly 20% cheaper for the same memory. The GB-second is allocated memory (in GB) times billed duration (rounded up to the nearest millisecond). Crucially you pay for allocated memory, not used memory — a 1024 MB function that touches 180 MB still bills at 1 GB, which is exactly why reading Max Memory Used matters: it tells you the floor you can't go below without OOM, but the price is set by what you allocate above it.
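Putting the two charges together, a rough monthly estimate can be sketched as follows. `monthly_cost` is a hypothetical helper, the workload (10 million invocations at 1024 MB and 0.5 s) is made up for illustration, and the free tier is ignored.

```shell
# monthly_cost INVOCATIONS MEMORY_MB AVG_BILLED_SECONDS GB_SECOND_RATE
# Compute charge (GB-seconds x rate) plus $0.20 per million requests.
monthly_cost() {
  awk -v n="$1" -v mb="$2" -v secs="$3" -v rate="$4" 'BEGIN {
    compute  = n * (mb / 1024) * secs * rate
    requests = n / 1000000 * 0.20
    printf "%.2f\n", compute + requests
  }'
}

monthly_cost 10000000 1024 0.5 0.0000166667   # x86 rate
monthly_cost 10000000 1024 0.5 0.0000133334   # arm64 rate
```

With these inputs the x86 estimate is 85.33 and the arm64 estimate is 68.67; the request charge is identical in both, so the overall saving is slightly under the 20% rate difference.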
CPU and network scale linearly with the memory setting. AWS gives one full vCPU at 1769 MB; below that you get a proportional fraction, above it more than one (up to 6 vCPUs at the 10,240 MB ceiling). This is why the cost curve isn't monotonic. For a CPU-bound function, raising memory shortens duration, and the GB-second product (memory x time) can stay flat or fall even as the per-millisecond rate rises. For an I/O-bound function blocked on a network call, extra CPU does nothing — duration stays fixed and you simply pay more per call. The only way to know which regime a function is in is to measure across several memory settings.
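As a rough guide to how much CPU a setting buys, the linear relationship can be sketched like this. It is an estimate only: it assumes a straight line through the 1,769 MB point, and `vcpu_share` is a hypothetical helper name.

```shell
# vcpu_share MEMORY_MB
# Rough linear estimate: memory divided by the 1,769 MB that maps to one vCPU.
vcpu_share() {
  awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 1769 }'
}

for mb in 128 512 1024 1769 3008; do
  printf '%5s MB -> ~%s vCPU\n' "$mb" "$(vcpu_share "$mb")"
done
```

At 512 MB the estimate is about 0.29 of a vCPU and at 128 MB about 0.07, which is why CPU-bound work at the low end runs so slowly.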
Cold starts complicate the picture. Initialisation code (imports, SDK clients, connection setup) runs once per new execution environment and historically was not billed; modern Lambda does bill the INIT phase for managed runtimes. Higher memory also speeds up cold-start init because it brings more CPU, so for latency-sensitive low-volume functions the memory choice trades a slightly higher per-invocation cost for faster cold starts. AWS Lambda Power Tuning — an open-source Step Functions state machine you deploy from the Serverless Application Repository — automates the sweep: it invokes your function at a list of memory values, plots cost against speed, and recommends the setting that optimises for cost, speed, or a balance you choose.
# Read actual peak memory and billed duration from the REPORT lines (last 24h).
# The --start-time expression uses GNU date syntax; adjust it on macOS/BSD.
aws logs filter-log-events \
  --log-group-name /aws/lambda/thumbnail-generator \
  --filter-pattern 'REPORT' \
  --start-time $(date -u -d '1 day ago' +%s000) \
  --query 'events[].message' --output text | tail -5

# Sweep memory settings with AWS Lambda Power Tuning (deployed as a Step Functions state machine).
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine \
  --input '{
    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:thumbnail-generator",
    "powerValues": [512, 1024, 1769, 3008],
    "num": 50,
    "strategy": "cost"
  }'
# start-execution returns an execution ARN; once the run finishes, the execution
# output includes the recommended memory value plus a cost/speed visualisation URL.

What is the impact of mis-sized Lambda memory?
The direct cost is paid one fraction-of-a-cent at a time, which is exactly why it goes unnoticed. A single invocation of an over-provisioned function might waste $0.000004; multiply by 90 million invocations a month and that's $360 on one function, and a large org runs hundreds of functions. The waste is invisible per-call and material in aggregate — the serverless equivalent of an oversized fleet, just spread across executions instead of instances.
The under-provisioned direction is the sneakier trap. A CPU-bound function pinned at 128 or 256 MB feels frugal but often costs nearly as much as a well-tuned one because the extra duration cancels the memory saving — and it ships worse latency at the same time. Teams "save money" by lowering memory and end up with slower functions for no real cost benefit, then add provisioned concurrency or caching to fix the latency, spending more to paper over a tuning miss.
There's a commitment angle too. Lambda compute can be covered by Compute Savings Plans, which apply a discounted rate to compute usage across EC2, Fargate, and Lambda (for Lambda, the duration charges rather than the per-request fee). If you commit based on an untuned, over-provisioned baseline, you've locked in a discount on waste: the same stranded-commitment trap as buying Reserved Instances for an oversized EC2 fleet. Tuning should come before committing, so the commitment sits on the efficient baseline.
Finally, mis-sized memory distorts the architecture conversation. When a function is slow, the reflexive fix is provisioned concurrency, a queue, or a rewrite — when the actual fix is often a one-line memory change that buys more CPU. Untuned functions hide which latency problems are real engineering problems and which are just the wrong number in a config file, and that misdirection costs engineering hours far beyond the Lambda bill.
How do you right-size Lambda memory safely?
Right-sizing Lambda is a four-step loop that runs on the FinOps cadence: find the functions that matter, measure what they actually use and how fast they run, sweep memory to find the sweet spot, and re-tune as the code evolves.
1. Rank functions by GB-seconds, not invocation count
Cost follows memory × duration × volume, so a low-volume 10 GB function can outweigh a high-volume tiny one. Pull spend per function from Cost Explorer (using cost allocation tags to split it by function), or estimate GB-seconds from the CloudWatch Duration metric multiplied by each function's configured memory, and tune the top 10-20% first; they almost always carry the overwhelming majority of the spend. Ignore the long tail of rarely-invoked functions; tuning them is effort that never pays back.
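The ranking in this step can be sketched with `awk` and `sort`. `rank_by_gb_seconds` is a hypothetical helper, and the inventory below is invented purely to show a low-volume large function outranking a high-volume small one.

```shell
# rank_by_gb_seconds: reads "name memory_mb avg_seconds invocations" rows on
# stdin and prints functions sorted by total GB-seconds, largest first.
rank_by_gb_seconds() {
  awk '{ printf "%s %.0f\n", $1, ($2 / 1024) * $3 * $4 }' | sort -k2,2nr
}

# Hypothetical inventory; names and numbers are made up for illustration.
rank_by_gb_seconds <<'EOF'
auth-check      128   0.05 200000000
report-builder 10240 45    40000
thumbnailer     512   1.8  1000000
EOF
```

Here `report-builder` runs only 40,000 times but totals 18,000,000 GB-seconds, far ahead of `auth-check` at 1,250,000 despite its 200 million invocations.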
2. Measure peak memory and duration before changing anything
Every REPORT line carries Max Memory Used, Billed Duration, and the configured memory size. Read them via Logs Insights (or enable Lambda Insights for richer metrics). Peak memory tells you the floor you can't drop below without OOM; the duration profile tells you whether the function is CPU-bound (duration falls as memory rises) or I/O-bound (duration is flat regardless). The two regimes call for opposite moves.
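If the REPORT lines have already been exported to a file or piped from the CLI, the three fields can be pulled out locally. This is a sketch that assumes the usual REPORT field labels; `parse_report` is a hypothetical helper and the request ID in the sample line is made up.

```shell
# parse_report: reads Lambda REPORT lines on stdin and prints
# "billed_ms memory_size_mb max_memory_used_mb" for each one.
parse_report() {
  sed -n 's/.*Billed Duration: \([0-9.]*\) ms.*Memory Size: \([0-9]*\) MB.*Max Memory Used: \([0-9]*\) MB.*/\1 \2 \3/p'
}

# Sample line in the REPORT format (tab-separated fields).
printf 'REPORT RequestId: 00000000-0000-0000-0000-000000000000\tDuration: 1795.21 ms\tBilled Duration: 1796 ms\tMemory Size: 512 MB\tMax Memory Used: 180 MB\n' | parse_report
```

For the sample line this prints `1796 512 180`: billed milliseconds, allocated MB, and peak MB used. Lines that are not REPORT lines are silently skipped.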
3. Sweep with AWS Lambda Power Tuning, don't guess
Deploy the open-source Power Tuning state machine and run each high-cost function across a range like 512 / 1024 / 1769 / 3008 MB with a representative payload. It plots cost against speed and recommends a setting for your chosen strategy — cost, speed, or balanced. Trust it for stateless functions; for ones with side effects, run the sweep in a non-production account or with a payload that's safe to repeat dozens of times.
4. Consider ARM/Graviton, then re-tune on a cadence
Switching architecture to arm64 cuts the GB-second rate ~20% for compatible runtimes: a compounding lever on top of the memory tuning. Architecture is set when code is deployed rather than through a configuration-only update, so the switch means redeploying the package for arm64, which for most pure-Node/Python functions needs no code changes. After tuning, fold a quarterly re-check into the cadence: code changes shift the memory/CPU profile, and a function tuned for last quarter's logic can drift. Wire Power Tuning into CI for the heaviest functions so regressions surface before they ship.
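How the two levers compound can be sketched with the worked example's figures (0.9 GB-s per call before, 0.78 GB-s after) and the x86 and arm64 rates quoted earlier. The big assumption, flagged in the code too, is that duration on arm64 matches x86; that has to be measured. `combined_saving` is a hypothetical name.

```shell
# Combine the memory change (0.9 -> 0.78 GB-s per call) with the architecture
# rate change. ASSUMPTION: duration on arm64 is the same as on x86.
combined_saving() {
  awk 'BEGIN {
    before = 0.90 * 0.0000166667   # 512 MB on x86, compute cost per call
    after  = 0.78 * 0.0000133334   # 1024 MB on arm64, compute cost per call
    printf "%.0f%%\n", (1 - after / before) * 100
  }'
}

combined_saving
```

Under that assumption the compute cost per call falls by about 31%, compared with about 13% from the memory change alone.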
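# Apply the recommended memory setting.
aws lambda update-function-configuration \
  --function-name thumbnail-generator \
  --memory-size 1024
# Wait for the configuration update to finish before touching the code.
aws lambda wait function-updated \
  --function-name thumbnail-generator
# Architecture is changed with a code update, not a configuration update.
# function.zip is a placeholder for your own arm64-compatible deployment package.
aws lambda update-function-code \
  --function-name thumbnail-generator \
  --zip-file fileb://function.zip \
  --architectures arm64
# Confirm the new settings took effect.
aws lambda get-function-configuration \
  --function-name thumbnail-generator \
  --query '{Memory:MemorySize,Arch:Architectures[0]}'

Quick quiz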
Question 1 of 5: A CPU-bound image function runs 90M times a month at 512 MB, averaging 1.8s, and peaks at 180 MB of memory used. Power Tuning shows 1024 MB averages 0.78s. What's the right move?
Keep learning
Dig deeper into Lambda pricing, tuning tooling, and the architecture choices around it.
- AWS Lambda pricing: current GB-second and per-request rates for x86 and ARM/Graviton, the numbers behind the tuning math.
- AWS Lambda Power Tuning (open source): the Step Functions state machine that sweeps memory settings and plots the cost/speed sweet spot.
- AWS Lambda - memory and compute power: how the memory setting controls CPU and network allocation, and how to configure it.
- FinOps Foundation - Cloud Rate and Usage Optimization: how serverless tuning fits the broader FinOps lifecycle and operating model.
You've completed Right-size Lambda function memory. You now know the GB-second model, why the memory dial controls CPU and network, the counter-intuitive truth that more memory can cost less on CPU-bound work, and the four-step loop — rank by GB-seconds, measure peak and duration, sweep with Power Tuning, then apply Graviton and re-tune on a cadence. The next time a cost review flags a high-spend function, you'll have a defensible path from "flagged" to "tuned" in an afternoon.