Right-sizing Fargate: the basics
Why you pay for the size you ask for, not the size you use
AWS Fargate runs containers without requiring you to manage the underlying servers. Its pricing model is based on the CPU and memory allocated to each task definition — billed per vCPU-second and per GB-second while the task is running.
The important detail is that billing is driven by the resources you request, not what the container actually consumes at runtime. A task configured for 4 vCPU and 8 GB of memory will be billed for that full allocation even if the application only peaks at 0.4 vCPU and 1.2 GB in practice.
That over-allocation compounds quickly at scale. A service with a desiredCount of 20 over-provisioned tasks repeats the same waste twenty times, continuously, before any additional Service Auto Scaling capacity is added. Depending on region and pricing model, a heavily over-sized task can easily cost several times more than a right-sized equivalent running the exact same workload.
Unlike ECS tasks on the EC2 launch type, Fargate tasks cannot use arbitrary CPU and memory combinations. Fargate only accepts predefined valid pairings, for example:
- 0.25 vCPU supports 0.5 GB, 1 GB, or 2 GB of memory
- 1 vCPU supports 2–8 GB of memory in 1 GB increments
Effective Fargate optimisation therefore means analysing real utilisation data, selecting the smallest valid CPU/memory combination that still provides safe headroom for peak demand, and deploying the change through a new task definition revision.
In this lesson you'll learn how the AWS Fargate billing model makes over-requested tasks unnecessarily expensive, how to read real container utilisation using CloudWatch Container Insights (CPU and memory utilised versus reserved), and how to choose from the limited set of valid Fargate CPU↔memory combinations.
You'll also walk through the safe rollout pattern for right-sizing services: registering a smaller task definition revision, updating the ECS service, and reviewing desiredCount along with any Service Auto Scaling target-tracking policies.
Finally, you'll explore additional optimisation opportunities including Fargate Spot, ARM64/Graviton-based workloads, AWS Compute Optimizer recommendations for ECS on Fargate, and how the optimisation approach differs for ECS on EC2 where the underlying instances and capacity providers must also be right-sized.
The combination you can't actually pick
Fargate doesn't let you choose arbitrary CPU and memory values — it only accepts a predefined set of valid combinations. For example, a task configured with 1 vCPU can only use memory values within a specific supported range, while 0.25 vCPU tasks are limited to a much smaller set of options. Teams often identify an ideal target size from utilisation metrics, only to discover that the exact combination they want is not valid on Fargate. At that point, many simply round up to the next supported configuration, unintentionally increasing cost again. Effective right-sizing on Fargate therefore requires two things: understanding real workload utilisation, and understanding the valid CPU↔memory combinations that Fargate will actually accept.
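To make the constraint concrete, here is a minimal sketch of a validity check. The helper name is_valid_fargate_combo is hypothetical (it is not an AWS CLI command), and the tier table reflects the combinations in the ECS documentation as I understand them at the time of writing, so check the current table before relying on it. CPU is given in CPU units (1024 = 1 vCPU) and memory in MiB, matching the task definition fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if cpu/memory is a pairing Fargate accepts.
# cpu is in CPU units (1024 = 1 vCPU); mem is in MiB.
# The tier table is an assumption taken from the ECS docs; verify it is current.
is_valid_fargate_combo() {
  local cpu=$1 mem=$2 min max step
  case "$cpu" in
    # 0.25 vCPU has a short explicit list rather than a uniform step.
    256)   [ "$mem" -eq 512 ] || [ "$mem" -eq 1024 ] || [ "$mem" -eq 2048 ]; return ;;
    512)   min=1024;  max=4096;   step=1024 ;;
    1024)  min=2048;  max=8192;   step=1024 ;;
    2048)  min=4096;  max=16384;  step=1024 ;;
    4096)  min=8192;  max=30720;  step=1024 ;;
    8192)  min=16384; max=61440;  step=4096 ;;
    16384) min=32768; max=122880; step=8192 ;;
    *)     return 1 ;;
  esac
  # Memory must sit inside the tier's range and on its step grid.
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $(( (mem - min) % step )) -eq 0 ]
}

is_valid_fargate_combo 1024 2048 && echo "1 vCPU / 2 GB: valid"
is_valid_fargate_combo 256 4096  || echo "0.25 vCPU / 4 GB: not valid"
```

A check like this can run in CI against a task definition before it is registered, so an invalid pairing fails early rather than at deploy time.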
Right-sizing Fargate in action
Marcus runs the platform team at a logistics company. During a finance review, the team discovers that a single ECS service — an image-resizing worker — is responsible for a disproportionately large share of monthly Fargate spend. The service runs 20 tasks, each configured with 4 vCPU and 8 GB of memory.
Marcus opens CloudWatch Container Insights and reviews two weeks of utilisation data. CpuUtilized averages around 0.5 vCPU against 4 vCPU reserved, while MemoryUtilized sits close to 1.3 GB against 8 GB reserved. Even during batch-processing peaks, the service only reaches around 0.9 vCPU and 1.7 GB of memory usage. Using this data, he selects the smallest valid Fargate CPU↔memory combination that still provides safe operational headroom: 1 vCPU and 2 GB of memory.
Marcus registers the smaller configuration as a new task definition revision and rolls it out using a standard ECS rolling deployment. He keeps the service desiredCount unchanged during the rollout, then later tightens the Service Auto Scaling target-tracking policy once stability is confirmed. The result is a major reduction in monthly Fargate spend with no measurable impact on throughput, latency, or customer experience.
First, pull the real container utilisation from Container Insights — CPU used versus CPU reserved — to confirm the service is over-requested.
14-day hourly CPU used vs the 4096-unit (4 vCPU) reservation — a clear over-request.
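A query along these lines would produce that comparison. This is a sketch, not the exact command Marcus ran: cpu_metric is a hypothetical wrapper, the cluster and service names mirror the example, and it assumes GNU date (for -d) plus credentials that allow cloudwatch:GetMetricStatistics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 14 days of hourly datapoints for one Container Insights
# metric (for example CpuUtilized or CpuReserved), sorted by time.
# Assumes GNU date and a cluster with Container Insights enabled.
cpu_metric() {
  local metric=$1 cluster=$2 service=$3
  aws cloudwatch get-metric-statistics \
    --namespace ECS/ContainerInsights --metric-name "$metric" \
    --dimensions Name=ClusterName,Value="$cluster" Name=ServiceName,Value="$service" \
    --start-time "$(date -u -d '14 days ago' +%FT%TZ)" \
    --end-time "$(date -u +%FT%TZ)" \
    --period 3600 --statistics Average Maximum \
    --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Average,Maximum]' \
    --output text
}

# Usage (needs a live AWS account, so it is left commented out):
#   cpu_metric CpuUtilized prod image-worker
#   cpu_metric CpuReserved prod image-worker
```

Comparing the Maximum column of CpuUtilized with CpuReserved is what exposes the gap; the Average column alone would hide the batch peaks.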
Register a smaller task-definition revision at a valid Fargate combo (1 vCPU / 2 GB), then roll the service onto it. desiredCount and the scaling policy come after.
New revision at a valid CPU↔memory pairing, rolled out with zero downtime.
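The rollout could look roughly like the sketch below. Several details are assumptions rather than part of the lesson: the wrapper name rollout_smaller_revision, a local containers.json holding the existing container definitions, and an execution role ARN supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: register a 1 vCPU / 2 GB revision and roll the service onto it.
# Assumes ./containers.json contains the service's existing container definitions.
rollout_smaller_revision() {
  local cluster=$1 service=$2 family=$3 exec_role_arn=$4
  local revision_arn

  # Register the new revision; task-level cpu/memory are what Fargate bills on.
  revision_arn=$(aws ecs register-task-definition \
    --family "$family" \
    --requires-compatibilities FARGATE --network-mode awsvpc \
    --cpu 1024 --memory 2048 \
    --execution-role-arn "$exec_role_arn" \
    --container-definitions file://containers.json \
    --query 'taskDefinition.taskDefinitionArn' --output text) || return 1

  # Point the service at the new revision; ECS performs a rolling deployment.
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --task-definition "$revision_arn" >/dev/null || return 1

  # Block until the deployment has settled.
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}

# Usage (needs a live AWS account, so it is left commented out):
#   rollout_smaller_revision prod image-worker image-worker \
#     arn:aws:iam::123456789012:role/ecsTaskExecutionRole
```

Because desiredCount is not touched here, the service keeps its 20 tasks throughout, which matches the order of operations in the walkthrough.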
Fargate right-sizing under the hood: deep dive
Fargate pricing is linear and billed per second (with a one-minute minimum), based on the CPU and memory defined in the task definition. In practical terms, larger task sizes scale cost almost directly in proportion to the vCPU and memory requested.
That means a heavily over-provisioned task can cost several times more than a right-sized equivalent, even when both process the same workload. Because the billing meter reads the task definition's requested cpu and memory values, the only way to materially reduce spend is by changing the task definition itself — actual runtime utilisation does not directly affect the bill.
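The linear model is easy to check with a few lines of arithmetic. In the sketch below the two hourly rates are assumed example values (approximately the Linux/x86 on-demand rates for us-east-1 as I recall them), not figures from this lesson, and fargate_monthly_cost is a hypothetical helper. Substitute the rates for your own region from the pricing page.

```shell
#!/usr/bin/env bash
# ASSUMED example rates; replace with the current rates for your region.
VCPU_HOUR=0.04048   # USD per vCPU-hour (assumption)
GB_HOUR=0.004445    # USD per GB-hour (assumption)
HOURS=730           # average hours in a month

# Hypothetical helper: monthly cost of N always-on tasks of a given size.
# args: vcpu gb task_count
fargate_monthly_cost() {
  awk -v v="$1" -v g="$2" -v n="$3" -v pv="$VCPU_HOUR" -v pg="$GB_HOUR" -v h="$HOURS" \
    'BEGIN { printf "%.2f\n", n * h * (v * pv + g * pg) }'
}

oversized=$(fargate_monthly_cost 4 8 20)
rightsized=$(fargate_monthly_cost 1 2 20)
echo "20 tasks at 4 vCPU / 8 GB: \$$oversized per month"
echo "20 tasks at 1 vCPU / 2 GB: \$$rightsized per month"
awk -v a="$oversized" -v b="$rightsized" 'BEGIN { printf "ratio: %.1fx\n", a / b }'
```

Because both dimensions shrink by the same factor of four in this example, the ratio comes out at 4x regardless of which rates are plugged in; only the absolute dollar figures depend on the assumed prices.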
Fargate CPU and memory combinations are a hard platform constraint, not a recommendation. Each CPU tier only supports a defined range of memory values, which means right-sizing is limited to the combinations Fargate will actually accept. In practice, this often forces teams to choose the nearest supported configuration rather than an exact theoretical target.
CloudWatch Container Insights exposes the metrics needed to make those decisions properly: CpuUtilized versus CpuReserved, and MemoryUtilized versus MemoryReserved, at both service and task level. These metrics show the gap between what the workload actually consumes and what the task definition reserves for billing purposes.
AWS Compute Optimizer can now ingest this utilisation data and generate ECS-on-Fargate right-sizing recommendations, including projected savings and optimisation findings, in much the same way it already provides recommendations for EC2 instances.
# Ask Compute Optimizer for ECS-on-Fargate service right-sizing recommendations.
aws compute-optimizer get-ecs-service-recommendations \
  --service-arns arn:aws:ecs:us-east-1:123456789012:service/prod/image-worker \
  --query 'ecsServiceRecommendations[0].{Finding:finding,
    CurrentCpu:currentServiceConfiguration.cpu,
    CurrentMem:currentServiceConfiguration.memory,
    Option:serviceRecommendationOptions[0]}'
# Note: no backslashes inside the single-quoted query; newlines there are plain whitespace.
# Inspect memory used vs reserved to confirm the smaller memory tier is safe.
aws cloudwatch get-metric-statistics \
  --namespace ECS/ContainerInsights --metric-name MemoryUtilized \
  --dimensions Name=ClusterName,Value=prod Name=ServiceName,Value=image-worker \
  --start-time "$(date -u -d '14 days ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" --period 3600 --statistics Average Maximum
What is the impact of over-requested Fargate tasks?
The most visible impact is the bill, and it's larger than it looks because the over-request is multiplied by the running task count. One service defined at 4× the capacity it needs (4 vCPU / 8 GB where 1 vCPU / 2 GB would do), running 20 tasks, is paying for 20 × 3 = 60 vCPU and 20 × 6 = 120 GB it never touches, which at on-demand rates adds up to thousands of dollars a month for a single workload. Across an estate of dozens of services, low-utilisation Fargate can account for 30–60% of container spend, all of which could be reclaimed with no architectural change at all.
The second-order impact is that over-requesting masks where the workload actually lives. A team that pads every task "to be safe" never learns its real CPU and memory profile, so it can't reason about concurrency, batching, or whether a noisy neighbour problem is real. Generous task sizes paper over the questions that, answered, would make the service both cheaper and more predictable.
There's a compounding-discount impact too. Fargate Spot (up to 70% off) and Graviton (roughly 20% off) both apply to the requested size, so a discount never fixes an over-request: 70% off the wrong size is still paying for capacity you don't use. And Compute Savings Plans, which can cover Fargate, get committed against inflated usage. You end up locking in a one- or three-year commitment sized to waste, the same stranding trap as over-sized Reserved Instances on EC2. Right-size before you commit.
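A quick calculation shows why the order matters. This sketch treats the lesson's headline discounts as flat multipliers, which is a simplification (actual Spot savings vary), and expresses cost in units of one right-sized on-demand task. The helper relative_cost is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cost relative to one right-sized on-demand task.
# args: size_multiplier discount_percent
relative_cost() {
  awk -v s="$1" -v d="$2" 'BEGIN { printf "%.2f\n", s * (100 - d) / 100 }'
}

echo "4x oversized, on-demand: $(relative_cost 4 0)"
echo "4x oversized, Spot:      $(relative_cost 4 70)"
echo "right-sized, on-demand:  $(relative_cost 1 0)"
echo "right-sized, Spot:       $(relative_cost 1 70)"
```

Under these assumptions the oversized task on Spot (1.20) still costs more than the right-sized task at full on-demand price (1.00), and four times what the right-sized task costs on Spot (0.30).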
Finally, the autoscaling interaction bites. A Service Auto Scaling target-tracking policy that targets, say, 50% CPU on a task reserving 4× what it needs will almost never scale — the per-task utilisation is structurally low — so you over-provision and lose the elasticity you thought you had. Right-sizing the task is what makes target-tracking work the way it's supposed to.
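The effect is visible in the numbers target tracking works from, since service CPU utilisation is CPU used divided by CPU reserved. The sketch below plugs in the image-worker figures from the walkthrough (0.5 vCPU average, 0.9 vCPU peak); utilisation_pct is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: utilisation as a percentage of the reservation.
# args: used_vcpu reserved_vcpu
utilisation_pct() {
  awk -v u="$1" -v r="$2" 'BEGIN { printf "%.1f\n", 100 * u / r }'
}

echo "4 vCPU reserved: avg $(utilisation_pct 0.5 4)%, peak $(utilisation_pct 0.9 4)%"
echo "1 vCPU reserved: avg $(utilisation_pct 0.5 1)%, peak $(utilisation_pct 0.9 1)%"
```

With 4 vCPU reserved, even the peak (22.5%) sits well below a 50% target, so the policy has no reason to scale out. With 1 vCPU reserved, the average lands at the target and the peak (90%) clearly triggers a scale-out.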
How do you right-size Fargate safely?
Right-sizing containers is a four-step loop that runs continuously as workloads evolve: read real utilisation, pick the smallest valid shape with headroom, roll it out as a new revision, then layer discounts and tighten autoscaling.
1. Read real utilisation from Container Insights
Enable CloudWatch Container Insights on the cluster so you get CpuUtilized/CpuReserved and MemoryUtilized/MemoryReserved per service and per task. Look at 14+ days, average and peak — memory especially, because a container that briefly touches its limit gets OOM-killed, not throttled. Size to cover the peak with comfortable headroom (target steady-state utilisation around 50–60%), not the average. Without this data you're guessing, and guessing is what created the over-request.
2. Choose the smallest valid CPU↔memory combination
Fargate only accepts fixed pairings, so right-sizing is a snap-to-grid exercise: find the smallest valid combo that still covers your peak. If CPU wants 0.25 vCPU but memory needs 4 GB, you're forced up to the 0.5 vCPU / 4 GB cell, because 0.25 vCPU tops out at 2 GB. That mismatch is itself a signal to check whether the workload is memory-bound and might suit a different design. Let Compute Optimizer's ECS-on-Fargate recommendation propose the shape; trust its LOW-risk findings, eyeball the rest.
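The snap-to-grid step can be sketched as a small lookup. The helper smallest_fargate_combo is hypothetical, the tier table is my reading of the ECS documentation at the time of writing (verify it before use), and the caller is expected to pass in requirements that already include whatever headroom they want. Inputs and outputs are in CPU units (1024 = 1 vCPU) and MiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the smallest valid "cpu mem" pairing that covers
# the required CPU units and MiB, or return 1 if nothing fits.
# args: needed_cpu_units needed_mem_mib
smallest_fargate_combo() {
  local need_cpu=$1 need_mem=$2 cpu min max step mem
  # Rows are "cpu min_mem max_mem step", smallest tier first (assumed table).
  while read -r cpu min max step; do
    [ "$cpu" -ge "$need_cpu" ] || continue   # tier too small on CPU
    [ "$max" -ge "$need_mem" ] || continue   # tier too small on memory
    if [ "$need_mem" -le "$min" ]; then
      mem=$min
    else
      # Round the requirement up to the next step on this tier's grid.
      mem=$(( min + ( (need_mem - min + step - 1) / step ) * step ))
    fi
    # 0.25 vCPU offers only 512, 1024 and 2048 MiB, so 1536 snaps up to 2048.
    if [ "$cpu" -eq 256 ] && [ "$mem" -eq 1536 ]; then mem=2048; fi
    echo "$cpu $mem"
    return 0
  done <<'EOF'
256 512 2048 512
512 1024 4096 1024
1024 2048 8192 1024
2048 4096 16384 1024
4096 8192 30720 1024
8192 16384 61440 4096
16384 32768 122880 8192
EOF
  return 1
}

# Peaks from the walkthrough: 0.9 vCPU (about 922 units) and 1.7 GB (about 1741 MiB).
smallest_fargate_combo 922 1741
# A memory-heavy case: 0.25 vCPU of CPU but 4 GB of memory.
smallest_fargate_combo 256 4096
```

The first call lands on 1024 units and 2048 MiB, the 1 vCPU / 2 GB shape chosen in the walkthrough. The second is pushed up a CPU tier purely by its memory requirement, which is the mismatch worth investigating.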
3. Roll out via a new task-definition revision, then tune desiredCount and autoscaling
You can't resize a running task in place. Task definitions are immutable, so register a new revision with the smaller cpu/memory and point the service at it with update-service --task-definition; the rolling deployment keeps at least minimumHealthyPercent of capacity running the whole time. (--force-new-deployment is not required when the task definition itself changes.) Once the smaller tasks are stable, revisit desiredCount and the Service Auto Scaling target-tracking policy: a right-sized task makes target-tracking actually responsive, so you often need fewer baseline tasks and can let scaling handle the peaks.
4. Layer Spot and Graviton, and don't commit before right-sizing
Once the shape is correct, compound the discounts. Move interruption-tolerant tasks (queue workers, batch, stateless replicas) to Fargate Spot via a capacity-provider strategy for up to 70% off, and rebuild images multi-arch to run on ARM64/Graviton Fargate for roughly 20% off. Only after right-sizing should you size a Compute Savings Plan, because committing against inflated usage strands the commitment. For ECS-on-EC2, also right-size the cluster's instances and the capacity provider's Auto Scaling group, and use the binpack placement strategy so tasks pack tightly.
# Route a service's tasks to Fargate Spot via a capacity-provider strategy (~70% off).
aws ecs update-service \
--cluster prod --service image-worker \
--capacity-provider-strategy \
capacityProvider=FARGATE_SPOT,weight=4 \
capacityProvider=FARGATE,weight=1,base=2 \
--force-new-deployment
# Tighten target-tracking so a right-sized task actually scales on demand.
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/prod/image-worker \
--scalable-dimension ecs:service:DesiredCount \
--policy-name cpu-target-50 --policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration \
'{"TargetValue":50.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'
Keep learning
Dig deeper into Fargate pricing, container utilisation tooling, and the right-sizing strategy around it.
- AWS Fargate pricing: per-vCPU-second and per-GB-second rates for x86 and ARM/Graviton, plus Fargate Spot. These are the numbers behind the sizing math.
- Amazon ECS task definition CPU and memory: the full table of valid Fargate CPU↔memory combinations you must choose from when sizing a task.
- AWS Compute Optimizer, ECS services on Fargate: how Compute Optimizer analyses Container Insights data and produces ECS-on-Fargate right-sizing recommendations.
- FinOps Foundation, Cloud Rate and Usage Optimization: how container right-sizing fits the broader FinOps lifecycle and operating model.
You've completed Right-size ECS tasks and Fargate services. You now know why Fargate bills for the capacity you request rather than what you use, how to read real CPU and memory utilisation from Container Insights, how to snap to a valid CPU↔memory combo, and how to roll a smaller task-definition revision out with zero downtime — then compound the saving with Spot and Graviton. The next time a finance review flags a high-spend service, you'll have a four-step loop ready to run.