SageMaker Savings Plans: the basics
What you're actually buying when you commit to ML compute
A SageMaker Savings Plan is a commitment to spend a fixed dollar amount of Amazon SageMaker ML compute per hour — say $6/hour — for a one- or three-year term. In exchange, AWS discounts everything that hourly commitment covers by up to roughly 64% off On-Demand. You're not reserving a specific instance; you're pre-purchasing a flat rate of SageMaker compute consumption and letting AWS apply the discount automatically across the components that make up an ML platform: training jobs, real-time and batch inference endpoints, notebook and Studio instances, and processing jobs.
The defining feature is that this is the ONLY commitment lever for SageMaker. A Compute Savings Plan and an EC2 Instance Savings Plan do NOT cover SageMaker usage — those ml.* instances bill on their own meter and need their own plan. Within that scope a SageMaker SP is flexible: the discount applies automatically across instance families and sizes (ml.c5 to ml.g5), across regions, and across components, so moving a model from ml.c5.2xlarge to ml.g5.xlarge or shifting a training job to another region keeps the discount applied. What it will not do is reach outside SageMaker to your regular EC2, Fargate, or Lambda spend.
It's a finding worth surfacing because most ML platforms run a large, predictable inference baseline at full On-Demand price with no commitment in place — while assuming their general Compute Savings Plan already covers it. It doesn't. That steady endpoint baseline is the cheapest possible thing to discount: it's going to serve traffic anyway, so paying On-Demand for it is leaving 30–64% on the table every hour. The discipline is to size the commitment to the floor of your inference usage and leave the bursty training peaks on On-Demand.
In this lesson you'll learn exactly what a SageMaker Savings Plan commits you to, why it's the only commitment that covers SageMaker, and the commit-to-the-inference-floor strategy that captures the discount on the steady serving baseline while leaving bursty training on On-Demand. You'll see how AWS generates a SAGEMAKER_SP recommendation in Cost Explorer, how to read coverage versus utilisation, the payment-option tradeoffs (No/Partial/All Upfront), the 1-year versus 3-year decision, and why right-sizing endpoints — and considering Serverless Inference and multi-model endpoints — must come before committing. You'll get the real CLI calls to pull a recommendation and check coverage, and the failure mode unique to ML: over-committing against training spikes that don't recur.
The commitment that survived the model swap
A team committed to a 1-year SageMaker Savings Plan at $5/hour sized to their real-time inference endpoints running on ml.c5.2xlarge instances. Four months later they re-architected: they moved the endpoints to ml.g5.xlarge GPU instances for a new transformer model, shifted one endpoint to another region for latency, and added a multi-model endpoint. They didn't touch the Savings Plan once — because a SageMaker SP commits dollars-per-hour of ML compute, not a machine, the discount re-applied itself across every instance-family, size, and region change automatically. Had they tried to lean on their general Compute Savings Plan instead, none of it would have applied: Compute SPs don't cover SageMaker at all, and the entire ML platform would have been running at full On-Demand.
Buying a SageMaker Savings Plan in action
Marcus runs FinOps at a company with a growing ML platform spending about $40k/month on SageMaker, almost all of it On-Demand. The dashboard shows SageMaker Savings Plan coverage at 0% — the team assumed the company's existing Compute SP covered it, but Compute SPs don't touch SageMaker. Reading the line carefully, he sees roughly $22k/month of steady real-time inference endpoints that have run flat for six months, plus a spiky $18k/month of training jobs that come and go with experiments.
Before committing a dollar, he checks the endpoints are right-sized — there's no point locking in a discount on an endpoint that should be a tier smaller, or that should be on Serverless Inference because its traffic is intermittent. With two low-traffic endpoints moved to Serverless and the rest right-sized, he pulls the AWS Cost Explorer SageMaker SP purchase recommendation. AWS, looking at 30 days of usage, suggests a SageMaker Savings Plan at roughly $7/hour on a 1-year No Upfront term — but that number includes the training spikes from the lookback window.
Marcus deliberately commits below the AWS number. The recommendation optimises for maximum coverage and was inflated by a heavy training month; he wants to cover the steady inference floor and leave the bursty training top on On-Demand so he never pays for unused commitment. He buys $4/hour, 1-year No Upfront, and sets a calendar reminder to re-check coverage and utilisation in 30 days — planning to layer a second small plan on top once the first is proven near 100% utilised.
First, ask AWS Cost Explorer for a SageMaker Savings Plan purchase recommendation based on recent usage — note the savings-plans-type is SAGEMAKER_SP, distinct from COMPUTE_SP.
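A minimal sketch of that request, assuming the 1-year No Upfront term and 30-day lookback from Marcus's scenario. The wrapper name `sagemaker_sp_recommendation` is hypothetical; the `aws ce` subcommand and flags are the same ones used later in this lesson.

```shell
# Hypothetical wrapper: ask Cost Explorer what hourly SageMaker commitment it would recommend.
# Assumes the AWS CLI is installed and configured with Cost Explorer read access.
sagemaker_sp_recommendation() {
  aws ce get-savings-plans-purchase-recommendation \
    --savings-plans-type SAGEMAKER_SP \
    --term-in-years ONE_YEAR \
    --payment-option NO_UPFRONT \
    --lookback-period-in-days THIRTY_DAYS \
    --query 'SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationSummary.HourlyCommitmentToPurchase'
}

# Usage: sagemaker_sp_recommendation
```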
The recommended hourly commitment includes bursty training; the disciplined buy sits below it, on the steady inference baseline.
After buying, check coverage on SageMaker spend to see how much eligible ML compute the plan is discounting, and confirm the bursty training top is still left flexible.
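One way to sketch that check, assuming you want coverage broken out per service so the SageMaker row can be read on its own. The wrapper name `sp_coverage_by_service` is hypothetical, and `--granularity` is left out because Cost Explorer does not accept it together with `--group-by` on this call.

```shell
# Hypothetical wrapper: Savings Plans coverage for a date range, grouped by service.
# Args: start date and end date, both as YYYY-MM-DD.
sp_coverage_by_service() {
  aws ce get-savings-plans-coverage \
    --time-period "Start=$1,End=$2" \
    --group-by Type=DIMENSION,Key=SERVICE \
    --query 'SavingsPlansCoverages[].{Attributes:Attributes,CoveragePct:Coverage.CoveragePercentage}'
}

# Usage: sp_coverage_by_service 2026-05-01 2026-05-26
```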
Coverage shows the share of eligible SageMaker compute under the plan; the rest is the volatile training spend kept flexible deliberately.
SageMaker Savings Plans under the hood (deep dive)
A SageMaker Savings Plan applies as a billing-time discount, not a capacity reservation. Every hour, AWS takes your committed dollar rate and applies it to your eligible SageMaker usage in order of highest discount percentage first, so it greedily covers the usage that benefits most. That is why a single plan can span training jobs, real-time and batch inference endpoints, notebook and Studio instances, and processing jobs simultaneously. Eligible SageMaker usage above your commitment bills at On-Demand; when eligible usage falls below your commitment, the unused portion is simply wasted (you paid for it regardless). Crucially, the eligible scope stops at SageMaker: a Compute Savings Plan or EC2 Instance Savings Plan never applies to ml.* usage, and a SageMaker SP never applies to ordinary EC2, Fargate, or Lambda.
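The hourly mechanics can be sketched with a deliberately simplified model. It assumes a single blended discount rate, whereas AWS applies per-usage-type Savings Plan rates with the highest discount first; the function name `sp_hour_split` and the 30% figure are illustrative only.

```shell
# Simplified model of one billing hour (assumption: one blended discount rate).
# Args: commitment ($/h), eligible usage at On-Demand prices ($/h), discount (0-1, below 1).
sp_hour_split() {
  awk -v c="$1" -v od="$2" -v d="$3" 'BEGIN {
    sp_usage = od * (1 - d)                      # usage valued at Savings Plan rates
    used     = (sp_usage < c) ? sp_usage : c     # commitment actually consumed
    unused   = c - used                          # paid for but wasted
    overflow = (sp_usage - used) / (1 - d)       # remainder billed at On-Demand
    printf "used=%.2f unused=%.2f on_demand=%.2f\n", used, unused, overflow
  }'
}

sp_hour_split 4 10 0.30   # usage above the commitment: the excess bills at On-Demand
sp_hour_split 4 5 0.30    # usage below the commitment: part of the $4 is wasted
```

In the first call the whole commitment is consumed and the rest spills to On-Demand; in the second, fifty cents of the hourly commitment goes unused.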
Within SageMaker the plan is fully flexible: the discount follows across instance families and sizes (ml.c5, ml.m5, ml.g5, and so on), across regions, and across components. Move a model from CPU to GPU instances or shift a training job to another region and the discount re-applies automatically. The bursty shape of ML spend is the thing to reason about: training jobs spike the hourly rate for hours or days and then disappear, while inference endpoints hold a steady rate to serve production traffic. The purchase recommendation from ce get-savings-plans-purchase-recommendation is computed over a lookback window, so a heavy training month inflates the recommended hourly commitment above the durable inference floor, and it is the floor, not the inflated recommendation, that you should commit to.
Payment options trade cash flow for rate: No Upfront pays the hourly commitment monthly across the term; All Upfront pays the whole term in advance for the deepest discount (typically a couple of points better than No Upfront); Partial Upfront is the midpoint. The term itself is the bigger lever — a 3-year plan discounts meaningfully more than a 1-year, but commits you for three times as long, which is the wrong bet on ML platforms in active model and architecture flux. AWS surfaces all of these permutations through the recommendation API; you choose the term and payment option as request parameters with --savings-plans-type SAGEMAKER_SP, and AWS returns the matching recommended hourly commitment, estimated savings, and ROI.
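To make the cash side concrete, here is a rough sketch of the dollars a term commits you to. It assumes a 365-day year and uses the $6/hour example from the start of the lesson; it does not model the rate difference between payment options, and `sp_term_commitment` is a hypothetical helper name.

```shell
# Total dollars committed over a term: hourly commitment x 8,760 hours x years.
# All Upfront pays the total at purchase; No Upfront spreads it across the term,
# shown here as an average monthly figure (actual bills vary with month length).
sp_term_commitment() {
  awk -v c="$1" -v y="$2" 'BEGIN {
    total = c * 8760 * y
    printf "total=%.0f monthly_no_upfront=%.0f\n", total, total / (12 * y)
  }'
}

sp_term_commitment 6 1   # total=52560 monthly_no_upfront=4380
sp_term_commitment 6 3   # total=157680 monthly_no_upfront=4380
```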
# Pull the 3-year All Upfront SageMaker recommendation; re-run with ONE_YEAR and NO_UPFRONT to compare it against a 1-year No Upfront plan.
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type SAGEMAKER_SP \
  --term-in-years THREE_YEARS \
  --payment-option ALL_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS \
  --query 'SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationSummary.{Commit:HourlyCommitmentToPurchase,Pct:EstimatedSavingsPercentage,ROI:EstimatedROI}'
# Watch utilisation after purchase: anything below ~95% suggests you sized to the training spikes, not the inference floor.
aws ce get-savings-plans-utilization \
  --time-period Start=2026-05-01,End=2026-05-26 \
  --granularity MONTHLY \
  --query 'SavingsPlansUtilizationsByTime[0].Utilization'
What is the impact of committing (or not) to a SageMaker Savings Plan?
The direct impact of buying is rate. A stable $22k/month inference baseline running at full On-Demand, moved under a well-sized SageMaker Savings Plan, commonly drops 25–40% on the covered portion — $6–9k/month off the bill for zero change in what's running. Over a 3-year term that's six figures of pure rate savings on ML compute the business was always going to serve. Not committing has the inverse impact: every hour the predictable inference floor runs at On-Demand is the most expensive way to buy the cheapest-to-discount thing the ML platform owns — and because Compute SPs don't reach SageMaker, this floor stays exposed even when the rest of the estate is well-covered.
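The arithmetic behind those figures, as a rough sketch. It assumes the whole $22k/month baseline is covered and that the discount holds flat for the term; `sp_savings` is a hypothetical helper.

```shell
# Monthly and whole-term savings on a covered baseline at a given discount.
# Args: monthly On-Demand spend ($), discount (0-1), term in months.
sp_savings() {
  awk -v m="$1" -v d="$2" -v t="$3" 'BEGIN {
    printf "monthly=%.0f term=%.0f\n", m * d, m * d * t
  }'
}

sp_savings 22000 0.25 36   # monthly=5500 term=198000
sp_savings 22000 0.40 36   # monthly=8800 term=316800
```

Both ends of the range land in six figures over 36 months, which is where the rounded monthly numbers above come from.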
The second-order impact is over-commitment, sharper for ML than for general compute because the spend is bursty. A Savings Plan is take-or-pay: commit to $7/hour because a heavy training month made it look durable, then watch the experiments end and steady usage settle at $4/hour, and you still pay the full $7 — the $3 gap is waste, locked in for the whole term, with no resale market. This is why disciplined teams commit to the steady inference floor and leave the spiky training top on On-Demand; the extra coverage from chasing AWS's lookback-inflated recommendation is not worth stranding commitment when the training tide goes out.
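The $7 versus $4 case above works out as follows. This is a sketch that assumes usage stays flat at the lower level for the hours given; `sp_stranded` is a hypothetical helper name.

```shell
# Unused commitment under take-or-pay.
# Args: committed $/h, steady usage at plan rates $/h, hours the gap persists.
sp_stranded() {
  awk -v c="$1" -v u="$2" -v h="$3" 'BEGIN {
    gap = (c > u) ? c - u : 0                    # commitment nobody is consuming
    printf "utilisation=%.1f%% waste_per_hour=%.2f waste_over_term=%.0f\n",
           (c - gap) / c * 100, gap, gap * h
  }'
}

sp_stranded 7 4 8760   # one full year (8,760 hours) at the lower usage level
```

For a year at that level the $3/hour gap adds up to $26,280 paid for nothing, at a utilisation of about 57%.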
There's a sequencing impact that bites hard: right-sizing must come before committing. If you commit while endpoints are oversized — or while intermittent-traffic endpoints are running 24/7 on provisioned instances instead of Serverless Inference, or as separate single-model endpoints instead of a multi-model endpoint — you lock in a discount on capacity you shouldn't be running. When you later right-size, your committed dollar-per-hour is now larger than your shrunken eligible usage, utilisation drops, and the commitment strands. The correct order is always right-size and re-architect first, then commit to the true, optimised inference floor.
Finally there's a term-and-flexibility impact. A 3-year All Upfront plan maximises the headline discount but minimises agility — it's the right bet only on durable, stable inference. An ML platform mid-evolution (swapping model architectures, moving CPU to GPU, changing endpoint topology) wants the SageMaker SP's automatic in-scope flexibility and a shorter 1-year term so the commitment can follow the models without stranding. Matching term to model durability is where most of the real money — and most of the avoidable waste — lives in an ML cost program.
How do you commit to SageMaker Savings Plans safely?
Buying an ML commitment well is a repeatable loop on the FinOps cadence: right-size and re-architect endpoints first, isolate the steady inference floor from the bursty training, commit below the lookback-inflated recommendation, then watch coverage and utilisation and layer up as the program proves out.
1. Right-size and re-architect endpoints before committing anything
Run inference at the right size, move intermittent-traffic endpoints to Serverless Inference, and consolidate multiple small models onto multi-model endpoints first. Committing to a SageMaker Savings Plan on an oversized or wrongly-shaped endpoint fleet locks in a discount on capacity you shouldn't be running, and the commitment strands the moment you later optimise. The order is non-negotiable: optimise usage, then optimise rate. A plan applied to a right-sized inference baseline is savings; one applied to an unoptimised fleet is a multi-year mistake.
2. Commit to the steady inference floor, not the training peaks
Separate the SageMaker line into its two natures: the steady inference baseline that serves production traffic, and the bursty training, batch, and processing spend that spikes and vanishes. Pull 30–60 days of usage and find the dollar-per-hour level the inference endpoints run essentially all the time — that floor, not the average and not the training-inflated peak, is what you commit to. Everything above it stays on On-Demand. AWS's recommendation is computed over a lookback window, so a heavy training month pushes it above the durable floor; treat it as a ceiling, not a target.
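Finding the floor can be sketched as a small filter over hourly spend samples. The input format (one dollar-per-hour value per line) and the choice of the 10th percentile as a stand-in for "runs essentially all the time" are assumptions, and the sample values are invented for illustration.

```shell
# Reads one hourly SageMaker spend value ($/h) per line on stdin and prints the
# minimum, the 10th percentile (nearest-rank), and the mean.
sp_floor() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 10 + 99) / 100)          # ceil(0.10 * NR), nearest-rank
      printf "min=%.2f p10=%.2f mean=%.2f\n", v[1], v[rank], sum / NR
    }'
}

# Ten illustrative hours: a steady ~$4 floor plus three training spikes.
printf '%s\n' 4.1 4.0 4.2 4.0 9.5 12.0 4.1 4.3 4.0 11.0 | sp_floor
```

The spikes pull the mean well above the floor, which is the same distortion a lookback-based recommendation picks up.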
3. Choose term and payment for model durability and cash, not headline rate
Use 1-year No Upfront as the default for a young ML program or platforms where model architectures are still changing — it preserves flexibility and cash at a small rate premium. Reserve 3-year and All/Partial Upfront for inference workloads that have proven durable, where the deeper discount justifies locking cash and a three-year commitment. Remember there is no Compute SP or EC2 SP fallback for SageMaker — this is the only lever, so size it deliberately rather than over-buying to be safe.
4. Monitor coverage and utilisation on the SageMaker line, then layer up
After buying, watch two numbers scoped to SageMaker on the FinOps cadence: utilisation (must stay near 100% — below ~95% means you sized to the training spikes) and coverage (the share of eligible ML spend discounted, climbing toward 70–80%). Once a plan is proven at high utilisation, layer a second small plan onto the next slice of the inference floor as it grows. This incremental approach captures most of the savings while keeping over-commitment risk near zero — far safer than one big upfront bet sized to a peak training month.
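The layering decision can be written down as a simple rule. The thresholds below (95% utilisation, 70% coverage) come from the guidance above, but treating them as hard cut-offs is a simplifying assumption, and `sp_next_step` is a hypothetical helper.

```shell
# Decide whether to layer another small plan. Args: utilisation %, coverage %.
sp_next_step() {
  awk -v u="$1" -v cov="$2" 'BEGIN {
    if (u < 95)        print "hold: commitment is under-used, do not add more"
    else if (cov < 70) print "layer: add a small plan on the next slice of the floor"
    else               print "hold: coverage is already in the target band"
  }'
}

sp_next_step 99.2 55   # well utilised, low coverage
sp_next_step 88.0 55   # under-used: fix sizing before buying more
```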
# Pull the SageMaker recommendation, then check both metrics that govern the buy.
# 1) What would AWS recommend at the 1-year No Upfront level? (treat as a ceiling)
aws ce get-savings-plans-purchase-recommendation \
--savings-plans-type SAGEMAKER_SP \
--term-in-years ONE_YEAR \
--payment-option NO_UPFRONT \
--lookback-period-in-days THIRTY_DAYS \
--query 'SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationSummary.HourlyCommitmentToPurchase'
# 2) Where is coverage today (how much inference floor is still exposed)?
# Unfiltered, this reports all Savings Plans-eligible spend; group or filter by the SERVICE dimension to isolate the SageMaker line.
aws ce get-savings-plans-coverage \
  --time-period Start=2026-05-01,End=2026-05-26 \
  --granularity MONTHLY \
  --query 'SavingsPlansCoverages[].Coverage.CoveragePercentage'
# 3) After buying, confirm you didn't size to the training spikes.
aws ce get-savings-plans-utilization \
  --time-period Start=2026-05-01,End=2026-05-26 \
  --granularity MONTHLY \
  --query 'Total.Utilization.UtilizationPercentage'
Keep learning
Dig deeper into SageMaker Savings Plans mechanics, ML inference cost optimisation, and how rate optimisation fits the FinOps lifecycle.
- AWS SageMaker Savings Plans: Authoritative reference on what SageMaker Savings Plans cover (training, inference, notebooks, processing), the discount, and the purchase workflow.
- Amazon SageMaker pricing: On-Demand rates by ml instance family and component, plus Serverless Inference pricing. These are the numbers behind the commit-to-the-floor decision.
- AWS Cost Explorer (Savings Plans recommendations): How AWS generates SAGEMAKER_SP purchase recommendations and reports coverage and utilisation. These are the numbers behind the inference-floor decision.
- FinOps Foundation (Cloud Rate and Usage Optimization): How commitment-based discounts fit the broader FinOps lifecycle and operating model.
You've completed Purchase SageMaker Savings Plans. You now know that a SageMaker SP is the only commitment lever that covers ML compute, what it commits you to across training, inference, notebooks, and processing, the commit-to-the-inference-floor strategy that captures the discount while leaving bursty training flexible, and why right-sizing endpoints — including moving intermittent ones to Serverless Inference — must come before committing. The next time the dashboard flags zero SageMaker coverage on a steady inference baseline, you'll have a defensible path from recommendation to a disciplined, well-utilised commitment.