Cost

Purchase SageMaker Savings Plans

Commit to a fixed dollar-per-hour spend on Amazon SageMaker ML compute for one or three years and take up to ~64% off On-Demand, but size the commitment to the steady inference baseline, not the training spikes.

14 min · 10 sections · AWS

Last reviewed 27 May 2026

SageMaker Savings Plans: the basics

What you're actually buying when you commit to ML compute

A SageMaker Savings Plan is a commitment to spend a fixed dollar amount of Amazon SageMaker ML compute per hour — say $6/hour — for a one- or three-year term. In exchange, AWS discounts everything that hourly commitment covers by up to roughly 64% off On-Demand. You're not reserving a specific instance; you're pre-purchasing a flat rate of SageMaker compute consumption and letting AWS apply the discount automatically across the components that make up an ML platform: training jobs, real-time and batch inference endpoints, notebook and Studio instances, and processing jobs.

The defining feature is that this is the ONLY commitment lever for SageMaker. A Compute Savings Plan and an EC2 Instance Savings Plan do NOT cover SageMaker usage — those ml.* instances bill on their own meter and need their own plan. Within that scope a SageMaker SP is flexible: the discount applies automatically across instance families and sizes (ml.c5 to ml.g5), across regions, and across components, so moving a model from ml.c5.2xlarge to ml.g5.xlarge or shifting a training job to another region keeps the discount applied. What it will not do is reach outside SageMaker to your regular EC2, Fargate, or Lambda spend.

It's a finding worth surfacing because most ML platforms run a large, predictable inference baseline at full On-Demand price with no commitment in place, while assuming their general Compute Savings Plan already covers it. It doesn't. That steady endpoint baseline is the cheapest possible thing to discount: it's going to serve traffic anyway, so paying On-Demand for it leaves a discount of typically 25–40%, and up to ~64% on the deepest terms, on the table every hour. The discipline is to size the commitment to the floor of your inference usage and leave the bursty training peaks on On-Demand.

In this lesson you'll learn exactly what a SageMaker Savings Plan commits you to, why it's the only commitment that covers SageMaker, and the commit-to-the-inference-floor strategy that captures the discount on the steady serving baseline while leaving bursty training on On-Demand. You'll see how AWS generates a SAGEMAKER_SP recommendation in Cost Explorer, how to read coverage versus utilisation, the payment-option tradeoffs (No/Partial/All Upfront), the 1-year versus 3-year decision, and why right-sizing endpoints — and considering Serverless Inference and multi-model endpoints — must come before committing. You'll get the real CLI calls to pull a recommendation and check coverage, and the failure mode unique to ML: over-committing against training spikes that don't recur.

Fun fact

The commitment that survived the model swap

A team committed to a 1-year SageMaker Savings Plan at $5/hour sized to their real-time inference endpoints running on ml.c5.2xlarge instances. Four months later they re-architected: they moved the endpoints to ml.g5.xlarge GPU instances for a new transformer model, shifted one endpoint to another region for latency, and added a multi-model endpoint. They didn't touch the Savings Plan once — because a SageMaker SP commits dollars-per-hour of ML compute, not a machine, the discount re-applied itself across every instance-family, size, and region change automatically. Had they tried to lean on their general Compute Savings Plan instead, none of it would have applied: Compute SPs don't cover SageMaker at all, and the entire ML platform would have been running at full On-Demand.

Buying a SageMaker Savings Plan in action

Marcus runs FinOps at a company with a growing ML platform spending about $40k/month on SageMaker, almost all of it On-Demand. The dashboard shows SageMaker Savings Plan coverage at 0% — the team assumed the company's existing Compute SP covered it, but Compute SPs don't touch SageMaker. Reading the line carefully, he sees roughly $22k/month of steady real-time inference endpoints that have run flat for six months, plus a spiky $18k/month of training jobs that come and go with experiments.

Before committing a dollar, he checks the endpoints are right-sized — there's no point locking in a discount on an endpoint that should be a tier smaller, or that should be on Serverless Inference because its traffic is intermittent. With two low-traffic endpoints moved to Serverless and the rest right-sized, he pulls the AWS Cost Explorer SageMaker SP purchase recommendation. AWS, looking at 30 days of usage, suggests a SageMaker Savings Plan at roughly $7/hour on a 1-year No Upfront term — but that number includes the training spikes from the lookback window.

Marcus deliberately commits below the AWS number. The recommendation optimises for maximum coverage and was inflated by a heavy training month; he wants to cover the steady inference floor and leave the bursty training top on On-Demand so he never pays for unused commitment. He buys $4/hour, 1-year No Upfront, and sets a calendar reminder to re-check coverage and utilisation in 30 days — planning to layer a second small plan on top once the first is proven near 100% utilised.

First, ask AWS Cost Explorer for a SageMaker Savings Plan purchase recommendation based on recent usage — note the savings-plans-type is SAGEMAKER_SP, distinct from COMPUTE_SP.

$ aws ce get-savings-plans-purchase-recommendation --savings-plans-type SAGEMAKER_SP --term-in-years ONE_YEAR --payment-option NO_UPFRONT --lookback-period-in-days THIRTY_DAYS --query 'SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationSummary'

{
  "EstimatedROI": "34.2",
  "CurrencyCode": "USD",
  "HourlyCommitmentToPurchase": "7.18",
  "EstimatedSavingsAmount": "4180.55",
  "EstimatedSavingsPercentage": "24.1",
  "EstimatedMonthlySavingsAmount": "4180.55",
  "EstimatedOnDemandCostWithCurrentCommitment": "17340.00"
}

# $7.18/hr was inflated by training spikes in the lookback — commit to the inference floor, not this ceiling.

The recommended hourly commitment includes bursty training; the disciplined buy sits below it, on the steady inference baseline.

After buying, check coverage on SageMaker spend to see how much eligible ML compute the plan is discounting, and confirm the bursty training top is still left flexible.

$ aws ce get-savings-plans-coverage --time-period Start=2026-05-01,End=2026-05-26 --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon SageMaker"]}}' --granularity MONTHLY --query 'SavingsPlansCoverages[0].Coverage'

{
  "SpendCoveredBySavingsPlans": "21640.10",
  "OnDemandCost": "18120.44",
  "TotalCost": "39760.54",
  "CoveragePercentage": "54.4"
}

# Inference floor covered; the ~46% On-Demand is the bursty training top, left flexible on purpose.

Coverage shows the share of eligible SageMaker compute under the plan; the rest is the volatile training spend kept flexible deliberately.
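The coverage percentage in that output can be reproduced by hand, which is a useful sanity check when the report is rebuilt in a spreadsheet. The sketch below is a hypothetical helper (`coverage_pct` is not an AWS command) that assumes coverage is simply covered spend divided by covered plus On-Demand spend, using the two figures from the sample response above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: coverage = covered spend / (covered + On-Demand spend), as a percentage.
# $1 = SpendCoveredBySavingsPlans, $2 = OnDemandCost (both taken from the Coverage object).
coverage_pct() {
  local covered="$1" on_demand="$2"
  awk -v c="$covered" -v o="$on_demand" 'BEGIN { printf "%.1f\n", 100 * c / (c + o) }'
}

# Figures from the sample response: prints 54.4
coverage_pct 21640.10 18120.44
```

If the number you compute differs from CoveragePercentage, the usual cause is a filter or time period that doesn't match the report you are comparing against.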

SageMaker Savings Plans under the hood (deep dive)

A SageMaker Savings Plan applies as a billing-time discount, not a capacity reservation. Every hour, AWS takes your committed dollar rate and applies it to your eligible SageMaker usage in order of highest discount percentage first, so it greedily covers the usage that benefits most. That is why a single plan can span training jobs, real-time and batch inference endpoints, notebook and Studio instances, and processing jobs simultaneously. Eligible SageMaker usage above your commitment bills at On-Demand; when usage falls below your commitment, the unused portion is simply wasted (you pay for it regardless). Crucially, the eligible scope stops at SageMaker: a Compute Savings Plan or EC2 Instance Savings Plan never applies to ml.* usage, and a SageMaker SP never applies to ordinary EC2, Fargate, or Lambda.
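The hourly mechanics described above can be sketched as a small simulation. This is a simplified model, not AWS's billing engine: `apply_commitment` is a hypothetical helper, the usage lines and discount fractions (0.40, 0.30, 0.20) are made-up illustration values, and real billing works at the level of individual instance-hours and published Savings Plan rates. It only shows the shape of the rule: highest discount first, overflow at On-Demand, unused commitment still paid.

```shell
#!/usr/bin/env bash
# Hypothetical one-hour model of a Savings Plan being applied.
# Usage: apply_commitment COMMITMENT  < lines of "label on_demand_cost discount_fraction"
# The commitment is in discounted (Savings Plan rate) dollars per hour.
apply_commitment() {
  # Highest discount first, as the text describes.
  LC_ALL=C sort -k3,3nr | awk -v commit="$1" '
    BEGIN { remaining = commit }
    {
      od = $2                 # On-Demand cost of this usage for the hour
      sp = od * (1 - $3)      # the same usage priced at the plan rate
      if (sp <= remaining) {
        remaining -= sp       # fully covered by the commitment
      } else {
        if (sp > 0) uncovered += od * (1 - remaining / sp)  # overflow bills at On-Demand
        remaining = 0
      }
    }
    END {
      # The commitment is paid in full whether or not it was consumed.
      printf "on_demand=%.2f unused=%.2f bill=%.2f\n", uncovered, remaining, commit + uncovered
    }'
}

# Illustrative hour: $4.00/hour commitment against three kinds of usage.
apply_commitment 4.00 <<'EOF'
inference 3.00 0.40
training  5.00 0.30
notebook  1.00 0.20
EOF
```

In this made-up hour the commitment is fully consumed by inference and part of the training job, and the rest of the usage bills at On-Demand. Feeding the same function only the inference line shows the opposite case: part of the commitment goes unused but is still on the bill.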

Within SageMaker the plan is fully flexible: the discount follows across instance families and sizes (ml.c5, ml.m5, ml.g5, and so on), across regions, and across components. Move a model from CPU to GPU instances or shift a training job to another region and the discount re-applies automatically. The bursty shape of ML spend is the thing to reason about: training jobs spike the hourly rate for hours or days and then disappear, while inference endpoints hold a steady rate to serve production traffic. The purchase recommendation from ce get-savings-plans-purchase-recommendation is computed over a lookback window, so a heavy training month inflates the recommended hourly commitment above the durable inference floor. Commit to the floor, not to the inflated recommendation.
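One way to separate the durable floor from the training-inflated average is to look at a low percentile of hourly spend rather than the mean. The sketch below assumes you have already exported hourly SageMaker spend, one value per line, from whatever cost report you use; `hourly_floor` is a hypothetical helper, the nearest-rank percentile is one reasonable choice among several, and the sample values are invented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the steady floor of hourly spend.
# Usage: hourly_floor [PERCENTILE]  < one hourly spend value per line
# PERCENTILE is an integer (default 5); a low percentile ignores the occasional dip
# while staying well below training spikes.
hourly_floor() {
  LC_ALL=C sort -n | awk -v p="${1:-5}" '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # nearest-rank percentile: ceil(p/100 * N)
      if (rank < 1) rank = 1
      printf "floor=%.2f mean=%.2f\n", v[rank], sum / NR
    }'
}

# Invented sample: mostly ~$30/hour of inference with a few training spikes.
printf '%s\n' 30 30 31 30 55 70 30 31 30 48 | hourly_floor 10
```

The gap between the floor and the mean is the training burst that a lookback-based recommendation tends to absorb. Because a Savings Plan commitment is denominated in discounted dollars, the hourly figure to purchase is the floor multiplied by one minus the discount rate for the chosen term, not the On-Demand floor itself.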

Payment options trade cash flow for rate: No Upfront pays the hourly commitment monthly across the term; All Upfront pays the whole term in advance for the deepest discount (typically a couple of points better than No Upfront); Partial Upfront is the midpoint. The term itself is the bigger lever — a 3-year plan discounts meaningfully more than a 1-year, but commits you for three times as long, which is the wrong bet on ML platforms in active model and architecture flux. AWS surfaces all of these permutations through the recommendation API; you choose the term and payment option as request parameters with --savings-plans-type SAGEMAKER_SP, and AWS returns the matching recommended hourly commitment, estimated savings, and ROI.

# Compare the savings of a 3-year All Upfront SageMaker plan against a 1-year No Upfront plan.
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type SAGEMAKER_SP \
  --term-in-years THREE_YEARS \
  --payment-option ALL_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS \
  --query 'SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationSummary.{Commit:HourlyCommitmentToPurchase,Pct:EstimatedSavingsPercentage,ROI:EstimatedROI}'

# Watch utilisation after purchase — anything below ~95% means you sized to the training spikes, not the inference floor.
aws ce get-savings-plans-utilization \
  --time-period Start=2026-05-01,End=2026-05-26 \
  --granularity MONTHLY \
  --query 'SavingsPlansUtilizationsByTime[0].Utilization'

What is the impact of committing (or not) to a SageMaker Savings Plan?

The direct impact of buying is rate. A stable $22k/month inference baseline running at full On-Demand, moved under a well-sized SageMaker Savings Plan, commonly drops 25–40% on the covered portion — $6–9k/month off the bill for zero change in what's running. Over a 3-year term that's six figures of pure rate savings on ML compute the business was always going to serve. Not committing has the inverse impact: every hour the predictable inference floor runs at On-Demand is the most expensive way to buy the cheapest-to-discount thing the ML platform owns — and because Compute SPs don't reach SageMaker, this floor stays exposed even when the rest of the estate is well-covered.
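The arithmetic behind those figures is easy to check. The sketch below is a hypothetical helper (`savings_range`) that multiplies the covered monthly spend by the low and high discount from the paragraph above and extends it over a 36-month term; it assumes the baseline stays flat for the whole term, which is the same assumption the prose makes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: savings range on a covered baseline.
# $1 = monthly covered On-Demand spend, $2 = low discount, $3 = high discount, $4 = term in months
savings_range() {
  awk -v s="$1" -v lo="$2" -v hi="$3" -v m="$4" 'BEGIN {
    printf "monthly=%.0f-%.0f term=%.0f-%.0f\n", s * lo, s * hi, s * lo * m, s * hi * m
  }'
}

# $22k/month baseline at 25-40% off over three years:
# prints monthly=5500-8800 term=198000-316800
savings_range 22000 0.25 0.40 36
```

The monthly range lines up with the rounded $6–9k in the text, and the three-year total lands in six figures at both ends of the range.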

The second-order impact is over-commitment, sharper for ML than for general compute because the spend is bursty. A Savings Plan is take-or-pay: commit to $7/hour because a heavy training month made it look durable, then watch the experiments end and steady usage settle at $4/hour, and you still pay the full $7 — the $3 gap is waste, locked in for the whole term, with no resale market. This is why disciplined teams commit to the steady inference floor and leave the spiky training top on On-Demand; the extra coverage from chasing AWS's lookback-inflated recommendation is not worth stranding commitment when the training tide goes out.
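The take-or-pay example above works out as follows. This is a minimal sketch with a hypothetical helper (`stranded`); the 730-hour month is an approximation, and the $7 and $4 figures are the ones from the paragraph.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cost of committing above what you actually consume.
# $1 = committed $/hour, $2 = commitment actually used $/hour,
# $3 = hours in the period (default 730, roughly one month)
stranded() {
  awk -v c="$1" -v u="$2" -v h="${3:-730}" 'BEGIN {
    unused = (c > u) ? c - u : 0
    printf "utilisation=%.1f%% unused_per_hour=%.2f wasted=%.2f\n",
           100 * (c - unused) / c, unused, unused * h
  }'
}

# Commit $7/hour, consume $4/hour, over one month:
# prints utilisation=57.1% unused_per_hour=3.00 wasted=2190.00
stranded 7 4

# The same gap over a full year (8760 hours):
stranded 7 4 8760
```

A utilisation figure this far below the ~95% threshold is the signal that the commitment was sized to a training peak rather than the inference floor.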

There's a sequencing impact that bites hard: right-sizing must come before committing. If you commit while endpoints are oversized — or while intermittent-traffic endpoints are running 24/7 on provisioned instances instead of Serverless Inference, or as separate single-model endpoints instead of a multi-model endpoint — you lock in a discount on capacity you shouldn't be running. When you later right-size, your committed dollar-per-hour is now larger than your shrunken eligible usage, utilisation drops, and the commitment strands. The correct order is always right-size and re-architect first, then commit to the true, optimised inference floor.

Finally there's a term-and-flexibility impact. A 3-year All Upfront plan maximises the headline discount but minimises agility — it's the right bet only on durable, stable inference. An ML platform mid-evolution (swapping model architectures, moving CPU to GPU, changing endpoint topology) wants the SageMaker SP's automatic in-scope flexibility and a shorter 1-year term so the commitment can follow the models without stranding. Matching term to model durability is where most of the real money — and most of the avoidable waste — lives in an ML cost program.

How do you commit to SageMaker Savings Plans safely?

Buying an ML commitment well is a repeatable loop on the FinOps cadence: right-size and re-architect endpoints first, isolate the steady inference floor from the bursty training, commit below the lookback-inflated recommendation, then watch coverage and utilisation and layer up as the program proves out.

1. Right-size and re-architect endpoints before committing anything

Run inference at the right size, move intermittent-traffic endpoints to Serverless Inference, and consolidate multiple small models onto multi-model endpoints first. Committing to a SageMaker Savings Plan on an oversized or wrongly-shaped endpoint fleet locks in a discount on capacity you shouldn't be running, and the commitment strands the moment you later optimise. The order is non-negotiable: optimise usage, then optimise rate. A plan applied to a right-sized inference baseline is savings; one applied to an unoptimised fleet is a multi-year mistake.

2. Commit to the steady inference floor, not the training peaks

Separate the SageMaker line into its two natures: the steady inference baseline that serves production traffic, and the bursty training, batch, and processing spend that spikes and vanishes. Pull 30–60 days of usage and find the dollar-per-hour level the inference endpoints run essentially all the time — that floor, not the average and not the training-inflated peak, is what you commit to. Everything above it stays on On-Demand. AWS's recommendation is computed over a lookback window, so a heavy training month pushes it above the durable floor; treat it as a ceiling, not a target.

3. Choose term and payment for model durability and cash, not headline rate

Use 1-year No Upfront as the default for a young ML program or platforms where model architectures are still changing — it preserves flexibility and cash at a small rate premium. Reserve 3-year and All/Partial Upfront for inference workloads that have proven durable, where the deeper discount justifies locking cash and a three-year commitment. Remember there is no Compute SP or EC2 SP fallback for SageMaker — this is the only lever, so size it deliberately rather than over-buying to be safe.

4. Monitor coverage and utilisation on the SageMaker line, then layer up

After buying, watch two numbers scoped to SageMaker on the FinOps cadence: utilisation (must stay near 100% — below ~95% means you sized to the training spikes) and coverage (the share of eligible ML spend discounted, climbing toward 70–80%). Once a plan is proven at high utilisation, layer a second small plan onto the next slice of the inference floor as it grows. This incremental approach captures most of the savings while keeping over-commitment risk near zero — far safer than one big upfront bet sized to a peak training month.

# Pull the SageMaker recommendation, then check both metrics that govern the buy.
# 1) What would AWS recommend at the 1-year No Upfront level? (treat as a ceiling)
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type SAGEMAKER_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days THIRTY_DAYS \
  --query 'SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationSummary.HourlyCommitmentToPurchase'

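# 2) Where is coverage today on the SageMaker line (how much inference floor is still exposed)?
#    The SERVICE filter scopes the result to SageMaker; confirm the exact service name shown in your Cost Explorer.
aws ce get-savings-plans-coverage \
  --time-period Start=2026-05-01,End=2026-05-26 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon SageMaker"]}}' \
  --query 'SavingsPlansCoverages[].Coverage.CoveragePercentage'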

# 3) After buying, confirm you didn't size to the training spikes.
#    UtilizationPercentage sits under Total.Utilization in the response.
aws ce get-savings-plans-utilization \
  --time-period Start=2026-05-01,End=2026-05-26 \
  --granularity MONTHLY \
  --query 'Total.Utilization.UtilizationPercentage'
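The monitor-then-layer rule in step 4 can be written down as a small decision helper for the monthly review. This is a sketch, not a policy engine: `next_move` is a hypothetical function, and the thresholds (95% utilisation, 70% coverage) are the rules of thumb from the steps above, which you may want to tune.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the two SageMaker-scoped metrics into a next step.
# $1 = utilisation percentage, $2 = coverage percentage
next_move() {
  awk -v u="$1" -v c="$2" 'BEGIN {
    if (u < 95)      print "hold: utilisation below 95%, investigate stranded commitment"
    else if (c < 70) print "layer: add a small plan on the next slice of the inference floor"
    else             print "hold: coverage in target band, keep monitoring"
  }'
}

# Example: plan fully used, but just over half of eligible spend covered.
next_move 99 54.4
```

Checking utilisation before coverage is deliberate: adding commitment while an existing plan is under-used only deepens the stranded amount.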

Quick quiz

Question 1 of 5

Your SageMaker line runs $22k/month of steady real-time inference plus a bursty $18k/month of training. Your company already has a Compute Savings Plan covering EC2. What's the right commitment move?

Keep learning

Dig deeper into SageMaker Savings Plans mechanics, ML inference cost optimisation, and how rate optimisation fits the FinOps lifecycle.

You've completed Purchase SageMaker Savings Plans. You now know that a SageMaker SP is the only commitment lever that covers ML compute, what it commits you to across training, inference, notebooks, and processing, the commit-to-the-inference-floor strategy that captures the discount while leaving bursty training flexible, and why right-sizing endpoints — including moving intermittent ones to Serverless Inference — must come before committing. The next time the dashboard flags zero SageMaker coverage on a steady inference baseline, you'll have a defensible path from recommendation to a disciplined, well-utilised commitment.

SageMaker Savings Plans: what it means for the bill

A separate commitment lever for a separate ML compute meter

A SageMaker Savings Plan is a contract: you agree to spend a set dollar amount on SageMaker ML compute every hour for one or three years, and AWS charges you a discounted rate — up to about 64% below On-Demand — for everything that commitment covers. The commitment is denominated in dollars per hour, not in servers, which is the important part: you are buying a discount on a level of spend, not a specific machine. Whatever SageMaker compute the data science team runs, up to that hourly dollar amount, is billed at the lower rate automatically.

The detail that catches finance teams out is that ML compute is its own line. The general Compute Savings Plan that already covers your EC2, Fargate, and Lambda spend does not touch SageMaker; those ml.* hours bill separately and need their own SageMaker Savings Plan. So an organisation can show healthy commitment coverage on the main invoice while its entire ML platform runs at full On-Demand price, invisible until someone reads the SageMaker line specifically. On a platform spending $40k/month on SageMaker with a stable $22k/month inference baseline, a well-sized commitment commonly takes 25–40% off the covered portion, roughly $6–9k/month, for no change in what's running.

The risk is the mirror image, and it's sharper here than for general compute because ML spend is bursty. Training runs spike — a single experiment can multiply the hourly rate for days, then vanish. Inference endpoints, by contrast, run steady to serve production traffic. The mistake is sizing the commitment to total SageMaker spend including the training peaks; you then over-commit, and when the experiments end you pay for the full hourly amount you can't consume — pure waste, locked in for the term, with no resale market. The two numbers to watch are coverage (how much of eligible SageMaker spend the plan discounts) and utilisation (how much of the commitment you actually used). Healthy programs commit to the steady inference floor and leave the spiky training top on On-Demand.

This lesson is for the finance partner who sees a SageMaker line on the invoice running at full price and assumes the company's existing Compute Savings Plan already covers it. It explains why ML compute is a separate commitment, how coverage and utilisation behave on the SageMaker line specifically, why the bursty nature of training makes over-commitment the headline risk, and how the 1-year versus 3-year and upfront-payment choices trade discount against cash. By the end you'll know what 'good' coverage and utilisation look like on the ML line, and why endpoint right-sizing has to happen first.

How a finance partner frames the commitment

Priya is the finance partner for the ML platform org. At the monthly cost review she sees SageMaker running at $40k/month with Savings Plan coverage at 0% — and she catches the assumption on the table that the company's general Compute Savings Plan already covers it. It doesn't; ML compute is a separate meter and a separate commitment. She asks the question that's now standard on the agenda: "What's our steady inference baseline on SageMaker, and why isn't it under its own commitment?" The engineering lead confirms roughly $22k/month of inference has been flat for half a year, with the rest being bursty training. That's the discountable floor.

The conversation is about sizing and risk, not syntax. Priya asks three things: are the endpoints right-sized first — and could any move to Serverless Inference — so we're not committing to the wrong shape; do we commit for one year or three; and do we pay upfront. They settle on starting with a 1-year No Upfront plan sized to the inference floor — not to AWS's recommendation, which was inflated by a heavy training month in the lookback window. The deeper 3-year and partial-upfront discounts can come once they trust the baseline.

She adds two recurring lines to the finance pack, scoped specifically to SageMaker: coverage (target climbing toward 70–80% of eligible ML spend) and utilisation (must stay near 100%). The bursty nature of training makes utilisation the one to watch hardest — if it dips below ~95%, that's the signal they sized to the training peaks instead of the inference floor, and it becomes the conversation, not the headline discount rate. A few months later coverage is at 70% and utilisation is 99%; Priya knows that's a healthy program, and that the remaining On-Demand is the volatile training top they intentionally left uncommitted.

Why this matters to the budget, not just the bill

The headline budget impact is a step-change reduction in unit cost on a large, predictable slice of the ML line. As the ML platform grows, SageMaker becomes one of the bigger single categories on the cloud invoice, and the inference baseline portion is the most forecastable spend within it. Moving it under its own commitment converts variable full-price spend into contracted discounted spend — typically 25–40% lower on the covered amount — which both reduces the bill and tightens the forecast, because committed spend is far more predictable than On-Demand.

The first thing to get right is recognising that this is a separate lever. The company's general Compute Savings Plan does not cover SageMaker; a healthy coverage number on the main invoice can sit alongside a SageMaker line running entirely at full price. Finance should ask for coverage and utilisation scoped specifically to SageMaker, not assume the headline commitment metrics include it. Otherwise the single largest no-risk discount on the ML platform stays invisible.

The metric to govern is the pair of coverage and utilisation on the ML line. Coverage is the share of eligible SageMaker compute the plan is discounting — a maturing program climbs toward 70–80% and deliberately leaves the bursty training top uncovered. Utilisation is how much of the commitment you actually used — this must stay near 100%, and because training is spiky, utilisation is the metric that exposes the classic ML mistake: sizing the commitment to a heavy training month and then carrying it when the experiments end. If utilisation drops below ~95%, you have stranded commitment, and there is no resale market to unwind it.

The most important budget discipline is sequencing: insist that engineering right-sizes endpoints — and moves intermittent ones to Serverless Inference or consolidates them into multi-model endpoints — before committing. Committing to an oversized or wrongly-architected endpoint fleet locks in a discount on waste and creates stranded commitment the moment the platform is later corrected. The order is always right-size, then commit, and the commitment is always sized to the steady inference floor, never the training peaks.

What finance can actually do about this

Finance rarely buys the plans itself, but it owns the framing that makes an ML commitment a saving instead of a liability. Four levers, used together on the monthly cadence.

1. Ask for SageMaker coverage and utilisation specifically

Don't accept the headline commitment metrics as covering ML — they don't. The general Compute Savings Plan never touches SageMaker, so a healthy main-invoice coverage number can hide an ML platform running entirely at full price. Put SageMaker-scoped coverage and utilisation on the monthly cost pack as their own lines so the no-risk discount on the inference floor can't stay invisible.

2. Make endpoint right-sizing a precondition for any commitment

Set the rule that no SageMaker Savings Plan gets purchased until the inference fleet has been right-sized and intermittent endpoints moved to Serverless Inference or consolidated onto multi-model endpoints. This sequencing rule prevents the most expensive mistake — locking in a multi-year discount on capacity that shouldn't be running that way, then stranding it when the platform is later corrected.

3. Size to the inference floor, never the training month

Because training spend is bursty, AWS's lookback-based recommendation is often inflated by a heavy experimentation month. Treat it as a ceiling. Sanction commitments sized to the steady inference baseline only, and watch utilisation as the early warning: if it slips below ~95%, the commitment was sized to peaks that didn't recur, and because there's no resale market it can't be unwound until the term ends, so it must be caught early.

4. Commit incrementally and layer up

Don't sanction one large commitment sized to AWS's maximum recommendation. Sponsor a layered approach: commit to the proven inference floor, prove high utilisation, then add the next slice as the baseline grows. This keeps over-commitment risk near zero while capturing the bulk of the savings, and it turns ML commitment buying into a routine, low-drama line on the cadence rather than an annual high-stakes bet against next quarter's experiment volume.

Quick quiz

Question 1 of 5

After buying a SageMaker Savings Plan, the report shows coverage at 70% but utilisation has fallen to 86% and is still dropping. What's the right read and move?

You've finished the finance partner's view of SageMaker Savings Plans. You know why ML compute is a separate commitment the general Compute SP doesn't cover, how the discount converts variable On-Demand inference into a lower contracted rate, why coverage and utilisation on the SageMaker line are the two metrics to govern, why the bursty nature of training makes over-commitment the headline risk, and why endpoint right-sizing has to come first. Next time the ML line shows up at the monthly review, you'll have a sharper question than "isn't that already covered?"

SageMaker Savings Plans: the headline

A standing discount on the ML serving capacity you run every hour

Production machine learning runs a predictable baseline — inference endpoints serving traffic continuously — that isn't going away. Buying that baseline at full On-Demand price is like renting your data centre by the night. A SageMaker Savings Plan is a one- or three-year commitment that takes up to roughly two-thirds off the rate on that baseline, and it's the only commitment lever that touches SageMaker at all — the general compute commitment the company already has does not cover it.

The decision is financial with an engineering precondition: commit to the steady inference floor, keep the bursty training spend on flexible pricing, and right-size the endpoints before you commit so you don't lock in the wrong shape. The single question to ask is about coverage and utilisation on the SageMaker spend specifically — how much of it is under commitment, and how fully the commitment is used. A mature program runs both high without over-committing to chase the training peaks.

A short read for the exec who wants the headline and the one question to ask. You'll get the rule-of-thumb — commit to the steady inference floor, stay flexible on the bursty training top — why SageMaker needs its own commitment even when general compute is already covered, and what 'good' looks like at the portfolio level. No commands, no internals: just the shape of the decision and the trap to avoid.

What it looks like when the org gets this right

At one company the quarterly cloud review used to show SageMaker as a single large On-Demand number, with everyone assuming the general compute commitment covered it. It didn't — ML compute is its own meter. The exec sponsor stopped asking "can we cut the ML bill?" and started asking two questions: "How much of our steady inference is under its own SageMaker commitment, and is that commitment fully used?"

Within two quarters the picture changed. Engineering right-sized the endpoints first — moving intermittent ones to Serverless Inference — then layered a SageMaker Savings Plan onto the stable inference baseline, starting with a one-year term to stay nimble while models were still changing. Coverage on the SageMaker line climbed from zero toward 75%; utilisation held at 99%; and the bursty training spend stayed on flexible pricing on purpose.

That's the right outcome state. The goal was never "commit to all of SageMaker" — over-committing against training spikes is its own waste — it was "commit to the steady inference floor, stay flexible on the training top, and right-size endpoints before you sign." The ML cost line stopped being a place to find cuts and became a confidence signal that the program was being run with discipline.

Why this is on the report at all

SageMaker commitment coverage and utilisation are among the clearest signals of whether the ML cost program is being run with discipline — and they're easy to miss because SageMaker needs its own commitment that the general compute plan doesn't provide. Low coverage on a large, stable inference base means the organisation is leaving an obvious, no-risk discount on the table on a fast-growing category. High coverage with high utilisation means the program is capturing the rate savings without over-reaching into the bursty training spend. The trend on these two numbers tells you more about ML cost maturity than the absolute bill does.

The risk to watch is over-commitment dressed up as savings, which is sharper for ML because training spend is spiky. An aggressive commitment sized to a heavy experimentation month can quietly become a multi-year stranded cost when the training tide recedes — and there's no resale. The mature posture — commit to the steady inference floor, right-size and re-architect endpoints before committing, keep the bursty training top flexible, and prefer shorter terms while models are still changing — is what separates a savings program from a liability. The one question to ask is whether SageMaker coverage and utilisation are both high and the inference floor was right-sized first.

The leadership move on this category

The executive handle isn't to approve a big commitment to chase a headline discount — it's to set the operating norms that make ML commitment buying disciplined and durable.

1. Make sure SageMaker has its own commitment story

Confirm the ML platform is being managed as its own commitment, not assumed to be covered by the general compute plan — it isn't. Ask for coverage and utilisation on the SageMaker line specifically. A fast-growing ML spend running entirely On-Demand is the single most common blind spot in an otherwise mature cost program.

2. Insist endpoints are optimised before rate is committed

Right-size and re-architect before you commit. Make it an explicit gate: no multi-year commitment on an inference fleet that hasn't been optimised — including moving intermittent endpoints to serverless. This prevents locking a discount onto capacity that should have been reshaped, then carrying the stranded commitment for years.

3. Commit to the steady floor, stay flexible on the bursty top

Treat coverage and utilisation as the health metric, not the discount rate. High coverage of the steady inference baseline with utilisation near 100% is what good looks like; aggressive coverage sized to a heavy training month with falling utilisation is over-commitment masquerading as savings. Favour shorter terms while models are still evolving so commitments follow the platform instead of stranding.

Quick quiz

Question 1 of 5

You're reviewing the cloud cost pack. SageMaker has its own Savings Plan, coverage on the steady inference baseline is climbing toward 75%, utilisation is steady at 99%, and the team right-sized endpoints before committing. What's the right read?

That's the lesson. Two takeaways worth holding onto: SageMaker needs its own Savings Plan because the general compute commitment doesn't cover it, and the discipline is to commit to the steady inference floor while staying flexible on the bursty training top — after right-sizing the endpoints, never before. The leadership question is about SageMaker coverage and utilisation, not the headline discount rate.

Part of the learning path Lock in your commitments
