Long-stopped EC2 instances: the basics
Why "stopped" doesn't mean "safe"
When you stop an EC2 instance, AWS shuts down the OS, releases the underlying host capacity, and stops charging you for compute. The instance record stays in your account along with its attached EBS volumes, network interfaces, tags, and IAM instance profile — frozen in time at the moment it stopped. Restarting the instance brings it back exactly as it was: same kernel, same packages, same secrets, same AMI baseline.
That "exactly as it was" is the problem. An instance stopped for six months has missed six months of security patches. Its baked-in IAM credentials, SSM agent version, OS package signatures, and TLS root store are all six months stale. The AMI it was launched from may have been deregistered or marked as deprecated by the vendor. If something in your fleet starts it again, whether for a one-off test or because a stuck pipeline reruns, you've just put an unpatched, unmonitored host on your network.
AWS Security Hub flags this pattern under control EC2.4 — "Stopped EC2 instances should be removed after a specified time period." The default threshold is 30 days; the rationale is hygiene as much as cost. After 30 days the instance has either been deliberately preserved (rare, usually a one-off forensic snapshot) or quietly abandoned (the common case). Either way it deserves a decision, not an indefinite limbo.
In this lesson you'll learn how to identify long-stopped EC2 instances, how to tell abandoned from intentionally-preserved, and how to safely retire them without losing data anyone still cares about. You'll see the AWS CLI investigation pattern, a decision matrix for what to do per instance, and a snapshot-then-terminate flow that leaves you with recoverable artifacts if you ever need them back.
The 4am restart that wasn't
In one well-known retrospective, a SaaS company traced a 4am outage to an instance that had been stopped for 14 months. A misconfigured Auto Scaling group, after a cooldown timer fired, picked the stopped instance to "replace" a healthy one and started it up. It came back running an OS three major versions behind, pulled bad config from a deprecated S3 bucket, and answered traffic for 11 minutes before health checks killed it. The fix was one line — but the lesson was: stopped is not deleted, and AWS will happily turn it back on for you.
Cleaning up long-stopped instances in action
Marco runs platform operations at a fintech. A quarterly compliance scan returns 47 EC2.4 findings across three accounts — 47 instances stopped for more than 30 days, the oldest sitting at 412 days. Severity is MEDIUM but the auditor flagged it as a recurring item from the last review, which moves it up his queue fast.
He doesn't just bulk-terminate. Some of these are deliberate — a security forensics box held for a pending investigation, a snapshot-source instance used as a base AMI builder. Some are clearly abandoned — old developer sandboxes, half-finished proof-of-concept work whose owner left the company a year ago. The difference is in the tags and the StateTransitionReason.
He starts by listing every stopped instance with the dates that matter.
First, list every stopped instance in the region with launch time, state-transition reason, and the user-applied Owner tag (if any).
Stopped instances across the region — age and ownership at a glance.
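A minimal sketch of that listing, wrapped in a shell function so it can be reused. The function name list_stopped_instances is hypothetical, and the Owner tag key is taken from the description above; adjust it if your tagging scheme uses a different key.

```shell
#!/usr/bin/env bash
# List stopped instances in the current region as tab-separated rows:
# InstanceId, LaunchTime, StateTransitionReason, Owner tag value.
# The JMESPath filter picks the first tag whose key is exactly "Owner".
list_stopped_instances() {
  aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=stopped \
    --query "Reservations[].Instances[].[InstanceId, LaunchTime, StateTransitionReason, Tags[?Key=='Owner'] | [0].Value]" \
    --output text
}
```

Calling list_stopped_instances prints one row per instance. With text output, an instance that has no Owner tag shows None in the last column, which makes unowned instances easy to spot.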
For each candidate to retire, snapshot every attached EBS volume first so the data is recoverable. This is the safety net that lets you terminate without losing sleep.
Atomic snapshot of every attached volume before terminating.
How EC2.4 detects long-stopped instances (deep dive)
Security Hub control EC2.4 is backed by the AWS Config managed rule ec2-stopped-instance. The rule evaluates every EC2 instance in your account on a configurable cadence (default: every 24 hours) and compares the time since the most recent state-transition into stopped against the AllowedDays parameter. The default is 30 days; you can dial it up or down per-rule and per-account to match your operational reality.
The data point the rule reads is StateTransitionReason, which EC2 records on the instance at the moment it changes state and returns in the DescribeInstances response. The format is fixed: User initiated (YYYY-MM-DD HH:MM:SS GMT) for human-triggered stops, plus distinct strings for ASG-initiated, spot-interruption, and host-failure stops. The rule parses that timestamp and does the math; there's no separate "stopped at" field, which is why scripts often read StateTransitionReason directly.
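The timestamp parsing described above can be sketched in a few lines of bash. This is an illustration, not how the managed rule is implemented: the function names stop_timestamp and days_stopped are hypothetical, and the date arithmetic assumes GNU date (the -d option), so it will not work unchanged with BSD/macOS date.

```shell
#!/usr/bin/env bash
# Extract "YYYY-MM-DD HH:MM:SS" from a StateTransitionReason such as
# "User initiated (2024-01-15 10:30:00 GMT)". Prints nothing if absent.
stop_timestamp() {
  local reason="$1"
  local re='\(([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) GMT\)'
  if [[ "$reason" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Print whole days elapsed since the stop. The optional second argument is
# "now" as epoch seconds (defaults to the current time).
# Returns non-zero when the reason string carries no timestamp.
days_stopped() {
  local reason="$1" now_epoch="${2:-$(date -u +%s)}" ts stop_epoch
  ts="$(stop_timestamp "$reason")"
  [[ -n "$ts" ]] || return 1
  stop_epoch="$(date -u -d "$ts" +%s)" || return 1   # GNU date assumed
  echo $(( (now_epoch - stop_epoch) / 86400 ))
}
```

For example, days_stopped "User initiated (2024-01-15 10:30:00 GMT)" prints the number of whole days since that stop, which you can compare against your AllowedDays threshold.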
Crucially, the rule fires on the instance only. It doesn't surface the attached EBS volumes, AMIs created from the instance, or the snapshots that back those AMIs. That's why a complete remediation means walking the dependency graph yourself: snapshot the volumes, deregister any AMIs created from the instance, then terminate it. Deregistering an AMI does not delete its backing snapshots, so if you skip the AMI step now, a later deregistration leaves orphaned snapshots with no record of where they came from and nothing left to relaunch from.
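One way to start walking that dependency graph is to list the volumes attached to an instance and flag the ones that would outlive it. The sketch below uses hypothetical function names and assumes AWS CLI text output, where booleans print as True or False.

```shell
#!/usr/bin/env bash
# Print one row per attached volume: DeviceName, VolumeId, DeleteOnTermination.
describe_instance_volumes() {
  local instance_id="$1"
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query "Reservations[].Instances[].BlockDeviceMappings[].[DeviceName, Ebs.VolumeId, Ebs.DeleteOnTermination]" \
    --output text
}

# Read those rows on stdin and print the IDs of volumes that will remain
# (and keep billing) after the instance is terminated.
volumes_kept_after_terminate() {
  awk 'tolower($3) == "false" { print $2 }'
}
```

Piping the two together, as in describe_instance_volumes "$INSTANCE" | volumes_kept_after_terminate, gives the list of volumes you would need to delete separately after termination.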
# The raw data behind ec2-stopped-instance: list every stopped instance with its
# StateTransitionReason, which carries the stop timestamp.
aws ec2 describe-instances \
  --filters Name=instance-state-name,Values=stopped \
  --query "Reservations[].Instances[].[InstanceId, StateTransitionReason]" \
  --output text
What is the impact of leaving instances stopped indefinitely?
The first impact is security hygiene. A stopped instance is a frozen attack surface: missing patches accumulate, baked-in IAM credentials and SSH keys may have been rotated organisation-wide while this instance kept its originals, and the AMI it was launched from may now be flagged as deprecated. Bring it back online and you've just attached an unmonitored, out-of-policy host to the VPC.
The second impact is cost, covered in detail in the related lesson on Stopped EC2 Instances with EBS, but worth restating: stopping the compute doesn't stop the EBS billing. A 250 GB gp3 volume sitting on a stopped instance costs roughly $20/month indefinitely, or about $240 a year. Multiply that by the dozens of stopped instances most accounts accumulate and you have a line item of thousands to tens of thousands of dollars a year with zero workload behind it.
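A back-of-envelope check of that storage math, as a sketch. It assumes a gp3 price of 8 cents per GB-month, which matches the roughly $20/month figure for 250 GB; actual pricing varies by region, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Monthly storage cost in whole cents (integers avoid floating point in bash).
# Assumption: 8 cents per GB-month for gp3; check pricing for your region.
ebs_monthly_cents() {
  local size_gb="$1" cents_per_gb="${2:-8}"
  echo $(( size_gb * cents_per_gb ))
}

# Annual cost in whole dollars for a number of identical volumes.
ebs_annual_dollars() {
  local volumes="$1" size_gb="$2" cents_per_gb="${3:-8}"
  echo $(( volumes * size_gb * cents_per_gb * 12 / 100 ))
}
```

Under these assumptions, ebs_monthly_cents 250 prints 2000 (that is, $20), and ebs_annual_dollars 47 250 prints 11280, so 47 such volumes would cost about $11,280 a year.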
The third impact is compliance and audit posture. EC2.4 is a recurring finding type — auditors check it on every visit. Open findings that have been outstanding for multiple quarters become evidence of weak operational hygiene, which raises questions about every other control in scope. "Why haven't you remediated this?" is a much harder conversation than "Here's our automated retire-after-30-days policy."
The fourth impact is operational risk: ASGs and orchestration tooling can sometimes restart stopped instances by mistake (a misconfigured replacement strategy, a stale launch-template reference, a manual recovery script). When that happens, the unpatched ghost rejoins the network and starts answering traffic before anyone realises what's running.
How do you safely retire long-stopped instances?
Retiring a stopped instance is a four-step loop. The order matters — you want a recoverable artifact before anything destructive, and a prevention rule in place so the same drift doesn't refill the queue next quarter.
1. Inventory and classify by intent
Pull every stopped instance with its LaunchTime, StateTransitionReason, and tags, then apply a decision matrix:
- Recently stopped (<30 days) by an automation: leave it alone.
- Stopped months ago with an owner tag: ping the owner once, with a tag-and-warn label and an ExpiresAt date.
- Stopped months ago with no owner, or an owner who has left: schedule it for retirement.
- Long-running before the stop and still relevant to the business: schedule a patched re-launch from a current AMI rather than restarting the stale one.
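The decision matrix can be sketched as a small pure function. This is a simplification under stated assumptions: the function name, the owner status values (active, departed, none), the still_needed flag, and the output labels are all hypothetical, and anything under the threshold is left alone regardless of who stopped it.

```shell
#!/usr/bin/env bash
# Classify a stopped instance.
#   $1 days stopped (integer)
#   $2 owner status: active | departed | none   (default: none)
#   $3 still needed by the business: yes | no   (default: no)
# The threshold defaults to 30 days and can be overridden via ALLOWED_DAYS.
classify_stopped() {
  local days="$1" owner="${2:-none}" still_needed="${3:-no}"
  local threshold="${ALLOWED_DAYS:-30}"
  if (( days < threshold )); then
    echo "leave"                       # recently stopped: no action yet
  elif [[ "$still_needed" == "yes" ]]; then
    echo "relaunch-from-current-ami"   # rebuild patched, do not restart stale
  elif [[ "$owner" == "active" ]]; then
    echo "warn-owner"                  # tag-and-warn with an ExpiresAt date
  else
    echo "retire"                      # no owner, or the owner has left
  fi
}
```

For example, classify_stopped 412 departed prints retire, while classify_stopped 200 active prints warn-owner.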
2. Snapshot every attached EBS volume
Use aws ec2 create-snapshots --instance-specification to atomically snapshot the boot and data volumes in one call. Tag the snapshots with Purpose=ec2.4-cleanup and SourceInstance=<id> so you can find them later if someone needs the data. Set a snapshot lifecycle policy that retains them for 90 days — long enough to recover from a wrong call, short enough that the snapshot bill doesn't replace the volume bill.
3. Deregister dependent AMIs, then terminate
If the instance has been used as an AMI source, deregister those AMIs first (aws ec2 deregister-image) so you control the orphan-snapshot moment instead of discovering it later. Then call aws ec2 terminate-instances for the instance itself. Terminating deletes each attached EBS volume whose DeleteOnTermination flag is true (volumes with the flag set to false stay behind and keep billing), and the EC2.4 finding closes on the next AWS Config evaluation.
4. Prevent recurrence with AWS Config and tagging policy
Enable the AWS Config managed rule ec2-stopped-instance with AllowedDays set to your real threshold (30 is the default; many teams use 14 in non-prod). Add a tagging policy: every instance must carry Lifecycle=temporary|permanent and, when temporary, ExpiresAt=YYYY-MM-DD. A scheduled EventBridge rule can then invoke a Lambda that reads the tag and runs the snapshot-and-terminate flow on the expiry date, so drift never accumulates past the explicit intent.
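The prevention step can be sketched with two small helpers: one that builds the rule definition for aws configservice put-config-rule, and one that decides whether an ExpiresAt date has passed. The function names are hypothetical, the 24-hour evaluation frequency is an assumed choice, and the expiry check only illustrates the comparison a cleanup job would make; it is not the Lambda itself.

```shell
#!/usr/bin/env bash
# Emit a ConfigRule definition for the managed rule ec2-stopped-instance.
#   $1 AllowedDays (default 30)
stopped_instance_rule_json() {
  local allowed_days="${1:-30}"
  cat <<EOF
{
  "ConfigRuleName": "ec2-stopped-instance",
  "Source": { "Owner": "AWS", "SourceIdentifier": "EC2_STOPPED_INSTANCE" },
  "InputParameters": "{\"AllowedDays\":\"${allowed_days}\"}",
  "MaximumExecutionFrequency": "TwentyFour_Hours"
}
EOF
}

# Succeed (exit 0) when the ExpiresAt date is today or earlier.
#   $1 ExpiresAt as YYYY-MM-DD
#   $2 today as YYYY-MM-DD (defaults to the current UTC date)
# ISO dates compare correctly as plain strings.
is_expired() {
  local expires_at="$1" today="${2:-$(date -u +%F)}"
  [[ ! "$today" < "$expires_at" ]]
}
```

To apply the rule with a 14-day threshold, you could run aws configservice put-config-rule --config-rule "$(stopped_instance_rule_json 14)". A scheduled job could call is_expired with each instance's ExpiresAt tag value and hand the expired ones to the retire flow.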
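# Snapshot, deregister, terminate: the safe retire flow for one instance.
# Stop at the first failure so a failed snapshot never leads to a terminate.
set -euo pipefail
INSTANCE=i-0a1b2c3d4e5f60002
# Snapshot every attached volume (boot volume included) in one call.
SNAPSHOT_IDS=$(aws ec2 create-snapshots \
  --instance-specification "InstanceId=$INSTANCE,ExcludeBootVolume=false" \
  --description "EC2.4 cleanup pre-terminate $INSTANCE" \
  --tag-specifications "ResourceType=snapshot,Tags=[{Key=Purpose,Value=ec2.4-cleanup},{Key=SourceInstance,Value=$INSTANCE}]" \
  --query 'Snapshots[].SnapshotId' --output text)
# Wait for the snapshots to complete before doing anything destructive.
# SNAPSHOT_IDS is left unquoted on purpose so each ID becomes its own argument.
aws ec2 wait snapshot-completed --snapshot-ids $SNAPSHOT_IDS
# Find AMIs tagged as derived from this instance and deregister them.
# This assumes your AMIs carry a SourceInstance tag; untagged AMIs will not match.
aws ec2 describe-images --owners self \
  --filters "Name=tag:SourceInstance,Values=$INSTANCE" \
  --query 'Images[].ImageId' --output text | \
  xargs -n1 -r aws ec2 deregister-image --image-id
aws ec2 terminate-instances --instance-ids "$INSTANCE"
Quick quiz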
Question 1 of 5: You have 47 EC2 instances flagged by EC2.4 (stopped >30 days). You've classified them and identified 30 that are clearly abandoned. What's the right next step before terminating?
Keep learning
Dig deeper into EC2 lifecycle hygiene and the AWS tooling around it.
- AWS Security Hub control EC2.4: The exact rule definition, severity, and remediation guidance from AWS.
- AWS Config managed rule ec2-stopped-instance: Continuous detection with a tunable AllowedDays parameter.
- AWS EC2 instance lifecycle: Reference for stop, hibernate, terminate semantics and what they preserve.
- AWS Data Lifecycle Manager (snapshot policies): Schedule and retain snapshots automatically so terminated workloads stay recoverable.
You've completed Remove long-stopped EC2 instances. You can now identify stopped instances by age and intent, snapshot their data for recoverability, deregister dependent AMIs cleanly, terminate without orphaning resources, and prevent the same drift from refilling your queue next quarter. The next time EC2.4 fires on 47 instances, you'll have a four-step loop ready to run.