Monitoring

Fix CloudWatch alarms with no actions

An alarm that fires but notifies nobody is a logged silence. Wire actions to every alarm before something burns quietly.

12 min·10 sections·AWS

Last reviewed 27 May 2026

Alarms without actions: the basics

What does it mean for a CloudWatch alarm to have no actions?

A CloudWatch alarm watches a metric, evaluates it against a threshold over a window of datapoints, and transitions between three states: OK, ALARM, and INSUFFICIENT_DATA. The transition itself does not page anyone or trigger anything — actions do. Each alarm has three independent action lists: AlarmActions (fired when entering ALARM), OKActions (when recovering to OK), and InsufficientDataActions (when metrics stop flowing). If those lists are empty, the alarm changes state in the AWS console and the alarm history, and nothing else happens.
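The alarm history mentioned above can be read from the CLI. The following is a minimal sketch, assuming configured AWS CLI credentials; the function name `alarm_transitions` is hypothetical and the alarm name is whatever you pass in.

```shell
# Hypothetical helper (not an AWS command): print recent state transitions
# for one alarm. StateUpdate history items record changes between
# OK, ALARM, and INSUFFICIENT_DATA.
alarm_transitions() {
  aws cloudwatch describe-alarm-history \
    --alarm-name "$1" \
    --history-item-type StateUpdate \
    --max-items 20 \
    --query 'AlarmHistoryItems[].[Timestamp,HistorySummary]' \
    --output text
}
```

Calling `alarm_transitions SainsburysLoadbasedInstanceRunning` would list the timestamps and summaries of that alarm's recent transitions. If transitions appear there and nobody remembers being notified, the action lists are the first place to look.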

"Bad" looks like an alarm with AlarmActions: []. It might have a polished name, a sensible threshold, and a useful metric — but when the workload finally misbehaves, the alarm enters ALARM, sits in the console, and waits for someone to randomly notice. There is no SNS publish, no Lambda invocation, no autoscaling step, no SSM Incident Manager response plan. The transition is logged; the human is not.

AWS doesn't flag this by default. The gap surfaces only when something goes looking for it: an AWS Config custom rule, a Security Hub finding fed by such a rule, a third-party audit tool, or any internal compliance check worth its salt. The pattern is one of the most common monitoring failures because creating an alarm without actions takes one fewer flag than creating one with actions, and the alarm still looks correct in the dashboard until the day you need it.

In this lesson you'll learn why CloudWatch alarms so often end up actionless, how the alarm action model actually works (three independent lists, multiple action types), how to audit your fleet for the pattern in a single CLI call, and how to wire up minimum-viable alerting that scales from a Slack channel to PagerDuty. You'll see a real audit query, a bulk remediation flow, and the prevention pattern that stops actionless alarms from coming back.

Fun fact

The Knight Capital lesson nobody learns from

In 2012 Knight Capital lost roughly $440 million in about 45 minutes because a deployment left old code running on one of eight servers. The SEC's review found that, before the market opened, an internal system had sent 97 automated emails flagging an error on the affected system. The emails were not designed as alerts, and nobody acted on them in real time. The signal existed. The action, a message to a mailbox, reached nobody who would respond. Alarms without actions are the most visible version of this same failure mode: the system is shouting, and there's nobody on the other end of the line.

Wiring up an actionless alarm in action

Marco is finishing a quarterly compliance audit for a retail FinOps customer when finding ALM-003 flags 14 production alarms. The most obvious offender: an alarm named SainsburysLoadbasedInstanceRunning, watching CPU on a load-based EC2 fleet. Severity HIGH. It is in ALARM right now, and every time it has entered ALARM in the last 90 days, no human has been notified.

He starts by confirming the finding: pull the alarm config and check the AlarmActions list. If it's empty, the audit tool is right; if it has stale ARNs, that's a different kind of broken (also worth knowing about, but a separate fix).
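The empty-versus-stale distinction can be checked mechanically. The sketch below is one possible approach, assuming the CLI's default region matches the topics' region; `check_alarm_actions` is a hypothetical name, and only SNS targets are verified.

```shell
# Hypothetical helper: classify an alarm's AlarmActions as empty, live, or stale.
# Only SNS topic ARNs are verified; other action types are listed as UNVERIFIED.
# A failed get-topic-attributes call can also mean missing permissions or a
# topic in another region, so treat STALE as "needs a look", not as proof.
check_alarm_actions() {
  alarm="$1"
  actions=$(aws cloudwatch describe-alarms --alarm-names "$alarm" \
    --query 'MetricAlarms[0].AlarmActions[]' --output text)
  if [ -z "$actions" ] || [ "$actions" = "None" ]; then
    echo "EMPTY $alarm"
    return 0
  fi
  for arn in $actions; do
    case "$arn" in
      arn:aws:sns:*)
        if aws sns get-topic-attributes --topic-arn "$arn" >/dev/null 2>&1; then
          echo "OK $arn"
        else
          echo "STALE $arn"
        fi ;;
      *) echo "UNVERIFIED $arn" ;;
    esac
  done
}
```

An `EMPTY` line confirms the audit finding; a `STALE` line points at the separate fix of replacing a dead ARN.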

Once the actionless state is confirmed, he wires the alarm to the team's existing SNS topic — an inexpensive, idempotent change that flips the alarm from "a logged silence" to "something that pages the on-call rotation."

First, audit the whole account for alarms with no AlarmActions. The JMESPath filter does the work — a single output table makes the scope of the problem obvious.

$ aws cloudwatch describe-alarms --query 'MetricAlarms[?length(AlarmActions)==`0`].[AlarmName,StateValue,MetricName]' --output table
┌────────────────────────────────────┬───────┬───────────────────┐
│ AlarmName                          │ State │ MetricName        │
├────────────────────────────────────┼───────┼───────────────────┤
│ SainsburysLoadbasedInstanceRunning │ ALARM │ CPUUtilization    │
│ rds-prod-checkout-FreeStorageSpace │ OK    │ FreeStorageSpace  │
│ alb-edge-5xx-rate                  │ OK    │ HTTPCode_ELB_5XX  │
│ lambda-checkout-errors             │ ALARM │ Errors            │
│ ecs-orders-MemoryUtilization       │ OK    │ MemoryUtilization │
└────────────────────────────────────┴───────┴───────────────────┘
# 14 alarms total (first five shown). Two are currently in ALARM and silently doing nothing.

Every alarm here transitions through ALARM/OK with nobody on the other end.
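The audit above covers one region only, because describe-alarms is a regional call. A rough way to repeat it across every region enabled for the account might look like this; `audit_all_regions` is a hypothetical name and the output format is an arbitrary choice.

```shell
# Hypothetical helper: list actionless metric alarms in every enabled region.
# Prints one line per alarm: region, alarm name, current state (tab-separated).
audit_all_regions() {
  for region in $(aws ec2 describe-regions \
      --query 'Regions[].RegionName' --output text); do
    aws cloudwatch describe-alarms --region "$region" \
      --query 'MetricAlarms[?length(AlarmActions)==`0`].[AlarmName,StateValue]' \
      --output text |
      awk -v r="$region" 'NF { print r "\t" $0 }'   # prefix each row with its region
  done
}
```

Running the same loop under each account's credentials extends the inventory to a multi-account estate.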

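Now wire the worst offender to the team's SNS topic. put-metric-alarm replaces the entire alarm definition, so restate the metric, threshold, and evaluation periods exactly as they are today; anything you leave out is not carried over from the existing alarm. Re-running the same command is safe, and the alarm's current state is left unchanged. Here, only the actions differ from the existing configuration.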

$ aws cloudwatch put-metric-alarm \
    --alarm-name SainsburysLoadbasedInstanceRunning \
    --namespace AWS/EC2 --metric-name CPUUtilization --statistic Average \
    --dimensions Name=AutoScalingGroupName,Value=sainsburys-loadbased \
    --period 300 --evaluation-periods 2 \
    --threshold 80 --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:eu-west-1:123456789012:platform-oncall \
    --ok-actions arn:aws:sns:eu-west-1:123456789012:platform-oncall
# (no output; success is silent)

# Confirm the wiring took:
$ aws cloudwatch describe-alarms --alarm-names SainsburysLoadbasedInstanceRunning \
    --query 'MetricAlarms[0].{Name:AlarmName,Actions:AlarmActions,OK:OKActions}'
{
  "Name": "SainsburysLoadbasedInstanceRunning",
  "Actions": ["arn:aws:sns:eu-west-1:123456789012:platform-oncall"],
  "OK": ["arn:aws:sns:eu-west-1:123456789012:platform-oncall"]
}

The alarm now publishes to SNS on every ALARM transition — and on recovery via OKActions.
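Reading the configuration back shows the ARNs are stored, not that a notification arrives. One way to exercise the path end to end is to force a transition with set-alarm-state. The sketch below assumes the on-call rotation has been warned, since it sends a real notification; `test_alarm_wiring` is a hypothetical name.

```shell
# Hypothetical helper: force an alarm into ALARM once so its AlarmActions fire.
# The forced state is temporary; CloudWatch re-evaluates the metric and moves
# the alarm back to its real state at the next evaluation.
test_alarm_wiring() {
  aws cloudwatch set-alarm-state \
    --alarm-name "$1" \
    --state-value ALARM \
    --state-reason "Manual test of alarm actions"
}
```

Actions run on state transitions, so an alarm that is already in ALARM will not notify again; in that case, set it to OK first (which exercises OKActions) and then to ALARM.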

The CloudWatch alarm action model under the hood (deep dive)

Every CloudWatch alarm owns three independent action lists. AlarmActions fire on the transition into ALARM, OKActions on the transition back to OK, and InsufficientDataActions when the underlying metric stops reporting datapoints inside the evaluation window. The lists are not symmetric: you can have five actions on AlarmActions (the per-list maximum) and zero on OKActions, and the alarm is still valid. Most teams forget OKActions entirely, which means on-call gets paged but never gets the "all clear" notification.

Each action is an ARN, and the supported types are wider than people remember: SNS topics (the most common), Auto Scaling policies (arn:aws:autoscaling:...:scalingPolicy/...), EC2 instance actions (arn:aws:automate:region:ec2:stop|terminate|reboot|recover), SSM Incident Manager response plans, and Lambda functions, invoked either directly by ARN or through an SNS subscription. An alarm can have several actions of different types in the same list, so a single transition can both page the on-call and trigger an autoscale step.

The detection pattern for an actionless alarm is a one-line JMESPath query against describe-alarms: filter on length(AlarmActions)==`0`. There's no native AWS check that does this for you across the fleet, which is why the pattern hides: the console shows each alarm individually and there's no "sort by actionless" column. AWS Config can fill the gap with a custom rule, and Security Hub can be configured to ingest the result, but neither is on by default.
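The empty-list filter misses two other ways an alarm can be silent: actions that are attached but switched off (ActionsEnabled is false), and composite alarms, which describe-alarms returns only when asked for them. A possible extension of the audit, with the hypothetical name `audit_silent_alarms`:

```shell
# Hypothetical helper: find alarms that are silent for reasons the basic
# empty-AlarmActions filter does not catch.
audit_silent_alarms() {
  # Metric alarms whose actions exist but are disabled.
  aws cloudwatch describe-alarms --alarm-types MetricAlarm \
    --query 'MetricAlarms[?ActionsEnabled==`false`].AlarmName' \
    --output text
  # Composite alarms with no AlarmActions (omitted unless requested by type).
  aws cloudwatch describe-alarms --alarm-types CompositeAlarm \
    --query 'CompositeAlarms[?length(AlarmActions)==`0`].AlarmName' \
    --output text
}
```

Whether a disabled or composite alarm counts as a finding is a policy choice; the sketch only surfaces the candidates.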

# The full action-model surface — one alarm can wire all three lists, each with multiple types.
aws cloudwatch put-metric-alarm \
  --alarm-name rds-prod-checkout-FreeStorageSpace \
  --metric-name FreeStorageSpace \
  --namespace AWS/RDS \
  --statistic Average \
  --period 300 \
  --threshold 10737418240 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions \
      arn:aws:sns:eu-west-1:123456789012:db-oncall \
      arn:aws:ssm-incidents::123456789012:response-plan/rds-storage-low \
  --ok-actions \
      arn:aws:sns:eu-west-1:123456789012:db-oncall \
  --insufficient-data-actions \
      arn:aws:sns:eu-west-1:123456789012:platform-oncall \
  --dimensions Name=DBInstanceIdentifier,Value=checkout-prod
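When reviewing a mixed action list like the one above, it can help to label each ARN by type. The following sketch matches on ARN prefixes only; `action_type` is a hypothetical name, the patterns assume the standard `aws` partition, and the Lambda pattern assumes function ARNs are used directly as actions.

```shell
# Hypothetical helper: label an alarm action ARN by its service prefix.
# Patterns cover the standard "aws" partition only (not aws-cn or aws-us-gov).
action_type() {
  case "$1" in
    arn:aws:sns:*)            echo "sns" ;;
    arn:aws:autoscaling:*)    echo "autoscaling" ;;
    arn:aws:automate:*:ec2:*) echo "ec2" ;;
    arn:aws:ssm-incidents:*)  echo "incident-manager" ;;
    arn:aws:lambda:*)         echo "lambda" ;;
    *)                        echo "unknown" ;;
  esac
}
```

For example, `action_type arn:aws:ssm-incidents::123456789012:response-plan/rds-storage-low` prints `incident-manager`, and anything reported as `unknown` is worth a manual look.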

What is the impact of alarms without actions?

The most direct impact is invisible incidents. Every alarm that fires without an action is an outage, degradation, or budget breach where the system tried to tell you and you didn't get the message. The mean time to detection on an actionless alarm is whatever interval somebody happens to refresh the CloudWatch console — typically hours, sometimes days. Customer-visible failures are usually reported by customers before the team finds them in the dashboard.

The second-order impact is on the team's relationship with monitoring as a whole. When alarms exist that don't page, on-call learns not to trust the dashboard — "yeah, that alarm has been red for three weeks, ignore it." The trust decays alarm by alarm until the whole monitoring layer becomes scenery. New alarms get created with the same actionless default because that's how the existing ones look.

The third-order impact is regulatory. SOC 2 CC7.2, ISO/IEC 27001:2013 A.12.4 (A.8.15 and A.8.16 in the 2022 revision), and PCI DSS Requirement 10 all expect logged events to trigger an investigation or response. An alarm whose history shows repeated ALARM transitions with no documented response is audit evidence that detection exists but response does not, which is the worst possible combination for a compliance reviewer.

An actionless alarm adds nothing to the AWS bill: a standard-resolution alarm costs the same $0.10/month whether or not it is wired. The real cost is every incident that runs longer than it needed to, every customer who churned because of a quietly degraded service, and every audit finding that drags out a recertification. The fix per alarm is a single CLI command. The cost of not fixing it compounds.

How do you fix and prevent actionless alarms?

Wiring actions to alarms is a four-step loop. The inventory-then-wire flow gets the existing fleet healthy; the routing and prevention steps make sure the next batch of alarms is born with actions attached.

1. Inventory every actionless alarm and tag by domain

Run describe-alarms with the actionless filter across every region and every account, then tag each result by owning team — Owner=platform, Owner=data, Owner=app. The tags drive the next step: each team gets a list of alarms they own, and each team already has (or needs to create) a single SNS topic for their on-call rotation.

2. Wire to a tiered alerting topology, not one shared inbox

The anti-pattern is a million alarms pointed at one shared email address. The pattern is two SNS topics per team, team-low (Slack channel) and team-high (PagerDuty/Incident Manager), with alarm severity deciding which topic the AlarmActions ARN points at. Low severity informs; high severity pages. Mix them and on-call burns out within a quarter.

3. Wire OKActions and InsufficientDataActions, not just AlarmActions

An ALARM-only wiring leaves on-call wondering whether the issue resolved. OKActions pointed at the same low-severity topic close the loop. InsufficientDataActions pointed at the high-severity topic catch the silent killer — the metric stops reporting entirely, often because the agent crashed or the resource was terminated. A complete wiring has all three lists populated.

4. Prevent recurrence with AWS Config or an enforcement Lambda

Deploy an AWS Config custom rule (or a simple Lambda triggered by an EventBridge rule that matches the PutMetricAlarm API call as recorded by CloudTrail) that fails any alarm with an empty AlarmActions list. The rule can be advisory (notify the creator and let it pass) or enforcing (delete the actionless alarm). Most teams start advisory for a quarter, then tighten to enforcing once compliance is high enough that the noise is bearable.
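For the EventBridge variant of step 4, the rule has to match alarm creation and update calls. The sketch below is one way to create such a rule, assuming a CloudTrail trail is recording management events in the region; the rule name is a placeholder, `create_alarm_watch_rule` is a hypothetical function, and attaching the Lambda target is left out.

```shell
# Hypothetical helper: create an EventBridge rule that matches PutMetricAlarm
# API calls delivered via CloudTrail. The default rule name is a placeholder.
# A target (for example the enforcement Lambda) still has to be attached
# separately with "aws events put-targets".
create_alarm_watch_rule() {
  aws events put-rule \
    --name "${1:-detect-put-metric-alarm}" \
    --event-pattern '{
      "detail-type": ["AWS API Call via CloudTrail"],
      "detail": {
        "eventSource": ["monitoring.amazonaws.com"],
        "eventName": ["PutMetricAlarm"]
      }
    }'
}
```

The function receiving these events would then inspect the request parameters for an empty action list; the exact field path is worth confirming against a captured sample event before relying on it.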

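# Inventory + bulk-wire workflow. Each team's SNS topic is the only per-team variable.
# Text output is tab-separated on one line, so split it into one alarm name per line.
aws cloudwatch describe-alarms \
  --query 'MetricAlarms[?length(AlarmActions)==`0`].AlarmName' \
  --output text | tr '\t' '\n' > actionless-alarms.txt

# For each alarm, look up the owning team via tags and wire the right topic.
while IFS= read -r alarm; do
  TEAM=$(aws cloudwatch list-tags-for-resource \
    --resource-arn "arn:aws:cloudwatch:eu-west-1:123456789012:alarm:${alarm}" \
    --query 'Tags[?Key==`Owner`].Value | [0]' --output text)
  # Skip untagged alarms rather than guessing a topic.
  if [ -z "$TEAM" ] || [ "$TEAM" = "None" ]; then
    echo "no Owner tag, skipping: $alarm" >&2
    continue
  fi
  TOPIC="arn:aws:sns:eu-west-1:123456789012:${TEAM}-oncall"

  # put-metric-alarm overwrites the whole definition, so round-trip the existing
  # config: drop the read-only fields, set the actions, and send it back.
  # (If describe-alarms returns other read-only fields, delete those too.)
  aws cloudwatch describe-alarms --alarm-names "$alarm" --output json \
    | jq --arg t "$TOPIC" '.MetricAlarms[0]
        | del(.AlarmArn, .AlarmConfigurationUpdatedTimestamp, .StateValue,
              .StateReason, .StateReasonData, .StateUpdatedTimestamp,
              .StateTransitionedTimestamp)
        | .AlarmActions = [$t] | .OKActions = [$t]' > alarm.json
  aws cloudwatch put-metric-alarm --cli-input-json file://alarm.json
done < actionless-alarms.txt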
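The workflow above sends every alarm to a single per-team topic. For the two-tier topology described in step 2, the topic choice can be derived from severity. This is a minimal sketch: `topic_for` is a hypothetical name, the region and account ID are the placeholder values used elsewhere in this lesson, and the severity labels are assumptions.

```shell
# Hypothetical helper: map a team and a severity to a tiered SNS topic ARN.
# Region, account ID, and the <team>-low / <team>-high naming are placeholders.
topic_for() {
  team="$1"
  severity="$2"
  case "$severity" in
    HIGH|CRITICAL) tier="high" ;;   # pages a human
    *)             tier="low"  ;;   # informs a channel
  esac
  echo "arn:aws:sns:eu-west-1:123456789012:${team}-${tier}"
}
```

`topic_for platform HIGH` yields the `platform-high` topic ARN, which could replace the fixed `TOPIC` value in a loop like the one above once each alarm carries a severity tag.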


You've completed Fix CloudWatch alarms with no actions. You can now audit a fleet for the actionless pattern, route each alarm to a tiered SNS topology that respects on-call sanity, wire all three action lists for complete coverage, and prevent the pattern from coming back with AWS Config or an enforcement Lambda. The next time a compliance audit fires ALM-003, you'll have a four-step loop ready to run.


Alarms without actions: what it means for cost and accountability

A monitoring configuration gap that turns incidents into extended outages — and extended outages into larger bills

A CloudWatch alarm watches a specific metric — CPU, error rate, queue depth, storage space — and transitions into an ALARM state when the metric crosses a threshold. That transition is the system's attempt to tell you something is wrong. An alarm with no actions is one where the system shouts and nobody is on the other end of the line. The alarm state is logged in AWS, but no notification fires, no escalation starts, and no automated response runs.

From a cost-and-risk perspective, this matters because the duration of an incident is the primary driver of its cost. A five-minute outage on a payment service is a minor blip; a four-hour outage — because the alarm sat in ALARM state with no page until someone refreshed the console — is a material loss event. The cost of the average cloud incident scales directly with mean time to detect, and actionless alarms make detection entirely dependent on someone manually checking a dashboard.

The finance framing isn't to pay for elaborate alerting infrastructure everywhere. It's to insist that for any resource whose failure has a measurable business cost, there is at least one human on the other end of the alarm. The question per alarm is: what does an undetected failure of this resource cost per hour, and is the answer large enough to justify wiring a notification? For most production resources the answer is obviously yes, and the remediation cost is a one-time engineer-hour, not a recurring spend.

This lesson is for the finance partner who needs to understand why an actionless alarm is a cost-and-risk issue, not just a technical one. It covers what alarms without actions actually do (nothing), what that costs when an incident runs undetected for hours instead of minutes, how to frame the remediation as a per-resource risk decision rather than a blanket infrastructure spend, and the governance levers — tagging, a documented alerting tier per environment, and a standing review — that keep both the monitoring coverage and its cost auditable. No AWS commands required.


How a finance partner frames the actionless alarm decision

Dana is the finance partner on the platform account review. The audit surfaces ALM-003: fourteen alarms in the production account with no notification wired. Rather than treating it as a binary engineering fix, she asks the tiering question first: which of these alarms are watching resources that, if they fail silently for four hours, cost the business real money?

She cross-references the alarm names against their resource types and environment tags. Nine are watching production-tier services: checkout CPU, RDS free storage, Lambda error rates. The math is straightforward: an undetected failure on checkout running for three hours costs more than the entire annual monitoring budget for the account. Those nine get wired immediately. The remaining five are development and staging alarms; Dana documents them as intentionally low-priority and records that as a reviewed decision rather than a silent ignore.

Her note for the finance pack is a single line: 'Fourteen actionless alarms reviewed; nine production alarms wired to tiered alerting, five dev/staging alarms documented as intentional. Estimated reduction in mean-time-to-detect for production incidents: from hours to under five minutes.' That's the framing that belongs in a financial review: not alarm counts, but incident duration risk reduced.

Why this is a cost issue, not just a monitoring issue

The financial impact of an actionless alarm is almost entirely driven by incident duration. An alarm that fires and pages the on-call engineer within two minutes results in a short incident. The same alarm with no action means detection depends on someone manually checking the console, which in practice means the incident runs for hours before anyone knows. Incident cost — support tickets, SLA breach credits, engineering time, customer churn — scales roughly linearly with duration.

The billing impact is invisible on the AWS invoice. An actionless standard-resolution alarm costs $0.10 per month, identical to a wired one. The cost doesn't live in the infrastructure spend line; it lives in the incident response line, the customer success budget, and the recurring cost of a team that doesn't trust its own monitoring. That's harder to quantify but straightforward to frame: for any production resource, model the cost of a three-hour undetected failure versus a ten-minute detected one.
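The three-hour versus ten-minute comparison is simple arithmetic once a cost per hour is agreed. The sketch below assumes cost grows linearly with duration; the figure of 12000 per hour is a made-up placeholder, not a benchmark, and `incident_cost` is a hypothetical name.

```shell
# Hypothetical helper: incident cost in whole currency units, assuming cost
# grows linearly with duration. Arguments: cost per hour, duration in minutes.
incident_cost() {
  echo $(( $1 * $2 / 60 ))
}

# Placeholder figure: 12000 per hour of degraded checkout.
undetected=$(incident_cost 12000 180)   # three hours, nobody notified
detected=$(incident_cost 12000 10)      # ten minutes, on-call notified
echo "undetected=$undetected detected=$detected avoided=$(( undetected - detected ))"
# prints: undetected=36000 detected=2000 avoided=34000
```

Swapping in a per-resource hourly figure gives the number to weigh against the one-time engineering effort of wiring the alarm.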

The compliance angle adds a second cost path. SOC 2, ISO 27001, and PCI DSS all require that detection events trigger a documented response. An alarm history showing repeated ALARM transitions with no corresponding incident ticket or response record is audit evidence of a controls gap — one that leads to findings, remediation plans, and the extended re-certification cycles that cost real partner time and occasionally delay customer contracts. The remediation cost of wiring an alarm is measured in engineer-minutes; the cost of a controls gap finding is measured in lawyer-hours.

The right finance intervention is to make sure the per-resource tiering decision is made explicitly. Not every alarm needs to page PagerDuty at 3am. But for every resource whose failure has a cost that exceeds the cost of a single incident response, the question 'is there at least one human who will be notified automatically if this alarm fires?' should have a documented yes.

What finance owns in the remediation loop

Finance can't wire an SNS topic, but it can own the framing and governance that make the remediation durable rather than one-and-done. Four levers, applied at a regular cadence.

1. Insist the inventory is segmented by environment and business impact

The raw count of actionless alarms is almost meaningless — a hundred dev alarms and a single production checkout alarm are not the same problem. Finance should ask that any remediation list be sorted by environment and resource criticality, so the prioritization reflects the actual cost exposure rather than alphabetical order. The production alarms with the highest failure cost get fixed first.

2. Budget for a tiered alerting infrastructure, not a single shared inbox

A proper alerting topology — typically two SNS topics per team, one routing to Slack and one to PagerDuty — has a small, predictable cost (SNS pricing is effectively negligible at normal alarm volumes). That is a worthwhile line item to approve explicitly, because the alternative is all alarms pointing at a single overloaded email inbox that on-call learns to ignore. Model the cost of on-call burnout against the cost of a clean tiered topology.

3. Require documented exceptions for intentionally unwired alarms

Any alarm that is left without actions by design — a development environment monitor, a test metric — should carry a recorded justification rather than being silently left behind in the audit. Finance owns the audit trail: a clean picture is one where every alarm is either wired or has a documented reason why it isn't, which is the standard that survives both internal review and external compliance audit.

4. Track the prevention control as a standing governance item

The goal is a structural fix — an AWS Config rule or enforcement Lambda that prevents new actionless alarms from being created. Once deployed, that rule should appear on the compliance dashboard as a green control, not as a quarterly remediation item. Finance should ask whether the prevention layer is in place before closing the remediation cycle, because without it the same inventory fills up again within months.


You've finished the finance partner's view of actionless CloudWatch alarms. You know that the real cost is in incident duration, not the alarm's AWS bill, and how to frame the remediation as a per-resource risk tiering decision rather than a blanket engineering task. You know the four governance levers (environment-segmented inventory, tiered alerting budget, documented exceptions, and a prevention control) that keep the monitoring posture auditable. And you know the one number that matters in any review: how many alarms are watching revenue-critical production resources without a wired human on the other end. Next time the audit surfaces ALM-003, you'll have a sharper question than 'how many alarms did we fix?'


Alarms without actions: the headline

Silent monitoring is not monitoring — it's the appearance of vigilance without the substance

AWS CloudWatch alarms detect when something goes wrong. But an alarm with no actions is one that fires into a void — the system knows there's a problem, records it, and tells nobody. The incident runs until someone manually checks a dashboard. This is one of the most common monitoring failures in cloud environments precisely because the alarm looks correct when you inspect it; the gap only surfaces when you actually need it.

The leadership question is simple: when a production system fails, is there a guaranteed path from detection to human awareness, or does it depend on someone happening to look? Actionless alarms mean the answer is the latter. The right posture is that every system whose failure has a business consequence has at least one wired notification — not as a technical nicety, but as an organizational commitment that incidents will be detected faster than customers report them.

A short read for the executive who wants the plain-English version: what an actionless alarm is, why it's a detection gap rather than a technical configuration detail, and what good looks like — every production system with a wired notification, exceptions deliberate and on the record. You'll get the one question to ask at any engineering review to confirm your monitoring is real and not just present.


What it looks like when the org takes this seriously

At one company, the VP of Engineering, Carla, used to learn about production outages from customer support tickets — an hour or two after the problem started. The monitoring dashboard had alarms for every critical service, but most of them had no actions wired. The alarms were decorative.

After a quarterly audit surfaced the pattern, the team spent two days wiring actions to every production alarm and introducing a two-tier SNS topology: a Slack channel for informational alerts, PagerDuty for anything that needed a human awake at 2am. The next time checkout CPU spiked, the on-call engineer got a page within two minutes of the alarm firing.

Carla's question at every engineering review is now: 'Is there any production alarm that can fire without waking someone up?' It's a one-question audit of the monitoring posture. The answer should always be no for anything customer-facing, and every yes should have a written reason behind it.

The business consequence of silent alarms

The practical impact of an actionless alarm is that your first signal of a production incident comes from a customer, not from your own system. By the time a customer files a support ticket or a social media post surfaces, the incident has typically been running for an hour or more. Every minute of additional duration is additional cost — degraded transactions, SLA exposure, support load, and reputational damage. The monitoring infrastructure exists but isn't doing the job.

There's also a governance dimension. Compliance frameworks like SOC 2 and PCI require not just that you detect problems but that detection triggers a response. An alarm history showing hours in ALARM state with no corresponding action is a controls gap — evidence that the detection layer is present on paper and absent in practice. Auditors know what to look for, and this pattern is near the top of the list.

The resolution is straightforward and inexpensive: every production alarm should have at least one action, wired to the right tier of human attention. Exceptions should be deliberate — a non-production alarm intentionally left unwired, with a reason on the record. That's the standard that turns monitoring from a dashboard decoration into an operational guarantee.

The leadership move on actionless alarms

The executive lever isn't to mandate a specific alerting topology. It's to set the standard that detection always reaches a human, and that every exception is a deliberate recorded decision rather than a configuration oversight.

1. Set a default: production alarms must have at least one action

Make it policy that any alarm watching a production resource — whether it monitors cost anomalies, error rates, or infrastructure health — must have at least one action wired. A clear default removes the per-alarm debate and ensures incidents surface to humans before they surface to customers.

2. Accept intentionally unwired alarms for low-stakes environments

Not every alarm needs to page someone. Development and test monitors can reasonably be left informational or unwired by design. The goal is not zero actionless alarms; it is that every alarm's wiring status matches its environment's criticality, and every exception is written down.

3. Ask for the prevention control, not just the remediation count

At any review, the right question is not 'how many alarms did we fix?' — it's 'is there now a control that prevents actionless alarms from being created in production?' A remediation without prevention is a recurring cost item. A structural control is a one-time investment that removes the finding from future reviews.


That's the lesson. Two takeaways: an alarm without actions is not monitoring, it's the appearance of monitoring — and the fix is a policy, not a one-time cleanup. Every production system whose failure has a business consequence should have a guaranteed path from detection to human awareness. Every exception should be deliberate and on the record. The right question at any review is whether a structural control is in place to enforce that standard going forward, not whether last quarter's findings were cleared.


Part of the learning path Get your alarms right
