Learning path

Get your alarms right

Coverage, dead alarms and noise: make CloudWatch alarms something you can actually trust.

11 lessons · ~136 min total

Lessons in this path

  1. Monitoring AWS

    Add CloudWatch alarms to EC2 instances

    An instance with no alarms tells you nothing when it breaks. Wire the four baseline metrics and add tag-driven automation.

    13 min
  2. Monitoring AWS

    Add CloudWatch alarms to RDS instances

    Without alarms, the first sign of an RDS failure is the application timing out. Wire the standard set so the database tells on itself first.

    13 min
  3. Monitoring AWS

    Add CloudWatch alarms to load balancers

    Load balancers see every request; alarms on 5xx rates and unhealthy host count catch outages before customers do.

    13 min
  4. Monitoring AWS

    Add error alarms to Lambda functions

    A Lambda that throws errors silently is the most expensive failure mode in serverless — every retry costs money and the user never sees it.

    12 min
  5. Monitoring AWS

    Fix CloudWatch alarms with no actions

    An alarm that fires but notifies nobody is a logged silence. Wire actions to every alarm before something burns quietly.

    12 min
  6. Monitoring AWS

    Wire OK actions on CloudWatch alarms

    If you page on alarm, page on recovery too — otherwise on-call wonders whether it's still broken.

    11 min
  7. Monitoring AWS

    Fix INSUFFICIENT_DATA alarms

    An alarm in INSUFFICIENT_DATA isn't quiet — it's blind. The metric stopped reporting, often because the resource is gone or the dimensions are wrong.

    12 min
  8. Monitoring AWS

    Tame flapping CloudWatch alarms

    Ten state transitions in 24 hours isn't a fire; it's a misconfigured threshold. Tune the evaluation window or use anomaly detection.

    12 min
  9. Monitoring AWS

    Address frequently firing alarms

    An alarm that fires 158 times in 30 days isn't catching incidents — it's generating noise. Tune, suppress, or fix the underlying problem.

    13 min
  10. Monitoring AWS

    Audit alarms that never trigger

    An alarm that's been OK for 12 months is either fine or unverified — review periodically before you trust it for the next incident.

    11 min
  11. Monitoring AWS

    Resolve alarms stuck in ALARM state

    An alarm in ALARM is either a real fire or a stale signal. Triage, fix the underlying condition, or fix the alarm.

    14 min