Enhanced Monitoring: the basics
What does it actually mean for an RDS instance to lack Enhanced Monitoring?
By default, an RDS instance ships a handful of metrics to CloudWatch every 60 seconds: CPUUtilization, FreeableMemory, FreeStorageSpace, ReadIOPS/WriteIOPS, NetworkReceiveThroughput, and a few more. These come from outside the guest — the hypervisor measures them, not the OS. They tell you the box is busy. They don't tell you why.
Enhanced Monitoring is a separate feature that runs an agent inside the database host, scrapes /proc-style OS metrics, and streams them to CloudWatch Logs at 1, 5, 10, 15, 30, or 60-second intervals. You get per-process CPU and memory, IO wait, swap, load average, individual disk device utilisation — the kind of view you'd get from top, iostat, and vmstat on a normal Linux host.
Security Hub control RDS.6 fails any RDS instance with MonitoringInterval = 0, which is the default for instances created without explicitly opting in. The control is a fail-by-omission: nobody chose to leave it off, it just was never turned on.
In this lesson you'll learn the difference between default CloudWatch metrics and Enhanced Monitoring, when the extra granularity actually pays for itself (and when it doesn't), how to pick an interval that doesn't blow up your CloudWatch Logs bill, and how to flip it on safely with the right IAM role attached. You'll see real CLI investigation and the exact modify-db-instance call to remediate the finding.
The mystery 4am CPU spike
A team chased a recurring 4am CPU spike on their production Postgres for two months. Default CloudWatch showed CPU pinned at 95% for exactly 11 minutes, then back to baseline. Nothing in the slow-query log, no application traffic. They finally enabled Enhanced Monitoring at 5-second resolution and saw it immediately: autovacuum on a 400GB append-only table, kicked off by the daily stats threshold. One tuning parameter — autovacuum_vacuum_scale_factor — and the spike vanished. Two months of guessing, ten minutes of OS-level metrics.
Enabling Enhanced Monitoring in action
Marco is the database lead at a fintech. Security Hub fires RDS.6 against their primary Postgres instance — db-prod-payments — and he needs to clear it before the next SOC 2 audit window closes.
Before flipping the switch he wants to know the current state and the cost implication. Enhanced Monitoring streams to CloudWatch Logs at roughly $0.50/GB ingest; at 1-second intervals on a busy DB that's not free. He needs to pick the right interval, not just the cheapest.
He starts by checking the current monitoring configuration on the instance.
First, confirm the finding — check MonitoringInterval on the flagged instance.
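A check along these lines would surface the two fields that matter. This is a minimal sketch, assuming the AWS CLI is already configured with read access to RDS and a default region; the helper name show_monitoring_state is made up for illustration.

```shell
# Hypothetical helper: print the monitoring interval and role ARN for one instance.
# Assumes the AWS CLI has credentials and a default region configured.
show_monitoring_state() {
  local instance_id="$1"
  aws rds describe-db-instances \
    --db-instance-identifier "$instance_id" \
    --query 'DBInstances[0].{Interval:MonitoringInterval,Role:MonitoringRoleArn}' \
    --output json
}

# Usage: show_monitoring_state db-prod-payments
```

An Interval of 0 together with an empty Role is the state RDS.6 flags.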
RDS.6 confirmed — MonitoringInterval is 0 and no monitoring role is attached.
Now enable it at 15-second resolution for production. The monitoring role is a one-time IAM setup — re-use it across every RDS instance in the account.
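The remediation call could be wrapped as below. Treat it as a sketch: the helper name enable_enhanced_monitoring is hypothetical, and it assumes the monitoring role already exists and that you pass its ARN in yourself.

```shell
# Hypothetical helper: turn on Enhanced Monitoring for one instance.
# Arguments: instance identifier, interval in seconds, monitoring role ARN.
# Assumes the role already has AmazonRDSEnhancedMonitoringRole attached.
enable_enhanced_monitoring() {
  local instance_id="$1" interval="$2" role_arn="$3"
  aws rds modify-db-instance \
    --db-instance-identifier "$instance_id" \
    --monitoring-interval "$interval" \
    --monitoring-role-arn "$role_arn" \
    --apply-immediately
}

# Usage (the account ID is a placeholder):
# enable_enhanced_monitoring db-prod-payments 15 \
#   arn:aws:iam::123456789012:role/rds-monitoring-role
```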
Enhanced Monitoring enabled in-place. The change is online — no downtime.
Enhanced Monitoring under the hood
Default CloudWatch metrics for RDS are emitted by the hypervisor (Nitro on current-generation instance classes). It sees the VM as a black box and reports what the host sees: CPU time charged to the VM, network bytes through the ENI, EBS volume IOPS at the block layer. From the guest's perspective it's invisible; nothing runs inside the database host to produce these metrics.
Enhanced Monitoring runs a small agent on the DB instance, managed by AWS rather than by you. It samples /proc, /sys, and the IO subsystem at your chosen interval and pushes a JSON document into the RDSOSMetrics log group. Each document includes per-process CPU/memory, swap usage, load average, IO wait, and per-device disk stats: the same fields you'd get from running top -b, vmstat, and iostat -x on a normal Linux box.
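To look at one of those JSON documents yourself, something like the following should work. It is a sketch that assumes the log stream inside RDSOSMetrics is named after the instance's DbiResourceId, and the helper name latest_os_sample is made up for illustration.

```shell
# Hypothetical helper: print the most recent Enhanced Monitoring sample for one instance.
# Assumption: the log stream name equals the instance's DbiResourceId.
latest_os_sample() {
  local instance_id="$1" resource_id
  resource_id=$(aws rds describe-db-instances \
    --db-instance-identifier "$instance_id" \
    --query 'DBInstances[0].DbiResourceId' --output text) || return 1
  # Without --start-from-head, get-log-events returns the newest events first.
  aws logs get-log-events \
    --log-group-name RDSOSMetrics \
    --log-stream-name "$resource_id" \
    --limit 1 \
    --query 'events[0].message' --output text
}

# Usage: latest_os_sample db-prod-payments
```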
Pricing is straightforward but easy to underestimate: you pay CloudWatch Logs ingest ($0.50/GB) and storage ($0.03/GB-month) on the volume of OS metrics. That volume scales inversely with the interval, so sampling twice as often produces twice the data. A 1-second interval on a busy r6g.2xlarge can produce several GB per day. 60-second intervals are essentially free; 15 seconds is the typical production sweet spot; 1 second is for active troubleshooting, not steady state.
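The arithmetic is easy to sketch before you pick an interval. The helper below is illustrative only: the name estimate_monthly_ingest_usd and the bytes-per-sample figure are assumptions, so measure a real sample from your own RDSOSMetrics log group before trusting the numbers. It covers ingest at the $0.50/GB rate quoted above and ignores storage.

```shell
# Rough monthly CloudWatch Logs ingest cost for one instance, in USD.
# Arguments: interval in seconds, assumed size of one sample in bytes.
estimate_monthly_ingest_usd() {
  local interval_s="$1" bytes_per_sample="$2"
  awk -v i="$interval_s" -v b="$bytes_per_sample" 'BEGIN {
    samples_per_month = (86400 / i) * 30   # samples in a 30-day month
    gb = samples_per_month * b / 1e9       # decimal gigabytes ingested
    printf "%.2f\n", gb * 0.50             # ingest only, at $0.50/GB
  }'
}

# Usage: compare intervals for an assumed 10 KB sample
# estimate_monthly_ingest_usd 1 10000
# estimate_monthly_ingest_usd 15 10000
# estimate_monthly_ingest_usd 60 10000
```

Whatever the real sample size turns out to be, the ratio holds: a 1-second interval ingests 60 times what a 60-second interval does.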
# The IAM role RDS needs to push OS metrics into CloudWatch Logs.
# AWS provides a managed policy; you just create the role and attach it.
aws iam create-role \
  --role-name rds-monitoring-role \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"monitoring.rds.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
aws iam attach-role-policy \
  --role-name rds-monitoring-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole

What is the impact of running RDS without Enhanced Monitoring?
The direct impact is diagnostic blindness. When a latency spike happens at 3am, default CloudWatch tells you CPU went to 90% and that's it. You can't see whether it was a vacuum, a checkpointer, a runaway query, or backup IO competing for the disk — all of those look identical at the hypervisor layer. Without OS-level visibility most database incidents end up being guess-and-check, which means longer MTTR and more pages.
The second-order impact is over-provisioning. Teams that can't see what's actually consuming CPU and memory tend to throw bigger instances at the problem. "It might be memory pressure, let's go up a tier" is a $400/month fix for a problem that Enhanced Monitoring would have solved with a config change. Right-sizing decisions made without OS metrics are decisions made blind.
The compliance impact is concrete: RDS.6 sits in the AWS Foundational Security Best Practices and is one of the checks PCI DSS and HIPAA-aligned reviewers expect to see passing. A failed RDS.6 on a production database doesn't break the audit by itself, but it's the kind of finding that turns into a written remediation requirement with a deadline.
Enhanced Monitoring is also the prerequisite for several useful CloudWatch alarms: IO wait sustained above 20%, swap-in rate above zero, per-disk queue depth. The OS metrics land in CloudWatch Logs, so each alarm needs a metric filter on the RDSOSMetrics log group first, and none of them can exist without Enhanced Monitoring. Without OS metrics, you're alarming on symptoms, not causes.
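As a rough sketch of the IO wait alarm, you can turn the log data into a metric with a filter and then alarm on that metric. The helper name, the Custom/RDSOS namespace, the metric and alarm names, and the 5-by-60-second evaluation window are all illustrative choices. The sketch also assumes each sample carries instanceID and cpuUtilization.wait fields, so check a real document from your log group first.

```shell
# Hypothetical helper: publish IO wait as a custom metric and alarm above 20%.
# Arguments: instance identifier, SNS topic ARN to notify.
create_iowait_alarm() {
  local instance_id="$1" sns_topic_arn="$2"
  # Step 1: extract cpuUtilization.wait from this instance's samples.
  aws logs put-metric-filter \
    --log-group-name RDSOSMetrics \
    --filter-name "${instance_id}-iowait" \
    --filter-pattern "{ \$.instanceID = \"${instance_id}\" }" \
    --metric-transformations \
      "metricName=${instance_id}-iowait,metricNamespace=Custom/RDSOS,metricValue=\$.cpuUtilization.wait" \
    || return 1
  # Step 2: alarm when the 1-minute average stays above 20 for 5 periods.
  aws cloudwatch put-metric-alarm \
    --alarm-name "${instance_id}-iowait-high" \
    --namespace Custom/RDSOS \
    --metric-name "${instance_id}-iowait" \
    --statistic Average \
    --period 60 \
    --evaluation-periods 5 \
    --threshold 20 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$sns_topic_arn"
}

# Usage (the topic ARN is a placeholder):
# create_iowait_alarm db-prod-payments arn:aws:sns:us-east-1:123456789012:db-alerts
```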
How do you enable Enhanced Monitoring without blowing up the bill?
Enabling Enhanced Monitoring is a four-step loop. The order matters — interval choice and IAM setup come before flipping the switch, audit and alarming come after.
1. Inventory which instances are non-compliant
Run describe-db-instances across every region and filter for MonitoringInterval=0. Sort by DBInstanceClass — bigger instances get higher priority because they're typically prod, and because they have the most useful OS-level signal to surface. Stage your remediation; you don't need to flip every dev DB at the same time as prod.
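The inventory could be scripted roughly as follows. It is a sketch that assumes your credentials can call ec2:DescribeRegions and rds:DescribeDBInstances in every enabled region; the helper name list_noncompliant_instances is made up for illustration.

```shell
# Hypothetical helper: list every instance with MonitoringInterval = 0, per region.
# Output columns (tab-separated): region, instance class, instance identifier.
list_noncompliant_instances() {
  local region
  for region in $(aws ec2 describe-regions \
      --query 'Regions[].RegionName' --output text); do
    aws rds describe-db-instances \
      --region "$region" \
      --query 'DBInstances[?MonitoringInterval==`0`].[DBInstanceClass,DBInstanceIdentifier]' \
      --output text |
      while read -r class id; do
        # Skip blank lines from regions with no matching instances.
        if [ -n "$id" ]; then
          printf '%s\t%s\t%s\n' "$region" "$class" "$id"
        fi
      done
  done
}

# Usage: list_noncompliant_instances | sort -k2
```

Sorting on the second column groups instances by class, which makes it easier to stage prod ahead of dev.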
2. Create the monitoring role once, reuse everywhere
RDS needs an IAM role with the AWS-managed AmazonRDSEnhancedMonitoringRole policy attached. Create it once per account, name it predictably (rds-monitoring-role is the convention), and reference its ARN in every modify-db-instance call. Don't create one per instance — that's a pointless IAM explosion.
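Rather than hard-coding the account ID in every call, you could look the ARN up once and reuse it. A minimal sketch, assuming the role already exists under the conventional name; the helper name monitoring_role_arn is hypothetical.

```shell
# Hypothetical helper: print the ARN of the shared monitoring role.
# Optional argument: role name (defaults to rds-monitoring-role).
monitoring_role_arn() {
  aws iam get-role \
    --role-name "${1:-rds-monitoring-role}" \
    --query 'Role.Arn' --output text
}

# Usage: ROLE_ARN=$(monitoring_role_arn)
```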
3. Pick the interval based on workload, not the cheapest default
60 seconds for dev/test (essentially free, still satisfies RDS.6). 15 seconds for steady-state production — the sweet spot between cost and resolution. 1-5 seconds only when you're actively troubleshooting; turn it back down to 15 once the incident is closed. Don't leave 1-second on a fleet of busy DBs unless you've budgeted for the CloudWatch Logs spend.
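If you script the rollout, the interval policy above can live in one small function so nobody picks a number ad hoc. The environment labels (dev, test, prod, troubleshoot) are hypothetical; map them to whatever tags your fleet actually uses.

```shell
# Hypothetical helper: map an environment label to a monitoring interval in seconds.
pick_monitoring_interval() {
  case "$1" in
    dev|test)     echo 60 ;;  # essentially free, still satisfies RDS.6
    prod)         echo 15 ;;  # steady-state production
    troubleshoot) echo 1 ;;   # temporary; turn back down after the incident
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Usage: interval=$(pick_monitoring_interval prod)
```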
4. Pair it with Performance Insights and alarms
Enhanced Monitoring shows you what the OS sees; Performance Insights shows you what the database engine sees — query-level wait events, top SQL by load, blocking sessions. They're complementary, not redundant. PI is free for 7-day retention. Once both are on, wire up alarms on IO wait, swap usage, and DB load — that's the value Enhanced Monitoring unlocks.
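Turning on Performance Insights at the 7-day retention tier might look like this. It is a sketch, not a guaranteed fit for every instance: it assumes your engine version and instance class support Performance Insights, and the helper name enable_performance_insights is made up for illustration.

```shell
# Hypothetical helper: enable Performance Insights with 7-day retention.
# Argument: instance identifier.
enable_performance_insights() {
  aws rds modify-db-instance \
    --db-instance-identifier "$1" \
    --enable-performance-insights \
    --performance-insights-retention-period 7 \
    --apply-immediately
}

# Usage: enable_performance_insights db-prod-payments
```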
# Apply Enhanced Monitoring to every non-compliant RDS instance in the region.
# Replace 123456789012 with your own account ID before running.
for id in $(aws rds describe-db-instances \
    --query "DBInstances[?MonitoringInterval==\`0\`].DBInstanceIdentifier" \
    --output text); do
  aws rds modify-db-instance \
    --db-instance-identifier "$id" \
    --monitoring-interval 60 \
    --monitoring-role-arn arn:aws:iam::123456789012:role/rds-monitoring-role \
    --apply-immediately
done
# Verify: every row should now show a non-zero MonitoringInterval.
aws rds describe-db-instances \
  --query "DBInstances[].{Id:DBInstanceIdentifier,Interval:MonitoringInterval}" \
  --output table
Keep learning
Dig deeper into RDS observability and the controls around it.
- AWS Security Hub control RDS.6: The exact rule definition, severity, and remediation guidance from AWS.
- Enhanced Monitoring for Amazon RDS: Service docs covering setup, metric reference, and CloudWatch Logs integration.
- Amazon RDS Performance Insights: The query-level companion to Enhanced Monitoring, free at 7-day retention.
- AWS Foundational Security Best Practices: The Security Hub standard that ships RDS.6 and its sibling RDS controls.
You've completed Enable RDS Enhanced Monitoring. You can now tell the difference between hypervisor and OS-level metrics, pick an interval that satisfies RDS.6 without burning the CloudWatch Logs budget, and pair Enhanced Monitoring with Performance Insights for full-stack database visibility. The next time RDS.6 shows up in a Security Hub digest, you'll have a four-step loop ready to run.