Enable application and API logging

One capability across API Gateway, AppSync, Athena, CodeBuild, DMS, DataSync, Step Functions, Transfer Family and managed database log exports: make sure every application and data service writes a durable, queryable record of what it did.

Application and API logging: the basics

What does "application logging" mean across so many services?

Most AWS application and data services can produce a log of what they did, and almost all of them ship with that logging switched off by default. An API Gateway stage can trace every request through it; an AppSync API can log GraphQL field resolution; an Athena workgroup can record every query; a CodeBuild project can capture build output; a DMS replication task can log source and target activity; a DataSync task can log transfer detail; a Step Functions state machine can record every state transition; a Transfer Family connector can log each file movement; and managed databases on RDS, such as MariaDB and SQL Server instances, can export their engine logs to CloudWatch Logs. In each case, with logging off, the work happens and leaves no durable trace.

Security Hub turns each of these into its own control, which is why one estate can fail a whole cluster of application-logging checks at once. APIGateway.1 and APIGateway.9 cover REST and WebSocket execution logging and API Gateway V2 access logging; AppSync.2, Athena.4 and CodeBuild.4 cover GraphQL, query and build logging; DMS.7, DMS.8, DataSync.1, MSK.5 and Transfer.3 cover data-movement logging; RDS.40, RDS.42 and SSM.6 cover engine log exports and automation logging; StepFunctions.1 covers workflow logging. They look like separate problems on the report, but they are one capability: every service that does meaningful work should write down what it did, somewhere you can query later.

This capability is distinct from the CloudTrail family. CloudTrail records the control plane: who called which AWS API to create, change or delete a resource. Application logging records the data plane: what a request, query, build, transfer or workflow actually did once it was running. You need both. CloudTrail tells you a stage was created; application logging tells you which requests it served and how it answered them.

In this lesson you will learn how AWS expresses application and API logging across gateways, GraphQL APIs, query engines, build systems, data-movement services, workflows and managed databases, why logging is off by default almost everywhere, and how to turn it on without trading a logging gap for a storage bill. The "Controls this lesson covers" section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.

Fun fact

The pipeline that failed quietly for a month

A data team ran a nightly reconciliation workflow as a Step Functions state machine with logging off, the default for Standard workflows. A schema change broke one branch, but because the execution was configured to catch the error and continue, the state machine reported SUCCEEDED every night while silently skipping a third of the records. Nobody noticed for 31 days, until month-end numbers did not tie out. The 90-day console history showed green ticks; with no CloudWatch logs there were no per-state details to query. The fix that would have caught it on night one was a single logging configuration, costing a few dollars a month for that workflow's volume.

Finding the services that run silent

Dmitri owns the platform account at a mid-sized SaaS company. After a new region rollout, Security Hub raises a batch of application-logging findings: API Gateway stages, an Athena workgroup, a couple of DMS tasks and several Step Functions state machines, all with logging switched off. None are causing problems yet, which is exactly why nobody had noticed.

Rather than work the findings one by one, he starts by confirming which services are genuinely dark versus misconfigured, so he can fix the capability with one consistent baseline rather than chasing each control.

Check the execution logging level on a REST API stage. An unset loggingLevel (shown as None) is the default and the failing state.

$ aws apigateway get-stages --rest-api-id a1b2c3d4e5 --query 'item[].{Stage:stageName,LoggingLevel:methodSettings."*/*".loggingLevel}' --output table
-----------------------------------
| LoggingLevel | Stage |
+----------------+----------------+
| None | prod |
| None | staging |
| INFO | dev |
-----------------------------------
# prod and staging serve live traffic with no logging: both fail the control.

Logging off is the default for new stages, workflows and tasks alike. The report shows it as separate findings, but it is one capability.
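The same check extends to other services in the group. As a minimal sketch, assuming the AWS CLI is configured for the account and Region in question, the helpers below list Step Functions state machines whose logging level is OFF or unset. The function names `sfn_logging_level` and `list_silent_state_machines` are illustrative, not part of any AWS tooling.

```shell
# Print the logging level of one state machine.
# "OFF" (or "None" when the field is absent) means nothing is logged.
sfn_logging_level() {
  aws stepfunctions describe-state-machine --state-machine-arn "$1" \
    --query 'loggingConfiguration.level' --output text
}

# Print the ARN of every state machine in the current Region that logs nothing.
list_silent_state_machines() {
  local arn level
  for arn in $(aws stepfunctions list-state-machines \
      --query 'stateMachines[].stateMachineArn' --output text); do
    level=$(sfn_logging_level "$arn")
    case "$level" in
      OFF|None|"") echo "$arn" ;;
    esac
  done
}
```

Calling `list_silent_state_machines` with no arguments prints one ARN per line, which can feed straight into a remediation loop or a ticket.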

How application logging works across these services (deep dive)

Most of these controls resolve to one of three patterns. The first is a per-resource logging level or flag: API Gateway execution logging via loggingLevel set to ERROR or INFO, AppSync field-level logging, Step Functions logging set to ALL, ERROR or FATAL. The second is a log destination that must be defined: API Gateway V2 access logging needs both a destination ARN and a format string; Athena workgroups and CodeBuild projects need a CloudWatch Logs group or S3 location; DMS, DataSync, MSK Connect and Transfer Family connectors each need a CloudWatch Logs target. The third is database log export: RDS engines such as MariaDB and SQL Server publish their engine logs to CloudWatch through the EnableCloudwatchLogsExports setting, and SSM Automation writes to a CloudWatch log group.
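The second pattern is the easiest to get half-right, because the destination and the format must be supplied together. The sketch below builds the value for `--access-log-settings` on an API Gateway V2 stage. The helper names are hypothetical, and the format string is only one plausible choice of `$context` variables, not a recommended standard.

```shell
# Build the JSON value for --access-log-settings from a log-group ARN.
# The format string is illustrative: pick the $context variables you need.
access_log_settings_json() {
  local dest_arn="$1"
  local format='$context.requestId $context.identity.sourceIp $context.requestTime $context.routeKey $context.status'
  printf '{"DestinationArn":"%s","Format":"%s"}' "$dest_arn" "$format"
}

# Enable access logging on one API Gateway V2 stage (hypothetical wrapper).
enable_v2_access_logging() {
  local api_id="$1" stage="$2" dest_arn="$3"
  aws apigatewayv2 update-stage --api-id "$api_id" --stage-name "$stage" \
    --access-log-settings "$(access_log_settings_json "$dest_arn")"
}
```

Passing the settings as JSON rather than CLI shorthand keeps the spaces in the format string from being misparsed.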

The prerequisite that catches everyone is the delivery permission. Several of these services need an account-level or per-resource IAM role before logs will flow at all. API Gateway needs an account-level CloudWatch role (cloudwatchRoleArn); Step Functions needs the vended-logs delivery permissions (logs:CreateLogDelivery and friends) on its execution role; data-movement and database services need their service principal or role permitted to write to the destination. If that permission is missing, a logging setting can save without complaint while the log group stays empty, which is the most common reason a remediation looks done but the finding stays failed. Always verify that logs actually appear, not just that the setting saved.
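Verifying that logs actually appear can be scripted. This is a minimal sketch, assuming the destination is a CloudWatch Logs group whose name you already know. The function name `log_group_has_events` is made up for illustration, and the `lastEventTimestamp` it reads can lag behind ingestion, so a fresh group may need a short wait before the check passes.

```shell
# Succeed only if the log group has at least one stream with a recorded event.
log_group_has_events() {
  local last
  last=$(aws logs describe-log-streams --log-group-name "$1" \
    --order-by LastEventTime --descending --limit 1 \
    --query 'logStreams[0].lastEventTimestamp' --output text 2>/dev/null) || last=""
  case "$last" in
    ""|None|null) return 1 ;;   # no streams, no events, or no such group
    *) return 0 ;;
  esac
}
```

A typical use is `log_group_has_events "$GROUP" || echo "no logs yet: check the delivery role"`, run a few minutes after sending test traffic.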

Security Hub evaluates these through AWS Config managed rules, mostly change-triggered, so a fix usually re-evaluates within minutes. Two operational details matter across the board. Logging is forward-only: it captures activity from the moment it is enabled, never the historical backlog. And verbose levels (full request tracing, includeExecutionData, dataTrace) can capture sensitive payloads, so production should default to the lighter level with redaction and a retention policy, not the most verbose option turned on everywhere.
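To find where the verbose option is already switched on, one possible check for REST APIs is sketched below. It looks only at the stage-wide `*/*` method settings, so per-method overrides are not covered, and both function names are hypothetical.

```shell
# List stages of one REST API whose default (*/*) method settings have
# full request/response tracing (dataTraceEnabled) switched on.
stages_with_data_trace() {
  aws apigateway get-stages --rest-api-id "$1" \
    --query 'item[?methodSettings."*/*".dataTraceEnabled].stageName' \
    --output text
}

# Turn payload tracing off on one stage while leaving the logging level alone.
disable_data_trace() {
  aws apigateway update-stage --rest-api-id "$1" --stage-name "$2" \
    --patch-operations 'op=replace,path=/*/*/logging/dataTrace,value=false'
}
```

Running `stages_with_data_trace a1b2c3d4e5` before a production release gives a quick list of stages that may be writing request bodies into CloudWatch.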

What is the impact of services that run without logging?

The first impact is operational blindness during incidents. When a stage starts returning errors, a workflow fails silently, a DMS task drops rows or a build breaks, there is no record of which requests reached the backend, which state failed, or what the service actually did. Engineers end up reproducing failures live with logging temporarily enabled, turning a five-minute diagnosis into a multi-hour one, often during a customer-facing outage when speed matters most.

The second impact is forensic and security investigation. If an API is abused (credential stuffing against a login endpoint, scraping, an injection attempt) the application logs are how you reconstruct what happened and what the backend did with the malicious requests. Without them, an incident that should produce a clear timeline produces a shrug, and the request-level detail cannot be recovered after the fact.

The third impact is audit and accountability. These controls map to the NIST 800-53 audit family (AU-2, AU-3, AU-6, AU-12) and to PCI DSS Requirement 10, all variations on the requirement that systems record activity and that the record be reviewable. A failing finding is a concrete, citable gap that drags down the compliance score leadership and customers see, and remediating it under audit pressure is far more disruptive than having had it on all along.

The fourth impact, the one to manage deliberately, is that logging has a cost and a data risk. Verbose logging on a high-traffic API or workflow generates a lot of CloudWatch data, and full-payload tracing will happily write credentials or PII into log groups. The fix is cheap and correct, but it must come with retention policies, a default of the lighter level in production, and redaction of sensitive fields, or you trade a logging gap for a storage bill and a data-handling problem.

How do you turn application logging on safely?

Work the capability as one loop rather than chasing individual findings. Confirm the delivery permissions, inventory the silent services, enable the right level with retention and redaction, then bake logging into your templates so new services ship compliant.

1. Confirm the delivery permission before touching the service

Many of these services need an IAM role before logs flow at all: API Gateway's account-level CloudWatch role, Step Functions' vended-logs delivery permissions, the service principal grant on a database or connector destination. Without it, the logging setting saves but no logs ever appear, which is the single most common reason a fixed resource stays flagged. This is a prerequisite, not an optional cleanup step.

2. Inventory every service running silent

Across every account and Region, list the services in this group and read their logging state: API Gateway loggingLevel and AccessLogSettings, AppSync field logging, Athena and CodeBuild log configs, DMS and DataSync and Transfer connector logging, MSK Connect logging, RDS/MariaDB/SQL Server EnableCloudwatchLogsExports, SSM Automation, and Step Functions loggingConfiguration. Capture rough request or execution volume too, because it drives the cost decision in the next step.

3. Enable the right level with retention and redaction

Set the level to capture what you need without overpaying: ERROR on high-traffic production services, fuller tracing only on low-volume or actively debugged ones. Leave full-payload tracing (dataTrace, includeExecutionData) off in production or redact sensitive fields, since it logs request and response bodies. Critically, set a CloudWatch Logs or S3 retention policy on every destination at the same time, so enabling logging does not create an unbounded storage bill.

4. Build it into your IaC templates

The manual fix clears today's findings; it does not stop new ones. Bake logging (level, destination, retention) into the CloudFormation, Terraform, CDK or SAM templates every new service is created from, and enforce it with the matching AWS Config rules so a non-compliant resource is flagged the moment it deploys. When logging is a property of the template, these findings go to zero and stay there.

# Verify the prerequisite first: API Gateway's account-level CloudWatch role.
# Without it, the logging setting saves but no logs ever flow.
aws apigateway get-account --query 'cloudwatchRoleArn' --output text

# Enable ERROR-level execution logging on every stage of a REST API.
# The patch operation is quoted so the shell never tries to glob-expand /*/*.
REST_API=a1b2c3d4e5
for STAGE in $(aws apigateway get-stages --rest-api-id "$REST_API" \
  --query 'item[].stageName' --output text); do
  aws apigateway update-stage --rest-api-id "$REST_API" --stage-name "$STAGE" \
    --patch-operations 'op=replace,path=/*/*/logging/loglevel,value=ERROR'
done

# Cap retention on the log group so storage stays bounded (do this every time you enable logging).
# The group must already exist; if the stage has not logged yet, rerun this once it has.
aws logs put-retention-policy \
  --log-group-name "API-Gateway-Execution-Logs_${REST_API}/prod" \
  --retention-in-days 90

# Example for a managed database: publish engine logs to CloudWatch.
# Valid log types depend on the engine (for example error and audit on MariaDB, error and agent on SQL Server).
aws rds modify-db-instance --db-instance-identifier prod-db \
  --cloudwatch-logs-export-configuration '{"EnableLogTypes":["error","audit"]}' --apply-immediately
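
The retention step above handles one log group. A sweep over every group that never expires might look like the following sketch, assuming CloudWatch Logs destinations. The function names and the 90-day default are illustrative choices, and the optional prefix argument narrows the sweep to one service's groups.

```shell
# Print log groups (optionally under a name prefix) that never expire.
unbounded_log_groups() {
  if [ -n "${1:-}" ]; then
    aws logs describe-log-groups --log-group-name-prefix "$1" \
      --query 'logGroups[?!retentionInDays].logGroupName' --output text
  else
    aws logs describe-log-groups \
      --query 'logGroups[?!retentionInDays].logGroupName' --output text
  fi
}

# Apply a retention period (default 90 days) to each of them.
cap_unbounded_log_groups() {
  local days="${1:-90}" prefix="${2:-}" group
  for group in $(unbounded_log_groups "$prefix"); do
    aws logs put-retention-policy --log-group-name "$group" \
      --retention-in-days "$days"
  done
}
```

For example, `cap_unbounded_log_groups 30 API-Gateway-Execution-Logs_` would cap only API Gateway execution log groups at 30 days and leave every other group untouched.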

You can now treat application logging as one capability rather than a scatter of findings: confirm the delivery permission, inventory the services running silent, enable the right level with retention and redaction, and bake logging into the templates so new services ship compliant. The "Controls this lesson covers" section below links every control in this group to its deep page and fix.

Controls this lesson covers

One capability, many AWS Security Hub controls. This lesson is the shared playbook; each control below keeps its own deep page with the exact check, severity and a copy-and-paste fix.