
Enable application and API logging

One capability across API Gateway, AppSync, Athena, CodeBuild, DMS, DataSync, MSK Connect, Step Functions, Systems Manager Automation, Transfer Family and managed database log exports: make sure every application and data service writes a durable, queryable record of what it did.

14 min·10 sections·AWS

Last reviewed 16 June 2026

Remediates AWS Security Hub: APIGateway.1 APIGateway.9 AppSync.2 Athena.4 CodeBuild.4 DataSync.1 DMS.7 DMS.8 MSK.5 RDS.40 RDS.42 SSM.6 StepFunctions.1 Transfer.3

Application and API logging: the basics

What does "application logging" mean across so many services?

Most AWS application and data services can produce a log of what they did, and almost all of them ship with that logging switched off by default. An API Gateway stage can trace every request through it; an AppSync API can log GraphQL field resolution; an Athena workgroup can record every query; a CodeBuild project can capture build output; a DMS replication task can log source and target activity; a DataSync task can log transfer detail; a Step Functions state machine can record every state transition; a Transfer Family connector can log each file movement; and managed databases such as RDS for MariaDB and RDS for SQL Server can export their engine logs to CloudWatch. In each case, with logging off, the work happens and leaves no durable trace.

Security Hub turns each of these into its own control, which is why one estate can fail a whole cluster of application-logging checks at once. APIGateway.1 and APIGateway.9 cover REST and WebSocket execution logging and API Gateway V2 access logging; AppSync.2, Athena.4 and CodeBuild.4 cover GraphQL, query and build logging; DMS.7, DMS.8, DataSync.1, MSK.5 and Transfer.3 cover data-movement logging; RDS.40, RDS.42 and SSM.6 cover engine log exports and automation logging; StepFunctions.1 covers workflow logging. They look like separate problems on the report, but they are one capability: every service that does meaningful work should write down what it did, somewhere you can query later.

This capability is distinct from the CloudTrail family. CloudTrail records the control plane: who called which AWS API to create, change or delete a resource. Application logging records the data plane: what a request, query, build, transfer or workflow actually did once it was running. You need both. CloudTrail tells you a stage was created; application logging tells you which requests it served and how it answered them.

In this lesson you will learn how AWS expresses application and API logging across gateways, GraphQL APIs, query engines, build systems, data-movement services, workflows and managed databases, why logging is off by default almost everywhere, and how to turn it on without trading a logging gap for a storage bill. The "Controls this lesson covers" section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.

Fun fact

The pipeline that failed quietly for a month

A data team ran a nightly reconciliation workflow as a Step Functions state machine with logging off, the default for Standard workflows. A schema change broke one branch, but because the execution was configured to catch the error and continue, the state machine reported SUCCEEDED every night while silently skipping a third of the records. Nobody noticed for 31 days, until month-end numbers did not tie out. The 90-day console history showed green ticks; with no CloudWatch logs there were no per-state details to query. The fix that would have caught it on night one was a single logging configuration, costing a few dollars a month for that workflow's volume.

Finding the services that run silent

Dmitri owns the platform account at a mid-sized SaaS company. After a new region rollout, Security Hub raises a batch of application-logging findings: API Gateway stages, an Athena workgroup, a couple of DMS tasks and several Step Functions state machines, all with logging switched off. None are causing problems yet, which is exactly why nobody had noticed.

Rather than work the findings one by one, he starts by confirming which services are genuinely dark versus misconfigured, so he can fix the capability with one consistent baseline rather than chasing each control.

Check the execution logging level on each stage of a REST API. An unset loggingLevel (shown as None) is the default and the failing state; an explicit OFF fails too.

$ aws apigateway get-stages --rest-api-id a1b2c3d4e5 --query 'item[].{Stage:stageName,LoggingLevel:methodSettings."*/*".loggingLevel}' --output table

-----------------------------------
|  LoggingLevel  |     Stage      |
+----------------+----------------+
|  None          |  prod          |
|  None          |  staging       |
|  INFO          |  dev           |
-----------------------------------

# prod and staging serve live traffic with no logging: both fail the control.

Logging off is the default for new stages, workflows and tasks alike. The report shows it as separate findings, but it is one capability.
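The single-API check above can be widened to every REST API in a Region. What follows is a minimal sketch, assuming default credentials and Region are already configured; the function name list_silent_rest_stages is hypothetical, and it only inspects the stage-wide `*/*` method settings, not per-method overrides.

```shell
#!/usr/bin/env bash
# Print "api-id <TAB> stage <TAB> level" for every REST API stage whose
# stage-wide (*/*) execution logging level is not ERROR or INFO.
list_silent_rest_stages() {
  local api_id stage level
  for api_id in $(aws apigateway get-rest-apis --query 'items[].id' --output text); do
    aws apigateway get-stages --rest-api-id "$api_id" \
      --query 'item[].[stageName, methodSettings."*/*".loggingLevel]' \
      --output text |
    while read -r stage level; do
      case "$level" in
        ERROR|INFO) ;;   # logging is on: nothing to report
        *) printf '%s\t%s\t%s\n' "$api_id" "$stage" "${level:-None}" ;;
      esac
    done
  done
}
```

Calling `list_silent_rest_stages` once per Region gives a rough inventory of REST stages to fix; an empty result suggests every stage already logs at ERROR or INFO.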

How application logging works across these services (deep dive)

Most of these controls resolve to one of three patterns. The first is a per-resource logging level or flag: API Gateway execution logging via loggingLevel set to ERROR or INFO, AppSync field-level logging, Step Functions logging set to ALL, ERROR or FATAL. The second is a log destination that must be defined: API Gateway V2 access logging needs both a destination ARN and a format string; CodeBuild projects need a CloudWatch Logs group or S3 location; Athena workgroups need query metrics published to CloudWatch; and DMS, DataSync, MSK Connect and Transfer Family connectors each need a CloudWatch Logs target. The third is database log export: RDS for MariaDB and RDS for SQL Server publish their engine logs to CloudWatch through the EnableCloudwatchLogsExports setting, and SSM Automation writes to a CloudWatch log group.
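Step Functions is a useful illustration because it combines the first two patterns: a level and a destination in one logging configuration. The sketch below assumes the log group already exists and that the state machine's role has the log delivery permissions; the function names and any ARNs passed in are hypothetical.

```shell
#!/usr/bin/env bash
# Build the logging configuration JSON for a state machine.
# $1 = log group ARN without the trailing ":*"; $2 = level (ALL, ERROR or FATAL).
sfn_logging_config() {
  local log_group_arn=$1 level=${2:-ERROR}
  # includeExecutionData stays false so input and output payloads are not logged.
  printf '{"level":"%s","includeExecutionData":false,"destinations":[{"cloudWatchLogsLogGroup":{"logGroupArn":"%s:*"}}]}' \
    "$level" "$log_group_arn"
}

# Apply the configuration to one state machine.
enable_sfn_logging() {
  local state_machine_arn=$1 log_group_arn=$2 level=${3:-ERROR}
  aws stepfunctions update-state-machine \
    --state-machine-arn "$state_machine_arn" \
    --logging-configuration "$(sfn_logging_config "$log_group_arn" "$level")"
}
```

For example, `enable_sfn_logging "$SM_ARN" "$LOG_GROUP_ARN" ERROR` would switch on ERROR-level logging without execution data, which is the lighter production default this lesson recommends.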

The prerequisite that catches everyone is the delivery permission. Several of these services need an account-level or per-resource IAM role before logs will flow at all. API Gateway needs an account-level CloudWatch role (cloudwatchRoleArn); Step Functions needs the vended-logs delivery permissions (logs:CreateLogDelivery and friends) on its execution role; data-movement and database services need their service principal or role permitted to write to the destination. If that role is missing, the logging setting can save without error while the log group stays empty, which is the most common reason a remediation looks done but the finding stays failed. Always verify logs actually appear, not just that the setting saved.
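One way to verify that logs actually appear is to ask CloudWatch Logs for the newest event in the destination log group. This is a sketch only: latest_log_event_ms is a hypothetical helper name, and the log group name is whatever destination you configured.

```shell
#!/usr/bin/env bash
# Print the timestamp (epoch milliseconds) of the newest event in a log group.
# Returns non-zero if the group has no streams or no events yet.
latest_log_event_ms() {
  local log_group=$1 ts
  ts=$(aws logs describe-log-streams \
    --log-group-name "$log_group" \
    --order-by LastEventTime --descending --limit 1 \
    --query 'logStreams[0].lastEventTimestamp' --output text) || return 1
  case "$ts" in
    ''|None|null) echo "no events found in ${log_group}" >&2; return 1 ;;
    *) echo "$ts" ;;
  esac
}
```

Run it a few minutes after sending test traffic, for example `latest_log_event_ms "API-Gateway-Execution-Logs_a1b2c3d4e5/prod"`. A failure here after the setting has saved usually points back at the delivery permission.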

Security Hub evaluates these through AWS Config managed rules, mostly change-triggered, so a fix usually re-evaluates within minutes. Two operational details matter across the board. Logging is forward-only: it captures activity from the moment it is enabled, never the historical backlog. And verbose levels (full request tracing, includeExecutionData, dataTrace) can capture sensitive payloads, so production should default to the lighter level with redaction and a retention policy, not the most verbose option turned on everywhere.
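Retention is the easiest of these guardrails to check mechanically. The sketch below, with hypothetical function names, lists log groups under a prefix that have no retention policy and caps them; it assumes 90 days is acceptable, so adjust the value to your own policy.

```shell
#!/usr/bin/env bash
# Print, one per line, the log groups under a prefix with no retention policy.
unbounded_log_groups() {
  local prefix=$1
  aws logs describe-log-groups --log-group-name-prefix "$prefix" \
    --query 'logGroups[?!retentionInDays].logGroupName' --output text | tr '\t' '\n'
}

# Set a retention policy on every unbounded log group under the prefix.
cap_retention() {
  local prefix=$1 days=${2:-90} group
  unbounded_log_groups "$prefix" | while read -r group; do
    # Skip blank lines and the "None" placeholder printed for empty results.
    [ -n "$group" ] && [ "$group" != "None" ] || continue
    aws logs put-retention-policy --log-group-name "$group" \
      --retention-in-days "$days" </dev/null
    echo "capped ${group} at ${days} days"
  done
}
```

For example, `cap_retention /aws/vendedlogs/states 90` would bound every Step Functions log group under that prefix that currently keeps data forever.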

What is the impact of services that run without logging?

The first impact is operational blindness during incidents. When a stage starts returning errors, a workflow fails silently, a DMS task drops rows or a build breaks, there is no record of which requests reached the backend, which state failed, or what the service actually did. Engineers end up reproducing failures live with logging temporarily enabled, turning a five-minute diagnosis into a multi-hour one, often during a customer-facing outage when speed matters most.

The second impact is forensic and security investigation. If an API is abused (credential stuffing against a login endpoint, scraping, an injection attempt) the application logs are how you reconstruct what happened and what the backend did with the malicious requests. Without them, an incident that should produce a clear timeline produces a shrug, and the request-level detail cannot be recovered after the fact.

The third impact is audit and accountability. These controls map to the NIST 800-53 audit family (AU-2, AU-3, AU-6, AU-12) and to PCI DSS section 10, all variations on the requirement that systems record activity and that the record be reviewable. A failing finding is a concrete, citable gap that drags down the compliance score leadership and customers see, and remediating it under audit pressure is far more disruptive than having had it on all along.

The fourth impact, the one to manage deliberately, is that logging has a cost and a data risk. Verbose logging on a high-traffic API or workflow generates a lot of CloudWatch data, and full-payload tracing will happily write credentials or PII into log groups. The fix is cheap and correct, but it must come with retention policies, a default of the lighter level in production, and redaction of sensitive fields, or you trade a logging gap for a storage bill and a data-handling problem.

How do you turn application logging on safely?

Work the capability as one loop rather than chasing individual findings. Confirm the delivery permissions, inventory the silent services, enable the right level with retention and redaction, then bake logging into your templates so new services ship compliant.

1. Confirm the delivery permission before touching the service

Many of these services need an IAM role before logs flow at all: API Gateway's account-level CloudWatch role, Step Functions' vended-logs delivery permissions, the service principal grant on a database or connector destination. Without it, the logging setting saves but no logs ever appear, which is the single most common reason a fixed resource stays flagged. This is a prerequisite, not an optional cleanup step.

2. Inventory every service running silent

Across every account and Region, list the services in this group and read their logging state: API Gateway loggingLevel and AccessLogSettings, AppSync field logging, Athena and CodeBuild log configs, DMS and DataSync and Transfer connector logging, MSK Connect logging, EnableCloudwatchLogsExports on RDS for MariaDB and RDS for SQL Server, SSM Automation, and Step Functions loggingConfiguration. Capture rough request or execution volume too, because it drives the cost decision in the next step.

3. Enable the right level with retention and redaction

Set the level to capture what you need without overpaying: ERROR-only on high-traffic production services, fuller tracing only on low-volume or actively-debugged ones. Leave full-payload tracing (dataTrace, includeExecutionData) off in production or redact sensitive fields, since it logs request and response bodies. Critically, set a CloudWatch Logs or S3 retention policy on every destination at the same time, so enabling logging does not create an unbounded storage bill.

4. Build it into your IaC templates

The manual fix clears today's findings; it does not stop new ones. Bake logging (level, destination, retention) into the CloudFormation, Terraform, CDK or SAM templates every new service is created from, and enforce it with the matching AWS Config rules so a non-compliant resource is flagged the moment it deploys. When logging is a property of the template, these findings go to zero and stay there.
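To see whether the findings really are going to zero and staying there, the failed count per control can be read from Security Hub. This is a rough sketch: the function names are hypothetical, the control list is taken from the top of this lesson, and it assumes findings carry the security control ID so that the ComplianceSecurityControlId filter matches them.

```shell
#!/usr/bin/env bash
# Controls in this capability, as listed at the top of the lesson.
CONTROLS="APIGateway.1 APIGateway.9 AppSync.2 Athena.4 CodeBuild.4 DataSync.1 DMS.7 DMS.8 MSK.5 RDS.40 RDS.42 SSM.6 StepFunctions.1 Transfer.3"

# Print the number of active FAILED findings for one control.
failed_count() {
  local control=$1 filters
  filters=$(printf '{"ComplianceSecurityControlId":[{"Value":"%s","Comparison":"EQUALS"}],"ComplianceStatus":[{"Value":"FAILED","Comparison":"EQUALS"}],"RecordState":[{"Value":"ACTIVE","Comparison":"EQUALS"}]}' "$control")
  # Finding IDs contain no whitespace, so counting words counts findings.
  aws securityhub get-findings --filters "$filters" \
    --query 'Findings[].Id' --output text | wc -w | tr -d ' '
}

# Print "control <TAB> failed-count" for every control in the capability.
report_failed_counts() {
  local control
  for control in $CONTROLS; do
    printf '%s\t%s\n' "$control" "$(failed_count "$control")"
  done
}
```

Running `report_failed_counts` after each release and keeping the output gives the trend line rather than a single snapshot.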

# Verify the prerequisite first: API Gateway's account-level CloudWatch role.
# Without it, the logging setting saves but no logs ever flow.
aws apigateway get-account --query 'cloudwatchRoleArn' --output text

# Enable ERROR-level execution logging on every stage of a REST API.
REST_API=a1b2c3d4e5
for STAGE in $(aws apigateway get-stages --rest-api-id "$REST_API" \
  --query 'item[].stageName' --output text); do
  # Quote the patch operation so the shell cannot glob-expand /*/*.
  aws apigateway update-stage --rest-api-id "$REST_API" --stage-name "$STAGE" \
    --patch-operations 'op=replace,path=/*/*/logging/loglevel,value=ERROR'
done

# Cap retention on the log group so storage stays bounded (do this every time you enable logging).
aws logs put-retention-policy \
  --log-group-name "API-Gateway-Execution-Logs_${REST_API}/prod" \
  --retention-in-days 90

# Example for a managed database: publish engine logs to CloudWatch (no per-event charge).
# Log types are engine-specific: error and audit suit MariaDB; SQL Server uses agent and error.
aws rds modify-db-instance --db-instance-identifier prod-db \
  --cloudwatch-logs-export-configuration '{"EnableLogTypes":["error","audit"]}' --apply-immediately
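The commands above cover execution logging and database log export. For the destination-plus-format pattern on API Gateway V2 stages, something along these lines may serve as a starting point. It is a sketch, not a definitive setup: the function name, the chosen $context fields and the plain space-separated format are all assumptions, and the log group ARN is supplied by the caller.

```shell
#!/usr/bin/env bash
# Access log format built from $context variables. Single quotes keep the
# shell from expanding them; API Gateway fills them in for each request.
ACCESS_LOG_FORMAT='$context.requestId $context.identity.sourceIp $context.requestTime $context.routeKey $context.status'

# Configure access logging (destination and format) on one V2 stage.
enable_v2_access_logging() {
  local api_id=$1 stage=$2 log_group_arn=$3 settings
  settings=$(printf '{"DestinationArn":"%s","Format":"%s"}' \
    "$log_group_arn" "$ACCESS_LOG_FORMAT")
  aws apigatewayv2 update-stage --api-id "$api_id" --stage-name "$stage" \
    --access-log-settings "$settings"
}
```

A call such as `enable_v2_access_logging "$API_ID" prod "$LOG_GROUP_ARN"` sets both the destination and the format. The format here deliberately omits request bodies and headers, in line with the lighter-level default.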


You can now treat application logging as one capability rather than a scatter of findings: confirm the delivery permission, inventory the services running silent, enable the right level with retention and redaction, and bake logging into the templates so new services ship compliant. The "Controls this lesson covers" section below links every control in this group to its deep page and fix.


Application logging: what it means for the business

Cheap audit evidence whose absence only costs you mid-incident

These services are the front doors and engines of the company's applications. "Logging" on them is exactly what it sounds like: a record of what each request, query, build, transfer or workflow did. This cluster of findings flags services that are doing their job but keeping no durable record of it. None of the findings break anything or cost money directly; the cost is contingent and lands later, when an incident or an auditor asks what happened on a given day and the honest answer is "we were not recording."

Several frameworks the company likely cares about expect this. The audit-and-accountability family of NIST 800-53 and PCI DSS section 10 both require that activity on systems is logged and reviewable. A failing application-logging control is a concrete, citable gap that pulls down a compliance score and shows up in a SOC 2 or PCI review. The cost of the fix is almost entirely CloudWatch Logs or S3 storage, which is small and controllable with retention policies.

There is one genuine cost trap to manage. Verbose, full-payload logging on a high-traffic API or a high-volume workflow generates real CloudWatch volume, and full request bodies can capture sensitive data in the logs. The right ask is: turn logging on, default to the lighter level in production, cap retention, and redact sensitive fields. That clears the controls without trading a logging gap for a storage bill or a data-handling problem.

This lesson is for the finance partner who sees a cluster of application-logging findings on the security report and wants to know what the right response is and what it costs. It covers why these controls map to audit requirements, why the only real cost is log storage, the verbose-logging trap to watch on the bill, and how to read a reappearing finding count as a build-process signal rather than just a number to clear.


How a finance partner frames the application-logging findings

Dmitri brings the batch of application-logging findings to his finance partner because the report shows a cluster of red across API Gateway, Athena, DMS and Step Functions and he wants the spend sized before he turns anything on. The partner's starting point is reassuring on cost: the cost of the fix is almost entirely CloudWatch Logs or S3 storage, typically a few dollars to low tens of dollars a month per active service at a sensible level. As remediation goes, this is one of the cheapest groups on the board. None of the findings break anything or cost money today; the cost of the gap is contingent and lands later, mid-incident or mid-audit, when someone asks what happened on a given day and the honest answer is "we were not recording."

The one trap the partner flags is verbose logging. Full-payload tracing on a high-traffic API or a high-volume workflow generates real CloudWatch volume, and full request bodies can capture credentials and PII straight into the log groups. So the finance ask is precise: turn logging on, default to the lighter level in production, cap retention on every destination, and redact sensitive fields. That clears the controls without trading a logging gap for a storage bill or a data-handling problem. The partner also notes that a reappearing finding count is a build-discipline signal worth tracking, since logging that is not baked into templates means the same gap-by-default pattern is probably true of other controls too.

Why this matters to governance, not just the bill

The direct cost of fixing these controls is small: CloudWatch Logs or S3 storage, typically a few dollars to low tens of dollars a month per active service at a sensible logging level. As remediation goes, this is one of the cheapest groups on the board. The reason it matters is the cost of the audit gap it represents, not the cost of fixing it.

The controls map to the audit-and-accountability family of NIST 800-53 and to PCI DSS section 10, the parts of most frameworks that say you must be able to show what your systems did. A failing finding is a specific resource an auditor can point at, and for a company carrying a SOC 2, PCI or FedRAMP commitment that is exactly what erodes the score reported externally.

Watch the bill after any logging push. The lighter level is cheap; the verbose level and full request-body tracing on a busy service can add real money and capture sensitive data. The right ask is: logging on by default, lighter level in production, capped retention, redacted fields. And treat a reappearing finding count as a build-discipline signal: if new findings keep arriving every time a service ships, logging is not baked into the templates, and the same gap-by-default pattern is probably true of other controls.

What finance can do about the application-logging gap

Finance cannot set a loggingLevel, but it can frame application logging as cheap, governed audit evidence and stop the fix becoming a new storage or data problem. Three levers.

1. Budget logging as cheap audit evidence, not a cost to cut

The direct spend is small (a few dollars to low tens of dollars a month per active service at a sensible level), so treat it as a planned line that buys the ability to show what systems did. The cost that matters is the audit gap it represents, because these controls map to the NIST 800-53 audit-and-accountability family and PCI DSS section 10, and a failing finding is a specific resource an auditor can point at. Fund the lighter level everywhere first; it is the cheapest defensible posture.

2. Watch the bill after any logging push and cap retention

The lighter level is cheap; the verbose level and full request-body tracing on a busy service can add real money fast. The finance condition on any logging rollout is a CloudWatch Logs or S3 retention policy on every destination at the same time logging is enabled, so storage stays bounded. Make capped retention a non-negotiable part of the remediation, not a follow-up, or you trade a logging gap for an unbounded storage line.

3. Read a reappearing finding count as a build-discipline signal

If new application-logging findings keep arriving every time a service ships, logging is not baked into the templates, and that same gap-by-default pattern is probably true of other controls. Treat the trend in the count, not just the snapshot, as the metric: a count that creeps back up after each release is a process problem upstream of any single finding, and it is worth flagging at the cost-and-security review.


You have finished the finance view of application logging. You know that the direct cost is small (CloudWatch Logs or S3 storage at a sensible level), that the verbose level plus full-payload tracing on busy services is the one real cost-and-data trap to cap, and that the exposure is audit and accountability against NIST 800-53 and PCI DSS section 10 rather than a big bill. Next time the cluster appears, you will fund the lighter level everywhere with capped retention, watch the bill after the push, and read a reappearing count as a build-discipline signal.


Application logging: the headline

Can we explain what our applications did? Right now, sometimes not

Our applications run through managed services: API gateways, GraphQL APIs, query engines, build systems, data-movement pipelines and workflows. This group of findings means some of them are running with no durable record of what they did. Everything works day to day, until a customer or an auditor asks what happened on a given date and the honest answer is that we were not recording.

This is a low-risk, low-cost set of fixes tied to the audit-and-accountability requirements in the frameworks we report against. The real outcome is not the findings clearing; it is that every application we run becomes explainable after the fact, which is the baseline an audit expects.

The leadership question is whether logging is built into how new services ship, not whether today's finding count is zero. Done well, the fix comes with cost and data guardrails so it never becomes its own problem.

A short read for the leader who needs to know what application logging proves, why closing it is a governance baseline rather than a budget call, and what a defensible end state looks like: every application service logs by default, with retention and redaction set so the fix never becomes a cost or data problem.


What it looks like when logging is a shipping default, not an afterthought

The application-logging findings reached the executive review after a nightly reconciliation workflow had failed silently for a month, reporting SUCCEEDED every night while skipping a third of the records, because logging was off and nobody could query what each state actually did. The lesson leadership drew was not about that one workflow. It was that the company runs applications (API gateways, query engines, build systems, data-movement pipelines) that cannot explain what they did, and that the frameworks the business reports against treat "systems must log their activity" as a baseline.

So the executive framing settled on two things. First, this is a low-risk, low-cost set of fixes whose real outcome is that every application becomes explainable after the fact, which is what an audit expects. Second, the fix must come with guardrails (lighter level in production, capped retention, redacted fields) so it never becomes its own cost or data problem. The question leadership keeps asking is not whether today's finding count is zero; it is whether logging is built into how new services ship, because a count that keeps creeping back up is the real signal.

Why this is on the report at all

On its own, any one of these findings is small and cheap to fix. They are on the report because of what they represent: applications we run that cannot explain what they did. Multiplied across a fleet of services, that is the difference between an incident or an audit producing a clear timeline and producing a shrug. The frameworks we report against treat "systems must log their activity" as a baseline, and this is a concrete place we are failing it.

There is a cost nuance leadership should know so the fix does not create a new problem. Logging at the most verbose level on busy services generates real storage cost and can capture sensitive data. So the correct instruction is not just "turn it all on"; it is "turn it on, default to the lighter level in production, cap retention and redact." Done that way, the findings clear, the bill stays flat, and we stop running applications we cannot account for. The real question is whether this is built into how new services ship.

The leadership move on application logging

The executive handle is not to approve each logging setting; it is to require that applications are explainable after the fact by default, with guardrails so the fix never becomes a cost or data problem.

1. Require logging by default in how new services ship

The real question is whether logging is a property of the templates every new service is created from, not whether today's finding count is zero. Make logging (level, destination and retention) part of the standard IaC, enforced by the matching Config rules, so a non-compliant service is flagged the moment it deploys. When logging ships by default, these findings go to zero and stay there instead of being cleared by hand each release.

2. Insist the fix comes with cost and data guardrails

Do not let "turn it all on" become the instruction. The correct one is "turn it on, default to the lighter level in production, cap retention, and redact sensitive fields," because the most verbose level on busy services generates real storage cost and can capture credentials and PII. Done that way the findings clear, the bill stays flat, and logging does not create the next problem on the report.

3. Track the trend in the finding count as a process signal

A count that keeps creeping back up after each release tells leadership that logging is not built into how services ship, which is a more useful signal than the snapshot number. Ask for the trend and the explanation: if new findings keep arriving, the gap is in the build process, and that is where executive attention belongs rather than on clearing individual resources.


Two takeaways: an application that cannot explain what it did is a baseline audit gap, and the fix is cheap only if it ships with guardrails: lighter level in production, capped retention, redacted fields. The leadership outcome is logging built into how new services ship, so the findings go to zero and stay there. The number to watch is not today's count; it is whether that count keeps creeping back after each release.


Controls this lesson covers

One capability, many AWS Security Hub controls. This lesson is the shared playbook; each control below keeps its own deep page with the exact check, severity and a copy-and-paste fix.

APIGateway

DMS

MSK

MSK.5 Medium MSK connectors should have logging

RDS

SSM

SSM.6 Medium SSM Automation runs are not logged

StepFunctions

StepFunctions.1 Medium State machines should have logging on

Transfer

Transfer.3 Medium Transfer connectors should have logging
