emnode / learn
Compliance

Harden ECS container workloads

One capability across ECS task definitions, services, task sets and clusters: drop the privileges, close the network paths, move secrets out of plaintext and turn on the logging so a single compromised container stays a contained incident.

14 min·10 sections·AWS


Hardening ECS workloads: the basics

Why a default ECS task definition is more privileged and more exposed than you think

An Amazon ECS task definition is the blueprint for a running container: its image, CPU and memory, networking, the user it runs as, what it can write, what it logs and what secrets it carries. The trouble is that the defaults lean towards convenience, not safety. Leave the user field unset and a Linux container runs as whatever user its image declares, which is root unless the Dockerfile sets USER. Leave readonlyRootFilesystem unset and the container can rewrite its own filesystem at runtime. Paste a credential into the environment block and it is stored in plaintext in that revision for as long as the revision exists. Set pidMode to host or privileged to true and the wall between the container and the EC2 instance it shares with other tasks effectively disappears.

AWS Security Hub turns each weak default into its own control, which is why a single cluster can fail half a dozen ECS checks at once. ECS.3 flags a shared host process namespace (pidMode host); ECS.4 flags privileged containers; ECS.5 flags a writable root filesystem; ECS.8 flags AWS credentials in the environment block; ECS.9 flags missing log configuration; ECS.12 flags clusters without Container Insights; ECS.16 flags task sets that assign public IPs; ECS.20 and ECS.21 flag Linux containers running as root and Windows containers running as containeradministrator. They look like separate problems on the report, but they are one capability: give each container only the privilege and network footprint it needs to do its job, and as much visibility as you can.

The reason these matter is blast radius. A container that runs as root, can write to itself, shares the host's process namespace or carries a plaintext key is a far better launchpad for an attacker than a constrained one. Most of the failures are drift: a Dockerfile that never set a USER, or a task definition copied from a Stack Overflow answer. The job is to find every over-privileged or over-exposed task definition, harden it at the source, then gate the pipeline so the failing configuration cannot ship again.

In this lesson you will learn how ECS expresses privilege, network exposure, secret handling and observability across task definitions, task sets and clusters, how to find every weakly-configured workload in an account, and how to harden them without breaking apps that legitimately need to write or carry config. The Controls this lesson covers section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.

Fun fact

The backdoor that couldn't find a home

In a red-team exercise against a payments platform the attackers popped a container through a vulnerable dependency and tried their standard move: write a persistence script and a reverse-shell binary to disk so they would survive a restart. Every write failed with Read-only file system. The task had a non-root user, a read-only root filesystem, no privileged flag, and exactly one narrow writable mount that was wiped on every recycle. The foothold lasted until the next deployment, hours later, with nothing persisted. The whole class of post-exploitation moves was off the table because of a handful of one-line settings in the task definition.

Finding over-privileged containers across an estate

Devon is on the platform team at a B2B SaaS company preparing for a SOC 2 renewal. Security Hub shows ECS failures spread across task definitions in the production cluster: containers running as root, a writable root filesystem, and at least one plaintext credential.

Rather than work the findings one by one, he starts by inspecting the highest-risk task definition, the public-facing checkout-api, to see how many controls a single workload is failing at once.

Start with a public-facing service and inspect the user, filesystem and privilege settings together. One task definition often fails several controls.

$ aws ecs describe-task-definition --task-definition checkout-api --query 'taskDefinition.containerDefinitions[].{Name:name,User:user,ReadOnly:readonlyRootFilesystem,Priv:privileged}' --output table
------------------------------------------------------------
| checkout-api | None | None | None |
| log-sidecar | 0 | False | None |
------------------------------------------------------------
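# User None = unset, so the image's default user applies (root unless the Dockerfile sets USER) and ECS.20 fails; ReadOnly None = writable (ECS.5).
# log-sidecar sets user 0 (root) and ReadOnly False explicitly, so it fails both controls as well.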

Unset user and read-only fields fall back to the insecure behaviour, and the sidecar sets the insecure values explicitly, so both containers in this one workload trip ECS.20 and ECS.5 at once. Fix them at the source in one revision.

How ECS hardening actually works

Most ECS controls resolve to one of three concerns, all evaluated on the latest active revision of the task definition (or the cluster, or the task set). The first is privilege: the user field (ECS.20 for Linux, ECS.21 for Windows), the privileged flag (ECS.4) and pidMode (ECS.3). The second is exposure and integrity: readonlyRootFilesystem (ECS.5) and assignPublicIp on a task set's network configuration (ECS.16). The third is secrets and observability: AWS credential keys in the environment block versus the secrets field (ECS.8), the logConfiguration block (ECS.9) and containerInsights on the cluster (ECS.12).
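The three concerns map onto a handful of JSON fields, so they can be approximated with a small jq filter. The sketch below assumes jq is installed and reads the output of `aws ecs describe-task-definition`; the function name `ecs_findings` is hypothetical, and it approximates rather than reproduces Security Hub's evaluation. It detects Windows only from `runtimePlatform`, which is an assumption: EC2 Windows task definitions without that block are treated as Linux.

```shell
# Hypothetical helper (sketch): print one line per control a task definition appears to fail.
# Input on stdin: JSON from `aws ecs describe-task-definition --task-definition <family>`.
# Requires jq. Security Hub's own evaluation remains the source of truth.
# Windows is detected from runtimePlatform only (assumption; see lead-in).
ecs_findings() {
  jq -r '
    .taskDefinition as $td
    | (($td.runtimePlatform.operatingSystemFamily // "LINUX") | startswith("WINDOWS")) as $win
    | (if $td.pidMode == "host" then "ECS.3 task: pidMode host" else empty end),
      ($td.containerDefinitions[]
        | .name as $n
        | (.user // "") as $u
        | (if .privileged == true then "ECS.4 \($n): privileged" else empty end),
          (if ($win | not) and .readonlyRootFilesystem != true
             then "ECS.5 \($n): writable root filesystem" else empty end),
          (if ([.environment[]?.name]
                | any(. == "AWS_ACCESS_KEY_ID" or . == "AWS_SECRET_ACCESS_KEY"
                      or . == "ECS_ENGINE_AUTH_DATA"))
             then "ECS.8 \($n): credential key in environment" else empty end),
          (if .logConfiguration == null
             then "ECS.9 \($n): no logConfiguration" else empty end),
          (if ($win | not) and ($u == "" or $u == "root" or $u == "0"
                or ($u | startswith("0:")) or ($u | startswith("root:")))
             then "ECS.20 \($n): user unset or root" else empty end),
          (if $win and ($u == "" or ($u | ascii_downcase) == "containeradministrator")
             then "ECS.21 \($n): user unset or containeradministrator" else empty end))'
}

# Usage:
#   aws ecs describe-task-definition --task-definition checkout-api --output json | ecs_findings
```

Each finding is a separate line prefixed with its control ID, so the output can be piped into `sort | uniq -c` across a whole account to see which control dominates.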

Task definitions are immutable, so every fix has the same shape: register a new revision with the corrected fields, then update the service to that revision so the next deployment replaces the running tasks. Registering a revision restarts nothing, which is why a control can turn green (it evaluates the latest revision) while insecure tasks from the old revision keep running until you redeploy. ECS.5 is reported NOT_APPLICABLE for Windows containers; ECS.20 only evaluates Linux task definitions and ECS.21 only Windows ones; ECS.16 is the task-set sibling of the service-level public-IP check (ECS.2), because task sets carry their own network configuration during blue-green and external deployments.
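Because a green control can coexist with tasks still running an old revision, it helps to list those stragglers explicitly. This is a sketch assuming the AWS CLI (with credentials) and jq; `stale_tasks` and `filter_stale` are hypothetical names. It compares each running task's `taskDefinitionArn` with the service's current `taskDefinition`.

```shell
# Hypothetical sketch: find tasks still running an older task-definition revision than the
# service's current one. Requires the AWS CLI (with credentials) and jq.

# Pure filter: reads `aws ecs describe-tasks` JSON on stdin and prints
# "<taskArn><TAB><taskDefinitionArn>" for every task not on the current revision.
filter_stale() {
  jq -r --arg current "$1" \
    '.tasks[] | select(.taskDefinitionArn != $current)
     | "\(.taskArn)\t\(.taskDefinitionArn)"'
}

stale_tasks() {
  cluster=$1
  service=$2
  # The revision the service is currently deploying.
  current=$(aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query 'services[0].taskDefinition' --output text) || return 1
  tasks=$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --query 'taskArns[]' --output text) || return 1
  if [ -z "$tasks" ] || [ "$tasks" = "None" ]; then
    return 0
  fi
  # $tasks is left unquoted on purpose so each ARN becomes its own argument.
  # describe-tasks accepts at most 100 ARNs per call; larger services need batching.
  aws ecs describe-tasks --cluster "$cluster" --tasks $tasks --output json \
    | filter_stale "$current"
}

# Usage: empty output means every running task is on the current revision.
#   stale_tasks prod checkout-api
```

During a rolling deployment some stale tasks are expected; the check is most useful once the deployment has settled.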

ECS.8 needs particular care. The secrets field resolves a value from Secrets Manager or SSM Parameter Store at launch using the task EXECUTION role (not the task role), so the execution role needs secretsmanager:GetSecretValue (or ssm:GetParameters for Parameter Store), plus kms:Decrypt if the value is encrypted with a customer-managed key. Any credential that ever sat in the environment block must be rotated, not just relocated, because older revisions still contain it. Beyond fixing what exists, the strongest position is preventive: bake non-root users, read-only filesystems, the secrets pattern, log configuration and Container Insights into a shared task-definition template, and add a CI policy check (cfn-guard or OPA) that rejects any non-compliant definition before it deploys, with AWS Config rules flagging anything that reaches the account another way.
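The execution-role grant for the secrets field can be written as an inline IAM policy. This is a minimal sketch assuming a Secrets Manager secret; `secret_read_policy` is a hypothetical helper, and the role name, policy name and key ARN in the usage comment are placeholders. The kms:Decrypt statement is only emitted when a customer-managed key ARN is passed; a Parameter Store value would need ssm:GetParameters instead.

```shell
# Hypothetical helper: print an IAM policy that lets a task execution role read one secret.
# $1 = secret ARN, $2 = optional customer-managed KMS key ARN (omit for the AWS-managed key).
secret_read_policy() {
  secret_arn=$1
  key_arn=${2:-}
  kms_statement=""
  if [ -n "$key_arn" ]; then
    kms_statement=",
    { \"Effect\": \"Allow\", \"Action\": \"kms:Decrypt\", \"Resource\": \"$key_arn\" }"
  fi
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "secretsmanager:GetSecretValue", "Resource": "$secret_arn" }$kms_statement
  ]
}
EOF
}

# Usage (role name, policy name and key ARN are placeholders):
#   aws iam put-role-policy --role-name checkoutTaskExecutionRole \
#     --policy-name read-checkout-db-secret \
#     --policy-document "$(secret_read_policy \
#       arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/checkout/db-AbCdEf \
#       arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab)"
```

Scoping the Resource to the single secret ARN keeps the execution role from reading every secret in the account.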

What is the impact of leaving containers unhardened?

The direct impact is a wider blast radius on any compromise. A container running as root, with a writable filesystem, the privileged flag or a shared host process namespace gives an attacker who lands a foothold the tools to escalate, persist and break out onto the host that runs your other tasks. Read-only filesystems and non-root users remove most of the post-exploitation playbook; isolating the process namespace and turning off the privileged flag keep a single compromised service from reaching the host and its neighbours.

The second-order impact is credential and data exposure. A plaintext AWS key in an environment block is readable by anyone who can describe the task definition, and leaked long-lived keys are a common root cause of expensive account compromise; automated scanners can find a leaked key within hours. A task set that assigns public IPs quietly puts a backend on the internet during a blue-green cutover even when the service itself looks locked down. Each of these is a path from a misconfiguration to a real incident.

There is an observability impact too: a container with no log configuration is a black box when it crashes, and a cluster without Container Insights is one you pay for but cannot see into until an incident forces the question. On the compliance side these controls map to NIST SP 800-53 access-control and audit-and-accountability controls and to PCI DSS requirements, so an open backlog drags down the posture score, surfaces in SOC 2 and PCI assessments, and becomes friction in enterprise procurement, whether or not a breach ever occurs.

How do you harden containers safely?

Work the capability as one loop rather than chasing individual findings. The order matters: work out what each container legitimately needs (to write, to log, to read as a secret) before you tighten it, so you do not break a running service.

1. Inventory every task definition, task set and cluster

Across accounts and regions, list the latest active revision of each task definition and flag containers that run as root or containeradministrator, set privileged true, set pidMode host, lack readonlyRootFilesystem, carry credential keys in the environment block, or have no logConfiguration. List task sets with assignPublicIp ENABLED and clusters without Container Insights. Prioritise internet-facing and high-privilege services first; remember ECS.20 is Linux only and ECS.21 Windows only.
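Task sets and clusters need their own queries alongside the task-definition scan. The sketch below assumes the AWS CLI and jq; the function names are hypothetical. It treats a containerInsights value of either `enabled` or `enhanced` (the enhanced-observability setting) as on, which is an assumption worth checking against how ECS.12 evaluates in your account. Task sets only appear in `describe-services` output for services that use them (external or blue-green deployments).

```shell
# Hypothetical sketch (requires jq): the task-set and cluster checks from the inventory step.

# ECS.16: reads `aws ecs describe-services` JSON; prints task sets that assign public IPs.
public_task_sets() {
  jq -r '.services[].taskSets[]?
    | select(.networkConfiguration.awsvpcConfiguration.assignPublicIp == "ENABLED")
    | .taskSetArn'
}

# ECS.12: reads `aws ecs describe-clusters --include SETTINGS` JSON; prints clusters
# whose containerInsights setting is neither "enabled" nor "enhanced".
clusters_without_insights() {
  jq -r '.clusters[]
    | select([.settings[]? | select(.name == "containerInsights") | .value]
             | any(. == "enabled" or . == "enhanced") | not)
    | .clusterName'
}

# Drive both filters across one region (requires the AWS CLI with credentials).
scan_region() {
  region=$1
  for c in $(aws ecs list-clusters --region "$region" --query 'clusterArns[]' --output text); do
    aws ecs describe-clusters --region "$region" --clusters "$c" --include SETTINGS \
      --output json | clusters_without_insights
    for s in $(aws ecs list-services --region "$region" --cluster "$c" \
        --query 'serviceArns[]' --output text); do
      aws ecs describe-services --region "$region" --cluster "$c" --services "$s" \
        --output json | public_task_sets
    done
  done
}

# Usage: for r in us-east-1 eu-west-1; do scan_region "$r"; done
```

Keeping the jq filters separate from the AWS calls means they can be tested against saved JSON without touching an account.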

2. Work out each container's legitimate needs before tightening

Before flipping flags, find out what each container actually needs: which paths it writes (for a narrow tmpfs or volume mount under a read-only root), whether it genuinely needs the privileged flag or host pidMode (almost never), and which environment values are really secrets. For ECS.8, treat any value that ever lived in the environment block as leaked and rotate it before relocating. Apps that never write to the root filesystem and carry no secrets in env can be hardened with no other change.
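To decide which environment values are really secrets, a first pass can list variable names that look sensitive. This sketch assumes jq and reads `describe-task-definition` JSON; `env_secret_candidates` is a hypothetical name and the name pattern is a heuristic assumption, so it will miss secrets with bland names and flag some harmless ones.

```shell
# Hypothetical helper: list environment variable NAMES (never values) that look like secrets.
# Input on stdin: JSON from `aws ecs describe-task-definition`. Requires jq.
# The regex below is a heuristic, not the list Security Hub's ECS.8 check uses.
env_secret_candidates() {
  jq -r '.taskDefinition.containerDefinitions[]
    | .name as $n
    | .environment[]?
    | select(.name | test("password|passwd|secret|token|api_?key|access_key|credential"; "i"))
    | "\($n)\t\(.name)"'
}

# Usage:
#   aws ecs describe-task-definition --task-definition checkout-api --output json \
#     | env_secret_candidates
```

Printing only names keeps the review itself from copying plaintext secrets into terminals and CI logs.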

3. Register hardened revisions and redeploy, highest impact first

Set a non-root user, readonlyRootFilesystem true with narrow writable mounts, privileged false, pidMode unset or task, and a logConfiguration block, and move secrets to the secrets field referencing a Secrets Manager or SSM ARN (granting the execution role read access, plus kms:Decrypt where a customer-managed key is used). Replace any task set that assigns public IPs with one created with assignPublicIp DISABLED, since a task set's network configuration cannot be changed after creation, and enable Container Insights on clusters. Because task definitions are immutable, register a new revision and redeploy the service; roll out one service as a canary, confirm healthy tasks and no Read-only file system errors, then proceed.
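A complete hardened revision might look like the file below. This is a sketch assuming Fargate; the image URI, account ID, role ARN, log group and region are placeholders, `write_hardened_taskdef` and `close_exposure` are hypothetical helpers, and the subnet and security-group IDs are invented. The awslogs group must already exist, and it is worth confirming in the canary that UID 1000 can actually write to the `/tmp` mount in your image.

```shell
# Hypothetical example: write a hardened Fargate task definition for checkout-api.
# Every ARN, ID, image URI and name below is a placeholder.
write_hardened_taskdef() {
  cat > "$1" <<'EOF'
{
  "family": "checkout-api",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/checkoutTaskExecutionRole",
  "volumes": [{ "name": "scratch" }],
  "containerDefinitions": [
    {
      "name": "checkout-api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/checkout-api:1.0.0",
      "essential": true,
      "user": "1000:1000",
      "readonlyRootFilesystem": true,
      "mountPoints": [
        { "sourceVolume": "scratch", "containerPath": "/tmp", "readOnly": false }
      ],
      "secrets": [
        { "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/checkout/db-AbCdEf" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/checkout-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "checkout"
        }
      }
    }
  ]
}
EOF
}

# Cluster- and task-set-level fixes (ECS.12, ECS.16). Not run automatically.
close_exposure() {
  aws ecs update-cluster-settings --cluster prod \
    --settings name=containerInsights,value=enabled
  # update-task-set only changes scale, so create a replacement task set with public IPs
  # disabled, shift traffic to it, then delete the old one.
  aws ecs create-task-set --cluster prod --service checkout-api \
    --task-definition checkout-api \
    --network-configuration 'awsvpcConfiguration={subnets=[subnet-0abc],securityGroups=[sg-0abc],assignPublicIp=DISABLED}'
}

# Usage: write_hardened_taskdef checkout-api-hardened.json
```

privileged and pidMode are simply omitted here: an absent privileged flag passes ECS.4, and an absent pidMode never shares the host namespace.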

4. Gate the pipeline so the failing configuration can't ship again

Cleanup without prevention just resets the clock. Bake the hardened settings into a shared task-definition template or IaC module, and add a CI policy check (cfn-guard or OPA) that rejects any task definition with a root user, a writable root, a privileged container, host pidMode, a credential in env or no logging, with an AWS Config rule or conformance pack flagging anything that reaches the account another way. Make the secure choice the default choice so engineers get it for free.
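Where cfn-guard or OPA is not yet wired in, a pipeline step can reject a task definition file with a single jq predicate. This is a Linux-oriented sketch that assumes jq and the register-task-definition input format; `check_taskdef` is a hypothetical name, and it does not cover ECS.21 or replace a real policy engine.

```shell
# Hypothetical CI gate: exit 0 only if the task definition file passes the checks this lesson
# covers (Linux containers). Input is the register-task-definition JSON format. Requires jq.
check_taskdef() {
  jq -e '
    (.pidMode != "host")
    and all(.containerDefinitions[];
      .privileged != true
      and .readonlyRootFilesystem == true
      and .logConfiguration != null
      and ((.user // "") as $u
           | $u != "" and $u != "root" and $u != "0"
             and ($u | startswith("0:") | not)
             and ($u | startswith("root:") | not))
      and ([.environment[]?.name]
           | any(. == "AWS_ACCESS_KEY_ID" or . == "AWS_SECRET_ACCESS_KEY"
                 or . == "ECS_ENGINE_AUTH_DATA")
           | not))' "$1" >/dev/null
}

# Usage in a pipeline step:
#   check_taskdef checkout-api-hardened.json \
#     || { echo "task definition fails hardening policy" >&2; exit 1; }
```

jq -e sets the exit status from the final boolean, so the check composes with any CI system that fails a step on a non-zero exit.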

# Inventory: flag containers that run as root (or set no user) or have a writable root filesystem.
# Passing a family name describes that family's latest ACTIVE revision.
for fam in $(aws ecs list-task-definition-families --status ACTIVE \
    --query 'families[]' --output text); do
  aws ecs describe-task-definition --task-definition "$fam" \
    --query "taskDefinition.containerDefinitions[?user==\`null\` || user=='root' || user=='0' || starts_with(user, '0:') || readonlyRootFilesystem!=\`true\`].{Family:'$fam',Name:name,User:user,ReadOnly:readonlyRootFilesystem}" \
    --output text
done

# Harden at the source. Dockerfile (Alpine/BusyBox syntax; pin UID/GID 1000 to match the task definition):
#   RUN addgroup -S -g 1000 app && adduser -S -u 1000 -G app appuser
#   USER appuser
# Task definition: non-root user, read-only root with one narrow writable volume, secrets via ARN.
#   "user": "1000:1000",
#   "readonlyRootFilesystem": true,
#   "mountPoints": [{ "sourceVolume": "scratch", "containerPath": "/tmp", "readOnly": false }],
#   "secrets": [{ "name": "DB_PASSWORD",
#     "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/checkout/db-AbCdEf" }]
# At task level, declare the volume the mount point references:
#   "volumes": [{ "name": "scratch" }]

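# Register the hardened revision and roll it out (running tasks only change on redeploy).
aws ecs register-task-definition --cli-input-json file://checkout-api-hardened.json
# A bare family name resolves to the latest ACTIVE revision, i.e. the one just registered.
aws ecs update-service --cluster prod --service checkout-api \
  --task-definition checkout-api --force-new-deployment
# Block until the new tasks are steady before moving on to the next service.
aws ecs wait services-stable --cluster prod --services checkout-api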


You can now treat ECS hardening as one capability rather than a scatter of findings: inventory each task definition, task set and cluster, work out what each container legitimately needs, register hardened revisions and redeploy highest-impact first, and gate the pipeline so the failing configuration can't ship again. The Controls this lesson covers section below links every control in this group to its deep page and fix.


Controls this lesson covers

One capability, many AWS Security Hub controls. This lesson is the shared playbook; each control below keeps its own deep page with the exact check, severity and a copy-and-paste fix.