Skip to main content
emnode / learn
Compliance

Enable event replication on EventBridge global endpoints

Security Hub EventBridge.4 — a global endpoint without event replication can't recover from a Regional failover on its own, leaving you to fix it by hand at the worst possible moment.

11 min·10 sections·AWS

Last reviewed

Remediates AWS Security Hub: EventBridge.4

EventBridge global endpoints: the basics

Why a failover-ready endpoint still needs replication switched on

An EventBridge global endpoint is the front door that makes an event-driven application Region-fault-tolerant. You point your publishers at one endpoint, attach a Route 53 health check, and configure a primary and a secondary Region. If the health check flips to unhealthy, EventBridge automatically reroutes all custom events to the event bus in the secondary Region within minutes — no code change, no manual cutover. It's the resilience pattern AWS recommends for any workload that can't tolerate a single Region going dark.

EventBridge.4 checks one specific knob on that endpoint: whether event replication is enabled. With replication on, EventBridge uses managed rules to send every custom event to the event buses in both the primary and the secondary Region, all the time. The control fails — flagging the AWS::Events::Endpoint resource — whenever a global endpoint exists with replication turned off.

It's flagged Medium because the gap is invisible until you actually need it. An endpoint with replication off still routes events normally and still fails over when the health check trips. The catch surfaces afterward: without replication, EventBridge can't automatically recover, so getting traffic back to the primary Region requires someone to manually reset the Route 53 health check to healthy. That's exactly the kind of manual step you don't want sitting on the critical path during an incident.

In this lesson you'll learn what an EventBridge global endpoint does, how Route 53 health checks drive automatic failover to a secondary Region, and why event replication is the difference between a system that recovers on its own and one that needs a manual health-check reset. You'll see the AWS CLI commands to find global endpoints with replication off and to inspect their configuration, the requirement that custom event buses share a name across Regions, the small cost replication adds, and how to enforce the setting so new endpoints arrive compliant.

Fun fact

Failover worked. Failback didn't.

A payments team built a textbook multi-Region setup: global endpoint, Route 53 health check, primary in us-east-1, secondary in us-west-2. During a real Regional event the failover fired exactly as designed — events rerouted to the secondary bus within minutes, and customers never noticed. The surprise came after the Region recovered. Because event replication had never been enabled, EventBridge couldn't auto-reset, so traffic stayed pinned to the secondary Region until an on-call engineer manually flipped the Route 53 health check back to healthy hours later. The outage they'd engineered around lasted minutes; the extra manual hours of running on the backup Region were entirely self-inflicted by one disabled setting.

Turning on event replication in action

Devon owns the events platform at a logistics company. Security Hub flags EventBridge.4 on the orders-global endpoint — a global endpoint fronting the order-processing event bus, with replication disabled. The endpoint has been live for eight months and has never been exercised by a real failover.

He confirms the topology first: a Route 53 health check is attached, the primary is us-east-1 and the secondary is us-west-2, and crucially both Regions have a custom event bus named orders-bus in the same account — the precondition for failover to work at all. Without matching bus names across Regions, enabling replication wouldn't fix anything.

Devon enables event replication on the endpoint via the EventBridge console — Edit endpoint, then select Event replication enabled — and re-runs the check. EventBridge now publishes every custom event to both Regions' buses through managed rules. He notes the small bump in EventBridge cost in the next month's bill as expected, and documents that orders-global will now recover automatically after a failover instead of waiting on a manual health-check reset.

First, list every EventBridge global endpoint and whether event replication is enabled, so you know which ones the control will flag.

$ aws events list-endpoints --query 'Endpoints[].{Name:Name,State:State,Replication:ReplicationConfig.State}' --output table
-------------------------------------------------
| ListEndpoints |
+-----------------+-----------+-----------------+
| Name | State | Replication |
+-----------------+-----------+-----------------+
| orders-global | ACTIVE | DISABLED |
| billing-global | ACTIVE | ENABLED |
+-----------------+-----------+-----------------+
# orders-global has replication DISABLED — it can't auto-recover after failover.

An inventory of global endpoints with their replication state — DISABLED is exactly what EventBridge.4 fails on.

Inspect the flagged endpoint to confirm its Route 53 health check and the primary/secondary event buses before enabling replication.

$ aws events describe-endpoint --name orders-global --query '{Replication:ReplicationConfig.State,HealthCheck:RoutingConfig.FailoverConfig.Primary.HealthCheck,Primary:EventBuses[0].EventBusArn,Secondary:EventBuses[1].EventBusArn}'
{
"Replication": "DISABLED",
"HealthCheck": "arn:aws:route53:::healthcheck/abc123",
"Primary": "arn:aws:events:us-east-1:111122223333:event-bus/orders-bus",
"Secondary": "arn:aws:events:us-west-2:111122223333:event-bus/orders-bus"
}
# Same bus name 'orders-bus' in both Regions — failover precondition is met.

Confirm the health check exists and both Regions use a same-named custom bus before flipping replication on.

Global endpoints and replication under the hooddeep dive

A global endpoint is an AWS::Events::Endpoint resource that wraps a pair of custom event buses — one in a primary Region, one in a secondary Region — behind a single endpoint ID. Publishers call PutEvents against the endpoint instead of a specific bus. The endpoint's RoutingConfig.FailoverConfig references a Route 53 health check; while that check reports healthy, events flow to the primary bus, and when it flips to unhealthy, EventBridge reroutes all custom events to the secondary bus within minutes. For this to work, both Regions must have a custom event bus with the same name in the same account.

Event replication is controlled by ReplicationConfig.State, which is either ENABLED or DISABLED. When enabled, EventBridge provisions managed rules that copy every custom event to the buses in both Regions continuously — not just during failover. That ongoing dual-write does two things: it continuously proves the secondary path is correctly wired (you'd notice immediately if events stopped arriving there), and it's the mechanism EventBridge relies on to automatically reset the Route 53 health check to healthy and fail traffic back to the primary once the Region recovers.

Without replication, the failover-out path still works — the health check trips and events route to the secondary — but the failback-in path is manual. There's no continuous secondary stream for EventBridge to confirm health against, so an operator has to manually mark the Route 53 health check healthy to return traffic to the primary Region. Replication trades a small per-event cost (events are billed in both Regions) for removing that manual step from the recovery runbook.

# Find every global endpoint with replication disabled across the account.
aws events list-endpoints \
  --query 'Endpoints[?ReplicationConfig.State==`DISABLED`].[Name,Arn]' \
  --output table

# Inspect one endpoint's full routing and replication configuration.
aws events describe-endpoint --name orders-global \
  --query '{Replication:ReplicationConfig.State,Routing:RoutingConfig,Buses:EventBuses}'

# Confirm both Regions actually have the same-named custom bus (failover precondition).
aws events describe-event-bus --name orders-bus --region us-east-1 --query 'Arn'
aws events describe-event-bus --name orders-bus --region us-west-2 --query 'Arn'

What is the impact of disabled event replication?

The headline impact is a broken recovery path. With replication off, your global endpoint will fail over to the secondary Region correctly, but it cannot automatically fail back. Returning traffic to the primary Region requires an operator to manually reset the Route 53 health check to healthy. That means extra unplanned time running on the secondary Region — time that depends on someone noticing, having the right access, and executing the reset correctly under incident pressure.

The second impact is loss of continuous validation. Replication's ongoing dual-write to both Regions is a built-in canary: if the secondary path is misconfigured, you find out during normal operation, not during an outage. Without it, your failover configuration is untested in practice, and the first time it's exercised for real is the worst possible time to discover that the secondary bus name doesn't match or a permission is missing.

There is a real cost to enabling replication, and it's worth naming honestly: because events are sent to event buses in both Regions, you pay EventBridge ingestion on both, increasing the monthly bill. AWS calls this out explicitly. For a high-volume endpoint that's a measurable line item, not a rounding error — which is exactly why the trade-off deserves a deliberate decision rather than a default.

Finally, there's a compliance and audit dimension. EventBridge.4 maps to NIST 800-53 high-availability and resilience controls (CP-10, SC-5(2), SI-13(5)). A failing control on a system you've designated business-critical is the kind of finding that surfaces in audits and disaster-recovery reviews — and 'we have failover but it doesn't auto-recover' is a hard position to defend when the whole point of the architecture was resilience.

How do you remediate EventBridge.4 safely?

Remediation is a four-step loop: confirm the endpoint is genuinely critical, verify the failover preconditions, enable replication, then enforce the setting so new endpoints arrive compliant.

1. Confirm the endpoint matters before paying for replication

Replication adds recurring cost, so start by classifying the endpoint. Is this global endpoint fronting a business-critical, low-RTO workload, or a low-stakes internal one? For critical systems, automatic recovery is worth the per-event charge in both Regions; for genuinely non-critical ones you may consciously accept the finding with a documented exception. Don't blanket-enable across every endpoint without this triage.

2. Verify the failover preconditions are actually met

Before enabling replication, confirm the topology is sound: a Route 53 health check is attached to the endpoint, and both the primary and secondary Regions have a custom event bus with the same name in the same account. AWS calls this out explicitly — failover only works with matching same-named buses across Regions. Enabling replication on a misconfigured endpoint won't deliver the recovery you expect.

3. Enable event replication on the endpoint

Set the endpoint's replication to enabled. In the EventBridge console, edit the global endpoint and under Event replication choose 'Event replication enabled', as the AWS remediation guidance describes. EventBridge then provisions managed rules that copy every custom event to both Regions' buses, restoring the automatic failback path and giving you continuous validation that the secondary side is wired correctly.

4. Enforce replication on new endpoints with Config and IaC

Stop the finding from recurring by codifying it. Bake replication-enabled into your CloudFormation, CDK, or Terraform module for global endpoints so every new one ships compliant, and keep the AWS Config rule global-endpoint-event-replication-enabled in scope so any drift or hand-built endpoint is caught automatically. The control re-evaluates on change, so a non-compliant endpoint surfaces within minutes of creation.

# Enable event replication on an existing global endpoint.
# (Re-supply the existing routing config so it isn't cleared on update.)
aws events update-endpoint \
  --name orders-global \
  --replication-config State=ENABLED \
  --routing-config '{"FailoverConfig":{"Primary":{"HealthCheck":"arn:aws:route53:::healthcheck/abc123"},"Secondary":{"Route":"us-west-2"}}}' \
  --event-buses '[{"EventBusArn":"arn:aws:events:us-east-1:111122223333:event-bus/orders-bus"},{"EventBusArn":"arn:aws:events:us-west-2:111122223333:event-bus/orders-bus"}]'

# Verify replication is now enabled.
aws events describe-endpoint --name orders-global \
  --query 'ReplicationConfig.State'

Quick quiz

Question 1 of 5

A business-critical global endpoint fails over to its secondary Region correctly during a real outage, but EventBridge.4 is failing on it. After the primary Region recovers, what behavior should you expect, and what's the right fix?

You've completed Enable event replication on EventBridge global endpoints. You now know what a global endpoint does, why Route 53 health checks drive automatic failover, and why event replication is the difference between a system that recovers on its own and one that needs a manual health-check reset. You've also seen the failover precondition — same-named custom buses in both Regions — and the deliberate cost trade-off replication carries. The next time EventBridge.4 flags an endpoint, you'll know how to triage, verify, enable, and enforce in one pass.

Back to the library