Securing VPN connections: the basics
What does "a healthy VPN" actually mean across the four checks?
An AWS Site-to-Site VPN is an encrypted IPsec link between your VPC and a remote network (a data center, a branch office, or another cloud), and AWS always provisions it as two redundant tunnels in two Availability Zones. A Client VPN is the managed remote-access service that lets users tunnel into your VPCs from the internet. Securing them is one capability that AWS expresses through four checks. EC2.183 fails any Site-to-Site tunnel whose accepted IKE versions still include the deprecated IKEv1. EC2.20 fails a Site-to-Site connection where either of its two tunnels is DOWN, because a connection running on one tunnel has already spent its safety margin. EC2.171 fails a Site-to-Site connection whose tunnels do not log session events to CloudWatch, and EC2.51 fails a Client VPN endpoint with connection logging switched off.
They look like four separate problems on the report, but they are one capability: the encrypted doors into your network should use current cryptography, fail over without you noticing, and record what they did. The crypto check (IKEv1 versus IKEv2) is about the strength and modernity of the handshake. The tunnel-state check is about resilience: redundancy you are paying for that has silently degraded to one-of-one. The two logging checks are about accountability: the ability to explain what a tunnel did, and who connected through it, during an incident or an audit.
Most of these are a single setting that ships off by default. Tunnel logging is opt-in, so every Site-to-Site VPN created in the console fails EC2.171 until someone turns it on; Client VPN connection logging is the same. IKEv1 is usually a default someone clicked through years ago and never revisited. The one exception is a downed tunnel, which is a real fault to diagnose. The job is to inventory every VPN, lock the tunnels to IKEv2, restore and harden any downed tunnel, turn logging on, and guardrail the lot so new connections are born secure.
In this lesson you will learn how AWS expresses VPN security across Site-to-Site and Client VPN, how to find every connection that uses deprecated crypto, runs on a single tunnel, or logs nothing, and how to remediate each without dropping more than the brief renegotiation needs. The "Controls this lesson covers" section lists every Security Hub control in this capability, each linking to a deep page with the exact check and a copy-and-paste fix.
The tunnel that was down for 90 days
A retail company ran a Site-to-Site VPN to its on-prem order-management system. One tunnel dropped during a routine firewall change at the data center and never came back, because the change had narrowed an ACL so only one of the two AWS tunnel endpoints could reach the customer gateway. Nothing broke, because the surviving tunnel carried all the traffic, so no alarm fired and no ticket was opened. Ninety-one days later AWS took the endpoint hosting the surviving tunnel down for scheduled, announced maintenance, with automatic failover assumed, and the entire hybrid order flow went dark for forty minutes because there was no second tunnel to fail over to. The tunnel-state finding had been red the entire time; nobody was looking at it.
Auditing VPN health across an estate
Marco owns the network platform at a logistics company. Security Hub flags a cluster of VPN findings across several Site-to-Site connections: one with a tunnel DOWN, two still accepting IKEv1, and a batch with logging off across both tunnels.
Rather than work the findings one by one, he starts by reading the live telemetry for every connection, so he can see at a glance which tunnels are up, which crypto they negotiated, and which are dark, before changing anything.
Inventory every Site-to-Site VPN and the live status of each of its two tunnels. A tunnel DOWN for weeks is the worst case: redundancy has silently gone.
VgwTelemetry exposes per-tunnel status, the last status change, and a message pointing at the cause.
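The per-tunnel read can be sketched in shell as follows. This is a minimal sketch, assuming the AWS CLI is configured for the account and region in question; `tunnel_rows` and `down_tunnels` are hypothetical helper names rather than AWS tooling, and the `--query` path uses the VgwTelemetry fields named above.

```shell
# Hypothetical helper: print one tab-separated row per tunnel of a connection:
# outside IP, status, last status change, status message.
# $1 = VPN connection ID.
tunnel_rows() {
  aws ec2 describe-vpn-connections \
    --vpn-connection-ids "$1" \
    --query 'VpnConnections[0].VgwTelemetry[].[OutsideIpAddress,Status,LastStatusChange,StatusMessage]' \
    --output text
}

# Hypothetical helper: reduce rows on stdin to the tunnels reporting DOWN.
down_tunnels() {
  tab="$(printf '\t')"
  while IFS="$tab" read -r ip status changed message; do
    if [ "$status" = "DOWN" ]; then
      printf '%s DOWN since %s (%s)\n' "$ip" "$changed" "$message"
    fi
  done
}
```

Usage would look like `tunnel_rows vpn-0a1b2c3d4e5f6a7b8 | down_tunnels`. Keeping the filter separate from the API call means the same filter can be run over saved output from several connections.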
How AWS evaluates a healthy VPN
Three of the four checks read fields on the tunnel options, which is why the fix is usually two API calls per Site-to-Site connection (one per tunnel). EC2.183 fails whenever a tunnel's IkeVersions list contains ikev1, regardless of which version the live session actually negotiated, so leaving ikev1 in the list keeps the finding open because the configuration still permits a downgrade. EC2.171 fails whenever a tunnel's LogOptions.CloudWatchLogOptions.LogEnabled is false; both tunnels must log for the connection to pass. EC2.20, backed by the Config rule vpc-vpn-2-tunnels-up, reads the per-tunnel Status inside VgwTelemetry and fails if either reports DOWN. EC2.51 reads a single field on a Client VPN endpoint, ConnectionLogOptions.Enabled.
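The three per-tunnel rules can be written down as a small shell function. This is a sketch that mirrors the pass/fail logic as this lesson describes it, not the controls' actual implementation; `tunnel_findings` is a hypothetical name, and its arguments are assumed to have been read from the tunnel's Status, IkeVersions, and LogEnabled fields beforehand.

```shell
# Hypothetical helper: print the controls a single tunnel would fail
# under the rules described above.
# $1 = VgwTelemetry Status (UP or DOWN)
# $2 = space-separated accepted IKE versions, e.g. "ikev1 ikev2"
# $3 = LogEnabled (true or false; the CLI's text output prints True/False)
tunnel_findings() {
  findings=""
  if [ "$1" = "DOWN" ]; then
    findings="$findings EC2.20"
  fi
  # The configured list matters, not the version the live session negotiated.
  case " $2 " in
    *" ikev1 "*) findings="$findings EC2.183" ;;
  esac
  case "$3" in
    true|True) ;;
    *) findings="$findings EC2.171" ;;
  esac
  # Strip the leading space; an empty line means the tunnel passes all three.
  echo "${findings# }"
}
```

A connection passes only when both of its tunnels print an empty line, which is why each remediation is applied once per tunnel.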
A tunnel is UP when IKE Phase 1 and Phase 2 are established and, for dynamic routing, the BGP session is in Established state. The common silent-drop causes are an on-prem firewall blocking UDP 500/4500 to one endpoint after a change, an idle SA expiring on a low-traffic link, and StartupAction left at add so AWS waits passively for the customer gateway to initiate instead of starting the handshake itself. Connection logging is distinct from packet-level data: VPN tunnel logs capture IKE and BGP session events, Client VPN connection logs capture who connected from where and when, and neither is the same as VPC flow logs, so satisfying one logging control does not satisfy another.
Most of these are evaluated through AWS Config, either periodically (every 12 hours) or on configuration change (the Client VPN logging check), so a fix clears at the next evaluation. Changing a tunnel's IKE version or log options triggers a brief in-place renegotiation: the IPsec SA is rebuilt and, for BGP-dynamic VPNs, the route reconverges on the other tunnel in 30 to 60 seconds. That is why you change one tunnel at a time, so the second keeps carrying traffic throughout.
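One way to enforce the one-tunnel-at-a-time rule is a gate that polls the changed tunnel until it reports UP before the second tunnel is touched. The sketch below assumes a configured AWS CLI; `tunnel_status`, `wait_for_tunnel_up`, and the `STATUS_CMD` and `POLL_SECONDS` variables are hypothetical names introduced for illustration, and the defaults of 20 attempts and 15 seconds are arbitrary.

```shell
# Hypothetical helper: print the current status (UP or DOWN) of one tunnel.
# $1 = VPN connection ID, $2 = tunnel outside IP address.
tunnel_status() {
  aws ec2 describe-vpn-connections \
    --vpn-connection-ids "$1" \
    --query "VpnConnections[0].VgwTelemetry[?OutsideIpAddress=='$2'].Status | [0]" \
    --output text
}

# Hypothetical helper: poll until the tunnel reports UP, or give up after
# $3 attempts (default 20). STATUS_CMD can name a stand-in command for a
# dry run; it defaults to tunnel_status.
wait_for_tunnel_up() {
  attempts="${3:-20}"
  while [ "$attempts" -gt 0 ]; do
    if [ "$(${STATUS_CMD:-tunnel_status} "$1" "$2")" = "UP" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-15}"
  done
  return 1
}
```

A change script would call `wait_for_tunnel_up` after modifying the first tunnel and stop if it returns non-zero, leaving the second tunnel untouched and still carrying traffic.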
What is the impact of an insecure VPN?
The resilience impact is the most insidious because nothing looks broken. A Site-to-Site connection with one tunnel DOWN still carries all its traffic on the survivor, so it generates no urgency, yet it has gone from two-of-two to one-of-one with no failover path. AWS schedules and announces endpoint maintenance on the assumption that the second tunnel carries traffic; if it is already down, that routine maintenance becomes your outage, and the blast radius is whatever rides the link: on-prem databases, Active Directory, internal APIs, file shares.
The cryptographic impact is about handshake strength. IKEv1 dates from 1998, has known PSK-mode weaknesses in aggressive mode, and historically allowed weaker algorithms; a tunnel that still accepts it is one customer-gateway misconfiguration from negotiating something an assessor will mark substandard, and without MOBIKE it forces a full fresh handshake on every ISP IP change rather than carrying the session across. The accountability impact is forensic blindness: connection logs are how you answer "who was on the VPN during the incident, and from where?" and "why did the tunnel drop?" Without them those questions have no answer, and unlike most gaps this one cannot be closed after the fact, because logs are not retroactive.
On the compliance side, these map to NIST high-availability and contingency-planning controls (CP-10, SC-5), NIST SP 800-77 which deprecates IKEv1 for new deployments, and the audit and access-control logging requirements (AU-2, AU-3, AU-12) in NIST 800-53, NIST 800-171 and PCI DSS. A persistent finding here is evidence that the stated security and availability posture and the actual running configuration have drifted apart, which is exactly what surfaces in a SOC 2 review or a customer security questionnaire.
How do you secure VPN connections safely?
Work the capability as one loop rather than chasing individual findings. The order matters: verify the far side and pick a low-traffic window before you change a tunnel, and change one tunnel at a time so the other keeps carrying traffic.
1. Inventory every VPN and read its telemetry and options
List every Site-to-Site VPN and Client VPN endpoint across every region. For each Site-to-Site connection, read VgwTelemetry for per-tunnel status, the IkeVersions list, and the LogOptions; for each Client VPN endpoint, read ConnectionLogOptions.Enabled. A tunnel DOWN with a LastStatusChange weeks in the past is the highest priority, because the connection has been single-tunnel that whole time.
2. Diagnose downed tunnels and verify the far side before changing crypto
For a DOWN tunnel, the StatusMessage and timing usually point at the cause: an on-prem firewall now blocking UDP 500/4500, an idle SA with no keepalive traffic, or StartupAction left at add. Fix the root cause first. Before locking a tunnel to IKEv2, confirm the customer gateway supports it (every modern firewall does; very old industrial gear may not, in which case the right answer is to replace the gateway, not leave the VPN deprecated).
3. Remediate one tunnel at a time, in a low-traffic window
Apply modify-vpn-tunnel-options to one tunnel, watch BGP reconverge on the other, confirm the changed tunnel comes back UP, then do the second. Lock IkeVersions to ikev2 only, enable LogOptions pointed at a CloudWatch log group with a retention policy, and set StartupAction to start so AWS initiates IKE itself. For Client VPN, create a log group with retention and point the endpoint's ConnectionLogOptions at it. Generate steady traffic or a low-rate health check across each link so the SA never idles out.
4. Alarm and enforce so the posture maintains itself
Publish a CloudWatch alarm on the AWS/VPN TunnelState metric per tunnel wired to on-call, so the next drop pages a human in minutes rather than waiting for a scan. Pin IKEv2 and logging in your IaC so new connections are born secure, and add Service Control Policies that deny ModifyVpnTunnelOptions payloads containing ikev1 and block new VPNs without logging. Prevention is what turns this from a recurring finding into a one-time fix.
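The cross-region sweep in step 1 can be sketched as a loop over the regions enabled for the account. This is a rough sketch under stated assumptions: the AWS CLI is configured with permission to describe regions, VPN connections, and Client VPN endpoints, and `inventory_vpns` and `prefix_lines` are hypothetical helper names.

```shell
# Hypothetical helper: prefix every non-empty line on stdin with a label.
prefix_lines() {
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      printf '%s %s\n' "$1" "$line"
    fi
  done
}

# Hypothetical helper: visit each enabled region and print the Site-to-Site
# connection IDs and the Client VPN endpoint IDs (with their
# ConnectionLogOptions.Enabled value) found there.
inventory_vpns() {
  for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
    aws ec2 describe-vpn-connections --region "$region" \
      --query 'VpnConnections[].VpnConnectionId' --output text \
      | tr '\t' '\n' | prefix_lines "$region site-to-site"
    aws ec2 describe-client-vpn-endpoints --region "$region" \
      --query 'ClientVpnEndpoints[].[ClientVpnEndpointId,ConnectionLogOptions.Enabled]' --output text \
      | prefix_lines "$region client-vpn"
  done
}
```

Each output line starts with the region and the VPN type, so the result can be sorted or grepped before the per-tunnel telemetry is read for each connection.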
# List every Site-to-Site VPN that currently has a tunnel DOWN.
aws ec2 describe-vpn-connections \
  --query 'VpnConnections[?VgwTelemetry[?Status==`DOWN`]].VpnConnectionId' --output text

# Lock a tunnel to IKEv2 only and turn logging on, one tunnel at a time.
aws ec2 modify-vpn-tunnel-options \
  --vpn-connection-id vpn-0a1b2c3d4e5f6a7b8 \
  --vpn-tunnel-outside-ip-address 203.0.113.20 \
  --tunnel-options '{"IKEVersions":[{"Value":"ikev2"}],"StartupAction":"start","LogOptions":{"CloudWatchLogOptions":{"LogEnabled":true,"LogGroupArn":"arn:aws:logs:us-east-1:111122223333:log-group:/aws/vpn/s2s","LogOutputFormat":"json"}}}'

# Turn on connection logging for a Client VPN endpoint.
aws ec2 modify-client-vpn-endpoint \
  --client-vpn-endpoint-id cvpn-endpoint-0a1b2c3d4e5f6a7b8 \
  --connection-log-options Enabled=true,CloudwatchLogGroup=/aws/clientvpn/prod-access

# Alarm on tunnel state (0 = down) so the next drop pages on-call, not the next scan.
aws cloudwatch put-metric-alarm --alarm-name vpn-tunnel2-down \
  --namespace AWS/VPN --metric-name TunnelState \
  --dimensions Name=VpnId,Value=vpn-0a1b2c3d4e5f6a7b8 Name=TunnelIpAddress,Value=203.0.113.20 \
  --statistic Minimum --period 300 --evaluation-periods 1 \
  --threshold 1 --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:network-oncall
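The logging commands above point at CloudWatch log groups, and step 3 asks for a retention policy on them. A minimal sketch for creating such a group, assuming it does not exist yet: `create_vpn_log_group` and `valid_retention_days` are hypothetical helper names, and the list of accepted retention periods reflects the values CloudWatch Logs documents for `put-retention-policy` at the time of writing.

```shell
# Hypothetical helper: succeed only for retention periods (in days) that
# CloudWatch Logs accepts.
valid_retention_days() {
  case "$1" in
    1|3|5|7|14|30|60|90|120|150|180|365|400|545|731|1096|1827|2192|2557|2922|3288|3653) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: create a log group and set its retention.
# $1 = log group name, $2 = retention in days.
create_vpn_log_group() {
  if ! valid_retention_days "$2"; then
    echo "unsupported retention period: $2" >&2
    return 1
  fi
  aws logs create-log-group --log-group-name "$1" || return 1
  aws logs put-retention-policy --log-group-name "$1" --retention-in-days "$2"
}
```

For example, `create_vpn_log_group /aws/vpn/s2s 90` would create the group referenced by the Site-to-Site command above with 90-day retention. Validating the period first avoids creating a group whose retention call then fails, which would leave it at the default of never expiring.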
Keep learning
Go deeper on Site-to-Site VPN tunnel options, IPsec hardening, and the monitoring and logging around VPN connections.
- AWS Site-to-Site VPN tunnel options: full reference for every per-tunnel option, including IKE versions, DPD, StartupAction, phase-1/2 algorithms, and logging.
- Monitoring Site-to-Site VPN tunnels with CloudWatch: the AWS/VPN metrics, including TunnelState, so a dropped tunnel alarms rather than waiting for a compliance scan.
- NIST SP 800-77 Rev. 1, Guide to IPsec VPNs: the authoritative IPsec configuration baseline, explicit on deprecating IKEv1 for new deployments.
You can now treat VPN security as one capability rather than four separate findings: inventory every Site-to-Site and Client VPN, restore and harden any downed tunnel, lock the tunnels to IKEv2, turn connection and tunnel logging on, and ratchet it shut with TunnelState alarms and Service Control Policies so new connections are born secure. The "Controls this lesson covers" section below links every control in this group to its deep page and fix.