SageMaker.4: Endpoint variants should have > 1 instance
Written and reviewed by Emnode · Last reviewed
What does AWS Security Hub SageMaker.4 check?
SageMaker.4 fails when a production variant in an endpoint configuration has `InitialInstanceCount` set to 1, leaving the endpoint with no Availability Zone redundancy.
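The rule can be sketched as a small shell filter. This is an illustration only: `flag_single_instance` is a hypothetical helper, not part of the AWS CLI, and the `SKIP` label for variants that report no instance count (for example serverless variants) is an assumption rather than documented Security Hub behaviour.

```bash
# Hypothetical helper: reads "VariantName<TAB>InitialInstanceCount" lines on stdin
# and labels each variant using the rule described above.
flag_single_instance() {
  local name count
  while read -r name count; do
    case "$count" in
      1)       echo "FAIL $name" ;;  # single instance, no redundancy
      ''|None) echo "SKIP $name" ;;  # assumption: no instance count reported
      *)       echo "PASS $name" ;;  # two or more instances
    esac
  done
}

# Example feed (assumes AWS CLI credentials and a placeholder config name):
# aws sagemaker describe-endpoint-config --endpoint-config-name my-endpoint-config \
#   --query 'ProductionVariants[].[VariantName,InitialInstanceCount]' --output text \
#   | flag_single_instance
```

Keeping the rule in a function that only reads text means it can be exercised without touching an AWS account.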
Why does SageMaker.4 matter?
A single-instance variant has nowhere to fail over to. If that instance fails, or its Availability Zone has an outage or maintenance event, SageMaker must provision a replacement, pull the container image, and load the model weights, and the endpoint can return 5xx errors for minutes while it does. For a model on a checkout or recommendation path, that downtime hits revenue directly.
How do I fix SageMaker.4?
- Inventory endpoint configs and flag any production variant with an instance count of 1.
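- Create a new endpoint configuration with the count raised to 2 or more. SageMaker attempts to distribute the instances across Availability Zones; for models attached to a VPC, supply subnets in more than one AZ.
- Update the endpoint to the new config. SageMaker brings up the new instances and shifts traffic to them before retiring the old ones, so the endpoint stays in service.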
- For endpoints where a second instance genuinely is not worth the cost, document a tracked exception.
Remediation script · bash
# Inventory: list endpoint configs that have a production variant with InitialInstanceCount of 1.
for c in $(aws sagemaker list-endpoint-configs \
  --query 'EndpointConfigs[].EndpointConfigName' --output text); do
  n=$(aws sagemaker describe-endpoint-config --endpoint-config-name "$c" \
    --query 'length(ProductionVariants[?InitialInstanceCount==`1`])' --output text)
  if [ "$n" -gt 0 ]; then
    echo "$c: $n single-instance variant(s)"
  fi
done
# Endpoint configs are immutable, so remediation means creating a new one. The names below
# (checkout-ranker, ml.m5.large) are placeholders; copy the real values from the old config.
aws sagemaker create-endpoint-config \
  --endpoint-config-name checkout-ranker-config-v2 \
  --production-variants VariantName=AllTraffic,ModelName=checkout-ranker,InstanceType=ml.m5.large,InitialInstanceCount=2
# Point the live endpoint at the new config and wait for the rollout to finish.
aws sagemaker update-endpoint \
  --endpoint-name checkout-ranker \
  --endpoint-config-name checkout-ranker-config-v2
aws sagemaker wait endpoint-in-service --endpoint-name checkout-ranker
Full walkthrough (console steps, edge cases and verification) in the lesson Harden SageMaker and ML workloads.
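To confirm that an endpoint is no longer running on a single instance after it has been updated, one option is to query the live endpoint rather than its configuration. The sketch below assumes AWS CLI credentials are configured; `variants_below_two` is a hypothetical helper name and `checkout-ranker` a placeholder endpoint.

```bash
# Hypothetical helper: lists variants of an endpoint whose desired instance count
# is still below two, with their current and desired counts.
# $1 = endpoint name
variants_below_two() {
  aws sagemaker describe-endpoint --endpoint-name "$1" \
    --query 'ProductionVariants[?DesiredInstanceCount < `2`].[VariantName,CurrentInstanceCount,DesiredInstanceCount]' \
    --output text
}

# Usage (placeholder endpoint name):
# variants_below_two checkout-ranker
```

Any variant that still appears in the output has not been raised to two or more instances.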
Is SageMaker.4 a false positive?
This is one of the rare findings where remediation costs money. Low-value or batch-style endpoints may legitimately run single-instance; the right move there is a documented exception, not blind remediation.
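One way to record such an exception, assuming the team tracks it in Security Hub itself rather than in a separate register, is to suppress the individual finding with a note. In the sketch below, `suppress_finding` is a hypothetical helper and `ml-platform-team` is a placeholder for whoever owns the decision.

```bash
# Hypothetical helper: suppresses one Security Hub finding and records the reason.
# $1 = finding Id, $2 = product ARN, $3 = reason text
# (avoid commas in the reason: the CLI shorthand syntax splits on them)
suppress_finding() {
  aws securityhub batch-update-findings \
    --finding-identifiers "Id=$1,ProductArn=$2" \
    --workflow Status=SUPPRESSED \
    --note "Text=$3,UpdatedBy=ml-platform-team"
}

# Usage (the Id and ProductArn come from the finding itself):
# suppress_finding "<finding-id>" "<product-arn>" "Single instance accepted for low-traffic endpoint"
```

Suppressing per finding, not disabling the control, keeps SageMaker.4 active for every other endpoint.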
More SageMaker controls
- SageMaker.1 A SageMaker notebook has direct internet access
- SageMaker.2 A SageMaker notebook is not launched in a VPC
- SageMaker.3 Users have root access on a SageMaker notebook
- SageMaker.5 Models should have network isolation enabled
- SageMaker.8 Notebook instances should run supported platforms
- SageMaker.9 Data quality jobs inter-container encryption
- SageMaker.10 Explainability jobs inter-container encryption
- SageMaker.11 Data quality jobs network isolation
- SageMaker.12 Model bias jobs network isolation
- SageMaker.13 Model quality jobs inter-container encryption
- SageMaker.14 Monitoring schedules network isolation
- SageMaker.15 Model bias jobs inter-container encryption