Deploy an Anyscale Cloud on AWS

Before you run Ray workloads on Anyscale, an Anyscale organization member must deploy an Anyscale Cloud within an Amazon Web Services (AWS) environment. This integration enables Anyscale to manage resources like compute instances and storage directly in an AWS account.

Prerequisites

  1. Register a user account on Anyscale at console.anyscale.com and set up the Anyscale CLI locally.
  2. Verify your ability to launch EC2 instances in the AWS region you plan to use on Anyscale. Anyscale supports all commercially available regions. Anyscale doesn't support regions outside of the aws partition, that is, China regions and US GovCloud regions.
  3. Set up AWS credentials locally by running aws configure (for more details, see the AWS configuration guide).
  4. Make sure these credentials belong to the AWS account that you're using for the Anyscale Cloud and have permissions to manage the resources that Anyscale creates there. At a minimum, they need the permissions listed under Minimal IAM Permissions for cloud commands.
note

The following resources have low default quota:

  • Number of VPCs per region
  • Number of internet gateways per region

Anyscale requires one of each of these resources per cloud. If you've reached your quota, request an increase from AWS before you continue.
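The region restriction in prerequisite 2 can be checked before you start. The following is a minimal sketch, assuming only that China regions use the cn- prefix and US GovCloud regions use the us-gov- prefix in their region names. The function name is hypothetical, and the check doesn't confirm that a region exists or is enabled in your account.

```python
# Region-name prefixes for the two partitions this guide lists as unsupported.
# Assumption: China regions start with "cn-" and GovCloud regions with "us-gov-".
UNSUPPORTED_PREFIXES = ("cn-", "us-gov-")


def in_standard_aws_partition(region: str) -> bool:
    """Return True unless the region name looks like a China or US GovCloud region."""
    return not region.strip().lower().startswith(UNSUPPORTED_PREFIXES)


# Illustrative check of a few region names.
for candidate in ("ap-southeast-1", "cn-north-1", "us-gov-west-1"):
    print(candidate, in_standard_aws_partition(candidate))
```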

1. Install the Anyscale CLI

  1. Run the following command to install the Anyscale CLI and Python client package:
pip install -U anyscale
  2. To authenticate, run the following command, which fetches a token that confirms your identity and stores it in the ~/.anyscale/credentials.json file.
anyscale login

If necessary, log in to the Anyscale console to complete authentication.
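To confirm that both steps took effect, you can check that the anyscale executable is on your PATH and that the credentials file exists. This is a rough sketch: the function name is hypothetical, and it only tests for the file's presence at the path mentioned above, not whether the token inside is valid.

```python
import shutil
from pathlib import Path
from typing import Optional


def anyscale_cli_status(home: Optional[Path] = None) -> dict:
    """Report whether the CLI is installed and a credentials file is present."""
    home = home if home is not None else Path.home()
    credentials = home / ".anyscale" / "credentials.json"
    return {
        # True when an "anyscale" executable is found on PATH.
        "cli_on_path": shutil.which("anyscale") is not None,
        # True when the credentials file written by "anyscale login" exists.
        "credentials_file_exists": credentials.is_file(),
    }


print(anyscale_cli_status())
```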

2. Choose an Anyscale Cloud deployment method

Deploying an Anyscale Cloud integrates Anyscale's capabilities into your AWS account to leverage its compute, storage, and networking resources for scalable, distributed computing.

You can use one of two different deployment methods that use the Anyscale CLI for cloud configuration. Choose a method based on your organization's requirements.

  • anyscale cloud setup - Use for rapid deployment and a straightforward, low-maintenance solution; deploy in public subnets and access over public IP addresses without setting up additional networking infrastructure.
  • anyscale cloud register - Use if your team has advanced cloud expertise and needs enhanced security, custom private networking, or specific compliance controls.

3. Create an Anyscale Cloud

Based on the deployment method that you selected in the previous step, create an Anyscale Cloud with the following instructions.

For the anyscale cloud setup deployment method, Anyscale automatically creates and configures the necessary resources within your AWS account. You deploy Ray clusters in public subnets and access them using public IP addresses without needing to set up additional networking infrastructure like VPNs.

Note: To manually customize resources, use the anyscale cloud register method instead.

An Anyscale Cloud deployed using anyscale cloud setup uses direct networking with an architecture similar to the following:

[Figure: Direct Networking architecture diagram]

Deploy a new Cloud

Run the following command to deploy a new Cloud:

anyscale cloud setup \
--name example_cloud_name \
--provider aws \
--region ap-southeast-1 \
--enable-head-node-fault-tolerance
Optional flags

--enable-head-node-fault-tolerance: Enables head node fault tolerance in Anyscale Services by configuring an additional MemoryDB instance for the Ray Global Control Store. Note that this flag extends the setup time by approximately 20 minutes.
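If you script cloud creation, it can help to assemble the command as an argument list so that the optional flag is added only when you want it. The following is a sketch under that assumption. The helper name is hypothetical, and it uses only the flags shown in this section.

```python
import shlex


def build_cloud_setup_command(name, region, provider="aws",
                              head_node_fault_tolerance=False):
    """Build the argument list for the setup command shown in this section."""
    cmd = [
        "anyscale", "cloud", "setup",
        "--name", name,
        "--provider", provider,
        "--region", region,
    ]
    if head_node_fault_tolerance:
        # Optional flag; per this guide it adds roughly 20 minutes to setup.
        cmd.append("--enable-head-node-fault-tolerance")
    return cmd


cmd = build_cloud_setup_command(
    "example_cloud_name", "ap-southeast-1", head_node_fault_tolerance=True
)
# Print a shell-quoted version for review before running it.
print(shlex.join(cmd))
```

To run the command from Python, pass the list to subprocess.run(cmd, check=True). The list form avoids shell quoting problems if a cloud name contains unusual characters.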

note

By default, Anyscale doesn't set any retention policy for the S3 bucket created by managed cloud setup. If you need one, set a retention policy on the bucket yourself.

4. Verify cloud resources

Anyscale provides a CLI command to verify cloud resources for both deployment methods. Anyscale runs verification automatically during cloud creation, and you can also run it on demand.

Trigger functional verification by specifying --functional-verify workspace or --functional-verify service. Anyscale launches a workspace or a service to verify the cloud is functional. You can also trigger both verifications (--functional-verify workspace,service).

$ anyscale cloud verify --name my-cloud-deployment --functional-verify workspace

Authenticating
Loaded Anyscale authentication token from ANYSCALE_CLI_TOKEN.

Output
(anyscale +0.4s) Verifying VPC ...
(anyscale +0.8s) VPC vpc-1234 verification succeeded.
(anyscale +0.8s) Verifying subnets ...
(anyscale +1.2s) Subnets ['subnet-1234', 'subnet-2345', 'subnet-3456', 'subnet-4567'] verification succeeded.
(anyscale +1.2s) Verifying IAM roles ...
(anyscale +2.8s) IAM roles ['arn:aws:iam::999999999999:role/anyscale-iam-role-1234', 'arn:aws:iam::999999999999:role/cld_1234-cluster_node_role'] verification succeeded.
(anyscale +2.8s) Verifying security groups ...
(anyscale +3.0s) Security group ['sg-1234'] verification succeeded.
(anyscale +3.0s) Verifying S3 ...
(anyscale +3.1s) S3 anyscale-production-data-cld-1234 verification succeeded.
(anyscale +3.1s) Verifying EFS ...
(anyscale +3.3s) S3 fs-1234 verification succeeded.
(anyscale +3.3s) Verifying CloudFormation stack ...
(anyscale +3.3s) CloudFormation stack arn:aws:cloudformation:us-west-2:999999999999:stack/cld-1234/1915d0c0-3dd2-11ed-8365-020cb3caf633 verification succeeded.
(anyscale +3.3s) Verification resullt:
vpc: PASSED
subnets: PASSED
iam roles: PASSED
security groups: PASSED
s3: PASSED
efs: PASSED
cloudformation stack: PASSED
(anyscale +3.3s) Start functional verification...
Functional verification for WORKSPACE is about to begin.
It will spin up one m5.xlarge instance for each function and will incur a small amount of cost.
For workspace verification, it takes about 5 minutes.
The instances will be terminated after verification. Do you want to continue? [y/N]: y
╭───────────────────────────────────────────────────────────────────────────── workspace verification ─────────────────────────────────────────────────────────────────────────────╮
0:00:02 Workspace created at https://console.anyscale.com/workspaces/expwrk_abc/ses_abc │
0:01:22 Workspace is active. │
0:00:00 Workspace termination initiated. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
0:01:24 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Workspace verification succeeded!
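If you capture this output in a script, for example in a CI job, the per-resource summary lines can be turned into a dictionary. The sketch below assumes the summary keeps the "name: STATUS" shape shown above. The function names are hypothetical, and because this page shows only PASSED, treating any other status as a failure is an assumption.

```python
import re

# Matches summary lines such as "iam roles: PASSED" or "s3: PASSED".
_RESULT_LINE = re.compile(r"^(?P<resource>[a-z0-9 ]+): (?P<status>[A-Z]+)$")


def parse_verification_summary(output: str) -> dict:
    """Map each resource name in the summary to its reported status."""
    results = {}
    for line in output.splitlines():
        match = _RESULT_LINE.match(line.strip())
        if match:
            results[match["resource"]] = match["status"]
    return results


def failed_resources(results: dict) -> list:
    """Return resources whose status is anything other than PASSED (assumption)."""
    return [name for name, status in results.items() if status != "PASSED"]


sample = """\
(anyscale +3.3s) Verifying CloudFormation stack ...
vpc: PASSED
subnets: PASSED
iam roles: PASSED
cloudformation stack: PASSED
(anyscale +3.3s) Start functional verification...
"""
summary = parse_verification_summary(sample)
print(summary)
print(failed_resources(summary))
```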

Credential handling

Your AWS credentials never travel across the network to Anyscale, and Anyscale doesn't store them anywhere. Instead, Anyscale creates an IAM role in your AWS account, grants it permissions to interact with EC2 and IAM in your account, and allows Anyscale to assume that role. Anyscale then stores only the ARN of the IAM role created in your account. See Secret management for instructions.
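An IAM role ARN contains no secret material, only the partition, the account ID, and the role name. The sketch below splits a role ARN into those parts, which can help you confirm that a role belongs to the account you expect. The type and function names are hypothetical, and the sample ARN is the placeholder from the verification output on this page.

```python
from typing import NamedTuple


class IamRoleArn(NamedTuple):
    partition: str   # for example "aws"
    account_id: str  # 12-digit AWS account ID
    role_name: str   # role name, including any path prefix


def parse_iam_role_arn(arn: str) -> IamRoleArn:
    """Split an IAM role ARN of the form arn:PARTITION:iam::ACCOUNT:role/NAME."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    resource = parts[5]
    if not resource.startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return IamRoleArn(parts[1], parts[4], resource[len("role/"):])


role = parse_iam_role_arn("arn:aws:iam::999999999999:role/anyscale-iam-role-1234")
print(role)
```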

caution
  • If you specify the instance IAM role, make sure it has read/write access to the S3 bucket registered with the cloud.
  • If you register multiple security groups with the Anyscale cloud and want to specify them in the advanced config, you're responsible for specifying a working set of security groups (see the security group section in the resource requirements). Your cluster may end up in an error state if you fail to do so. For example, the head node may not be able to communicate with worker nodes.

Minimal IAM Permissions for cloud commands

This section provides the minimal IAM permissions required for the Anyscale CLI to perform cloud operations. As an AWS administrator, follow these steps to apply the policy:

  1. Create a new IAM policy or edit an existing policy to include the following permissions.

  2. Attach the policy to the IAM user or role used to run the Anyscale CLI.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudformationManagement",
      "Effect": "Allow",
      "Action": [
        "cloudformation:CreateChangeSet",
        "cloudformation:CreateStack",
        "cloudformation:DeleteStack",
        "cloudformation:DescribeStackEvents",
        "cloudformation:DescribeStacks",
        "cloudformation:ListStacks"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "EC2Management",
      "Effect": "Allow",
      "Action": [
        "ec2:AssociateRouteTable",
        "ec2:AttachInternetGateway",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateInternetGateway",
        "ec2:CreateRoute",
        "ec2:CreateRouteTable",
        "ec2:CreateSecurityGroup",
        "ec2:CreateSubnet",
        "ec2:CreateTags",
        "ec2:CreateVpc",
        "ec2:CreateVpcEndpoint",
        "ec2:DeleteInternetGateway",
        "ec2:DeleteRoute",
        "ec2:DeleteRouteTable",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteSubnet",
        "ec2:DeleteVpc",
        "ec2:DeleteVpcEndpoints",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeNetworkAcls",
        "ec2:DescribeRegions",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroupRules",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcAttribute",
        "ec2:DescribeVpcEndpoints",
        "ec2:DescribeVpcs",
        "ec2:DetachInternetGateway",
        "ec2:DisassociateRouteTable",
        "ec2:ModifySubnetAttribute",
        "ec2:ModifyVpcAttribute",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "EFSManagement",
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:CreateFileSystem",
        "elasticfilesystem:CreateMountTarget",
        "elasticfilesystem:DeleteFileSystem",
        "elasticfilesystem:DeleteMountTarget",
        "elasticfilesystem:DescribeBackupPolicy",
        "elasticfilesystem:DescribeFileSystemPolicy",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeLifecycleConfiguration",
        "elasticfilesystem:DescribeMountTargetSecurityGroups",
        "elasticfilesystem:DescribeMountTargets",
        "elasticfilesystem:DescribeReplicationConfigurations",
        "elasticfilesystem:PutLifecycleConfiguration",
        "elasticfilesystem:TagResource"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "IAMManagement",
      "Effect": "Allow",
      "Action": [
        "iam:AddRoleToInstanceProfile",
        "iam:AttachRolePolicy",
        "iam:CreateInstanceProfile",
        "iam:CreateRole",
        "iam:DeleteInstanceProfile",
        "iam:DeleteRole",
        "iam:DeleteRolePolicy",
        "iam:DetachRolePolicy",
        "iam:GetInstanceProfile",
        "iam:GetRole",
        "iam:PassRole",
        "iam:PutRolePolicy",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:TagRole"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "S3Management",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucketPolicy",
        "s3:GetAccelerateConfiguration",
        "s3:GetBucketCors",
        "s3:GetBucketLogging",
        "s3:GetBucketNotification",
        "s3:GetBucketObjectLockConfiguration",
        "s3:GetBucketOwnershipControls",
        "s3:GetBucketPolicy",
        "s3:GetBucketPublicAccessBlock",
        "s3:GetBucketTagging",
        "s3:GetBucketVersioning",
        "s3:GetBucketWebsite",
        "s3:PutBucketCors",
        "s3:PutBucketPolicy",
        "s3:PutBucketPublicAccessBlock",
        "s3:PutBucketTagging"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "Miscellaneous",
      "Effect": "Allow",
      "Action": [
        "acm:ListCertificates",
        "kms:CreateGrant",
        "kms:DescribeKey",
        "kms:GenerateDataKeyWithoutPlaintext",
        "servicequotas:GetServiceQuota"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
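Before attaching a policy, you may want to confirm that it grants every action your workflow needs. The following is a minimal sketch of such a check. The function names and the two-action sample policy are hypothetical. It compares action names literally (ignoring case), so it doesn't expand wildcards such as ec2:* and doesn't evaluate Deny statements, conditions, or resource scoping.

```python
import json


def allowed_actions(policy_json: str) -> set:
    """Collect the lowercased actions from every Allow statement in a policy."""
    statements = json.loads(policy_json)["Statement"]
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        listed = statement.get("Action", [])
        if isinstance(listed, str):  # a single action may be a bare string
            listed = [listed]
        actions.update(action.lower() for action in listed)
    return actions


def missing_actions(policy_json: str, required: list) -> list:
    """Return the required actions that the policy doesn't list, in input order."""
    allowed = allowed_actions(policy_json)
    return [action for action in required if action.lower() not in allowed]


# Hypothetical, deliberately incomplete policy for illustration.
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CloudformationManagement",
        "Effect": "Allow",
        "Action": ["cloudformation:CreateStack", "cloudformation:DescribeStacks"],
        "Resource": ["*"],
    }],
})
needed = ["cloudformation:CreateStack", "cloudformation:DeleteStack"]
print(missing_actions(sample_policy, needed))
```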

Glossary

The following resources are required for both anyscale cloud setup and anyscale cloud register approaches:

  • VPC & Subnets: A VPC is a virtual network within the customer AWS account and is logically isolated from other virtual networks in the cloud. A subnet is a range of IP addresses in your VPC to which your AWS resources (such as EC2 instances) can be attached. Anyscale deploys workloads in your account within the VPC and subnets defined as part of setup.
  • Security Group: Security groups help secure the cloud environment by controlling the traffic that is allowed to reach and leave AWS-hosted resources. Anyscale creates a security group with network rules to enable access to Anyscale’s suite of components and applications, such as:
    • JupyterLab
    • Ray Dashboard
    • Ray Serve endpoints
    • Workspace
  • S3 Bucket: Amazon S3 is an object storage service that offers scalability, data availability, security, and performance. Anyscale uses this S3 bucket for a variety of functions that support the management of Ray clusters and Ray applications, including:
    • General data storage that lasts beyond cluster lifespan
    • Storing model checkpoints for Ray Tune or RLlib
  • IAM Roles:
    • anyscale-iam-role: Anyscale's control plane uses this role to launch Ray clusters in your AWS account. It needs permissions to manage EC2 instances and attach an IAM role.
    • instance-iam-role: The default role attached to Ray clusters. This role can be modified to suit the needs and permissions that your workload requires.
    • Both these roles are created by Anyscale.
  • EFS: Amazon Elastic File System (EFS) is a cloud-based, scalable file system for applications and workloads that can be used in combination with other AWS services. EFS offers shared storage, is designed for scalable performance, and is secure and compliant with common regulatory standards. EFS is required for Anyscale Workspaces.
  • Note that user-defined tags aren't supported at this time.