
Deploy and manage Clouds on AWS

Set up and manage Anyscale clouds on AWS.

Prerequisites

  • You have registered a user account on Anyscale and have set up the Anyscale CLI locally.
  • You can launch EC2 instances in the AWS region you plan to use on Anyscale.
  • You have set up AWS credentials locally by running aws configure (for more details, see the AWS configuration guide).
  • Your AWS credentials correspond to the AWS account you use for the Anyscale cloud, and they have permissions to manage all the resources mentioned here.
note

The following resources have low default quota:

  • Number of VPCs per region
  • Number of internet gateways per region

Anyscale requires one of each of these resources per cloud. If you've reached your quota, request an increase through AWS Service Quotas.

Anyscale Clouds on AWS

Anyscale Clouds can be deployed in AWS accounts using two different networking modes:

  • Direct Networking: Ray clusters are deployed in public subnets and accessed through public IP addresses.
  • Customer Defined Networking: Ray clusters are deployed in private subnets and accessed through private IP addresses.

Direct Networking clouds are the most common and developer-friendly deployment option for Anyscale. Direct Networking lets you connect to your Ray clusters, Workspaces, and Grafana dashboards over the internet using public IP addresses, without configuring additional networking infrastructure (usually a VPN).

An Anyscale Cloud deployed with Direct Networking has an architecture similar to the following:

Direct Networking

Notes:

  • The requirements for resources such as IAM Roles, VPCs, and Security Groups can be found in detailed resource requirements.
  • Direct Networking clouds can be created using either the Anyscale Managed or the Customer Defined deployment mode. Customer Defined Networking clouds can only be created using the Customer Defined deployment mode with the --private-network parameter.
  • Deploy Anyscale Clouds into multiple Availability Zones. By default, the Anyscale Managed deployment option creates a subnet in each Availability Zone.
  • The diagram includes the Elastic Container Registry (ECR) to show a possible integration with ECR that supports Custom Docker Environments.

Create an Anyscale Cloud

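Run the command for your deployment option to create an Anyscale cloud. The example below uses anyscale cloud setup, which creates Anyscale Managed resources; clouds with Customer Defined Resources are created with anyscale cloud register.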

caution

To set up your Anyscale cloud with Customer Defined Resources, ensure that the cloud resources in your AWS account satisfy all the requirements in Detailed Resource Requirements. Anyscale verifies your resources automatically at the end of registration.

info

We provide functional verification for both anyscale cloud setup and anyscale cloud register to make sure your cloud is ready. Add the flag --functional-verify workspace or --functional-verify service to the cloud creation command, and Anyscale automatically launches a workspace or service in your cloud and verifies that your newly created cloud is functional.

anyscale cloud setup \
--name example_cloud_name \
--provider aws \
--region ap-southeast-1 \
--enable-head-node-fault-tolerance
Optional flags

--enable-head-node-fault-tolerance: Enables head node fault tolerance in Anyscale Services by configuring an additional MemoryDB instance for the Ray Global Control Store. Note that this flag extends the setup time by approximately 20 minutes.

caution

At the end of cloud setup, we set up Anyscale internal resources, which may take up to a minute. If you see the following error message, your cloud might not be set up correctly. Please contact Anyscale support:

Failed to get cloud provider metadata.
note

You can make an Anyscale cloud the default cloud by running:

 anyscale cloud set-default <cloud-deployment-name-or-cloud-id>

Anyscale will create new clusters in the default cloud if no cloud is specified in the compute configs.

note

By default, Anyscale doesn't set any retention policy for the S3 bucket created by Anyscale Managed cloud setup. If you need one, you can configure it yourself.
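One way to add a retention policy yourself is an S3 lifecycle configuration. The sketch below, a hypothetical helper rather than anything Anyscale ships, builds a configuration that expires every object after a given number of days. The 90-day value and the rule ID are illustrative assumptions, not Anyscale defaults; choose a window longer than any logs, checkpoints, or other data your workloads still need.

```python
import json


def build_lifecycle_config(expire_after_days: int, rule_id: str = "anyscale-bucket-retention") -> dict:
    """Return an S3 lifecycle configuration that expires objects after N days.

    The rule ID is a hypothetical name chosen for this sketch.
    """
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    # Print JSON suitable for saving to lifecycle.json.
    print(json.dumps(build_lifecycle_config(90), indent=2))
```

You could save the output to lifecycle.json and apply it with aws s3api put-bucket-lifecycle-configuration --bucket <bucket-name> --lifecycle-configuration file://lifecycle.json. If you also enable bucket versioning, expired objects become noncurrent versions, which need a separate NoncurrentVersionExpiration rule to be removed.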

Fine-Grained Permission Control With Cloud Register (Advanced)

If you would like more fine-grained control over permissions, you can use the Anyscale provided Terraform module to construct a permission set that is more suited to your needs.

You can refer to the AWS module here:

Steps to Create a Cloud

  1. Describe your cloud environment by setting values for the Terraform module's parameters.
  2. Apply the Terraform module to your cloud environment.
  3. Run the cloud register command that the Terraform module outputs.
    1. NOTE: Export your Anyscale and cloud credentials before running this command.
  4. Rerun the Terraform module, passing the cloud_id returned by cloud register as an argument.
    1. This scopes permissions down to only the resources tagged with that cloud_id.
warning

You are responsible for ensuring that the permissions are properly configured before deploying your workload. Please refer to Verify Cloud Resources to validate your permission set. For further assistance, reach out to Anyscale support.

Verify Cloud Resources

Anyscale provides a CLI command that you can use to verify cloud resources for both deployment options. Anyscale runs verification automatically during cloud creation, and you can also run it on demand.

You can also trigger functional verification by specifying --functional-verify workspace or --functional-verify service. Anyscale launches a workspace or a service to verify the cloud is functional. You can also trigger both verifications (--functional-verify workspace,service).

$ anyscale cloud verify --name my-cloud-deployment --functional-verify workspace

Authenticating
Loaded Anyscale authentication token from ANYSCALE_CLI_TOKEN.

Output
(anyscale +0.4s) Verifying VPC ...
(anyscale +0.8s) VPC vpc-1234 verification succeeded.
(anyscale +0.8s) Verifying subnets ...
(anyscale +1.2s) Subnets ['subnet-1234', 'subnet-2345', 'subnet-3456', 'subnet-4567'] verification succeeded.
(anyscale +1.2s) Verifying IAM roles ...
(anyscale +2.8s) IAM roles ['arn:aws:iam::999999999999:role/anyscale-iam-role-1234', 'arn:aws:iam::999999999999:role/cld_1234-cluster_node_role'] verification succeeded.
(anyscale +2.8s) Verifying security groups ...
(anyscale +3.0s) Security group ['sg-1234'] verification succeeded.
(anyscale +3.0s) Verifying S3 ...
(anyscale +3.1s) S3 anyscale-production-data-cld-1234 verification succeeded.
(anyscale +3.1s) Verifying EFS ...
(anyscale +3.3s) S3 fs-1234 verification succeeded.
(anyscale +3.3s) Verifying CloudFormation stack ...
(anyscale +3.3s) CloudFormation stack arn:aws:cloudformation:us-west-2:999999999999:stack/cld-1234/1915d0c0-3dd2-11ed-8365-020cb3caf633 verification succeeded.
(anyscale +3.3s) Verification resullt:
vpc: PASSED
subnets: PASSED
iam roles: PASSED
security groups: PASSED
s3: PASSED
efs: PASSED
cloudformation stack: PASSED
(anyscale +3.3s) Start functional verification...
Functional verification for WORKSPACE is about to begin.
It will spin up one m5.xlarge instance for each function and will incur a small amount of cost.
For workspace verification, it takes about 5 minutes.
The instances will be terminated after verification. Do you want to continue? [y/N]: y
╭──────────────────────────── workspace verification ────────────────────────────╮
│ 0:00:02 Workspace created at https://console.anyscale.com/workspaces/expwrk_abc/ses_abc │
│ 0:01:22 Workspace is active. │
│ 0:00:00 Workspace termination initiated. │
╰─────────────────────────────────────────────────────────────────────────────────╯
0:01:24 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Workspace verification succeeded!
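If you run verification in an automated pipeline, you may want the pipeline to fail when any resource check does not pass. The following is a minimal sketch that parses the summary block from output like the example above (one "resource: STATUS" line per resource). That format is inferred only from the example and is not a documented interface, so it may change between CLI versions; treat the parser as illustrative.

```python
import re

# A summary line such as "iam roles: PASSED": a lowercase resource name,
# a colon, then an uppercase status word. Inferred from the example output only.
SUMMARY_LINE = re.compile(r"^(?P<resource>[a-z][a-z0-9 ]*):\s+(?P<status>[A-Z]+)$")


def parse_verification_summary(output: str) -> dict:
    """Map each resource in the verification summary to its status."""
    results = {}
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            results[match.group("resource")] = match.group("status")
    return results


def all_passed(results: dict) -> bool:
    """True only if at least one check was found and every check PASSED."""
    return bool(results) and all(status == "PASSED" for status in results.values())
```

Requiring at least one parsed line means that a change in the output format makes the check fail loudly instead of passing silently.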

Delete an Anyscale Cloud

You can delete the cloud using the following command:

$ anyscale cloud delete --name <cloud-deployment-name>
  • This operation is only supported if the cloud has no non-terminated clusters associated with it, meaning there are no running or pending instances.
    • If a cluster is running, terminate it.
    • If a cluster is in an error state, follow the instructions in the error. Also check your AWS console to ensure that no instances of this cluster are still running.
  • After the cloud is deleted, you can't perform any operations on it. This means you can't create clusters, jobs, services, or workspaces in this cloud.
  • After the cloud is deleted, you can't access its clusters. Existing clusters associated with this cloud are no longer visible, regardless of their status before deletion.
  • If a Service was deployed in this cloud, deleting the cloud also removes any Anyscale managed ALB resources and TLS certificates associated with it. If deleting the ALB resources fails, cloud deletion aborts until these resources are cleaned up. Once resolved, rerun cloud deletion to remove any remaining resources.
caution

For Anyscale Managed Resources, cloud deletion also deletes the resources associated with the cloud, except the S3 bucket, which you need to delete yourself if you no longer need its data (see the example output below). For Customer Defined Resources, Anyscale doesn't delete any cloud provider resources that you created. Anyscale's access to your account isn't removed until you either delete the cross-account access role or remove Anyscale from its trust policy.

Example command and output:

$ anyscale cloud delete --name example_cloud_name

Authenticating
Loaded Anyscale authentication token from ANYSCALE_CLI_TOKEN.

Output
If the cloud cld_prcjv8jc9tmbv3q54mc2h7dnl6 is deleted, you will not be able to access existing clusters of this cloud.
For more information, please refer to the documentation https://docs.anyscale.com/user-guide/onboard/clouds#cloud-deletion
Continue? [y/N]: y
(anyscale +6.8s)
Track progress of cloudformation at https://ap-southeast-1.console.aws.amazon.com/cloudformation/home?region=ap-southeast-1#/stacks/stackinfo?stackId=arn:aws:cloudformation:ap-southeast-1:123456:stack/cld-prcjv8jc32r89fniuf23/123456789
⠸ Deleting cloud resources through cloudformation...(anyscale +49.1s) Cloudformation stack arn:aws:cloudformation:ap-southeast-1:815664363732:stack/cld-prcjv8jc32r89fniuf23/123456789 is deleted.
(anyscale +49.1s)
The S3 bucket (anyscale-production-data-cld-prcjv8jc32r89fniuf23) associated with this cloud still exists.
If you no longer need the data associated with this bucket, please delete it.
(anyscale +49.4s) Deleted cloud with name example_cloud_name.

Edit Customer Defined Resources

Anyscale provides a CLI command that you can use to edit the resources of a registered cloud (a cloud created with Customer Defined Resources).

The editable resources are: the AWS S3 bucket ID, the AWS EFS ID, and the AWS EFS mount target IP.

You can edit them with the following example commands:

# Edit AWS S3 id.
$ anyscale cloud edit <cloud-name> --aws-s3-id=<your_new_aws_s3_id>

# Edit EFS id.
$ anyscale cloud edit <cloud-name> --aws-efs-id=<your_new_aws_efs_id>

# Edit EFS mount target IP.
$ anyscale cloud edit <cloud-name> --aws-efs-mount-target-ip=<your_new_aws_efs_mount_target_ip>

Additional Options:

  • Use cloud-id as an alternative to cloud-name. Example command:
$ anyscale cloud edit --cloud-id=<cloud_id> --aws-s3-id=<your_new_aws_s3_id>
  • Confirm that your edited cloud works with functional verification. Add the flag --functional-verify workspace or --functional-verify service to the command, and Anyscale automatically launches a workspace or service in your cloud and verifies that your edited cloud is functional. Example command:
$ anyscale cloud edit <cloud-name> --aws-s3-id=<your_new_aws_s3_id> --functional-verify workspace

Important Notes:

  • Before the edit, we'll execute a static cloud verification and request your confirmation. Ensure you review any warnings or errors in the verification results.
  • The edit is only for registered clouds, and does not apply to managed clouds.
  • If running workloads still use the old resources, you may want to keep them. The edit doesn't remove any old resources automatically; if you want to delete them, you need to do so yourself.

Update IAM Role for Anyscale-Managed Cloud

For Anyscale-managed clouds, you can use the cloud update command to keep your cross-account IAM role up to date. The cross-account IAM role is named like anyscale-iam-role-<random chars>.

First make sure to run pip install -U anyscale to upgrade your Anyscale CLI to the latest version. Then update the cloud:

$ anyscale cloud update <cloud_name>

You can also update the cloud using cloud ID:

$ anyscale cloud update --id <cloud_id>

Currently Anyscale maintains 3 inline policies: Anyscale_IAM_Policy_Steady_State, Anyscale_IAM_Policy_Service_Steady_State, Anyscale_IAM_Policy_Initial_Setup. Running cloud update will create or overwrite these policies.

caution

If you previously edited these policies, your cross-account IAM role has drifted from the Anyscale-maintained versions. Cloud update resolves the drift by appending the drifted statements to an inline policy named Customer_Drifts_Policy. If you removed permissions or restricted resources in these policies, you may need to reapply those changes manually after cloud update completes.

FAQ

  • How do I add the AWS CLI to my nodes in my Ray cluster?

We don't include AWS CLI by default in our Docker images. You can add this yourself by adding the following to the post-build commands of your cluster environment:

sudo apt-get update
sudo apt-get install -y curl unzip
# This installer targets x86_64 nodes; ARM (Graviton) nodes need the aarch64 installer.
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

You may need to grant additional permissions to the instance IAM role (the role attached to cluster nodes) for the corresponding AWS CLI commands to work.

  • Do you store my credentials anywhere?

No. The credentials never travel across the network to Anyscale. Instead, Anyscale will create an IAM role in your cloud account, grant it permissions to interact with EC2 and IAM in your account and allow Anyscale to assume that role. Anyscale then only stores the IAM role ARN that is created in your account.

  • How do I revoke access?

You can revoke Anyscale's access to your AWS account by deleting the Anyscale IAM role in your account, which will look like anyscale-iam-role or anyscale-iam-role-<8 hex digits>.
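If your account has many IAM roles, a small filter over the role names (for example, the RoleName values from aws iam list-roles) can narrow down the candidates. This sketch matches on the anyscale-iam-role prefix only, so it can also match unrelated roles that happen to share the prefix; review each match, for instance by checking that its trust policy names the Anyscale account, before deleting anything.

```python
ROLE_PREFIX = "anyscale-iam-role"


def looks_like_anyscale_role(role_name: str) -> bool:
    """True for "anyscale-iam-role" or "anyscale-iam-role-<suffix>"."""
    return role_name == ROLE_PREFIX or role_name.startswith(ROLE_PREFIX + "-")


def find_anyscale_roles(role_names):
    """Filter a list of IAM role names down to likely Anyscale cross-account roles."""
    return [name for name in role_names if looks_like_anyscale_role(name)]
```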

  • What AWS regions are supported?

Anyscale supports all commercially available regions. We do not currently support regions outside of the aws partition (that is, China regions and US GovCloud regions).

  • How do cloud resources interact with the advanced config?

You can specify the following cloud resources in the advanced config at cluster creation time:

  • Subnets and security groups. You can specify
    • any subset of the security groups registered with the cloud.
    • any subnet registered with the cloud.
  • Instance IAM role. You can specify any IAM role for the cluster to run with. It must have an instance profile with the same name as the role.
caution
  • If you specify the instance IAM role, please make sure it has read/write access to the S3 bucket registered with the cloud.
  • If you register multiple security groups with the Anyscale cloud and want to specify them in the advanced config, you're responsible for specifying a working set of security groups (see the security group section in the resource requirements). Otherwise, your cluster may end up in an error state (for example, if the head node can't communicate with worker nodes).
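Before launching a cluster, you could check that the subnets and security groups you plan to put in the advanced config are among those registered with the cloud. The sketch below is a hypothetical helper operating on plain lists of IDs. It only checks membership; it cannot tell whether the chosen security groups form a working set, which remains your responsibility as described above.

```python
def check_advanced_config_resources(registered_subnets, registered_security_groups,
                                    subnet_ids, security_group_ids):
    """Return problems with a subnet/security-group selection; empty means allowed.

    Only membership in the cloud's registered resources is checked.
    """
    problems = []
    for subnet_id in subnet_ids:
        if subnet_id not in registered_subnets:
            problems.append(f"subnet {subnet_id} is not registered with the cloud")
    for group_id in security_group_ids:
        if group_id not in registered_security_groups:
            problems.append(f"security group {group_id} is not registered with the cloud")
    return problems
```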

Appendix: Definitions

The following resources are required for both Anyscale Managed and Customer Defined approaches:

  • VPC & Subnets: A VPC is a virtual network within the customer AWS account and is logically isolated from other virtual networks in the cloud. A subnet is a range of IP addresses in your VPC to which your AWS resources (such as EC2 instances) can be attached. Anyscale deploys workloads in your account within the VPC and subnets defined as part of setup.
  • Security Group: Security groups help secure the cloud environment by controlling the traffic that is allowed to reach and leave AWS hosted resources. Anyscale creates a security group with network rules to enable access to Anyscale's suite of components and applications, such as
    • Jupyter Labs
    • Ray Dashboard
    • Ray Serve endpoints
    • Workspace
  • S3 Bucket: Amazon S3 is an object storage service that offers scalability, data availability, security & performance. Anyscale utilizes this S3 bucket for a variety of functions that support the management of Ray clusters and Ray applications, including:
    • General data storage that lasts beyond cluster lifespan
    • Storing model checkpoints for Ray Tune or RLlib
  • IAM Roles:
    • anyscale-iam-role: Anyscale's control plane uses this role to launch Ray clusters in your AWS account. It needs permissions to manage EC2 instances and attach IAM roles to them.
    • instance-iam-role: The default role attached to Ray clusters. This role can be modified to suit the needs and permissions that your workload requires.
    • With Anyscale Managed Resources, Anyscale creates both roles. With Customer Defined Resources, you create them as described in Detailed Resource Requirements.
  • EFS: Amazon Elastic File System (EFS) is a cloud-based, scalable file system that can be used in combination with other AWS services. EFS offers shared storage, is designed for scalable performance, and is secure and compliant with common regulatory standards. EFS is required for Anyscale Workspaces.
  • Note that user defined tags are not supported at this time.

Appendix: Detailed Resource Requirements

Detailed requirements for customer defined resources:

VPC

  • The CIDR block must be /24 or larger (a prefix length of 24 or less); we recommend /20 or larger.
  • The VPC must have internet egress.
  • Recommended: Enable a Gateway VPC Endpoint for S3 to reduce cost and improve performance when pulling Cluster Environments.
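Because a larger block has a smaller prefix length, the size rule is easy to invert by mistake. A minimal sketch that checks a VPC CIDR block against the two thresholds above using Python's ipaddress module:

```python
import ipaddress


def check_vpc_cidr(cidr: str) -> list:
    """Compare a VPC CIDR block with the size requirements above.

    Returns a list of findings; an empty list means the block meets the
    recommendation. ip_network raises ValueError for malformed input.
    """
    network = ipaddress.ip_network(cidr)
    if network.prefixlen > 24:
        return [f"{cidr} is smaller than the required /24"]
    if network.prefixlen > 20:
        return [f"{cidr} meets the /24 minimum, but /20 or larger is recommended"]
    return []
```

For example, 10.0.0.0/16 passes with no findings, 10.0.0.0/22 meets the minimum but not the recommendation, and 10.0.0.0/25 is too small.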

Subnets

  • The CIDR block must be /28 or larger (a prefix length of 28 or less); we recommend /24 or larger.
  • Subnets must be public, with a route table that routes to an internet gateway, unless you pass the --private-network flag to cloud register for Customer Defined Resources, in which case the subnets are private.
  • You must provide at least 2 subnets.
  • No two public subnets should be in the same availability zone
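The subnet rules above can be checked together before registration. In this sketch each subnet is described by a plain dict with the hypothetical keys "cidr", "availability_zone", and "public"; in practice you would fill these in from your own inventory or from aws ec2 describe-subnets. Only public subnets are checked for shared Availability Zones, matching the rule above.

```python
import ipaddress
from collections import Counter


def check_subnets(subnets: list) -> list:
    """Check subnet descriptions against the requirements above.

    Each subnet is a dict with the hypothetical keys "cidr",
    "availability_zone", and "public". Returns a list of findings;
    an empty list means every check passed.
    """
    findings = []
    if len(subnets) < 2:
        findings.append("at least 2 subnets are required")
    for subnet in subnets:
        prefix = ipaddress.ip_network(subnet["cidr"]).prefixlen
        if prefix > 28:
            findings.append(f"{subnet['cidr']} is smaller than the required /28")
        elif prefix > 24:
            findings.append(f"{subnet['cidr']} meets the /28 minimum, but /24 or larger is recommended")
    # Count public subnets per Availability Zone; more than one in a zone is an error.
    public_per_az = Counter(s["availability_zone"] for s in subnets if s["public"])
    for zone, count in sorted(public_per_az.items()):
        if count > 1:
            findings.append(f"{count} public subnets share availability zone {zone}")
    return findings
```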

Security Group

  • Inbound rules
    • Allow all inbound TCP traffic on port 443 (can be restricted to your CIDR blocks) for inbound access to submit Ray jobs, the Grafana dashboard, web-based Workspaces, and other functionality.
    • Allow all inbound SSH traffic on port 22 (can be restricted to your CIDR blocks or removed) for Workspace connections using VSCode Desktop.
    • Allow all inbound traffic from the security group itself to allow intra-cluster communication and access to Elastic File System (EFS) for Workspaces.


  • Outbound rules
    • Allow all outbound traffic for reporting back to users and the Anyscale Control Plane.
    • Allow outbound traffic for all protocols to the security group itself to allow intra-cluster communication. This is required by certain network devices such as Elastic Fabric Adapter (EFA).
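The rules above can be expressed in the IpPermissions format that the EC2 API and the aws ec2 authorize-security-group-ingress and authorize-security-group-egress commands accept. A sketch under these assumptions: the function name is hypothetical, the security group already exists, and both the HTTPS and SSH sources default to 0.0.0.0/0 unless you narrow them to your CIDR blocks.

```python
def build_security_group_rules(group_id, https_cidr="0.0.0.0/0", ssh_cidr="0.0.0.0/0"):
    """Return (ingress, egress) rule lists in the EC2 IpPermissions format.

    group_id is the cloud's own security group. Pass ssh_cidr=None to omit SSH.
    """
    # Same-group rule: "-1" means all protocols between members of the group.
    self_rule = {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": group_id}]}
    ingress = [
        # HTTPS: Ray job submission, Grafana, web-based Workspaces.
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": https_cidr}]},
    ]
    if ssh_cidr is not None:
        # SSH: Workspace connections from VS Code Desktop.
        ingress.append({"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                        "IpRanges": [{"CidrIp": ssh_cidr}]})
    ingress.append(dict(self_rule))  # intra-cluster traffic and EFS access
    egress = [
        # All outbound traffic (reporting back to users and the control plane).
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        dict(self_rule),  # all protocols to the same group (for example, EFA)
    ]
    return ingress, egress
```

You could write each list to a JSON file and pass it with --ip-permissions file://ingress.json. New security groups already include an allow-all outbound rule, and AWS rejects duplicate rules, so drop that egress entry if the rule already exists.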

IAM Role for Cross-Account Access

anyscale-iam-role-id

  • Create an IAM role with the following permissions to allow Anyscale to manage resources in your account:

    • Grant access to our control plane: Anyscale IAM role
    • Or set the trust relationship manually as follows:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {
              "AWS": "525325868955"
            },
            "Condition": {}
          }
        ]
      }
    note

    The user running anyscale cloud register must have permission to edit this trust relationship. The register command updates the trust relationship to include a cloud-specific External ID.

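  • Attach an IAM policy with the following statements to this role for standard cluster operation. The Resource of the IAM statement can be restricted (for example, to the instance IAM role and its instance profile):

    [
    {
    "Sid": "IAM",
    "Effect": "Allow",
    "Action": [
    "iam:PassRole",
    "iam:GetInstanceProfile"
    ],
    "Resource": "*"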
    },
    {
    "Sid": "RetrieveGenericAWSResources",
    "Effect": "Allow",
    "Action": [
    "ec2:DescribeAvailabilityZones",
    "ec2:DescribeInstanceTypes",
    "ec2:DescribeRegions",
    "ec2:DescribeAccountAttributes"
    ],
    "Resource": "*"
    },
    {
    "Sid": "DescribeRunningResources",
    "Effect": "Allow",
    "Action": [
    "ec2:DescribeInstances",
    "ec2:DescribeSubnets",
    "ec2:DescribeRouteTables",
    "ec2:DescribeSecurityGroups"
    ],
    "Resource": "*"
    },
    {
    "Sid": "InstanceTagMangement",
    "Effect": "Allow",
    "Action": [
    "ec2:CreateTags",
    "ec2:DeleteTags"
    ],
    "Resource": "*"
    },
    {
    "Sid": "InstanceStart",
    "Effect": "Allow",
    "Action": [
    "ec2:StartInstances",
    "ec2:RunInstances"
    ],
    "Resource": "*"
    },
    {
    "Sid": "InstanceStop",
    "Effect": "Allow",
    "Action": [
    "ec2:TerminateInstances",
    "ec2:StopInstances"
    ],
    "Resource": "*"
    },
    {
    "Sid": "InstanceManagementSpot",
    "Effect": "Allow",
    "Action": [
    "ec2:CancelSpotInstanceRequests",
    "ec2:ModifyImageAttribute",
    "ec2:ModifyInstanceAttribute",
    "ec2:RequestSpotInstances"
    ],
    "Resource": "*"
    },
    {
    "Sid": "ResourceManagementExtended",
    "Effect": "Allow",
    "Action": [
    "ec2:AttachVolume",
    "ec2:CreateVolume",
    "ec2:DescribeVolumes",
    "ec2:AssociateIamInstanceProfile",
    "ec2:DisassociateIamInstanceProfile",
    "ec2:ReplaceIamInstanceProfileAssociation",
    "ec2:CreatePlacementGroup",
    "ec2:AllocateAddress",
    "ec2:ReleaseAddress",
    "ec2:DescribeIamInstanceProfileAssociations",
    "ec2:DescribeInstanceStatus",
    "ec2:DescribePlacementGroups",
    "ec2:DescribePrefixLists",
    "ec2:DescribeReservedInstancesOfferings",
    "ec2:DescribeSpotInstanceRequests",
    "ec2:DescribeSpotPriceHistory"
    ],
    "Resource": "*"
    },
    {
    "Sid": "EFSManagement",
    "Effect": "Allow",
    "Action": [
    "elasticfilesystem:DescribeMountTargets"
    ],
    "Resource": "*"
    },
    {
    "Sid": "CreateSpotServiceLinkedRole",
    "Effect": "Allow",
    "Action": ["iam:CreateServiceLinkedRole", "iam:PutRolePolicy"],
    "Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
    "Condition": {"StringLike": {"iam:AWSServiceName": "spot.amazonaws.com"}}
    }
    ]
    The CreateSpotServiceLinkedRole statement is only needed if Spot instances have not been used in the account.
  • To use Anyscale Services, also add the following statements:

    [
    {
    "Sid": "CFN",
    "Effect": "Allow",
    "Action": [
    "cloudformation:TagResource",
    "cloudformation:UntagResource",
    "cloudformation:CreateStack",
    "cloudformation:UpdateStack",
    "cloudformation:DeleteStack",
    "cloudformation:DescribeStackEvents",
    "cloudformation:DescribeStackResources",
    "cloudformation:DescribeStacks"
    ],
    "Resource": "*"
    },
    {
    "Sid": "ELBDescribe",
    "Effect": "Allow",
    "Action": [
    "elasticloadbalancing:DescribeListeners",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeLoadBalancerAttributes",
    "elasticloadbalancing:DescribeRules",
    "elasticloadbalancing:DescribeTargetGroups",
    "elasticloadbalancing:DescribeTargetGroupAttributes",
    "elasticloadbalancing:DescribeTargetHealth",
    "elasticloadbalancing:DescribeListenerCertificates"
    ],
    "Resource": "*"
    },
    {
    "Sid": "EC2Describe",
    "Action": [
    "ec2:DescribeVpcs",
    "ec2:DescribeInternetGateways"
    ],
    "Effect": "Allow",
    "Resource": "*"
    },
    {
    "Sid": "ELBCerts",
    "Effect": "Allow",
    "Action": [
    "elasticloadbalancing:AddListenerCertificates",
    "elasticloadbalancing:RemoveListenerCertificates"
    ],
    "Resource": "*"
    },
    {
    "Sid": "ACMList",
    "Effect": "Allow",
    "Action": [
    "acm:ListCertificates"
    ],
    "Resource": "*"
    },
    {
    "Sid": "ACM",
    "Effect": "Allow",
    "Action": [
    "acm:DeleteCertificate",
    "acm:RenewCertificate",
    "acm:RequestCertificate",
    "acm:AddTagsToCertificate",
    "acm:DescribeCertificate",
    "acm:GetCertificate",
    "acm:ListTagsForCertificate"
    ],
    "Resource": "*"
    },
    {
    "Sid": "ELBWrite",
    "Effect": "Allow",
    "Action": [
    "elasticloadbalancing:AddTags",
    "elasticloadbalancing:RemoveTags",
    "elasticloadbalancing:CreateRule",
    "elasticloadbalancing:ModifyRule",
    "elasticloadbalancing:DeleteRule",
    "elasticloadbalancing:SetRulePriorities",
    "elasticloadbalancing:CreateListener",
    "elasticloadbalancing:ModifyListener",
    "elasticloadbalancing:DeleteListener",
    "elasticloadbalancing:CreateLoadBalancer",
    "elasticloadbalancing:DeleteLoadBalancer",
    "elasticloadbalancing:ModifyLoadBalancerAttributes",
    "elasticloadbalancing:CreateTargetGroup",
    "elasticloadbalancing:ModifyTargetGroup",
    "elasticloadbalancing:DeleteTargetGroup",
    "elasticloadbalancing:ModifyTargetGroupAttributes",
    "elasticloadbalancing:RegisterTargets",
    "elasticloadbalancing:DeregisterTargets",
    "elasticloadbalancing:SetIpAddressType",
    "elasticloadbalancing:SetSecurityGroups",
    "elasticloadbalancing:SetSubnets"
    ],
    "Resource": "*",
    "Condition": {
    "StringEquals": {"aws:CalledViaFirst": "cloudformation.amazonaws.com"}
    }
    },
    {
    "Sid": "LinkELBService",
    "Effect": "Allow",
    "Action": "iam:CreateServiceLinkedRole",
    "Resource": "*",
    "Condition": {
    "StringLike": {
    "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
    }
    }
    },
    {
    "Sid": "IAMPolicies",
    "Effect": "Allow",
    "Action": [
    "iam:AttachRolePolicy",
    "iam:PutRolePolicy",
    "iam:UpdateRoleDescription",
    "iam:DeleteServiceLinkedRole",
    "iam:GetServiceLinkedRoleDeletionStatus"
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing"
    }
    ]
  • If required, limit permissions by constraining actions to resources tagged with anyscale-cloud-id. Use the following statements, replacing cld_ID with your own cloud ID (create a cloud first if you don't have one), and remove the InstanceStart and InstanceStop statements from the standard policy above.

    [
    {
    "Sid": "DenyTaggingOnOtherInstances",
    "Effect": "Deny",
    "Action": [
    "ec2:DeleteTags",
    "ec2:CreateTags"
    ],
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
    "StringNotEquals": {
    "aws:ResourceTag/anyscale-cloud-id": "cld_ID",
    "ec2:CreateAction": [
    "RunInstances",
    "StartInstances"
    ]
    }
    }
    },
    {
    "Sid": "RestrictedInstanceStop",
    "Effect": "Allow",
    "Action": [
    "ec2:TerminateInstances",
    "ec2:StopInstances"
    ],
    "Resource": "*",
    "Condition": {
    "StringEquals": {
    "aws:ResourceTag/anyscale-cloud-id": "cld_ID"
    }
    }
    },
    {
    "Sid": "RestrictedInstanceStart",
    "Effect": "Allow",
    "Action": [
    "ec2:StartInstances",
    "ec2:RunInstances"
    ],
    "Resource": "*",
    "Condition": {
    "StringEquals": {
    "aws:RequestTag/anyscale-cloud-id": "cld_ID"
    },
    "ForAnyValue:StringEquals": {
    "aws:TagKeys": [
    "anyscale-cloud-id"
    ]
    }
    }
    },
    {
    "Sid": "AllowRunInstancesForUntaggedResources",
    "Effect": "Allow",
    "Action": "ec2:RunInstances",
    "Resource": [
    "arn:aws:ec2:*::image/*",
    "arn:aws:ec2:*::snapshot/*",
    "arn:aws:ec2:*:*:subnet/*",
    "arn:aws:ec2:*:*:network-interface/*",
    "arn:aws:ec2:*:*:security-group/*",
    "arn:aws:ec2:*:*:key-pair/*",
    "arn:aws:ec2:*:*:volume/*"
    ]
    }
    ]
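Replacing the cld_ID placeholder and dropping the two unrestricted statements by hand is easy to get wrong. A minimal sketch, assuming you already have the standard statements and the restricted statements as Python lists of dicts (for example, parsed with json.loads); the function names are hypothetical. AWS limits the size of IAM policies, so you may need to split the combined statements across more than one policy.

```python
# Statements that the restricted versions replace, per the instructions above.
REPLACED_SIDS = {"InstanceStart", "InstanceStop"}
PLACEHOLDER = "cld_ID"


def _substitute(value, placeholder, replacement):
    """Recursively copy value, replacing any string equal to placeholder."""
    if isinstance(value, dict):
        return {key: _substitute(item, placeholder, replacement) for key, item in value.items()}
    if isinstance(value, list):
        return [_substitute(item, placeholder, replacement) for item in value]
    return replacement if value == placeholder else value


def scope_to_cloud(standard_statements, restricted_statements, cloud_id):
    """Build one policy document scoped to cloud_id.

    Drops InstanceStart and InstanceStop from the standard statements and
    fills cloud_id into the restricted statements. Inputs are not modified.
    """
    kept = [s for s in standard_statements if s.get("Sid") not in REPLACED_SIDS]
    restricted = _substitute(restricted_statements, PLACEHOLDER, cloud_id)
    return {"Version": "2012-10-17", "Statement": kept + restricted}
```

Only string values exactly equal to cld_ID are replaced, so Sids and ARNs that merely contain similar text are left alone.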

IAM Role for Ray Cluster Nodes

instance-iam-role-id

  • Create an IAM role as the default role for Ray clusters managed by Anyscale. At a minimum, this role needs a policy granting read and write access to the S3 bucket.
  • You can configure the role to trust the AWS EC2 service.
  • Create an instance profile with the same name as the role. (This is created automatically if you create the role through the AWS Console and select the EC2 service.) Example S3 policy for this role, with bucket-name replaced by your bucket's name:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ListObjectsInBucket",
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::bucket-name"]
},
{
"Sid": "AllObjectActions",
"Effect": "Allow",
"Action": "s3:*Object",
"Resource": ["arn:aws:s3:::bucket-name/*"]
}
]
}
note

If you wish to use the instance IAM role to work with other AWS services, additional permissions may be needed on the instance IAM role. Example common services that would need additional permissions:

To determine the IAM role on a running Anyscale Cluster, run:

aws sts get-caller-identity
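On a cluster node, aws sts get-caller-identity prints JSON whose Arn field has the form arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>. A minimal sketch that extracts the role name from that output; the function name is hypothetical, and the sample values in the check below are made up.

```python
import json


def role_name_from_caller_identity(identity_json: str) -> str:
    """Extract the IAM role name from `aws sts get-caller-identity` output."""
    arn = json.loads(identity_json)["Arn"]
    resource = arn.split(":", 5)[5]  # e.g. "assumed-role/<role-name>/<session-name>"
    kind, _, rest = resource.partition("/")
    if kind != "assumed-role":
        raise ValueError(f"caller is not an assumed role: {arn}")
    return rest.split("/", 1)[0]
```

For an identity whose Arn is arn:aws:sts::123456789012:assumed-role/cld_1234-cluster_node_role/i-0abc, this returns cld_1234-cluster_node_role.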

Using an existing IAM Role for Cluster Nodes

To utilize an existing IAM Role with a Ray Cluster managed by Anyscale, follow these steps:

  1. Prepare the IAM Role: Make sure the IAM role has an instance profile, which allows EC2 instances to assume the role.
  2. Create a Compute Config: Create a new Compute Config from the Anyscale Console or from the CLI.
  3. Specify the Instance Profile ARN: Add the Instance Profile ARN in the Advanced Configuration section.

The following is an example JSON configuration to set the IAM Instance Profile:

{
"IamInstanceProfile": { "Arn" : "<IAM Instance Profile ARN>" }
}
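If you start from a role ARN, the instance profile ARN differs only in the resource type, provided the profile has the same name as the role and sits at the root path. A sketch under those assumptions that renders the Advanced Configuration JSON above; the function names are hypothetical, and when in doubt you can look up the real ARN with aws iam get-instance-profile --instance-profile-name <name>.

```python
import json


def instance_profile_arn_from_role_arn(role_arn: str) -> str:
    """Assume the instance profile shares the role's name and uses the root path."""
    prefix, sep, role_path_and_name = role_arn.partition(":role/")
    if not sep:
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    # Any role path is dropped; only the final name segment is kept.
    role_name = role_path_and_name.rsplit("/", 1)[-1]
    return f"{prefix}:instance-profile/{role_name}"


def advanced_config_for(instance_profile_arn: str) -> str:
    """Render the Advanced Configuration JSON that sets the instance profile."""
    return json.dumps({"IamInstanceProfile": {"Arn": instance_profile_arn}}, indent=2)
```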


S3

  • Create the bucket with permissions granted to the instance IAM role and Anyscale IAM role. Example permissions:

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "allow-role-access",
    "Effect": "Allow",
    "Principal": {
    "AWS": [
    "arn:aws:iam::<account_id>:role/<your-anyscale-iam-role-name>",
    "arn:aws:iam::<account_id>:role/<your-instance-iam-role-name>"
    ]
    },
    "Action": [
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListBucket"
    ],
    "Resource": [
    "arn:aws:s3:::<your-bucket-name>/*",
    "arn:aws:s3:::<your-bucket-name>"
    ]
    }
    ]
    }
  • In addition, if you plan on using the Anyscale UI, add the following CORS rules to your bucket. (This allows the Anyscale UI to read and display logs. The data doesn't go through the Anyscale control plane.)

    [
    {
    "AllowedHeaders": [
    "*"
    ],
    "AllowedMethods": [
    "GET", "PUT", "POST", "HEAD", "DELETE"
    ],
    "AllowedOrigins": [
    "https://*.anyscale.com"
    ],
    "ExposeHeaders": []
    }
    ]
  • If you use a KMS managed key for encryption on this bucket (SSE-KMS mode), both IAM roles need the kms:GenerateDataKey & kms:Decrypt permissions for the key. The default encryption configuration (SSE-S3) does not require additional permissions.
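To avoid hand-editing the placeholders, you could generate both documents for your bucket. This sketch fills in the example bucket policy and wraps the CORS rules in the {"CORSRules": [...]} form that the S3 API and aws s3api put-bucket-cors expect (the console accepts the bare array). The function names are hypothetical, and both roles are assumed to use the root path; a role with a path needs that path in its ARN.

```python
def build_bucket_policy(account_id, anyscale_role_name, instance_role_name, bucket_name):
    """Fill in the example bucket policy for one bucket and two root-path roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "allow-role-access",
                "Effect": "Allow",
                "Principal": {"AWS": [
                    f"arn:aws:iam::{account_id}:role/{anyscale_role_name}",
                    f"arn:aws:iam::{account_id}:role/{instance_role_name}",
                ]},
                "Action": ["s3:PutObject", "s3:DeleteObject", "s3:GetObject", "s3:ListBucket"],
                # Object actions apply to ".../*"; s3:ListBucket applies to the bucket ARN.
                "Resource": [f"arn:aws:s3:::{bucket_name}/*", f"arn:aws:s3:::{bucket_name}"],
            }
        ],
    }


def build_cors_configuration():
    """Wrap the CORS rules in the {"CORSRules": [...]} form the S3 API expects."""
    return {
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["GET", "PUT", "POST", "HEAD", "DELETE"],
                "AllowedOrigins": ["https://*.anyscale.com"],
                "ExposeHeaders": [],
            }
        ]
    }
```

After saving each result as JSON, you could apply them with aws s3api put-bucket-policy --bucket <bucket-name> --policy file://policy.json and aws s3api put-bucket-cors --bucket <bucket-name> --cors-configuration file://cors.json.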

warning

Anyscale does not assume responsibility for data loss. To mitigate this risk, enable S3 bucket versioning and configure lifecycle management policies for data retention (see the AWS S3 documentation on Versioning and S3 Object Lifecycle).
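A minimal Python sketch of the two configurations the warning mentions, as JSON bodies for aws s3api put-bucket-versioning and aws s3api put-bucket-lifecycle-configuration. The rule ID and the 30-day retention are illustrative assumptions, not Anyscale recommendations; pick a retention period that matches your own data policies.

```python
import json


def versioning_configuration() -> dict:
    """Body for aws s3api put-bucket-versioning --versioning-configuration."""
    return {"Status": "Enabled"}


def lifecycle_configuration(noncurrent_days: int) -> dict:
    """Lifecycle rule that permanently deletes object versions once they have been
    noncurrent for `noncurrent_days` days; current versions are left alone."""
    if noncurrent_days < 1:
        raise ValueError("noncurrent_days must be a positive number of days")
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # illustrative rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to every object
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


if __name__ == "__main__":
    # 30 days is an arbitrary example retention, not a recommendation.
    print(json.dumps(versioning_configuration()))
    print(json.dumps(lifecycle_configuration(30), indent=2))
```

Noncurrent-version expiration only has an effect once versioning is enabled, which is why the two settings go together.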

EFS

  • Create mount targets in the subnets you provided above for this cloud.
  • Attach the security group you provided for this cloud to the mount targets.
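Two constraints are easy to miss here: EFS allows at most one mount target per Availability Zone, and clients reach a mount target over NFS on TCP port 2049, so the security group must admit that traffic. The following is a planning sketch in Python, not an AWS API wrapper; the Subnet record, the (protocol, from_port, to_port) rule tuples, and the subnet IDs are hypothetical simplifications that ignore rule sources.

```python
from dataclasses import dataclass

NFS_PORT = 2049  # EFS clients connect to mount targets over NFS on TCP 2049


@dataclass(frozen=True)
class Subnet:
    subnet_id: str
    availability_zone: str


def mount_target_plan(subnets: list[Subnet]) -> dict[str, str]:
    """Map each Availability Zone to the first listed subnet in it.

    EFS permits only one mount target per AZ, so extra subnets in an AZ are skipped.
    """
    plan: dict[str, str] = {}
    for subnet in subnets:
        plan.setdefault(subnet.availability_zone, subnet.subnet_id)
    return plan


def allows_nfs(ingress_rules: list[tuple[str, int, int]]) -> bool:
    """True if any (protocol, from_port, to_port) ingress rule admits TCP 2049."""
    for protocol, from_port, to_port in ingress_rules:
        if protocol == "-1":  # "-1" means all protocols in EC2 security group rules
            return True
        if protocol == "tcp" and from_port <= NFS_PORT <= to_port:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical subnet IDs.
    subnets = [
        Subnet("subnet-aaa", "us-west-2a"),
        Subnet("subnet-bbb", "us-west-2a"),
        Subnet("subnet-ccc", "us-west-2b"),
    ]
    print(mount_target_plan(subnets))
```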

MemoryDB

  • Node type: the smallest available node type, db.t4g.small, is sufficient.
  • The MemoryDB cluster should be in the same VPC and subnets as those configured for the cloud, and associated with the security group configured for the cloud.
  • The parameter group associated with the MemoryDB cluster should have maxmemory-policy set to allkeys-lru.
  • Each shard of the cluster should have at least 1 replica (2 nodes total) for high availability.
  • The cluster should have TLS enabled.
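The requirements above can be expressed as a checklist. A Python sketch, assuming you have already collected the cluster's settings (for example from the MemoryDB console) into the hypothetical MemoryDBSettings record below; its field names are illustrative and do not mirror any AWS API response. Subnet placement and node type are not checked, since the node type is a sizing note rather than a requirement.

```python
from dataclasses import dataclass


@dataclass
class MemoryDBSettings:
    # Hypothetical summary of a cluster's settings; field names are illustrative.
    node_type: str
    vpc_id: str
    security_group_ids: list[str]
    maxmemory_policy: str
    replicas_per_shard: int
    tls_enabled: bool


def memorydb_problems(c: MemoryDBSettings, cloud_vpc_id: str, cloud_sg_id: str) -> list[str]:
    """Return a description of every requirement the cluster does not meet."""
    problems = []
    if c.vpc_id != cloud_vpc_id:
        problems.append("cluster is not in the cloud's VPC")
    if cloud_sg_id not in c.security_group_ids:
        problems.append("cluster is not associated with the cloud's security group")
    if c.maxmemory_policy != "allkeys-lru":
        problems.append("parameter group maxmemory-policy is not allkeys-lru")
    if c.replicas_per_shard < 1:
        problems.append("each shard needs at least 1 replica")
    if not c.tls_enabled:
        problems.append("TLS is not enabled")
    return problems


if __name__ == "__main__":
    # Hypothetical IDs.
    cluster = MemoryDBSettings("db.t4g.small", "vpc-123", ["sg-123"], "allkeys-lru", 1, True)
    print(memorydb_problems(cluster, "vpc-123", "sg-123"))
```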
note

If you encounter any issues during the cloud registration step, check the AWS resources you created against the requirements above. You can re-validate the cloud configuration by running anyscale cloud verify, mentioned above.

Appendix: Minimal IAM Permissions for cloud commands

This section provides the minimal IAM permissions required for the Anyscale CLI to perform cloud operations. As an AWS administrator, follow these steps to apply the policy:

  1. Create a new IAM policy or edit an existing policy to include the following permissions.

  2. Attach the policy to the IAM user or role that will be used to run the Anyscale CLI.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudformationManagement",
      "Effect": "Allow",
      "Action": [
        "cloudformation:CreateChangeSet",
        "cloudformation:CreateStack",
        "cloudformation:DeleteStack",
        "cloudformation:DescribeStackEvents",
        "cloudformation:DescribeStacks",
        "cloudformation:ListStacks"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "EC2Management",
      "Effect": "Allow",
      "Action": [
        "ec2:AssociateRouteTable",
        "ec2:AttachInternetGateway",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateInternetGateway",
        "ec2:CreateRoute",
        "ec2:CreateRouteTable",
        "ec2:CreateSecurityGroup",
        "ec2:CreateSubnet",
        "ec2:CreateTags",
        "ec2:CreateVpc",
        "ec2:CreateVpcEndpoint",
        "ec2:DeleteInternetGateway",
        "ec2:DeleteRoute",
        "ec2:DeleteRouteTable",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteSubnet",
        "ec2:DeleteVpc",
        "ec2:DeleteVpcEndpoints",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeNetworkAcls",
        "ec2:DescribeRegions",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroupRules",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcAttribute",
        "ec2:DescribeVpcEndpoints",
        "ec2:DescribeVpcs",
        "ec2:DetachInternetGateway",
        "ec2:DisassociateRouteTable",
        "ec2:ModifySubnetAttribute",
        "ec2:ModifyVpcAttribute",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "EFSManagement",
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:CreateFileSystem",
        "elasticfilesystem:CreateMountTarget",
        "elasticfilesystem:DeleteFileSystem",
        "elasticfilesystem:DeleteMountTarget",
        "elasticfilesystem:DescribeBackupPolicy",
        "elasticfilesystem:DescribeFileSystemPolicy",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeLifecycleConfiguration",
        "elasticfilesystem:DescribeMountTargetSecurityGroups",
        "elasticfilesystem:DescribeMountTargets",
        "elasticfilesystem:DescribeReplicationConfigurations",
        "elasticfilesystem:PutLifecycleConfiguration",
        "elasticfilesystem:TagResource"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "IAMManagement",
      "Effect": "Allow",
      "Action": [
        "iam:AddRoleToInstanceProfile",
        "iam:AttachRolePolicy",
        "iam:CreateInstanceProfile",
        "iam:CreateRole",
        "iam:DeleteInstanceProfile",
        "iam:DeleteRole",
        "iam:DeleteRolePolicy",
        "iam:DetachRolePolicy",
        "iam:GetInstanceProfile",
        "iam:GetRole",
        "iam:PassRole",
        "iam:PutRolePolicy",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:TagRole"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "S3Management",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucketPolicy",
        "s3:GetAccelerateConfiguration",
        "s3:GetBucketCors",
        "s3:GetBucketLogging",
        "s3:GetBucketNotification",
        "s3:GetBucketObjectLockConfiguration",
        "s3:GetBucketOwnershipControls",
        "s3:GetBucketPolicy",
        "s3:GetBucketPublicAccessBlock",
        "s3:GetBucketTagging",
        "s3:GetBucketVersioning",
        "s3:GetBucketWebsite",
        "s3:PutBucketCors",
        "s3:PutBucketPolicy",
        "s3:PutBucketPublicAccessBlock",
        "s3:PutBucketTagging"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "Miscellaneous",
      "Effect": "Allow",
      "Action": [
        "acm:ListCertificates",
        "kms:CreateGrant",
        "kms:DescribeKey",
        "kms:GenerateDataKeyWithoutPlaintext",
        "servicequotas:GetServiceQuota"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
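Before attaching the policy, you may want to confirm that an edited or merged policy still grants every action listed above. The following Python sketch is simplified: it considers only Allow statements, matches IAM action wildcards (* and ?) case-insensitively with fnmatch, and ignores Deny statements, NotAction, resources, and conditions. Treat it as a coverage hint, not an IAM policy evaluator; the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def allowed_actions(policy: dict) -> list[str]:
    """Collect action patterns from Allow statements (Deny and NotAction are ignored)."""
    patterns: list[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        patterns.extend([actions] if isinstance(actions, str) else actions)
    return patterns


def missing_actions(policy: dict, required: list[str]) -> list[str]:
    """Return required actions not matched by any Allow pattern.

    IAM action names are case-insensitive and may use * and ? wildcards,
    so both sides are lowercased before matching.
    """
    patterns = [p.lower() for p in allowed_actions(policy)]
    return [a for a in required if not any(fnmatchcase(a.lower(), p) for p in patterns)]
```

For example, after loading the policy with json.load from a file (say, a hypothetical anyscale-cli-policy.json), missing_actions(policy, ["ec2:CreateVpc", "iam:PassRole"]) returns an empty list when both actions are granted.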