Deploy an Anyscale Cloud on AWS
Before you run Ray workloads on Anyscale, an Anyscale organization member must deploy an Anyscale Cloud within an Amazon Web Services (AWS) environment. This integration enables Anyscale to manage resources like compute instances and storage directly in an AWS account.
Prerequisites
- Register a user account on Anyscale at console.anyscale.com and set up the Anyscale CLI locally.
- Verify your ability to launch EC2 instances in the AWS region you plan to use on Anyscale. Anyscale supports all commercially available regions. Anyscale doesn't support regions outside of the aws partition, that is, China regions and US GovCloud regions.
- Set up AWS credentials locally by running aws configure (for more details, see the AWS configuration guide).
- Make sure these credentials correspond to the AWS account that you're using for the Anyscale Cloud and have permissions to manage the resources that the cloud requires. At a minimum, they need the permissions listed under Minimal IAM Permissions for cloud commands.
The following resources have low default quota:
- Number of VPCs per region
- Number of internet gateways per region
Anyscale requires one of each of these resources per cloud. If you've reached your quota, see how you can raise it.
1. Install the Anyscale CLI
- Run the following command to install the Anyscale CLI and Python client package:
pip install -U anyscale
- To authenticate your credentials, run the following command, which fetches and updates the token that confirms your identity in the
~/.anyscale/credentials.json
file.
anyscale login
If necessary, log in to the Anyscale console to complete authentication.
2. Choose an Anyscale Cloud deployment method
Deploying an Anyscale Cloud integrates Anyscale's capabilities into your AWS account to leverage its compute, storage, and networking resources for scalable, distributed computing.
You can use one of two different deployment methods that use the Anyscale CLI for cloud configuration. Choose a method based on your organization's requirements.
anyscale cloud setup
- Use for rapid deployment and a straightforward, low-maintenance solution; deploy in public subnets and access over public IP addresses without setting up additional networking infrastructure.
anyscale cloud register
- Suitable for teams with advanced cloud expertise, seeking enhanced security, custom private networking, and specific compliance needs.
3. Create an Anyscale Cloud
Based on the deployment method selected from the previous step, create an Anyscale Cloud with the following instructions.
- anyscale cloud setup (auto)
- anyscale cloud register (custom)
For the anyscale cloud setup
deployment method, Anyscale automatically creates and configures the necessary resources within your AWS account. You deploy Ray clusters in public subnets and access them using public IP addresses without needing to set up additional networking infrastructure like VPNs.
Note: To manually customize resources, use the (Custom) cloud register method instead.
An Anyscale Cloud deployed using anyscale cloud setup
uses direct networking with an architecture similar to the following:
Deploy a new Cloud
Run the following command to deploy a new Cloud:
anyscale cloud setup \
--name example_cloud_name \
--provider aws \
--region ap-southeast-1 \
--enable-head-node-fault-tolerance
--enable-head-node-fault-tolerance: Enables head node fault tolerance in Anyscale Services by configuring an additional MemoryDB instance for the Ray Global Control Store. Note that this flag extends the setup time by approximately 20 minutes.
By default, Anyscale doesn't set a retention policy on the S3 bucket that the managed cloud setup creates. If you need one, configure it on the bucket yourself.
For the anyscale cloud register
deployment method, you are responsible for creating and configuring AWS resources needed to integrate with Anyscale. You define subnets to deploy Ray clusters and access them using public or private IP addresses.
This custom-defined networking requires you to configure the network paths between users, clusters, and the Anyscale Control Plane. Connectivity and network performance between users and clusters depends on your setup.
An Anyscale Cloud deployed using customer defined networking has a similar architecture to the following:
Notes:
- Anyscale Clouds should be deployed into multiple Availability Zones. By default, the anyscale cloud setup deployment option creates a subnet in each Availability Zone.
- The Elastic Container Registry (ECR) is displayed to show a possible integration with ECR to support Custom Docker Environments.
Step 1: Choose method for creating cloud resources
You have three methods for creating custom AWS infrastructure resources to connect to Anyscale:
(Recommended) Anyscale provided Terraform module
Use this predefined set of configurations developed by Anyscale, which simplifies the setup process. Applying this module to your cloud environment configures the required resources in your AWS account. For details and instructions on using this module, see the following resources:
Create your own Terraform module
You can create custom Terraform modules to tailor cloud resources and configurations to meet compliance requirements.
Create resources manually in the AWS Management Console
You can manually create resources in the AWS Management Console, which offers maximum customization but can be prone to manual errors.
Step 2: Create cloud resources
Cloud resources created in an AWS account must meet a list of minimum requirements to work with Anyscale.
Following the Terraform Modules for Anyscale Cloud Foundations on AWS satisfies these requirements by default. For all other methods, perform the following steps:
VPC
- Anyscale recommends a CIDR range of /24 or more.
- The VPC has internet egress ability.
- Recommended: Enable a Gateway VPC Endpoint for S3 to reduce cost and improve performance when pulling container images.
Subnets
- Anyscale recommends a CIDR range of /24 or more.
- The subnet is public with an internet gateway and route table, or the subnet is private when the --private-network flag is set in cloud register for Customer Defined Resources.
- Provide at least two subnets.
- No two public subnets should be in the same Availability Zone.
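The subnet requirements above can be checked before registration with a short script. This is a minimal sketch, not part of the Anyscale CLI: the function name and the dictionary keys (cidr, az, public) are hypothetical, and it assumes that "/24 or more" means a range at least as large as a /24, that is, a prefix length of 24 or lower.

```python
import ipaddress
from collections import Counter


def check_subnets(subnets, max_prefix=24):
    """Return a list of problems; an empty list means every check passed.

    Each subnet is a dict with hypothetical keys: "cidr", "az", "public".
    """
    problems = []
    if len(subnets) < 2:
        problems.append("need at least 2 subnets")
    for subnet in subnets:
        network = ipaddress.ip_network(subnet["cidr"])
        # A larger prefix length means a smaller range; /24 holds 256 addresses.
        if network.prefixlen > max_prefix:
            problems.append(f"{subnet['cidr']} is smaller than /{max_prefix}")
    # Count public subnets per Availability Zone and flag any duplicates.
    public_azs = Counter(s["az"] for s in subnets if s["public"])
    for az, count in public_azs.items():
        if count > 1:
            problems.append(f"{count} public subnets share {az}")
    return problems


example = [
    {"cidr": "10.0.0.0/24", "az": "us-west-2a", "public": True},
    {"cidr": "10.0.1.0/24", "az": "us-west-2b", "public": True},
]
print(check_subnets(example))
```

The check only covers the properties listed here; routing and internet gateway attachment still need to be verified in AWS.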
Security group
Inbound rules
- Allow all inbound TCP traffic on port 443 (can be restricted to your CIDR blocks) for inbound access to submit Ray jobs, the Grafana dashboard, web-based Workspaces, and other functionality.
- Allow all inbound SSH traffic on port 22 (can be restricted to your CIDR blocks or removed) for Workspace connections using VS Code Desktop.
- Allow all inbound traffic from the given security-group to allow intra-cluster communication and access to Elastic File System (EFS) for Workspaces.
Outbound rules
- Allow all outbound traffic for reporting back to users and the Anyscale Control Plane.
- Allow outbound traffic for all protocols from the given security-group to allow intra-cluster communication. This is required by certain network devices such as EFA.
IAM Role for cross account access
anyscale-iam-role-id
Create an IAM role with the following permissions to allow Anyscale to manage resources in your account:
- Grant access to our control plane:
- Or manually set the trust relationship as follows:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Principal": {
"AWS": "525325868955"
},
"Condition": {} # This is populated with an External ID after `cloud register`
}
]
}
The user running anyscale cloud register
must have permission to edit this trust relationship. The register
command updates the trust relationship to include a cloud-specific External ID.
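To make the trust relationship concrete, the following sketch builds the document with and without an External ID condition. The exact condition that the register command writes isn't shown in this guide; the StringEquals on sts:ExternalId used here is the common AWS pattern for cross-account roles and is an assumption, as are the function name and the sample External ID.

```python
import json

# Principal taken from the trust relationship shown above.
ANYSCALE_PRINCIPAL = "525325868955"


def trust_policy(external_id=None):
    """Build a trust policy; the Condition stays empty until an External ID is known."""
    condition = {}
    if external_id is not None:
        # Assumed shape: the standard AWS External ID condition.
        condition = {"StringEquals": {"sts:ExternalId": external_id}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {"AWS": ANYSCALE_PRINCIPAL},
                "Condition": condition,
            }
        ],
    }


print(json.dumps(trust_policy("example-external-id"), indent=2))
```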
- Attach the following IAM policy to this role for standard cluster operation:
[
{
"Sid": "IAM",
"Effect": "Allow",
"Action": [
"iam:PassRole",
"iam:GetInstanceProfile"
],
"Resource": "*" #can be restricted
},
{
"Sid": "RetrieveGenericAWSResources",
"Effect": "Allow",
"Action": [
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstanceTypes",
"ec2:DescribeRegions",
"ec2:DescribeAccountAttributes"
],
"Resource": "*"
},
{
"Sid": "DescribeRunningResources",
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeSubnets",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups"
],
"Resource": "*"
},
{
"Sid": "InstanceTagMangement",
"Effect": "Allow",
"Action": [
"ec2:CreateTags",
"ec2:DeleteTags"
],
"Resource": "*"
},
{
"Sid": "InstanceStart",
"Effect": "Allow",
"Action": [
"ec2:StartInstances",
"ec2:RunInstances"
],
"Resource": "*"
},
{
"Sid": "InstanceStop",
"Effect": "Allow",
"Action": [
"ec2:TerminateInstances",
"ec2:StopInstances"
],
"Resource": "*"
},
{
"Sid": "InstanceManagementSpot",
"Effect": "Allow",
"Action": [
"ec2:CancelSpotInstanceRequests",
"ec2:ModifyImageAttribute",
"ec2:ModifyInstanceAttribute",
"ec2:RequestSpotInstances"
],
"Resource": "*"
},
{
"Sid": "ResourceManagementExtended",
"Effect": "Allow",
"Action": [
"ec2:AttachVolume",
"ec2:CreateVolume",
"ec2:DescribeVolumes",
"ec2:AssociateIamInstanceProfile",
"ec2:DisassociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation",
"ec2:CreatePlacementGroup",
"ec2:AllocateAddress",
"ec2:ReleaseAddress",
"ec2:DescribeIamInstanceProfileAssociations",
"ec2:DescribeInstanceStatus",
"ec2:DescribePlacementGroups",
"ec2:DescribePrefixLists",
"ec2:DescribeReservedInstancesOfferings",
"ec2:DescribeSpotInstanceRequests",
"ec2:DescribeSpotPriceHistory"
],
"Resource": "*"
},
{
"Sid": "EFSManagement",
"Effect": "Allow",
"Action": [
"elasticfilesystem:DescribeMountTargets"
],
"Resource": "*"
},
{
"Sid": "CreateSpotServiceLinkedRole",
"Effect": "Allow",
"Action": ["iam:CreateServiceLinkedRole", "iam:PutRolePolicy"],
"Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
"Condition": {"StringLike": {"iam:AWSServiceName": "spot.amazonaws.com"}}
} # Only needed if Spot instances have not been used in the account.
]
- To use Services, attach the following policies:
[
{
"Sid": "CFN",
"Effect": "Allow",
"Action": [
"cloudformation:TagResource",
"cloudformation:UntagResource",
"cloudformation:CreateStack",
"cloudformation:UpdateStack",
"cloudformation:DeleteStack",
"cloudformation:DescribeStackEvents",
"cloudformation:DescribeStackResources",
"cloudformation:DescribeStacks",
"cloudformation:GetTemplate"
],
"Resource": "*"
},
{
"Sid": "ELBDescribe",
"Effect": "Allow",
"Action": [
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DescribeRules",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetGroupAttributes",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:DescribeListenerCertificates"
],
"Resource": "*"
},
{
"Sid": "EC2Describe",
"Action": [
"ec2:DescribeVpcs",
"ec2:DescribeInternetGateways"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Sid": "ELBCerts",
"Effect": "Allow",
"Action": [
"elasticloadbalancing:AddListenerCertificates",
"elasticloadbalancing:RemoveListenerCertificates"
],
"Resource": "*"
},
{
"Sid": "ACMList",
"Effect": "Allow",
"Action": [
"acm:ListCertificates"
],
"Resource": "*"
},
{
"Sid": "ACM",
"Effect": "Allow",
"Action": [
"acm:DeleteCertificate",
"acm:RenewCertificate",
"acm:RequestCertificate",
"acm:AddTagsToCertificate",
"acm:DescribeCertificate",
"acm:GetCertificate",
"acm:ListTagsForCertificate"
],
"Resource": "*"
},
{
"Sid": "ELBWrite",
"Effect": "Allow",
"Action": [
"elasticloadbalancing:AddTags",
"elasticloadbalancing:RemoveTags",
"elasticloadbalancing:CreateRule",
"elasticloadbalancing:ModifyRule",
"elasticloadbalancing:DeleteRule",
"elasticloadbalancing:SetRulePriorities",
"elasticloadbalancing:CreateListener",
"elasticloadbalancing:ModifyListener",
"elasticloadbalancing:DeleteListener",
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:CreateTargetGroup",
"elasticloadbalancing:ModifyTargetGroup",
"elasticloadbalancing:DeleteTargetGroup",
"elasticloadbalancing:ModifyTargetGroupAttributes",
"elasticloadbalancing:RegisterTargets",
"elasticloadbalancing:DeregisterTargets",
"elasticloadbalancing:SetIpAddressType",
"elasticloadbalancing:SetSecurityGroups",
"elasticloadbalancing:SetSubnets"
],
"Resource": "*",
"Condition": {
"StringEquals": {"aws:CalledViaFirst": "cloudformation.amazonaws.com"}
}
},
{
"Sid": "LinkELBService",
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "*",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
}
}
},
{
"Sid": "IAMPolicies",
"Effect": "Allow",
"Action": [
"iam:AttachRolePolicy",
"iam:PutRolePolicy",
"iam:UpdateRoleDescription",
"iam:DeleteServiceLinkedRole",
"iam:GetServiceLinkedRoleDeletionStatus"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing"
}
]
- If required, limit permissions by constraining actions to resources with the anyscale-cloud-id tag. Use the following policies, replacing cld_ID with your own cloud ID (create a cloud first if you don't have one) and removing the InstanceStop and InstanceStart statements from the earlier policy.
[
{
"Sid": "DenyTaggingOnOtherInstances",
"Effect": "Deny",
"Action": [
"ec2:DeleteTags",
"ec2:CreateTags"
],
"Resource": "arn:aws:ec2:*:*:instance/*",
"Condition": {
"StringNotEquals": {
"aws:ResourceTag/anyscale-cloud-id": "cld_ID",
"ec2:CreateAction": [
"RunInstances",
"StartInstances"
]
}
}
},
{
"Sid": "RestrictedInstanceStop",
"Effect": "Allow",
"Action": [
"ec2:TerminateInstances",
"ec2:StopInstances"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:ResourceTag/anyscale-cloud-id": "cld_ID"
}
}
},
{
"Sid": "RestrictedInstanceStart",
"Effect": "Allow",
"Action": [
"ec2:StartInstances",
"ec2:RunInstances"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:RequestTag/anyscale-cloud-id": "cld_ID"
},
"ForAnyValue:StringEquals": {
"aws:TagKeys": [
"anyscale-cloud-id"
]
}
}
},
{
"Sid": "AllowRunInstancesForUntaggedResources",
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": [
"arn:aws:ec2:*::image/*",
"arn:aws:ec2:*::snapshot/*",
"arn:aws:ec2:*:*:subnet/*",
"arn:aws:ec2:*:*:network-interface/*",
"arn:aws:ec2:*:*:security-group/*",
"arn:aws:ec2:*:*:key-pair/*",
"arn:aws:ec2:*:*:volume/*"
]
}
]
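Replacing the cld_ID placeholder by hand is error-prone because it appears in several statements. The following sketch walks a parsed policy and substitutes the placeholder wherever it occurs as a value. The helper name and the sample cloud ID are hypothetical; the sample statement is the RestrictedInstanceStop statement from above.

```python
import json


def fill_cloud_id(node, cloud_id, placeholder="cld_ID"):
    """Return a copy of a parsed policy with the placeholder value replaced."""
    if isinstance(node, dict):
        return {key: fill_cloud_id(value, cloud_id, placeholder) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_cloud_id(value, cloud_id, placeholder) for value in node]
    if node == placeholder:
        return cloud_id
    return node


statement = json.loads("""
{
  "Sid": "RestrictedInstanceStop",
  "Effect": "Allow",
  "Action": ["ec2:TerminateInstances", "ec2:StopInstances"],
  "Resource": "*",
  "Condition": {
    "StringEquals": {"aws:ResourceTag/anyscale-cloud-id": "cld_ID"}
  }
}
""")

# "cld_example123" is a made-up ID; use the one returned for your cloud.
filled = fill_cloud_id(statement, "cld_example123")
print(json.dumps(filled, indent=2))
```

Working on the parsed structure rather than on raw text avoids touching keys or Sid values that merely contain similar characters.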
IAM role for Ray cluster nodes
instance-iam-role-id
- Create an IAM Role as the default role for Ray clusters managed by Anyscale. This role should have a policy for Read and Write access to the S3 Bucket at a minimum.
- You can set up the role to give trust to AWS service EC2.
Create an Instance Profile with the same name as the Role (NOTE: This is automatically created if you create the Role through the AWS Console and specify the EC2 service).
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ListObjectsInBucket",
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::bucket-name"]
},
{
"Sid": "AllObjectActions",
"Effect": "Allow",
"Action": "s3:*Object",
"Resource": ["arn:aws:s3:::bucket-name/*"]
}
]
}
Note: If you wish to use the instance IAM role to work with other AWS services, additional permissions may be needed on the instance IAM role. Example common services that would need additional permissions:
To determine the IAM role on a running Anyscale Cluster, run:
aws sts get-caller-identity
Using an existing IAM role for cluster nodes
To utilize an existing IAM Role with a Ray Cluster managed by Anyscale, follow these steps:
- Prepare the IAM role: Ensure that the IAM Role is set up as an IAM Instance Profile. This allows EC2 instances to assume the IAM Role.
- Create a compute config: Create a new Compute config from the Anyscale Console or from the CLI.
- Specify the Instance Profile ARN: Add the Instance Profile ARN in the Advanced Configuration section.
- Anyscale Console
- CLI
The following is an example JSON configuration to set the IAM Instance Profile:
{
"IamInstanceProfile": { "Arn": "<IAM Instance Profile ARN>" }
}
It should look like:
The following is an example YAML definition that sets the IAM Instance Profile:
cloud: my-cloud # You may specify `cloud_id` instead
allowed_azs:
  - us-west-2a
head_node:
  instance_type: m5.8xlarge
---
aws:
  IamInstanceProfile:
    Arn: arn:aws:iam::0123456789012:instance-profile/<IAM Instance Profile Name>
S3
- Create the bucket with permissions granted to the instance IAM role and Anyscale IAM role. Example permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "allow-role-access",
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::<account_id>:role/<your-anyscale-iam-role-name>",
"arn:aws:iam::<account_id>:role/<your-instance-iam-role-name>"
]
},
"Action": [
"s3:PutObject",
"s3:DeleteObject",
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::<your-bucket-name>/*",
"arn:aws:s3:::<your-bucket-name>"
]
}
]
}
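The bucket policy above has three placeholders that must stay consistent across the principal and resource entries. As a sketch, the following function fills them in from plain arguments; the function name, role names, and bucket name are hypothetical examples.

```python
import json


def bucket_policy(account_id, anyscale_role, instance_role, bucket):
    """Build the bucket policy shown above for two roles in one account."""
    principals = [
        f"arn:aws:iam::{account_id}:role/{role}"
        for role in (anyscale_role, instance_role)
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "allow-role-access",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": [
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:ListBucket",
                ],
                # Object-level actions need the /* ARN; ListBucket needs the bucket ARN.
                "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
            }
        ],
    }


policy = bucket_policy(
    "000000000000",
    "anyscale-iam-role-example",
    "cluster-node-role-example",
    "example-bucket",
)
print(json.dumps(policy, indent=2))
```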
- In addition, if you plan on using the Anyscale UI, add the following CORS rules to your bucket. (This allows the Anyscale UI to read and display logs. The data doesn’t go through Anyscale control plane.)
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET", "PUT", "POST", "HEAD", "DELETE"
],
"AllowedOrigins": [
"https://*.anyscale.com"
],
"ExposeHeaders": []
}
]
- If you use a KMS managed key for encryption on this bucket (SSE-KMS mode), both IAM roles need the kms:GenerateDataKey and kms:Decrypt permissions for the key. The default encryption configuration (SSE-S3) doesn't require additional permissions.
Anyscale doesn't assume responsibility for data loss. To mitigate this risk, implement S3 bucket versioning and configure lifecycle management policies for data retention (see the AWS S3 documentation on Versioning and S3 Object Lifecycle).
- The bucket is named in the format anyscale-production-data-{cloud_id}, but you can customize the name if you bring your own bucket. Within the bucket, Anyscale stores managed data in the {organization_id}/ folder. For cloud-specific managed data, Anyscale further groups the data into an {organization_id}/{cloud_id} folder. Some legacy folders still hold Anyscale-managed data, as detailed below.
- Avoid modifying or deleting the data that Anyscale manages and stores on your behalf. If the data is deleted, features such as log viewing and log downloading degrade.
- Anyscale stores logs in the {organization_id}/{cloud_id}/logs and /logs folders. The /logs folder is a legacy location, and Anyscale plans to migrate all logs to the {organization_id}/{cloud_id}/logs folder. Anyscale stores all job logs, Web Terminal command logs, and Ray logs in that folder. For performance reasons, Anyscale stores logs in various formats for different use cases. For example, when streaming logs, Anyscale may produce many small files to allow for fresher data to be downloaded by the user.
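If you run your own cleanup or lifecycle scripts against this bucket, a guard like the following can help avoid touching Anyscale-managed data. It is only a sketch: the function name is hypothetical, it assumes the legacy /logs folder corresponds to object keys beginning with logs/, and it doesn't cover any other legacy folders that this guide doesn't name.

```python
def is_anyscale_managed(key, organization_id):
    """Return True if an object key falls under a folder described above."""
    normalized = key.lstrip("/")
    # Current layout: everything under {organization_id}/, including
    # {organization_id}/{cloud_id}/logs.
    if normalized.startswith(f"{organization_id}/"):
        return True
    # Legacy layout: the top-level logs folder (assumed key prefix "logs/").
    return normalized.startswith("logs/")


# "org_abc" and "cld_123" are made-up identifiers.
for key in ("org_abc/cld_123/logs/job.log", "/logs/old.log", "my-data/file.csv"):
    print(key, is_anyscale_managed(key, "org_abc"))
```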
EFS
- Create a mount target with the subnets and the security group you provided above for this cloud.
- Provide the security groups for this cloud.
MemoryDB
- Instance type: the smallest available instance type, db.t4g.small, is sufficient.
- The MemoryDB cluster should be in the same VPC and subnets configured for the cloud, and associated with the security group configured for the cloud.
- The parameter group associated with the MemoryDB cluster has the maxmemory-policy set to allkeys-lru.
- Each shard of the cluster should have at least 1 replica (2 nodes total) for high availability.
- The cluster should have TLS enabled.
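The MemoryDB requirements above translate into a simple checklist. The following sketch runs that checklist over a plain dictionary describing a cluster. The dictionary keys are hypothetical and don't match any AWS API response; map them from whatever tool you use to inspect the cluster.

```python
def check_memorydb(cluster):
    """Return a list of unmet requirements; an empty list means all checks passed.

    `cluster` is a plain dict with hypothetical keys, not an AWS API response.
    """
    problems = []
    if cluster.get("maxmemory_policy") != "allkeys-lru":
        problems.append("maxmemory-policy must be allkeys-lru")
    if cluster.get("replicas_per_shard", 0) < 1:
        problems.append("each shard needs at least 1 replica")
    if not cluster.get("tls_enabled", False):
        problems.append("TLS must be enabled")
    return problems


good = {"maxmemory_policy": "allkeys-lru", "replicas_per_shard": 1, "tls_enabled": True}
print(check_memorydb(good))
```

The VPC, subnet, and security group requirements aren't covered here because they depend on the identifiers chosen for the cloud.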
If you encounter any issue during the cloud registration step, validate the AWS resources created as noted above. You can revalidate the cloud configuration by running anyscale cloud verify, described under Verify cloud resources.
Steps to create a cloud
- Customize your expected cloud environment by providing necessary values for parameters of the Terraform module.
- Apply Terraform module to your cloud environment.
- Run the cloud register command that is returned by the Terraform module. Note: Export your Anyscale and cloud credentials before running this command.
- Rerun the Terraform module, using the cloud_id returned from cloud register as an argument. This approach scopes down permissions to only resources with that specific cloud_id tag.
Register Anyscale Cloud on AWS
After setting up the necessary resources, use the following command to register your Anyscale Cloud on AWS:
# --private-network is for customer-defined networking.
# --functional-verify workspace launches a workspace after the cloud is created.
anyscale cloud register \
--private-network \
--provider aws \
--name example_cloud_name \
--vpc-id vpc-00000000000000000 \
--subnet-ids subnet-00000000000000000,subnet-00000000000000000,subnet-00000000000000000 \
--file-storage-id fs-00000000000000000 \
--anyscale-iam-role-id arn:aws:iam::000000000000:role/anyscale-iam-role-00000000 \
--instance-iam-role-id arn:aws:iam::000000000000:role/cluster_node_role-00000000 \
--security-group-ids sg-00000000000000000 \
--cloud-storage-bucket-name anyscale-production-data-cld-00000000000000000 \
--region us-west-2 \
--functional-verify workspace \
--memorydb-cluster-id memorydb-name
--memorydb-cluster-id: Enables head node fault tolerance in Anyscale services. See MemoryDB in the resource requirements for configuration requirements.
--private-network: Enables private networking on private subnets and IP addresses.
--functional-verify workspace: Launches a test workspace to verify validity of resources.
--functional-verify service: Launches a test service to verify validity of resources.
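When the resource IDs come from another tool, assembling the register command programmatically avoids quoting and line-continuation mistakes. The following sketch builds the argument list using only the flags shown above. The function and parameter names are hypothetical, and joining multiple security group IDs with commas is an assumption modeled on --subnet-ids.

```python
import shlex


def register_command(name, region, vpc_id, subnet_ids, file_storage_id,
                     anyscale_role_arn, instance_role_arn, security_group_ids,
                     bucket_name, private_network=False,
                     memorydb_cluster_id=None, functional_verify=None):
    """Build the argument list for `anyscale cloud register`."""
    args = [
        "anyscale", "cloud", "register",
        "--provider", "aws",
        "--name", name,
        "--region", region,
        "--vpc-id", vpc_id,
        "--subnet-ids", ",".join(subnet_ids),
        "--file-storage-id", file_storage_id,
        "--anyscale-iam-role-id", anyscale_role_arn,
        "--instance-iam-role-id", instance_role_arn,
        "--security-group-ids", ",".join(security_group_ids),
        "--cloud-storage-bucket-name", bucket_name,
    ]
    # Optional flags are appended only when requested.
    if private_network:
        args.append("--private-network")
    if memorydb_cluster_id:
        args += ["--memorydb-cluster-id", memorydb_cluster_id]
    if functional_verify:
        args += ["--functional-verify", functional_verify]
    return args


cmd = register_command(
    name="example_cloud_name",
    region="us-west-2",
    vpc_id="vpc-00000000000000000",
    subnet_ids=["subnet-a", "subnet-b"],
    file_storage_id="fs-00000000000000000",
    anyscale_role_arn="arn:aws:iam::000000000000:role/anyscale-iam-role-00000000",
    instance_role_arn="arn:aws:iam::000000000000:role/cluster_node_role-00000000",
    security_group_ids=["sg-00000000000000000"],
    bucket_name="anyscale-production-data-cld-00000000000000000",
    private_network=True,
    functional_verify="workspace",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run directly, which sidesteps shell quoting altogether.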
4. Verify cloud resources
Anyscale provides a CLI command to verify cloud resources for both options. Anyscale runs verification automatically during cloud creation, and you can also run the verification on demand.
Trigger functional verification by specifying --functional-verify workspace or --functional-verify service. Anyscale launches a workspace or a service to verify the cloud is functional. You can also trigger both verifications (--functional-verify workspace,service).
$ anyscale cloud verify --name my-cloud-deployment
Authenticating
Loaded Anyscale authentication token from ANYSCALE_CLI_TOKEN.
Output
(anyscale +0.4s) Verifying VPC ...
(anyscale +0.8s) VPC vpc-1234 verification succeeded.
(anyscale +0.8s) Verifying subnets ...
(anyscale +1.2s) Subnets ['subnet-1234', 'subnet-2345', 'subnet-3456', 'subnet-4567'] verification succeeded.
(anyscale +1.2s) Verifying IAM roles ...
(anyscale +2.8s) IAM roles ['arn:aws:iam::999999999999:role/anyscale-iam-role-1234', 'arn:aws:iam::999999999999:role/cld_1234-cluster_node_role'] verification succeeded.
(anyscale +2.8s) Verifying security groups ...
(anyscale +3.0s) Security group ['sg-1234'] verification succeeded.
(anyscale +3.0s) Verifying S3 ...
(anyscale +3.1s) S3 anyscale-production-data-cld-1234 verification succeeded.
(anyscale +3.1s) Verifying EFS ...
(anyscale +3.3s) S3 fs-1234 verification succeeded.
(anyscale +3.3s) Verifying CloudFormation stack ...
(anyscale +3.3s) CloudFormation stack arn:aws:cloudformation:us-west-2:999999999999:stack/cld-1234/1915d0c0-3dd2-11ed-8365-020cb3caf633 verification succeeded.
(anyscale +3.3s) Verification resullt:
vpc: PASSED
subnets: PASSED
iam roles: PASSED
security groups: PASSED
s3: PASSED
efs: PASSED
cloudformation stack: PASSED
(anyscale +3.3s) Start functional verification...
Functional verification for WORKSPACE is about to begin.
It will spin up one m5.xlarge instance for each function and will incur a small amount of cost.
For workspace verification, it takes about 5 minutes.
The instances will be terminated after verification. Do you want to continue? [y/N]: y
╭───────────────────────────────────────────────────────────────────────────── workspace verification ─────────────────────────────────────────────────────────────────────────────╮
│ 0:00:02 Workspace created at https://console.anyscale.com/workspaces/expwrk_abc/ses_abc │
│ 0:01:22 Workspace is active. │
│ 0:00:00 Workspace termination initiated. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
0:01:24 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Workspace verification succeeded!
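In automation, it can be useful to turn the summary section of this output into structured data. The following sketch parses lines of the form "name: PASSED" from captured output. It assumes the summary format shown above; the FAILED status is a guess at what a failing check prints, since this guide only shows passing output.

```python
import re

# Matches summary lines such as "iam roles: PASSED"; FAILED is an assumed status.
SUMMARY_LINE = re.compile(r"^\s*([a-z0-9 ]+):\s*(PASSED|FAILED)\s*$")


def parse_summary(output):
    """Map each verified resource name to True (passed) or False (failed)."""
    results = {}
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            results[match.group(1).strip()] = match.group(2) == "PASSED"
    return results


sample = """(anyscale +3.3s) Verification resullt:
vpc: PASSED
subnets: PASSED
iam roles: PASSED
security groups: PASSED
s3: PASSED
efs: PASSED
cloudformation stack: PASSED
(anyscale +3.3s) Start functional verification...
"""
results = parse_summary(sample)
print(results)
print("all passed:", all(results.values()))
```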
Credential handling
Credentials never travel across the network to Anyscale and Anyscale doesn't store your credentials anywhere. Instead, Anyscale creates an IAM role in your cloud account, grants it permissions to interact with EC2 and IAM in your account, and allows Anyscale to assume that role. Anyscale then only stores the IAM role ARN that you created in your account. See Secret management for instructions.
- If you specify the instance IAM role, make sure it has read/write access to the S3 bucket registered with the cloud.
- If you register multiple security groups with the Anyscale cloud and want to specify them in the advanced config, you're responsible for specifying a working set of security groups (see the security group section in the resource requirements). Your cluster may end up in an error state if you fail to do so. For example, the head node may not be able to communicate with worker nodes.
Minimal IAM Permissions for cloud commands
This section provides the minimal IAM permissions required for the Anyscale CLI to perform cloud operations. As an AWS administrator, follow these steps to apply the policy:
- Create a new IAM policy or edit an existing policy to include the following permissions. See AWS documentation on Creating IAM Policies for more information.
- Attach the policy to the IAM user or role used to run the Anyscale CLI. See AWS documentation on Attaching IAM Policies for more information.
The four policies below apply, in order, to the following commands:
- anyscale cloud setup
- anyscale cloud register
- anyscale cloud verify
- anyscale cloud edit
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CloudformationManagement",
"Effect": "Allow",
"Action": [
"cloudformation:CreateChangeSet",
"cloudformation:CreateStack",
"cloudformation:DeleteStack",
"cloudformation:DescribeStackEvents",
"cloudformation:DescribeStacks",
"cloudformation:ListStacks"
],
"Resource": [
"*"
]
},
{
"Sid": "EC2Management",
"Effect": "Allow",
"Action": [
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVpc",
"ec2:CreateVpcEndpoint",
"ec2:DeleteInternetGateway",
"ec2:DeleteRoute",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteVpc",
"ec2:DeleteVpcEndpoints",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInternetGateways",
"ec2:DescribeNetworkAcls",
"ec2:DescribeRegions",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroupRules",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeVpcs",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:ModifySubnetAttribute",
"ec2:ModifyVpcAttribute",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress"
],
"Resource": [
"*"
]
},
{
"Sid": "EFSManagement",
"Effect": "Allow",
"Action": [
"elasticfilesystem:CreateFileSystem",
"elasticfilesystem:CreateMountTarget",
"elasticfilesystem:DeleteFileSystem",
"elasticfilesystem:DeleteMountTarget",
"elasticfilesystem:DescribeBackupPolicy",
"elasticfilesystem:DescribeFileSystemPolicy",
"elasticfilesystem:DescribeFileSystems",
"elasticfilesystem:DescribeLifecycleConfiguration",
"elasticfilesystem:DescribeMountTargetSecurityGroups",
"elasticfilesystem:DescribeMountTargets",
"elasticfilesystem:DescribeReplicationConfigurations",
"elasticfilesystem:PutLifecycleConfiguration",
"elasticfilesystem:TagResource"
],
"Resource": [
"*"
]
},
{
"Sid": "IAMManagement",
"Effect": "Allow",
"Action": [
"iam:AddRoleToInstanceProfile",
"iam:AttachRolePolicy",
"iam:CreateInstanceProfile",
"iam:CreateRole",
"iam:DeleteInstanceProfile",
"iam:DeleteRole",
"iam:DeleteRolePolicy",
"iam:DetachRolePolicy",
"iam:GetInstanceProfile",
"iam:GetRole",
"iam:PassRole",
"iam:PutRolePolicy",
"iam:RemoveRoleFromInstanceProfile",
"iam:TagRole"
],
"Resource": [
"*"
]
},
{
"Sid": "S3Management",
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:DeleteBucketPolicy",
"s3:GetAccelerateConfiguration",
"s3:GetBucketCors",
"s3:GetBucketLogging",
"s3:GetBucketNotification",
"s3:GetBucketObjectLockConfiguration",
"s3:GetBucketOwnershipControls",
"s3:GetBucketPolicy",
"s3:GetBucketPublicAccessBlock",
"s3:GetBucketTagging",
"s3:GetBucketVersioning",
"s3:GetBucketWebsite",
"s3:PutBucketCors",
"s3:PutBucketPolicy",
"s3:PutBucketPublicAccessBlock",
"s3:PutBucketTagging"
],
"Resource": [
"*"
]
},
{
"Sid": "Miscellaneous",
"Effect": "Allow",
"Action": [
"acm:ListCertificates",
"kms:CreateGrant",
"kms:DescribeKey",
"kms:GenerateDataKeyWithoutPlaintext",
"servicequotas:GetServiceQuota"
],
"Resource": [
"*"
]
}
]
}
When running anyscale cloud register, you need both the IAM policy for cloud register and the IAM policy for cloud verify.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CloudformationManagement",
"Effect": "Allow",
"Action": [
"cloudformation:DescribeStacks",
"cloudformation:ListStacks"
],
"Resource": [
"*"
]
},
{
"Sid": "EC2Management",
"Effect": "Allow",
"Action": [
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs"
],
"Resource": [
"*"
]
},
{
"Sid": "EFSManagement",
"Effect": "Allow",
"Action": [
"elasticfilesystem:DescribeBackupPolicy",
"elasticfilesystem:DescribeFileSystemPolicy",
"elasticfilesystem:DescribeFileSystems",
"elasticfilesystem:DescribeMountTargets"
],
"Resource": [
"*"
]
},
{
"Sid": "IAMManagement",
"Effect": "Allow",
"Action": [
"acm:ListCertificates",
"iam:GetPolicy",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:ListInstanceProfilesForRole",
"iam:ListRolePolicies",
"iam:UpdateAssumeRolePolicy"
],
"Resource": [
"*"
]
},
{
"Sid": "S3Management",
"Effect": "Allow",
"Action": [
"s3:GetBucketCors",
"s3:GetBucketLocation",
"s3:GetBucketPolicy",
"s3:ListBucket",
"s3:ListAllMyBuckets"
],
"Resource": [
"*"
]
},
{
"Sid": "Miscellaneous",
"Effect": "Allow",
"Action": [
"servicequotas:GetServiceQuota"
],
"Resource": [
"*"
]
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CloudformationManagement",
"Effect": "Allow",
"Action": [
"cloudformation:DescribeStacks"
],
"Resource": [
"*"
]
},
{
"Sid": "EC2Management",
"Effect": "Allow",
"Action": [
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs"
],
"Resource": [
"*"
]
},
{
"Sid": "EFSManagement",
"Effect": "Allow",
"Action": [
"elasticfilesystem:DescribeBackupPolicy",
"elasticfilesystem:DescribeFileSystemPolicy",
"elasticfilesystem:DescribeFileSystems",
"elasticfilesystem:DescribeMountTargets"
],
"Resource": [
"*"
]
},
{
"Sid": "IAMManagement",
"Effect": "Allow",
"Action": [
"iam:GetRole",
"iam:ListAttachedRolePolicies",
"iam:ListRolePolicies",
"iam:GetRolePolicy",
"iam:GetPolicy",
"iam:ListInstanceProfilesForRole"
],
"Resource": [
"*"
]
},
{
"Sid": "S3Management",
"Effect": "Allow",
"Action": [
"s3:GetBucketCors",
"s3:GetBucketLocation",
"s3:GetBucketPolicy",
"s3:ListBucket",
"s3:ListAllMyBuckets"
],
"Resource": [
"*"
]
},
{
"Sid": "Miscellaneous",
"Effect": "Allow",
"Action": [
"servicequotas:GetServiceQuota"
],
"Resource": [
"*"
]
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EC2Management",
"Effect": "Allow",
"Action": [
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs"
],
"Resource": [
"*"
]
},
{
"Sid": "EFSManagement",
"Effect": "Allow",
"Action": [
"elasticfilesystem:DescribeBackupPolicy",
"elasticfilesystem:DescribeFileSystemPolicy",
"elasticfilesystem:DescribeFileSystems",
"elasticfilesystem:DescribeMountTargets"
],
"Resource": [
"*"
]
},
{
"Sid": "IAMManagement",
"Effect": "Allow",
"Action": [
"iam:GetPolicy",
"iam:GetPolicyVersion",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:ListInstanceProfilesForRole",
"iam:ListRolePolicies"
],
"Resource": [
"*"
]
},
{
"Sid": "S3Management",
"Effect": "Allow",
"Action": [
"s3:GetBucketCors",
"s3:GetBucketLocation",
"s3:GetBucketPolicy",
"s3:ListAllMyBuckets"
],
"Resource": [
"*"
]
},
{
"Sid": "Miscellaneous",
"Effect": "Allow",
"Action": [
"servicequotas:GetServiceQuota"
],
"Resource": [
"*"
]
}
]
}
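Because anyscale cloud register needs the permissions for both cloud register and cloud verify, it can be convenient to combine two of these documents into one. The following sketch merges statements that share a Sid and unions their actions. It assumes every statement is an Allow on Resource "*", as in the policies above; the function name is hypothetical and the sample policies are abbreviated.

```python
def merge_policies(*policies):
    """Union the actions of same-Sid statements across policy documents.

    Assumes every statement is an Allow on Resource "*".
    """
    merged = {}
    for policy in policies:
        for statement in policy["Statement"]:
            actions = statement["Action"]
            if isinstance(actions, str):
                actions = [actions]
            merged.setdefault(statement["Sid"], set()).update(actions)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": sid, "Effect": "Allow", "Action": sorted(actions), "Resource": ["*"]}
            for sid, actions in merged.items()
        ],
    }


# Abbreviated samples; use the full documents from this section in practice.
register_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "IAMManagement", "Effect": "Allow",
         "Action": ["iam:GetRole", "iam:UpdateAssumeRolePolicy"], "Resource": ["*"]},
    ],
}
verify_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "IAMManagement", "Effect": "Allow",
         "Action": ["iam:GetRole", "iam:GetPolicy"], "Resource": ["*"]},
    ],
}
combined = merge_policies(register_policy, verify_policy)
print(combined["Statement"][0]["Action"])
```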
Glossary
The following resources are required for both the anyscale cloud setup and anyscale cloud register approaches:
- VPC & Subnets: A VPC is a virtual network within the customer AWS account and is logically isolated from other virtual networks in the cloud. A subnet is a range of IP addresses in your VPC to which your AWS resources (such as EC2 instances) can be attached. Anyscale deploys workloads in your account within the VPC and subnets defined as part of setup.
- Security Group: Security groups help secure the cloud environment by controlling the traffic that is allowed to reach and leave AWS hosted resources. Anyscale creates a security group with network rules to enable access to Anyscale’s suite of components and applications, such as
- Jupyter Labs
- Ray Dashboard
- Ray Serve endpoints
- Workspace
- S3 Bucket: Amazon S3 is an object storage service that offers scalability, data availability, security & performance. Anyscale utilizes this S3 bucket for a variety of functions that support the management of Ray clusters and Ray applications, including:
- General data storage that lasts beyond cluster lifespan
- Storing model checkpoints for Ray Tune or RLlib
- IAM Roles:
- anyscale-iam-role: Anyscale's control plane uses this role to launch Ray clusters in your AWS account. It needs permissions to manage EC2 instances and attach an IAM role.
- instance-iam-role: The default role attached to Ray clusters. This role can be modified to suit the needs and permissions that your workload requires.
- Both these roles are created by Anyscale.
- EFS: Amazon Elastic File System (EFS) is a cloud-based, scalable file system for applications and workloads that can be used in combination with other AWS services. EFS offers shared storage, is designed for scalable performance, and is secure and compliant with common regulatory standards. EFS is required for Anyscale Workspaces.
- Note that user defined tags are not supported at this time.