Deploy and manage Clouds on GCP
Set up and manage Anyscale clouds on the Google Cloud Platform.
Prerequisites
- A registered user account on Anyscale.
- An organization ID on Anyscale.
- A local Anyscale CLI set up.
pip install -U "anyscale[gcp]"
- A local gcloud CLI set up. Follow the instructions on installing the gcloud CLI.
- A GCP project.
  - You should have the owner role on the project.
  - This project preferably only hosts Anyscale workloads.
Google Cloud APIs are enabled automatically if you create an Anyscale-managed cloud with cloud setup or use the Anyscale provided Terraform module.
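Before creating a cloud, it can help to confirm that both command-line tools from the prerequisites are installed. The following is a minimal sketch, not part of the Anyscale tooling: the helper name `missing_tools` is hypothetical, and it only checks that the `anyscale` and `gcloud` executables are on your PATH, not that they are authenticated.

```python
import shutil

# Command-line tools named in the prerequisites above.
REQUIRED_TOOLS = ("anyscale", "gcloud")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    real CLIs being installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("Both CLIs were found on PATH.")
```

The check is deliberately shallow: it does not verify the owner role on the project or the CLI versions.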
Anyscale Clouds on GCP
Anyscale Clouds can be deployed in GCP accounts using two different networking modes:
- Direct Networking: Ray clusters are deployed in public subnets and accessed via public IP addresses.
- Customer Defined Networking: Ray clusters are deployed in private subnets and accessed over private IP addresses.
Direct Networking clouds are the most common and developer-friendly deployment option for Anyscale. Direct Networking allows you to connect to your Ray clusters, Workspaces, and Grafana dashboards without configuring additional networking infrastructure (usually a VPN) by routing over the Internet using public IP addresses.
An Anyscale Cloud deployed using Direct Networking has an architecture similar to the following:
To set up your Anyscale cloud with Customer Defined Resources, ensure that the Project in your GCP account satisfies all the resource requirements. Anyscale runs a verification of your resources automatically before completing the Cloud registration.
Customer Defined Networking clouds are designed for Anyscale customers that require fine-grained control over how Ray clusters can be accessed. Customer Defined Networking requires you to define the network paths between users and the private IP addresses of Ray clusters, as well as between clusters and the Anyscale Control Plane. Connectivity and network performance between users and Ray clusters vary based on your specific setup.
An Anyscale Cloud deployed using Customer Defined Networking has an architecture similar to the following:
Notes:
- The requirements for resources such as Service Accounts, VPCs, and Firewall Policies can be found in the detailed resource requirements.
- Direct Networking clouds can be created using either the Anyscale Managed or Customer Defined deployment modes, whereas Customer Defined Networking clouds can only be created using the Customer Defined deployment mode with the --private-network parameter.
- Artifact Registry is displayed to show a possible integration to support Custom Docker Environments.
- Due to limited support for certificates on GCP internal load balancers, Anyscale Services created for private network clouds are currently exposed over http instead of https.
Create an Anyscale Cloud
Run the following command to create an Anyscale cloud depending on your deployment option:
To set up your Anyscale cloud with Customer Defined Resources, ensure that the cloud resources in your GCP account satisfy all the detailed resource requirements. Anyscale runs a verification of your resources automatically at the end of the registration.
Both anyscale cloud setup and anyscale cloud register support functional verification to make sure your cloud is ready. Add the flag --functional-verify workspace or --functional-verify service to the cloud creation command to automatically launch a workspace or service from your cloud and verify that the newly created cloud is functional.
Anyscale Managed:
anyscale cloud setup \
--provider gcp \
-n example_cloud_name \
--region us-west1 \
--project-id example-project
Authenticating
Output
⠧ Creating workload identity feneration provider for Anyscale access...(anyscale +8.4s) Workload Identity Pool created: projects/112233445566/locations/global/workloadIdentityPools/anyscale-provider-pool-5439d0b9
⠸ Creating workload identity feneration provider for Anyscale access...(anyscale +12.2s) Anyscale provider created: projects/112233445566/locations/global/workloadIdentityPools/anyscale-provider-pool-5439d0b9/providers/anyscale-access
(anyscale +13.4s) Track progress of Deployment Manager at https://console.cloud.google.com/dm/deployments/details/cld-abcd?project=example-project
(anyscale +13.4s) Note that it may take up to 5 minutes to create resources on GCP.
⠧ Creating cloud resources through Deployment Manager...(anyscale +3m42.6s) Deployment succeeded.
(anyscale +4m45.4s) Successfully created cloud example_cloud_name, and it's ready to use.
Customer Defined:
# --private-network is only needed for Customer Defined Networking.
# --functional-verify workspace launches a workspace after the cloud is created.
anyscale cloud register \
  --provider gcp \
  -n example_cloud_name \
  --vpc-name example_vpc_name \
  --subnet-names example_subnet_name \
  --filestore-instance-id example_filestore_instance_id \
  --filestore-location us-west1-a \
  --anyscale-service-account-email example_anyscale_service_account@example_project.iam.gserviceaccount.com \
  --instance-service-account-email example_instance_service_account@example_project.iam.gserviceaccount.com \
  --firewall-policy-names example_firewall_policy_name \
  --cloud-storage-bucket-name example_gcs_name \
  --region us-west1 \
  --project-id example_project \
  --provider-name projects/12345678/locations/global/workloadIdentityPools/example-pool/providers/example-provider \
  --private-network \
  --functional-verify workspace
Authenticating
Loaded Anyscale authentication token from ANYSCALE_CLI_TOKEN.
Output
⠹ Verifying cloud resources...(anyscale +6.1s) Subnet example_subnet_name verification succeeded.
(anyscale +8.2s) Verification result:
anyscale access: PASSED
project: PASSED
vpc and subnet: PASSED
anyscale iam service account: PASSED
cluster node service account: PASSED
firewall policy: PASSED
filestore: PASSED
cloud storage: PASSED
Please review the output from verification for any warnings. Would you like to proceed with cloud creation? [y/N]: y
(anyscale +1m37.0s) Successfully created cloud example_cloud_name, and it's ready to use.
(anyscale +1m37.0s) Start functional verification...
Functional verification for WORKSPACE is about to begin.
It will spin up one n1-standard-2 instance for each function and will incur a small amount of cost.
For workspace verification, it takes about 5 minutes.
The instances will be terminated after verification. Do you want to continue? [y/N]: y
╭───────────────────────────────────────────────────────────────────────────── workspace verification ─────────────────────────────────────────────────────────────────────────────╮
│ 0:00:02 Workspace created at https://console.anyscale.com/workspaces/expwrk_abc/ses_abc │
│ 0:01:22 Workspace is active. │
│ 0:00:00 Workspace termination initiated. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
0:01:24 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Workspace verification succeeded!
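Because the register command takes many flags, it can be easier to assemble it programmatically. The sketch below builds an argument list from a dictionary using only the flags shown in the example above. The function name `build_register_command` and the dictionary layout are assumptions made for illustration; the sketch does not run the command.

```python
import shlex

# Targets accepted by --functional-verify according to the text above.
FUNCTIONAL_VERIFY_TARGETS = {"workspace", "service"}


def build_register_command(name, options, private_network=False, functional_verify=()):
    """Return the argv list for `anyscale cloud register` on GCP.

    `options` maps flag names without the leading dashes (for example
    "vpc-name") to their values. Only flags shown in this guide should be
    passed; this helper does not validate flag names.
    """
    unknown = set(functional_verify) - FUNCTIONAL_VERIFY_TARGETS
    if unknown:
        raise ValueError(f"Unsupported functional verification target: {sorted(unknown)}")

    argv = ["anyscale", "cloud", "register", "--provider", "gcp", "-n", name]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    if private_network:
        # Only for Customer Defined Networking.
        argv.append("--private-network")
    if functional_verify:
        argv += ["--functional-verify", ",".join(functional_verify)]
    return argv


if __name__ == "__main__":
    command = build_register_command(
        "example_cloud_name",
        {"region": "us-west1", "project-id": "example_project"},
        private_network=True,
        functional_verify=["workspace"],
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(command))
```

Passing the result to `subprocess.run` as a list avoids shell quoting issues, including the problem of inline comments after a line-continuation backslash.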
At the end of the cloud setup, Anyscale sets up resources on the Anyscale Control Plane, which may take up to a minute. If you see either of the following error messages, your cloud might not be set up correctly. Contact Anyscale support:
Timed out waiting for the cloud to become active.
Failed to get cloud provider metadata.
You can make an Anyscale cloud the default cloud by running:
anyscale cloud set-default <anyscale-cloud-name-or-cloud-id>
Anyscale will create new clusters in the default cloud if no cloud is specified in the compute configs.
Fine-Grained Permission Control With Cloud Register (Advanced)
If you would like more fine-grained control over permissions, you can use the Anyscale provided Terraform module to construct a permission set that is more suited to your needs.
You can refer to the GCP module here:
For instructions on how to use the module, please refer to the guide here
Steps to Create a Cloud
- Customize your expected cloud environment by providing necessary values for parameters of the Terraform module
- Apply Terraform module to your cloud environment
- Run the cloud register command that is returned by the Terraform module.
  - NOTE: You will need to export your Anyscale and cloud credentials before running this command.
You are responsible for ensuring that the permissions are properly configured before deploying your workload. Please refer to Verify Cloud Resources to validate your permissions set. For further assistance, reach out to Anyscale support.
Verify Cloud Resources
Anyscale provides a CLI command that you can use to verify cloud resources for both options. Anyscale runs verification automatically during cloud creation and you can also run the verification on demand.
You can also trigger functional verification by specifying --functional-verify workspace or --functional-verify service. Anyscale launches a workspace or a service to verify the cloud is functional. You can also trigger both verifications (--functional-verify workspace,service).
$ anyscale cloud verify --name example_cloud_name --functional-verify workspace
Authenticating
Output
(anyscale +7.5s) Verification result:
anyscale access: PASSED
project: PASSED
vpc and subnet: PASSED
anyscale iam service account: PASSED
cluster node service account: PASSED
firewall policy: PASSED
filestore: PASSED
cloud storage: PASSED
(anyscale +7.5s) Start functional verification...
Functional verification for WORKSPACE is about to begin.
It will spin up one n1-standard-2 instance for each function and will incur a small amount of cost.
For workspace verification, it takes about 5 minutes.
The instances will be terminated after verification. Do you want to continue? [y/N]: y
╭───────────────────────────────────────────────────────────────────────────── workspace verification ─────────────────────────────────────────────────────────────────────────────╮
│ 0:00:02 Workspace created at https://console.anyscale.com/workspaces/expwrk_abc/ses_abc │
│ 0:01:22 Workspace is active. │
│ 0:00:00 Workspace termination initiated. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
0:01:24 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Workspace verification succeeded!
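If you run verification from a script, you may want to turn the "Verification result" section of the output into structured data. The sketch below parses text in the layout of the sample output above. It assumes the layout stays the same between CLI versions, and it treats any status other than PASSED as not passing; the status name FAILED used in the example is an assumption, and the function names are hypothetical.

```python
def parse_verification_result(output):
    """Parse the 'Verification result:' section into {check name: status}."""
    results = {}
    in_results = False
    for line in output.splitlines():
        if "Verification result:" in line:
            in_results = True
            continue
        if not in_results:
            continue
        name, separator, status = line.partition(":")
        status = status.strip()
        # The section ends at the first line that is not "name: STATUS".
        if not separator or not (status.isalpha() and status.isupper()):
            break
        results[name.strip()] = status
    return results


def failed_checks(results):
    """Return the names of checks whose status is not PASSED."""
    return [name for name, status in results.items() if status != "PASSED"]


if __name__ == "__main__":
    sample = "\n".join([
        "(anyscale +7.5s) Verification result:",
        "anyscale access: PASSED",
        "project: PASSED",
        "filestore: FAILED",  # assumed status name, for illustration only
        "(anyscale +7.5s) Start functional verification...",
    ])
    print(failed_checks(parse_verification_result(sample)))
```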
Delete an Anyscale Cloud
You can delete the cloud using the following command:
$ anyscale cloud delete --name <anyscale-cloud-name>
- This operation is only supported if the cloud has no active or pending instances/clusters associated with it.
  - If a cluster is running, terminate it first.
  - If a cluster is in an error status, follow the instructions in the error message. Also check your GCP console to ensure that there are no running instances of this cluster.
- After the cloud is deleted, you won’t be able to perform any operations on it. This means you won’t be able to create clusters, jobs, services, or workspaces in this cloud.
- After the cloud is deleted, you won’t be able to access its clusters from Anyscale. Clusters associated with this cloud no longer appear, regardless of their status before deletion.
Note that for Anyscale Managed Resources, cloud deletion also deletes all the resources associated with the cloud. For Customer Defined Resources, Anyscale doesn't delete any cloud provider resources created by you. Revoke Anyscale's access to your project by either deleting the access Service Account or the workload identity provider associated with Anyscale's AWS account.
Example command and output:
Anyscale Managed:
$ anyscale cloud delete example_cloud_name
Authenticating
Output
If the cloud cld_abcd is deleted, you will not be able to access existing clusters of this cloud.
For more information, please refer to the documentation https://docs.anyscale.com/cloud-deployment/gcp/manage-clouds#delete-an-anyscale-cloud
Continue? [y/N]: y
(anyscale +3.9s)
Track progress of Deployment Manager at https://console.cloud.google.com/dm/deployments/details/cld-abcd?project=example-project
(anyscale +3m8.4s)
The cloud bucket (storage-bucket-cld-abcd) associated with this cloud still exists.
If you no longer need the data associated with this bucket, please delete it.
(anyscale +3m8.7s) Deleted cloud with name example_cloud_name.
Customer Defined:
$ anyscale cloud delete example_cloud_name
Authenticating
Loaded Anyscale authentication token from ANYSCALE_CLI_TOKEN.
Output
If the cloud example_cloud_id is deleted, you will not be able to access existing clusters of this cloud.
Note that Anyscale does not delete any of the cloud provider resources created by you.
For more information, please refer to the documentation https://docs.anyscale.com/user-guide/onboard/clouds/deploy-on-gcp#delete-an-anyscale-cloud
Continue? [y/N]: y
[Warning] The workload identity federation provider pool example_pool and service account service_account that allows Anyscale to access your GCP account is still in place. Please delete it manually if you no longer wish anyscale to have access.
(anyscale +4.4s) Deleted cloud with name example cloud name.
Edit Customer Defined Resources
Anyscale provides a CLI command that you can use to edit cloud resources for registered clouds (clouds created with customer defined resources).
The editable resources are: Filestore, Storage Bucket id.
You can edit with the following example commands:
# Edit filestore. Please specify both filestore instance id and location.
$ anyscale cloud edit <cloud-name> --gcp-filestore-instance-id=<your_new_filestore_instance_id> --gcp-filestore-location=<location_of_the_new_filestore>
# Edit storage bucket id.
$ anyscale cloud edit <cloud-name> --gcp-cloud-storage-bucket-name=<your_new_storage_bucket_id>
Additional Options:
- Use cloud-id as an alternative to cloud-name. Example command:
$ anyscale cloud edit --cloud-id=<cloud_id> --gcp-cloud-storage-bucket-name=<your_new_storage_bucket_id>
- Ensure the functionality of your edited cloud with functional verification. Add the flag --functional-verify workspace or --functional-verify service to the command to automatically launch a workspace or service from your cloud and verify that the edited cloud is functional. Example command:
$ anyscale cloud edit <cloud-name> --gcp-cloud-storage-bucket-name=<your_new_storage_bucket_id> --functional-verify workspace
Important Notes:
- Before the edit, Anyscale runs a static cloud verification and asks for your confirmation. Review any warnings or errors in the verification results.
- The edit command only applies to registered clouds, not to managed clouds.
- If running workloads still use the old resources, you may want to retain them. The edit doesn't remove old resources automatically. If you want to delete them, you need to do so yourself.
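The rules above (Filestore instance id and location must be given together, and a cloud is addressed by name or by id) can be checked before the CLI is called. The following is a minimal sketch that assembles the argument list using only the flags shown in this section. The function name `build_edit_command` and its parameters are hypothetical, and the sketch does not run the command.

```python
def build_edit_command(cloud_name=None, cloud_id=None, filestore_instance_id=None,
                       filestore_location=None, bucket_name=None, functional_verify=None):
    """Return the argv list for `anyscale cloud edit` on a registered GCP cloud."""
    if (cloud_name is None) == (cloud_id is None):
        raise ValueError("Pass exactly one of cloud_name or cloud_id.")
    if (filestore_instance_id is None) != (filestore_location is None):
        raise ValueError("Specify both the Filestore instance id and its location.")
    if filestore_instance_id is None and bucket_name is None:
        raise ValueError("Nothing to edit.")

    argv = ["anyscale", "cloud", "edit"]
    argv.append(cloud_name if cloud_name is not None else f"--cloud-id={cloud_id}")
    if filestore_instance_id is not None:
        argv.append(f"--gcp-filestore-instance-id={filestore_instance_id}")
        argv.append(f"--gcp-filestore-location={filestore_location}")
    if bucket_name is not None:
        # A single "=" separates the flag from its value.
        argv.append(f"--gcp-cloud-storage-bucket-name={bucket_name}")
    if functional_verify is not None:
        argv += ["--functional-verify", functional_verify]
    return argv


if __name__ == "__main__":
    print(build_edit_command(cloud_name="example_cloud_name",
                             bucket_name="example_gcs_name",
                             functional_verify="workspace"))
```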
Appendix: Definitions
The following resources are required for both Anyscale Managed and Customer Defined approaches:
- Service Accounts:
  - Anyscale access Service Account: The Service Account that Anyscale uses to manage GCE instances and Ray clusters in your GCP project.
  - Ray cluster Service Account: Default Service Account attached to Ray clusters. This Service Account can be modified to suit the needs and permissions that your workload requires.
- Workload Identity Federation: The mechanism Anyscale uses to exchange its control plane's AWS role credentials for time-bound GCP Service Account credentials.
- VPC and Subnets: A VPC is a virtual network within the GCP project and every Anyscale cloud is deployed in its own VPC and is logically isolated from each other. A subnet is a range of IP addresses in your VPC to which your GCP resources (such as GCE VM instances) can be attached. Anyscale deploys workloads in your account within the VPC and subnets defined as part of the cloud setup.
- Firewall Policies: Groups of firewall rules, which help secure the cloud environment by controlling the traffic that is allowed to reach and leave GCP resources. Anyscale requires firewall rules to enable access to Anyscale’s suite of components and applications, such as:
- JupyterLab
- Ray Dashboard
- Ray Serve endpoints
- Workspace
- Filestore: A cloud based network file system for applications and workloads that can be used with other GCP services. Filestore offers shared storage, is designed for scalable performance, and is secure & compliant with common regulatory standards. Filestore is required for Anyscale Workspaces.
- Cloud Storage Bucket: An object storage service that offers scalability, data availability, security & performance. Anyscale utilizes the specified cloud storage bucket for a variety of functions that support the management of Ray clusters and Ray applications, including:
- General data storage that lasts beyond cluster lifespan
- Storing model checkpoints for Ray Tune or RLlib
Appendix: Detailed Resource Requirements
Detailed requirements for customer defined resources
GCP APIs
The project used to host the Anyscale Cloud requires the following APIs
- Compute Engine API ("compute.googleapis.com")
- Cloud Filestore API ("file.googleapis.com")
- Cloud Storage API ("storage-component.googleapis.com", "storage.googleapis.com")
- Certificate Manager API ("certificatemanager.googleapis.com")
- Deployment Manager API ("deploymentmanager.googleapis.com")
- Cloud Resource Manager API ("cloudresourcemanager.googleapis.com")
- Service Usage API ("serviceusage.googleapis.com")
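For Customer Defined Resources you enable these APIs yourself. The sketch below collects the service names from the list above and builds a `gcloud services enable` argument list for a project. The helper names and the example project ID are assumptions for illustration; the sketch prints the command and does not execute it.

```python
import shlex

# Service names copied from the list above.
REQUIRED_APIS = [
    "compute.googleapis.com",
    "file.googleapis.com",
    "storage-component.googleapis.com",
    "storage.googleapis.com",
    "certificatemanager.googleapis.com",
    "deploymentmanager.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "serviceusage.googleapis.com",
]


def enable_apis_command(project_id, apis=REQUIRED_APIS):
    """Return the argv list that enables `apis` in the given project."""
    return ["gcloud", "services", "enable", *apis, "--project", project_id]


def missing_apis(enabled, required=REQUIRED_APIS):
    """Return required service names that are absent from `enabled`."""
    enabled = set(enabled)
    return [api for api in required if api not in enabled]


if __name__ == "__main__":
    # "example-project" is a placeholder project ID.
    print(shlex.join(enable_apis_command("example-project")))
```

`missing_apis` expects the service names that are already enabled in the project, however you obtain them, and returns only those still to be enabled.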
Service Accounts
- Service Account for Anyscale access (--anyscale-service-account-email)
  - This Service Account requires the following pre-defined GCP roles:
    - The owner role on the project.
    - The Service Account Token Creator role on the Service Account itself.
- Service Account for Ray Cluster (--instance-service-account-email)
  - This Service Account requires:
    - Read, write and list permissions on the cloud storage bucket. The easiest option is to assign the Storage Admin role to both Service Accounts in the bucket policy.
    - The Artifact Registry Reader role on the project if you want to use Custom Docker Environments.
Workload Identity Federation
- Create a workload identity pool. Inside the pool, create a workload identity provider with the following components:
  - Anyscale's control plane AWS account number as the AWS account ID. Contact Anyscale support for this number.
  - 3 attribute mappings (the first two are auto-filled on the console, the third must be manually added).
    {
      "google.subject": "assertion.arn",
      "attribute.aws_role": "assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn",
      "attribute.arn": "assertion.arn"
    }
  - An attribute condition restricting access to an organization-specific AWS IAM role in Anyscale's AWS account. Find your organization ID on the admin page.
    google.subject.startsWith("arn:aws:sts::<Anyscale AWS #>:assumed-role/gcp_if_<Organization ID>")
- Grant the pool permission to act as the access Service Account by attaching the following IAM binding to the access Service Account.
{
"role": "roles/iam.workloadIdentityUser",
"members": [
"principalSet://iam.googleapis.com/projects/<PROJECT_NUMBER>/locations/global/workloadIdentityPools/<POOL_NAME>/attribute.role_name/arn:aws:sts::<Anyscale AWS #>:assumed-role/gcp_if_<Organization ID>"
]
}
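To make the attribute mapping and condition easier to reason about, the sketch below mirrors them in plain Python. It is an approximation for illustration only: GCP evaluates these expressions itself, edge cases (such as an ARN without a trailing session segment) may behave differently, and the account number and organization ID in the example are made-up placeholders. All function names are hypothetical.

```python
def aws_role_attribute(arn):
    """Approximate the attribute.aws_role mapping for a caller ARN.

    For an assumed-role ARN, the session name is dropped and only the
    account prefix plus role name are kept; other ARNs pass through.
    """
    marker = "assumed-role/"
    if marker not in arn:
        return arn
    account_arn, _, rest = arn.partition(marker)
    role_name = rest.split("/", 1)[0]
    return account_arn + marker + role_name


def attribute_condition(aws_account_id, organization_id):
    """Render the attribute condition string with the placeholders filled in."""
    prefix = f"arn:aws:sts::{aws_account_id}:assumed-role/gcp_if_{organization_id}"
    return f'google.subject.startsWith("{prefix}")'


def subject_allowed(subject, aws_account_id, organization_id):
    """Evaluate the same prefix check locally against a subject ARN."""
    prefix = f"arn:aws:sts::{aws_account_id}:assumed-role/gcp_if_{organization_id}"
    return subject.startswith(prefix)


if __name__ == "__main__":
    # Placeholder values; use the account number provided by Anyscale support
    # and your own organization ID.
    arn = "arn:aws:sts::123456789012:assumed-role/gcp_if_org_example/session-1"
    print(aws_role_attribute(arn))
    print(attribute_condition("123456789012", "org_example"))
    print(subject_allowed(arn, "123456789012", "org_example"))
```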
VPC and Subnets
- Create VPC with either custom or auto mode.
- Anyscale requires exactly one subnet to launch instances in.
- (Optional) To use Anyscale Services on a private network, your VPC must have a proxy-only subnet in the same region as the cloud.
- The subnet CIDR range must be at least as large as a /24 (256 addresses), but we recommend at least a /20 (4,096 addresses).
- See here for valid GCP IPv4 ranges.
- The subnet is either public with a route to the internet (this is the default GCP setting), or private when the --private-network flag is set in cloud register for Customer Defined Resources.
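The subnet size requirement can be checked locally before registering the cloud. The sketch below uses Python's standard `ipaddress` module and reads the requirement as "a prefix length of 24 or shorter, with 20 or shorter recommended". The function name and the returned labels are hypothetical, and the check does not validate the range against GCP's list of valid IPv4 ranges.

```python
import ipaddress

MAX_PREFIX_LENGTH = 24          # smallest subnet accepted (/24, 256 addresses)
RECOMMENDED_PREFIX_LENGTH = 20  # recommended size or larger (/20, 4,096 addresses)


def check_subnet_cidr(cidr):
    """Classify an IPv4 subnet range as 'too small', 'ok', or 'recommended'."""
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4:
        raise ValueError("Expected an IPv4 range.")
    if network.prefixlen > MAX_PREFIX_LENGTH:
        return "too small"
    if network.prefixlen > RECOMMENDED_PREFIX_LENGTH:
        return "ok"
    return "recommended"


if __name__ == "__main__":
    for cidr in ("10.0.0.0/25", "10.0.0.0/24", "10.0.0.0/20"):
        print(cidr, check_subnet_cidr(cidr))
```

With `strict=True`, a range with host bits set (for example 10.0.0.1/24) raises `ValueError` rather than being silently normalized.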
Firewall policy
- Anyscale requires exactly one firewall policy that's associated with the VPC.
- Firewall policy can be either global or regional.
- The policy should include the following firewall rules:
  - Allow ingress from 0.0.0.0/0 on ports 22 and 443. The 0.0.0.0/0 range can be scoped down.
  - Allow all traffic within the same subnet.
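As a rough pre-flight check, you can compare your planned rules against the required ingress ports. The sketch below uses a simplified, hypothetical rule representation (plain dictionaries with `direction`, `action`, and `ports` keys); it is not the GCP firewall policy schema. Source ranges are ignored because the text above allows 0.0.0.0/0 to be scoped down.

```python
# Ports that must accept ingress traffic according to the list above.
REQUIRED_INGRESS_PORTS = {22, 443}


def missing_ingress_ports(rules, required=REQUIRED_INGRESS_PORTS):
    """Return required ports that no 'allow' ingress rule covers.

    Each rule is a dict in a simplified, hypothetical shape:
    {"direction": "INGRESS", "action": "allow", "ports": [22, 443]}
    """
    allowed = set()
    for rule in rules:
        if rule["direction"] == "INGRESS" and rule["action"] == "allow":
            allowed.update(rule["ports"])
    return sorted(set(required) - allowed)


if __name__ == "__main__":
    planned_rules = [
        {"direction": "INGRESS", "action": "allow", "ports": [443]},
        {"direction": "EGRESS", "action": "allow", "ports": [22]},
    ]
    print(missing_ingress_ports(planned_rules))
```

This check does not cover the rule that allows all traffic within the same subnet; Anyscale's own verification remains the authoritative check.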
Filestore
- Create a Filestore instance in the same VPC as the cloud.
- Recommended: Create the Filestore instance in the same region as the cloud for optimal performance.
- Recommended: Select a service tier that is appropriate for future scaling. Filestore capacity can be increased after creation, but is bound by the service tier.
Cloud Storage Bucket
- Grant both the Anyscale access Service Account and the ray cluster node Service Account read/write/list access to the bucket.
  - The easiest option is to assign the Storage Admin (roles/storage.admin) role to both Service Accounts in the bucket policy.
  - The exact permissions are as follows:
    - storage.buckets.get: Get bucket info
    - storage.objects.[get,list,create]: Get, list, and create objects in a bucket
    - storage.multipartUploads.[create,listParts,abort]: Manage multipart uploads
- CORS should be set up as follows if you wish to use the Anyscale frontend to view data in the bucket:
[
{
"origin": ["https://console.anyscale.com"],
"method": ["GET"],
"responseHeader": ["*"],
"maxAgeSeconds": 3600
}
]
- Recommended: Create the bucket in the same region as the cloud.
- Recommended: Configure the bucket to block all public access.
- Recommended: Use the default uniform bucket-level access control to simplify permissions.
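The CORS configuration above can be generated rather than typed by hand, which avoids JSON syntax mistakes. The sketch below renders the same document with Python's standard `json` module. The function names are hypothetical; how you apply the resulting file to the bucket (for example, through a CORS file option of the gcloud storage tooling) depends on your tooling and is not shown here.

```python
import json


def cors_config(origin="https://console.anyscale.com", max_age_seconds=3600):
    """Return the bucket CORS configuration described above as Python data."""
    return [
        {
            "origin": [origin],
            "method": ["GET"],
            "responseHeader": ["*"],
            "maxAgeSeconds": max_age_seconds,
        }
    ]


def render_cors_json(config=None):
    """Serialize the configuration to an indented JSON string."""
    return json.dumps(cors_config() if config is None else config, indent=2)


if __name__ == "__main__":
    # Save the printed text to a file and apply it to the bucket with your
    # preferred tool.
    print(render_cors_json())
```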