Deploy and manage Clouds on GCP

Set up and manage Anyscale clouds on the Google Cloud Platform.

Prerequisites

  • A registered user account on Anyscale.
  • An organization ID on Anyscale.
  • A local Anyscale CLI set up.
    • pip install -U "anyscale[gcp]"
  • A local gcloud CLI set up. Follow the instructions on installing gcloud CLI.
  • A GCP project.
    • You should have owner role on the project.
    • This project preferably only hosts Anyscale workloads.
info

Google Cloud APIs will be enabled automatically if you use the Anyscale provided Terraform module.

warning

Before deploying an Anyscale-managed cloud with cloud setup:

Anyscale Clouds on GCP

Anyscale Clouds can be deployed in GCP accounts using two different networking modes:

  • Direct Networking: Ray clusters are deployed in public subnets and accessed through public IP addresses.
  • Customer Defined Networking: Ray clusters are deployed in private subnets and accessed through private IP addresses.

Direct Networking clouds are the most common and developer friendly deployment option for Anyscale. Direct Networking allows you to connect to your Ray clusters, Workspaces, and Grafana dashboards without configuring additional networking infrastructure (usually a VPN) by routing over the Internet using Public IP addresses.

An Anyscale Cloud deployed with Direct Networking has an architecture similar to the following:

[Diagram: Direct Networking architecture]

Notes:

  • The requirements for resources such as IAM Roles, VPCs, and Firewall Policies can be found in detailed resource requirements.
  • Direct Networking clouds can be created using either the Anyscale Managed or the Customer Defined deployment mode. Customer Defined Networking clouds can only be created using the Customer Defined deployment mode, with the --private-network parameter.
  • Artifact Registry is displayed to show a possible integration to support Custom Docker Environments.
  • Due to limited support for certificates on GCP internal load balancers, Anyscale Services created for private network clouds are currently exposed over HTTP instead of HTTPS.

Create an Anyscale Cloud

Run the following command to create an Anyscale cloud depending on your deployment option:

caution

To set up your Anyscale cloud with Customer Defined Resources, ensure that the cloud resources in your GCP account satisfy all the requirements listed in detailed resource requirements. Anyscale automatically verifies your resources at the end of the registration.

info

Anyscale provides functional verification for both anyscale cloud setup and anyscale cloud register to make sure your cloud is ready. Add the flag --functional-verify workspace or --functional-verify service to the cloud creation command. Anyscale then automatically launches a workspace or service from your cloud and verifies that the newly created cloud is functional.

anyscale cloud setup \
--provider gcp \
-n example_cloud_name \
--region us-west1 \
--project-id example-project
Authenticating

Output
⠧ Creating workload identity federation provider for Anyscale access...(anyscale +8.4s) Workload Identity Pool created: projects/112233445566/locations/global/workloadIdentityPools/anyscale-provider-pool-5439d0b9
⠸ Creating workload identity federation provider for Anyscale access...(anyscale +12.2s) Anyscale provider created: projects/112233445566/locations/global/workloadIdentityPools/anyscale-provider-pool-5439d0b9/providers/anyscale-access
(anyscale +13.4s) Track progress of Deployment Manager at https://console.cloud.google.com/dm/deployments/details/cld-abcd?project=example-project
(anyscale +13.4s) Note that it may take up to 5 minutes to create resources on GCP.
⠧ Creating cloud resources through Deployment Manager...(anyscale +3m42.6s) Deployment succeeded.
(anyscale +4m45.4s) Successfully created cloud example_cloud_name, and it's ready to use.
note

If you specify the optional flag --enable-head-node-fault-tolerance, Anyscale creates an additional Memorystore instance, which is used to enable head node fault tolerance in services. Creating a cloud with the --enable-head-node-fault-tolerance flag takes around 10 minutes.

caution

At the end of cloud setup, Anyscale sets up resources on the Anyscale Control Plane, which may take up to a minute. If you see either of the following error messages, your cloud might not be set up correctly. Contact Anyscale support:

Timed out waiting for the cloud to become active.
Failed to get cloud provider metadata.
note

You can make an Anyscale cloud the default cloud by running

 anyscale cloud set-default <anyscale-cloud-name-or-cloud-id>

Anyscale will create new clusters in the default cloud if no cloud is specified in the compute configs.

note

By default, Anyscale doesn't set any retention policy for the storage bucket created by managed cloud setup. If you need a retention policy, configure it on the bucket yourself.

Fine-Grained Permission Control With Cloud Register (Advanced)

If you would like more fine-grained control over permissions, you can use the Anyscale provided Terraform module to construct a permission set that is more suited to your needs.

You can refer to the GCP module here:

For instructions on how to use the module, please refer to the guide here

Steps to Create a Cloud

  1. Customize your expected cloud environment by providing necessary values for parameters of the Terraform module
  2. Apply Terraform module to your cloud environment
  3. Run the cloud register command that is returned by the Terraform module
    1. NOTE: You will need to export your Anyscale and cloud credentials before running this command
warning

You are responsible for ensuring that the permissions are properly configured before deploying your workload. Please refer to Verify Cloud Resources to validate your permissions set. For further assistance, reach out to Anyscale support.

Verify Cloud Resources

Anyscale provides a CLI command that you can use to verify cloud resources for both options. Anyscale runs verification automatically during cloud creation and you can also run the verification on demand.

To also trigger functional verification, specify --functional-verify workspace or --functional-verify service. Anyscale then launches a workspace or a service to verify that the cloud is functional. To run both verifications, pass --functional-verify workspace,service.

$ anyscale cloud verify --name example_cloud_name --functional-verify workspace

Authenticating

Output
(anyscale +7.5s) Verification result:
anyscale access: PASSED
project: PASSED
vpc and subnet: PASSED
anyscale iam service account: PASSED
cluster node service account: PASSED
firewall policy: PASSED
filestore: PASSED
cloud storage: PASSED
(anyscale +7.5s) Start functional verification...
Functional verification for WORKSPACE is about to begin.
It will spin up one n1-standard-2 instance for each function and will incur a small amount of cost.
For workspace verification, it takes about 5 minutes.
The instances will be terminated after verification. Do you want to continue? [y/N]: y
╭───────────────────────────────────────────────────────────────────────────── workspace verification ─────────────────────────────────────────────────────────────────────────────╮
0:00:02 Workspace created at https://console.anyscale.com/workspaces/expwrk_abc/ses_abc │
0:01:22 Workspace is active. │
0:00:00 Workspace termination initiated. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
0:01:24 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Workspace verification succeeded!
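The flag combinations described above can be sketched as a small command builder. This is only an illustration: `build_verify_command` and `VALID_TARGETS` are hypothetical names, and the script prints the command line instead of running the Anyscale CLI.

```python
import shlex

# Functional verification targets named in this guide.
VALID_TARGETS = ("workspace", "service")


def build_verify_command(cloud_name, functional_verify=()):
    """Return an `anyscale cloud verify` command line (hypothetical helper)."""
    unknown = [t for t in functional_verify if t not in VALID_TARGETS]
    if unknown:
        raise ValueError(f"unsupported functional verification target(s): {unknown}")
    args = ["anyscale", "cloud", "verify", "--name", cloud_name]
    if functional_verify:
        # Multiple targets are passed as one comma-separated value.
        args += ["--functional-verify", ",".join(functional_verify)]
    return shlex.join(args)


if __name__ == "__main__":
    print(build_verify_command("example_cloud_name", ["workspace", "service"]))
```

Passing an empty list yields the static verification command only, which matches the on-demand verification described at the start of this section.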

Delete an Anyscale Cloud

You can delete the cloud using the following command:

$ anyscale cloud delete --name <anyscale-cloud-name>
  • This operation is only supported if the cloud has no active or pending instances/clusters associated with it.
    • If the cluster has running status, please terminate the cluster.
    • If the cluster is in an error status, please follow the instructions on the error. Also check in your GCP console to ensure that there are no running instances of this cluster.
  • After the cloud is deleted, you won’t be able to perform any operations on it. This means you won’t be able to create clusters, jobs, services, or workspaces in this cloud.
  • After the cloud is deleted, you won’t be able to access its clusters from Anyscale. This means you won’t see clusters associated with this cloud, regardless of their status before deletion.
caution

Note that for Anyscale Managed Resources, cloud deletion also deletes all the resources associated with the cloud. For Customer Defined Resources, Anyscale doesn't delete any cloud provider resources created by you. Revoke Anyscale's access to your project by either deleting the access Service Account or the workload identity provider associated with Anyscale's AWS account.

Example command and output:

$ anyscale cloud delete example_cloud_name

Authenticating

Output

If the cloud cld_abcd is deleted, you will not be able to access existing clusters of this cloud.

For more information, please refer to the documentation https://docs.anyscale.com/cloud-deployment/gcp/manage-clouds#delete-an-anyscale-cloud
Continue? [y/N]: y
(anyscale +3.9s)
Track progress of Deployment Manager at https://console.cloud.google.com/dm/deployments/details/cld-abcd?project=example-project
(anyscale +3m8.4s)
The cloud bucket (storage-bucket-cld-abcd) associated with this cloud still exists.
If you no longer need the data associated with this bucket, please delete it.
(anyscale +3m8.7s) Deleted cloud with name example_cloud_name.

Edit Customer Defined Resources

Anyscale provides a CLI command that you can use to edit cloud resources for registered clouds (clouds created with Customer Defined Resources).

The editable resources are: Filestore, Storage Bucket id.

You can edit with the following example commands:

# Edit Filestore. Specify both the Filestore instance ID and its location.
$ anyscale cloud edit <cloud-name> --gcp-filestore-instance-id=<your_new_filestore_instance_id> --gcp-filestore-location=<location_of_the_new_filestore>

# Edit the storage bucket ID.
$ anyscale cloud edit <cloud-name> --gcp-cloud-storage-bucket-name=<your_new_storage_bucket_id>

Additional Options:

  • Use cloud-id as an alternative to cloud-name. Example command:
$ anyscale cloud edit --cloud-id=<cloud_id> --gcp-cloud-storage-bucket-name=<your_new_storage_bucket_id>
  • Confirm that your edited cloud works with functional verification. Add the flag --functional-verify workspace or --functional-verify service to the command. Anyscale then automatically launches a workspace or service from your cloud and verifies that the edited cloud is functional. Example command:
$ anyscale cloud edit <cloud-name> --gcp-cloud-storage-bucket-name=<your_new_storage_bucket_id> --functional-verify workspace

Important Notes:

  • Before the edit, Anyscale runs a static cloud verification and asks for your confirmation. Review any warnings or errors in the verification results.
  • The edit applies only to registered clouds, not to managed clouds.
  • If running workloads still use the old resources, you may want to retain them. The edit doesn't automatically remove any old resources. If you want to delete them, you need to do so yourself.

Appendix: Definitions

The following resources are required for both Anyscale Managed and Customer Defined approaches:

  • Service Accounts:
    • Anyscale access Service Account: The Service Account that Anyscale uses to manage GCE instances and Ray clusters in your GCP project.
    • Ray cluster Service Account: Default Service Account attached to Ray clusters. This Service Account can be modified to suit the needs and permissions that your workload requires.
  • Workload Identity Federation: The mechanism Anyscale uses to exchange its control plane's AWS role credentials for time-bound GCP Service Account credentials.
  • VPC and Subnets: A VPC is a virtual network within the GCP project. Every Anyscale cloud is deployed in its own VPC, so clouds are logically isolated from each other. A subnet is a range of IP addresses in your VPC to which your GCP resources (such as GCE VM instances) can be attached. Anyscale deploys workloads in your account within the VPC and subnets defined as part of the cloud setup.
  • Firewall Policies: Groups of firewall rules, which help secure the cloud environment by controlling the traffic that is allowed to reach and leave GCP resources. Anyscale requires firewall rules to enable access to Anyscale’s suite of components and applications, such as:
    • Jupyter Labs
    • Ray Dashboard
    • Ray Serve endpoints
    • Workspace
  • Filestore: A cloud based network file system for applications and workloads that can be used with other GCP services. Filestore offers shared storage, is designed for scalable performance, and is secure & compliant with common regulatory standards. Filestore is required for Anyscale Workspaces.
  • Cloud Storage Bucket: An object storage service that offers scalability, data availability, security & performance. Anyscale utilizes the specified cloud storage bucket for a variety of functions that support the management of Ray clusters and Ray applications, including:
    • General data storage that lasts beyond cluster lifespan
    • Storing model checkpoints for Ray Tune or RLlib

Appendix: Detailed Resource Requirements

Detailed requirements for customer defined resources

GCP APIs

The project used to host the Anyscale Cloud requires the following APIs to be enabled:

  • Compute Engine API ("compute.googleapis.com")
  • Cloud Filestore API ("file.googleapis.com")
  • Cloud Storage API ("storage-component.googleapis.com", "storage.googleapis.com")
  • Certificate Manager API ("certificatemanager.googleapis.com")
  • Deployment Manager API ("deploymentmanager.googleapis.com")
  • Cloud Resource Manager API ("cloudresourcemanager.googleapis.com")
  • Service Usage API ("serviceusage.googleapis.com")
  • Cloud Memorystore for Redis API ("redis.googleapis.com") [Optional for enabling head node fault tolerance in services]
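As a rough sketch, the API list above can be assembled into a single `gcloud services enable` invocation. The helper name `build_enable_command` is hypothetical, the project ID is a placeholder, and the script only prints the command rather than running it.

```python
import shlex

# Service names copied from the list above.
REQUIRED_APIS = [
    "compute.googleapis.com",
    "file.googleapis.com",
    "storage-component.googleapis.com",
    "storage.googleapis.com",
    "certificatemanager.googleapis.com",
    "deploymentmanager.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "serviceusage.googleapis.com",
]
# Only needed when enabling head node fault tolerance in services.
OPTIONAL_APIS = ["redis.googleapis.com"]


def build_enable_command(project_id, include_optional=False):
    """Return a `gcloud services enable` command line (hypothetical helper)."""
    apis = REQUIRED_APIS + (OPTIONAL_APIS if include_optional else [])
    return shlex.join(["gcloud", "services", "enable", *apis, "--project", project_id])


if __name__ == "__main__":
    print(build_enable_command("example-project", include_optional=True))
```

Review the printed command before running it. If you use the Anyscale provided Terraform module, this step isn't needed because the module enables the APIs for you.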

Service Accounts

  • Service Account for Anyscale access (--anyscale-service-account-email)
    • This Service Account requires the following pre-defined GCP roles:
      • The owner role on the project.
      • Service Account Token Creator role on the Service Account itself.
  • Service Account for Ray Cluster (--instance-service-account-email)
    • This Service Account requires:
      • Read, write and list permissions on the cloud storage bucket. The easiest option is to assign the Storage Admin role to both Service Accounts in the bucket policy.
      • Artifact Registry Reader role on the project if you want to use Custom Docker Environments.

Workload Identity Federation

  • Create a workload identity pool. Inside the pool, create a workload identity provider with the following components:
    • Anyscale's control plane AWS account number as the AWS account ID. Contact Anyscale support for this number.
    • 3 attribute mappings (the first two are auto-filled on the console, the third must be manually added).
      {
        "google.subject": "assertion.arn",
        "attribute.aws_role": "assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn",
        "attribute.arn": "assertion.arn"
      }
    • An attribute condition restricting access to an organization-specific AWS IAM role in Anyscale's AWS account. Find your organization ID on the admin page.
      google.subject.startsWith("arn:aws:sts::<Anyscale AWS #>:assumed-role/gcp_if_<Organization ID>")
  • Grant the pool permission to act as the access Service Account by attaching the following IAM binding to the access Service Account.
    {
      "role": "roles/iam.workloadIdentityUser",
      "members": [
        "principalSet://iam.googleapis.com/projects/<PROJECT_NUMBER>/locations/global/workloadIdentityPools/<POOL_NAME>/attribute.role_name/arn:aws:sts::<Anyscale AWS #>:assumed-role/gcp_if_<Organization ID>"
      ]
    }
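The attribute mappings and the attribute condition above differ between organizations only in the AWS account number and the organization ID. A minimal sketch of filling them in follows. The helper name `attribute_condition` is hypothetical, and the account number and organization ID in the example are placeholders, not real values (the real account number comes from Anyscale support).

```python
# Attribute mappings copied from the provider configuration above.
ATTRIBUTE_MAPPINGS = {
    "google.subject": "assertion.arn",
    "attribute.aws_role": (
        "assertion.arn.contains('assumed-role') ? "
        "assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + "
        "assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn"
    ),
    "attribute.arn": "assertion.arn",
}


def attribute_condition(anyscale_aws_account, organization_id):
    """Build the attribute condition string (hypothetical helper)."""
    prefix = (
        f"arn:aws:sts::{anyscale_aws_account}:assumed-role/"
        f"gcp_if_{organization_id}"
    )
    return f'google.subject.startsWith("{prefix}")'


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(attribute_condition("123456789012", "org_example"))
```

The sketch covers only the provider's mappings and condition. The IAM binding on the access Service Account still has to be attached as described above.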

VPC and Subnets

  • Create VPC with either custom or auto mode.
  • Anyscale requires exactly one subnet to launch instances in.
  • The subnet CIDR range must be at least as large as a /24 (a prefix length of 24 or lower). Anyscale recommends a /20 or larger.
    • See here for valid GCP IPv4 ranges.
  • The subnet is public with a route to the internet (this is the default GCP setting), or private when the --private-network flag is set in cloud register for Customer Defined Resources.
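The subnet size requirement can be checked before registration with Python's standard `ipaddress` module. This sketch assumes the /24 and /20 limits refer to the size of the range, so a lower prefix length is better. The function name `check_subnet` and its return labels are hypothetical.

```python
import ipaddress

MIN_PREFIX = 24          # the range must be a /24 or larger
RECOMMENDED_PREFIX = 20  # a /20 or larger is recommended


def check_subnet(cidr):
    """Classify an IPv4 subnet CIDR against the size requirements above."""
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4:
        raise ValueError("expected an IPv4 CIDR range")
    if network.prefixlen > MIN_PREFIX:
        return "too_small"
    if network.prefixlen > RECOMMENDED_PREFIX:
        return "ok"
    return "recommended"


if __name__ == "__main__":
    for cidr in ("10.0.0.0/26", "10.0.0.0/24", "10.0.0.0/20"):
        print(cidr, check_subnet(cidr), ipaddress.ip_network(cidr).num_addresses)
```

The check covers only the size of the range. It doesn't confirm that the range is one of the valid GCP IPv4 ranges.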

Firewall policy

  • Anyscale requires exactly one firewall policy that's associated with the VPC.
  • Firewall policy can be either global or regional.
  • The policy should include the following firewall rules:
    • Allow ingress from 0.0.0.0/0 on ports 22 and 443. The 0.0.0.0/0 source range can be scoped down.
    • Allow all traffic within the same subnet

Filestore

  • Create a Filestore instance in the same VPC as the cloud.
  • Recommended: Create the Filestore instance in the same region as the cloud for optimal performance.
  • Recommended: Select a service tier that is appropriate for future scaling. Filestore capacity can be increased after creation, but is bound by the service tier.

Cloud Storage Bucket

  • Grant both the Anyscale access Service Account and the Ray cluster node Service Account read/write/list access to the bucket.
    • The easiest option is to assign the Storage Admin (roles/storage.admin) role to both Service Accounts in the bucket policy.
    • The exact permissions are as follows:
      • storage.buckets.get: Get bucket info
      • storage.objects.[get,list,create]: Get, list, and create objects in a bucket
      • storage.multipartUploads.[create,listParts,abort]: Manage multipart uploads
  • Set up CORS as follows if you want to use the Anyscale frontend to view data in the bucket:
    [
      {
        "origin": ["https://console.anyscale.com"],
        "method": ["GET"],
        "responseHeader": ["*"],
        "maxAgeSeconds": 3600
      }
    ]
  • Recommended: Create the bucket in the same region as the cloud.
  • Recommended: Configure the bucket to block all public access.
  • Recommended: Use the default unified access control to simplify permissions.
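The CORS policy above can be generated programmatically when you need to produce it for several buckets. This is a minimal sketch: `build_cors_config` is a hypothetical helper, and its default values are the ones shown in the policy above.

```python
import json


def build_cors_config(origin="https://console.anyscale.com", max_age_seconds=3600):
    """Return the bucket CORS policy as a Python structure (hypothetical helper)."""
    return [
        {
            "origin": [origin],
            "method": ["GET"],
            "responseHeader": ["*"],
            "maxAgeSeconds": max_age_seconds,
        }
    ]


if __name__ == "__main__":
    # Print the policy as JSON so it can be saved to a file such as cors.json.
    print(json.dumps(build_cors_config(), indent=2))
```

Once saved to a file, the policy can be applied with a tool such as `gsutil cors set cors.json gs://<bucket-name>`. Check the command against the current Google Cloud documentation before relying on it.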
warning

Anyscale does not assume responsibility for data loss. To mitigate this risk, enable GCS object versioning and configure lifecycle management policies for data retention (GCS Documentation: Object Versioning, Object Lifecycle).

Memorystore Instance

  • Standard tier with at least 5 GB memory
  • The instance should have read replica mode enabled and have at least one read replica.
  • The instance should be in the same VPC configured for the cloud.
  • The instance should have maxmemory-policy set to allkeys-lru.
  • The instance should have TLS disabled.
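The Memorystore requirements above can be expressed as a simple check over an instance description. In this sketch, the dictionary keys (`tier`, `memorySizeGb`, `readReplicasMode`, `replicaCount`, `redisConfigs`, `transitEncryptionMode`) and their values are assumptions based on the Memorystore for Redis instance resource format, so verify them against what your tooling actually returns. The function name `check_memorystore` is hypothetical, and the VPC requirement isn't checked.

```python
def check_memorystore(instance):
    """Return a list of requirement violations for a Redis instance dict."""
    problems = []
    # Assumed value for the Standard tier in the instance resource.
    if instance.get("tier") != "STANDARD_HA":
        problems.append("tier should be Standard")
    if instance.get("memorySizeGb", 0) < 5:
        problems.append("memory should be at least 5 GB")
    if (instance.get("readReplicasMode") != "READ_REPLICAS_ENABLED"
            or instance.get("replicaCount", 0) < 1):
        problems.append("read replicas should be enabled with at least one replica")
    policy = instance.get("redisConfigs", {}).get("maxmemory-policy")
    if policy != "allkeys-lru":
        problems.append("maxmemory-policy should be allkeys-lru")
    # A missing field is treated as TLS disabled (an assumption of this sketch).
    if instance.get("transitEncryptionMode", "DISABLED") != "DISABLED":
        problems.append("TLS (in-transit encryption) should be disabled")
    return problems


if __name__ == "__main__":
    example = {
        "tier": "STANDARD_HA",
        "memorySizeGb": 5,
        "readReplicasMode": "READ_REPLICAS_ENABLED",
        "replicaCount": 1,
        "redisConfigs": {"maxmemory-policy": "allkeys-lru"},
        "transitEncryptionMode": "DISABLED",
    }
    print(check_memorystore(example) or "all checks passed")
```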