
Deploy an Anyscale Cloud on GCP

Before you run Ray workloads on Anyscale, an Anyscale organization member must deploy an Anyscale Cloud within a Google Cloud Platform (GCP) environment. This integration enables Anyscale to manage resources like compute instances and storage directly in a GCP project.

1. Prerequisites

  1. Create a Google Cloud project for Anyscale to operate in.
  2. Install the Google Cloud CLI.
  3. Optional: Authenticate the Google Cloud CLI in the project.
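With the Google Cloud CLI, steps 1 and 3 might look like the sketch below. It assumes GCP_PROJECT_ID is a placeholder for a project ID you choose, and project IDs must be globally unique. The helper names `run` and `prepare_gcp_project` and the `DRY_RUN` switch are hypothetical. Installing the CLI (step 2) and linking a billing account to the new project aren't shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_gcp_project() {
  local project_id="$1"
  # Step 1: create the project that Anyscale operates in.
  run gcloud projects create "$project_id"
  # Step 3 (optional): authenticate the CLI and make the project the default.
  run gcloud auth login
  run gcloud config set project "$project_id"
}
```

For example, `DRY_RUN=1 prepare_gcp_project GCP_PROJECT_ID` prints the commands so you can review them before running them for real.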

2. Install the Anyscale CLI

  1. Run the following command to install the Anyscale CLI and Python client package:
pip install -U "anyscale[gcp]"
  2. To authenticate, run the following command. It fetches a token that confirms your identity and stores it in the ~/.anyscale/credentials.json file.
anyscale login

If necessary, log in to the Anyscale console to complete authentication.
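As a quick local sanity check after installing and logging in, you could confirm two things: the `anyscale` executable is on your `PATH`, and `anyscale login` wrote the credentials file. This is a minimal sketch. The function name `check_anyscale_cli` is hypothetical, and the function only inspects the local machine. It doesn't validate the token with Anyscale.

```shell
#!/usr/bin/env bash
set -euo pipefail

check_anyscale_cli() {
  # Fail early if the CLI installed by pip isn't on PATH.
  if ! command -v anyscale >/dev/null 2>&1; then
    echo "anyscale CLI not found on PATH" >&2
    return 1
  fi
  # `anyscale login` stores the token in this file.
  if [[ ! -f "${HOME}/.anyscale/credentials.json" ]]; then
    echo "credentials file missing; run: anyscale login" >&2
    return 1
  fi
  echo "ok"
}
```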

3. Choose an Anyscale Cloud deployment method

Deploying an Anyscale Cloud integrates Anyscale's capabilities into your GCP project to leverage its compute, storage, and networking resources for scalable, distributed computing.

You can choose between two deployment methods, both of which use the Anyscale CLI to configure the Cloud. Choose a method based on your organization's requirements.

  • anyscale cloud setup - Use for rapid deployment and a straightforward, low-maintenance solution; deploy in public subnets and access over public IP addresses without setting up additional networking infrastructure.
  • anyscale cloud register - Suited to teams with advanced cloud expertise that need enhanced security, custom private networking, or specific compliance controls.

4. Create an Anyscale Cloud

Based on the deployment method selected from the previous step, create an Anyscale Cloud with the following instructions.

For the anyscale cloud setup deployment method, Anyscale automatically creates and configures the necessary resources within your GCP project. You deploy Ray clusters in public subnets and access them using public IP addresses without needing to set up additional networking infrastructure like VPNs.

Note: To customize resources manually, use the anyscale cloud register method instead.

Direct Networking

Step 1: Verify permissions in the GCP project

In the GCP project, check the roles granted to your user account and confirm that they include the iam.serviceAccounts.setIamPolicy permission. Anyscale needs this permission to grant Google Cloud Deployment Manager the ability to set IAM policies.
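To check the permission from the command line instead of browsing roles, you could call the Cloud Resource Manager `projects.testIamPermissions` method. It returns the subset of the requested permissions that the caller holds. This is a minimal sketch, assuming `curl` is installed and the Google Cloud CLI is authenticated as your user. The names `check_permission` and `DRY_RUN` are hypothetical. An empty JSON object (`{}`) in the response suggests that the permission is missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

REQUIRED_PERMISSION="iam.serviceAccounts.setIamPolicy"

check_permission() {
  local project_id="$1"
  local url="https://cloudresourcemanager.googleapis.com/v1/projects/${project_id}:testIamPermissions"
  local body
  body="$(printf '{"permissions": ["%s"]}' "$REQUIRED_PERMISSION")"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the request instead of sending it.
    echo "POST ${url} ${body}"
    return 0
  fi
  # The response echoes back only the requested permissions the caller holds.
  curl -sS -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$body" \
    "$url"
}
```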

Step 2: Enable Google Cloud APIs

Enable the Cloud Resource Manager API so that Anyscale can enable all other required APIs.
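One way to enable this API from the Google Cloud CLI is shown in the sketch below. It assumes GCP_PROJECT_ID is your project ID. The helper names `run` and `enable_resource_manager_api` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

enable_resource_manager_api() {
  local project_id="$1"
  # Anyscale relies on this API to enable the other APIs it needs.
  run gcloud services enable cloudresourcemanager.googleapis.com --project="$project_id"
}
```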

Step 3: Deploy a new Cloud

Run the following command to deploy a new Cloud:

anyscale cloud setup \
  --provider gcp \
  --name ANYSCALE_CLOUD_NAME \
  --region GCP_COMPUTE_REGION \
  --project-id GCP_PROJECT_ID \
  --enable-head-node-fault-tolerance
Optional flags

--enable-head-node-fault-tolerance: Enables head node fault tolerance in Anyscale Services by configuring an additional Memorystore instance for the Ray Global Control Store. Note that this flag extends the setup time by approximately 10 minutes.
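Because head node fault tolerance is optional and lengthens setup, a small wrapper can add the flag only when you ask for it. This is a minimal sketch that uses only the flags shown in this step. The function name `setup_anyscale_cloud`, its positional arguments, the `run` helper, and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_anyscale_cloud() {
  local name="$1" region="$2" project_id="$3" head_node_ft="${4:-false}"
  local args=(cloud setup --provider gcp --name "$name" --region "$region" --project-id "$project_id")
  # Append the optional flag only when requested; it adds roughly 10 minutes to setup.
  if [[ "$head_node_ft" == "true" ]]; then
    args+=(--enable-head-node-fault-tolerance)
  fi
  run anyscale "${args[@]}"
}
```

For example, `setup_anyscale_cloud ANYSCALE_CLOUD_NAME GCP_COMPUTE_REGION GCP_PROJECT_ID true` runs the full command, and omitting the last argument skips the Memorystore instance. Building the arguments as a bash array keeps each flag value intact even if it contains spaces.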

5. Verify Cloud resources

To confirm that your Anyscale Cloud works as expected, run a functional verification. The following CLI command launches a test Workspace and a test Service from your Cloud:

anyscale cloud verify --name ANYSCALE_CLOUD_NAME --functional-verify workspace,service

Glossary of cloud resources

  • Service accounts

    • Anyscale Access service account: Used by Anyscale to manage Google Compute Engine (GCE) instances and Ray clusters within your GCP project.
    • Ray Cluster service account: The default service account attached to Ray clusters. You can customize this account to meet the specific needs and permissions required by your workload.
  • Workload identity federation: A mechanism that lets Anyscale exchange the AWS role credentials of its control plane for short-lived GCP service account credentials.

  • VPC and subnets

    • Virtual Private Cloud (VPC): A virtual network within your GCP project. Anyscale deploys each Cloud in its own VPC, which logically isolates it from other Clouds and networks.
    • Subnets: Ranges of IP addresses within your VPC. GCP resources like GCE VM instances are attached to these subnets. Anyscale deploys workloads within these defined VPCs and subnets.
  • Firewall policies

    • Collections of firewall rules that secure the cloud environment by controlling incoming and outgoing traffic to GCP resources. Anyscale requires specific firewall rules to enable access to its components and applications, such as JupyterLab, the Ray Dashboard, Ray Serve endpoints, and Workspaces.
  • Filestore

    • A managed network file system service on GCP that provides shared storage with scalable performance and support for regulatory compliance. Anyscale Workspaces require Filestore, which also serves other applications and workloads that need a shared file system.
  • Cloud storage bucket

    • A scalable and secure object storage service. Anyscale uses the specified bucket to support the management of Ray clusters and Ray applications. Examples include general data storage that persists beyond a cluster's lifespan and model checkpoints from tools like Ray Tune or RLlib.
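To see which of these resources exist in a project, you could list each resource type with the Google Cloud CLI. This is a minimal sketch, assuming GCP_PROJECT_ID is your project ID. The commands list every resource of each type in the project, not only the ones Anyscale created. The helper names `run` and `inventory_project` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

inventory_project() {
  local project_id="$1"
  # Service accounts (for example, the Anyscale Access and Ray Cluster accounts).
  run gcloud iam service-accounts list --project="$project_id"
  # Workload identity pools used for credential exchange.
  run gcloud iam workload-identity-pools list --location=global --project="$project_id"
  # VPC networks and their subnets.
  run gcloud compute networks list --project="$project_id"
  run gcloud compute networks subnets list --project="$project_id"
  # Network firewall policies.
  run gcloud compute network-firewall-policies list --project="$project_id"
  # Filestore instances and Cloud Storage buckets.
  run gcloud filestore instances list --project="$project_id"
  run gcloud storage buckets list --project="$project_id"
}
```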

Appendix: Minimal IAM Permissions for cloud commands

This section lists the minimal IAM permissions that the Anyscale CLI requires to perform cloud operations. As a GCP administrator, follow these steps to apply them:

  1. Create a new custom role or edit an existing role to include the following permissions.

  2. Grant the role to the service account that will be used to run the Anyscale CLI.

certificatemanager.certmapentries.delete
certificatemanager.certmapentries.get
certificatemanager.certmapentries.list
certificatemanager.certmaps.delete
certificatemanager.certmaps.get
certificatemanager.certmaps.list
certificatemanager.certs.delete
certificatemanager.certs.get
certificatemanager.certs.list
compute.firewallPolicies.get
compute.firewallPolicies.update
compute.firewallPolicies.use
compute.globalOperations.get
compute.networks.setFirewallPolicy
deploymentmanager.deployments.create
deploymentmanager.deployments.get
deploymentmanager.deployments.update
deploymentmanager.manifests.get
deploymentmanager.operations.get
file.instances.get
iam.workloadIdentityPoolProviders.create
iam.workloadIdentityPools.create
iam.workloadIdentityPools.get
resourcemanager.projects.get
resourcemanager.projects.getIamPolicy
resourcemanager.projects.setIamPolicy
servicemanagement.services.bind
serviceusage.operations.get
serviceusage.services.enable
serviceusage.services.get
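You can script the two appendix steps with `gcloud`. This is a minimal sketch, assuming you have saved the permission list above to a text file with one permission per line. The file name, the role ID `anyscaleCliMinimal`, the role title, and the helper names are hypothetical. The sketch creates a project-level custom role; an organization-level role would use `--organization` instead of `--project`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

ROLE_ID="anyscaleCliMinimal"  # hypothetical custom role ID

join_permissions() {
  local file="$1"
  # Drop blank lines, then join the remaining lines with commas.
  grep -v '^[[:space:]]*$' "$file" | paste -sd, -
}

create_and_grant_role() {
  local project_id="$1" sa_email="$2" permissions_file="$3"
  local perms
  perms="$(join_permissions "$permissions_file")"
  # Step 1: create a custom role that contains the listed permissions.
  run gcloud iam roles create "$ROLE_ID" \
    --project="$project_id" \
    --title="Anyscale CLI minimal" \
    --permissions="$perms" \
    --stage=GA
  # Step 2: grant the role to the service account that runs the Anyscale CLI.
  run gcloud projects add-iam-policy-binding "$project_id" \
    --member="serviceAccount:${sa_email}" \
    --role="projects/${project_id}/roles/${ROLE_ID}"
}
```

For example, `create_and_grant_role GCP_PROJECT_ID SERVICE_ACCOUNT_EMAIL permissions.txt`, where SERVICE_ACCOUNT_EMAIL is a placeholder. Keeping the permissions in a file avoids maintaining a second copy of the list inside the script.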