Deploy Anyscale on GKE (cloud register)

This page describes how to deploy an Anyscale cloud on Google Kubernetes Engine (GKE) using anyscale cloud register. This path gives you full control over cluster resources and networking. You provision and configure all resources yourself, then register the cloud with Anyscale.

tip

For a faster, automated setup, use anyscale cloud setup instead. See Set up Anyscale on GKE.

Complete the following steps to configure and deploy a new Anyscale cloud on GKE.

1. Install the Anyscale CLI

pip install -U "anyscale[gcp]"
anyscale login # authenticate
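
Later steps on this page also invoke gcloud, git, terraform, helm, and kubectl. As a minimal sketch, the following check lists any of those tools that aren't on your PATH yet. The function name missing_tools is hypothetical and isn't part of the Anyscale CLI.

```shell
# Print the name of each tool that is not on PATH.
# Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools that the commands on this page invoke.
missing_tools anyscale gcloud git terraform helm kubectl
```

An empty result means every tool was found. Install anything the check lists before you continue.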

2. Authenticate the gcloud CLI

Prepare a Google Cloud project for Anyscale to use. If you haven't installed the gcloud CLI, install it by following the Google Cloud instructions for installing the gcloud CLI, then authenticate it against that project.

note

Before you continue, make sure your Google Cloud credentials have the Owner role on the project you want Anyscale to use. See Configure Google Cloud resources for an Anyscale cloud.
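
The authentication itself can be sketched as follows. The wrapper function gcloud_auth_for_project is a hypothetical name used for illustration; the gcloud subcommands inside it are standard. The application-default login is included on the assumption that Terraform in step 3 reads Application Default Credentials, which is how the Google provider for Terraform commonly authenticates.

```shell
# Sketch: authenticate gcloud and select the project Anyscale will use.
# Both login commands open an interactive browser flow.
gcloud_auth_for_project() {
  project_id="$1"
  # Log in the gcloud CLI itself.
  gcloud auth login || return 1
  # Make the project the default for later gcloud commands.
  gcloud config set project "$project_id" || return 1
  # Create Application Default Credentials (assumed to be what Terraform reads).
  gcloud auth application-default login
}

# Example invocation (uncomment and substitute your project ID):
# gcloud_auth_for_project "<your_google_project_id>"
```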

3. Use the Anyscale Terraform module to create a GKE cluster

Anyscale provides a Terraform module to deploy a GKE cluster and supporting Google Cloud resources.

note

To use an existing GKE cluster, follow the existing GKE cluster example or see the Anyscale Operator documentation and the Anyscale Terraform repository.

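Choose the Google Cloud project ID, region, and GKE cluster name for your deployment. The commands in this step and in later steps use them as the placeholders <your_google_project_id>, <your_google_region>, and <your_gke_cluster_name>.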

Clone the Terraform module and navigate to the GKE example:

git clone https://github.com/anyscale/terraform-kubernetes-anyscale-foundation-modules
cd terraform-kubernetes-anyscale-foundation-modules/examples/gcp/gke-new_cluster/

Run the following command to create and populate a Terraform variable file:

cat <<EOF > terraform.tfvars
google_project_id = "<your_google_project_id>"
google_region = "<your_google_region>"
gke_cluster_name = "<your_gke_cluster_name>"
EOF

note

The Terraform example enables GPU node pools (T4) by default. To customize or disable GPU pools, set gpu_instance_configs in your terraform.tfvars (for example, use an empty map {} to disable GPU pools).
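
For example, to disable the GPU node pools as the note describes, you could append the empty map to the variable file created above. This is a minimal sketch that assumes you run it from the same directory as terraform.tfvars.

```shell
# Append the override to the Terraform variable file.
# An empty map disables the default GPU node pools.
printf '%s\n' 'gpu_instance_configs = {}' >> terraform.tfvars
```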

Run the following commands to apply the Terraform configuration. This may take several minutes.

terraform init
terraform plan
terraform apply

note

You may need to enable some Google Cloud APIs before the Terraform configuration applies successfully.
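
If terraform apply reports a disabled API, a sketch like the following enables the named services. The wrapper enable_apis is a hypothetical name; gcloud services enable is the standard command. The API names in the usage comment are an assumption about what a GKE deployment typically needs, not a list taken from the Terraform module, so prefer the names that appear in your error output.

```shell
# Sketch: enable a caller-supplied list of Google Cloud APIs on one project.
enable_apis() {
  project_id="$1"
  shift
  gcloud services enable "$@" --project "$project_id"
}

# Example (assumed API list; adjust to what the Terraform error names):
# enable_apis "<your_google_project_id>" container.googleapis.com compute.googleapis.com
```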

Run terraform output and record the values it reports. Later steps use them, for example the storage bucket name in step 5.

4. Install additional GKE components

In this step, you connect to your GKE cluster and install Envoy Gateway for externally facing load balancing. For more information about customizing GKE, see the Anyscale Terraform repository.

Run the following command to connect your terminal to the GKE cluster:

gcloud container clusters get-credentials <your_gke_cluster_name> --region <your_google_region> --project <your_google_project_id>

note

You may need to install the gke-gcloud-auth-plugin if it isn't already installed.

Install Envoy Gateway v1.7.0:

helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.7.0 \
  --namespace envoy-gateway-system \
  --create-namespace
kubectl wait --for=condition=available deployment/envoy-gateway \
  -n envoy-gateway-system --timeout=120s

Create a file named envoyproxy.yaml with the following contents:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy-proxy
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: LoadBalancer
        annotations:
          cloud.google.com/load-balancer-type: "External"

Apply the resource:

kubectl apply -f envoyproxy.yaml

Create a file named gatewayclass.yaml with the following contents:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: envoy-proxy
    namespace: envoy-gateway-system

Apply the resource:

kubectl apply -f gatewayclass.yaml

note

Envoy Gateway is the recommended ingress controller for Anyscale on Kubernetes. Other gateway and ingress controllers are supported. See Ingress and gateway controllers for all supported options.
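
Before you continue, you may want to confirm that the controller accepted the GatewayClass. The following sketch waits for the Accepted condition that the Gateway API defines for GatewayClass resources, then prints the resource. The wrapper wait_for_gatewayclass and the 120-second timeout are illustrative choices, not values from the Anyscale documentation.

```shell
# Sketch: block until a GatewayClass reports the Accepted condition, then show it.
wait_for_gatewayclass() {
  name="$1"
  kubectl wait --for=condition=Accepted "gatewayclass/$name" --timeout=120s || return 1
  kubectl get gatewayclass "$name"
}

# Example invocation for the GatewayClass created above:
# wait_for_gatewayclass eg
```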

5. Register the Anyscale cloud resources

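Run the following command with the values from your Terraform output, replacing each placeholder. The --kubernetes-zones value shown is an example for us-central1; replace it with the zones your GKE cluster uses in <your_google_region>. Verify every value before you run the command.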

anyscale cloud register \
  --name <your_cloud_name> \
  --provider gcp \
  --region <your_google_region> \
  --compute-stack k8s \
  --kubernetes-zones us-central1-a,us-central1-b \
  --anyscale-operator-iam-identity anyscale-gke-nodes@<your_google_project_id>.iam.gserviceaccount.com \
  --cloud-storage-bucket-name gs://<your_storage_bucket_name>
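
Because the zone list in the command is easy to leave at its example value, a small pre-check can catch a mismatch between --region and --kubernetes-zones. This sketch relies only on the Google Cloud naming convention that a zone name is its region name plus a one-character suffix. The function name zones_match_region is hypothetical.

```shell
# Return 0 only if every zone in a comma-separated list belongs to the region.
zones_match_region() {
  region="$1"
  zones_csv="$2"
  [ -n "$zones_csv" ] || return 1
  # Zone names contain no spaces, so splitting on commas via tr is safe here.
  for zone in $(printf '%s' "$zones_csv" | tr ',' ' '); do
    case "$zone" in
      "$region"-?) ;;  # region followed by a one-character zone suffix
      *)
        printf 'zone %s is not in region %s\n' "$zone" "$region" >&2
        return 1
        ;;
    esac
  done
}

# Example: succeeds silently because both zones are in us-central1.
zones_match_region us-central1 us-central1-a,us-central1-b
```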

Record the cloud resource ID from the command output. You need it in step 6.

6. Install and deploy the Anyscale operator on your GKE cluster

In this step, you add the Anyscale operator Helm chart to your GKE cluster, create a values.yaml file that describes your cloud and Google Cloud identity, and install the operator with Helm.

Add the Anyscale operator Helm chart

Run the following command to add the Anyscale operator Helm chart:

helm repo add anyscale https://anyscale.github.io/helm-charts
helm repo update anyscale

Create the Gateway

The anyscale-operator namespace must exist before you create the Gateway. Create it with the following command. If the namespace already exists, this command returns an error; that's expected and you can proceed.

kubectl create namespace anyscale-operator

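Create a file named gateway.yaml with the following contents. Replace <cloud-resource-id> in both certificate names with your cloud resource ID from step 5. The certificate names use a hyphenated form of the ID: if your ID contains underscores, replace them with hyphens, because Kubernetes object names can't contain underscores.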

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: anyscale-operator
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
    - name: https
      port: 443
      protocol: HTTPS
      hostname: '*.i.anyscaleuserdata.com'
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: anyscale-<cloud-resource-id>-certificate
      allowedRoutes:
        namespaces:
          from: All
    - name: https-session
      port: 443
      protocol: HTTPS
      hostname: '*.s.anyscaleuserdata.com'
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: anyscale-svc-<cloud-resource-id>-certificate
      allowedRoutes:
        namespaces:
          from: All
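
The two certificate Secret names in the Gateway follow a fixed pattern around the cloud resource ID. The following sketch derives both names from an ID. It assumes that hyphenating the ID means replacing underscores with hyphens, and the ID cldrsrc_abc123 is a made-up value for illustration only. The function name cert_secret_names is hypothetical.

```shell
# Assumption: the Secret names embed the cloud resource ID with underscores
# replaced by hyphens, since Kubernetes object names cannot contain underscores.
cert_secret_names() {
  id_hyphenated="$(printf '%s' "$1" | tr '_' '-')"
  printf 'anyscale-%s-certificate\n' "$id_hyphenated"
  printf 'anyscale-svc-%s-certificate\n' "$id_hyphenated"
}

# Made-up ID used only for illustration.
cert_secret_names "cldrsrc_abc123"
# Prints:
#   anyscale-cldrsrc-abc123-certificate
#   anyscale-svc-cldrsrc-abc123-certificate
```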

Apply the Gateway and retrieve its external address:

kubectl apply -f gateway.yaml
kubectl get gateway gateway -n anyscale-operator \
  -o jsonpath='{.status.addresses[0].value}'

Record the Gateway address from the output. You use it in values.yaml in the next section.

Create a values YAML file

Create a values.yaml file. If your load balancer provides a hostname, use hostname. If it provides an IP address, use ip instead.

global:
  cloudDeploymentId: <your_cloud_resource_id>
  cloudProvider: gcp
  auth:
    iamIdentity: anyscale-gke-nodes@<your_google_project_id>.iam.gserviceaccount.com

workloads:
  serviceAccount:
    name: anyscale-operator

networking:
  gateway:
    enabled: true
    name: "gateway"
    namespace: "anyscale-operator"
    apiVersion: "gateway.networking.k8s.io/v1"
    hostname: "<gateway-address>"
    # ip: "<gateway-ip>" # Use this instead of hostname if the LB provides an IP address
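
To choose between the hostname and ip keys, you can classify the Gateway address you recorded. The following is a rough sketch that only recognizes dotted IPv4 addresses and treats everything else as a hostname. The function name gateway_address_key and the sample addresses are illustrative.

```shell
# Print "ip" for a dotted IPv4-style address, otherwise "hostname".
# This is a rough shape check, not a full address validator.
gateway_address_key() {
  case "$1" in
    '') return 1 ;;                 # no address recorded yet
    *[!0-9.]*) echo hostname ;;     # contains a character other than digits and dots
    *.*.*.*) echo ip ;;             # digits and dots with at least three dots
    *) echo hostname ;;
  esac
}

gateway_address_key "34.120.0.10"     # prints: ip
gateway_address_key "lb.example.com"  # prints: hostname
```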

To customize the Helm chart with custom patches or additional pod shapes, see Configure the Helm chart for the Anyscale operator. To enable TPU support, see Leverage Cloud TPUs on GKE.

Install the Anyscale operator on GKE

Run the following command to install the Anyscale operator with Helm using your values.yaml file.

helm upgrade anyscale-operator anyscale/anyscale-operator \
  --namespace anyscale-operator \
  -f values.yaml \
  --create-namespace \
  --wait \
  -i

Bind the workload identity

Run the following command to bind the Google Cloud service account to the Kubernetes service account for workload identity:

gcloud iam service-accounts add-iam-policy-binding anyscale-gke-nodes@<your_google_project_id>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<your_google_project_id>.svc.id.goog[anyscale-operator/anyscale-operator]" \
  --project <your_google_project_id>
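
The --member value in the binding has three parts: the project's workload identity pool, the Kubernetes namespace, and the Kubernetes service account name. The following sketch assembles the string from those parts so that each one is visible. The function name wi_member and the project ID demo-project are hypothetical.

```shell
# Build the IAM member string for a Kubernetes service account:
#   serviceAccount:<project>.svc.id.goog[<namespace>/<service-account>]
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Made-up project ID; the namespace and service account match values.yaml.
wi_member demo-project anyscale-operator anyscale-operator
# Prints: serviceAccount:demo-project.svc.id.goog[anyscale-operator/anyscale-operator]
```

The namespace and service account name must match the namespace you install the operator into and the workloads.serviceAccount.name value in values.yaml.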

It may take several minutes for your Anyscale cloud to be ready. You can watch the deployment status with the following command:

kubectl get deployments anyscale-operator -n anyscale-operator -w
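
As an alternative to watching interactively, kubectl rollout status blocks until the deployment finishes rolling out or a timeout expires, which suits scripts. This is a minimal sketch; the wrapper wait_for_operator and the default timeout of 600 seconds are illustrative choices.

```shell
# Sketch: wait for the operator deployment to finish rolling out.
# The optional first argument overrides the timeout (default 600s).
wait_for_operator() {
  kubectl rollout status deployment/anyscale-operator \
    --namespace anyscale-operator --timeout="${1:-600s}"
}

# Example invocation with a longer timeout:
# wait_for_operator 900s
```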

7. Verify your Anyscale cloud

After the operator is ready, verify that your cloud is registered and functional:

anyscale cloud verify --name <your_cloud_name>