Anyscale operator for Kubernetes
The Anyscale operator for Kubernetes enables deploying the Anyscale platform on Kubernetes clusters on Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), Oracle Container Engine for Kubernetes (OKE), Azure Kubernetes Service (AKS), CoreWeave, or other Kubernetes clusters running in the cloud or on-prem.

The Anyscale operator operates on the following Kubernetes resources.
Namespaced resources
- Pods: each Anyscale / Ray node maps to a single pod.
- Services + Ingresses: used for head node connectivity (user laptop -> Ray dashboard) and for exposing Anyscale Services (user laptop -> Anyscale Service). Ingresses may be either private or public.
- Secrets: store sensitive values used by the Anyscale operator.
- ConfigMaps: used to store configuration options for the Anyscale operator.
- Events: used to enhance workload observability.
Global resources
- TokenReview: When an Anyscale node starts as part of an Anyscale workload, the operator uses the Kubernetes TokenReview API to verify the pod's identity as the pod bootstraps itself to the Anyscale control plane.
- Nodes: The operator periodically reads node information to enhance workload observability.
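As an illustration of the TokenReview flow, the following sketch shows the kind of request the operator issues; the token value is a placeholder and would normally be the service account token presented by the bootstrapping pod.
# Illustrative only: the operator submits a TokenReview with the pod's token and reads the status.
kubectl create -o yaml -f - <<'EOF'
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "<pod-service-account-token>"  # placeholder; a bogus token simply returns authenticated: false
EOF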
Installing the Helm chart for the Anyscale operator requires permissions to create cluster roles and cluster role bindings, which grant the Anyscale operator the necessary permissions to manage the preceding global resources. If you don't have these permissions, consider deploying Anyscale inside of vCluster in a Namespace of your choice.
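If you take the vCluster route, a minimal sketch with the vcluster CLI looks like the following; the vCluster name is arbitrary, and the Helm installation described below then runs against the vCluster's kubeconfig.
# Create a virtual cluster in the chosen namespace, then point kubectl at it.
vcluster create anyscale-vcluster --namespace <namespace>
vcluster connect anyscale-vcluster --namespace <namespace>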
Deployment modes
Cloud-native mode (only supported for AWS and GCP)
Cloud-native mode comes with first-class support for all Anyscale features, but requires setting up additional peripheral cloud resources (S3 buckets, IAM roles, etc.) before deploying the Anyscale operator. At this time, cloud-native mode is only supported on AWS and GCP. See the Terraform modules for a reference on these peripheral cloud resources required for cloud registration.
Cloud-agnostic mode (supported for any Kubernetes cluster)
Cloud-agnostic mode is more flexible and doesn't necessarily require setting up peripheral cloud resources. However, some Anyscale features, such as viewing logs through the Anyscale console, may be missing or unsupported unless the relevant cloud resources have been provided.
If running on EKS or GKE, use cloud-native mode when possible.
Prerequisites
- A Kubernetes cluster.
- Use Kubernetes v1.28 or later when possible. Earlier versions may work, but aren't fully tested.
- Permissions to deploy a Helm chart into the Kubernetes cluster.
- The name of the Kubernetes Namespace that you would like to deploy the Anyscale operator inside of.
- An ingress controller. Use the Ingress-NGINX controller when possible. Other ingress controllers may work as well, but aren't fully tested. When using the Ingress-NGINX controller, set the allow-snippet-annotations option to true in the NGINX ConfigMap; Anyscale services rely on this setting. See the installation sketch after this list.
- For direct networking, configure an internet-facing load balancer.
- For customer-defined networking, configure an internal load balancer.
- In some cases, an annotation on the LoadBalancer service in front of the NGINX pods can be applied to configure internal load balancing.
- As a reference, see the Anyscale documentation on the difference between direct and customer-defined networking modes on the AWS VM stack, including the pros and cons of each approach.
- Egress to the internet from Anyscale pods deployed into the Kubernetes cluster. This is a requirement of all Anyscale deployments.
- If using GPUs, install the appropriate NVIDIA drivers and device plugins (references: EKS, GKE, AKS).
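As a minimal sketch of the ingress prerequisite, assuming the Ingress-NGINX Helm chart on AWS, the following installs the controller with snippet annotations enabled and an internal load balancer for customer-defined networking; for direct networking, omit the scheme annotation or set it to internet-facing, and note that the annotation key differs on other clouds and load balancer controllers.
# Ingress-NGINX install sketch; chart values and the LB annotation key are assumptions to adapt.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.allowSnippetAnnotations=true \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-scheme"=internal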
The remaining prerequisites depend on the deployment mode.
Cloud-native, AWS
- An S3 bucket for system and artifact storage.
- The Anyscale operator and all Pods created by the operator must have direct access to this storage bucket.
- See object storage bucket permissions for additional details.
- An IAM role for the Anyscale operator to use, for the purposes of verifying the operator identity.
- (Optional, highly recommended) An EFS mount target with subnets and security group allowing communication from the EKS cluster.
- Anyscale uses Amazon EFS for Anyscale Workspaces persistence, as well as cluster shared storage.
- If an EFS mount target is not provided, Workspaces persistence and cluster shared storage will be disabled.
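If the EFS mount target doesn't exist yet, a sketch with the AWS CLI looks like the following; the file system, subnet, and security group values are placeholders, and you typically create one mount target per availability zone used by the EKS cluster.
# Create an EFS mount target reachable from the EKS cluster (repeat per subnet/AZ as needed).
aws efs create-mount-target \
  --file-system-id <efs-id> \
  --subnet-id <eks-subnet-id> \
  --security-groups <security-group-allowing-nfs-from-eks>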
See https://registry.terraform.io/modules/anyscale/anyscale-foundation-modules/kubernetes/latest for a reference on provisioning the core cloud resources required for cloud registration.
Cloud-native, GCP
- A GCS bucket for system and artifact storage.
- The Anyscale operator and all Pods created by the operator must have direct access to this storage bucket.
- See object storage bucket permissions for additional details.
- The project ID of the Google Project that contains the target Kubernetes cluster.
- A service account for the Anyscale operator, for the purposes of verifying the operator identity.
- (Optional, highly recommended) A Filestore mount target in the same region as the GKE cluster.
- Anyscale uses GCP Filestore for Anyscale Workspaces persistence, as well as cluster shared storage.
- If a Filestore mount target is not provided, Workspaces persistence and cluster shared storage will be disabled.
- The Filestore mount target IP should be accessible from the GKE cluster. No additional Filestore-related permissions for the Anyscale operator are required.
See https://registry.terraform.io/modules/anyscale/anyscale-foundation-modules/kubernetes/latest for a reference on provisioning the core cloud resources required for cloud registration.
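As a sketch of provisioning these resources by hand instead of with Terraform, the following grants the operator's service account access to the GCS bucket and creates a basic Filestore instance; the IAM role is an assumption (see the object storage bucket permissions below), and the Filestore tier and capacity are placeholders.
# Grant the operator service account access to the bucket (role is an assumption).
gcloud storage buckets add-iam-policy-binding gs://<cloud-storage-bucket-name> \
  --member="serviceAccount:<anyscale-operator-service-account-email>" \
  --role="roles/storage.objectAdmin"
# Create a basic Filestore instance reachable from the GKE cluster's VPC.
gcloud filestore instances create <filestore-instance-id> \
  --location=<zone> \
  --tier=BASIC_HDD \
  --file-share=name=anyscale,capacity=1TB \
  --network=name=<vpc-name>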
Cloud-agnostic
- (Optional, highly recommended) A cloud storage bucket. Supported storage buckets include S3 or S3-compatible buckets, Google Cloud Storage buckets, and Azure Blob Storage containers (s3://<bucket-name>, gs://<bucket-name>, or azure://<container-name>).
- Anyscale uses this cloud storage bucket for persisting various system artifacts in the customer account, including runtime environment uploads from the anyscale job and anyscale service CLI commands.
- For S3-compatible storage, if desired, provide an endpoint URL to override the default AWS_ENDPOINT_URL.
- For Azure Blob Storage, an endpoint URL of the form https://<storage-account-name>.blob.core.windows.net is required.
- The Anyscale operator and all Pods created by the operator must have direct access to this storage bucket.
- See object storage bucket permissions for additional details.
- (Optional, highly recommended) An NFS mount target.
- Anyscale uses NFS for Anyscale Workspaces persistence, as well as cluster shared storage.
- If an NFS mount target is not provided, Workspaces persistence and cluster shared storage will be disabled.
- If desired, a path to pass into the NFS volume specification.
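To sanity-check the NFS mount target before registering the cloud, a quick test from a host or pod inside the cluster network might look like the following; the mount point is arbitrary.
# Verify the NFS export is reachable and writable from inside the cluster network.
sudo mkdir -p /mnt/anyscale-nfs-test
sudo mount -t nfs <nfs-mount-target>:/<nfs-mount-path> /mnt/anyscale-nfs-test
sudo touch /mnt/anyscale-nfs-test/healthcheck && sudo rm /mnt/anyscale-nfs-test/healthcheck
sudo umount /mnt/anyscale-nfs-test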
NOTE: Some Anyscale features, such as log viewing through the UI, may not be supported in cloud-agnostic mode at this time.
Permissions
The Anyscale operator requires the following permissions to be able to run Ray workloads on Kubernetes.
Kubernetes Permissions
The Anyscale operator must be run with a Kubernetes Service Account that has permissions to operate on a handful of core Kubernetes resources. For details on these permissions, see the Role and ClusterRole in the Anyscale operator Helm chart.
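To see the concrete permissions on an existing installation, you can inspect the RBAC objects that the Helm chart created; the name filters below are assumptions that depend on your release name.
# List the operator's namespaced and cluster-scoped RBAC objects (names depend on the release name).
kubectl get role,rolebinding -n <namespace> | grep -i anyscale
kubectl get clusterrole,clusterrolebinding | grep -i anyscale
# Show the rules of a specific ClusterRole from the output above.
kubectl describe clusterrole <anyscale-operator-clusterrole-name>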
Object Storage Bucket Permissions
The Anyscale operator (and Pods created by the Anyscale operator) must have access to the object storage bucket that is used for system and artifact storage. The Anyscale operator must additionally have the ability to generate presigned URLs for reading and writing artifacts to the object storage bucket.
Access to this storage bucket may be granted in a variety of ways, depending on the environment in which the Kubernetes cluster is running and the provider of the storage bucket.
AWS S3
We recommend following these references to grant Anyscale workloads access to an AWS S3 bucket:
- If deploying in EKS: Learn how EKS Pod Identity grants pods access to AWS services
- If deploying on-prem: Extend AWS IAM roles to workloads outside of AWS with IAM Roles Anywhere
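For the EKS Pod Identity route, a minimal sketch looks like the following; it assumes the EKS Pod Identity Agent add-on is installed and that the operator and workload Pods run with the anyscale-operator service account, matching the workloadServiceAccountName used in the Helm command below.
# Associate the IAM role with the service account used by Anyscale Pods.
aws eks create-pod-identity-association \
  --cluster-name <eks-cluster-name> \
  --namespace <namespace> \
  --service-account anyscale-operator \
  --role-arn <anyscale-operator-iam-role-arn>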
Google Cloud Storage
We recommend following these references to grant Anyscale workloads access to a Google Cloud Storage bucket:
- If deploying in GKE: About Workload Identity Federation for GKE
- If deploying on-prem: Workload Identity Federation
On GKE, the Anyscale operator must have the Service Account Token Creator role. This is required to generate presigned URLs for objects in the storage bucket.
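As a sketch, granting that role on the operator's own service account (so it can sign URLs as itself) looks like the following.
# Allow the operator's service account to create tokens and sign URLs as itself.
gcloud iam service-accounts add-iam-policy-binding <anyscale-operator-service-account-email> \
  --member="serviceAccount:<anyscale-operator-service-account-email>" \
  --role="roles/iam.serviceAccountTokenCreator"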
Azure
We recommend following these references to grant Anyscale workloads access to Azure Storage:
Deployment
Add the Anyscale Helm chart repository.
helm repo add anyscale https://anyscale.github.io/helm-charts
helm repo update anyscale
Before registering and deploying the Anyscale operator, review the ways to customize the Helm chart to modify the deployment.
Then, sign in to your Anyscale account using anyscale login, and proceed with the following steps:
Cloud-native, AWS
anyscale cloud register --name <cloud-name> \
--provider aws \
--region <region> \
--compute-stack k8s \
--kubernetes-zones <comma-separated-zones> \
--anyscale-operator-iam-identity <anyscale-operator-iam-role-arn> \
--cloud-storage-bucket-name s3://<cloud-storage-bucket-name> \
--file-storage-id <efs-id>
helm upgrade <release-name> anyscale/anyscale-operator \
--set-string cloudDeploymentId=<cloud-deployment-id> \
--set-string cloudProvider=aws \
--set-string region=<region> \
--set-string workloadServiceAccountName=anyscale-operator \
--namespace <namespace> \
--create-namespace \
-i
Cloud-native, GCP
anyscale cloud register --name <cloud-name> \
--provider gcp \
--region <region> \
--compute-stack k8s \
--kubernetes-zones <comma-separated-zones> \
--anyscale-operator-iam-identity <anyscale-operator-service-account-email> \
--cloud-storage-bucket-name gs://<cloud-storage-bucket-name> \
--project-id <project-id> \ # (Optional) only required if using Filestore NFS mounts
--vpc-name <vpc-name> \ # (Optional) used to discover Filestore NFS mount targets
--file-storage-id <filestore-instance-id> \ # (Optional) the Filestore Instance ID
--filestore-location <filestore-location> # (Optional) the Filestore location
helm upgrade <release-name> anyscale/anyscale-operator \
--set-string cloudDeploymentId=<cloud-deployment-id> \
--set-string cloudProvider=gcp \
--set-string region=<region> \
--set-string operatorIamIdentity=<anyscale-operator-service-account-email> \
--set-string workloadServiceAccountName=anyscale-operator \
--namespace <namespace> \
--create-namespace \
-i
gcloud iam service-accounts add-iam-policy-binding <anyscale-operator-service-account-email> \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:<project-id>.svc.id.goog[<namespace>/anyscale-operator]"
Cloud-agnostic
# Acquire an ANYSCALE_CLI_TOKEN from the Anyscale console, and set it as an environment variable.
export ANYSCALE_CLI_TOKEN=<cli-token>
anyscale cloud register --name <cloud-name> \
--provider generic \
--compute-stack k8s \
--region <region> \
--cloud-storage-bucket-name <(s3://, gs://, or azure://cloud-storage-bucket-name)> \
--cloud-storage-bucket-endpoint <(https://object.lga1.coreweave.com/ or https://<storage-account-name>.blob.core.windows.net, for example)> \
--nfs-mount-target <(passed to the "server" attr. of the NFS volume spec)> \
--nfs-mount-path <(passed to the "path" attr. of the NFS volume spec)>
helm upgrade <release-name> anyscale/anyscale-operator \
--set-string cloudDeploymentId=<cloudDeploymentId> \
--set-string cloudProvider=generic \
--set-string anyscaleCliToken=$ANYSCALE_CLI_TOKEN \
--namespace <namespace> \
--create-namespace \
-i
The helm upgrade command requires a cloud deployment ID, which is emitted when you register the cloud. If you forget your cloud deployment ID, you can retrieve it with anyscale cloud config get --name <cloud-name>.
At this point, the Anyscale operator should come up and start posting health checks to the Anyscale Control Plane. You should be ready to run workloads as you normally would on Anyscale clouds.
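To confirm that the operator is running before submitting a job, a quick check might look like the following; the Deployment name is illustrative and depends on your Helm release name.
kubectl get pods -n <namespace>
# Deployment name is an assumption; substitute the name shown by the previous command.
kubectl logs -n <namespace> deployment/<release-name>-anyscale-operator --tail=50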
Try to submit a job to verify the Anyscale operator installation:
anyscale job submit --cloud <cloud-name> --working-dir https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip -- python hello_world.py
Uninstall the Anyscale operator
To uninstall the Anyscale operator, run the following commands:
helm uninstall <release-name> -n <namespace>
kubectl delete namespace <namespace>
To delete the cloud, run the following command:
anyscale cloud delete --name <cloud-name>
Known limitations
Cloud deployments on Kubernetes do not support:
- Running multiple applications with different container images in a single service. Multi-application services are only supported when all applications share the same container image.
- Attaching machines from machine pools
- Container & instance-level optimizations for accelerated cluster startup (fast model loading is still supported)
- Workspaces SSH / Local VS Code