Anyscale on Kubernetes

This page provides an overview of the Anyscale operator for Kubernetes and the requirements for deploying it. For step-by-step deployment instructions, use the guides in the table below.

Anyscale supports deployment on Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), CoreWeave Kubernetes Service (CKS), Nebius AI Cloud Managed Service for Kubernetes, Oracle Kubernetes Engine (OKE), and other Kubernetes clusters running in the cloud or on-premises.

Deployment guides

Choose the deployment guide for your Kubernetes environment. Each guide covers the full deployment path from prerequisite configuration through operator installation and verification.

| Kubernetes environment | Automated setup | Manual register |
| --- | --- | --- |
| Amazon EKS | Deploy on EKS (cloud setup) | Deploy on EKS (cloud register) |
| Google Kubernetes Engine | Deploy on GKE (cloud setup) | Deploy on GKE (cloud register) |
| Azure Kubernetes Service | Deploy on AKS (cloud setup) | Deploy on AKS (cloud register) |
| CoreWeave | | Deploy on CoreWeave |
| Nebius | | Deploy on Nebius |
| Other (non-managed Kubernetes) | | Deploy on non-managed Kubernetes |

Automated setup (anyscale cloud setup) creates cloud resources and installs the operator in a single CLI command. It's in beta and recommended for self-service onboarding on EKS, GKE, and AKS. All features available on each platform work whether you use automated setup or manual register.

Manual register (anyscale cloud register) gives you full control. You provision and configure all resources yourself, then register the cloud with Anyscale.

Ingress and gateway controllers

Anyscale on Kubernetes requires an ingress or gateway controller for external traffic routing. Anyscale recommends Envoy Gateway for all new deployments. Traditional Ingress controllers remain available for teams that can't adopt Gateway API.

Note: ingress-nginx is end of life. Don't use it for new deployments.

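| Controller | EKS | GKE | AKS | CoreWeave | Nebius |
| --- | --- | --- | --- | --- | --- |
| Envoy Gateway | Validated | Validated | Validated | Validated | Validated |
| Traefik | Likely | Likely | Likely | ? | ? |
| HAProxy | Likely | Likely | Likely | ? | ? |
| Istio | Complex | Complex | Complex | ? | ? |
| ingress-nginx | Deprecated | Deprecated | Deprecated | Deprecated | Deprecated |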

Status values:

  • Validated: Anyscale has tested this controller and published setup instructions.
  • Likely: Anyscale engineering expects this to work with standard Kubernetes configuration. Not yet validated end-to-end by Anyscale.
  • Complex: Works but requires substantial custom configuration not covered in Anyscale documentation.
  • Deprecated: End of life. Don't use for new deployments.
  • ?: No validation data available.
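
Before choosing a controller, it can help to see which routing classes a cluster already exposes. The following is a minimal sketch, not an Anyscale-provided tool: the function name list_routing_classes is hypothetical, and it assumes kubectl is configured for the target cluster. GatewayClass resources exist only when the Gateway API CRDs are installed, so the first lookup can fail on clusters that use only Ingress.

```shell
# Sketch: list the GatewayClass and IngressClass resources in the current
# kubectl context. The function name is hypothetical.
list_routing_classes() {
  echo "GatewayClass resources (Gateway API):"
  # Fails when the Gateway API CRDs aren't installed in the cluster.
  kubectl get gatewayclass || echo "  could not list GatewayClass resources"
  echo "IngressClass resources (Ingress API):"
  kubectl get ingressclass || echo "  could not list IngressClass resources"
}
```

Run list_routing_classes after selecting the cluster context. The output shows only which classes exist; it doesn't indicate whether Anyscale has validated the controller behind them.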

What is the Anyscale operator for Kubernetes?

The Anyscale operator for Kubernetes manages the relationship between the Anyscale control plane and your Kubernetes cluster. When you deploy Anyscale on Kubernetes, you configure a control plane role, networking, and security so that the Anyscale control plane can manage resources in your Kubernetes cluster through the operator.

You install the Anyscale operator in your Kubernetes cluster, then interact with the Anyscale control plane to configure workspaces, jobs, and services. The control plane sends instructions to the operator, which deploys Ray nodes as pods in your Kubernetes cluster.

The following diagram provides a high-level overview of the architecture of Anyscale on Kubernetes:

[Diagram: the Anyscale control plane communicates with the Anyscale operator inside a Kubernetes cluster, and the operator manages Ray node pods.]

Namespaced resources used by the Anyscale operator

The Anyscale operator uses the following namespaced resources in your Kubernetes cluster:

  • Pods: Each Anyscale node maps to a single pod.
  • Services, Ingresses, and HTTPRoutes: Used for head node connectivity and for exposing Anyscale services. Ingresses are used with Ingress controllers. HTTPRoutes are used with Gateway API.
  • Secrets: Used to store sensitive values that the Anyscale operator needs.
  • ConfigMaps: Used to store configuration options for the Anyscale operator.
  • Events: Used to enhance workload observability.

Global resources used by the Anyscale operator

The Anyscale operator uses the following global resources in your Kubernetes cluster:

  • TokenReview: When an Anyscale node starts, Anyscale uses the Kubernetes TokenReview API to verify the pod's identity as the pod bootstraps itself to the Anyscale control plane.
  • Nodes: The operator periodically reads node information to enhance workload observability.

Installing the Helm chart for the Anyscale operator requires permissions to create cluster roles and cluster role bindings, which grant the Anyscale operator the permissions it needs to use these global resources. If you don't have these permissions, deploy Anyscale inside a vCluster in a namespace of your choice.
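
To find out whether your current identity can create these cluster-scoped objects, you can ask the Kubernetes API server with kubectl auth can-i. The following is a minimal sketch under stated assumptions: the function name can_install_operator_rbac is hypothetical, and the check covers only the two permissions named above, not everything the Helm chart might need.

```shell
# Sketch: report whether the current kubectl identity can create the
# cluster-scoped RBAC objects named above. The function name is hypothetical.
can_install_operator_rbac() {
  for resource in clusterroles clusterrolebindings; do
    # "kubectl auth can-i" prints "yes" when the action is allowed.
    answer="$(kubectl auth can-i create "$resource" 2>/dev/null)"
    if [ "$answer" != "yes" ]; then
      echo "missing permission: create $resource"
      return 1
    fi
  done
  echo "ok: can create cluster roles and cluster role bindings"
}
```

If the function reports a missing permission, ask a cluster administrator to install the chart or consider the vCluster option.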

Features missing from Anyscale on Kubernetes

Most Anyscale features have full support for Kubernetes deployments, with the following exceptions:

  • Some optimization features for accelerated cluster startup aren't available.
  • For zero downtime upgrades to Anyscale services, you must use an ingress controller that supports patching routing weights.
  • Head node fault tolerance requires you to provision your own Redis-compatible cluster. Anyscale doesn't auto-provision external storage for Kubernetes deployments. See Clouds created with cloud register.

Note: If you don't have access to desired Anyscale features, your admin might have deployed the Anyscale operator without the required resources, networking, or permissions. Contact Anyscale support to troubleshoot your deployment.

Object storage and IAM for Kubernetes deployments

All Anyscale deployments on Kubernetes require access to a cloud object storage location to persist production artifacts, including cluster logs, workspace snapshots, workload checkpoints, and cached container images. All pods, the Anyscale operator, and the Anyscale control plane must have permissions to read and write files to this storage location.

If you use a managed Kubernetes service, configure the default object storage location using resources in the same account. See Requirements for Anyscale on managed Kubernetes services.

If you're deploying to a custom Kubernetes cluster, such as an on-premises cluster, you can choose an object storage location in any cloud provider. See the following docs for details:

Requirements for Anyscale on managed Kubernetes services

You can deploy the Anyscale operator to Kubernetes services managed by AWS, Azure, or Google Cloud.

You must configure IAM permissions and a storage location in your cloud provider account.

Amazon Elastic Kubernetes Service (EKS)

Default storage location: An S3 bucket, ideally in the same region and account for simplified setup and reduced ingress and egress costs.

IAM requirements:

  • An IAM role that the Anyscale operator and control plane can use to manage infrastructure in EKS and connect to your S3 bucket.
  • An instance profile that nodes can assume to grant access to your S3 bucket.

For detailed IAM configuration, see IAM guide for EKS. Also see the AWS documentation "Learn how EKS Pod Identity grants pods access to AWS services".

Google Kubernetes Engine (GKE)

Default storage location: A Google Cloud Storage (GCS) bucket, ideally in the same region and project for simplified setup and reduced ingress and egress costs.

IAM requirements:

  • A service account that the Anyscale operator and control plane can use to manage infrastructure in GKE and connect to your GCS bucket.
  • The Anyscale operator must have the Service Account Token Creator role to generate presigned URLs for objects in the storage bucket.
  • A service account that nodes can assume to grant access to your GCS bucket.

For detailed IAM configuration, see IAM guide for GKE. Also see the Google Cloud documentation "About Workload Identity Federation for GKE".

Azure Kubernetes Service (AKS)

Default storage location: A blob storage container, ideally in the same region and account for simplified setup and reduced ingress and egress costs.

Requirements for Anyscale on other Kubernetes clusters

Anyscale supports deploying the operator to most Kubernetes clusters, including on-premises.

To get full access to Anyscale platform features, you must configure a default storage account and IAM permissions in AWS, Azure, or Google Cloud alongside your custom or on-premises Kubernetes cluster. See Object storage and IAM for Kubernetes deployments.

General requirements

Consider the following requirements and recommendations when deploying Anyscale on Kubernetes:

  • Use a Kubernetes cluster v1.30 or later when possible.
  • Grant Anyscale permissions to deploy a Helm chart into the Kubernetes cluster.
  • Provide a Kubernetes service account with permissions to operate core Kubernetes resources.
  • Identify the target Kubernetes namespace for the Anyscale operator.
  • Install an ingress or gateway controller for external traffic routing. Anyscale recommends Envoy Gateway for all new deployments. See Ingress and gateway controllers for supported controllers.
  • Configure egress to the internet from Anyscale pods deployed into the Kubernetes cluster.
  • If using GPUs with EKS or AKS, configure the k8s-device-plugin. This isn't required for GKE.
  • Configure your load balancer and networking rules. For direct networking, configure an internet-facing load balancer that allows access to the head pod on port 443. For custom networking, configure an internal load balancer.
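
The version recommendation at the top of this list can be checked with a small helper. The following is a minimal sketch, assuming the version string comes from the serverVersion.gitVersion field that kubectl version -o json reports; the function name meets_recommended_version is hypothetical.

```shell
# Sketch: succeed when a Kubernetes version string (for example "v1.31.2")
# is at least v1.30. The function name is hypothetical.
meets_recommended_version() {
  version="${1#v}"           # drop the leading "v"
  major="${version%%.*}"     # text before the first dot
  rest="${version#*.}"       # text after the first dot
  minor="${rest%%.*}"        # text before the next dot, if any
  minor="${minor%%[!0-9]*}"  # drop any non-numeric suffix such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 30 ]; }
}
```

For example, meets_recommended_version v1.29.4 fails and meets_recommended_version v1.30.0 succeeds. Provider-specific suffixes after the patch version don't affect the comparison because only the major and minor numbers are read.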

Uninstall the Anyscale operator

To uninstall the Anyscale operator, run the following commands against your Kubernetes cluster:

helm uninstall <release-name> -n <namespace>
kubectl delete namespace <namespace>
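
To confirm that the uninstall finished, you can check that neither the Helm release nor the namespace still exists. The following is a minimal sketch: the function name operator_removed is hypothetical, and it assumes helm and kubectl point at the same cluster used for the uninstall. Namespace deletion can take some time, so a namespace that's still terminating is reported as present.

```shell
# Sketch: verify that the operator's Helm release and namespace are gone.
# The function name is hypothetical. Usage: operator_removed RELEASE NAMESPACE
operator_removed() {
  release="$1"
  namespace="$2"
  # "helm status" succeeds only when the release still exists.
  if helm status "$release" -n "$namespace" >/dev/null 2>&1; then
    echo "release $release still exists in namespace $namespace"
    return 1
  fi
  # "kubectl get namespace" succeeds while the namespace exists or is terminating.
  if kubectl get namespace "$namespace" >/dev/null 2>&1; then
    echo "namespace $namespace still exists"
    return 1
  fi
  echo "operator release and namespace are gone"
}
```

Call the function with the same release name and namespace you passed to helm uninstall.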

To delete the Anyscale cloud, run the following command using the Anyscale CLI:

anyscale cloud delete --name <cloud-name>