Configure the Helm chart for the Anyscale operator
Configure the Helm chart for the Anyscale operator
This page provides an overview of configuring Helm chart values to control how Helm installs the Anyscale operator on your Kubernetes cluster.
You must configure Helm chart parameters when deploying an Anyscale cloud to Kubernetes for the first time. You can also update settings and apply changes to an existing Anyscale cloud. See Deploy Anyscale on Kubernetes.
This page describes the Helm chart parameters introduced in Anyscale operator version 1.0.0.
If you have a deployment of the Anyscale operator configured with an earlier version, see Migrate from legacy Anyscale operator Helm charts.
Configuration workflow overview
Anyscale provides a values.yaml
file with default settings required for the operator to function correctly. Don't modify the Anyscale-provided values file directly. Instead, create your own custom values file (for example, my-custom-values.yaml
) to configure settings specific to your environment.
When you run helm install
or helm upgrade
with your custom values file, Helm merges your custom settings with Anyscale's defaults. This approach ensures that:
- You receive updated default values when upgrading to new operator versions.
- Your custom configurations persist across upgrades.
- You can version-control your configurations separately from Anyscale's defaults.
Anyscale requires default or Anyscale-provided values for some Helm chart parameters. These required values might change between operator versions.
For assistance configuring your Anyscale operator, contact Anyscale support.
See the following resources for more details:
- The Anyscale operator Helm chart GitHub repository includes a
values.yaml
file with parameters and docstrings. - For parameter descriptions and default values, see Kubernetes Helm configuration reference.
Apply custom configurations for your Anyscale operator
Complete the following steps to configure and deploy the Anyscale operator:
Specific configuration requirements and recommendations might differ based on how you've configured your Kubernetes cluster.
Some configurations might require updates to settings or IAM permissions in your cloud provider account.
To successfully customize the Anyscale operator, you should be familiar with your existing Kubernetes environment and have admin permissions in your cloud provider account.
-
Create a custom values file (for example,
my-custom-values.yaml
) with your configuration:# Required: Global configuration
global:
cloudDeploymentId: "cldrsrc_abcdefgh12345678ijklmnop12"
cloudProvider: "aws"
aws:
region: "us-west-2"
auth:
iamIdentity: "arn:aws:iam::123456789012:role/anyscale-operator-role"
# Optional: Add custom instance types
workloads:
instanceTypes:
additional:
16CPU-64GB-2xA100:
resources:
CPU: 16
GPU: 2
memory: 64Gi
accelerators:
- A100-40G
nodeSelector:
custom-node-selector: value
tolerations:
- key: "custom-taint"
operator: "Exists"
effect: "NoSchedule"
# Optional: Enable Karpenter support
enableKarpenterSupport: true -
For initial installation, run the following command:
helm install anyscale-operator anyscale/anyscale-operator \
-n <namespace> \
-f my-custom-values.yaml -
To upgrade to a new operator version with your existing configuration:
helm repo update
helm upgrade anyscale-operator anyscale/anyscale-operator \
-n <namespace> \
-f my-custom-values.yaml
Don't use --reuse-values
alone when upgrading.
The --reuse-values
flag uses old chart default values and doesn't pick up important updates from Anyscale (such as IngressClass collision prevention). Always use the -f
flag with your custom values file, which automatically merges with the latest chart defaults.
If you previously used --set
flags and need to preserve those values while getting new defaults, use --reset-then-reuse-values
instead of --reuse-values
.
Configure custom instance types
When running on Kubernetes, an Anyscale instance type maps to a Pod shape with specific CPU, memory, and accelerator resources. You define instance types through the Helm chart to make them available in the Anyscale console and for compute configs.
Anyscale recommends using the workloads.instanceTypes.additional
parameter to define custom resources. See Add custom instance types.
You can disable defaults entirely with workloads.instanceTypes.enableDefaults: false
. You can also override the workloads.instanceTypes.defaults
parameter but this can make troubleshooting with Anyscale support more difficult.
For complete parameter details, see Instance types.
How to size instance type for Anyscale on Kubernetes
You must correctly size your pod specs to ensure they respect CPU and memory reservations required by Anyscale and Kubernetes.
When the Anyscale operator applies a Pod spec to Kubernetes for an Anyscale workload, the operator uses the shapes defined in the Instance Type ConfigMap as an upper bound for the sum of all of the memory requests and limits across all containers in the pod. Anyscale reserves some memory and CPU for critical-path Anyscale sidecar containers and provides the rest to the Ray container to run the primary workload.
Kubernetes reserves a portion of CPU and memory for running the Kubelet and other Kubernetes system components. To accommodate for CPU usage by Kubernetes and Anyscale, the pod shapes defined in the ConfigMap must be smaller than the actual node shape. See Reserve Compute Resources for System Daemons in the Kubernetes documentation for more details.
As an example, an AWS m5.4xlarge
virtual machine has 16 vCPUs and 64 GiB memory. Anyscale recommends configuring this instance as a 14CPU-56GB
pod. Attempting to define the pod as 16CPU-64GB
might make the pod unschedulable.
Add custom instance types
Anyscale recommends using the workloads.instanceTypes.additional
parameter to define all your custom instance types. This keeps your configurations separate from Anyscale's defaults.
Instance type names can include alphanumeric characters, dashes, and underscores. The Anyscale console updates with new instance types approximately every 30 seconds.
The following example shows a configuration for an A100 for GKE:
workloads:
instanceTypes:
additional:
16CPU-64GB-2xA100:
resources:
CPU: 16
GPU: 2
memory: 64Gi
accelerators:
- A100-40G
nodeSelector:
cloud.google.com/gke-accelerator: nvidia-tesla-a100
tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
The accelerators
list must contain Ray-supported accelerators.
Accelerators must map to the workloads.accelerator.nodeSelectors
values for your cloud provider. Not all accelerators are available in all regions.
See Instance types for a list of supported accelerators by cloud and more details.
For an example configuring support for TPUs on GKE, see Leverage Cloud TPUs on GKE.
Anyscale default instance types for Kubernetes
Anyscale recommends against modifying default instance types. Use the workloads.instanceTypes.additional
field to configure your own instance types.
Anyscale defines the following instance types by default:
workloads:
instanceTypes:
enableDefaults: true # Set to false to disable all defaults
defaults:
2CPU-8GB:
resources:
CPU: 2
memory: 8Gi
4CPU-16GB:
resources:
CPU: 4
memory: 16Gi
8CPU-32GB:
resources:
CPU: 8
memory: 32Gi
8CPU-32GB-1xT4:
resources:
CPU: 8
GPU: 1
memory: 32Gi
accelerators:
- T4
Set up Karpenter autoscaling
Karpenter is an open source node provisioning project built for Kubernetes on AWS. When you enable Karpenter support, the Anyscale operator configures appropriate node selectors and scheduling parameters for proper pod placement with Karpenter-managed nodes.
To enable Karpenter support:
-
Add the Karpenter flag to your custom values file:
workloads:
enableKarpenterSupport: true # Default is false. -
Configure taints and labels on your node groups following the Karpenter documentation.
-
Apply the configuration using
helm upgrade
with your custom values file as shown in Apply custom configurations for your Anyscale operator.
For complete parameter details, see workloads.enableKarpenterSupport in the Helm reference.
Consider using the Anyscale Terraform provider for Kubernetes to automate Karpenter configuration.
Configure high availability with PodDisruptionBudgets
PodDisruptionBudgets (PDBs) prevent cluster maintenance from evicting your Ray head nodes. This ensures your workloads remain stable but requires special procedures for cluster upgrades.
To enable PDB protection, set the following parameter:
workloads:
enableAnyscaleRayHeadNodePDB: true # Default is true.
To prevent eviction of head nodes for running Ray clusters, PDBs block rolling Kubernetes cluster upgrades.
With PDBs enabled, you must pause or terminate all running Anyscale services, jobs, and workspaces to upgrade your Kubernetes cluster.
Configure networking
You must allow ingress to your Kubernetes cluster to support features such as dashboards and Anyscale services.
Anyscale recommends using ingress for most deployments. Anyscale also supports Istio and Kubernetes gateway custom resource definitions. Use a gateway if you already have gateway infrastructure in place or other requirements mandate gateway usage. See Networking configuration.
By default, the Anyscale operator configures ingress automatically. To manually specify the DNS address, set the following:
networking:
ingress:
address: "<IP_ADDRESS_OR_HOSTNAME>"
Anyscale uses this address for the following:
- User access to the Ray Dashboard through the Anyscale console.
- DNS resolution for Anyscale services.
- Service endpoint connectivity.
Configure operator resources
You can customize CPU and memory allocations for the operator and Vector telemetry sidecar. The default values work for most deployments.
operator:
container:
resources:
requests:
cpu: 1
memory: 512Mi
limits:
memory: 2Gi
vector:
resources:
requests:
cpu: 100m
memory: 512Mi
limits:
memory: 512Mi
For all resource parameters, see Operator configuration in the Helm reference.
Apply custom patches
The Patch API lets you customize Anyscale-managed resources for your specific Kubernetes environment. Use patches to handle variations in spot instances, accelerators, or other cluster-specific requirements.
Patches use JSON Patch syntax (IETF RFC 6902).
Example: Add node selector for on-demand instances
patches:
- kind: Pod
# See: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
selector: "anyscale.com/market-type in (ON_DEMAND)"
# See: https://jsonpatch.com/
patch:
- op: add
path: /spec/nodeSelector/eks.amazonaws.com~1capacityType # use ~1 to escape the forward-slash
value: "ON_DEMAND"
This patch adds the eks.amazonaws.com/capacityType
node selector to all on-demand pods. The operator applies patches to resources that match the Kubernetes selector.
View all annotations provided by Anyscale that you can use for custom patches
The Anyscale control plane applies these annotations on resources created by Anyscale.
Label Name | Possible Label Values | Description |
---|---|---|
anyscale.com/market-type | SPOT, ON_DEMAND | Users with workloads that support preemption may opt to run their workloads on spot node types through the compute config. All other workloads are run on on-demand node types. This should most likely be transformed into a node affinity. |
anyscale.com/zone | user-defined through cloud setup | For Pods that have a specific zone affinity, the Anyscale operator sets this label to the zone that the Pod should be launched into (us-west-2a , for example). Zones are provided as []string at cloud registration time and can be selected from the Anyscale UI. This should most likely be transformed into a node affinity. |
anyscale.com/accelerator-type | user-defined through instance type configuration | When requesting a GPU Pod, the Anyscale operator sets one of the following values: Anyscale accelerator types. |
anyscale.com/instance-type | user-defined through instance type configuration | The operator sets this value for all Pods created through Anyscale. |
anyscale.com/canary-weight anyscale.com/canary-exists anyscale.com/canary-svc anyscale.com/ingress-type anyscale.com/bearer-token anyscale.com/primary-weight anyscale.com/primary-svc | various | For advanced use only (when using an ingress other than NGINX for inference / serving workloads with Anyscale services). Contact Anyscale for more details. |
View a sample of common advanced configuration options
{
"metadata": {
// Add a new label.
"labels": {"new-label": "example-value"},
// Add a new annotation.
"annotations": {"new-annotation": "example-value"}
},
"spec": {
// Add a node selector.
"nodeSelector": {"disktype": "ssd"},
"tolerations": [{
"effect": "NoSchedule",
"key": "dedicated",
"value": "example-anyscale"
}]
"containers": [{
// Add a PersistentVolumeClaim to the Ray container.
"name": "ray",
"volumeMounts": [{
"name": "pvc-volume",
"mountPath": "/mnt/pvc-data"
}]
},{
// Add a sidecar for exporting logs and metrics.
"name": "monitoring-sidecar",
"image": "timberio/vector:latest",
"ports": [{
"containerPort": 9000
}],
"volumeMounts": [{
"name": "vector-volume",
"mountPath": "/mnt/vector-data"
}]
}],
"volumes": [{
"name": "pvc-volume",
"persistentVolumeClaim": {
"claimName": "my-pvc"
}
},{
"name": "vector-volume",
"emptyDir": {}
}]
}
}