Kubernetes Helm configuration reference

This page provides a complete reference for all configuration parameters available in the Anyscale operator Helm chart.

Important: This page describes the Helm chart parameters introduced in Anyscale operator version 1.0.0.

If you have a deployment of the Anyscale operator configured with an earlier version, see Migrate from legacy Anyscale operator Helm charts.

How to use this reference

Anyscale provides a default values.yaml file with settings required for the operator. Don't modify the Anyscale-provided file. Instead, create your own custom values file with only the parameters you need to configure.

Parameters in this reference fall into three categories:

  1. Required user configuration: Parameters you must set in your custom values file.
  2. Optional user configuration: Parameters you can optionally set to customize behavior.
  3. Anyscale-managed defaults: Parameters with default values that you usually shouldn't change. You can override these if needed, but contact Anyscale support for guidance.

See Configure the Helm chart for the Anyscale operator for workflow examples and Deploy Anyscale on Kubernetes for deployment procedures.
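
As a rough illustration of how this reference maps onto a custom values file: each dotted parameter name corresponds to nested YAML keys. The sketch below sets only two parameters from this page; the values are placeholders, not recommendations.

```yaml
# Dotted names in this reference become nested keys in your custom values file.
#   global.aws.region  ->  global: / aws: / region:
#   operator.replicas  ->  operator: / replicas:
global:
  aws:
    region: "us-west-2"   # placeholder region
operator:
  replicas: 1             # documented default
```

Parameters you leave out keep the defaults from the Anyscale-provided values.yaml.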

Required user configuration

You must set the following parameters in your custom values file for the Anyscale operator to function:

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| global.cloudDeploymentId | string | Anyscale cloud deployment ID created when you register the cloud. Retrieve it with the command: anyscale cloud config get --name <cloud_name> | cldrsrc_abcdefgh12345678ijklmnop12 |
| global.cloudProvider | string | Cloud provider environment. Allowed values: aws (Amazon Web Services), gcp (Google Cloud), azure (Microsoft Azure), generic (other Kubernetes environments). | aws |
| global.aws.region | string | AWS only: AWS region for deployment. Required when using workload identity authentication (when global.auth.anyscaleCliToken isn't provided). | us-west-2 |
| global.auth.anyscaleCliToken | string | Anyscale CLI token for control plane authentication. Required for Azure and generic deployments. Not required for EKS or GKE when using workload identity. | tkn_1234567890abcdef |
| global.auth.iamIdentity | string | Cloud provider IAM identity. AWS: IAM role ARN. Google Cloud: service account email. Azure: managed identity. | AWS: arn:aws:iam::123456789012:role/anyscale-operator-role. Google Cloud: anyscale-operator@project.iam.gserviceaccount.com. Azure: /subscriptions/00000000-1111-2222-3333-444444444444/resourcegroups/anyscale-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/anyscale-operator-identity |
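
A minimal sketch of the required parameters for an Azure deployment, which authenticates with a CLI token rather than workload identity. It reuses the placeholder values from the examples above; substitute your own.

```yaml
global:
  cloudDeploymentId: "cldrsrc_abcdefgh12345678ijklmnop12"
  cloudProvider: "azure"
  auth:
    # Placeholder token; keep real tokens out of version control.
    anyscaleCliToken: "tkn_1234567890abcdef"
    # Placeholder managed identity resource ID.
    iamIdentity: "/subscriptions/00000000-1111-2222-3333-444444444444/resourcegroups/anyscale-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/anyscale-operator-identity"
```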

Optional user configuration

The following sections describe parameters you can optionally set in your custom values file to customize operator behavior.

Instance types

Anyscale uses the term instance types to describe compute resources such as virtual machines. For Kubernetes, you define custom pod resource configurations that match your node pools and accelerators.

| Parameter | Description |
| --- | --- |
| workloads.instanceTypes.enableDefaults | Whether to enable default instance types provided by Anyscale. Set to false to use only custom instance types. Default: true. |
| workloads.instanceTypes.additional | Custom pod resource configurations. Define your own instance types matching your node pools and accelerators. These merge with defaults if enableDefaults is true. |

Example for workloads.instanceTypes.enableDefaults:

workloads:
  instanceTypes:
    enableDefaults: true

Example for workloads.instanceTypes.additional:

workloads:
  instanceTypes:
    additional:
      16CPU-64GB-2xA100:
        resources:
          CPU: 16
          GPU: 2
          memory: 64Gi
        accelerators:
          - A100-40G
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-tesla-a100

For complete guidance on sizing and configuring instance types, see Configure custom instance types.
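
As a sketch of using only custom instance types, the fragment below disables the Anyscale defaults and defines a single CPU-only type. The instance type name 32CPU-128GB and its sizes are hypothetical, and the nesting under the instance type name is assumed from the parameter names on this page; check it against the chart's values.yaml.

```yaml
workloads:
  instanceTypes:
    # Use only the instance types listed under "additional".
    enableDefaults: false
    additional:
      32CPU-128GB:          # hypothetical name; match it to a node pool you actually run
        resources:
          CPU: 32
          memory: 128Gi
```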

Networking configuration

Configure ingress, gateway, and DNS settings for accessing dashboards and services.

Important: Anyscale recommends using ingress for most deployments. See Configure networking.

| Parameter | Description |
| --- | --- |
| networking.ingress.address | DNS address (IP or hostname) for ingress. By default, Anyscale reads this from the Ingress resource status field. Override if needed for your environment. |
| networking.gateway.enabled | Set to true to use a gateway for load balancing instead of an ingress controller. Default: false. |
| networking.gateway.name | Name of the gateway resource when using gateway mode. |
| networking.gateway.ip / networking.gateway.hostname | Gateway address. Provide either the IP or the hostname. |
| networking.gateway.apiVersion | Gateway API version. Supports gateway.networking.k8s.io/v1 (Kubernetes Gateway API) or networking.istio.io/v1alpha3 (Istio). Default: gateway.networking.k8s.io/v1. |
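
If the address in the Ingress resource status isn't the one clients should use, overriding networking.ingress.address might look like the following sketch. The IP address is a placeholder from the documentation range.

```yaml
networking:
  ingress:
    # Placeholder address; replace with the IP or hostname of your ingress.
    address: "203.0.113.10"
```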

High availability

Configure operator redundancy for production deployments.

| Parameter | Description |
| --- | --- |
| operator.replicas | Number of operator replicas. Values greater than 1 enable leader election for high availability. Default: 1. |
| workloads.enableAnyscaleRayHeadNodePDB | Create a PodDisruptionBudget to prevent head node evictions. This can block Kubernetes cluster upgrades. Default: true. See Configure high availability with PodDisruptionBudgets. |
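
Because the head node PodDisruptionBudget can block cluster upgrades, some teams may prefer to turn it off while still running more than one operator replica. A minimal sketch of that trade-off, with an illustrative replica count:

```yaml
operator:
  # More than 1 replica enables leader election.
  replicas: 2
workloads:
  # Disabling the PDB allows head node pods to be evicted during node drains.
  enableAnyscaleRayHeadNodePDB: false
```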

Workload features

Configuration options for workload behavior and scheduling.

| Parameter | Description |
| --- | --- |
| workloads.enableKarpenterSupport | Enable support for Karpenter autoscaler on AWS. Configures appropriate node selectors and scheduling parameters. Default: false. |
| workloads.enableZoneSelector | Enable zone-based node selection using topology.kubernetes.io/zone nodeSelector. Disabled by default because many cluster autoscalers don't respect this selector. Default: false. |
| workloads.serviceAccount.name | Service account assigned to Anyscale workload pods. Leave empty to use the default service account. |
| workloads.serviceAccount.iamMappingAnnotation | Annotation key used to identify pods that use IAM mapping. Default: anyscale.com/iam-mapping. |
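
A sketch of enabling Karpenter support on AWS together with a dedicated workload service account. The service account name anyscale-workloads is hypothetical and is assumed to already exist in the workload namespace.

```yaml
workloads:
  enableKarpenterSupport: true
  serviceAccount:
    name: "anyscale-workloads"   # hypothetical; leave empty to use the default service account
```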

Advanced features

Additional configuration options for specialized deployments.

| Parameter | Description |
| --- | --- |
| patches | Custom JSON patches applied to pods and other Kubernetes resources. Use for cluster-specific customization. See Apply custom patches. |
| operator.nodeSelector / operator.affinity / operator.tolerations | Control operator pod placement. Useful for dedicating nodes to the operator. |
| global.aws.s3.usePathStyle | Use path-style S3 URLs instead of virtual-hosted-style. Enable for S3-compatible storage such as MinIO. Default: false. |
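
For S3-compatible storage such as MinIO, enabling path-style URLs is a one-key change. This fragment shows only that key and assumes the rest of your storage configuration is already in place.

```yaml
global:
  aws:
    s3:
      # Use path-style URLs instead of virtual-hosted-style URLs.
      usePathStyle: true
```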

Resource overrides

Override default CPU and memory allocations if your environment requires different values.

| Parameter | Description |
| --- | --- |
| operator.container.resources | CPU and memory requests/limits for the operator container. See Configure operator resources for examples. |
| operator.vector.resources | CPU and memory requests/limits for the Vector sidecar container. See Configure operator resources for examples. |
| operator.config.kubernetesClient.rateLimiter.qps | Queries-per-second (QPS) rate limit for requests to the Kubernetes API server. Increase for large workloads. Default: 1000. |
| operator.config.kubernetesClient.rateLimiter.burst | Burst rate limit for requests to the Kubernetes API server. Increase for large workloads. Default: 2000. |
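
A sketch of overriding the Vector sidecar resources and raising the Kubernetes client rate limits for a large deployment. All numbers are illustrative, and the requests/limits layout is assumed to follow the standard Kubernetes container resources format.

```yaml
operator:
  vector:
    resources:
      requests:
        cpu: 100m        # illustrative value
        memory: 256Mi    # illustrative value
      limits:
        memory: 512Mi    # illustrative value
  config:
    kubernetesClient:
      rateLimiter:
        qps: 2000        # default is 1000
        burst: 4000      # default is 2000
```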

Anyscale-managed defaults

The following parameters have default values managed by Anyscale. You can override these if needed, but most deployments should use the defaults. Contact Anyscale support before modifying these values.

Operator configuration

Anyscale provides default operator configuration values.

| Parameter | Description | Default |
| --- | --- | --- |
| operator.container.image.registry | Docker registry for the operator image. | us-docker.pkg.dev |
| operator.container.image.image | Docker image path for the Anyscale operator. | anyscale-artifacts/public/kubernetes_manager |
| operator.container.image.tag | Docker image tag for the Anyscale operator. Each operator version requires a specific image tag. | Version-specific (example: ci-fe4d6d5d24deb02caa5ea9b60c8e837ac1ca9e05) |
| operator.vector.image.registry | Docker registry for the Vector sidecar. Empty string means docker.io. | "" |
| operator.vector.image.image | Docker image path for the Vector telemetry sidecar that forwards logs and metrics to the Anyscale control plane. | timberio/vector |
| operator.vector.image.tag | Docker image tag for the Vector sidecar. | 0.40.0-debian |
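
If your cluster can only pull from a private registry, one possible approach is to mirror both images and override only the registry keys. This is a sketch: registry.example.com is a hypothetical hostname, and it assumes the images are mirrored under the same image paths and tags. As noted above, contact Anyscale support before changing these values.

```yaml
operator:
  container:
    image:
      registry: "registry.example.com"   # hypothetical private mirror
  vector:
    image:
      registry: "registry.example.com"   # hypothetical private mirror
```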

Default instance types

Anyscale provides default pod resource configurations. Use workloads.instanceTypes.additional to create custom instance types instead of modifying these.

| Parameter | Description | Default configurations |
| --- | --- | --- |
| workloads.instanceTypes.defaults | Default pod resource configurations provided by Anyscale. | 2CPU-8GB (2 CPUs, 8GB memory); 4CPU-16GB (4 CPUs, 16GB memory); 8CPU-32GB (8 CPUs, 32GB memory); 8CPU-32GB-1xT4 (8 CPUs, 32GB memory, 1 T4 GPU) |

Accelerator mappings

Maps Ray accelerator types to cloud-specific node labels for scheduling.

| Cloud provider | Supported accelerators |
| --- | --- |
| AWS | V100, T4, L4, A10G, L40S, A100-40G, A100-80G, H100 |
| Azure | T4, A10, A100, H100 |
| Google Cloud | T4, L4, A100-40G, A100-80G, H100, H100-MEGA |

Control accelerator behavior with these parameters:

  • workloads.accelerator.enableDefaults: Enable default accelerator mappings. Default: true.
  • workloads.accelerator.customNodeSelectorKey: Custom node selector key instead of cloud provider defaults.
  • workloads.accelerator.nodeSelectors: Accelerator type to node selector value mappings per cloud provider.
  • workloads.accelerator.tolerations.default: Tolerations applied to all GPU/accelerator workloads.
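
As a sketch of how the scalar and toleration parameters above might be set: the label key example.com/gpu-type is hypothetical, and the toleration entries are assumed to use the standard Kubernetes toleration fields. The structure of workloads.accelerator.nodeSelectors isn't shown because this page doesn't describe it.

```yaml
workloads:
  accelerator:
    enableDefaults: true
    # Hypothetical label key used instead of the cloud provider default.
    customNodeSelectorKey: "example.com/gpu-type"
    tolerations:
      # Assumed to be a list of standard Kubernetes tolerations.
      default:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
```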

Market type configuration

The operator includes default market type configurations using JSON patches. These configurations handle spot and on-demand instance scheduling based on your cloud provider.

Control market type behavior with these parameters:

  • workloads.marketType.enableDefaults: Enable default market type patches. Default: true.
  • workloads.marketType.generic: Patches for any cloud provider.
  • workloads.marketType.aws: AWS-specific patches.
  • workloads.marketType.gcp: Google Cloud-specific patches.
  • workloads.marketType.azure: Azure-specific patches.
  • workloads.marketType.karpenter: Karpenter-specific patches (overrides cloud provider defaults when workloads.enableKarpenterSupport is true).

Other managed defaults

| Parameter | Description | Default |
| --- | --- | --- |
| operator.config.status.reportingEnabled | Enable status reporting to Anyscale control plane. | true |
| operator.config.status.checkInterval | How often the operator checks status. | 5m |
| operator.config.status.reportInterval | How often the operator reports status to the control plane. | 30s |
| operator.config.unscheduledPodReaper.reconcileInterval | How often the operator checks for unscheduled pods. | 1m |
| operator.config.unscheduledPodReaper.terminationThreshold | Time before terminating pods that remain unscheduled. | 10m |
| operator.config.status.excludeComponentVerification | Skip verification of specific components during operator startup. Options: STORAGE_BUCKET, KUBERNETES_VERSION, GATEWAY_RESOURCES, CLOUD_RESOURCES, IAM_IDENTITY, KUBERNETES_PERMISSIONS. | [] (all components verified) |
| operator.name | Name of the operator. Used to identify resources related to the operator in the Kubernetes cluster. | anyscale-operator |
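
A sketch of overriding two of these defaults: skipping startup verification for selected components and giving unscheduled pods more time before termination. The chosen components and the 20m threshold are illustrative only.

```yaml
operator:
  config:
    status:
      # Components to skip during startup verification (default: none skipped).
      excludeComponentVerification:
        - STORAGE_BUCKET
        - GATEWAY_RESOURCES
    unscheduledPodReaper:
      terminationThreshold: 20m   # illustrative; default is 10m
```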

Common configuration examples

The following examples show complete custom values files for common deployment scenarios. Save these as your custom values file (for example, my-custom-values.yaml) in YAML format and use with helm install or helm upgrade.

Basic AWS deployment

Minimal configuration for AWS EKS (save as my-custom-values.yaml):

global:
  cloudDeploymentId: "cldrsrc_abcdefgh12345678ijklmnop12"
  cloudProvider: "aws"
  aws:
    region: "us-west-2"
  auth:
    iamIdentity: "arn:aws:iam::123456789012:role/anyscale-operator-role"

Google Cloud with custom instance types

Google Cloud deployment with additional GPU instance types:

global:
  cloudDeploymentId: "cldrsrc_abcdefgh12345678ijklmnop12"
  cloudProvider: "gcp"
  auth:
    iamIdentity: "anyscale-operator@my-project.iam.gserviceaccount.com"

workloads:
  instanceTypes:
    additional:
      16CPU-64GB-2xA100:
        resources:
          CPU: 16
          GPU: 2
          memory: 64Gi
        accelerators:
          - A100-40G
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-tesla-a100
        tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"

High availability configuration

Production setup with multiple replicas and resource limits:

global:
  cloudDeploymentId: "cldrsrc_abcdefgh12345678ijklmnop12"
  cloudProvider: "aws"
  aws:
    region: "us-west-2"
  auth:
    iamIdentity: "arn:aws:iam::123456789012:role/anyscale-operator-role"

# Enable HA with 3 replicas
operator:
  replicas: 3

  # Increase resources for production workloads
  container:
    resources:
      requests:
        memory: 1Gi
        cpu: 2
      limits:
        memory: 4Gi

  # Dedicated node pool for operator
  nodeSelector:
    node-role.kubernetes.io/control-plane: "true"
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"

Gateway configuration with Istio

Using Istio gateway instead of ingress:

global:
  cloudDeploymentId: "cldrsrc_abcdefgh12345678ijklmnop12"
  cloudProvider: "gcp"
  auth:
    iamIdentity: "anyscale-operator@my-project.iam.gserviceaccount.com"

# Enable gateway mode
networking:
  gateway:
    enabled: true
    name: "anyscale-gateway"
    hostname: "anyscale.example.com"
    apiVersion: "networking.istio.io/v1alpha3"

Source files

The complete Helm chart source is available on GitHub:

Migrate from legacy Anyscale operator Helm charts

If you have an Anyscale operator configured using version 0.9.2 or below, you must migrate your Helm chart values before upgrading to version 1.0.0.

See the Anyscale operator Helm chart migration guide.