Configure machine pools for Kubernetes
This page describes how to configure an Anyscale machine pool for an Anyscale cloud deployed on Amazon EKS or Google GKE. The global resource scheduler manages machines across reserved node groups on your Kubernetes cluster using taints, tolerations, and patches applied by the Anyscale operator.
See Share compute resources with Anyscale machine pools and What is the global resource scheduler?.
Machine pools and the global resource scheduler are in beta release.
Anyscale supports machine pools on Kubernetes only for Anyscale clouds deployed on Amazon EKS or Google GKE. Anyscale doesn't support machine pools on AKS or other Kubernetes distributions.
How do machine pools work on Kubernetes?
On Kubernetes, a machine pool corresponds to one or more node groups that you dedicate to Anyscale workloads managed by the global resource scheduler. You taint those node groups with the machine pool ID so that only pods tolerating the taint can schedule on them.
The Anyscale operator applies an annotation with the machine pool ID to every pod that the global resource scheduler launches. You add a patch to the Anyscale operator that matches this annotation and injects the matching toleration. Anyscale workloads deployed through the global resource scheduler launch pods using the reserved node groups. Pods that request the same instance types outside the global resource scheduler don't receive the toleration and schedule on other available node groups.
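Conceptually, a pod launched through the global resource scheduler ends up with a spec like the following sketch. The pod name is hypothetical, but the annotation key and toleration mirror the operator configuration described later on this page:

# Illustrative sketch of a scheduler-launched pod. The name is a placeholder;
# the annotation and toleration match the machine pool taint configuration.
apiVersion: v1
kind: Pod
metadata:
  name: example-ray-worker    # hypothetical name
  annotations:
    anyscale.com/anyscale-machine-pool-id: mp_123xyz
spec:
  tolerations:
    - key: anyscale-machine-pool-id
      operator: Equal
      value: mp_123xyz
      effect: NoSchedule
  # workload containers omitted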
The steps on this page show one approach to configuring taints and tolerations with operator patches. Your Kubernetes environment might require a different combination of taints, labels, and node selectors.
Requirements
Ensure you have the following before configuring a machine pool on Kubernetes:
- An Anyscale cloud deployed on Amazon EKS or Google GKE with the Anyscale operator installed. See Anyscale on Kubernetes.
- Permissions to taint node groups and modify the Anyscale operator Helm values for your cluster.
- The following roles in Anyscale:
- Organization owner for your Anyscale organization.
- Cloud collaborator or cloud owner for your Anyscale cloud.
- The Anyscale CLI, authenticated to your Anyscale organization. See CLI configuration.
Step 1: Create the machine pool
Run the following command to create an empty machine pool and capture the machine pool ID from the output:
anyscale machine-pool create --name reserved-nodes
The machine pool ID has the format mp_<random-id>. Record this value. You use this ID when tainting node groups and configuring the operator.
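For example, you can capture the ID in a shell variable for use in the later steps and sanity-check its format. The ID below is a placeholder; substitute the value printed by the create command:

```shell
# Placeholder ID; replace with the value printed by `anyscale machine-pool create`.
MACHINE_POOL_ID="mp_123xyz"

# Sanity-check the expected mp_<random-id> format before using the ID
# in taints and operator patches.
if printf '%s' "$MACHINE_POOL_ID" | grep -Eq '^mp_[A-Za-z0-9]+$'; then
  echo "machine pool ID looks valid: $MACHINE_POOL_ID"
fi
```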
Step 2: Attach the machine pool to your cloud
Attach the machine pool to the Anyscale cloud deployed on Kubernetes:
anyscale machine-pool attach --name reserved-nodes --cloud <cloud-name>
Step 3: Taint your reserved node groups
In your Kubernetes cluster, apply a taint to each node group you want to reserve for the global resource scheduler. Use anyscale-machine-pool-id as the taint key and the machine pool ID as the value:
key: anyscale-machine-pool-id
value: mp_123xyz
effect: NoSchedule
The exact mechanism depends on your cloud provider and node group tooling. Apply the taint through your managed node group configuration, Karpenter NodePool, Terraform module, or equivalent.
Anyscale recommends also setting anyscale-machine-pool-id: mp_123xyz as a label on the tainted node groups. This makes the node groups selectable by other tools in your cluster.
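As one concrete example, with eksctl-managed node groups on EKS you could declare the taint and label together in the cluster config file. The cluster name, region, node group name, and instance type below are placeholders for your environment:

# Illustrative eksctl fragment; names, region, and instance type are
# placeholders. The taint and label carry the machine pool ID.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
managedNodeGroups:
  - name: reserved-nodes
    instanceType: m5.2xlarge
    labels:
      anyscale-machine-pool-id: mp_123xyz
    taints:
      - key: anyscale-machine-pool-id
        value: mp_123xyz
        effect: NoSchedule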
Step 4: Define custom instance types (optional)
If your reserved nodes use a shape that isn't in the Anyscale operator's default instance types, define a matching instance type in your Helm values file so that compute configs can request it. See Configure custom instance types.
The following example defines a 7CPU-16GB instance type:
workloads:
  instanceTypes:
    additional:
      7CPU-16GB:
        resources:
          CPU: 7
          memory: 16Gi
Step 5: Add a toleration patch for the machine pool
Add a patch to your Helm values file that injects the toleration for your machine pool taint. The operator applies this patch to every pod it launches through the global resource scheduler because those pods carry the anyscale.com/anyscale-machine-pool-id annotation. See Apply custom patches.
patches:
  - kind: Pod
    selector: "anyscale.com/anyscale-machine-pool-id in (mp_123xyz)"
    patch:
      - op: add
        path: /spec/tolerations/-
        value:
          key: "anyscale-machine-pool-id"
          operator: "Equal"
          value: "mp_123xyz"
          effect: "NoSchedule"
Apply the updated Helm values with helm upgrade. See Apply custom configurations for your Anyscale operator.
Pods that request the same instance types outside the global resource scheduler don't carry this annotation, so the operator doesn't inject the toleration. The taint keeps them off the reserved node groups, and they schedule on other available node groups instead.
Step 6: Apply the machine pool configuration
Create a machine pool configuration that describes the instance types and partitions the global resource scheduler should manage. The following example reserves four machines of type 7CPU-16GB and assigns priorities based on project:
# example-config.yaml
kind: ANYSCALE_MANAGED
machine_types:
  - machine_type: RES_7CPU_16GB
    launch_templates:
      - instance_type: "7CPU-16GB"
        market_type: ON_DEMAND
    partitions:
      - name: project-partition
        size: 4
rules:
  - selector: project in (high-prio)
    priority: 1
  - selector: project in (low-prio)
    priority: 0
    quota: 2
For the full set of configuration options, including rules and selectors, see Machine pool configuration file reference and Scheduling rules.
Apply the configuration:
anyscale machine-pool update --name reserved-nodes --spec-file example-config.yaml
Verify and use the machine pool
Run the following command to confirm the machine pool is ready:
anyscale machine-pool describe --name reserved-nodes
To run workloads against the machine pool, reference the machine type and pool name in your compute config. See Configure an Anyscale-managed machine pool.