Global Resource Scheduler

info

The Global Resource Scheduler is only supported on cloud deployments on VMs (EC2/GCE). For cloud deployments on Kubernetes, we recommend using Kueue to achieve similar functionality.

Introduction

The Global Resource Scheduler intelligently assigns workloads to fixed-capacity resource pools. It maximizes utilization, maintains fairness, and provides queuing for workloads that cannot start immediately for pools of resources like capacity reservations or customer-managed on-premises machines.

Many organizations rely on cloud capacity reservations or on-premises investments to secure hard-to-get hardware (for example, GPUs). Unlike spot or on-demand instances, capacity reservations provide a fixed block of capacity. With this information, Anyscale can make improved scheduling decisions—such as reserving 25% of machines for development and 75% for production—while allowing production workloads to use unused development capacity.
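A split like this can be expressed with the partition and rule syntax covered later in this guide. The sketch below is illustrative only; the partition names, sizes, and cloud names are made up for this example:

```yaml
# Illustrative only: an 8-machine pool split 25% / 75%.
# The lower-priority rule in the dev partition lets production
# workloads borrow development machines that are sitting idle.
partitions:
- name: dev
  size: 2
  rules:
  - selector: cloud in (dev-cloud)
    priority: 20
  - selector: cloud in (prod-cloud)  # prod may borrow idle dev capacity
    priority: 10
- name: prod
  size: 6
  rules:
  - selector: cloud in (prod-cloud)
    priority: 20
```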

Architecture

The Global Resource Scheduler is built on top of Anyscale machine pools. A machine pool defines a fixed-size group of compute resources.

When workloads request machine pool instances, the Global Resource Scheduler evaluates these requests based on user-defined rules. It will then make a scheduling decision, which may involve:

  • Allocating machines from the machine pool to the workload.
  • Evicting machines being used by other workloads to make room for the incoming workload (if it is higher priority).
  • Queuing the workload until machines are available.

The Global Resource Scheduler supports both Anyscale-managed and customer-managed machine pools. For more details, see Anyscale machine pools.

Example workflow

In this workflow, we will create and leverage Anyscale-managed machine pools that are backed by cloud capacity reservations on EC2. Anyscale will dynamically create and terminate cloud instances when necessary to move machines in and out of this machine pool. The Global Resource Scheduler will schedule workloads against this set of machines.

Create a machine pool

Use the command below to create a machine pool for capacity reservations.

anyscale machine-pool create --name capacity-reservations

Attach the machine pool to clouds

Only clouds attached to the machine pool will be able to use the machines it exposes.

anyscale machine-pool attach --name capacity-reservations --cloud dev-cloud
anyscale machine-pool attach --name capacity-reservations --cloud prod-cloud

Configure the machine pool

Define instance types, capacity reservation information, priorities, and quotas in a machine pool configuration. For example:

# machine-pool-capacity-reservations.yaml
kind: ANYSCALE_MANAGED
machine_types:
- machine_type: RES-8CPU-32GB
  recycle_policy:
    # Rotate instances after 10 workloads.
    max_workloads: 10
    # Rotate instances that have been running for
    # more than 60 minutes (note: rotation only
    # happens when workloads are completed).
    rotation_interval: 60m
    # Shut down idle instances after 10 minutes.
    max_idle_duration: 10m
  launch_templates:
  - instance_type: "m5.2xlarge"
    market_type: ON_DEMAND
    advanced_instance_config:
      CapacityReservationSpecification:
        CapacityReservationTarget:
          CapacityReservationId: <your-capacity-reservation-id>
    # 'zones' is optional. If not specified, the scheduler
    # will attempt to provision instances in any zone in the
    # cloud that is requesting this machine pool instance.
    zones:
    - us-west-2a
    - us-west-2b
partitions:
- name: job
  # The number of instances that this partition should contain.
  size: 6
  # Each rule can define a selector, priority, and optional quota.
  # Rules are evaluated in order, and the first rule that matches
  # will be used.
  #
  # Selectors can be used to select workloads based on labels
  # (these are Kubernetes-style selectors; see below for details).
  #
  # Priority is an integer that defines the priority of the rule.
  # The default priority is 0.
  #
  # Quotas are optional, and define a limit on the number of machines
  # that can be allocated to workloads that fall under this rule.
  rules:
  - selector: workload-type in (job)
    priority: 20
  - selector: workload-type in (workspace)
    priority: 10
    quota: 3
- name: workspace
  size: 4
  rules:
  - selector: workload-type in (workspace)
    priority: 20
info

If using the Global Resource Scheduler with customer-managed machine pools, the kind field should be set to CUSTOMER_MANAGED and the machine_types field should be omitted. Then, the anyscale machine command should be used to register on-premises machines into the pool.

Apply the configuration

anyscale machine-pool update --name capacity-reservations --spec-file machine-pool-capacity-reservations.yaml

Create a compute configuration and submit a job

Define a compute configuration requiring 10 machines. An example configuration might look like this:

# ten-node-config.yaml
cloud: CLOUD_NAME
head_node:
  instance_type: "m5.2xlarge"
worker_nodes:
- name: reserved-cpu
  instance_type: RES-8CPU-32GB
  min_nodes: 10
  max_nodes: 10
  flags:
    cloud_deployment:
      machine_pool: capacity-reservations
flags:
  # If the cluster doesn't satisfy min_nodes for > 5 minutes while
  # in a RUNNING state, then terminate the workload (and re-queue
  # it if it's a job).
  workload_recovering_timeout: 5m
  # If the cluster doesn't satisfy min_nodes for > 24h while in
  # a STARTING state, then terminate the workload.
  workload_starting_timeout: 24h

Create the compute configuration:

anyscale compute-config create --name ten-node-config -f ten-node-config.yaml

Prepare a job script:

# loop.py
import time

import ray

ray.init()

@ray.remote
def foo(i):
    return i

for i in range(6000):
    time.sleep(1)
    print(f'Iteration {ray.get(foo.remote(i))}')

Submit the job:

anyscale job submit --name ten-node-job --compute-config ten-node-config --working-dir=. -- python loop.py
warning

If you see an error indicating scheduler cache is not ready, wait for the duration specified in the error message to pass before proceeding. This indicates that the scheduler is still initializing.

Verify the machine pool

Check that the machine pool shows the expected resources.

anyscale machine-pool describe --name capacity-reservations

Create a workspace

Once the job is running, create a workspace that requires 4 machines. This will evict 4 of the 10 machines used by the job. An example workspace configuration might be:

# four-node-config.yaml
cloud: CLOUD_NAME
head_node:
  instance_type: "m5.2xlarge"
worker_nodes:
- name: reserved-cpu
  instance_type: RES-8CPU-32GB
  min_nodes: 4
  max_nodes: 4
  flags:
    cloud_deployment:
      machine_pool: capacity-reservations

Create the compute configuration:

anyscale compute-config create --name four-node-config -f four-node-config.yaml

Create the workspace:

anyscale workspace_v2 create --name four-node-workspace --compute-config four-node-config

Observe scheduler behavior

To observe scheduler behavior, describe the machine pool:

anyscale machine-pool describe --name capacity-reservations

You should see a clear breakdown of which machines are allocated to which workload.

At this point, feel free to submit additional jobs and create additional workspaces, and observe how the scheduler handles the requests. The scheduler will attempt to allocate machines to workloads based on the rules defined in the machine pool configuration.

Scale down and delete the machine pool

After workloads are finished, scale down the machine pool, detach it from clouds, and then delete it.

To scale down, update the configuration to:

kind: ANYSCALE_MANAGED
machine_types: []

Apply the update:

anyscale machine-pool update --name capacity-reservations --spec-file machine-pool-capacity-reservations.yaml

Detach from clouds:

anyscale machine-pool detach --name capacity-reservations --cloud dev-cloud
anyscale machine-pool detach --name capacity-reservations --cloud prod-cloud

Delete the machine pool:

anyscale machine-pool delete --name capacity-reservations

Additional notes

Gang scheduling

When compute configurations specify min_resources or min_nodes, the request is treated as an all-or-nothing (gang) request. The scheduler allocates resources only if the full request can be met, preventing partial allocations that might trigger unwanted preemptions.
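The gang-scheduling policy can be sketched as a simple all-or-nothing admission check. This is an illustration of the policy written for this doc, not Anyscale's actual implementation:

```python
def try_gang_allocate(free_machines: int, min_nodes: int) -> int:
    """Return the number of machines granted: all of min_nodes, or none.

    Partial grants are never made, so a workload either starts whole
    or stays queued, and no machines sit reserved but unused.
    """
    if free_machines >= min_nodes:
        return min_nodes
    return 0

# A 10-node gang request is admitted only in full.
assert try_gang_allocate(8, 10) == 0    # not enough capacity: stays queued
assert try_gang_allocate(12, 10) == 10  # full request satisfied: starts whole
```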

Workload queuing

If there are insufficient machines, the workload is placed in a FIFO queue (displayed as STARTING or PENDING in the UI). In some cases, workloads may bypass the queue if they are runnable while earlier submissions are not.
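The bypass behavior can be sketched as a scan over the FIFO queue that admits the first runnable entry. Again, this is a toy model of the policy, not the scheduler's actual code:

```python
from collections import deque

def pick_next(queue: deque, free_machines: int):
    """Scan the FIFO queue and admit the first runnable workload.

    Earlier entries keep their place in line; a later workload only
    bypasses them when they cannot run with the machines currently free.
    """
    for workload in queue:
        name, needed = workload
        if needed <= free_machines:
            queue.remove(workload)
            return name
    return None

q = deque([("big-job", 10), ("small-job", 2)])
assert pick_next(q, 4) == "small-job"  # bypasses big-job, which can't run yet
assert pick_next(q, 4) is None         # big-job still waits for 10 machines
```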

Workload timeouts

Timeout flags in the compute configuration help manage queuing and recovery:

flags:
# If the cluster doesn't satisfy min_nodes for > 5 minutes while
# in a RUNNING state, then terminate the workload (and re-queue
# it if it's a job).
workload_recovering_timeout: 5m
# If the cluster doesn't satisfy min_nodes for > 24h while in
# a STARTING state, then terminate the workload.
workload_starting_timeout: 24h

These timeouts apply only to jobs and workspaces, not to services.

Falling back to cloud capacity

If machine pool instances are unavailable, workloads can fall back to SPOT or ON_DEMAND capacity. This is configured using the min_resources and instance_ranking_strategy flags. For example:

cloud: CLOUD_NAME
head_node:
  instance_type: "m5.2xlarge"
worker_nodes:
- name: reserved-8CPU-32GB
  instance_type: RESERVED-8CPU-32GB
  cloud_deployment:
    machine_pool: reserved-capacity
- name: spot-8CPU-32GB
  instance_type: "m5.2xlarge"
  market_type: SPOT
- name: on-demand-8CPU-32GB
  instance_type: "m5.2xlarge"
  market_type: ON_DEMAND
flags:
  min_resources:
    CPU: 64
  instance_ranking_strategy:
  - ranker_type: custom_group_order
    ranker_config:
      group_order:
      - reserved-8CPU-32GB
      - spot-8CPU-32GB
      - on-demand-8CPU-32GB
  enable_replacement: true
  replacement_threshold: 30m
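The custom_group_order ranker amounts to sorting candidate node groups by their position in the configured list. A minimal sketch of that idea (illustrative only, not the actual ranker):

```python
# The group order from the example configuration above.
GROUP_ORDER = ["reserved-8CPU-32GB", "spot-8CPU-32GB", "on-demand-8CPU-32GB"]

def rank(candidates: list) -> list:
    """Prefer groups that appear earlier in GROUP_ORDER; unknown groups sort last."""
    def key(group):
        return GROUP_ORDER.index(group) if group in GROUP_ORDER else len(GROUP_ORDER)
    return sorted(candidates, key=key)

# Reserved capacity is tried before falling back to spot or on-demand.
assert rank(["on-demand-8CPU-32GB", "reserved-8CPU-32GB"])[0] == "reserved-8CPU-32GB"
```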

Observability

To debug scheduling behavior, including pending machine requests and recent cloud instance launch failures, use:

anyscale machine-pool describe --name <machine_pool_name>

When a machine is evicted, an eviction notice is sent to the affected workloads, detailing the cloud, project, user, and workload name.

Scheduling rules

Scheduling rules follow the Kubernetes selector language. Available labels include:

  • workload-type (one of job, service, workspace)
  • cloud (cloud name)
  • project (project name)

Examples

  • To select workloads of type “workspace” in cloud “dev-cloud”:
    workload-type in (workspace), cloud in (dev-cloud)

  • To select workloads of type “job”:
    workload-type=job

  • To select workloads of type “job” or “service”:
    workload-type in (job,service)

  • To select all workloads (since every workload has a workload-type label):
    workload-type
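To make rule evaluation concrete, here is a toy evaluator for the selector forms listed above (bare key, key=value, and key in (...)). It is written for this doc and is not Anyscale's implementation; real Kubernetes selectors support more syntax than this:

```python
import re

def matches(selector: str, labels: dict) -> bool:
    """Evaluate a comma-separated list of simple selector clauses."""
    # Split on commas that are not inside parentheses, so that
    # 'workload-type in (job,service)' stays one clause.
    clauses = re.split(r",\s*(?![^()]*\))", selector)
    for clause in clauses:
        clause = clause.strip()
        m = re.fullmatch(r"(\S+)\s+in\s+\(([^)]*)\)", clause)
        if m:  # 'key in (v1,v2)'
            key = m.group(1)
            values = [v.strip() for v in m.group(2).split(",")]
            if labels.get(key) not in values:
                return False
        elif "=" in clause:  # 'key=value'
            key, _, value = clause.partition("=")
            if labels.get(key) != value:
                return False
        else:  # bare 'key': the label just has to exist
            if clause not in labels:
                return False
    return True

def first_matching_rule(rules: list, labels: dict):
    """Rules are evaluated in order; the first match wins."""
    for rule in rules:
        if matches(rule["selector"], labels):
            return rule
    return None

labels = {"workload-type": "workspace", "cloud": "dev-cloud"}
assert matches("workload-type in (workspace), cloud in (dev-cloud)", labels)
assert not matches("workload-type=job", labels)

# The job partition's rules from the example configuration:
rules = [
    {"selector": "workload-type in (job)", "priority": 20},
    {"selector": "workload-type in (workspace)", "priority": 10},
]
assert first_matching_rule(rules, labels)["priority"] == 10
```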

Billing

For Anyscale-managed machine pools, charges are incurred only when cloud instances are allocated (at cloud instance pricing). For customer-managed machine pools, standard machine pool pricing applies.

Known issues

  • Machine pool instances cannot be used as head nodes.