Global resource scheduler

The global resource scheduler intelligently assigns workloads to fixed-capacity resource pools, such as capacity reservations or customer-managed on-premises machines. It maximizes utilization, maintains fairness, and queues workloads that cannot start immediately.

Many organizations rely on cloud capacity reservations or on-premises investments to secure hard-to-get hardware (for example, GPUs). Unlike spot or on-demand instances, capacity reservations provide a fixed block of capacity. With this information, Anyscale can make improved scheduling decisions—such as reserving 25% of machines for development and 75% for production—while allowing production workloads to use unused development capacity.

Architecture

The global resource scheduler is built on top of Anyscale machine pools. A machine pool defines a fixed-size group of compute resources.

When workloads request machine pool instances, the global resource scheduler evaluates the requests against user-defined rules and makes a scheduling decision, which may involve:

  • Allocating machines from the machine pool to the workload.
  • Evicting machines used by lower-priority workloads to make room for the incoming workload.
  • Queuing the workload until machines become available.

The global resource scheduler supports both Anyscale-managed and customer-managed machine pools. For more details, see Anyscale machine pools.

Gang scheduling

When compute configurations specify min_resources or min_nodes, the request is treated as an all-or-nothing (gang) request. The scheduler allocates resources only if the full request can be met, preventing partial allocations that might trigger unwanted preemptions.
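As an illustration, a compute configuration sketch requesting an all-or-nothing allocation might look like the following. The worker group name, instance type, and node counts are placeholder values, not a prescription:

```yaml
worker_nodes:
- name: gpu-workers            # illustrative group name
  instance_type: GPU-8CPU-64GB # illustrative instance type
  min_nodes: 4                 # gang request: schedule all 4 nodes or none
  max_nodes: 4
flags:
  # Alternatively, express the gang request as a resource total;
  # the scheduler allocates only if the full amount is available.
  min_resources:
    GPU: 4
```

Because the request is all-or-nothing, the scheduler never holds a partial set of nodes for this group, which avoids preempting other workloads for an allocation that can't be completed.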

Workload queuing

If insufficient machines are available, the workload is placed in a FIFO queue (displayed as STARTING or PENDING in the UI). A workload may bypass the queue if it is runnable while earlier submissions are not.

The scheduler preempts running jobs when a higher-priority job requires resources. A preempted job re-enters the queue with the following behaviors:

  • The scheduler enqueues the job using its start time. Because the job was previously running, this typically means the job is at the front of the queue relative to other jobs of the same priority.
  • The job attempts to acquire the resources defined by the min_nodes parameter.
  • The workload_recovery_timeout parameter defines how long the job continues to try to acquire resources. The job terminates due to timeout if it cannot acquire enough resources in the defined window.
  • If the max_retries parameter is greater than 0, the scheduler adds a new job to the end of the queue with a new workload start time.
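The recovery path above can be sketched as a pair of configuration fragments. All names and values below are illustrative, and the assumption that max_retries lives in the job configuration (rather than the compute configuration) should be checked against the job configuration reference:

```yaml
# Compute configuration (fragment)
worker_nodes:
- name: workers                  # illustrative group name
  instance_type: RESERVED-8CPU-32GB
  min_nodes: 2                   # a preempted job tries to re-acquire this many nodes
  max_nodes: 10
flags:
  # Terminate the job if min_nodes isn't recovered within 10 minutes.
  workload_recovering_timeout: 10m

---
# Job configuration (fragment)
name: my-job                     # illustrative job name
# After a timeout termination, re-queue the job up to 3 times,
# each time with a new workload start time.
max_retries: 3
```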

Workload timeouts

Timeout flags in the compute configuration help manage queuing and recovery:

flags:
  # If the cluster doesn't satisfy min_nodes for > 5 minutes while
  # in a RUNNING state, then terminate the workload (and re-queue
  # it if it's a job).
  workload_recovering_timeout: 5m
  # If the cluster doesn't satisfy min_nodes for > 24h while in
  # a STARTING state, then terminate the workload.
  workload_starting_timeout: 24h

These timeouts apply only to jobs and workspaces, not to services.

Falling back to cloud capacity

If machine pool instances are unavailable, workloads can fall back to SPOT or ON_DEMAND capacity. This is configured using the min_resources and instance_ranking_strategy flags. For example:

cloud: CLOUD_NAME
head_node:
  instance_type: "m5.2xlarge"
worker_nodes:
- name: reserved-8CPU-32GB
  instance_type: RESERVED-8CPU-32GB
  cloud_deployment:
    machine_pool: reserved-capacity
- name: spot-8CPU-32GB
  instance_type: "m5.2xlarge"
  market_type: SPOT
- name: on-demand-8CPU-32GB
  instance_type: "m5.2xlarge"
  market_type: ON_DEMAND
flags:
  min_resources:
    CPU: 64
  instance_ranking_strategy:
  - ranker_type: custom_group_order
    ranker_config:
      group_order:
      - reserved-8CPU-32GB
      - spot-8CPU-32GB
      - on-demand-8CPU-32GB
  enable_replacement: true
  replacement_threshold: 30m

Observability

Use the following command to debug scheduling behavior, view pending machine requests, and inspect recent cloud instance launch failures:

anyscale machine-pool describe --name <machine_pool_name>

When a machine is evicted, an eviction notice is sent to the affected workloads, detailing the cloud, project, user, and workload name.

Scheduling rules

Scheduling rules follow the Kubernetes selector language. Available labels include:

  • workload-type (one of job, service, workspace)
  • cloud (cloud name)
  • project (project name)

Examples

  • To select workloads of type “workspace” in cloud “dev-cloud”:
    workload-type in (workspace), cloud in (dev-cloud)

  • To select workloads of type “job”:
    workload-type=job

  • To select workloads of type “job” or “service”:
    workload-type in (job,service)

  • To select all workloads (since every workload has a workload-type label):
    workload-type

Billing

For Anyscale-managed machine pools, charges are incurred only when cloud instances are allocated (at cloud instance pricing). For customer-managed machine pools, standard machine pool pricing applies.

Known issues

  • Machine pool instances cannot be used as head nodes.