Global resource scheduler
The global resource scheduler intelligently assigns workloads to fixed-capacity resource pools, such as capacity reservations or customer-managed on-premises machines. It maximizes utilization, maintains fairness, and queues workloads that cannot start immediately.
Many organizations rely on cloud capacity reservations or on-premises investments to secure hard-to-get hardware (for example, GPUs). Unlike spot or on-demand instances, capacity reservations provide a fixed block of capacity. Because the capacity is fixed and known, Anyscale can make improved scheduling decisions, such as reserving 25% of machines for development and 75% for production, while allowing production workloads to use unused development capacity.
Architecture
The global resource scheduler is built on top of Anyscale machine pools. A machine pool defines a fixed-size group of compute resources.
When workloads request machine pool instances, the global resource scheduler evaluates these requests against user-defined rules and makes a scheduling decision, which may involve:
- Allocating machines from the machine pool to the workload.
- Evicting machines used by other workloads to make room for the incoming workload, if the incoming workload has higher priority.
- Queuing the workload until machines are available.
The global resource scheduler supports both Anyscale-managed and customer-managed machine pools. For more details, see Anyscale machine pools.
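For example, a workload opts into a machine pool through its compute config. The following is a minimal sketch with placeholder names; it reuses the `cloud_deployment.machine_pool` field from the fallback example later on this page:

```yaml
# Minimal compute config sketch (placeholder names): the worker group below
# requests its instances from the machine pool named "reserved-capacity".
cloud: CLOUD_NAME
worker_nodes:
  - name: reserved-8CPU-32GB
    instance_type: RESERVED-8CPU-32GB
    cloud_deployment:
      machine_pool: reserved-capacity
```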
Gang scheduling
When compute configurations specify `min_resources` or `min_nodes`, the request is treated as an all-or-nothing (gang) request. The scheduler allocates resources only if the full request can be met, preventing partial allocations that might trigger unwanted preemptions.
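As an illustrative sketch (placeholder names; `min_resources` and the worker-node fields are taken from the fallback example later on this page, and treating `min_nodes` as a per-group field is an assumption):

```yaml
# Gang request sketch: the scheduler starts the workload only when it can
# grant the full request at once; otherwise the workload queues.
worker_nodes:
  - name: reserved-8CPU-32GB
    instance_type: RESERVED-8CPU-32GB
    min_nodes: 8        # assumed per-group field: all 8 nodes or none
flags:
  min_resources:
    CPU: 64             # cluster-wide minimum, granted all-or-nothing
```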
Workload queuing
If insufficient machines are available, the workload is placed in a FIFO queue (displayed as `STARTING` or `PENDING` in the UI). In some cases, a workload may bypass the queue if it is runnable while earlier submissions are not.
The scheduler preempts running jobs when a higher-priority job requires resources. When preempted, a job enters the queue with the following behavior:
- The scheduler enqueues the job using its original start time. Because the job was previously running, this typically places it at the front of the queue relative to other jobs of the same priority.
- The job attempts to reacquire the resources defined by the `min_nodes` parameter.
- The `workload_recovering_timeout` parameter defines how long the job continues to try to acquire resources. The job terminates due to timeout if it cannot acquire enough resources within that window.
- If the `max_retries` parameter is greater than 0, the scheduler adds a new job to the end of the queue with a new workload start time (see the sketch after this list).
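A hedged sketch of how these knobs can fit together in a job config (the `name`, `entrypoint`, and `max_retries` fields follow Anyscale's job YAML; nesting the recovery flag under an inline `compute_config` is an assumption based on the timeout example below):

```yaml
# Illustrative job config (placeholder names and values).
name: nightly-training
entrypoint: python train.py
max_retries: 2                        # after a recovery timeout, re-enqueue up to 2 more times
compute_config:
  flags:
    workload_recovering_timeout: 10m  # how long a preempted job may keep trying to reacquire min_nodes
```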
Workload timeouts
Timeout flags in the compute configuration help manage queuing and recovery:
```yaml
flags:
  # If the cluster doesn't satisfy min_nodes for > 5 minutes while
  # in a RUNNING state, then terminate the workload (and re-queue
  # it if it's a job).
  workload_recovering_timeout: 5m
  # If the cluster doesn't satisfy min_nodes for > 24h while in
  # a STARTING state, then terminate the workload.
  workload_starting_timeout: 24h
```
These timeouts apply only to jobs and workspaces, not to services.
Falling back to cloud capacity
If machine pool instances are unavailable, workloads can fall back to `SPOT` or `ON_DEMAND` capacity. Configure this with the `min_resources` and `instance_ranking_strategy` flags. For example:
```yaml
cloud: CLOUD_NAME
head_node:
  instance_type: "m5.2xlarge"
worker_nodes:
  - name: reserved-8CPU-32GB
    instance_type: RESERVED-8CPU-32GB
    cloud_deployment:
      machine_pool: reserved-capacity
  - name: spot-8CPU-32GB
    instance_type: "m5.2xlarge"
    market_type: SPOT
  - name: on-demand-8CPU-32GB
    instance_type: "m5.2xlarge"
    market_type: ON_DEMAND
flags:
  min_resources:
    CPU: 64
  instance_ranking_strategy:
    - ranker_type: custom_group_order
      ranker_config:
        group_order:
          - reserved-8CPU-32GB
          - spot-8CPU-32GB
          - on-demand-8CPU-32GB
  enable_replacement: true
  replacement_threshold: 30m
```
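Reading this example: `min_resources` asks for 64 CPUs in total, and the custom group order tells the scheduler to prefer the reserved machine pool, then spot, then on-demand instances. The replacement flags suggest that lower-ranked instances can later be swapped for higher-ranked ones as reserved capacity frees up; the exact semantics of `replacement_threshold: 30m` are an inference from its name.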
Observability
To debug scheduling behavior, inspect pending machine requests, and review recent cloud instance launch failures, use:

```bash
anyscale machine-pool describe --name <machine_pool_name>
```
When a machine is evicted, an eviction notice is sent to the affected workloads, detailing the cloud, project, user, and workload name.
Scheduling rules
Scheduling rules follow the Kubernetes selector language. Available labels include:
- `workload-type` (one of `job`, `service`, `workspace`)
- `cloud` (cloud name)
- `project` (project name)
Examples
- To select workloads of type “workspace” in cloud “dev-cloud”: `workload-type in (workspace), cloud in (dev-cloud)`
- To select workloads of type “job”: `workload-type=job`
- To select workloads of type “job” or “service”: `workload-type in (job,service)`
- To select all workloads (since every workload has a `workload-type` label): `workload-type`
Billing
For Anyscale-managed machine pools, charges are incurred only when cloud instances are allocated (at cloud instance pricing). For customer-managed machine pools, standard machine pool pricing applies.
Known issues
- Machine pool instances cannot be used as head nodes.