Anyscale scheduler

The Anyscale scheduler acquires compute from your compute estate and distributes it across your Ray clusters. It decides which workloads get compute, in what order, and on which resources, so a fixed pool of expensive accelerators can be shared across many teams with predictable priority and fairness.

An admin writes one declarative scheduler config; developers keep submitting jobs, services, and workspaces as they always have, optionally with a priority.

The scheduler shares compute across Anyscale jobs, services, and workspaces, and supports a mix of compute types (capacity reservations, spot instances, and on-demand instances) that you model as resource flavors. It works across:

  • VM-based clouds on AWS or Google Cloud
  • Kubernetes-based clouds on Amazon EKS, Google GKE, Azure AKS, or neoclouds

A single resource queue can pool capacity across these clouds and clusters at once, so workloads fall back across reservations, spot, on-demand, and providers according to the quotas you set.

The scheduler configuration is modeled after Kueue, Kubernetes' job-queueing system, so resource flavors and queues will look familiar if you've used it. See How this maps to Kueue.

Note: The scheduler is in alpha. Names and fields may change.

Who it's for

| Role | What they do |
| --- | --- |
| Platform / org admins | Author the scheduler config: flavors, queues, quotas, scheduling rules, priority. |
| Developers | Submit workloads with a compute config, custom tags, and an optional priority. They don't touch the scheduler config. |

Core concepts

A scheduler config has three main sections (all optional):

resource_flavors:   # the kinds of compute you have
resource_queues: # pools with quotas that hold workloads
scheduling_rules: # how workloads are routed to queues and prioritized

Resource flavor

A named category of compute, defined by a label selector over the workload's labels: for example, spot vs. on-demand, or a specific GPU reservation. A workload's usage counts against the quota of the flavor it's assigned to.

resource_flavors:
  - name: spot
    selector:
      - { key: market-type, operator: in, values: [SPOT] }

Within a queue's resource group, flavors are tried in listed order and a workload takes the first matching flavor that has quota, so list more specific flavors first. If no flavor matches, the workload is rejected.
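To make the ordering rule concrete, here is a minimal Python sketch of "first matching flavor with quota wins". It is a model for illustration, not the scheduler's implementation: `pick_flavor`, the `requires` and `remaining` fields, the `spot-us-west` flavor, and the `region` label are all hypothetical, and the selector is simplified to exact label equality. Returning `None` for "matched but no quota yet" is also an assumption about how waiting is represented.

```python
def pick_flavor(flavors, labels, requested):
    """Return the name of the first flavor that matches and has room.

    flavors is an ordered list of dicts (hypothetical shape):
      name      - flavor name
      requires  - label key/value pairs the workload must carry
      remaining - unused quota for the requested resource
    """
    matched = False
    for flavor in flavors:
        if all(labels.get(k) == v for k, v in flavor["requires"].items()):
            matched = True
            if flavor["remaining"] >= requested:
                return flavor["name"]
    if not matched:
        # No flavor matches the labels at all: the workload is rejected.
        raise LookupError("no flavor matches the workload's labels")
    return None  # a flavor matched, but none has quota right now


# The more specific flavor is listed first so the generic one cannot shadow it.
flavors = [
    {"name": "spot-us-west", "requires": {"market-type": "SPOT", "region": "us-west"}, "remaining": 16},
    {"name": "spot", "requires": {"market-type": "SPOT"}, "remaining": 64},
]

print(pick_flavor(flavors, {"market-type": "SPOT", "region": "us-west"}, 8))
print(pick_flavor(flavors, {"market-type": "SPOT", "region": "eu"}, 8))
```

If the two flavors were listed in the opposite order, the generic `spot` flavor would match every spot workload first, and `spot-us-west` would only be used once `spot` ran out of quota.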

Resource queue

A named pool that holds workloads and caps their usage with quotas. A queue's quota is grouped by the resources it covers; for each flavor you set a nominal_quota per resource.

resource_queues:
  - name: research
    preemption:
      within_resource_queue: lower_priority   # higher-priority work can evict lower
    resource_groups:
      - covered_resources: [gpu]
        flavors:
          - name: spot
            resources:
              - { name: gpu, nominal_quota: 64 }

Within a flavor, a covered resource with no nominal_quota is unlimited; set a nominal_quota to cap it, or 0 to block it.
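The three quota cases (omitted, positive, and zero) can be sketched as a single check. This is an illustrative model only: the function name `fits` and the use of `None` to represent an omitted `nominal_quota` are assumptions, not part of the scheduler's API.

```python
from typing import Optional


def fits(nominal_quota: Optional[int], used: int, requested: int) -> bool:
    """Return True when a request fits under a flavor's quota for one resource.

    None models a covered resource with no nominal_quota (unlimited).
    A quota of 0 blocks any non-zero request.
    """
    if nominal_quota is None:
        return True
    return used + requested <= nominal_quota


print(fits(None, used=1000, requested=8))  # unlimited resource
print(fits(64, used=60, requested=4))      # exactly fills the quota
print(fits(0, used=0, requested=1))        # blocked resource
```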

A queue with no resource_groups is a passthrough queue: it admits everything routed to it with no quota. With no config at all, every workload is admitted immediately.

Scheduling rule

Routes an incoming workload to a queue and sets its priority. Each rule has a selector over the workload's labels, a target resource_queue, and an optional priority_policy. Rules are evaluated top to bottom and the first match wins; a rule with no selector is a catch-all.

scheduling_rules:
  - selector:
      - { key: team, operator: in, values: [research] }
    resource_queue: research
    priority_policy: { default: 50, min: 0, max: 100, on_violation: reject }
  - resource_queue: research   # catch-all for everything else
    priority_policy: { default: 0 }

priority_policy supplies a default priority and bounds requests to [min, max]. When a requested priority is out of range, on_violation: reject skips the rule and falls through to the next one, while on_violation: force_update clamps the priority into range. Priority is a non-negative integer; higher is more important.
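The interaction between first-match routing and `on_violation` is easier to follow in code. The sketch below models it in Python under several assumptions that the text above does not confirm: a missing `min` or `max` means unbounded, a missing `default` means 0, an unspecified `on_violation` behaves like `reject`, and the selector matcher handles only the `in` operator. The function names and the `shared` queue are hypothetical.

```python
def selector_matches(selector, labels):
    # Simplified: handles only the `in` operator; every expression must hold.
    # An empty selector matches everything, which makes the rule a catch-all.
    return all(labels.get(e["key"]) in e["values"] for e in selector)


def resolve_priority(policy, requested):
    """Return the effective priority, or None when the rule must be skipped."""
    if requested is None:
        return policy.get("default", 0)  # assumption: 0 when no default is set
    low, high = policy.get("min"), policy.get("max")
    too_low = low is not None and requested < low
    too_high = high is not None and requested > high
    if not (too_low or too_high):
        return requested
    if policy.get("on_violation") == "force_update":
        return low if too_low else high  # clamp into [min, max]
    return None  # on_violation: reject (assumed to be the fallback as well)


def route(rules, labels, requested=None):
    """Return (queue, priority) from the first rule that accepts the workload."""
    for rule in rules:
        if not selector_matches(rule.get("selector", []), labels):
            continue
        priority = resolve_priority(rule.get("priority_policy", {}), requested)
        if priority is not None:
            return rule["resource_queue"], priority
    raise LookupError("no scheduling rule matched; the workload fails")


rules = [
    {
        "selector": [{"key": "team", "operator": "in", "values": ["research"]}],
        "resource_queue": "research",
        "priority_policy": {"default": 50, "min": 0, "max": 100, "on_violation": "reject"},
    },
    {   # catch-all; the "shared" queue is hypothetical
        "resource_queue": "shared",
        "priority_policy": {"default": 0, "min": 0, "max": 10, "on_violation": "force_update"},
    },
]

print(route(rules, {"team": "research"}))       # default priority applies
print(route(rules, {"team": "research"}, 150))  # rejected by rule 1, clamped by rule 2
print(route(rules, {"team": "web"}, 5))         # only the catch-all matches
```

Note the second call: because the first rule uses `reject`, an out-of-range priority does not fail the workload outright. It falls through to the next matching rule, which here clamps the priority instead.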

Note (flavor and rule selectors): Both resource flavors and scheduling rules use a selector over the workload's labels. Each selector is a list of match expressions of the form { key, operator, values }, with operators in, not_in, exists, and does_not_exist.
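As a rough illustration of how the four operators could be evaluated, here is a small Python matcher. Two behaviors are assumptions borrowed from Kubernetes label selectors rather than stated above: all expressions in a selector must hold (logical AND), and `not_in` also matches when the key is absent. The function name `matches` is hypothetical.

```python
from typing import Mapping, Sequence


def matches(selector: Sequence[Mapping], labels: Mapping[str, str]) -> bool:
    """Return True when every match expression holds for the given labels."""
    for expr in selector:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        present = key in labels
        if op == "in":
            ok = present and labels[key] in values
        elif op == "not_in":
            ok = not present or labels[key] not in values  # assumption: absent key matches
        elif op == "exists":
            ok = present
        elif op == "does_not_exist":
            ok = not present
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True  # an empty selector matches everything


spot = [{"key": "market-type", "operator": "in", "values": ["SPOT"]}]
print(matches(spot, {"market-type": "SPOT", "team": "research"}))
print(matches(spot, {"market-type": "ON_DEMAND"}))
```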

Preemption

A queue with preemption.within_resource_queue: lower_priority lets a higher-priority workload evict lower-priority ones in the same queue to free quota.
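One way to picture within-queue preemption is as a victim-selection step. The sketch below is a guess at a reasonable policy, not the scheduler's documented algorithm: it assumes the lowest-priority running workloads are evicted first, that only as many are evicted as needed, and that nothing is evicted if eviction cannot free enough quota. The function name and tuple layout are hypothetical.

```python
def choose_victims(running, incoming_priority, needed, free):
    """Pick lower-priority workloads to evict so the incoming one fits.

    running is a list of (name, priority, usage) tuples in one queue.
    Returns a list of names to evict (possibly empty), or None when even
    evicting every lower-priority workload would not free enough quota.
    """
    shortfall = needed - free
    if shortfall <= 0:
        return []  # already fits; no eviction needed
    candidates = sorted(
        (w for w in running if w[1] < incoming_priority),
        key=lambda w: w[1],  # assumption: lowest priority is evicted first
    )
    victims = []
    for name, _priority, usage in candidates:
        victims.append(name)
        shortfall -= usage
        if shortfall <= 0:
            return victims
    return None


running = [("a", 10, 4), ("b", 0, 2), ("c", 80, 8)]
print(choose_victims(running, incoming_priority=50, needed=5, free=0))
```

Workload `c` is never a candidate in this example because its priority (80) is higher than the incoming workload's (50).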

See the Config reference for every field.

How scheduling works

  1. A user submits a workload; Anyscale attaches labels to it.
  2. The scheduler evaluates scheduling rules top to bottom and routes the workload to a queue with an assigned priority (first match wins; no match → the workload fails).
  3. The workload waits in the queue, ordered by priority (highest first), then FIFO by submission time (oldest first within the same priority).
  4. When a matching flavor has quota, the workload is admitted and runs, preempting lower-priority work first if the queue allows it.
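The queue ordering in step 3 (highest priority first, then oldest first within the same priority) can be sketched with a heap. This is an illustrative model: the `PendingQueue` class is hypothetical, and a monotonically increasing counter stands in for submission time.

```python
import heapq
import itertools


class PendingQueue:
    """Orders waiting workloads by priority (highest first), then FIFO."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # stand-in for submission time

    def submit(self, name, priority):
        # heapq is a min-heap, so negate priority to pop the highest first;
        # the sequence number breaks ties in submission order.
        heapq.heappush(self._heap, (-priority, next(self._seq), name))

    def next_workload(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = PendingQueue()
for name, priority in [("a", 10), ("b", 50), ("c", 50), ("d", 0)]:
    queue.submit(name, priority)

order = [queue.next_workload() for _ in range(len(queue))]
print(order)  # ['b', 'c', 'a', 'd']
```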

How this maps to Kueue

The config mirrors Kueue, so admins who are familiar with Kueue may recognize it:

| Anyscale scheduler | Kueue |
| --- | --- |
| resource_flavors | ResourceFlavor |
| resource_queues | ClusterQueue |
| match expressions | label matchExpressions |
| scheduling_rules | (Anyscale-specific): centralized routing, instead of users picking a queue |

What's available

The scheduler supports the following:

  • Author, validate, and version configs through CLI / SDK and the UI (config, stats, events).
  • One Kueue-compatible config for both VMs and Kubernetes.
  • Declarative compute configs with free pod shapes.
  • Routing with multi-queue and flavor mixing; custom resource types and tags.
  • Integer priorities and within-queue preemption.
  • Fixed quotas per queue and flavor.
  • Capacity reservations through advanced instance config.
  • Multi-cloud / multi-cluster: a new cloud is a cloud resource only; workloads can target any or all clouds.
  • Kueue and KAI pass-through on Kubernetes.
  • Observability: queue view, basic stats, real-time events.

Next steps