Anyscale scheduler
The Anyscale scheduler acquires compute from your compute estate and distributes it across your Ray clusters. It decides which workloads get compute, in what order, and on which resources, so a fixed pool of expensive accelerators can be shared across many teams with predictable priority and fairness.
An admin writes one declarative scheduler config; developers keep submitting jobs, services, and workspaces as they always have, optionally with a priority.
The scheduler shares compute across Anyscale jobs, services, and workspaces, and supports a mix of compute types (capacity reservations, spot instances, and on-demand instances) that you model as resource flavors. It works across:
- VM-based clouds on AWS or Google Cloud
- Kubernetes-based clouds on Amazon EKS, Google GKE, Azure AKS, or neoclouds
A single resource queue can pool capacity across these clouds and clusters at once, so workloads fall back across reservations, spot, on-demand, and providers according to the quotas you set.
The scheduler configuration is modeled after Kueue, Kubernetes' job-queueing system, so resource flavors and queues will look familiar if you've used it. See How this maps to Kueue.
Alpha. The scheduler is in alpha. Names and fields may change.
Who it's for
| Role | What they do |
|---|---|
| Platform / org admins | Author the scheduler config: flavors, queues, quotas, scheduling rules, priority. |
| Developers | Submit workloads with a compute config, custom tags, and an optional priority. They don't touch the scheduler config. |
Core concepts
A scheduler config has three main sections (all optional):
resource_flavors: # the kinds of compute you have
resource_queues: # pools with quotas that hold workloads
scheduling_rules: # how workloads are routed to queues and prioritized
Resource flavor
A named category of compute, defined by a label selector over the workload's labels, for
example spot vs. on-demand, or a specific GPU reservation. A workload's usage counts against the
quota of the flavor it's assigned.
resource_flavors:
  - name: spot
    selector:
      - { key: market-type, operator: in, values: [SPOT] }
Within a queue's resource group, flavors are tried in listed order and a workload takes the first matching flavor that has quota, so list more specific flavors first. If no flavor matches, the workload is rejected.
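The flavor-selection order can be sketched in Python. This is a hypothetical model, not Anyscale's implementation: the `Flavor` class, the `accelerator` label, and the per-flavor `free` quota map are illustrative; selectors are reduced to `in` matches; and the "wait" outcome (a matching flavor exists but has no quota left) is inferred from how queued workloads are admitted once quota frees up.

```python
from dataclasses import dataclass
import math


@dataclass
class Flavor:
    """Hypothetical flavor as seen from one queue resource group."""
    name: str
    selector: dict[str, list[str]]  # simplified: each label key must take one of these values
    free: dict[str, int]            # remaining quota per resource; a missing entry means unlimited


def matches(labels: dict[str, str], selector: dict[str, list[str]]) -> bool:
    # Every key in the selector must be present on the workload with an allowed value.
    return all(labels.get(key) in values for key, values in selector.items())


def assign_flavor(labels: dict[str, str], request: dict[str, int], flavors: list[Flavor]):
    """Return ("admit", flavor_name), ("wait", None), or ("reject", None)."""
    matched_any = False
    for flavor in flavors:  # listed order is preference order
        if not matches(labels, flavor.selector):
            continue
        matched_any = True
        if all(flavor.free.get(res, math.inf) >= qty for res, qty in request.items()):
            return ("admit", flavor.name)  # first matching flavor with quota wins
    # A match without quota waits in the queue; no match at all is rejected.
    return ("wait", None) if matched_any else ("reject", None)


# More specific flavor first, broader flavor second (hypothetical labels).
flavors = [
    Flavor("h100-reservation", {"accelerator": ["H100"]}, {"gpu": 0}),  # reservation fully used
    Flavor("on-demand", {"market-type": ["ON_DEMAND"]}, {"gpu": 16}),
]
print(assign_flavor({"accelerator": "H100", "market-type": "ON_DEMAND"}, {"gpu": 8}, flavors))
# ('admit', 'on-demand')
```

If the broader `on-demand` flavor were listed first, H100 workloads would land on on-demand capacity even while the reservation had free quota, which is why more specific flavors go first.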
Resource queue
A named pool that holds workloads and caps their usage with quotas. A queue's quota is grouped
by the resources it covers; for each flavor you set a nominal_quota per resource.
resource_queues:
  - name: research
    preemption:
      within_resource_queue: lower_priority  # higher-priority work can evict lower
    resource_groups:
      - covered_resources: [gpu]
        flavors:
          - name: spot
            resources:
              - { name: gpu, nominal_quota: 64 }
Within a flavor, a covered resource with no nominal_quota is unlimited; set a nominal_quota
to cap it, or 0 to block it.
A queue with no resource_groups is a passthrough queue: it admits everything routed to it
with no quota. With no config at all, every workload is admitted immediately.
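A minimal sketch of these quota rules, assuming quotas are checked per covered resource against the current usage in the assigned flavor and that a request lists only covered resources. `fits` and `queue_admits` are hypothetical helpers, and `quotas=None` stands in for a passthrough queue.

```python
from typing import Optional


def fits(used: int, requested: int, nominal_quota: Optional[int]) -> bool:
    """Quota check for one covered resource in one flavor."""
    if nominal_quota is None:  # no nominal_quota: unlimited
        return True
    return used + requested <= nominal_quota  # a quota of 0 blocks any positive request


def queue_admits(request: dict[str, int], usage: dict[str, int],
                 quotas: Optional[dict[str, Optional[int]]]) -> bool:
    """quotas maps each covered resource to its nominal_quota (or None);
    quotas=None models a passthrough queue with no resource_groups."""
    if quotas is None:
        return True  # passthrough: admit everything, no quota
    return all(fits(usage.get(res, 0), qty, quotas[res]) for res, qty in request.items())


print(queue_admits({"gpu": 4}, {"gpu": 60}, {"gpu": 64}))  # True: 60 + 4 <= 64
print(queue_admits({"gpu": 8}, {"gpu": 60}, {"gpu": 64}))  # False: 60 + 8 > 64
```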
Scheduling rule
Routes an incoming workload to a queue and sets its priority. Each rule has a selector over
the workload's labels, a target resource_queue, and an optional priority_policy. Rules
are evaluated top to bottom and the first match wins; a rule with no selector is a catch-all.
scheduling_rules:
  - selector:
      - { key: team, operator: in, values: [research] }
    resource_queue: research
    priority_policy: { default: 50, min: 0, max: 100, on_violation: reject }
  - resource_queue: research  # catch-all for everything else
    priority_policy: { default: 0 }
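Routing with these rules can be modeled as a first-match scan. The dict-based `RULES` list is a hypothetical mirror of the YAML, selectors are reduced to `in` matches, and any requested priority is ignored so each rule only contributes its `default`.

```python
from typing import Optional

# Hypothetical mirror of the scheduling_rules YAML above.
RULES = [
    {"selector": {"team": ["research"]}, "resource_queue": "research", "default": 50},
    {"selector": None, "resource_queue": "research", "default": 0},  # catch-all
]


def route(labels: dict[str, str], rules: list[dict]) -> Optional[tuple[str, int]]:
    """Return (queue, default priority) from the first matching rule, or None."""
    for rule in rules:  # evaluated top to bottom
        selector = rule["selector"]
        if selector is None or all(labels.get(k) in v for k, v in selector.items()):
            return rule["resource_queue"], rule["default"]  # first match wins
    return None  # no rule matched: the workload fails


print(route({"team": "research"}, RULES))  # ('research', 50)
print(route({"team": "infra"}, RULES))     # ('research', 0)
```

Without the catch-all, a workload from any other team would match nothing and fail, so a selector-less last rule is a cheap safety net.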
priority_policy supplies a default priority and bounds requests to [min, max]. When a
requested priority is out of range, on_violation: reject skips the rule and falls through to the
next one, while on_violation: force_update clamps the priority into range. Priority is a
non-negative integer; higher is more important.
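The `priority_policy` behavior can be sketched as a function that returns the effective priority, or `None` when `on_violation: reject` means the rule should be skipped. The function name and its fallback `on_violation` value are assumptions; treating a missing `min` as 0 follows from priorities being non-negative.

```python
from typing import Optional


def apply_priority_policy(requested: Optional[int], default: int,
                          min_priority: Optional[int] = None,
                          max_priority: Optional[int] = None,
                          on_violation: str = "reject") -> Optional[int]:
    """Return the effective priority, or None if the rule should be skipped."""
    priority = default if requested is None else requested
    low = 0 if min_priority is None else min_priority  # priorities are non-negative
    if low <= priority and (max_priority is None or priority <= max_priority):
        return priority
    if on_violation == "reject":
        return None  # skip this rule; routing falls through to the next one
    if on_violation == "force_update":
        clamped = priority if max_priority is None else min(priority, max_priority)
        return max(low, clamped)  # clamp into [min, max]
    raise ValueError(f"unknown on_violation: {on_violation}")


policy = {"default": 50, "min_priority": 0, "max_priority": 100}  # the first rule above
print(apply_priority_policy(None, **policy))                              # 50
print(apply_priority_policy(150, **policy))                               # None (rule skipped)
print(apply_priority_policy(150, **policy, on_violation="force_update"))  # 100
```

A router would call this for each matching rule and move on to the next rule whenever it returns `None`.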
Flavor and rule selectors: both resource flavors and scheduling rules use a selector over
the workload's labels. Each is a list of match expressions: { key, operator, values } with
operators in, not_in, exists, does_not_exist.
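Selector evaluation can be sketched as below. Two behaviors are assumptions borrowed from Kubernetes label selectors, since they aren't spelled out here: every expression in the list must match (logical AND), and `not_in` also matches when the key is absent. The function names are hypothetical.

```python
def match_expression(labels: dict[str, str], expr: dict) -> bool:
    """Evaluate one { key, operator, values } expression against workload labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "in":
        return key in labels and labels[key] in values
    if op == "not_in":
        return key not in labels or labels[key] not in values  # assumed: absent key matches
    if op == "exists":
        return key in labels
    if op == "does_not_exist":
        return key not in labels
    raise ValueError(f"unknown operator: {op}")


def selector_matches(labels: dict[str, str], selector: list[dict]) -> bool:
    # Assumed AND semantics: all expressions must match.
    return all(match_expression(labels, e) for e in selector)


spot = [{"key": "market-type", "operator": "in", "values": ["SPOT"]}]
print(selector_matches({"market-type": "SPOT"}, spot))       # True
print(selector_matches({"market-type": "ON_DEMAND"}, spot))  # False
```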
Preemption
A queue with preemption.within_resource_queue: lower_priority lets a higher-priority workload
evict lower-priority ones in the same queue to free quota.
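A rough sketch of within-queue preemption for a single GPU quota. `Running` and `pick_victims` are hypothetical, and two details are assumptions rather than documented Anyscale behavior: victims are chosen lowest priority first (most recently started first among equals), and nothing is evicted unless enough quota can actually be freed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Running:
    """Hypothetical admitted workload in the same queue."""
    name: str
    priority: int
    gpus: int
    start_time: float


def pick_victims(priority: int, gpus: int, running: list[Running],
                 quota: int) -> Optional[list[str]]:
    """Return workloads to evict ([] if the request already fits), or None if
    evicting all eligible lower-priority work still wouldn't free enough quota."""
    free = quota - sum(w.gpus for w in running)
    if free >= gpus:
        return []
    # Only strictly lower-priority work is eligible; assumed order: lowest priority, then newest.
    candidates = sorted((w for w in running if w.priority < priority),
                        key=lambda w: (w.priority, -w.start_time))
    victims = []
    for w in candidates:
        victims.append(w.name)
        free += w.gpus
        if free >= gpus:
            return victims
    return None  # not enough lower-priority usage: evict nothing and keep waiting


running = [Running("a", 10, 32, 1.0), Running("b", 0, 16, 2.0), Running("c", 0, 16, 3.0)]
print(pick_victims(50, 16, running, quota=64))  # ['c']
```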
See the Config reference for every field.
How scheduling works
- A user submits a workload; Anyscale attaches labels to it.
- The scheduler evaluates scheduling rules top to bottom and routes the workload to a queue with an assigned priority (first match wins; no match → the workload fails).
- The workload waits in the queue, ordered by priority (highest first) and then by submission time (oldest first).
- When a matching flavor has enough free quota, the workload is admitted and runs. If it doesn't and the queue allows preemption, lower-priority workloads in the same queue are evicted first to free quota.
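The queue ordering in these steps can be sketched with a heap keyed on negated priority and submission time. `ResourceQueue` is hypothetical and models ordering only, not quota checks or preemption; the counter breaks ties between identical keys so workload names never need to be compared.

```python
import heapq
import itertools


class ResourceQueue:
    """Hypothetical pending-workload queue: highest priority first, then oldest submission."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, float, int, str]] = []
        self._tiebreak = itertools.count()

    def push(self, name: str, priority: int, submit_time: float) -> None:
        # heapq is a min-heap, so negate priority to pop the highest priority first.
        heapq.heappush(self._heap, (-priority, submit_time, next(self._tiebreak), name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


q = ResourceQueue()
q.push("low-old", 0, 1.0)
q.push("high-new", 90, 5.0)
q.push("high-old", 90, 2.0)
q.push("mid", 50, 0.5)
print([q.pop() for _ in range(len(q))])  # ['high-old', 'high-new', 'mid', 'low-old']
```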
How this maps to Kueue
The config mirrors Kueue, so admins familiar with Kueue should recognize most of it:
| Anyscale scheduler | Kueue |
|---|---|
| resource_flavors | ResourceFlavor |
| resource_queues | ClusterQueue |
| match expressions | label matchExpressions |
| scheduling_rules | No equivalent (Anyscale-specific): centralized routing instead of users picking a queue |
What's available
The scheduler supports the following:
- Author, validate, and version configs through CLI / SDK and the UI (config, stats, events).
- One Kueue-compatible config for both VMs and Kubernetes.
- Declarative compute configs with free pod shapes.
- Routing with multi-queue and flavor mixing; custom resource types and tags.
- Integer priorities and within-queue preemption.
- Fixed quotas per queue and flavor.
- Capacity reservations through advanced instance config.
- Multi-cloud / multi-cluster: adding a cloud only requires adding a cloud resource, and workloads can target any or all clouds.
- Kueue and KAI pass-through on Kubernetes.
- Observability: queue view, basic stats, real-time events.
Next steps
- Get started: write and apply your first config with the CLI or SDK.
- Config reference: every field, with an annotated example.