[Developer Preview] Job Queues
Context
Anyscale Job Queues enable sophisticated scheduling and execution algorithms for Anyscale Jobs.
Job Queues enable the following capabilities for many workloads already using Anyscale Jobs:
- Better resource utilization: Job Queues allow multiple jobs to be scheduled on a single Anyscale cluster for increased utilization of cluster resources (saving on cluster re-provisioning times).
- Advanced scheduling algorithms: Job Queues support FIFO (first-in, first-out), LIFO (last-in, first-out), and priority-based scheduling.
Managed Job Queues are in developer preview and not recommended for production use cases.
Use Job Queues at your discretion, with workloads that can tolerate inadvertent failures.
How it works
Anyscale Job Queues enable multiple Anyscale Jobs to be executed on a shared cluster, in contrast to standalone Anyscale Jobs, which require provisioning an individual cluster for every job. The lifecycle of the shared cluster is tied to the lifecycle of the Job Queue: a new cluster is provisioned when the first job from the queue is scheduled for execution, and it is automatically terminated when the Job Queue is idle-terminated (if configured).
Every Anyscale Job scheduled for execution via a Managed Job Queue goes through the following (simplified) lifecycle:
- The job is placed onto the specified target Job Queue (awaiting submission to a cluster for execution). Based on the scheduling policy of the particular queue, Anyscale determines the position of the job in the queue.
- The job is submitted for execution based on its position in the queue. No more than max_concurrency jobs will be running on the cluster concurrently.
- The job is executed until its completion, including any retries (up to the configured max_retries setting); see the sketch below for where these two settings live in the job configuration.
This cycle repeats continuously until all jobs added to the Job Queue are completed.
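To make the two settings above concrete, here is a minimal sketch of where they sit in a job YAML. The queue name is illustrative, and max_retries is assumed to be the standard top-level Anyscale Job retry setting referenced above:

entrypoint: python hello_world.py
# Assumed placement: the standard Anyscale Job retry setting.
max_retries: 2
job_queue_config:
  job_queue_spec:
    job_queue_name: example-queue  # illustrative name
    execution_mode: FIFO
    # At most this many jobs from the queue run on the cluster at the same time.
    max_concurrency: 1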
The lifecycle of the Job Queue cluster is as follows:
- A new Ray cluster is provisioned when the first job ready for execution is retrieved from the queue.
- Once all jobs in the queue are completed, the cluster enters an idle state.
- If no new jobs are submitted into the queue before the idle-termination timer expires, the Job Queue is closed and the corresponding cluster is terminated.
To control termination of the Job Queue's cluster manually, set job_queue_config.job_queue_spec.idle_timeout_sec to 0 in the Job Queue configuration (outlined below); this disables idle-termination.
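For example, a minimal sketch of a queue with idle-termination disabled, reusing the configuration keys shown later on this page (the queue name is illustrative):

job_queue_config:
  job_queue_spec:
    job_queue_name: manually-terminated-queue  # illustrative name
    execution_mode: FIFO
    # 0 disables idle-termination: neither the queue nor its cluster is
    # closed/terminated automatically; you terminate the cluster yourself.
    idle_timeout_sec: 0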
How to use managed job queues on Anyscale
Using Managed Job Queues on the Anyscale Platform is as easy as using standalone Anyscale Jobs: the only difference from the standalone Anyscale Job workflow is that a Job Queue configuration needs to be specified.
Job Queues are created and managed as virtual resources by the Anyscale Platform, following a purely declarative style of defining resources.
Creating a managed job queue
To create a Job Queue, specify the Job Queue configuration in your job.yaml.
For example:
entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
job_queue_config:
  # Job's relative priority (only relevant for Job Queues with "execution_mode" set to "PRIORITY").
  # NOTE: Valid values range from 0 (highest) to +inf (lowest). Default value is None.
  priority: 100
  job_queue_spec:
    # NOTE: The user-provided name has to be *unique* within the project used (if any) so
    # that other jobs can be associated with it.
    job_queue_name: new-job-queue-1
    # Execution mode determines the scheduling policy for jobs in the queue.
    # Currently supported modes are:
    # - "FIFO" (first-in, first-out)
    # - "LIFO" (last-in, first-out)
    # - "PRIORITY" (scheduling based on the priority of each individual job)
    execution_mode: FIFO
    # Max number of jobs from the queue that can be executed concurrently
    max_concurrency: 1
    # Max duration the queue and the cluster can stay idle before being terminated
    # (set this to 0 to disable idle-termination, in which case neither the queue nor the
    # cluster is closed/terminated automatically)
    idle_timeout_sec: 3600 # 1h
Only the first submitted job needs to provide the corresponding Job Queue specification (job_queue_spec) to create a Managed Job Queue. However, for the convenience of programmatically submitting jobs that share the same YAML configuration template, providing an identical job_queue_spec with the same job_queue_name will not create a new queue, but instead associate the job with the already existing one (as long as the configuration of the queue does not change).
Submitting jobs with a different job_queue_spec but the same job_queue_name will fail, as user-provided Job Queue names are enforced to be unique among active Job Queues (within the project).
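For example, a second job submitted from the same template as the example above simply attaches to the existing queue, because its job_queue_spec is identical (only the entrypoint differs; the script name is illustrative):

entrypoint: python another_task.py  # illustrative entrypoint
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
job_queue_config:
  job_queue_spec:
    # Same name and identical spec as the first job: no new queue is created.
    job_queue_name: new-job-queue-1
    execution_mode: FIFO
    max_concurrency: 1
    idle_timeout_sec: 3600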
For the complete set of configuration options exposed in both the SDK and the YAML job configuration template (both follow the API model), refer to the API reference.
Adding jobs to an existing Managed Job Queue
To submit job(s) to an existing Job Queue, specify the corresponding Job Queue identifier, either as the user-provided job_queue_name (if specified during the Job Queue creation) or as the automatically generated queue id.
For example:
Using the user-provided queue name:

entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
job_queue_config:
  # User-provided identifier of the queue (`job_queue_spec.job_queue_name`)
  target_job_queue_name: new-job-queue-1

Or, using the Anyscale-generated queue id:

entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
job_queue_config:
  # Anyscale internal Job Queue identifier
  target_job_queue_id: jq_...
Job(s) are then submitted in the same way as a standalone Anyscale Job:
anyscale job submit job.yaml
(anyscale +1.2s) View the job in the UI at https://console.anyscale.com/jobs/prodjob_...
⠏ Waiting for a job run, current state is PENDING...
Configuring job queues
Similarly to Anyscale Jobs, Job Queues offer a great amount of flexibility, allowing you to precisely configure the compute configuration and cluster environment of the cluster that jobs will run on.
Job Queue configuration is immutable: once a Job Queue is created, its configuration cannot be changed; a new Job Queue with the new configuration has to be created instead.
Compute configuration
The target compute configuration can be specified using the job_queue_spec.compute_config_id setting.
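For example, a job_queue_spec referencing an existing compute configuration might look like the following sketch (the compute config id is a placeholder; the other values mirror the example above):

job_queue_config:
  job_queue_spec:
    job_queue_name: new-job-queue-1
    execution_mode: FIFO
    max_concurrency: 1
    idle_timeout_sec: 3600
    # Id of an existing Anyscale compute configuration (placeholder value)
    compute_config_id: <compute-config-id>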
Cluster environment configuration
The Cluster Environment is not configured explicitly for Job Queues; instead, it needs to be specified as part of the individual Job's configuration.
Please note that, currently, the Cluster Environment is fixed at cluster startup time and cannot be changed later on.
This means that the Cluster Environment of the Job Queue is also fixed and determined by the configuration of the Anyscale Job that creates the queue (the first submitted job specifying job_queue_spec).
Support for running jobs with different Cluster Environments in the same Job Queue will be added in the future.
Job priority configuration
Job Queues with execution mode PRIORITY rely on the configured priority of individual jobs to determine the overall execution ordering.
Job priority is expected to be set in the range of [0, 2^64], where
- 0 is the highest priority
- 2^64 is the lowest priority
Jobs of the same priority are executed in the order of their arrival into the queue (that is, FIFO).
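As an illustration, the queue itself sets execution_mode to PRIORITY, and each submitted job carries its own priority in job_queue_config (the queue name and priority values below are illustrative). The job that creates the queue:

entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
job_queue_config:
  # 0 is the highest priority; larger values mean lower priority.
  priority: 0
  job_queue_spec:
    job_queue_name: priority-queue-1  # illustrative name
    execution_mode: PRIORITY
    max_concurrency: 1
    idle_timeout_sec: 3600

A subsequent, lower-priority job targeting the same queue:

entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
job_queue_config:
  priority: 100
  target_job_queue_name: priority-queue-1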
Other configuration
Currently, the max_concurrency setting is limited to 1.