[Developer Preview] Job Queues

Context

Anyscale Job Queues enable sophisticated scheduling and execution algorithms for Anyscale Jobs.

Job Queues enable the following capabilities for many workloads already using Anyscale Jobs:

  • Better resource utilization: Job Queues allow multiple jobs to be scheduled on a single Anyscale cluster for increased utilization of cluster resources (saving on cluster re-provisioning times).
  • Advanced scheduling algorithms: Job Queues support FIFO (first-in, first-out), LIFO (last-in, first-out), and priority-based scheduling.
note

Managed Job Queues are in developer preview and not recommended for production use-cases.

Use Job Queues at your discretion, and only with workloads that can tolerate inadvertent failures.

How it works

Anyscale Job Queues allow multiple Anyscale Jobs to execute on a shared cluster, whereas standalone Anyscale Jobs require provisioning an individual cluster for every job. With Job Queues, the lifecycle of the cluster is tied to the lifecycle of the queue: a new cluster is provisioned when the first job from the queue is scheduled for execution, and the cluster is automatically terminated when the Job Queue is idle-terminated (if configured).

Every Anyscale Job executed through a Managed Job Queue goes through the following (simplified) lifecycle:

  1. The job is placed onto the specified target Job Queue, awaiting submission to a cluster for execution. Anyscale determines the job's position in the queue based on the queue's scheduling policy.
  2. The job is submitted for execution based on its position in the queue. No more than max_concurrency jobs run on the cluster at the same time.
  3. The job runs until completion, including any retries (up to the configured max_retries setting).

This cycle repeats continuously until all jobs added to the Job Queue are completed.
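
For instance, step 2 is governed by the queue's max_concurrency and step 3 by the job-level max_retries setting referenced above. A minimal sketch of where those settings live in a job configuration (the queue name and values are illustrative; the full format is covered in the sections below):

entrypoint: python hello_world.py
# Retry budget for this job (step 3): the job is retried up to this many times on failure.
max_retries: 2

job_queue_config:
  job_queue_spec:
    job_queue_name: example-queue    # illustrative name
    execution_mode: FIFO
    # Step 2: at most this many jobs from the queue run on the cluster at once.
    max_concurrency: 1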

The lifecycle of the Job Queue cluster is as follows:

  1. A new Ray cluster is provisioned when the first job ready for execution is retrieved from the queue.
  2. Once all jobs in the queue are completed, the cluster enters an idle state.
  3. If no new jobs are submitted to the queue before the idle-termination timer expires, the Job Queue is closed and the corresponding cluster is terminated.
note

To disable automatic idle termination and control termination of the Job Queue's cluster manually, set job_queue_config.job_queue_spec.idle_timeout_sec to 0 in the Job Queue configuration (outlined below).
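
For example, a job_queue_spec fragment for a queue that never idle-terminates (the queue name is illustrative; the full configuration is outlined below):

job_queue_config:
  job_queue_spec:
    job_queue_name: long-lived-queue    # illustrative name
    execution_mode: FIFO
    # 0 disables idle termination; the queue and its cluster then have to be terminated manually.
    idle_timeout_sec: 0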

How to use managed job queues on Anyscale

Using Managed Job Queues on the Anyscale Platform is as easy as using standalone Anyscale Jobs: the only difference from the standalone workflow is that a Job Queue configuration needs to be specified.

Job Queues are created and managed as virtual resources by the Anyscale Platform, following a purely declarative style of defining resources.

Creating a managed job queue

To create a Job Queue, specify the Job Queue configuration in your job.yaml.

For example:

entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"

job_queue_config:
  # Job's relative priority (only relevant for Job Queues with "execution_mode" set to "PRIORITY").
  # NOTE: Valid values range from 0 (highest) to +inf (lowest). Default value is None.
  priority: 100
  job_queue_spec:
    # NOTE: The user-provided name has to be *unique* within the project (if any) so
    # that other jobs can be associated with the queue.
    job_queue_name: new-job-queue-1
    # Execution mode determines the scheduling policy for jobs in the queue.
    # Currently supported modes are:
    # - "FIFO" (first-in, first-out)
    # - "LIFO" (last-in, first-out)
    # - "PRIORITY" (scheduling based on the priority of each individual job)
    execution_mode: FIFO
    # Max number of jobs from the queue that can execute concurrently
    max_concurrency: 1
    # Max duration (in seconds) the queue and cluster may stay idle before being terminated
    # (set this to 0 to disable idle termination, in which case neither the queue nor the
    # cluster is closed/terminated automatically)
    idle_timeout_sec: 3600 # 1h

Only the first submitted job needs to provide a Job Queue specification (job_queue_spec) to create the Managed Job Queue. However, to make programmatic submission of jobs sharing the same YAML configuration template convenient, submitting an identical job_queue_spec with the same job_queue_name does not create a new queue; instead, the job is associated with the already existing one (as long as the queue's configuration does not change).
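
For example, submitting a second job with the same spec as in the example above attaches it to the existing new-job-queue-1 rather than creating another queue:

entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"

job_queue_config:
  job_queue_spec:
    # Identical to the spec submitted with the first job, so the existing queue is reused.
    job_queue_name: new-job-queue-1
    execution_mode: FIFO
    max_concurrency: 1
    idle_timeout_sec: 3600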

note

Submitting jobs with a different job_queue_spec but the same job_queue_name fails, because user-provided Job Queue names are enforced to be unique among active Job Queues (within the project).

info

For the complete set of configurations exposed in both the SDK and the YAML job configuration template (both follow the API model), refer to the API reference.

Adding a job to an existing Managed Job Queue

To submit job(s) to an existing Job Queue, specify the corresponding Job Queue identifier: either the user-provided job_queue_name (if specified during queue creation) or the automatically generated queue ID.

For example:

entrypoint: python hello_world.py
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"

job_queue_config:
  # User-provided identifier of the queue (`job_queue_spec.job_queue_name`)
  target_job_queue_name: new-job-queue-1

Job(s) are then submitted in the same way as a standalone Anyscale Job:

anyscale job submit job.yaml
(anyscale +1.2s) View the job in the UI at https://console.anyscale.com/jobs/prodjob_...
⠏ Waiting for a job run, current state is PENDING...

Configuring job queues

Similarly to Anyscale Jobs, Job Queues offer a great deal of flexibility, allowing you to precisely configure the compute configuration and cluster environment of the cluster that jobs run on.

note

Job Queue configuration is immutable: once a Job Queue is created, its configuration can't be changed; to use a new configuration, create a new Job Queue.

Compute configuration

The target Compute Configuration can be specified using the job_queue_spec.compute_config_id setting.
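
For example, a minimal fragment of the Job Queue specification with a target compute configuration (the queue name and the ID below are placeholders; use a compute config ID from your own account):

job_queue_config:
  job_queue_spec:
    job_queue_name: new-job-queue-2    # illustrative name
    execution_mode: FIFO
    # ID of the Compute Configuration the queue's cluster is provisioned with (placeholder value).
    compute_config_id: cpt_1234567890abcdef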

Cluster environment configuration

The Cluster Environment is not configured explicitly for Job Queues; instead, it needs to be specified as part of the individual Job's configuration.

note

Note that, currently, the Cluster Environment is fixed at cluster startup time and can't be changed later.

This means that the Cluster Environment of the Job Queue is also fixed, determined by the configuration of the Anyscale Job that creates the queue (the first job submitted, specifying job_queue_spec).

Support for running jobs with different Cluster Environments in the same Job Queue will be added in the future.
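
As a sketch of the current behavior, the Cluster Environment would be set on each job itself, for example through the job-level build_id field of the standard Anyscale Job configuration (the field name and the build ID below are assumptions for illustration; check the Anyscale Jobs reference for this docs version):

entrypoint: python hello_world.py
# Cluster environment build for this job; placeholder ID, and the field name is an
# assumption based on the standard Anyscale Job configuration.
build_id: bld_1234567890abcdef

job_queue_config:
  # Because the queue's Cluster Environment is fixed by the first submitted job,
  # every job targeting this queue currently needs the same Cluster Environment.
  target_job_queue_name: new-job-queue-1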

Job priority configuration

Job Queues with execution mode PRIORITY rely on the priority configured for each individual job to determine overall execution ordering.

Job priority is expected to be set in the range [0, 2^64], where

  1. 0 is the highest priority
  2. 2^64 is the lowest priority
note

Jobs of the same priority are executed in the order of their arrival into the queue (that is, FIFO).
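
For example, two jobs submitted to the same PRIORITY queue could be configured as follows (entrypoints and the queue name are illustrative; whenever both jobs are waiting in the queue, the priority-0 job is picked first):

# Job A - creates the queue; priority 0 is the highest.
entrypoint: python train.py    # illustrative entrypoint
job_queue_config:
  priority: 0
  job_queue_spec:
    job_queue_name: priority-queue-1    # illustrative name
    execution_mode: PRIORITY
    max_concurrency: 1

# Job B - targets the same queue; scheduled after Job A when both are waiting.
entrypoint: python evaluate.py    # illustrative entrypoint
job_queue_config:
  priority: 100
  target_job_queue_name: priority-queue-1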

Other configuration

note

Currently, the max_concurrency setting is limited to 1.