Job queues
A job queue enables sophisticated scheduling and execution algorithms for Anyscale Jobs. This feature improves resource utilization and reduces provisioning times by enabling multiple jobs to share a single cluster.
Anyscale supports flexible scheduling algorithms, including FIFO (first-in, first-out), LIFO (last-in, first-out), and priority-based scheduling.
Job processing
A job queue runs multiple jobs on the same cluster to improve resource utilization and throughput. Each job passes through the following stages:
- Submission: The typical Anyscale Job submission workflow adds the job to the specified queue.
- Scheduling: Based on the scheduling policy, Anyscale determines the order of the jobs in the queue and picks jobs from the top of the queue for scheduling. Anyscale runs no more than max_concurrency jobs on a cluster at the same time.
- Execution: Jobs run until completion, including retries up to the specified number of max_retries.
Anyscale provisions a cluster when you submit the first job in a queue. The cluster keeps running until the queue has no more jobs and the cluster goes idle.
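The ordering policies described above can be illustrated with a small, self-contained sketch. This isn't Anyscale's scheduler: the function names schedule_order and next_batch are hypothetical, and the sketch treats the queue as a static snapshot of (name, priority) pairs in arrival order. It only shows how FIFO, LIFO, and PRIORITY ordering differ and how max_concurrency caps the number of jobs picked at once.

```python
def schedule_order(jobs, execution_mode="FIFO"):
    """Return job names in the order a queue with this policy would pick them.

    `jobs` is a list of (name, priority) tuples in arrival order.
    Illustrative only; not the Anyscale implementation.
    """
    indexed = list(enumerate(jobs))  # (arrival_index, (name, priority))
    if execution_mode == "FIFO":
        ordered = indexed
    elif execution_mode == "LIFO":
        ordered = indexed[::-1]
    elif execution_mode == "PRIORITY":
        # Lower number means higher priority (0 is highest).
        # Arrival index breaks ties, so equal priorities run in arrival order.
        ordered = sorted(indexed, key=lambda item: (item[1][1], item[0]))
    else:
        raise ValueError(f"unknown execution_mode: {execution_mode}")
    return [name for _, (name, _priority) in ordered]


def next_batch(jobs, execution_mode, max_concurrency):
    """Return the jobs that would run first, capped at max_concurrency."""
    return schedule_order(jobs, execution_mode)[:max_concurrency]


# Four jobs in arrival order with their priorities.
queue = [("a", 100), ("b", 0), ("c", 100), ("d", 50)]
print(schedule_order(queue, "FIFO"))
print(schedule_order(queue, "LIFO"))
print(schedule_order(queue, "PRIORITY"))
print(next_batch(queue, "PRIORITY", max_concurrency=2))
```

In the PRIORITY case, jobs a and c share priority 100, so the arrival index decides which of the two runs first.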
Create a job queue
Creating a job queue is similar to creating a standalone Anyscale Job. In your job.yaml file, specify additional job queue configurations:
- CLI
- Python SDK
Use the following syntax to create a job config YAML file that includes a job queue config:
entrypoint: python main.py
name: <job-name>
# Specify a compute_config for the cluster.
# compute_config must be the same for all jobs in a given queue.
# Each job can use a different image_uri but all images must use the same Ray and Python versions.
compute_config: <compute-config-name>:<version>
image_uri: <image-uri>:<version>
job_queue_config:
  priority: 100 # Valid when `execution_mode: PRIORITY`; 0 is highest priority, 2^64 is lowest. Jobs of equal priority execute in arrival order.
  job_queue_spec:
    name: <job-queue-name>
    execution_mode: PRIORITY # Scheduling algorithm; can also be FIFO (first-in, first-out) or LIFO (last-in, first-out).
    max_concurrency: 5 # Max number of jobs that can run concurrently; limit 100.
    idle_timeout_s: 3600 # Set to 0 to disable idle termination.
Submit your job using the following CLI command:
anyscale job submit --config-file /path/to/job-config.yaml
Use the following syntax to define and submit a job config with a job queue config using the SDK:
import anyscale
from anyscale.job.models import JobConfig, JobQueueConfig, JobQueueSpec, JobQueueExecutionMode

config = JobConfig(
    name="<job-name>",
    entrypoint="python main.py",
    compute_config="<compute-config-name>:<version>",
    image_uri="<image-uri>:<version>",
    job_queue_config=JobQueueConfig(
        # Valid when `execution_mode: PRIORITY`; 0 is highest priority, 2^64 is lowest.
        # Jobs of equal priority execute in arrival order.
        priority=100,
        job_queue_spec=JobQueueSpec(
            name="<job-queue-name>",
            # Scheduling algorithm; can also be FIFO (first-in, first-out) or LIFO (last-in, first-out).
            execution_mode=JobQueueExecutionMode.PRIORITY,
            # Max number of jobs that can run concurrently; limit 100.
            max_concurrency=5,
            # Set to 0 to disable idle termination.
            idle_timeout_s=3600,
        ),
    ),
)

anyscale.job.submit(config)
Replace the following:
- <job-name>: (Optional) Name for the job.
- <compute-config-name>:<version>: Name of an existing registered compute config with a version number. The default is the latest version.
- <image-uri>:<version>: URI of an existing image with a version number. The default is the latest version.
- <job-queue-name>: Name of the job queue that Anyscale uses to add other jobs to this queue.
See the API reference for JobQueueConfig and JobQueueSpec.
If this is the first job for a job queue, Anyscale creates a new cluster based on job_queue_spec, compute_config, and image_uri. For subsequent jobs, use the same config to associate them with the existing queue.
The Ray and Python versions from the container image in your first job define the version requirements for your job queue. Job queues only support container images that use the same Ray and Python versions.
The submission fails if you submit a job with the same job queue name but a different job_queue_spec or compute_config.
If you don't specify compute_config or image_uri, Anyscale uses the defaults for the current workspace or cloud.
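The matching rule for job_queue_spec and compute_config can be checked locally before submitting. The following is a minimal sketch, not part of the Anyscale SDK: the helper queue_conflicts is hypothetical and works on plain dictionaries that mirror the YAML layout shown earlier. It assumes that settings omitted from the new job don't conflict with the queue.

```python
def queue_conflicts(first_job, new_job):
    """Return queue-level settings in new_job that differ from first_job.

    Both arguments are dicts shaped like the job config YAML.
    Hypothetical helper; Anyscale performs its own validation at submission.
    """
    conflicts = []

    # compute_config must be the same for all jobs in a given queue.
    if "compute_config" in new_job and new_job["compute_config"] != first_job.get("compute_config"):
        conflicts.append("compute_config")

    # job_queue_spec must match the spec used to create the queue.
    first_spec = first_job.get("job_queue_config", {}).get("job_queue_spec")
    new_spec = new_job.get("job_queue_config", {}).get("job_queue_spec")
    if new_spec is not None and new_spec != first_spec:
        conflicts.append("job_queue_spec")

    return conflicts


first = {
    "compute_config": "my-compute:1",
    "job_queue_config": {
        "job_queue_spec": {"name": "my-queue", "execution_mode": "PRIORITY", "max_concurrency": 5},
    },
}
changed = {
    "compute_config": "my-compute:1",
    "job_queue_config": {
        "job_queue_spec": {"name": "my-queue", "execution_mode": "PRIORITY", "max_concurrency": 10},
    },
}
print(queue_conflicts(first, changed))
```

The names my-compute and my-queue are placeholders. A non-empty result means the submission would likely be rejected for the reasons described above.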
Add jobs to an existing queue
If you have an existing job queue, you can add new jobs by specifying the target_job_queue_name in the job config.
You can deploy each job in your job queue with a different image_uri.
If you specify an image_uri that differs from the container image used by the job that created the job queue, you can't specify other runtime environment settings such as working_dir, env_vars, and requirements when submitting jobs to that queue. When jobs in a queue use different images, you must package all code dependencies into your container images. See Refactor development patterns to define custom images.
If all jobs in a job queue use the same container image, you can use other runtime environment settings.
The following example syntax demonstrates this pattern and omits the compute config, Ray version, and most job queue config options. If you include these configuration options, they must match the configurations used by the job that created your job queue cluster.
- CLI
- Python SDK
Use the following syntax to create a job config YAML file that adds a job to an existing job queue:
name: <new-job-name>
entrypoint: python hello_world.py
image_uri: <new-image-uri>:<version>
job_queue_config:
  priority: 100
  target_job_queue_name: <job-queue-name>
Submit your job using the following CLI command:
anyscale job submit --config-file /path/to/job-config.yaml
Use the following syntax to add a job to an existing job queue using the SDK:
import anyscale
from anyscale.job.models import JobConfig, JobQueueConfig

config = JobConfig(
    name="<new-job-name>",
    entrypoint="python hello_world.py",
    image_uri="<new-image-uri>:<version>",
    job_queue_config=JobQueueConfig(
        priority=100,
        target_job_queue_name="<job-queue-name>",
    ),
)

anyscale.job.submit(config)
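The image and runtime environment rule from this section can also be checked before submission. The sketch below is illustrative and not an Anyscale API: the helper disallowed_runtime_settings and the constant RUNTIME_ENV_KEYS are hypothetical, the key list covers only the three settings named above, and the sketch assumes that a job without an explicit image_uri uses the same image as the queue.

```python
# Runtime environment settings named in this section; the real list may be longer.
RUNTIME_ENV_KEYS = ("working_dir", "env_vars", "requirements")


def disallowed_runtime_settings(queue_image_uri, job):
    """Return runtime environment keys that the job shouldn't set.

    `queue_image_uri` is the image used by the job that created the queue.
    `job` is a dict shaped like the job config YAML.
    Assumption: a job that omits image_uri uses the queue's image.
    """
    job_image = job.get("image_uri", queue_image_uri)
    if job_image == queue_image_uri:
        # Same image for all jobs: other runtime settings are allowed.
        return []
    # Different image: dependencies must be packaged into the image instead.
    return [key for key in RUNTIME_ENV_KEYS if key in job]


job = {
    "name": "nightly-report",
    "image_uri": "example/image:2",
    "working_dir": ".",
    "env_vars": {"MODE": "prod"},
}
print(disallowed_runtime_settings("example/image:1", job))
print(disallowed_runtime_settings("example/image:2", job))
```

The image names and the job name are placeholders. An empty list means the job config is consistent with the rule as described here.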
Terminate all jobs in a queue
To terminate all running jobs in the queue, use the Terminate running jobs button in the upper-right corner of the Job queue page. Note that Anyscale doesn't terminate pending jobs.
Terminate a specific job in a queue
If the job is still Pending, you can terminate it from the Job page or by using the CLI:
anyscale job terminate --id 'prodjob_...'
If the job is Running, you need to terminate it in the Anyscale terminal:
- Go to the Job page, click the Ray dashboard tab, then click the Jobs tab. Find and copy the Submission ID for the job you want to terminate.
- Open the Terminal tab and run ray job stop 'raysubmit_...'.