Create and manage jobs

Submit a job

To submit your job to Anyscale, use the Python SDK or CLI and pass in any additional options or configurations for the job.

By default, Anyscale uses your workspace or cloud to provision a cluster to run your job. You can define a custom cluster through a compute config or specify an existing cluster.

After you submit the job, Anyscale runs the entrypoint command, which is typically a Ray Job. If the run doesn't succeed, Anyscale restarts the job with the same entrypoint, up to max_retries times.

anyscale job submit --name=my-job-name \
--working-dir=. --max-retries=5 \
--image-uri="anyscale/image/IMAGE_NAME:VERSION" \
--compute-config=COMPUTE_CONFIG_NAME \
-- python main.py

With the CLI, you can either specify an existing compute config with --compute-config=COMPUTE_CONFIG_NAME or define a new one in a job YAML.

For more information on submitting jobs with the CLI, see the reference docs.
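
When job submission is scripted, for example from CI, the CLI call above can be assembled programmatically. The following is a minimal sketch, not part of the Anyscale SDK: build_submit_command and submit are hypothetical helpers that use only the flags shown in this section, and running the command assumes the anyscale CLI is installed and authenticated.

```python
import shlex
import subprocess
from typing import List, Optional


def build_submit_command(
    name: str,
    entrypoint: List[str],
    working_dir: str = ".",
    max_retries: Optional[int] = None,
    image_uri: Optional[str] = None,
    compute_config: Optional[str] = None,
) -> List[str]:
    """Assemble an `anyscale job submit` argument list (hypothetical helper)."""
    cmd = ["anyscale", "job", "submit", f"--name={name}", f"--working-dir={working_dir}"]
    if max_retries is not None:
        cmd.append(f"--max-retries={max_retries}")
    if image_uri is not None:
        cmd.append(f"--image-uri={image_uri}")
    if compute_config is not None:
        cmd.append(f"--compute-config={compute_config}")
    # Everything after "--" is the entrypoint that Anyscale runs.
    return cmd + ["--"] + list(entrypoint)


def submit(cmd: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


command = build_submit_command(
    name="my-job-name",
    entrypoint=["python", "main.py"],
    max_retries=5,
    compute_config="COMPUTE_CONFIG_NAME",
)
# Print the command for review; call submit(command) to actually run it.
print(shlex.join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a job name or entrypoint contains spaces.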

Tip: For large-scale, compute-intensive jobs, avoid scheduling Ray tasks onto the head node, because it manages cluster-level orchestration. To do so, set the CPU resource on the head node to 0 in your compute config.
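
As an illustration of this tip, the sketch below sets the head node's CPU resource to 0 in an inline compute config represented as a Python dict. The resources key and its CPU entry are assumptions about the ComputeConfig schema rather than something stated on this page, so check the ComputeConfig reference for the exact field names. disable_head_node_scheduling is a hypothetical helper, and the instance types are only examples.

```python
import copy
import json


def disable_head_node_scheduling(compute_config: dict) -> dict:
    """Return a copy of the config whose head node advertises 0 CPUs."""
    updated = copy.deepcopy(compute_config)
    head_node = updated.setdefault("head_node", {})
    # Assumed schema: head_node.resources.CPU is the CPU resource that
    # Ray sees on the head node. Verify against the ComputeConfig reference.
    head_node.setdefault("resources", {})["CPU"] = 0
    return updated


inline_config = {
    "head_node": {"instance_type": "m5.8xlarge"},
    "worker_nodes": [
        {"instance_type": "m5.4xlarge", "min_nodes": 1, "max_nodes": 5},
    ],
}

updated_config = disable_head_node_scheduling(inline_config)
print(json.dumps(updated_config["head_node"], indent=2))
```

The helper copies the input so the original config stays unchanged and can be reused for jobs that do want tasks on the head node.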

Define a job

You can define jobs in a YAML configuration file and submit them using the CLI.

Create your job configuration in a YAML file:

# my-job.yaml
name: my-job-name
entrypoint: python main.py

# Container image: can be an Anyscale base image, an image from an external registry, or a custom image
image_uri: anyscale/ray:2.51.1-slim-py312-cu128
# OR use containerfile: ./Dockerfile (see the "Use a custom container" section below)

# Compute config: can be the name of an existing config OR an inline definition
# Use existing: compute_config: my-compute-config:1
# Or define inline as shown below
compute_config:
  head_node:
    instance_type: m5.8xlarge
  worker_nodes:
    - instance_type: m5.4xlarge
      min_nodes: 1
      max_nodes: 5

working_dir: . # Local directory to upload (defaults to current directory)
requirements: # Python dependencies: can be a list or a path to requirements.txt
  - numpy==1.26.4
  - pandas==2.3.3
env_vars:
  MY_ENV_VAR: production
max_retries: 3
tags:
  team: ml-ops

Container images: See "What is a container image?" for working with Anyscale base images, custom images, and external registries.
Compute configs: See ComputeConfig for the inline schema, or learn how to create compute configs in the console or with the CLI/SDK.

For the complete list of available fields, see JobConfig reference.

Submit your job:

anyscale job submit --config-file my-job.yaml

You can override values from your YAML config file with command-line flags:

anyscale job submit --config-file my-job.yaml --name override-job --max-retries 5
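
The override behavior can be pictured as a merge in which any flag given on the command line replaces the matching field from the YAML file. The sketch below only illustrates that precedence with plain dicts. apply_overrides is a hypothetical helper, not how the Anyscale CLI is implemented.

```python
from typing import Any, Dict, Optional


def apply_overrides(
    file_config: Dict[str, Any],
    cli_flags: Dict[str, Optional[Any]],
) -> Dict[str, Any]:
    """Merge CLI flags over file values; flags left unset (None) are ignored."""
    effective = dict(file_config)
    for field, value in cli_flags.items():
        if value is not None:
            effective[field] = value
    return effective


# A few of the fields from my-job.yaml.
file_config = {"name": "my-job-name", "entrypoint": "python main.py", "max_retries": 3}
# Values passed as --name override-job --max-retries 5; no entrypoint given.
cli_flags = {"name": "override-job", "max_retries": 5, "entrypoint": None}

effective = apply_overrides(file_config, cli_flags)
print(effective)
```

Fields that have no matching flag, such as the entrypoint here, keep the value from the file.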

Wait on a job

You can block CLI and SDK commands until a job enters a specified state. By default, commands wait for JobState.SUCCEEDED. See all available states in the reference docs.

anyscale job wait -n my-job-name

When you submit a job, you can pass --wait, which blocks until the job succeeds and exits if the job fails.

anyscale job submit -n my-job-name --wait -- sleep 30

For more information on submitting jobs with the CLI, see the reference docs.
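
Conceptually, waiting on a job is a polling loop: check the job's state, return once it reaches the target state, and stop early if it fails. The sketch below illustrates that logic only. get_state is a hypothetical callable standing in for a real status lookup, and the FAILED state name, the placeholder intermediate states, the poll interval, and the timeout are all assumptions. See the reference docs for the actual states.

```python
import time
from typing import Callable, FrozenSet


def wait_for_state(
    get_state: Callable[[], str],
    target: str = "SUCCEEDED",
    failure_states: FrozenSet[str] = frozenset({"FAILED"}),
    poll_interval_s: float = 5.0,
    timeout_s: float = 1800.0,
) -> str:
    """Poll get_state() until the target state, a failure state, or the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == target:
            return state
        if state in failure_states:
            raise RuntimeError(f"job entered state {state} before reaching {target}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job did not reach {target} within {timeout_s} seconds")
        time.sleep(poll_interval_s)


# Usage with a fake state sequence in place of a real status lookup.
# "STARTING" and "RUNNING" are placeholder names for non-terminal states.
fake_states = iter(["STARTING", "RUNNING", "SUCCEEDED"])
final_state = wait_for_state(lambda: next(fake_states), poll_interval_s=0.0)
print(final_state)
```

Raising on a failure state instead of continuing to poll mirrors the --wait behavior described above, where the command exits if the job fails.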

Terminate a job

You can terminate a job from the Job page or using the CLI/SDK:

anyscale job terminate --id 'prodjob_...'

For more information on terminating jobs with the CLI, see the reference docs.

Archive a job

Archiving jobs hides them from the job list page, but you can still access them through the CLI and SDK. Anyscale automatically archives the cluster associated with an archived job.

A job must be in a terminal state before you can archive it. To archive a job, you must be the user who created it or an organization admin.

You can archive jobs in the Anyscale console or through the CLI/SDK:

anyscale job archive --id 'prodjob_...'

For more information on archiving jobs with the CLI, see the reference docs.

Manage dependencies

When developing Anyscale jobs, you may need to include additional Python packages or system-level dependencies. There are several ways to manage these dependencies:

Use a requirements.txt file

The simplest way to manage Python package dependencies is by using a requirements.txt file.

  1. Create a requirements.txt file in your project directory:

    emoji==2.12.1
    numpy==1.21.0

  2. When submitting your job, include the -r or --requirements flag:

    anyscale job submit --config-file my-job.yaml -r ./requirements.txt

This method works well for straightforward Python package dependencies. Anyscale installs these packages in the job's environment before running your code.

Use a custom container

For more complex dependency management, including system-level packages or specific environment configurations, use a custom container:

  1. Create a Dockerfile:

    FROM anyscale/ray:2.10.0-py310

    # Install system dependencies if needed
    RUN apt-get update && apt-get install -y <your-system-packages>

    # Install Python dependencies
    COPY requirements.txt /tmp/
    RUN pip install -r /tmp/requirements.txt

  2. Build and submit the job with the custom container:

    anyscale job submit --config-file my-job.yaml --containerfile Dockerfile

This method gives you full control over the job's environment, allowing you to install both system-level and Python packages.

Use pre-built custom images

For frequently used environments, you can build and reuse custom images:

  1. Build the image:

    anyscale image build -n my-custom-image --containerfile Dockerfile

  2. Use the built image in your job submission:

    anyscale job submit --config-file my-job.yaml --image-uri anyscale/image/my-custom-image:1

This approach is efficient for teams working on multiple jobs that share the same dependencies.