Version: 1.0.0

Create Anyscale Jobs

note

Use of Anyscale Jobs requires Ray 2.0+.

The typical lifecycle of an Anyscale Job looks like the following:

  1. Provisions a cluster (based on the provided compute configuration)
  2. Runs the specified entrypoint command (typically a Ray job)
  3. Restarts the job in case of failures (up to max_retries)
  4. Records the output and sends notifications with the results of the job's run

Defining an Anyscale Job

Jobs can be submitted using either the CLI or the SDK. To submit a job using the CLI, specify the job configuration in a YAML config file like the following:

# (Required) User-provided identifier for the job
name: my-first-job

# (Required) Job's entrypoint
entrypoint: "python my_job_script.py --some-config=value"

# (Optional) Anyscale Project the job will be associated with
# The project can be identified using a user-provided name or an internal project id (of the form `prj_...`)
project_id: my-project

# (Optional) Compute Config specifies the configuration of the cluster (node types, min/max number of nodes, etc.) the job will run on.
# The compute config can be identified using a user-provided name or an internal compute config id (of the form `cpt_...`)
compute_config: my-compute-config

# Alternatively, a one-off compute config can be specified inline like the following:
# compute_config:
#   cloud_id: cld_4F7k8814aZzGG8TNUGPKnc
#   region: us-west-2
#   head_node_type:
#     name: head
#     instance_type: m5.large
#   worker_node_types: []

# (Optional) Cluster Environment specifies a (Docker-like) container image that will be used to execute the job.
# The cluster environment can be identified using a user-provided name and version or an internal build id (of the form `bld_...`)
cluster_env: my-cluster-env:5

# (Optional) Ray's Runtime Environment configuration is specified as-is under the `runtime_env` key:
# https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html
runtime_env:
  working_dir: "s3://my_bucket/my_job_files.zip"

# (Optional) Maximum number of retries for a job
max_retries: 3

Here are some of the fields to provide in the YAML file:

  • (Required) name Name of the Job. Job names can be the same within a project and across projects.
  • (Required) entrypoint The entrypoint command to run on the cluster to start the job (the entrypoint can be any shell command)
  • (Optional) project_id The id of the Project you want the Job to run in. project_id can be found in the URL by navigating to the project in the Anyscale Console. If not specified, the Job will not belong to any Project.
  • (Optional) compute_config A Compute Config of the cluster the Job will be executed on.
    • Using the SDK, this can be specified as either compute_config_id or compute_config (a one-off compute config). In the SDK this is required, and only one of the two fields can be specified.
    • Using the CLI, you may specify compute_config as a) the name of the compute config, b) Anyscale's internal compute config id, or c) an inline definition. If not provided, the default compute config will be used.
  • (Optional) cluster_env A Cluster Environment of the cluster the Job will run on.
    • Using the SDK, this needs to be specified as build_id.
    • Using the CLI, you may specify cluster_env as a) the name and version of the cluster environment (colon-separated; if you don't specify a version, the latest will be used) or b) Anyscale's internal cluster environment id.
      • Note that this attribute is optional in the CLI but currently must be specified when using the SDK. The example after this list shows how to resolve a missing value, or a cluster_env name, into a build_id.
  • (Optional) runtime_env Ray's runtime environment configuration.
  • (Optional) max_retries Number of retries in case of failures encountered during Job execution (defaults to 5).
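When using the SDK, a build_id has to be supplied explicitly. Below is a minimal sketch of resolving one with the Anyscale Python SDK; the AnyscaleSDK client exists in the anyscale package, but the get_default_cluster_environment_build method name and its arguments are assumptions based on the SDK reference, so verify them against your installed SDK version.

from anyscale import AnyscaleSDK

# Assumes ANYSCALE_CLI_TOKEN is set in the environment;
# alternatively pass the auth_token argument explicitly.
sdk = AnyscaleSDK()

# One way to obtain a build_id when no specific cluster environment is needed:
# fall back to a default build for a given Python/Ray version combination.
# (Method name and arguments are assumptions; check your SDK reference.)
build = sdk.get_default_cluster_environment_build(
    python_version="py39",
    ray_version="2.0.0",
)
build_id = build.result.id  # of the form "bld_..."

print(f"Using cluster environment build: {build_id}")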
info

For large-scale, compute-intensive Jobs, it's recommended to avoid scheduling Ray tasks onto the Ray head node so they don't interfere with Ray's control plane. To do that, set the CPU resource on the head node to 0 in your Compute Config.

This prevents Ray Actors and Tasks from being scheduled on the head node; a sketch of such a compute config is shown below.
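As an illustration, a compute config along these lines advertises 0 CPUs on the head node so Ray schedules nothing there. This is a sketch only; the exact shape of the resources block may vary between compute config versions, so check the Compute Config reference for your cloud.

# Sketch: keep Ray workloads off the head node.
cloud_id: cld_4F7k8814aZzGG8TNUGPKnc   # example cloud id, as above
region: us-west-2
head_node_type:
  name: head
  instance_type: m5.2xlarge
  resources:
    cpu: 0                             # advertise no CPUs to Ray
worker_node_types:
  - name: workers
    instance_type: m5.4xlarge
    min_workers: 1
    max_workers: 10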

note

The working_dir option of the runtime_env can be specified in two ways:

  • Remote URI: A zip file in a cloud storage bucket (AWS S3 or Google Cloud Storage) or directly accessible over HTTP/S (for example, a GitHub download URL). The zip file must contain only a single top-level directory. See Remote URIs for details.

  • Local directory: A local directory that will be uploaded to remote storage and downloaded by the cluster before running the Anyscale Job. The external storage location is specified using the upload_path field. For example, the upload_path could be an Amazon S3 or Google Cloud Storage bucket. You are responsible for ensuring that your local environment and the future cluster have network access and IAM permissions for the remote location specified. External storage allows Anyscale to start a new cluster with your working_dir in the case of a Job failure.

Example: runtime_env = {"working_dir": "/User/code", "upload_path": "s3://my-bucket/subdir"}

Whether using the first or second option, the cluster running the job must have permissions to download from the bucket or URL. For more on permissions, see accessing resources from cloud providers. The upload_path field is not available in OSS Ray. See Runtime Environments in Anyscale Jobs and Services for more differences with runtime_env when using Anyscale Jobs and Services.
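For reference, the same local-directory setup expressed in the job YAML looks roughly like the following; the path and bucket name are placeholders.

runtime_env:
  working_dir: "/Users/me/code"          # local directory to upload (placeholder)
  upload_path: "s3://my-bucket/subdir"   # bucket the cluster can access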

Submit and run

To submit your Job to Anyscale, use the CLI or the Python SDK:

anyscale job submit my_production_job.yaml --follow \
--name my-production-job \
--description "A production job running on Anyscale."
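The equivalent submission through the Python SDK looks roughly like the sketch below. It uses the AnyscaleSDK client and the CreateProductionJob / ProductionJobConfig models from the Anyscale Python SDK; the field names mirror the YAML above, but treat the exact import paths and model fields as assumptions to verify against your SDK version.

from anyscale import AnyscaleSDK
from anyscale.sdk.anyscale_client.models import (
    CreateProductionJob,
    ProductionJobConfig,
)

sdk = AnyscaleSDK()  # reads the auth token from the environment

job = sdk.create_job(
    CreateProductionJob(
        name="my-first-job",
        project_id="prj_...",  # optional internal project id
        config=ProductionJobConfig(
            entrypoint="python my_job_script.py --some-config=value",
            build_id="bld_...",           # required when using the SDK (see above)
            compute_config_id="cpt_...",  # or pass a one-off compute_config instead
            runtime_env={"working_dir": "s3://my_bucket/my_job_files.zip"},
            max_retries=3,
        ),
    )
)
print(job.result.id)  # prodjob_...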
info

Anyscale also supports scheduling jobs for recurring workloads. To schedule a Job, check out Anyscale Schedules.

Monitor Job Status

You can check the status of the job on Anyscale Platform's Job page or query it using the CLI/SDK:

anyscale job list --job-id 'prodjob_...'
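From the SDK, a status check might look like the sketch below, assuming a get_production_job method on the AnyscaleSDK client; confirm the method name and response shape against your SDK reference.

from anyscale import AnyscaleSDK

sdk = AnyscaleSDK()

# Use the prodjob_... id returned at submission time.
job = sdk.get_production_job(production_job_id="prodjob_...")
print(job.result.state)  # for example RUNNING or SUCCESS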

View Job logs

You can view logs of the job on the Ray Dashboard or follow them using the CLI/SDK:

anyscale job logs --job-id 'prodjob_...' --follow
note

By default (for self-hosted deployments), Anyscale does NOT collect or persist any application logs.

This means that logs become unavailable once the Ray cluster is shut down, unless a third-party logging solution (like CloudWatch, Datadog, etc.) is set up.

Anyscale also provides an option to ingest logs into a secure logging solution, providing easy access to application logs beyond the lifetime of the cluster right from the Anyscale Console. If you're interested, please reach out to have this feature enabled for your environment.

Terminate Job

You can terminate a Job from Anyscale Console's Job page or using the CLI/SDK:

anyscale job terminate --job-id 'prodjob_...'
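Through the SDK, termination is a single call, assuming a terminate_job method on the AnyscaleSDK client (an assumption to confirm against your SDK version):

from anyscale import AnyscaleSDK

sdk = AnyscaleSDK()
sdk.terminate_job(production_job_id="prodjob_...")  # assumed method name and argument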

Set Job Maximum Runtime

The cluster's maximum_uptime_minutes configuration, which you can specify in the Compute Config, also applies directly to Anyscale Jobs: clusters running Anyscale Jobs will be forcibly terminated after maximum_uptime_minutes, irrespective of the state of the job.

Upon hitting maximum_uptime_minutes, the job will be automatically retried if there are retry attempts remaining (configured via max_retries).

This feature is particularly useful for reclaiming the resources of jobs that haven't finished within their allocated time budget.
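For example, a Compute Config snippet along these lines would cap a job's cluster at two hours; maximum_uptime_minutes is the field described above, and the other values are placeholders.

maximum_uptime_minutes: 120   # terminate the cluster after 2 hours
head_node_type:
  name: head
  instance_type: m5.2xlarge
worker_node_types: []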

Archive Job

Anyscale allows you to archive Jobs to hide them from list view. The clusters associated with the archived Job will be archived automatically.

Once a Job is archived, it will be hidden from the Job list page on the Anyscale Console, but you will still be able to access its details via the CLI/SDK.

How to archive Job

  • To be archived, Jobs have to be in a terminal state (Terminated, Out of Retries, or Success).
  • The user must have "write" permission for the Job to archive it.

You can archive Jobs in Anyscale Console, or through the CLI/SDK:

anyscale job archive --job-id [JOB_ID]

How to view archived Jobs

You can list archived Jobs by toggling on the "include archived" filter in Anyscale Console, or using the CLI:

anyscale job list --include-archived