Production Jobs

info

This is a reference page. For a walkthrough of Production Jobs, check out the user guide.

note

Use of Production jobs requires Ray 1.9+.

Production jobs are discrete workloads managed by Anyscale. A developer designs and packages code for a job, and then submits the job to Anyscale for execution and cluster lifecycle management. These jobs are best suited to workflows where you want Anyscale to handle starting up the cluster and reacting to failures.

Submitting an Anyscale Job:

  1. Creates a cluster
  2. Runs the job on it
  3. Restarts the job on failure (up to the max_retries)
  4. Records the output and sends an email on success
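
The restart behavior in step 3 can be pictured with a small local sketch. This is not how Anyscale implements it; it only illustrates the retry semantics, and it assumes that max_retries counts restarts after the first attempt (so a job runs at most max_retries + 1 times). The function name run_with_retries is hypothetical.

```python
import subprocess


def run_with_retries(entrypoint: str, max_retries: int = 5) -> tuple:
    """Run a shell command, restarting it on failure.

    Returns (succeeded, attempts_made). Assumption: one initial attempt
    plus up to max_retries restarts.
    """
    attempts = 0
    for _ in range(max_retries + 1):
        attempts += 1
        # The entrypoint can be any shell command, so run it through the shell.
        result = subprocess.run(entrypoint, shell=True)
        if result.returncode == 0:
            return True, attempts
    # Every attempt exited with a non-zero code: the job is considered failed.
    return False, attempts
```

A command that always fails with max_retries=2 is attempted three times in this sketch before it is reported as failed.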

Development to Production

Production jobs, or Anyscale jobs, are related to but distinct from Ray Jobs.
To learn more about using Ray Jobs for development on Anyscale, see the development guide.

We recommend iterating using Ray Jobs, and moving to Anyscale Jobs once you have a workload that is tested and ready to run in production.

Configuring production jobs

Define

When you submit a production job, you must provide the following:

  • compute_config A cluster compute config for the cluster the job will run on.
    • On the SDK, this can be specified as compute_config_id or compute_config (a one-off compute config). This is required, and only one of these fields can be specified.
    • On the CLI, you may specify compute_config (the name of the cluster compute config or a one-off) or cloud (the name of an Anyscale cloud) instead for convenience. Both attributes are optional. If neither attribute is specified, the service will use a default compute config that is associated with the default cloud.
  • (Required on the SDK, optional on the CLI) cluster_env A cluster environment for the cluster the job will run on. On the SDK, this can only be specified as build_id. On the CLI, you may specify cluster_env (the name and version of the cluster environment, colon-separated; if you omit the version, the latest is used). The SDK example below shows how to resolve a missing value, or a cluster_env, into a build_id.
  • (Optional) runtime_env A runtime environment containing your application code and dependencies.
  • (Required) entrypoint The entrypoint command that will be run on the cluster to run the job. Although it is generally a Python script, the entrypoint can be any shell command.
  • (Optional) max_retries A maximum number of retries before the job is considered failed (defaults to 5).
  • (Optional) name Name of the job. Job names can be the same within a project and across projects.
  • (Optional) project_id The id of the project you want the job to run in. project_id can be found in the URL by navigating to the project in Anyscale Console.
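
As a rough illustration of the rules in the list above, the following sketch checks an SDK-style job configuration held in a plain dictionary and splits a CLI-style cluster_env value into name and version. Both helper names are hypothetical and are not part of the Anyscale SDK; the checks only mirror what the list states.

```python
from __future__ import annotations


def parse_cluster_env(spec: str) -> tuple[str, int | None]:
    """Split 'name:version' into (name, version).

    The version is None when omitted, meaning "use the latest".
    """
    name, sep, version = spec.partition(":")
    if not name:
        raise ValueError("cluster_env needs a name")
    return name, int(version) if sep else None


def check_sdk_job_config(config: dict) -> list[str]:
    """Return a list of problems found in an SDK-style job config."""
    problems = []
    if not config.get("entrypoint"):
        problems.append("entrypoint is required")
    # Exactly one of the two compute config fields must be present.
    if ("compute_config_id" in config) == ("compute_config" in config):
        problems.append("specify exactly one of compute_config_id or compute_config")
    # On the SDK, the cluster environment is given as a build_id.
    if "build_id" not in config:
        problems.append("build_id is required on the SDK")
    return problems
```

For example, parse_cluster_env("my-cluster-env:5") yields the name and the version 5, while parse_cluster_env("my-cluster-env") yields the name and None.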
note

The working_dir option of the runtime_env, typically containing your job script and other necessary files, can be specified in one of two ways.

The first option is to set working_dir to be a remote URI: either a zip file preexisting in an AWS S3 or Google Cloud Storage bucket, or a zip file accessible over HTTP/S (e.g., a GitHub download URL). The zip file must contain only a single top-level directory; see Remote URIs for details.
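
One way to produce a zip file with a single top-level directory is sketched below using only the Python standard library. The helper names are hypothetical, and uploading the resulting file to a bucket is left out. Write the zip file outside the directory being packaged, otherwise the archive may try to include itself.

```python
import zipfile
from pathlib import Path


def zip_working_dir(src_dir: str, zip_path: str) -> None:
    """Zip src_dir so that the archive holds one top-level directory."""
    src = Path(src_dir).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Prefix every entry with the directory's own name.
                arcname = (Path(src.name) / path.relative_to(src)).as_posix()
                zf.write(path, arcname)


def top_level_entries(zip_path: str) -> set:
    """Return the distinct top-level names inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist()}
```

Checking that top_level_entries returns exactly one name before uploading is a cheap way to catch a badly structured archive.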

The second option is to set working_dir to be a local directory. If you do this, you are responsible for providing external storage: your runtime_env must include the field upload_path specifying an Amazon S3 or Google Cloud Storage bucket or subdirectory. Your local directory will then be uploaded to remote storage and downloaded by the cluster before running the job. External storage allows Anyscale to start a new cluster with your working_dir in the case of a job failure.

Example: runtime_env = {"working_dir": "/User/code", "upload_path": "s3://my-bucket/subdir"}
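
The two options can be told apart mechanically. The sketch below is a local pre-submission check, not Anyscale's own validation. It assumes that remote URIs use the s3, gs, http, or https schemes and that upload_path uses s3 or gs; the function name is hypothetical.

```python
from urllib.parse import urlparse

# Assumption: URI schemes that count as "remote" for working_dir.
REMOTE_SCHEMES = {"s3", "gs", "http", "https"}
# Assumption: URI schemes accepted for upload_path.
UPLOAD_SCHEMES = {"s3", "gs"}


def classify_working_dir(runtime_env: dict) -> str:
    """Return 'none', 'remote', or 'local'; raise if upload_path is missing."""
    working_dir = runtime_env.get("working_dir")
    if working_dir is None:
        return "none"
    if urlparse(working_dir).scheme in REMOTE_SCHEMES:
        return "remote"
    # A local directory needs external storage to be uploaded to.
    upload_path = runtime_env.get("upload_path", "")
    if urlparse(upload_path).scheme not in UPLOAD_SCHEMES:
        raise ValueError("a local working_dir requires an s3:// or gs:// upload_path")
    return "local"
```

With the example above, the runtime_env is classified as local and passes the check because upload_path points at an S3 location.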

Whether using the first or second option, the cluster running the job must have permissions to download from the bucket or URL. For more on permissions, see our Cloud Access Overview, or Using a Local Directory with Anyscale Jobs and Services on GCP if using GCP.

The upload_path field is not available in OSS Ray. See Runtime Environments in Anyscale Production Jobs and Services for more differences with runtime_env when using Production Jobs and Services.

These options are specified in a configuration file:

my_production_job.yaml
name: "my-first-job"
project_id: "prj_7S7Os7XBvO6vdiVC1J0lgj"
compute_config: my-compute-config # You may specify `compute_config_id` or `cloud` instead
# Alternatively, a one-off compute config
# compute_config:
#   cloud_id: cld_4F7k8814aZzGG8TNUGPKnc
#   region: us-west-2
#   head_node_type:
#     name: head
#     instance_type: m5.large
#   worker_node_types: []
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_job_files.zip"
  # You may also specify other runtime environment properties like `pip` and `env_vars`
entrypoint: "python my_job_script.py --option1=value1"
max_retries: 3

Submit

To submit your job to Anyscale, use the CLI or the Python SDK:

anyscale job submit my_production_job.yaml \
--name my-production-job \
--description "A production job running on Anyscale."
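
If you want to trigger the same CLI command from a Python script (for example, from a CI step), one option is to assemble the argument list and hand it to subprocess. This is a thin wrapper around the CLI invocation shown above, not the Anyscale Python SDK; the function name is hypothetical, and it assumes the anyscale CLI is installed and authenticated.

```python
import subprocess


def build_submit_command(config_path: str, name: str = None,
                         description: str = None) -> list:
    """Assemble the argument list for `anyscale job submit`."""
    cmd = ["anyscale", "job", "submit", config_path]
    if name:
        cmd += ["--name", name]
    if description:
        cmd += ["--description", description]
    return cmd


def submit(config_path: str, **kwargs) -> None:
    """Run the CLI; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_submit_command(config_path, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems in values such as the description.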

Monitor

You can check the status of a job on the Web UI, or query it using the CLI or the Python SDK:

anyscale job list --job-id [JOB_ID]

Terminate

You can terminate a job using the CLI or the Python SDK:

anyscale job terminate --job-id [JOB_ID]

Job Outputs

caution

Job Outputs is in Alpha, and is supported in Anyscale CLI >= 0.5.32. Please contact support for any feedback on this feature.

Job Outputs are stored in Anyscale's cloud account. We do not recommend storing sensitive data in your Job output.

Anyscale Jobs can emit JSON output when they finish. Here is an example output:

{
  "accuracy": 0.67,
  "model_path": "s3://ml-models/latest"
}

To see an end-to-end example of Job Outputs, please read the usage guide.

Restrictions

  1. Outputs must be JSON dictionaries.
  2. Outputs must be less than 1MB in size.
  3. Outputs are stored in Anyscale's cloud account, with a retention of 60 days.
  4. Outputs cannot be overwritten. Each job can only write output once.
  5. Outputs must be submitted from an Anyscale Job entrypoint.
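
Because an output can be written only once, it may be worth checking the first two restrictions locally before submitting. The sketch below does that with the standard library. It assumes "1MB" means 1,000,000 bytes of UTF-8 text, which the restriction list does not spell out, and the function name is hypothetical.

```python
import json

# Assumption: the 1MB limit is interpreted as 10**6 bytes.
MAX_OUTPUT_BYTES = 1_000_000


def validate_job_output(raw: str) -> dict:
    """Parse raw JSON text and check it against the size and type rules."""
    if len(raw.encode("utf-8")) >= MAX_OUTPUT_BYTES:
        raise ValueError("output must be less than 1MB")
    value = json.loads(raw)
    # A JSON dictionary is a JSON object, which Python parses into a dict.
    if not isinstance(value, dict):
        raise ValueError("output must be a JSON dictionary")
    return value
```

A JSON list or a bare string parses successfully but is rejected here, since only dictionaries are accepted.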

Submit an output

There are two APIs for submitting an output.

important

These APIs should be invoked from the entrypoint of your Production Job. From any other context, they will just print the output, and not record it.

Each of these APIs can only be called once per Production Job.

# From a JSON file
anyscale job output write -f output.json

# From a string
anyscale job output write '{"model_path": "s3://ml-models/latest"}'

# From stdout of your script
python script_that_prints_json.py | anyscale job output write
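
For the stdout variant, the script must print nothing but the JSON document to stdout. A minimal sketch of what script_that_prints_json.py could look like follows; the metric values are placeholders taken from the example output above, and the helper name is hypothetical.

```python
import json
import sys


def build_output(accuracy: float, model_path: str) -> str:
    """Serialize the job's results as a JSON dictionary."""
    return json.dumps({"accuracy": accuracy, "model_path": model_path})


if __name__ == "__main__":
    # Send log messages to stderr so they do not end up in the piped output.
    print("training finished", file=sys.stderr)
    # Only the JSON document goes to stdout.
    print(build_output(0.67, "s3://ml-models/latest"))
```

Keeping logs on stderr matters because anything written to stdout becomes part of what the pipe delivers.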

Read the output

You can view your Job Output on the UI page for your Job.

You can also fetch it programmatically:

anyscale job output get --id JOB_ID