Production Jobs
This is a reference page. For a walkthrough of Production Jobs, check out the user guide.
Use of Production jobs requires Ray 1.9+.
Production jobs are discrete workloads managed by Anyscale. A developer designs and packages code for a job, then submits the job to Anyscale for execution and cluster lifecycle management. These jobs are best suited for workflows where you want Anyscale to handle starting up the cluster and reacting to failures.
Submitting an Anyscale Job:
- Creates a cluster
- Runs the job on it
- Restarts the job on failure (up to `max_retries` times)
- Records the output and sends an email on success
Development to Production
Production jobs, or Anyscale Jobs, are related to but distinct from Ray Jobs. You can learn more about Ray Jobs here.
We recommend iterating using Ray Jobs, and moving to Anyscale Jobs once you have a workload that is tested and ready to run in production.
See Workspaces to learn more about the recommended way to develop Ray applications on the Anyscale platform.
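For local iteration, a minimal sketch with Ray's job submission client looks like the following (assuming a recent Ray version where `ray.job_submission.JobSubmissionClient` is available; the dashboard address is a placeholder):

from ray.job_submission import JobSubmissionClient
# The dashboard address below is a placeholder; point it at your development cluster.
client = JobSubmissionClient("http://127.0.0.1:8265")
# Submit the same entrypoint you will later run as an Anyscale Job.
submission_id = client.submit_job(
    entrypoint="python my_job_script.py --option1=value1",
    runtime_env={"working_dir": "./"},
)
print(submission_id)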
Configuring production jobs
Define
When you submit a production job, you must provide the following:
- compute_config A cluster compute config for the cluster the job will run on.
  - On the SDK, this can be specified as `compute_config_id` or `compute_config` (a one-off compute config). This is required, and only one of these fields can be specified.
  - On the CLI, you may specify `compute_config` (the name of a cluster compute config, or a one-off) or `cloud` (the name of an Anyscale cloud) instead for convenience. Both attributes are optional. If neither attribute is specified, the service will use a default compute config associated with the default cloud.
- (Required) cluster_env A cluster environment for the cluster the job will run on. On the SDK, this can only be specified as `build_id`. On the CLI, you may specify `cluster_env` (the name and version of the cluster environment, colon-separated; if you don't specify a version, the latest is used). This attribute is optional in the CLI. The SDK example below shows how to resolve a missing value, or a `cluster_env`, into a `build_id`.
- (Optional) runtime_env A runtime environment containing your application code and dependencies.
- (Required) entrypoint The entrypoint command that will be run on the cluster to run the job. Although it is generally a Python script, the entrypoint can be any shell command.
- (Optional) max_retries The maximum number of retries before the job is considered failed (defaults to 5).
- (Optional) name The name of the job. Job names can be the same within a project and across projects.
- (Optional) project_id The ID of the project you want the job to run in. `project_id` can be found in the URL by navigating to the project in the Anyscale Console.
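As a quick orientation, a minimal job config using the SDK field names above might look like the following sketch; the IDs are placeholders, and the complete YAML and SDK examples appear later on this page:

# Minimal SDK-style job config; the IDs are placeholders.
job_config = {
    'compute_config_id': 'cpt_XXXXXXXXXXXXXXXXXXXXXXXX',  # required on the SDK
    'build_id': 'bld_XXXXXXXXXXXXXXXXXXXXXXXX',           # the cluster environment build
    'entrypoint': 'python my_job_script.py',              # any shell command
}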
The `working_dir` option of the `runtime_env`, typically containing your job script and other necessary files, can be specified in one of two ways.
The first option is to set `working_dir` to a remote URI: either a zip file preexisting in an AWS S3 or Google Cloud Storage bucket, or a zip file accessible over HTTP/S (e.g., a GitHub download URL). The zip file must contain only a single top-level directory; see Remote URIs for details.
The second option is to set `working_dir` to a local directory. If you do this, you are responsible for providing external storage: your `runtime_env` must include the field `upload_path` specifying an Amazon S3 or Google Cloud Storage bucket or subdirectory. Your local directory will then be uploaded to remote storage and downloaded by the cluster before running the job. External storage allows Anyscale to start a new cluster with your `working_dir` in the case of a job failure.
Example:
runtime_env = {"working_dir": "/User/code", "upload_path": "s3://my-bucket/subdir"}
Whether using the first or second option, the cluster running the job must have permissions to download from the bucket or URL. For more on permissions, see our Cloud Access Overview, or Using a Local Directory with Anyscale Jobs and Services on GCP if using GCP.
The `upload_path` field is not available in OSS Ray. See Runtime Environments in Anyscale Production Jobs and Services for more differences with `runtime_env` when using Production Jobs and Services.
These options are specified in a configuration file:
name: "my-first-job"
project_id: "prj_7S7Os7XBvO6vdiVC1J0lgj"
compute_config: my-compute-config # You may specify `compute_config_id` or `cloud` instead
# Alternatively, a one-off compute config:
# compute_config:
#   cloud_id: cld_4F7k8814aZzGG8TNUGPKnc
#   region: us-west-2
#   head_node_type:
#     name: head
#     instance_type: m5.large
#   worker_node_types: []
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_job_files.zip"
  # You may also specify other runtime environment properties like `pip` and `env_vars`
entrypoint: "python my_job_script.py --option1=value1"
max_retries: 3
Submit
To submit your job to Anyscale, use the CLI or the Python SDK:
- CLI
- Python SDK
anyscale job submit my_production_job.yaml \
--name my-production-job \
--description "A production job running on Anyscale."
from anyscale.sdk.anyscale_client.models import *
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
job_config = {
    # IDs can be found on Anyscale Console under Configurations.
    # The IDs below are examples and should be replaced with your own IDs.
    'compute_config_id': 'cpt_U8RCfD7Wr1vCD4iqGi4cBbj1',
    # The compute config can also be specified as a one-off instead:
    # 'compute_config': ClusterComputeConfig(
    #     cloud_id="cld_V1U8Jk3ZgEQQbc7zkeBq24iX",
    #     region="us-west-2",
    #     head_node_type=ComputeNodeType(
    #         name="head",
    #         instance_type="m5.large",
    #     ),
    #     worker_node_types=[],
    # ),
    # The id of the cluster env build
    'build_id': 'bld_1277XIinoJmiM8Z3gNdcHN',
    'runtime_env': {
        'working_dir': 's3://my_bucket/my_job_files.zip'
    },
    'entrypoint': 'python my_job_script.py --option1=value1',
    'max_retries': 3
}
job = sdk.create_job(CreateProductionJob(
    name="my-production-job",
    description="A production job running on Anyscale.",
    # project_id can be found in the URL by navigating to the project in Anyscale Console
    project_id='prj_7S7Os7XBvO6vdiVC1J0lgj',
    config=job_config
))
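The Monitor, Logs, and Terminate commands below all take the job's ID. Assuming the SDK response wraps the created job under a `result` field (an assumption about the generated models), you can read the ID off the return value:

# Assumes create_job wraps the created ProductionJob under `.result` (an SDK model assumption).
job_id = job.result.id
print(f"Submitted job {job_id}")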
Monitor
You can check the status of a job on the Web UI, or query it using the CLI or the Python SDK:
- CLI
- Python SDK
anyscale job list --job-id [JOB_ID]
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
job = sdk.get_production_job(JOB_ID)
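To block until the job finishes, you can poll `get_production_job` in a loop; note that the state attribute path and the terminal state names in this sketch are assumptions to verify against your SDK version:

import time
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
# The terminal state names and the attribute path `result.state.current_state`
# are assumptions; check them against your SDK version.
TERMINAL_STATES = {"SUCCESS", "TERMINATED", "OUT_OF_RETRIES"}
while True:
    job = sdk.get_production_job(JOB_ID)  # JOB_ID as in the snippet above
    state = job.result.state.current_state
    print(f"Job state: {state}")
    if state in TERMINAL_STATES:
        break
    time.sleep(30)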
Logs
You can view the logs of a job on the Web UI, or follow them using the CLI or the Python SDK:
- CLI
- Python SDK
anyscale job logs --job-id <JOB_ID> --follow
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
job_logs = sdk.get_production_job_logs(JOB_ID)
Terminate
You can terminate a job using the CLI or the Python SDK:
- CLI
- Python SDK
anyscale job terminate --job-id [JOB_ID]
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
sdk.terminate_job(JOB_ID)
Job Outputs
Job Outputs is in Alpha, and is supported in Anyscale CLI >= 0.5.32. Please contact support for any feedback on this feature.
Job Outputs are stored in Anyscale's cloud account. We do not recommend storing sensitive data in your Job output.
Anyscale Jobs can emit JSON output when they finish. Here is an example output:
{
  "accuracy": 0.67,
  "model_path": "s3://ml-models/latest"
}
To see an end-to-end example of Job Outputs, please read the usage guide.
Restrictions
- Outputs must be JSON dictionaries.
- Outputs must be less than 1MB in size.
- Outputs are stored in Anyscale's cloud account, with a retention of 60 days.
- Outputs cannot be overwritten. Each job can only write output once.
- Outputs must be submitted from an Anyscale Job entrypoint.
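You can sanity-check the first two restrictions in plain Python before submitting; interpreting the 1MB limit as 10**6 bytes of serialized JSON is an assumption about how the limit is measured:

import json
output = {"accuracy": 0.67, "model_path": "s3://ml-models/latest"}
# The output must be a JSON dictionary...
assert isinstance(output, dict)
# ...and under 1MB when serialized (1MB taken as 10**6 bytes here, an assumption).
assert len(json.dumps(output).encode("utf-8")) < 10**6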
Submit an output
There are two APIs for submitting an output.
These APIs should be invoked from the `entrypoint` of your Production Job. From any other context, they will only print the output, not record it.
Each of these APIs can only be called once per Production Job.
- CLI
- Python
# From a JSON file
anyscale job output write -f output.json
# From a string
anyscale job output write '{"model_path": "s3://ml-models/latest"}'
# From stdout of your script
python script_that_prints_json.py | anyscale job output write
import anyscale.job
anyscale.job.output({
    "accuracy": 0.67,
    "model_path": "s3://ml-models/latest"
})
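For example, an entrypoint script might record its final metrics as its single output; the training logic below is a placeholder:

# my_job_script.py - a placeholder entrypoint that records its result as the Job Output.
import anyscale.job
def train():
    # Placeholder for your actual workload.
    return {"accuracy": 0.67, "model_path": "s3://ml-models/latest"}
if __name__ == "__main__":
    metrics = train()
    # Call this at most once, and only from the job's entrypoint; elsewhere it only prints.
    anyscale.job.output(metrics)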
Read the output
You can view your Job Output on the UI page for your Job.

You can also fetch it programmatically.
anyscale job output get --id JOB_ID
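One way to consume the output from a script is to shell out to the CLI command shown above and parse what it prints; that the command writes the JSON document to stdout is an assumption:

import json
import subprocess
JOB_ID = "..."  # placeholder; use your job's ID
# Assumes the command prints the JSON document to stdout.
result = subprocess.run(
    ["anyscale", "job", "output", "get", "--id", JOB_ID],
    capture_output=True, text=True, check=True,
)
output = json.loads(result.stdout)
print(output.get("accuracy"))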