Production Jobs

Anyscale supports production jobs, which are submitted as a standalone package and managed by the platform. They are best suited for production workflows where you want Anyscale to start the cluster and handle failures automatically.

After you submit a job definition, Anyscale automatically creates a cluster, runs the job on it, and monitors the job until it succeeds. If the job fails, Anyscale restarts it automatically, up to a configurable number of retries.
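The retry behavior described above can be pictured with a small local sketch. This is not Anyscale's implementation; `run_with_retries` and `run_job` are hypothetical names, and the sketch assumes a "retry" means any attempt after the first one, so a budget of N retries allows at most N + 1 attempts.

```python
from typing import Callable, Tuple


def run_with_retries(run_job: Callable[[], bool], max_retries: int = 5) -> Tuple[bool, int]:
    """Call run_job until it succeeds or the retry budget is spent.

    Returns (succeeded, attempts_made). Assumption: a retry is any attempt
    after the first, so at most max_retries + 1 attempts are made.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        if run_job():
            return True, attempts
    return False, attempts


# Hypothetical job that fails twice and succeeds on the third attempt.
outcomes = iter([False, False, True])
result = run_with_retries(lambda: next(outcomes), max_retries=3)
print(result)  # (True, 3)
```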

Using production jobs

Define

When you submit a production job, you provide the following:

  • (Required) A compute config for the cluster the job will run on. In the SDK, this can only be specified as compute_config_id. In the CLI, you may instead specify compute_config (the name of the compute config) or cloud (the name of an Anyscale cloud) for convenience, or omit the attribute, in which case the default compute config is used. The SDK example below shows how to resolve a missing value, a compute_config, or a cloud into a compute_config_id.
  • (Required) A cluster environment for the cluster the job will run on. In the SDK, this can only be specified as build_id. In the CLI, you may instead specify cluster_env (the name and version of the cluster environment, colon-separated; if you omit the version, the latest is used), or omit the attribute, in which case a default is chosen based on your local Python and Ray versions. The SDK example below shows how to resolve a missing value or a cluster_env into a build_id.
  • (Optional) A runtime environment containing your application code and dependencies.
  • (Required) The entrypoint command that will be run on the cluster to run the job.
  • (Optional) A maximum number of retries before the job is considered failed (defaults to 5).
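As an illustration of the colon-separated cluster_env format in the list above, here is a minimal parsing sketch. `parse_cluster_env` is a hypothetical helper, not part of the Anyscale SDK, and it assumes versions are plain integers, as in the `my-cluster-env:5` example.

```python
from typing import Optional, Tuple


def parse_cluster_env(spec: str) -> Tuple[str, Optional[int]]:
    """Split "name:version" into (name, version).

    A missing version is returned as None, meaning "use the latest".
    Assumption: versions are non-negative integers.
    """
    name, sep, version = spec.partition(":")
    if not name:
        raise ValueError(f"missing cluster environment name in {spec!r}")
    if not sep:
        return name, None
    if not version.isdigit():
        raise ValueError(f"invalid cluster environment version in {spec!r}")
    return name, int(version)


print(parse_cluster_env("my-cluster-env:5"))  # ('my-cluster-env', 5)
print(parse_cluster_env("my-cluster-env"))    # ('my-cluster-env', None)
```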
Note: The working_dir option of the runtime environment must be a remote URL to a zip file stored in an AWS S3 or Google Cloud Storage bucket, or to a zip file accessible over HTTP/S (e.g., a GitHub download URL). The cluster running the job must have permission to download from that URL.
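A quick local check can catch a working_dir that is obviously not a remote zip URL before you submit. The sketch below is an assumption-laden illustration: `is_remote_zip_url` is a hypothetical helper, the `gs://` scheme for Google Cloud Storage and the `.zip` suffix test are assumptions, and passing this check says nothing about whether the cluster has permission to download the file.

```python
from urllib.parse import urlparse

# Assumed URL schemes for S3, Google Cloud Storage, and HTTP/S downloads.
ALLOWED_SCHEMES = {"s3", "gs", "http", "https"}


def is_remote_zip_url(url: str) -> bool:
    """Return True if url looks like a remote URL pointing at a zip file."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ALLOWED_SCHEMES
        and bool(parsed.netloc)
        and parsed.path.lower().endswith(".zip")
    )


print(is_remote_zip_url("s3://my_bucket/my_job_files.zip"))  # True
print(is_remote_zip_url("./my_job_files"))                   # False
```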

These options can be specified in a configuration file:

my_production_job.yaml
compute_config: my-compute-config # You may specify `compute_config_id` or `cloud` instead
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_job_files.zip"
  # You may also specify other runtime environment properties like `pip` and `env_vars`
entrypoint: "python my_job_script.py --option1=value1"
max_retries: 3
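To make the required and optional fields concrete, the following sketch checks a job config that has already been loaded into a Python dict. `normalize_job_config` is a hypothetical helper rather than an Anyscale API; it only enforces what is stated above: entrypoint is required, and max_retries defaults to 5 when omitted.

```python
def normalize_job_config(config: dict) -> dict:
    """Validate a job config dict and fill in the documented default.

    Hypothetical helper: checks only the entrypoint and max_retries fields.
    """
    if not config.get("entrypoint"):
        raise ValueError("entrypoint is required")
    normalized = dict(config)  # copy so the caller's dict is not mutated
    normalized.setdefault("max_retries", 5)
    retries = normalized["max_retries"]
    if not isinstance(retries, int) or retries < 0:
        raise ValueError("max_retries must be a non-negative integer")
    return normalized


job = normalize_job_config({
    "compute_config": "my-compute-config",
    "cluster_env": "my-cluster-env:5",
    "runtime_env": {"working_dir": "s3://my_bucket/my_job_files.zip"},
    "entrypoint": "python my_job_script.py --option1=value1",
})
print(job["max_retries"])  # 5
```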

Submit

Your job can be submitted to Anyscale with the CLI or the Python SDK:

anyscale job submit my_production_job.yaml \
--name my-production-job \
--description "A production job running on Anyscale."
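If you drive the CLI from a script, the command above can be assembled as an argument list. This is a sketch, not the Python SDK: `build_submit_command` is a hypothetical helper that uses only the `--name` and `--description` flags shown above, and it treats the description as optional.

```python
import shlex
from typing import List, Optional


def build_submit_command(
    config_path: str, name: str, description: Optional[str] = None
) -> List[str]:
    """Build the argument list for `anyscale job submit`."""
    cmd = ["anyscale", "job", "submit", config_path, "--name", name]
    if description:
        cmd += ["--description", description]
    return cmd


cmd = build_submit_command(
    "my_production_job.yaml",
    name="my-production-job",
    description="A production job running on Anyscale.",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually submit, pass the list to subprocess.run(cmd, check=True).
```

Passing a list to subprocess.run avoids shell quoting problems with values such as the description string.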

Monitor

You can check the status of a job on the Web UI, or query it using the CLI or the Python SDK:

anyscale job list --job-id [JOB_ID]
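For scripted monitoring, a simple polling loop is one option. The sketch below is generic and makes no Anyscale calls itself: `wait_for_state` and its `get_state` callable are hypothetical, and you would supply a `get_state` that obtains the current state, for example by running the command above and reading its output.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], str],
    target_states: Iterable[str],
    poll_interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state until it returns one of target_states or time runs out."""
    targets = set(target_states)
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in targets:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_s}s")
        sleep(poll_interval_s)


# Simulated sequence of states; a real get_state would query Anyscale.
states = iter(["PENDING", "RUNNING", "TERMINATED"])
final = wait_for_state(
    lambda: next(states), {"TERMINATED"}, poll_interval_s=0, sleep=lambda _: None
)
print(final)  # TERMINATED
```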

Terminate

You can attempt to terminate a job using the CLI or the Python SDK:

anyscale job terminate --job-id [JOB_ID]

Example

You can try out production jobs using a public GitHub repo. The following hello_world.py example is located in Anyscale's docs_examples repo:

hello_world.py
import ray
import anyscale

@ray.remote
def say_hi(message):
    return f"Hello, {message}."

ray.init()
print(ray.get(say_hi.remote("World")))

You can define the following job spec:

hello_world.yaml
entrypoint: "python hello_world.py"
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"

You can run, monitor and stop the job as follows:

# Submit the job defined above using the CLI
anyscale job submit hello_world.yaml --name hello_world
No cluster environment provided, setting default based on local Python and Ray versions.
No cloud or compute config specified, using the default: cpt_5e2TnnRv8m6hyPXqCg5CTEfX.
Job prodjob_e1xd11CfHKditQK6XJnx4zqZ has been successfully submitted. Current state of job: PENDING.
Query the status of the job with `anyscale job list --job-id prodjob_e1xd11CfHKditQK6XJnx4zqZ`.
View the job in the UI at https://console.anyscale-staging.com/jobs/prodjob_e1xd11CfHKditQK6XJnx4zqZ.

# Query the status of the job using the ID returned above
anyscale job list --job-id prodjob_e1xd11CfHKditQK6XJnx4zqZ
View your jobs in the UI at https://console.anyscale-staging.com/jobs
Jobs:
NAME         ID                                COST  PROJECT NAME  CLUSTER NAME                                             CURRENT STATE  CREATOR  ENTRYPOINT
hello_world  prodjob_e1xd11CfHKditQK6XJnx4zqZ  0     my-project    cluster_for_prodjob_e1xd11CfHKditQK6XJnx4zqZ_HII5B88lSl  RUNNING        user     python hello_world.py

# Terminate the job
anyscale job terminate --job-id prodjob_e1xd11CfHKditQK6XJnx4zqZ
Job prodjob_e1xd11CfHKditQK6XJnx4zqZ has begun terminating...
Current state of job: RUNNING. Goal state of job: TERMINATED
Query the status of the job with `anyscale job list --job-id prodjob_e1xd11CfHKditQK6XJnx4zqZ`.

# Query the status again
anyscale job list --job-id prodjob_e1xd11CfHKditQK6XJnx4zqZ
View your jobs in the UI at https://console.anyscale-staging.com/jobs
Jobs:
NAME         ID                                COST  PROJECT NAME  CLUSTER NAME  CURRENT STATE  CREATOR  ENTRYPOINT
hello_world  prodjob_e1xd11CfHKditQK6XJnx4zqZ  0     my-project                  TERMINATED     user     python hello_world.py
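When scripting this workflow, the job ID and state can be pulled out of the CLI messages shown above. The patterns below are inferred from this sample output only: the `prodjob_` prefix and the "Current state of job:" wording are assumptions about the message format, which may change between CLI versions. `extract_job_id` and `extract_state` are hypothetical helpers.

```python
import re
from typing import Optional

# Patterns inferred from the sample CLI output; treat them as assumptions.
JOB_ID_RE = re.compile(r"\bprodjob_[A-Za-z0-9]+\b")
STATE_RE = re.compile(r"Current state of job: ([A-Z_]+)")


def extract_job_id(output: str) -> Optional[str]:
    """Return the first job ID found in the CLI output, or None."""
    match = JOB_ID_RE.search(output)
    return match.group(0) if match else None


def extract_state(output: str) -> Optional[str]:
    """Return the reported current state, or None if it is not present."""
    match = STATE_RE.search(output)
    return match.group(1) if match else None


sample = (
    "Job prodjob_e1xd11CfHKditQK6XJnx4zqZ has been successfully submitted. "
    "Current state of job: PENDING."
)
print(extract_job_id(sample))  # prodjob_e1xd11CfHKditQK6XJnx4zqZ
print(extract_state(sample))   # PENDING
```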