Runtime Environments

Basic usage

Runtime environments specify the Python environment that you want the driver, tasks, and actors of your job to run in. These are specified at runtime rather than being pre-built like cluster environments. Runtime environments are nested within cluster environments. You may have different runtime environments for different jobs, tasks, or actors on the same cluster.

A runtime environment is a JSON-serializable dictionary that can contain a number of different options, such as pip dependencies or files. Please see the Ray documentation for more details and the full list of options.

The example runtime environment below syncs the local directory and installs a few pip dependencies:

ray.init(runtime_env={
    "working_dir": ".",
    "pip": ["requests", "torch>=1.4.0"]
})

Working directory

The working directory (sometimes called the "project directory") is also part of the runtime environment; code from the working directory is uploaded to Anyscale when you establish a connection. By default, the working directory is the current directory and is also associated with an Anyscale project. You can, however, configure a different working directory. Here it is set to ray_code, a relative path:

ray.init(runtime_env={"working_dir":"ray_code"})

You can also use schemes such as s3:// and gs:// to specify code located elsewhere.

Your working directory might contain testing artifacts or other data that is either large or not needed on the Ray cluster. If the directory is larger than 100 MB, Anyscale will not upload it. To prune large files and directories from your runtime environment, use the excludes key in its configuration. excludes takes a list of strings, which may include .gitignore-style wildcards:

ray.init(runtime_env={
    "working_dir": ".",
    "excludes": ["large_file1.csv", "tests/*.dat"]
})

If you need a large working directory, consider putting it in the cloud ahead of time and referencing it from there:

ray.init(runtime_env={
    "working_dir": "s3://my-code-bucket",
    "excludes": ["large_file1.csv", "tests/*.dat"]
})

The runtime code from your project is uploaded to a temporary location on each node in the Ray cluster. You can find it under /tmp/ray/session_latest/runtime_resources.
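
To see this from inside a task, here is a minimal sketch (assuming a plain ray.init; on Anyscale you would connect to your cluster instead) that prints the directory a remote task runs in:

import os

import ray

ray.init(runtime_env={"working_dir": "."})

@ray.remote
def show_cwd():
    # Workers start inside the unpacked copy of the uploaded working_dir.
    return os.getcwd()

print(ray.get(show_cwd.remote()))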

Dependencies

The runtime environment supports several ways to define your Ray application's dependencies. The two most common are dependency packages and environment variables. For more information on defining the runtime environment, refer to the Ray API Reference.

Conda and Pip Packages

Your Ray application may depend on packages via import statements. Setting dependencies through pip or Conda in your runtime environment lets you dynamically specify packages to be downloaded and installed in a virtual environment for your Ray application. Note that you can specify either pip or Conda, but not both.

ray.init(runtime_env={
    "pip": ["requests", "torch==0.9.0"],
})
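
If you prefer Conda, here is a minimal sketch of the equivalent conda form (the spec dict mirrors the contents of an environment.yaml; package versions are illustrative):

ray.init(runtime_env={
    "conda": {
        "dependencies": [
            "pip",
            {"pip": ["requests", "torch==0.9.0"]},
        ]
    },
})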

Environment Variables

You can dynamically specify environment variables in the runtime environment to be used by your Ray application.

ray.init(
    runtime_env={
        "env_vars": {
            "OMP_NUM_THREADS": "32",
            "TF_WARNINGS": "none"
        }
    },
)
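
For example, here is a minimal sketch of a task reading one of the variables set above (assuming the ray.init call just shown):

import os

@ray.remote
def read_thread_count():
    # env_vars from the runtime environment are set in each worker process.
    return os.environ.get("OMP_NUM_THREADS")

print(ray.get(read_thread_count.remote()))  # prints "32"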

Runtime Environments in Anyscale Production Jobs and Services

The usage of runtime_env differs slightly when using Anyscale Production Jobs and Services as opposed to Ray Job Submission and all other Ray applications.

note

Like all Ray applications, Ray Job Submission and Ray Serve can be used on Anyscale with runtime_env behavior exactly as in OSS Ray. (For Ray Job Submission, you will need to set the environment variable RAY_ADDRESS to the address of your Anyscale cluster; for example, RAY_ADDRESS=anyscale://my-cluster ray job submit --working-dir your_working_directory -- python script.py.)

The differences below only apply to Anyscale Production Jobs and Services.

Differences for runtime_env with Anyscale Production Jobs and Services

  • The py_modules field only supports Remote URIs. It does not support local directories or .whl files.
  • The working_dir field does not support local .zip files. (Note, however, that .zip files hosted remotely as Remote URIs are still supported.)
  • The working_dir field supports local directories.
    • An upload_path field can be specified to define where to upload your current working directory (see the sketch after this list).
    • If no upload_path is specified, Anyscale automatically uploads local directories to your cloud account (see below for details).
    • The cluster running the job must have permissions to download from the bucket or URL. See how to access resources from cloud providers.
  • See the section below for examples of how to define your runtime_env for Anyscale Production Jobs and Services.
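
For example, here is a minimal sketch of a job runtime_env using upload_path (the bucket path is a placeholder; the cluster must have access to it):

runtime_env:
  working_dir: "."
  upload_path: "s3://my-code-bucket/my-job-code"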

Runtime Environment Examples

Defining your working directory

Set working_dir to a remote URI

runtime_env:
  working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip

Sample Runtime Environment

Below is an example of a runtime environment that utilizes a remote working directory and defines various dependencies through environment variables and pip packages.

runtime_env:
  working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
  env_vars:
    OMP_NUM_THREADS: "32"
    TF_WARNINGS: "none"
  pip:
    - requests
    - torch==0.9.0

Using multiple runtime environments

By default, all tasks and actors in a job will run in the job-level runtime environment, but you can also specify per-task and per-actor runtime environments. This can be useful when you have conflicting dependencies within the same application (for example, requiring two different versions of TensorFlow).

You can specify the runtime environment in the @ray.remote decorator or using .options:

# Example per-task and per-actor environments (package versions are illustrative).
my_runtime_env1 = {"pip": ["tensorflow==2.8.0"]}
my_runtime_env2 = {"pip": ["tensorflow==2.9.0"]}

@ray.remote(runtime_env=my_runtime_env1)
def task():
    pass

# Task runs in env1.
task.remote()

# Alternative API:
task.options(runtime_env=my_runtime_env1).remote()

@ray.remote(runtime_env=my_runtime_env2)
class Actor:
    pass

# Actor runs in env2.
Actor.remote()

# Alternative API:
Actor.options(runtime_env=my_runtime_env2).remote()

note

If the runtime environment specifies pip or Conda dependencies, these will be installed in an isolated Conda environment. In particular, pip and Conda dependencies from the cluster environment will not be inherited by the runtime environment.

note

For Anyscale Jobs and Anyscale Services, the usage of runtime_env differs slightly; see the section above for details.

note

If you know the runtime environment will take a long time to set up, you can increase the setup timeout by setting config.setup_timeout_seconds in your job YAML. The default timeout is 600 seconds.

For example:

runtime_env:
  config:
    setup_timeout_seconds: 900
  working_dir: .