Runtime Environments
Basic usage
Runtime environments specify the Python environment that you want the driver, tasks, and actors of your job to run in. These are specified at runtime rather than being pre-built like cluster environments. Runtime environments are nested within cluster environments. You may have different runtime environments for different jobs, tasks, or actors on the same cluster.
A runtime environment is a JSON-serializable dictionary that can contain a number of different options, such as pip dependencies or files. Please see the Ray documentation for more details and the full list of options.
The example runtime environment below syncs the local directory and installs a few pip dependencies:
ray.init(runtime_env={
    "working_dir": ".",
    "pip": ["requests", "torch>=1.4.0"]
})
Working directory
The working directory (sometimes called the "project directory") is also part of the runtime environment; code from the working directory is uploaded to Anyscale when you establish a connection. By default, the working directory is the current directory and is also associated with an Anyscale project. You can, however, configure a different working directory. Here it is set to ray_code, a relative path:
ray.init(runtime_env={"working_dir":"ray_code"})
You can also use schemes such as s3:// and gs:// to specify code located elsewhere.
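For example, a working directory hosted in a Cloud Storage bucket can be referenced directly. This is a minimal sketch; the bucket and archive name are placeholders, remote URIs generally point to a zipped copy of your code, and the cluster needs read access to the location:
import ray

# Hypothetical bucket and archive; the cluster must be able to download it.
ray.init(runtime_env={"working_dir": "gs://my-code-bucket/project.zip"})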
Your working directory might contain testing artifacts or other data that is either large or not needed on the Ray cluster. If the directory is larger than 100 MB, Anyscale will not upload it. To prune large files and directories from your runtime environment, use the excludes key in its configuration. excludes takes an array of strings, which can include .gitignore-style wildcards:
ray.init(runtime_env={
    "working_dir": ".",
    "excludes": ["large_file1.csv", "tests/*.dat"]
})
If you need a large working directory, consider putting it in the cloud ahead of time and referencing it from there:
ray.init(runtime_env={
    "working_dir": "s3://my-code-bucket",
    "excludes": ["large_file1.csv", "tests/*.dat"]
})
The runtime code from your project is uploaded to a temporary location on each node in the Ray cluster. You can find it under /tmp/ray/session_latest/runtime_resources.
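As a quick check, you can print the working directory from inside a task; the path should fall under /tmp/ray/session_latest/runtime_resources. This is a minimal sketch, assuming OSS Ray's default behavior of starting each worker in the node-local copy of the uploaded working directory:
import os
import ray

ray.init(runtime_env={"working_dir": "."})

@ray.remote
def where_am_i():
    # Each worker starts in the node-local copy of the uploaded working directory.
    return os.getcwd()

print(ray.get(where_am_i.remote()))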
Dependencies
The runtime environment supports several ways to define your Ray application dependencies. The two most common are dependency packages and environment variables. For more information on defining the runtime environment, refer to the Ray API Reference.
Conda and Pip Packages
Your Ray application may depend on packages via import statements. Setting dependencies through pip or Conda in your runtime environment lets you dynamically specify packages to be downloaded and installed in a virtual environment for your Ray application. Note that you can specify either pip or Conda in the runtime environment, but not both.
Pip dependencies:
ray.init(runtime_env={
    "pip": ["requests", "torch==0.9.0"],
})
Conda dependencies:
ray.init(
    runtime_env={
        "conda": {
            "dependencies": [
                "python=3.9",
                "bokeh=2.4.2",
                "conda-forge::numpy=1.21.*",
                "nodejs=16.13.*",
                "flask",
                "pip",
                {"pip": ["Flask-Testing"]},
            ]
        }
    },
)
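Instead of listing packages inline, you can also point pip or Conda at a file. This is a sketch assuming OSS Ray's support for passing a requirements file path, with the file assumed to exist next to your code:
import ray

# Hypothetical requirements.txt in the working directory; Ray reads it
# locally and installs the listed packages in the cluster-side environment.
ray.init(runtime_env={"pip": "requirements.txt"})
As with the inline form, specify either pip or Conda, not both.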
Environment Variables
You can dynamically specify environment variables in the runtime environment to be used by your Ray application.
ray.init(
    runtime_env={
        "env_vars": {
            "OMP_NUM_THREADS": "32",
            "TF_WARNINGS": "none"
        }
    },
)
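These variables are set in the worker processes before your code runs, so tasks and actors can read them with the standard library. A minimal sketch, building on the snippet above:
import os
import ray

@ray.remote
def read_thread_count():
    # OMP_NUM_THREADS comes from the env_vars entry in the runtime environment.
    return os.environ.get("OMP_NUM_THREADS")

print(ray.get(read_thread_count.remote()))  # prints "32" with the runtime_env above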
Runtime Environments in Anyscale Production Jobs and Services
The usage of runtime_env differs slightly when using Anyscale Production Jobs and Services as opposed to Ray Job Submission and all other Ray applications.
Like all Ray applications, Ray Job Submission and Ray Serve can be used on Anyscale with runtime_env behavior exactly as in OSS Ray. (For Ray Job Submission, you will need to set the environment variable RAY_ADDRESS to the address of your Anyscale cluster; for example, RAY_ADDRESS=anyscale://my-cluster ray job submit --working-dir your_working_directory -- python script.py.)
The differences below only apply to Anyscale Production Jobs and Services.
Differences for runtime_env with Anyscale Production Jobs and Services
- The py_modules field only supports Remote URIs. It does not support local directories or .whl files.
- The working_dir field does not support local .zip files. (Note, however, that .zip files hosted remotely as Remote URIs are still supported.)
- The working_dir field supports local directories.
  - An upload_path field can be specified to define where to upload your current working directory.
  - If no upload_path is specified, Anyscale automatically uploads local directories to your cloud account (see below for details).
  - The cluster running the job must have permissions to download from the bucket or URL. See how to access resources from cloud providers.
- See the section below for examples of how to define your runtime_env for Anyscale Production Jobs and Services.
Runtime Environment Examples
Defining your working directory
- Remote working_dir
- Local working_dir
- Local working_dir and upload_path
- No runtime_env
Set working_dir to a remote URI
runtime_env:
  working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
Set working_dir to a local directory
In the example below, your current directory will be automatically uploaded to cloud storage on your account.
The default path on the cloud is:
[s3, gs]://{default_bucket_name}/{org_id}/{cloud_id}/{workload_type}/{backup_zip}
runtime_env:
  working_dir: .
Note: You need to export cloud credentials with write permissions.
Set working_dir to a local directory and define an upload_path
- In the example below, your current directory will be automatically uploaded to the remote URI specified in the upload_path.
runtime_env:
  working_dir: .
  upload_path: <Remote URI>
Set an empty runtime_env
- You can bundle your application code into a cluster environment and subsequently leave your runtime_env empty.
Sample Runtime Environment
Below is an example of a runtime environment that uses a remote working directory, sets environment variables, and installs pip packages.
runtime_env:
  working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
  env_vars: { OMP_NUM_THREADS: "32", TF_WARNINGS: "none" }
  pip: [requests, torch==0.9.0]
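For comparison, the same configuration written as a Python dictionary (as you would pass it to ray.init or Ray Job Submission rather than an Anyscale Production Job) looks like this; a sketch mirroring the YAML above:
runtime_env = {
    "working_dir": "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip",
    "env_vars": {"OMP_NUM_THREADS": "32", "TF_WARNINGS": "none"},
    "pip": ["requests", "torch==0.9.0"],
}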
Using multiple runtime environments
By default, all tasks and actors in a job will run in the job-level runtime environment, but you can also specify per-task and per-actor runtime environments. This can be useful when you have conflicting dependencies within the same application (for example, requiring two different versions of TensorFlow).
You can specify the runtime environment in the @ray.remote decorator or using .options:
import ray

# Two environments with conflicting dependencies (versions are illustrative).
my_runtime_env1 = {"pip": ["tensorflow==2.8.0"]}
my_runtime_env2 = {"pip": ["tensorflow==2.9.0"]}

@ray.remote(runtime_env=my_runtime_env1)
def task():
    pass

# Task runs in env1.
task.remote()
# Alternative API:
task.options(runtime_env=my_runtime_env1).remote()

@ray.remote(runtime_env=my_runtime_env2)
class Actor:
    pass

# Actor runs in env2.
Actor.remote()
# Alternative API:
Actor.options(runtime_env=my_runtime_env2).remote()
If the runtime environment specifies pip or Conda dependencies, these will be installed in an isolated Conda environment. In particular, pip and Conda dependencies from the cluster environment will not be inherited by the runtime environment.
For Anyscale Jobs and Anyscale Services, the usage of runtime_env differs slightly; see the section above for details.
If you know the runtime environment will take a long time to set up, you can increase the setup timeout by setting config.setup_timeout_seconds in your job YAML. The default timeout is 600 seconds.
For example:
runtime_env:
  config:
    setup_timeout_seconds: 900
  working_dir: .
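The same timeout can be raised for interactive workloads; a sketch assuming OSS Ray's config field on the runtime environment dictionary:
import ray

ray.init(runtime_env={
    "working_dir": ".",
    # Allow up to 15 minutes for environment setup (the default is 600 seconds).
    "config": {"setup_timeout_seconds": 900},
})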