Virtually all code depends on other code to function. For example, my simple Python program depends not only on a Python runtime such as "Python 3.8.3 on macOS" but also, through its import statements, on packages such as pandas or any of thousands of others. Python provides tools for managing these dependencies, but the problem can still be hard. Python developers quickly learn that dependencies can be a nightmare, particularly when package A depends on one version of package B and package C depends on an incompatible version of package B. This document can't address every issue in dependency management, but it can help you deal with how using Ray affects it.
You'll have to consider the relationship between the Python environments on your laptop, in a CI environment, and on Anyscale. Here are some environments to consider, and how your application's lifecycle leads you from one to the next until you reach an independently running production environment.
When working with Ray locally, the Ray cluster uses the same Python environment as your command line. This is the simplest arrangement: beyond the normal foibles of dependency management, Ray adds no further obstacles. Let's say I have a very simple script that depends on Ray and pkutils, a utility to parse requirements files. What this program does is not important, only that it can find its dependencies.
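A script of this kind might look like the following sketch. It is illustrative rather than the original listing: the file path and the names main and count_requirements are hypothetical, and it assumes pkutils exposes parse_requirements() as described in that package's README.

```python
import os

# Hypothetical path, used only for illustration.
REQUIREMENTS_FILE = os.path.abspath("requirements.txt")


def main():
    # Raises ModuleNotFoundError if Ray is not installed in the Python
    # environment that runs this script.
    import ray

    ray.init()  # start a local Ray instance in the current environment

    @ray.remote
    def count_requirements(path):
        # Runs in a Ray worker, which must be able to import pkutils too.
        import pkutils  # assumed API: pkutils.parse_requirements(path)

        return len(list(pkutils.parse_requirements(path)))

    # Run the task remotely and print how many requirements were parsed.
    print(ray.get(count_requirements.remote(REQUIREMENTS_FILE)))
    ray.shutdown()
```

Calling main() runs one remote task. The imports sit inside the functions so that the missing-package error surfaces only when the code actually runs.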
If you've installed Ray and pkutils, this script will run; otherwise you'll get the dreaded ModuleNotFoundError: No module named 'ray' message. If you have put your dependencies into a file called requirements.txt, you can ensure they are installed with pip install -r requirements.txt, and then the script will run successfully.
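To see ahead of time which entries in requirements.txt are still missing from the current environment, a small standard-library check can help. This is a minimal sketch, not part of Ray or pip: the function names are hypothetical, and the parsing handles only plain "name" or "name==version" style lines, skipping pip options and ignoring URLs.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(path):
    """Return the distribution names listed in a simple requirements file."""
    names = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name is the leading run of characters before any version
        # specifier, extras bracket, or environment marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(path):
    """List the names from the file that are not installed here."""
    missing = []
    for name in requirement_names(path):
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty list from missing_distributions("requirements.txt") suggests every listed package is installed, although it does not verify that the installed versions satisfy the pins.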
When you run the same code on Anyscale, you have several options for installing the required dependencies.
Machines that are provisioned by Anyscale come with a certain set of dependencies. Some libraries, in particular the Ray and Anyscale dependencies, are provided in the base images for any Anyscale-managed cluster. To give you further control over dependencies, Anyscale provides cluster environments, which let you configure them.
Cluster environment dependencies are installed at Docker image build time, so they do not increase cluster launch times. Dependencies in cluster environments are also immutable. Usually, once someone has established the requirements for an application, they place the dependencies into a cluster environment to lock them down for the application.
To tell Anyscale about your dependencies, you may also add a runtime environment to your ray.init() call. The runtime environment can be a list of dependencies, or, as in this case, a reference to the requirements file:
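A minimal sketch of both forms, assuming the "pip" field of Ray's runtime_env, which accepts either a path to a local requirements file or a list of requirement specifiers. The constant names and the connect() helper are hypothetical, and the cluster address is left as a parameter because its exact form is not shown in this document.

```python
# Form 1: point the "pip" field at a local requirements file.
RUNTIME_ENV_FROM_FILE = {"pip": "requirements.txt"}

# Form 2: list the requirement specifiers directly in the code.
RUNTIME_ENV_INLINE = {"pip": ["pkutils"]}


def connect(address=None, runtime_env=RUNTIME_ENV_FROM_FILE):
    # Imported here so the dictionaries above can be inspected without Ray.
    import ray

    # address=None starts a local Ray instance; pass your cluster's
    # address instead to connect to a remote cluster.
    return ray.init(address=address, runtime_env=runtime_env)
```

With either form, Ray installs the listed packages for the workers that run your remote functions.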
Now the remote function executes appropriately. You can also specify requirements directly within the Python code. Runtime environments can specify Anaconda, Debian and pip packages. Keep in mind, however, that dependencies in a runtime environment are installed on the cluster at launch time, which can extend the time it takes to launch any given node.