Manage environment and dependencies

Manage package and file dependencies

Workspaces supports two options for managing dependencies:

  • Install to cluster storage: useful when you are rapidly experimenting with different PyPI packages, for example. Installed packages are accessible to every node of the cluster.
  • Build a new Cluster Environment: useful when you want to checkpoint the dependencies of a Workspace and reuse them in the future. A Cluster Environment also gives you more control over your environment, including different package types (PyPI, Debian, and Conda), custom Docker environments, and post-build commands.

Install to cluster storage

  1. Install your package with pip install --user. The package is placed in cluster storage at /mnt/cluster_storage/pypi.
  2. Since Workspaces sets PYTHONUSERBASE=/mnt/cluster_storage/pypi, the library will automatically be accessible to all nodes in the cluster.
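
To illustrate why setting PYTHONUSERBASE redirects pip install --user, the following minimal Python sketch asks a fresh interpreter where its user site-packages directory resolves for a given base. The helper name user_site_for is hypothetical and not part of Anyscale; site.getusersitepackages() is the standard-library call that honors PYTHONUSERBASE.

```python
import os
import subprocess
import sys


def user_site_for(user_base: str) -> str:
    """Return the user site-packages path a new interpreter derives from PYTHONUSERBASE."""
    # Copy the current environment and override only PYTHONUSERBASE.
    env = dict(os.environ, PYTHONUSERBASE=user_base)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getusersitepackages())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # On a Workspace this base is already set for you; it is passed explicitly here
    # so the sketch also runs elsewhere.
    print(user_site_for("/mnt/cluster_storage/pypi"))
```

The printed path should sit under the base directory you pass in, which is where pip install --user places packages.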
caution

Running pip install without the --user flag installs the package only on the head node of the cluster. Worker nodes cannot access the package, and it is lost when the cluster terminates.

info

If you encounter ModuleNotFoundError on a worker node and pip install --user doesn't resolve it, run pip install --user --force-reinstall to force the module to be installed in cluster storage.
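
When diagnosing a ModuleNotFoundError, it can help to confirm where a module was actually installed. The sketch below is one possible check, not an Anyscale utility: the function name installed_under is hypothetical, and it only inspects the interpreter it runs in (for example, the head node).

```python
import importlib.util
from pathlib import Path


def installed_under(module_name: str, root: str) -> bool:
    """Return True if the module's file lives somewhere below the given root directory."""
    spec = importlib.util.find_spec(module_name)
    # Missing modules, built-ins, and namespace packages have no file location.
    if spec is None or not spec.has_location or spec.origin is None:
        return False
    origin = Path(spec.origin).resolve()
    return Path(root).resolve() in origin.parents
```

For example, installed_under("requests", "/mnt/cluster_storage/pypi") returning False for a package you just installed suggests it went to the head node's own site-packages rather than to cluster storage.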

info
  • Current limitations:
    • /mnt/cluster_storage is not currently replicated when you duplicate a Workspace, so you need to reinstall packages manually in the new Workspace.
    • Installing to cluster storage (NFS) slows down the initial import of modules. For the best performance, install packages in the Cluster Environment instead.

Build a new Cluster Environment

  1. Test out pip install in the Web Terminal until you are satisfied with your environment.
  2. Run pip freeze to get the full list of installed packages.
  3. Build a new Cluster Environment and paste the list as the pip dependencies.
  4. Terminate your Workspace and edit it to use the newly created Cluster Environment.
  5. Restart the Workspace.
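
Before pasting pip freeze output into a Cluster Environment, you may want to drop entries that point at local paths or editable checkouts, since those are unlikely to resolve during a fresh build. The following is a minimal sketch under that assumption; the function names are hypothetical, and the filtering rules are a suggestion rather than a requirement of the build.

```python
import subprocess
import sys


def pip_freeze() -> str:
    """Return the raw output of pip freeze for the current interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def portable_requirements(freeze_output: str) -> list[str]:
    """Keep only requirement lines that do not depend on the local machine."""
    kept = []
    for raw_line in freeze_output.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("-e ") or " @ file://" in line:
            continue  # skip editable installs and local file references
        kept.append(line)
    return kept
```

Calling print("\n".join(portable_requirements(pip_freeze()))) in the Web Terminal prints a list you can review and paste as the pip dependencies.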

Use the environment variable store

You can store environment variables with your Anyscale account and use them across Anyscale features, including Workspaces, Jobs, and Services. For example, you can store your W&B credentials to log ML experiments.

To add, read, or delete a variable, run these commands in the Workspace Web Terminal:

python -m snapshot_util put_env <ENV_KEY> <ENV_VALUE>
python -m snapshot_util get_env <ENV_KEY>
python -m snapshot_util del_env <ENV_KEY>
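
Once a variable is stored and the Workspace has been restarted, application code can read it like any other environment variable. The sketch below assumes the stored variable is visible in the process environment; require_env is a hypothetical helper, and WANDB_API_KEY is only an example key name.

```python
import os


def require_env(key: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(key)
    if not value:
        raise RuntimeError(
            f"Environment variable {key!r} is not set. "
            "Store it first, then restart the Workspace so it is exported."
        )
    return value
```

A training script could then call require_env("WANDB_API_KEY") at startup so that a missing credential fails early instead of partway through a run.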
info
  1. These variables are stored per user, so they apply to every cluster that you start, including Workspace, Job, and Service clusters.
  2. Restart your Workspace for the environment variables to show up in the Web Terminal or JupyterLab terminal. To check which variables are set, run cat ~/.bashrc; the export commands for the environment variables are appended to the end of this file.
  3. Like SSH key management, this feature does not yet provide strong security.
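
As an alternative to reading ~/.bashrc by eye, the sketch below lists the variable names it exports. It assumes the appended lines have the form export KEY=VALUE, which is a guess about the file's layout; the function name exported_variables is hypothetical. Values are returned as written, without shell expansion.

```python
import shlex
from pathlib import Path


def exported_variables(bashrc_text: str) -> dict[str, str]:
    """Collect KEY=VALUE pairs from lines that start with 'export '."""
    found = {}
    for raw_line in bashrc_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("export "):
            continue
        try:
            # shlex handles quoted values such as export NAME="two words".
            tokens = shlex.split(line, comments=True)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        for token in tokens[1:]:
            key, sep, value = token.partition("=")
            if sep and key.isidentifier():
                found[key] = value
    return found


if __name__ == "__main__":
    bashrc = Path.home() / ".bashrc"
    if bashrc.exists():
        # Print names only, so secret values do not end up in terminal history or logs.
        for name in sorted(exported_variables(bashrc.read_text())):
            print(name)
```

Printing only the names keeps credentials out of the terminal output, which matters given the limited security noted above.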