
Known Issues

pip packages fail to install within a cluster environment

Problem

The Cluster Environment build logs contain an error like the following:

[INFO] 6/23/2021, 4:20:16 PM:     Running setup.py install for <SOME PIP PACKAGE>: finished with status 'error'
[ERROR] 6/23/2021, 4:20:16 PM: ERROR: Command errored out with exit status 1:
...
ImportError: Something is wrong with the numpy installation. While importing we detected an older version of numpy in ['/home/ray/anaconda3/lib/python3.8/site-packages/numpy']. One method of fixing this is to repeatedly uninstall numpy until none is found, then reinstall this version.

Workaround

  • Remove <SOME PIP PACKAGE> from the pip section of the cluster environment.
  • Add the following to the post-build commands:
/home/ray/anaconda3/bin/python -m pip uninstall -y numpy
rm -rf /home/ray/anaconda3/lib/python3.<7 OR 8>/site-packages/numpy
/home/ray/anaconda3/bin/pip install numpy
/home/ray/anaconda3/bin/pip install --upgrade --no-cache-dir <SOME PIP PACKAGE>
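
The error message suggests uninstalling numpy until no copy is found, so it can help to confirm that nothing is left behind before reinstalling. The following is a minimal sketch, not part of the official workaround: `list_numpy_dirs` is a hypothetical helper name, and it assumes numpy's package and metadata directories are named `numpy` and `numpy-*` directly under site-packages.

```shell
# list_numpy_dirs SITE_PACKAGES
# Hypothetical helper: print any numpy package or metadata directories
# found directly under the given site-packages directory.
list_numpy_dirs() {
  find "$1" -maxdepth 1 \( -name 'numpy' -o -name 'numpy-*' \) -print
}

# Example use between the uninstall and reinstall steps (path is an assumption,
# adjust the Python minor version to match your image):
#   list_numpy_dirs /home/ray/anaconda3/lib/python3.8/site-packages
```

If the helper prints nothing after the `pip uninstall` and `rm -rf` steps, no old numpy copy remains in that directory.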

TensorBoard support requires port forwarding

Problem

Anyscale does not yet support TensorBoard out of the box; future versions will. Until then, use the port-forwarding workaround below.

Workaround

  • After you run training on your cluster, the log files are located on the cluster at /home/ray/ray_results.
  • Use the Anyscale CLI to SSH into your cluster, passing a port-forwarding option for TensorBoard:

anyscale ssh -o -L6006:localhost:6006

  • In the resulting SSH session on the cluster, launch TensorBoard.
  • Open a browser on your local machine to http://localhost:6006 and use the TensorBoard UI.
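
The launch step can be sketched as follows. This assumes the `tensorboard` command is installed in the cluster environment and that the logs are in /home/ray/ray_results as described above; `launch_tensorboard` is a hypothetical wrapper name used only for illustration.

```shell
# launch_tensorboard [LOGDIR] [PORT]
# Hypothetical wrapper: start TensorBoard on the cluster, reading the Ray
# results directory and serving on the port that the SSH command forwards.
launch_tensorboard() {
  logdir="${1:-/home/ray/ray_results}"
  port="${2:-6006}"
  tensorboard --logdir "$logdir" --port "$port"
}

# Run inside the SSH session on the cluster:
#   launch_tensorboard
```

The port must match the one in the `-L6006:localhost:6006` option; if you serve on a different port, change the forwarding option accordingly.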

Installing Horovod requires extra steps

Problem

Horovod cannot be installed simply by adding it to the pip packages.

Workaround

Set the appropriate environment variables in the cluster environment:

HOROVOD_WITH_TENSORFLOW=1
HOROVOD_WITH_GLOO=1

Add the following post-build commands:

pip install horovod[tensorflow]
pip install horovod[ray]
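
To confirm that the build picked up the environment variables, Horovod's `horovodrun --check-build` command reports which frameworks and controllers were compiled in. The sketch below wraps it in a hypothetical helper, `check_horovod_build`; it assumes enabled features are printed as `[X] <name>`, which may vary between Horovod versions.

```shell
# check_horovod_build
# Hypothetical helper: fail if Horovod was built without TensorFlow or Gloo.
# Assumes `horovodrun --check-build` marks enabled features as "[X] <name>".
check_horovod_build() {
  build_info="$(horovodrun --check-build)" || return 1
  for feature in TensorFlow Gloo; do
    printf '%s\n' "$build_info" | grep -q "\[X\] $feature" || {
      echo "Horovod was built without $feature support" >&2
      return 1
    }
  done
  echo "Horovod build includes TensorFlow and Gloo"
}

# Could be appended as a final post-build command:
#   check_horovod_build
```

Running the check at the end of the post-build commands would make a misconfigured build fail early instead of surfacing at training time.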