pip packages fail to install within a cluster environment
If you see this message in Cluster Environment build logs:
[INFO] 6/23/2021, 4:20:16 PM: Running setup.py install for <SOME PIP PACKAGE>: finished with status 'error'
[ERROR] 6/23/2021, 4:20:16 PM: ERROR: Command errored out with exit status 1:
ImportError: Something is wrong with the numpy installation. While importing we detected an older version of numpy in ['/home/ray/anaconda3/lib/python3.8/site-packages/numpy']. One method of fixing this is to repeatedly uninstall numpy until none is found, then reinstall this version.
To work around this:
- Remove <SOME PIP PACKAGE> from the pip section of the cluster environment.
- Add the following to the post-build commands:
/home/ray/anaconda3/bin/python -m pip uninstall -y numpy
rm -rf /home/ray/anaconda3/lib/python3.<7 OR 8>/site-packages/numpy
/home/ray/anaconda3/bin/pip install numpy
/home/ray/anaconda3/bin/pip install --upgrade --no-cache-dir <SOME PIP PACKAGE>
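The error message suggests uninstalling numpy repeatedly until none is found. A minimal sketch of that loop follows, for the case where a single uninstall leaves another copy registered with pip. The function name uninstall_all_numpy and the attempt cap are illustrative assumptions, not part of Anyscale; the default interpreter path is the one used in the commands above.

```shell
# Sketch: uninstall numpy repeatedly until pip no longer reports it.
# uninstall_all_numpy is a hypothetical helper name, not an Anyscale command.
uninstall_all_numpy() {
  local python_bin="${1:-/home/ray/anaconda3/bin/python}"
  local max_attempts="${2:-5}"   # assumed safety cap so the loop cannot spin forever
  local attempt=0

  # `pip show` exits non-zero once no numpy distribution is registered.
  while "$python_bin" -m pip show numpy >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "numpy still present after $max_attempts attempts" >&2
      return 1
    fi
    "$python_bin" -m pip uninstall -y numpy >/dev/null
    attempt=$((attempt + 1))
  done
  echo "removed $attempt numpy installation(s)"
}
```

Called with no arguments, the function targets /home/ray/anaconda3/bin/python. The rm -rf step above may still be needed for files that pip does not track.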
Tensorboard support requires port forwarding
Future versions of Anyscale will include TensorBoard support out of the box. Until then, you can use it as follows.
- After you run training on your cluster, the log files are on the cluster at /home/ray/ray_results.
- Use the Anyscale CLI to SSH into your cluster, including a port forwarding option for TensorBoard:
anyscale ssh -o -L6006:localhost:6006
- Inside the resulting SSH session on the cluster, launch TensorBoard, pointing it at the log directory above.
- Open a browser on your local machine to http://localhost:6006 and use the TensorBoard UI.
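The launch step can be sketched as a small wrapper. It assumes that the tensorboard command is on the PATH inside the cluster and that port 6006 matches the port forwarded by anyscale ssh. launch_tensorboard is a hypothetical helper name; --logdir and --port are standard TensorBoard flags.

```shell
# Sketch: start TensorBoard on the cluster against the Ray results directory.
# launch_tensorboard is an illustrative name, not an Anyscale command.
launch_tensorboard() {
  local logdir="${1:-/home/ray/ray_results}"
  local port="${2:-6006}"   # should match the port forwarded over SSH

  # Fail early with a clear message if no training logs exist yet.
  if [ ! -d "$logdir" ]; then
    echo "log directory not found: $logdir" >&2
    return 1
  fi

  tensorboard --logdir "$logdir" --port "$port"
}
```

Run launch_tensorboard inside the SSH session and leave it running while you browse http://localhost:6006 locally.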
Installing Horovod requires extra steps
Horovod cannot be installed simply by adding it to the pip packages.
Set the appropriate environment variables in the cluster environment:
Add the following post-build commands:
pip install horovod[tensorflow]
pip install horovod[ray]
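Once the post-build commands have run, it can help to confirm that Horovod was built with TensorFlow support. The sketch below relies on horovodrun --check-build and assumes that its checklist output marks compiled-in features with [X]. horovod_has_tensorflow is a hypothetical helper name.

```shell
# Sketch: succeed only if the Horovod build reports TensorFlow support.
# horovod_has_tensorflow is an illustrative name, not a Horovod command.
horovod_has_tensorflow() {
  # Assumption: --check-build prints lines such as "[X] TensorFlow"
  # for each framework that was compiled in.
  horovodrun --check-build | grep -qF '[X] TensorFlow'
}
```

For example, horovod_has_tensorflow || echo "Horovod was built without TensorFlow" would flag a build that needs its environment variables revisited.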