Known Issues
This version of the Anyscale docs is deprecated. Go to the latest version for up-to-date information.
apt-get update -y fails when building a Docker image based on Anyscale Ray images
If you see this message when building a Docker image:
[INFO] 3/4/2024, 5:56:03 PM: #13 1.970 Reading package lists...
[INFO] 3/4/2024, 5:56:03 PM: #13 2.857 E: The repository 'http://apt.kubernetes.io kubernetes-xenial Release' no longer has a Release file.
[INFO] 3/4/2024, 5:56:03 PM: #13 ERROR: process "/bin/sh -c sudo apt-get update -y" did not complete successfully: exit code: 100
Workaround
- Remove the Kubernetes repository.
- Run sudo apt-get update -y again.
RUN sudo rm -f /etc/apt/sources.list.d/kubernetes.list
RUN sudo apt-get update -y
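To confirm that a failed build hit this particular problem, the log can be scanned for the apt error shown above. The following is a minimal sketch, not an Anyscale tool: the function name and the regular expression are illustrative assumptions based only on the log line quoted in this section.

```python
import re
from typing import List, Tuple

# Matches the apt error quoted above and captures the repository URL and suite.
# The pattern is an assumption derived from that single log line.
RELEASE_ERROR = re.compile(
    r"E: The repository '(?P<url>\S+) (?P<suite>\S+) Release' "
    r"no longer has a Release file"
)


def find_broken_repos(log_text: str) -> List[Tuple[str, str]]:
    """Return (url, suite) for each repository apt reports as missing a Release file."""
    return [(m.group("url"), m.group("suite")) for m in RELEASE_ERROR.finditer(log_text)]


sample_log = (
    "[INFO] 3/4/2024, 5:56:03 PM: #13 1.970 Reading package lists...\n"
    "[INFO] 3/4/2024, 5:56:03 PM: #13 2.857 E: The repository "
    "'http://apt.kubernetes.io kubernetes-xenial Release' no longer has a Release file.\n"
)
print(find_broken_repos(sample_log))
```

If the result contains the apt.kubernetes.io repository, the workaround above applies.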
pip packages fail to install within a cluster environment
If you see this message in Cluster Environment build logs:
[INFO] 6/23/2021, 4:20:16 PM: Running setup.py install for <SOME PIP PACKAGE>: finished with status 'error'
[ERROR] 6/23/2021, 4:20:16 PM: ERROR: Command errored out with exit status 1:
...
ImportError: Something is wrong with the numpy installation. While importing we detected an older version of numpy in ['/home/ray/anaconda3/lib/python3.8/site-packages/numpy']. One method of fixing this is to repeatedly uninstall numpy until none is found, then reinstall this version.
Workaround
- Remove <SOME PIP PACKAGE> from the pip section
- Add the following to post build commands:
/home/ray/anaconda3/bin/python -m pip uninstall -y numpy
rm -rf /home/ray/anaconda3/lib/python3.<7 OR 8>/site-packages/numpy
/home/ray/anaconda3/bin/pip install numpy
/home/ray/anaconda3/bin/pip install --upgrade --no-cache-dir <SOME PIP PACKAGE>
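The placeholders in the commands above have to be filled in by hand. As a small sketch, the helper below generates the four post-build commands for a given package and Python minor version. The function name and the example package name are hypothetical, and the check for 7 or 8 simply mirrors the `<7 OR 8>` placeholder in the commands above.

```python
from typing import List

ANACONDA_PREFIX = "/home/ray/anaconda3"  # path taken from the commands above


def numpy_fix_commands(package: str, python_minor: int) -> List[str]:
    """Build the post-build commands that reinstall numpy, then the failing package."""
    if python_minor not in (7, 8):
        # The workaround above only mentions Python 3.7 and 3.8.
        raise ValueError("python_minor must be 7 or 8")
    return [
        f"{ANACONDA_PREFIX}/bin/python -m pip uninstall -y numpy",
        f"rm -rf {ANACONDA_PREFIX}/lib/python3.{python_minor}/site-packages/numpy",
        f"{ANACONDA_PREFIX}/bin/pip install numpy",
        f"{ANACONDA_PREFIX}/bin/pip install --upgrade --no-cache-dir {package}",
    ]


# "some-package" is a hypothetical stand-in for the package that failed to install.
for command in numpy_fix_commands("some-package", 8):
    print(command)
```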
Installing Horovod requires extra steps
Horovod cannot be installed simply by adding it to the pip packages.
Workaround
Set the appropriate environment variables in the cluster environment:
HOROVOD_WITH_TENSORFLOW=1
HOROVOD_WITH_GLOO=1
Add the following post-build commands:
pip install horovod[tensorflow]
pip install horovod[ray]
Anyscale VSCode Desktop Plugin on Windows
The Anyscale Plugin for VSCode Desktop is not supported on Windows. Anyscale is working on a resolution for this. There is no workaround at this time.
Workaround
This will be resolved with a future Anyscale Plugin for VSCode Desktop.
Cluster Runtime Environment Log is empty
This is a known issue with the UI and will be resolved with a future Anyscale release.
Workaround
Logs for a cluster can be downloaded with the following command:
anyscale logs cluster --id <cluster_id> --download
Anyscale Services fail to start with Ray 2.3.0
Anyscale Services with Ray 2.3.0 fail to start if the entrypoint uses serve. For example:
entrypoint: serve run --non-blocking serve_hello:entrypoint
Workaround
Verify that your Python code has:
if __name__ == "__main__":
    serve.run()
Then change the entrypoint to call your Python file directly:
entrypoint: python <python_entry_point.py>
This will be fixed in Ray 2.3.1.
Grafana Dashboards fail to load
Cluster Grafana dashboards fail to load with a 401 Unauthorized error in Safari and Firefox browsers. This alert can also appear on the Ray dashboard. While this will be addressed in a future release of Anyscale, there are currently workarounds for both browsers.
Workaround
In Safari, you need to disable Prevent cross-site tracking.
In Firefox, you need to disable Enhanced Tracking Protection.
Empty job logs from SDK/CLI
On an Anyscale 2.0 cloud, empty job logs are returned from any of the following methods:
CLI
> anyscale job logs --id prodjob_123
SDK
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
sdk.fetch_production_job_logs()
sdk.get_production_job_logs()
sdk.fetch_job_logs()
Workaround
Make sure your anyscale package is version 0.5.84 or later:
pip install anyscale --upgrade
In addition, get_production_job_logs is deprecated on Anyscale 2.0. Retrieve the logs string directly with fetch_production_job_logs.
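Before upgrading, it can help to check which version of the anyscale package is installed. The sketch below uses only the standard library. The helper names are illustrative, and the comparison assumes a plain `X.Y.Z` version string; pre-release or local version suffixes are not handled.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MINIMUM_VERSION = (0, 5, 84)  # minimum version named in the workaround above


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: Tuple[int, ...] = MINIMUM_VERSION) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


try:
    installed_version = version("anyscale")
except PackageNotFoundError:
    print("anyscale is not installed")
else:
    status = "ok" if meets_minimum(installed_version) else "upgrade needed"
    print(f"anyscale {installed_version}: {status}")
```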
Network Connectivity
The Anyscale Console checks connectivity with your running Anyscale Cluster because all network traffic goes directly from your browser to the cluster, never transiting the Anyscale Control Plane. When these health checks fail, the Console shows a banner to indicate potential problems.
Effect of the issue
Some features of the Anyscale Console may not work properly, including:
- Ray Dashboard
- Workspace Jupyter Notebook
- Workspace Web Terminal
- Workspace Duplication
Possible causes of the issue
Network connectivity issues can arise at any of the components between the browser on your machine and the web servers on the Anyscale Cluster Head Node. Common sources of issues are:
- Browser-based restrictions on cross-origin requests.
- Residential ISP blocking.
- Overly restrictive networking configuration in the Anyscale Cloud.
- Head node pressure disrupting the networking stack.
If you are using a cloud with customer-defined networking (access to Anyscale Clusters is not routed over the internet and instead relies on a VPN or similar solution), ensure that your access technology is working properly on your local device.
Potential Workarounds
- Ensure that the Head node is healthy. Navigate to Grafana and verify that the head node is not using all available resources (CPU, memory, disk).
- Check for ISP-based blocking if you are connected to residential WiFi with Comcast XFinity or AT&T Home WiFi. The home routers that these ISPs provide occasionally block connections to Anyscale Clusters. To fix this, access your router's Advanced Protection page and allow access for the Anyscale Cluster's domain. Alternatively, try activating a consumer VPN (if you have one) or connecting from a different ISP, for example from a mobile hot-spot or an alternative location.
- Try an alternate browser to rule out the possibility of browser-based blocking. Some browser-based privacy features interfere with this health checking because the Anyscale Cluster is on a different domain than the Anyscale Console.
- Finally, ensure that the network configuration on the cloud platform allows access from your location. Restrictive Security Groups (AWS) or Firewall Rules (GCP) can limit which IP addresses can access Anyscale Clusters.
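To separate network-level blocking (ISP, VPN, Security Group, or Firewall Rule) from browser-based blocking, a plain TCP connection test from your machine can be useful. The following is a minimal sketch. The function name is illustrative, port 443 is an assumption, and the host has to be replaced with your own cluster's domain. A successful connection does not rule out browser-side restrictions, which this check cannot reproduce.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical placeholder: replace with your Anyscale Cluster's domain.
    cluster_host = "localhost"
    print(f"{cluster_host}: reachable={can_reach(cluster_host, timeout=2.0)}")
```

If this check fails from your home network but succeeds from a mobile hot-spot, ISP-based blocking is the more likely cause.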
Unexpected costs from default logging on GCP
By default, GCP enables a logging functionality that may result in unexpected costs on your GCP account. To avoid these costs you can optionally disable logging as follows:
- Navigate to "Log Router"
- Select "_Default" sink, click "Edit Sink"
- At the bottom, navigate to "Choose logs to filter out of sink (optional)" and type in the following, replacing dogfood-clouds with your own GCP project ID:
resource.type="gce_instance" AND logName="projects/dogfood-clouds/logs/syslog"
If you want to exclude all GCE logs related to VMs, use the following filter instead:
resource.type="gce_instance"
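The two filters above differ only in whether they are restricted to the syslog log of one project. As a small sketch, the helper below builds either filter string; the function name is illustrative, and the project ID passed in is whatever GCP project hosts your Anyscale cloud.

```python
from typing import Optional


def gce_exclusion_filter(project_id: Optional[str] = None) -> str:
    """Build the Log Router exclusion filter for GCE instance logs.

    With a project ID, only that project's syslog entries are excluded;
    without one, all logs from GCE instances are excluded.
    """
    base = 'resource.type="gce_instance"'
    if project_id is None:
        return base
    return f'{base} AND logName="projects/{project_id}/logs/syslog"'


# "my-gcp-project" is a hypothetical project ID.
print(gce_exclusion_filter("my-gcp-project"))
print(gce_exclusion_filter())
```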