Known Issues

apt-get update -y fails when building a docker image based on Anyscale ray images

If you see this message when building a docker image:

[INFO] 3/4/2024, 5:56:03 PM: #13 1.970 Reading package lists...
[INFO] 3/4/2024, 5:56:03 PM: #13 2.857 E: The repository 'http://apt.kubernetes.io kubernetes-xenial Release' no longer has a Release file.
[INFO] 3/4/2024, 5:56:03 PM: #13 ERROR: process "/bin/sh -c sudo apt-get update -y" did not complete successfully: exit code: 100

Workaround

  • Remove the kubernetes repository
  • Run sudo apt-get update -y again
RUN sudo rm -f /etc/apt/sources.list.d/kubernetes.list
RUN sudo apt-get update -y

pip packages fail to install within a cluster environment

If you see this message in Cluster Environment build logs:

[INFO] 6/23/2021, 4:20:16 PM:     Running setup.py install for <SOME PIP PACKAGE>: finished with status 'error'
[ERROR] 6/23/2021, 4:20:16 PM: ERROR: Command errored out with exit status 1:
...
ImportError: Something is wrong with the numpy installation. While importing we detected an older version of numpy in ['/home/ray/anaconda3/lib/python3.8/site-packages/numpy']. One method of fixing this is to repeatedly uninstall numpy until none is found, then reinstall this version.

Workaround

  • Remove the failing package from the pip section
  • Add the following to post build commands:
/home/ray/anaconda3/bin/python -m pip uninstall -y numpy
rm -rf /home/ray/anaconda3/lib/python3.<7 OR 8>/site-packages/numpy
/home/ray/anaconda3/bin/pip install numpy
/home/ray/anaconda3/bin/pip install --upgrade --no-cache-dir <SOME PIP PACKAGE>
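To confirm that no stale copy of numpy remains, a check like the following can be run afterwards. This is a minimal sketch and not part of the Anyscale tooling: the helper name `find_numpy_metadata` is hypothetical, and it assumes that leftover installs appear as `numpy-<version>.dist-info` or `numpy-<version>.egg-info` entries in the site-packages directory.

```python
from pathlib import Path


def find_numpy_metadata(site_packages):
    """Return sorted names of numpy metadata entries found in site_packages.

    Hypothetical diagnostic helper: more than one entry suggests that
    several numpy versions were installed on top of each other.
    """
    root = Path(site_packages)
    found = []
    # "numpy-" with the trailing dash avoids matching packages like numpydoc.
    for pattern in ("numpy-*.dist-info", "numpy-*.egg-info"):
        found.extend(p.name for p in root.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    import sysconfig

    # Inspect the site-packages directory of the interpreter running this script.
    print(find_numpy_metadata(sysconfig.get_paths()["purelib"]))
```

If the list contains more than one entry, the uninstall step above probably needs to be repeated before reinstalling numpy.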

Installing Horovod requires extra steps

Horovod cannot be installed simply by adding it to the pip packages.

Workaround

Set the appropriate environment variables in the cluster environment:

HOROVOD_WITH_TENSORFLOW=1
HOROVOD_WITH_GLOO=1

Add the following post-build commands:

pip install horovod[tensorflow]
pip install horovod[ray]

Anyscale VSCode Desktop Plugin on Windows

The Anyscale Plugin for VSCode Desktop is not supported on Windows. Anyscale is working on a resolution.

Workaround

There is no workaround at this time. This will be resolved in a future release of the Anyscale Plugin for VSCode Desktop.

Cluster Runtime Environment Log is empty

This is a known issue with the UI and will be resolved with a future Anyscale release.

Workaround

Logs for a cluster can be downloaded with the CLI:

anyscale logs cluster --id <cluster_id> --download

Anyscale Services fail to start with Ray 2.3.0

Anyscale Services with Ray 2.3.0 fail to start if the entrypoint uses serve. For example:

entrypoint: serve run --non-blocking serve_hello:entrypoint

Workaround

Verify that your Python code has:

if __name__ == "__main__":
    serve.run()

Then change the entrypoint to call your Python file directly:

entrypoint: python <python_entry_point.py>

This will be fixed in Ray 2.3.1.

Grafana Dashboards fail to load

Cluster Grafana dashboards fail to load with a 401 Unauthorized error in Safari and Firefox browsers. This alert can also appear on the Ray dashboard. While this will be addressed in a future release of Anyscale, there are currently workarounds for both browsers.

Workaround

In Safari, you need to disable Prevent cross-site tracking.

In Firefox, you need to disable Enhanced Tracking Protection.

Empty job logs from SDK/CLI

On an Anyscale 2.0 cloud, empty job logs are returned from any of the following methods:

CLI

> anyscale job logs --id prodjob_123

SDK

from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
sdk.fetch_production_job_logs()
sdk.get_production_job_logs()
sdk.fetch_job_logs()

Workaround

Make sure your anyscale package version is 0.5.84 or later:

pip install anyscale --upgrade

In addition, get_production_job_logs is deprecated on Anyscale 2.0. Please retrieve the logs string directly with fetch_production_job_logs.
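The version requirement can also be checked from Python before calling the SDK. The following is a minimal sketch: the helper names `parse_version` and `meets_minimum` are hypothetical, and the parser only handles plain dotted release numbers (it ignores suffixes such as `rc1`).

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (0, 5, 84)  # minimum anyscale version named in the workaround


def parse_version(text):
    """Turn a dotted release string such as '0.5.84' into a tuple of ints.

    Hypothetical helper: reads the leading digits of each dotted piece and
    stops at the first piece that has none.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = version("anyscale")
    except PackageNotFoundError:
        print("anyscale is not installed")
    else:
        print(installed, "ok" if meets_minimum(installed) else "upgrade needed")
```

Comparing integer tuples rather than strings matters here: as strings, "0.5.9" would sort after "0.5.84".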

Network Connectivity

The Anyscale Console checks connectivity with your running Anyscale Cluster because all network traffic goes directly from your browser to the cluster, never transiting the Anyscale Control Plane. When these health checks fail, the Console shows a banner to indicate potential problems.

Effect of the issue

Some features of the Anyscale Console may not work properly, including:

  • Ray Dashboard
  • Workspace Jupyter Notebook
  • Workspace Web Terminal
  • Workspace Duplication

Possible causes of the issue

Network connectivity issues can arise at any of the components between the browser on your machine and the web servers on the Anyscale Cluster Head Node. Common sources of issues are:

  • Browser-based restrictions on cross-origin requests.
  • Residential ISP blocking.
  • Overly restrictive networking configuration in the Anyscale Cloud.
  • Head node pressure disrupting the networking stack.
Note: If you are using a cloud with customer-defined networking (access to Anyscale Clusters is not routed over the internet and instead goes through a VPN or similar solution), ensure that your access technology is working properly on your local device.

Potential Workarounds

  1. Ensure that the head node is healthy. Navigate to Grafana and verify that the head node is not using all available resources (CPU, memory, disk).
  2. Check for ISP-based blocking if you are connected to residential WiFi with Comcast Xfinity or AT&T Home WiFi. The home routers that these ISPs provide occasionally block connections to Anyscale Clusters. To fix this, open your router's Advanced Protection page and allow access for the Anyscale Cluster's domain. Alternatively, try activating a consumer VPN (if you have one) or connecting from a different ISP, for example from a mobile hotspot or an alternative location.
  3. Try an alternate browser to rule out the possibility of browser-based blocking. Some browser-based privacy features interfere with this health checking because the Anyscale Cluster is on a different domain than the Anyscale Console.
  4. Finally, ensure that the network rules on the cloud platform allow access from your location. Restrictive Security Group (AWS) or Firewall Rules (GCP) can limit which IP addresses can access Anyscale Clusters.
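To separate network-level blocking (steps 2 and 4) from browser-level blocking (step 3), it can help to test plain TCP reachability from the affected machine. The following is a minimal sketch, not an Anyscale tool: the helper name `can_connect` is hypothetical, and the host and port you pass are assumptions you must fill in from your own cluster's URL.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical helper: it only tests reachability from the machine that
    runs it and says nothing about browser privacy features.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, calling `can_connect` with your cluster's domain and port 443 (assuming the cluster is served over HTTPS) from both your home network and a mobile hotspot shows whether the ISP or router is the likely cause. If the connection succeeds but the Console banner persists, browser-based blocking is the more likely explanation.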