Version: 1.0.0

Anyscale Image Specification (Beta)

Introduction

This document outlines the specifications for creating a standardized image used in Anyscale infrastructure. It details the system requirements, essential software packages, and the Python libraries needed to ensure compatibility and performance in Anyscale environments.

System Requirements

  • Base Image: The image must use ubuntu:22.04 as its foundation, ensuring a stable and widely supported Linux environment.
  • User Configuration: The image must include a user named "ray" with user ID 1000 and group ID 100. The ray user must be able to run sudo without a password.
  • Working Directory: Set WORKDIR to /home/ray, which designates the primary directory for user operations and application execution. Ensure that the ray user has read and write permissions to this directory.
  • Home Directory: Establish /home/ray as HOME, centralizing user configurations and runtime files.
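
The user requirements above can be checked from inside a built image. The following is a minimal sketch, not part of any Anyscale tooling: the function names ids_match_spec and check_ray_user are hypothetical, and check_ray_user is assumed to be run as the ray user inside the container.

```shell
# Sketch: verify the "ray" user requirements from inside a built image.
# Function names are illustrative and not part of any Anyscale tooling.

# Succeeds only if the given uid/gid pair matches the specification.
ids_match_spec() {
  [ "$1" = "1000" ] && [ "$2" = "100" ]
}

# Runs the checks against the live system; call this as user ray
# inside the container.
check_ray_user() {
  ids_match_spec "$(id -u ray)" "$(id -g ray)" || {
    echo "ray must have uid 1000 and gid 100" >&2
    return 1
  }
  # `sudo -n` fails instead of prompting when a password would be required.
  sudo -n true || {
    echo "ray cannot run sudo without a password" >&2
    return 1
  }
  { [ -r /home/ray ] && [ -w /home/ray ]; } || {
    echo "/home/ray is not readable and writable by this user" >&2
    return 1
  }
}
```

Inside the container, running `check_ray_user && echo "user requirements OK"` reports the first requirement that is not met.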

Required System Packages

  • sudo
  • python
  • bash
  • openssh-server
  • openssh-client
  • rsync
  • zip
  • unzip
  • git
  • gdb
  • curl
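
One way to confirm that these packages are present is to check that the commands they provide are on PATH. The sketch below assumes typical Ubuntu packaging (for example, that openssh-server provides sshd and openssh-client provides ssh); the variable REQUIRED_COMMANDS and the function missing_commands are hypothetical names chosen for illustration.

```shell
# Commands assumed to be provided by the required packages. The mapping
# from package to command name is an assumption about Ubuntu packaging.
REQUIRED_COMMANDS="sudo python bash sshd ssh rsync zip unzip git gdb curl"

# Prints every command from the argument list that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
```

Inside the image, `missing_commands $REQUIRED_COMMANDS` (unquoted so that bash splits the list into words) prints nothing when every command is available.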

Required Python Packages

  • ray (The installed version of Ray must be greater than 2.7 to ensure optimal functionality and compatibility.)
  • anyscale
  • packaging
  • boto3
  • google
  • google-cloud-storage
  • jupyterlab
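
The Ray version requirement can be checked with a version-aware comparison. This is a minimal sketch that assumes GNU `sort -V` is available (as it is on Ubuntu); the function name version_gt is hypothetical. Note that it compares against the literal string 2.7, so whether patch releases of 2.7 satisfy the requirement is an interpretation you should confirm for your environment.

```shell
# Succeeds if version $1 is strictly greater than version $2.
# Relies on GNU `sort -V` for version-aware ordering.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

Inside the image, a check might look like `version_gt "$(python -c 'import ray; print(ray.__version__)')" 2.7 || echo "Ray is too old"`.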

Anyscaled reserved resources

Filesystem Paths:

  • /etc/anyscale
  • /opt/anyscale
  • /tmp/anyscale
  • /tmp/ray
  • /mnt/

Network Ports:

80, 443, 1010, 1012, 2222, 5555, 5903, 6379, 6822, 6823, 6824, 6826, 7878, 8000, 8076, 8085, 8201, 8265, 8266, 8686, 8687, 8912, 8999, 9090, 9092, 9100, 9478, 9479, 9480, 9481, 9482
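
Before binding an application to a port, it can help to check the port against the reserved list. The following sketch copies the list above into a variable; the names RESERVED_PORTS and is_reserved_port are hypothetical and only serve as an illustration.

```shell
# Reserved ports, copied from the list in this specification.
RESERVED_PORTS="80 443 1010 1012 2222 5555 5903 6379 6822 6823 6824 6826 7878 8000 8076 8085 8201 8265 8266 8686 8687 8912 8999 9090 9092 9100 9478 9479 9480 9481 9482"

# Succeeds if the given port number appears in the reserved list.
is_reserved_port() {
  case " $RESERVED_PORTS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_reserved_port 8000 && echo "choose another port"` warns about a conflict, while a port such as 8080 passes the check.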

Workspace Dependencies

If the image is intended to run on Workspaces, the following additional dependencies are required:

  • Persistent Bash History: Add PROMPT_COMMAND="history -a" to /home/ray/.bashrc to ensure that the bash history is saved after each command.
  • Source .workspacerc: Add a line to /home/ray/.bashrc that sources ~/.workspacerc if the file exists.
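
As a rough check that an image's .bashrc carries both Workspace entries, the sketch below searches a file for the two expected strings. The function name has_workspace_hooks is hypothetical, and the check is purely textual: an equivalent line written differently (for example, using `.` instead of `source`) would not be detected.

```shell
# Succeeds if the given bashrc-style file contains both Workspace entries.
# Uses fixed-string matching, so only the literal forms are recognized.
has_workspace_hooks() {
  grep -qF 'PROMPT_COMMAND="history -a"' "$1" &&
    grep -qF 'source ~/.workspacerc' "$1"
}
```

Inside the image, `has_workspace_hooks /home/ray/.bashrc || echo "Workspace entries missing"` flags a .bashrc that lacks either line.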

Example Dockerfile

# syntax=docker/dockerfile:1.3-labs

FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive

# Install basic dependencies and setup ray user with sudoer permissions.
# Note that ray user should be (uid: 1000, gid: 100) to work with shared file
# systems.
# Add gdb since ray dashboard uses `memray attach`, which requires gdb.
RUN <<EOF
#!/bin/bash
set -euxo pipefail

apt-get update -y
apt-get install -y --no-install-recommends sudo tzdata openssh-client openssh-server rsync zip unzip git gdb curl
# Install Python. You can replace this with any other Python installation
# method (e.g. conda), as long as `python` is on PATH. At runtime
# we'll source `/home/ray/.bashrc` in case you modify PATH there. This example
# uses a virtualenv.
apt-get install -y python3-venv

apt-get clean
rm -rf /var/lib/apt/lists/*

# Work around for https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/45234
mkdir -p /var/run/sshd

useradd -ms /bin/bash -d /home/ray ray --uid 1000 --gid 100
usermod -aG sudo ray
echo 'ray ALL=NOPASSWD: ALL' >> /etc/sudoers
EOF

# Switch to ray user
USER ray
ENV HOME=/home/ray
WORKDIR /home/ray
ENV PATH=/home/ray/virtualenv/bin:$PATH

RUN <<EOF
#!/bin/bash
set -euxo pipefail

# The preceding `USER ray` instruction already makes this script run as ray.
python3 -m venv --system-site-packages /home/ray/virtualenv
export PATH=/home/ray/virtualenv/bin:$PATH

# jupyterlab is only needed if you want to access Jupyter notebooks from the web UI.
# Note that this only installs `ray[default]` to minimize the number of dependencies.
# You can add extra libraries such as Tune with `ray[default,tune]`. See the Ray
# docs for more info: https://docs.ray.io/en/latest/ray-overview/installation.html
pip install --no-cache-dir anyscale jupyterlab "ray[default]"

# If you want to run your cluster on Google Cloud Platform, uncomment the following line.
# pip install --no-cache-dir google google-cloud-storage

# Start of Workspace dependencies: this section is only needed if you want your image to run on Workspaces.

# This flushes bash history after each command, so that workspaces can persist it.
echo 'PROMPT_COMMAND="history -a"' >> /home/ray/.bashrc

# If the workspacerc exists, load it.
echo '[ -e ~/.workspacerc ] && source ~/.workspacerc' >> /home/ray/.bashrc


# End of Workspace dependencies
EOF