Anyscale Image Specification (Beta)

Introduction

This document outlines the specifications for creating a standardized image used in Anyscale infrastructure. It details the system requirements, essential software packages, and the Python libraries needed to ensure compatibility and performance in Anyscale environments.

System Requirements

  • Base Image: The image must use ubuntu:22.04 as its foundation, ensuring a stable and widely supported Linux environment.
  • User Configuration: The image must include a user named "ray" with user ID 1000 and group ID 100. The ray user must be able to run sudo without a password.
  • Working Directory: Set WORKDIR to /home/ray, which designates the primary directory for user operations and application execution.
  • Home Directory: Establish /home/ray as HOME, centralizing user configurations and runtime files.
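
The user requirements above can be checked from a shell inside the image. The following is a minimal sketch, not an official Anyscale tool: `check_ray_identity` is a hypothetical helper name, and the expected values are taken directly from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compares an observed uid, gid, and home directory
# against the values required by the specification.
check_ray_identity() {
  local uid="$1" gid="$2" home="$3" status=0
  [ "$uid" = "1000" ] || { echo "uid is $uid, expected 1000"; status=1; }
  [ "$gid" = "100" ] || { echo "gid is $gid, expected 100"; status=1; }
  [ "$home" = "/home/ray" ] || { echo "home is $home, expected /home/ray"; status=1; }
  return "$status"
}

# Inside the image, feed the helper the real values for the ray user, if present.
if id ray >/dev/null 2>&1; then
  check_ray_identity "$(id -u ray)" "$(id -g ray)" "$(getent passwd ray | cut -d: -f6)" \
    && echo "ray user matches the specification"
fi
```

Passwordless sudo can be checked separately by running `sudo -n true` as the ray user; the `-n` flag makes sudo fail instead of prompting for a password.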

Required System Packages

  • sudo
  • python
  • bash
  • openssh-server
  • openssh-client
  • rsync
  • zip
  • unzip
  • git
  • gdb
  • curl
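
One way to confirm that these packages are present is to look for the commands they provide. The sketch below assumes that openssh-server provides `sshd` and openssh-client provides `ssh`, and that each other package provides a command of the same name; `missing_commands` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every argument that does not resolve to a
# command on PATH, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Assumed package-to-command mapping: openssh-server -> sshd, openssh-client -> ssh.
missing="$(missing_commands sudo python bash sshd ssh rsync zip unzip git gdb curl)"
if [ -n "$missing" ]; then
  echo "missing commands:"
  echo "$missing"
else
  echo "all required commands found"
fi
```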

Required Python Packages

  • ray (the installed version must be greater than 2.7 for compatibility with Anyscale)
  • anyscale
  • packaging
  • boto3
  • google
  • google-cloud-storage
  • jupyterlab
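
The Ray version requirement can be checked numerically rather than as a string, so that a release such as 2.10 is not mistaken for an older one. This is a minimal sketch: `ray_minor_gt` is a hypothetical helper name, and it takes the strict reading that the major.minor pair must exceed 2.7. The specification does not say whether a 2.7.x patch release qualifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the major.minor part of version $1 is
# strictly greater than $2.$3, comparing the fields as integers.
ray_minor_gt() {
  local version="$1" min_major="$2" min_minor="$3"
  local major="${version%%.*}"
  local rest="${version#*.}"
  local minor="${rest%%.*}"
  [ "$major" -gt "$min_major" ] || { [ "$major" -eq "$min_major" ] && [ "$minor" -gt "$min_minor" ]; }
}

# Inside the image, read the installed version from Ray itself.
if installed="$(python -c 'import ray; print(ray.__version__)' 2>/dev/null)"; then
  if ray_minor_gt "$installed" 2 7; then
    echo "Ray $installed satisfies the requirement"
  else
    echo "Ray $installed does not satisfy the requirement"
  fi
else
  echo "Ray is not importable from the python on PATH"
fi
```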

Anyscaled reserved resources

Filesystem Paths:

  • /etc/anyscale
  • /opt/anyscale
  • /tmp/anyscale
  • /tmp/ray
  • /mnt/
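
When choosing where an application writes its files, it can help to test candidate paths against this list. The sketch below assumes that a reservation covers the listed directory and everything beneath it, which the specification does not state explicitly; `is_reserved_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given absolute path is one of the
# reserved directories or lies beneath one of them.
is_reserved_path() {
  case "$1" in
    /etc/anyscale|/etc/anyscale/*) return 0 ;;
    /opt/anyscale|/opt/anyscale/*) return 0 ;;
    /tmp/anyscale|/tmp/anyscale/*) return 0 ;;
    /tmp/ray|/tmp/ray/*) return 0 ;;
    /mnt|/mnt/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: report on a few candidate paths.
for path in /tmp/ray/session /home/ray/data /mnt/shared; do
  if is_reserved_path "$path"; then
    echo "$path is reserved"
  else
    echo "$path is available"
  fi
done
```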

Network Ports:

80, 443, 1010, 1012, 2222, 5555, 5903, 6379, 6822, 6823, 6824, 6826, 7878, 8000, 8076, 8085, 8201, 8265, 8266, 8686, 8687, 8912, 8999, 9090, 9092, 9100, 9478, 9479, 9480, 9481, 9482
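
Before binding an application to a port, the port can be compared against the reserved list. This is a minimal sketch that copies the ports above into a variable; `RESERVED_PORTS` and `is_reserved_port` are hypothetical names.

```shell
#!/usr/bin/env bash
# Ports copied from the reserved list above, separated by spaces.
RESERVED_PORTS="80 443 1010 1012 2222 5555 5903 6379 6822 6823 6824 6826 7878 8000 8076 8085 8201 8265 8266 8686 8687 8912 8999 9090 9092 9100 9478 9479 9480 9481 9482"

# Hypothetical helper: succeeds when the given port appears in the list.
# The surrounding spaces force a whole-number match, so 808 does not match 8085.
is_reserved_port() {
  case " $RESERVED_PORTS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check a port before using it.
port=8080
if is_reserved_port "$port"; then
  echo "port $port is reserved"
else
  echo "port $port is available"
fi
```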

Example Dockerfile

# syntax=docker/dockerfile:1.3-labs

FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive

# Install basic dependencies and setup ray user with sudoer permissions.
# Note that ray user should be (uid: 1000, gid: 100) to work with shared file
# systems.
# Add gdb since ray dashboard uses `memray attach`, which requires gdb.
RUN <<EOF
#!/bin/bash
set -euxo pipefail

apt-get update -y
apt-get install -y --no-install-recommends sudo tzdata openssh-client openssh-server rsync zip unzip git gdb curl
# Install Python. You can replace this with any other installation method
# (e.g. conda), as long as `python` is on PATH. At runtime
# we'll source `/home/ray/.bashrc` in case you modify PATH there. This example uses
# virtualenv.
apt-get install -y python3-venv

apt-get clean
rm -rf /var/lib/apt/lists/*

# Work around for https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/45234
mkdir -p /var/run/sshd

useradd -ms /bin/bash -d /home/ray ray --uid 1000 --gid 100
usermod -aG sudo ray
echo 'ray ALL=NOPASSWD: ALL' >> /etc/sudoers
EOF

# Switch to ray user
USER ray
ENV HOME=/home/ray
WORKDIR /home/ray
ENV PATH=/home/ray/virtualenv/bin:$PATH

RUN <<EOF
#!/bin/bash
# Runs as user ray because of the USER instruction above.
python3 -m venv --system-site-packages /home/ray/virtualenv
export PATH=/home/ray/virtualenv/bin:$PATH

# jupyterlab is only needed if you want to access Jupyter notebooks from the web UI.
# This installs only `ray[default]` to minimize the number of dependencies.
# You can add extra libraries such as tune with `ray[default,tune]`. See the Ray
# docs for more info: https://docs.ray.io/en/latest/ray-overview/installation.html
pip install --no-cache-dir anyscale jupyterlab "ray[default]"

# If you want to run your cluster on Google Cloud Platform, uncomment the following line.
# pip install --no-cache-dir google google-cloud-storage

# Start of Workspace dependencies: this section is only needed if you want your image to run on Workspaces.

# This flushes bash history after each command, so that workspaces can persist it.
echo 'PROMPT_COMMAND="history -a"' >> /home/ray/.bashrc

# Load ~/.workspacerc, if it exists, whenever a shell starts.
echo 'if [ -e ~/.workspacerc ]; then source ~/.workspacerc; fi' >> /home/ray/.bashrc

# End of Workspace dependencies
EOF