
Build your own image

Anyscale clusters can start with any image using the image_uri flag on jobs and services; however, those images need to adhere to Anyscale's minimum requirements. This guide walks through an end-to-end example: building an Anyscale-compatible image locally, pushing it to a remote registry, and using it in Anyscale.

Getting started

Prerequisites

  • A local installation of Docker (for building and pushing images).
  • (Optional) Anyscale CLI version 0.5.50 or higher, if you want to use the CLI to create cluster environments.
  • (Optional) Amazon ECR access set up, if you want to access images stored in a private ECR repository.
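
Before proceeding, it can help to confirm the tools above are available locally:

# Verify Docker is installed and the daemon is running
docker --version
docker info > /dev/null && echo "Docker daemon is up"

# (Optional) Verify the Anyscale CLI version
anyscale --version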

Step 1: Build an Anyscale compatible image

Anyscale provides public base images pre-installed with all the necessary dependencies to run Ray on Anyscale, for example anyscale/ray:2.9.3. A full list of base images and their dependencies can be found here. Once you've selected a base image, you can create a Dockerfile with additional dependencies:

info

You can build an image that's not based on an Anyscale base image; however, you must follow the specification outlined in the Image requirements reference.

# Use Anyscale base image
FROM anyscale/ray:2.9.3-py310

RUN sudo apt-get update && sudo apt-get install -y axel nfs-common zip unzip awscli && sudo apt-get clean

RUN pip install --no-cache-dir -U sympy

# (Optional) Verify that dependencies from the base image still work. This
# is useful for catching dependency conflicts at build time.
RUN echo "Testing Ray Import..." && python -c "import ray"
RUN ray --version
RUN jupyter --version
RUN anyscale --version
RUN sudo supervisord --version

Once you've created your Dockerfile, you can build and tag it with:

docker build -t <your-registry>:<your-tag> .
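
For example, with placeholder values for an ECR registry, repository, and tag (substitute your own account ID and region):

# Placeholder account ID, region, repository, and tag; substitute your own
docker build -t 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-ray-image:v1 .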
info

The Anyscale base images come with a default entrypoint set. Overwriting this entrypoint may break the Web Terminal and Jupyter notebook server when you launch your cluster. See this section for details on bypassing this entrypoint when running the image locally.

info

If your image is based on an image with Ray version 2.7.X or lower.

Step 2: Push your image

Push your image to a Docker registry. This is supported for the following registries:

  • Any publicly accessible registry, for example, Docker Hub with no authentication required.

  • Private cloud provider managed registries:

    • Amazon Elastic Container Registry (Amazon ECR). See this guide for setting up access to private ECR on AWS.
    • Google Artifact Registry. By default, Anyscale managed nodes on GCP have read access to images stored in Artifact Registry within the same project.
  • Private third-party registries (Docker Hub, JFrog Artifactory). See this guide for setting up access to third-party registries.

For details on pushing images to these registries, see the Amazon ECR and Google Artifact Registry documentation.
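
As a concrete sketch for Amazon ECR, the push flow typically looks like the following (account ID, region, and repository name are placeholders):

# Authenticate Docker to your private ECR registry
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com

# Create the repository on first use
aws ecr create-repository --repository-name my-ray-image

# Push the image built in step 1
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-ray-image:v1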

Step 3: Use the image in Anyscale

For new workspaces:

  1. Go to the workspaces page.
  2. Click the Create button, then select "Custom blank workspace."

[Screenshot: create blank workspace]

  3. In the "Container image" section, select "Use my own image" and configure the image_uri and the Ray version.

[Screenshot: blank workspace container configuration]

For existing workspaces:

  1. Go to the Dependencies tab.
  2. Click the edit button on the container image.
  3. Select the "Use my own image" option.

[Screenshot: workspace container image settings]
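
Beyond workspaces, the same image can back jobs and services through the image_uri flag mentioned above. The following is a minimal sketch; the --image-uri flag name is an assumption that may differ by CLI version, so verify with anyscale job submit --help:

# Sketch: submit a job that runs on your custom image (flag name assumed;
# check `anyscale job submit --help` for the exact syntax in your CLI version)
anyscale job submit --image-uri <your-registry>:<your-tag> -- python my_script.py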

Troubleshooting

Debugging cluster startup failures

To troubleshoot clusters that won't start up, start by looking in the cluster's Event Log for any helpful tips.

Debugging Ray container utilities (Jupyter, VS Code, Web Terminal)

To troubleshoot issues with utilities that run inside the Ray container, the following log files may be useful:

  • /tmp/ray/jupyter.log - Jupyter log
  • /tmp/ray/vscode.log - VS Code log
  • /tmp/ray/web_terminal_server.log - Web Terminal system log
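
For example, from the Web Terminal or an SSH session on the head node:

# Inspect the most recent entries of each utility log
tail -n 50 /tmp/ray/jupyter.log
tail -n 50 /tmp/ray/vscode.log
tail -f /tmp/ray/web_terminal_server.log   # follow the Web Terminal log live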

Running the image locally

When running docker run -it <your-image>, you may encounter an error similar to the following:

Error: Format string '/home/ray/anaconda3/bin/anyscale session web_terminal_server --deploy-environment %(ENV_ANYSCALE_DEPLOY_ENVIRONMENT)s --cli-token %(ENV_ANYSCALE_CLI_TOKEN)s --host %(ENV_ANYSCALE_HOST)s --working-dir %(ENV_ANYSCALE_WORKING_DIR)s --session-id %(ENV_ANYSCALE_SESSION_ID)s' for 'program:web_terminal_server.command' contains names ('ENV_ANYSCALE_DEPLOY_ENVIRONMENT') which cannot be expanded. Available names: ENV_BUILD_DATE, ENV_HOME, ENV_HOSTNAME, ENV_LANG, ENV_LC_ALL, ENV_LOGNAME, ENV_PATH, ENV_PWD, ENV_PYTHONUSERBASE, ENV_RAY_USAGE_STATS_ENABLED, ENV_RAY_USAGE_STATS_PROMPT_ENABLED, ENV_RAY_USAGE_STATS_SOURCE, ENV_SHELL, ENV_SUDO_COMMAND, ENV_SUDO_GID, ENV_SUDO_UID, ENV_SUDO_USER, ENV_TERM, ENV_TZ, ENV_USER, group_name, here, host_node_name, process_num, program_name in section 'program:web_terminal_server' (file: '/etc/supervisor/conf.d/supervisord.conf')

This is caused by Anyscale's custom entrypoint, which requires certain environment variables to be set. To work around this, manually override the entrypoint when running the image:

docker run -it --entrypoint bash <your-image>

This will give you an interactive shell into the image locally.
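
You can also pass a one-off command to bash instead of opening an interactive shell, for example to run the same sanity checks as in the Dockerfile above:

# Run a single check without entering the container
docker run --rm --entrypoint bash <your-image> -c "ray --version"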

Docker write: no space left on device

If you’re pulling a large image, you may run out of disk space on your nodes. You can work around this by configuring a larger volume in your compute config’s advanced options:

  1. Navigate to Configurations -> Compute configs in the Anyscale console.
  2. Select "Create new config."
  3. Navigate to the Advanced configuration section near the bottom of the page.

[Screenshot: advanced configuration section]

  4. Add the following configuration to the Advanced configuration setting to attach a 250 GB volume (you can tune this to an appropriate size for your image).
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 250,
        "DeleteOnTermination": true
      }
    }
  ]
}

Note that "DeleteOnTermination" should be set to true to clean up the volume after the instance is terminated.
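
To estimate how large a volume you need, you can check the uncompressed size of your image locally before deploying; the extracted image occupies roughly this much disk on a node, before any runtime data:

# Print the uncompressed image size in bytes
docker image inspect <your-image> --format '{{.Size}}'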

Installing stable versions of Ray on top of nightly CUDA images

Older versions of Ray may not have base images available for newer versions of CUDA. In this scenario, you can use the nightly base images and reinstall a stable version of Ray on top of the nightly image. For example, to use CUDA 12.1 with Ray 2.5.0, you can create a Dockerfile similar to the following:

FROM anyscale/ray:nightly-py310-cu121

RUN pip uninstall -y ray && pip install -U ray==2.5.0

If the version of CUDA you need isn't already supported in the nightly images, contact support.
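
After building, it's worth verifying that the pinned Ray version took effect (the tag below is a placeholder):

# Build the image and confirm the reinstalled Ray version
docker build -t my-ray-cu121:2.5.0 .
docker run --rm --entrypoint bash my-ray-cu121:2.5.0 \
  -c 'python -c "import ray; print(ray.__version__)"'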

Docker: Error response from daemon: no basic auth credentials

info

This section assumes that Anyscale nodes are launched into your account with the <cloud-id>-cluster_node_role role. If your nodes are launched with ray-autoscaler-v1, or if you're using a custom AWS IAM role, you can apply the same steps to that role instead to grant ECR access.

This error can happen if the nodes launched in your account don't have permission to pull the image you specified. If you're using Amazon ECR to host your images, check that you've completed the Amazon ECR access setup steps. In particular, make sure that:

  • The <cloud-id>-cluster_node_role role has the AmazonEC2ContainerRegistryReadOnly policy attached.
  • The private ECR repository allows pulls from nodes with the <cloud-id>-cluster_node_role role. This is necessary if the private ECR repository is in a separate account from your EC2 instances.
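
As a sketch, you can attach the managed policy to the node role with the AWS CLI (substitute your actual role name):

# Attach the read-only ECR policy to the cluster node role
aws iam attach-role-policy \
  --role-name <cloud-id>-cluster_node_role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

# Confirm the policy is attached
aws iam list-attached-role-policies --role-name <cloud-id>-cluster_node_role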