Bring your own Docker environments
Anyscale cluster environments can be configured to launch with user-specified Docker images. This can be useful to:
- Build images with dependencies and packages that aren't publicly available.
- Keep images within your organization's account.
- Leverage your existing CI/CD pipelines to build and manage Anyscale cluster environments.
The following diagram illustrates the relationship between the Anyscale control plane and your organization's data plane in a CI/CD pipeline:
Getting started
Prerequisites
- A local installation of Docker (for building and pushing images).
- (Optional) Anyscale CLI version 0.5.50 or higher, if you want to use the CLI to create cluster environments.
- (Optional) Amazon ECR access set up, if you want to access images stored in a private ECR repository.
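If you need to install or upgrade the Anyscale CLI, the following typically works for a pip-based installation (adjust the version pin to your environment):
pip install -U "anyscale>=0.5.50"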
Step 1: Build an Anyscale-compatible image
Anyscale provides public base images pre-installed with all the necessary dependencies to run Ray on Anyscale, for example anyscale/ray:2.9.3. A full list of base images and their dependencies can be found here. Once you've selected a base image, you can create a Dockerfile with additional dependencies:
# Use Anyscale base image
FROM anyscale/ray:2.9.3-py310
RUN sudo apt-get update && sudo apt-get install -y axel nfs-common zip unzip awscli && sudo apt-get clean
RUN pip install --no-cache-dir -U sympy
# (Optional) Verify that dependencies from the base image still work. This
# is useful for catching dependency conflicts at build time.
RUN echo "Testing Ray Import..." && python -c "import ray"
RUN ray --version
RUN jupyter --version
RUN anyscale --version
RUN sudo supervisord --version
Once you've created your Dockerfile, you can build and tag it with:
docker build -t <your-registry>:<your-tag> .
The Anyscale base images come with a default entrypoint set. Overwriting this entrypoint may break the Web Terminal and Jupyter notebook server when you launch your cluster. See this section for details on bypassing this entrypoint when running the image locally.
If your image is based on an image with Ray version 2.7.X or lower, see this section for details about apt-get update failures caused by the legacy Kubernetes repository.
Step 2: Push your image
Push your image to a Docker registry. The following registries are currently supported:
- Any publicly accessible registry. For example, Docker Hub with no auth.
- Private cloud provider managed registries:
- Amazon Elastic Container Registry (Amazon ECR). See this guide for setting up access to private ECR on AWS.
- Google Artifact Registry. By default, Anyscale managed nodes on GCP have read access to images stored in Artifact Registry within the same project.
- Private third-party registries (such as Docker Hub or JFrog Artifactory). See this guide for setting up access to third-party registries.
See the corresponding guides for details on pushing images to Amazon ECR and Artifact Registry.
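As a rough sketch, pushing an image to a private Amazon ECR repository typically looks like the following; the account ID, region, and repository name are placeholders to replace with your own values:
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
docker tag <your-registry>:<your-tag> 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-repo:my-tag
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-repo:my-tag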
Step 3: Create a cluster environment for your image
- CLI
- SDK
- UI
Create a YAML configuration file like the following:
docker_image: my-registry/my-image:tag
ray_version: 2.9.3 # Replace this with the version of Ray in your image
env_vars: # Optionally, specify environment variables
  MY_VAR: value
registry_login_secret: mysecretid # Optional, only needed for private third-party registries
Then, run the following:
anyscale cluster-env build -n <cluster-env-name> my_cluster_env.yaml
from anyscale import AnyscaleSDK
from anyscale.sdk.anyscale_client import CreateBYODClusterEnvironment
sdk = AnyscaleSDK()
create_cluster_environment = CreateBYODClusterEnvironment(
    name="<cluster-env-name>",
    config_json={
        "docker_image": "my-registry/my-image:tag",
        "ray_version": "2.9.3",  # Replace this with the version of Ray in your image
        "env_vars": {  # Optionally, specify environment variables
            "MY_VAR": "value"
        },
        "registry_login_secret": "mysecretid"  # Optional, only needed for private third-party registries
    }
)

cluster_environment_build = sdk.build_cluster_environment(
    create_cluster_environment=create_cluster_environment
)
In the Anyscale console, navigate to "Configurations -> Cluster Environments -> Create new environment," and under "Base docker image" select "Use my own docker image"
Step 4: Launch a workload with your image
Once you've created a cluster environment with your image, you can reference the environment when starting clusters or other workloads in your account.
Clusters
- CLI
- SDK
- UI
anyscale cluster start --env=<cluster-env-name>
from anyscale import AnyscaleSDK
cluster_env_name = "cluster-env-name" # Replace this
cluster_name = "my-cluster" # Replace this
sdk = AnyscaleSDK()
# Search for your cluster environment
cluster_environments = sdk.search_cluster_environments(
    {
        "name": {"equals": cluster_env_name},
        "paging": {"count": 1},
    }
).results

if len(cluster_environments) == 0:
    print(f"Couldn't find cluster environment `{cluster_env_name}`")
    exit()

cluster_environment = cluster_environments[0]

# Search for builds for that cluster environment
builds = sdk.list_cluster_environment_builds(
    cluster_environment.id
).results

# Select the highest revision
build = max(builds, key=lambda b: b.revision)

# Launch your cluster
cluster = sdk.launch_cluster(
    project_id=None,
    cluster_name=cluster_name,
    cluster_environment_build_id=build.id
).result

print(f"Started cluster with ID `{cluster.id}`")
In the Anyscale console, navigate to "Clusters -> Create", and under " Cluster environment" select your cluster environment.
Jobs and Services
You can specify your custom Docker environment in the cluster_env field of the YAML configuration for your jobs and services.
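For example, a job configuration might reference the cluster environment roughly like this; the job name, entrypoint, and build revision are placeholders, and the remaining fields depend on your setup:
name: my-job
cluster_env: <cluster-env-name>:1 # cluster environment name and build revision
entrypoint: python my_script.py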
Advanced
Init Scripts (Public Beta)
An init script is a shell script that runs inside of the Ray container on all nodes before Ray starts. Common use cases include:
- Performing commands to fetch resources or other runtime dependencies
- Installing container-based monitoring / security agents
- Pre-job testing & verification for complex health-checks (for example, verifying network paths before starting jobs)
To add init scripts to your Docker image, write them into /anyscale/init when you build your image.
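For example, you can copy an init script into that directory in your Dockerfile; the script name and contents here are hypothetical:
FROM anyscale/ray:2.9.3-py310
# Copy a hypothetical init script into /anyscale/init so it runs on every node before Ray starts
COPY verify_network.sh /anyscale/init/verify_network.sh
RUN sudo chmod +x /anyscale/init/verify_network.sh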
All output from init scripts is written to /tmp/ray/startup-actions.log. If an init script fails to execute on a node, its standard output and standard error are shown in the Event Log for the cluster associated with your Job/Service/Workspace, and the node is terminated.
Troubleshooting
Debugging cluster startup failures
To troubleshoot clusters that won't start up, start by looking in the cluster's Event Log for any helpful tips.
Debugging Ray container utilities (Jupyter, VS Code, Web Terminal)
To troubleshoot issues with utilities that are run inside of the Ray container, the following log files may be useful:
- /tmp/ray/jupyter.log - Jupyter log
- /tmp/ray/vscode.log - VS Code log
- /tmp/ray/web_terminal_server.log - Web Terminal system log
If you are unable to access these log files through the Web Terminal, they are also accessible by downloading Ray logs for the cluster:
anyscale logs cluster --id [CLUSTER_ID] --download
Running the image locally
When running docker run -it <your-image>, you may run into an error similar to the following:
Error: Format string '/home/ray/anaconda3/bin/anyscale session web_terminal_server --deploy-environment %(ENV_ANYSCALE_DEPLOY_ENVIRONMENT)s --cli-token %(ENV_ANYSCALE_CLI_TOKEN)s --host %(ENV_ANYSCALE_HOST)s --working-dir %(ENV_ANYSCALE_WORKING_DIR)s --session-id %(ENV_ANYSCALE_SESSION_ID)s' for 'program:web_terminal_server.command' contains names ('ENV_ANYSCALE_DEPLOY_ENVIRONMENT') which cannot be expanded. Available names: ENV_BUILD_DATE, ENV_HOME, ENV_HOSTNAME, ENV_LANG, ENV_LC_ALL, ENV_LOGNAME, ENV_PATH, ENV_PWD, ENV_PYTHONUSERBASE, ENV_RAY_USAGE_STATS_ENABLED, ENV_RAY_USAGE_STATS_PROMPT_ENABLED, ENV_RAY_USAGE_STATS_SOURCE, ENV_SHELL, ENV_SUDO_COMMAND, ENV_SUDO_GID, ENV_SUDO_UID, ENV_SUDO_USER, ENV_TERM, ENV_TZ, ENV_USER, group_name, here, host_node_name, process_num, program_name in section 'program:web_terminal_server' (file: '/etc/supervisor/conf.d/supervisord.conf')
This is caused by Anyscale's custom entrypoint, which requires certain environment variables to be set. To work around this, manually override the entrypoint when running the image with the following command:
docker run -it --entrypoint bash <your-image>
This gives you an interactive shell in the image locally.
Docker write: no space left on device
If you’re pulling a large image, you may run out of disk space on your nodes. You can work around this by configuring a larger volume in your compute config’s advanced options:
- Navigate to Configurations->Cluster compute configs in the Anyscale console.
- Select "Create new config"
- Pick a name for your compute config, and set Cloud name to the cloud you want to launch your workload in.
- Navigate to the Advanced configuration section near the bottom of the page.
- Add the following configuration to the Advanced configuration setting to attach a 250 GB volume (you can tune this to an appropriate size for your image).
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 250,
        "DeleteOnTermination": true
      }
    }
  ]
}
Note that "DeleteOnTermination" should be set to true to clean up the volume after the instance is terminated.
Installing stable versions of Ray on top of nightly CUDA images
Older versions of Ray may not have base images available for newer versions of CUDA. In this scenario, you can use the nightly base images and reinstall a stable version of Ray on top of the nightly image. For example, to use CUDA 12.1 with Ray 2.5.0, you can create a Dockerfile similar to the following:
FROM anyscale/ray:nightly-py310-cu121
RUN pip uninstall -y ray && pip install -U ray==2.5.0
If the version of CUDA you need isn't already supported in the nightly images, contact support.
docker: Error response from daemon: no basic auth credentials.
This section assumes that Anyscale nodes are launched into your account with the <cloud-id>-cluster_node_role role. If your nodes are launched with ray-autoscaler-v1, or if you are using a custom AWS IAM role, you can apply the same steps to that role instead to grant ECR access.
This error can happen if the nodes launched in your account don't have permission to pull the image you specified. If you're using Amazon ECR to host your images, check that you've completed the Amazon ECR access set up steps. In particular, make sure that:
- The <cloud-id>-cluster_node_role role has the AmazonEC2ContainerRegistryReadOnly policy attached.
- The private ECR repository allows pulls from nodes with the <cloud-id>-cluster_node_role role. This is necessary if the private ECR repository is in a separate account from your EC2 instances (see the sketch below for an example repository policy).
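The following is a minimal sketch of an ECR repository policy that grants pull access to the node role; the account ID is a placeholder, and your security requirements may call for a tighter policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAnyscaleNodePull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/<cloud-id>-cluster_node_role"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}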