Version: 1.0.0

Running Stable Diffusion V2 on Anyscale


In this example, we will deploy a Stable Diffusion model on Anyscale using Anyscale Production Services.

Set up your environment

Start by making sure Anyscale is properly set up on your local machine. You only need to do this once.

pip install -U anyscale
anyscale login

The required configuration is in a GitHub repo. Let's clone it.

git clone
cd docs_examples/stable-diffusion

An Anyscale production service consists of a managed Ray Cluster running on your infrastructure. To run it, we need the following setup:

  • A cluster environment describing the container image and Python dependencies. You can learn more about it here.
  • A service spec describing the size of the cluster (number of nodes) and the entrypoint to your code. You can learn more about it here.
  • The code that hosts the model. The code is written with the Ray Serve framework to enable seamless scaling for your ML model in production.

Build the cluster environment

We will start by building the cluster environment. In particular, we will specify a list of pip dependencies to install on top of the Anyscale base machine learning image.

The following command will take about 10-20 minutes because it is installing and packaging the diffusers library and re-installing PyTorch. You only need to do it once.

$ anyscale cluster-env build cluster_env.yaml --name stable-diffusion-env

Loaded Anyscale authentication token from ~/.anyscale/credentials.json.

(anyscale +0.5s) Creating new cluster environment stable-diffusion-env
(anyscale +1.0s) Waiting for cluster environment to build. View progress at
(anyscale +1.0s) status: pending
(anyscale +16.0s) status: pending
(anyscale +31.1s) status: in_progress
(anyscale +17m58.4s) status: in_progress
(anyscale +17m58.6s) Cluster environment successfully finished building.

You can take a look at the cluster_env.yaml to see which packages we installed.

base_image: anyscale/ray-ml:2.4.0-py310-gpu
env_vars: {}
debian_packages:
  - curl
python:
  pip_packages:
    - accelerate==0.14.0
    - diffusers @ git+
    - Pillow==9.3.0
    - scipy==1.9.3
    - torch==1.13.0
    - torchvision==0.14.0
    - transformers==4.24.0
    - numpy==1.23.0
  conda_packages: []
post_build_cmds: []
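
To compare a pin list like the one above with what is actually installed in a running environment, the `name==version` entries can be parsed into a mapping. The following is a minimal sketch: `parse_pins` is a hypothetical helper, not part of the Anyscale CLI, and it deliberately skips entries that are not exact `==` pins (such as the `diffusers @ git+` line).

```python
def parse_pins(requirements):
    """Map package name -> pinned version for exact `name==version` entries."""
    pins = {}
    for entry in requirements:
        entry = entry.strip()
        # Skip direct references ("pkg @ url") and unpinned names.
        if "==" not in entry or " @ " in entry:
            continue
        name, version = entry.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Exact pins copied from the pip package list in cluster_env.yaml.
pip_packages = [
    "accelerate==0.14.0",
    "Pillow==9.3.0",
    "scipy==1.9.3",
    "torch==1.13.0",
    "torchvision==0.14.0",
    "transformers==4.24.0",
    "numpy==1.23.0",
]
pins = parse_pins(pip_packages)
print(pins["torch"])  # 1.13.0
```

Inside the cluster, each entry could then be checked against `importlib.metadata.version(name)` to confirm that the built image matches the file.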

Deploy the service

Once the environment is built, you can run the following command to deploy a running service on Anyscale.

$ anyscale service rollout -f service.yaml
Loaded Anyscale authentication token from ~/.anyscale/credentials.json.

(anyscale +1.3s) No cloud or compute config specified, using the default: cpt_cbajad48yc8pi149wi9tai1e4j.
(anyscale +1.4s) No project specified. Continuing without a project.
(anyscale +2.2s) Maximum uptime is disabled for clusters launched by this service.
(anyscale +2.2s) Service service_wp5ygyetu3sn6s1bulp92ypr has been deployed. Current state of service: PENDING.
(anyscale +2.2s) Query the status of the service with `anyscale service list --service-id service_wp5ygyetu3sn6s1bulp92ypr`.
(anyscale +2.2s) View the service in the UI at
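
The rollout command returns while the service is still PENDING, so a script that depends on the service usually has to wait for it. A minimal polling sketch follows. `wait_for_state` and the `get_state` callable are hypothetical: how you obtain the state (for example, by running the `anyscale service list --service-id ...` command from the log above and reading its output) is left to the caller, and the `RUNNING` state name is an assumption based on the "Running" label in the UI.

```python
import time


def wait_for_state(get_state, target="RUNNING", timeout_s=1800, interval_s=15,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `target` or the timeout expires.

    get_state is a caller-supplied function returning the current state string.
    Returns True if the target state was reached, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state().upper() == target.upper():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example with a fake state source that reports RUNNING on the third poll.
states = iter(["PENDING", "PENDING", "RUNNING"])
reached = wait_for_state(lambda: next(states), interval_s=0)
print(reached)  # True
```

Injecting `sleep` and `clock` keeps the helper easy to test without waiting in real time.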

You can take a look at service.yaml to see what we just deployed. You can also look at service_aws.yaml or service_gcp.yaml, which add configuration to select instance types. In Anyscale, you can directly select any instance type the cloud provider offers.

name: "diffusion-service"
cluster_env: stable-diffusion-env
import_path: ""
working_dir: ""

While the service takes some time to start, let us walk through the configuration items:

cluster_env: stable-diffusion-env

This specifies the base environment for the cluster to use, which is the image we just built. The environment image is versioned, and you can pin it to a specific version. You can also overlay a runtime environment on your Ray application to allow different Python dependencies for different models in the same cluster.
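
As an illustration of the runtime environment overlay, Ray lets each deployment carry its own `runtime_env`. The sketch below only builds the option dictionaries, so it runs without Ray installed. The package pins, the `actor_options` helper, and the `ModelA` name in the comment are hypothetical; it assumes the standard Ray `runtime_env` `pip` field passed through `ray_actor_options`.

```python
# Per-model dependency overlays, layered on top of the cluster environment.
# The pins below are illustrative only.
model_a_env = {"pip": ["transformers==4.24.0"]}
model_b_env = {"pip": ["transformers==4.25.1"]}


def actor_options(runtime_env, num_gpus=1):
    """Build the ray_actor_options dict for one deployment."""
    return {"num_gpus": num_gpus, "runtime_env": runtime_env}


options_a = actor_options(model_a_env)
options_b = actor_options(model_b_env)

# With Ray Serve installed, these would be passed as, for example:
#   @serve.deployment(ray_actor_options=options_a)
#   class ModelA: ...
```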

working_dir: ""

This specifies the code repository used for our application. Anyscale also supports private repos and local directory upload. We recommend using a git URL for reproducibility.


This specifies how Anyscale starts your service. We use the python command to deploy the application. The application has two components: a FastAPI application that validates and handles requests, and a StableDiffusion deployment that autoscales between 0 and 2 replicas.
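
To make the "between 0 and 2 replicas" behavior concrete, here is a simplified model of how a request-based autoscaler could pick a replica count. This is an illustrative sketch, not Ray Serve's actual implementation: `desired_replicas` and the `target_per_replica` parameter are hypothetical names, and real autoscalers also apply smoothing and delays before scaling up or down.

```python
import math


def desired_replicas(ongoing_requests, target_per_replica=1,
                     min_replicas=0, max_replicas=2):
    """Return the replica count for the current load, clamped to [min, max]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


for load in (0, 1, 2, 5):
    print(load, desired_replicas(load))
# 0 0
# 1 1
# 2 2
# 5 2
```

With no traffic the count drops to zero, which is why the first request after an idle period has to wait for a replica to start.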

Invoke the service

Let's navigate to the service page. You can find the link in the logs of the anyscale service rollout command. Something like:

(anyscale +2.9s) View the service in the UI at

We will now test the service by invoking it through the web interface. You can also call the service programmatically (see the instructions under the "Query" button in the top-right corner).


  1. Wait for the service to be in a "Running" state.
  2. In the "Deployments" section, find the "APIIngress" row and click "View" under "API Docs".
  3. You should now see an OpenAPI-rendered documentation page.
  4. Click the /imagine endpoint, then "Try it out" to enable calling it via the interactive API browser.
  5. Fill in your prompt and click "Execute".
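
The same call can be made from a script. The following sketch builds the HTTP request with the Python standard library. The base URL and token are placeholders to be replaced with the values shown for your service. The `prompt` query parameter and the Bearer token header are assumptions about the service's interface, so check the API docs page for the exact parameters.

```python
import urllib.parse
import urllib.request


def build_imagine_request(base_url, token, prompt):
    """Build a GET request for the /imagine endpoint (parameter names assumed)."""
    query = urllib.parse.urlencode({"prompt": prompt})
    url = f"{base_url.rstrip('/')}/imagine?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def fetch_image(request, out_path, timeout_s=900):
    """Send the request and write the response body to out_path.

    The generous timeout leaves room for a slow first call while a replica starts.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        data = response.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)


# Placeholder values; replace them with the ones shown for your service.
req = build_imagine_request("https://example.invalid", "YOUR_TOKEN", "a red bicycle")
print(req.full_url)  # https://example.invalid/imagine?prompt=a+red+bicycle
```

Calling `fetch_image(req, "output.png")` would then send the request and save the returned bytes; the file name is arbitrary.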

Because the diffusion model scales to zero, the first "cold start" invocation will be slow: Anyscale has to bring up a GPU node and deploy the model on it. You can observe the cold start from the cluster page (on the service page, under "Resource Usage", then "Cluster").


Once you are in the cluster page, you can observe the status via "Autoscaler status log". It typically takes 5 to 8 minutes for a GPU node to be ready. You can also click "Dashboard" in the cluster page to observe the current node and process status.


After the GPU node is up and the model is deployed, the execute call should go through and inference should run.

Here's what a successful run should look like:


If the model is idle for some time (5 minutes by default), it is shut down along with the GPU node, but the APIIngress remains available.

Next Up: Many Models Serving

This example serves a single instance of the model. Anyscale can be extended to serve many different copies of Stable Diffusion models and scale them dynamically. Stay tuned, and let us know if you are interested in learning more.