Develop and Deploy
This tutorial covers the end-to-end workflow of developing, testing, and deploying a Ray Serve application on Anyscale. In particular, it covers the following workflows:
- The development workflow using Anyscale Workspaces.
- The production workflow using Anyscale Services.
Throughout the tutorial, we will be using a HuggingFace natural language model to classify the sentiment of text. Even though we will only be serving a single model, you can adapt this tutorial to scale out the number of models being served.
Development workflow
Create a workspace
The development workflow is similar to that of any other Ray library. Use Anyscale Workspaces to iterate on your Ray application. Especially for the serving use case, Workspaces provide a persistent and scalable development environment in which to easily test machine learning models.
To start a workspace, you can either use the Web UI or the CLI.
- Web UI
- CLI
Go to https://console.anyscale.com/workspaces.
Click "Create" to create a new workspace for this tutorial. The default configuration should suffice.
The workspace should be created in a few minutes.
See Get Started for more details about Anyscale Workspaces.
On your laptop, make sure the anyscale CLI is installed:
pip install -U anyscale
Create a workspace using the anyscale workspace CLI command. To start, we need the following parameters:
- project-id: you can obtain this by calling anyscale project list.
- cloud-id: you can obtain this by calling anyscale cloud list.
- compute-config-id: you can create one using the following file (save it as compute_config.yaml) and command:
cloud_id: cld_xyz # TODO: fill in your cloud id
head_node_type:
  name: head_node
  instance_type: m5.2xlarge
worker_node_types:
  - name: cpu_worker
    instance_type: m5.4xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
  - name: gpu_worker
    instance_type: g4dn.4xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
$ anyscale cluster-compute create compute_config.yaml --name serve-tutorial-config
Authenticating
Loaded Anyscale authentication token from ~/.anyscale/credentials.json.
Output
(anyscale +0.6s) View this cluster compute at: https://console.anyscale.com/configurations/cluster-computes/cpt_Hsmn2dtxAiZytWZ3iPtwTD1y
(anyscale +0.6s) Cluster compute id: cpt_Hsmn2dtxAiZytWZ3iPtwTD1y
(anyscale +0.6s) Cluster compute name: serve-tutorial-config
- cluster-env-build-id: Anyscale uses the default build ID anyscaleray-ml270optimized-py310-gpu, which provides most of the necessary environment already. Learn more about crafting your own cluster environment here.
Now that we have all the parameters, let's create the workspace:
anyscale workspace create \
--name serve-tutorial \
--project-id "prj_a3cug4HTAiLFg2TiiY1K5ftZ" \
--cloud-id "cld_4F7k8814aZzGG8TNUGPKnc" \
--compute-config-id "cpt_Hsmn2dtxAiZytWZ3iPtwTD1y" \
--cluster-env-build-id "anyscaleray-ml270optimized-py310-gpu"
You can check the status of the workspace via the web console (https://console.anyscale.com/workspaces). The workspace should be ready in a few minutes.
You can access the workspace in several ways:
- Jupyter Lab
- VS Code Web
- VS Code Desktop
- SSH Terminal
Write your application
For Ray Serve applications, we recommend writing the program as Python scripts and modules instead of using Jupyter notebooks. Therefore, VS Code or SSH is preferred.
The development workflow of Ray Serve on Anyscale mirrors the one you use on your laptop with open source Ray Serve:
- Run the application with serve run
- Test the application with HTTP requests
- Update your code and repeat
Now we will show how to run the sentiment analysis model on Anyscale Workspaces:
- Workspace Development
- Local Development
Open the workspace through either the web browser version or your desktop app. You can start by selecting VS Code Desktop under the Tools menu.
If you don't have local VS Code installed, you can use the hosted VS Code option.
Once VS Code opens, you should see an empty workspace. In the VS Code terminal, type:
# Initialize the workspace folder with the example repo
git clone https://github.com/anyscale/docs_examples .
You can iterate on the code repo with standard git workflow. The files are persisted across workspace restarts as well.
Now let's open the sentiment_analysis folder and view the app.py file. You can edit the file directly, with proper type hints and auto-completion built in.
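If you want a preview of what that file contains, the following is a minimal sketch of a Serve deployment in the style of this tutorial. The class name LanguageModel, the task init parameter, and the /predict query pattern come from snippets and logs shown later on this page; the actual app.py in the docs_examples repo may differ in its details.

```python
# Hedged sketch of what sentiment_analysis/app.py roughly looks like;
# the real file in anyscale/docs_examples may differ.
from ray import serve
from starlette.requests import Request
from transformers import pipeline


@serve.deployment(route_prefix="/")
class LanguageModel:
    def __init__(self, task: str = "sentiment-analysis"):
        # Downloads the default HuggingFace model for the task on first use.
        self.classifier = pipeline(task)

    async def __call__(self, request: Request) -> dict:
        text = request.query_params["text"]
        # The pipeline returns a list like [{"label": "POSITIVE", "score": 0.99}].
        return self.classifier(text)[0]


# `serve run app:model` deploys this bound deployment.
model = LanguageModel.bind()
```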
You can then clone the workspace directory locally and set up the code directory:
anyscale workspace clone --name serve-tutorial
cd serve-tutorial
# Initialize the workspace folder with the example repo
git clone https://github.com/anyscale/docs_examples .
# sync the local file to the remote cluster
anyscale workspace push
# Optionally, you can open a SSH tunnel to the remote cluster to browse around.
anyscale workspace ssh
You can open and edit files locally and use the anyscale workspace push command to sync them to the remote cluster. Alternatively, you can edit files remotely via the SSH tunnel. Anyscale Workspaces adapt to your preferred development workflow.
Test your application
Now, let's run the Serve application.
- Workspace Development
- Local Development
Open the VS Code terminal:
cd sentiment_analysis && serve run app:model
In your local terminal:
anyscale workspace run 'cd sentiment_analysis && serve run app:model'
The output will be similar to the following:
2022-12-15 11:33:25,404 INFO scripts.py:304 -- Deploying from import path: "app:model".
2022-12-15 11:34:39,841 INFO worker.py:1342 -- Connecting to existing Ray cluster at address: 172.31.228.32:9031...
2022-12-15 11:34:39,849 INFO worker.py:1525 -- Connected to Ray cluster. View the dashboard at https://console.anyscale.com/api/v2/sessions/ses_HhL2FK5XwrNUkzsDHRXPndSj/services?redirect_to=dashboard
2022-12-15 11:34:39,852 INFO packaging.py:354 -- Pushing file package 'gcs://_ray_pkg_094d425b0eb2726023050ff58001a46e.zip' (0.04MiB) to Ray cluster...
2022-12-15 11:34:39,852 INFO packaging.py:367 -- Successfully pushed file package 'gcs://_ray_pkg_094d425b0eb2726023050ff58001a46e.zip'.
(raylet) 2022-12-15 11:34:39,863 INFO runtime_env_agent.py:377 -- Successfully created runtime env: {"working_dir": "gcs://_ray_pkg_094d425b0eb2726023050ff58001a46e.zip"}, the context: {"command_prefix": ["cd", "/tmp/ray/session_2022-12-15_09-29-44_780598_165/runtime_resources/working_dir_files/_ray_pkg_094d425b0eb2726023050ff58001a46e", "&&"], "env_vars": {"PYTHONPATH": "/tmp/ray/session_2022-12-15_09-29-44_780598_165/runtime_resources/working_dir_files/_ray_pkg_094d425b0eb2726023050ff58001a46e"}, "py_executable": "/home/ray/anaconda3/bin/python", "resources_dir": null, "container": {}, "java_jars": []}
(ServeController pid=25363) INFO 2022-12-15 11:34:41,988 controller 25363 http_state.py:132 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-0f19344b4199e1b8b3c4db4638afc1bd47e072dbe5dba896fac7c5e3' on node '0f19344b4199e1b8b3c4db4638afc1bd47e072dbe5dba896fac7c5e3' listening on '127.0.0.1:8000'
(HTTPProxyActor pid=25417) INFO: Started server process [25417]
(ServeController pid=25363) INFO 2022-12-15 11:34:43,933 controller 25363 deployment_state.py:1311 - Adding 1 replica to deployment 'LanguageModel'.
(ServeReplica:LanguageModel pid=25460) No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
Downloading: 0%| | 0.00/255M [00:00<?, ?B/s]
...
Downloading: 100%|██████████| 226k/226k [00:00<00:00, 799kB/s]
2022-12-15 11:35:03,897 SUCC scripts.py:333 -- Deployed successfully.
This is a blocking command; it runs in the terminal until you interrupt it with Ctrl+C.
You can now query it using any HTTP client. For this example, we will demonstrate using the Python requests module. Open a new terminal using VS Code or SSH:
# query.py
import requests

resp = requests.get(
    "http://localhost:8000/predict", params={"text": "Anyscale workspaces are great!"}
)
print(resp.json())
$ python query.py
{"label":"POSITIVE","score":0.9998451471328735}
(Optional) Make the endpoint accessible from your local machine's browser
- Workspace Development
- Local Development
Follow these steps on your local machine to expose the port on your localhost. Open a terminal and execute the following commands:
# Clone the workspace
anyscale workspace clone --name serve-tutorial
# Navigate to the workspace directory
cd serve-tutorial
# Set up SSH local port forwarding to expose port 8000
anyscale workspace ssh -- -L 8000:localhost:8000
In your local terminal, run the following command to port-forward the Serve endpoint to your local machine:
# Set up SSH local port forwarding to expose port 8000
anyscale workspace ssh -- -L 8000:localhost:8000
Once you've completed the above steps for your chosen development environment, you can visit the endpoint directly in your browser. For instance, http://localhost:8000/predict?text=Anyscale%20are%20great!.
(Optional) Edit your application
Now that you have a working application, you can edit the application for your own use case. Here are a few suggestions:
- Use Anyscale Workspaces to leverage multiple nodes and GPU nodes. Try changing @serve.deployment(route_prefix="/") to the following to see what happens!
  - Add a GPU using @serve.deployment(route_prefix="/", ray_actor_options={"num_gpus": 1})
  - Run multiple replicas using @serve.deployment(route_prefix="/", num_replicas=10)
- Try a more heavyweight HuggingFace pipeline. Anyscale Workspaces give you access to powerful instances in the cloud with faster network connections to cloud object storage, so there's no more waiting for the model to download!
  - Change the model to a heavier one, like text generation, by setting model = LanguageModel.bind(task="text-generation")
  - Add support for configuring the model by passing in init parameters similar to task.
The takeaway is that these tasks do not come easily on a local laptop; an Anyscale workspace is the natural place to develop your serving application with access to elastic resources. See the sketch below for an example that combines several of these suggestions. For the complete list of configurable options for your Serve deployment, refer to the Ray Serve deployment docs.
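Here is a hedged sketch combining several of the suggestions above into one deployment: multiple replicas, a GPU per replica, and a configurable task passed through bind(). The decorator arguments mirror the snippets in the list; the exact values are illustrative.

```python
# Hedged sketch combining the suggestions above; values are illustrative.
from ray import serve
from starlette.requests import Request
from transformers import pipeline


@serve.deployment(
    route_prefix="/",
    num_replicas=2,  # run several copies of the model behind one route
    ray_actor_options={"num_gpus": 1},  # schedule each replica on a GPU worker
)
class LanguageModel:
    def __init__(self, task: str = "sentiment-analysis"):
        self.pipe = pipeline(task)

    async def __call__(self, request: Request) -> dict:
        return self.pipe(request.query_params["text"])[0]


# Configure the model through an init parameter, as suggested above.
model = LanguageModel.bind(task="text-generation")
```

With min_workers set to 0 in the compute config above, requesting GPUs here is what prompts Anyscale to scale up the g4dn.4xlarge worker group on demand.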
Pausing and resuming a workspace
As long as you terminate the python command, Anyscale will automatically shut down idle clusters after a timeout (defaults to 2 hours).
When you come back to work on the same project, you can resume the workspace by starting it in the UI or by running anyscale workspace start in the project directory.
Moving to production
After completing the development of your application, you can deploy it using Anyscale production services, which offer the benefits of running your Ray Serve workload with high availability and fault tolerance. You can learn more here.
Set up your service
To use Anyscale services, you can use the CLI or the Python SDK. Either can be run from your personal laptop or within a development workspace.
- CLI
- Python SDK
First, create and update the YAML configuration file for the service.
name: "sentiment-service"
cluster_env: default_cluster_env_ml_2.9.0_py310
ray_serve_config:
  applications:
    - name: sentiment_analysis
      import_path: "sentiment_analysis.app:model"
      runtime_env:
        working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
For more information regarding the YAML schema, check out the API reference.
Next, you can deploy the service using the following CLI command. You should see relevant links in the output.
$ anyscale service rollout -f config.yaml
Authenticating
Loaded Anyscale authentication token from ~/.anyscale/credentials.json.
Output
(anyscale +4.5s) Service service2_v96jsyvutffntejfivh3inczcd has been deployed. Service is transitioning towards: RUNNING.
(anyscale +4.5s) View the service in the UI at https://console.anyscale.com/services/service2_v96jsyvutffntejfivh3inczcd
from anyscale.sdk.anyscale_client.models import *
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
apply_model = ApplyProductionServiceV2Model(
    name="sentiment-service",
    # The IDs below are examples and should be replaced with your own IDs.
    # project_id can be found in the URL by navigating to the project in Anyscale Console.
    project_id="<PROJECT_ID>",
    # IDs can be found on Anyscale Console under Configurations.
    compute_config_id="<COMPUTE_CONFIG_ID>",
    build_id="anyscaleray-ml29-py310-gpu",
    ray_serve_config={
        "applications": [
            {
                "name": "sentiment_analysis",
                "runtime_env": {
                    "working_dir": "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
                },
                "import_path": "sentiment_analysis.app:model",
            }
        ],
    },
)

service = sdk.rollout_service(apply_model).result
service_id = service.id
print(f"View the service in the UI at https://console.anyscale.com/services/{service_id}")
You can then visit the service link to:
- Observe the current status of the service (see the SDK sketch below)
- View service logs and metrics
- View the OpenAPI documentation page
- Query the API
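The current status in the first item can also be read programmatically. Below is a minimal sketch using the SDK's get_service call (the same call the query example later on this page uses); the service ID is a placeholder.

```python
from anyscale import AnyscaleSDK

sdk = AnyscaleSDK()

# Replace the ID below with your Service ID.
service_id = "<SERVICE_ID>"
service = sdk.get_service(service_id).result
print(f"Service {service.name} is currently {service.current_state}")
```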
After cloning the repository, you can set the working_dir to "." if the Service YAML definition file is in the same folder as the Python files you are deploying.
runtime_env:
- working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
+ working_dir: "."
If you inspect the service configuration in the created service page, you can see the runtime environment being automatically populated for you.
runtime_env:
  working_dir: >-
    s3://anyscale-test-data-cld-1234567890abcdefghijklmn/org_1234567890abcdefghijkl/cld_1234567890abcdefghijklmn/workspace_snapshots/expwrk_z6az4rxzspb1gce1bnucuntcar/_anyscale_pkg_d622760f82c2a6650e94b009fcb3196b.zip
You can also upload to a bucket of your choice by adding an upload_path line.
runtime_env:
- working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
+ working_dir: .
+ upload_path: "s3://your-bucket/path"
Query your service
To query your service, navigate to the service page and select the 'Query' button located at the top right corner. This will display both the curl and Python commands. Moreover, the Python SDK provides a programmatic way to retrieve the token and URL needed to query the service.
Anyscale services are exposed over the HTTP protocol, so you can use any HTTP client as long as you include the bearer token in the request headers.
- curl (Web)
- Python (Web)
- Python SDK
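For the Python (Web) option, the sketch below uses the requests module; the hostname and token placeholders are illustrative, so copy the real values from the 'Query' dialog on the service page. The curl command shown in that dialog follows the same pattern: the service URL plus an Authorization: Bearer header.

```python
import requests

# Illustrative placeholders: copy the real values from the "Query" dialog.
base_url = "https://<SERVICE_HOSTNAME>"
token = "<AUTH_TOKEN>"

resp = requests.get(
    f"{base_url}/predict",
    params={"text": "Anyscale services are great!"},
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.json())
```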
import requests
import time
from anyscale.sdk.anyscale_client.models import *
from anyscale import AnyscaleSDK
def wait_for_service_state(sdk, service_id: str, expected_state: ServiceEventCurrentState, timeout_s=1200):
    """
    This is a helper method to wait for the Service to enter an expected state.
    """
    start = time.time()
    while time.time() - start < timeout_s:
        service = sdk.get_service(service_id).result
        curr_state = service.current_state
        if curr_state == expected_state:
            return
        if expected_state != ServiceEventCurrentState.UNHEALTHY and curr_state == ServiceEventCurrentState.UNHEALTHY:
            raise RuntimeError(
                f"Service entered an unexpected state {curr_state}\n"
                f"Please check the error for {service.name} in the web UI."
            )
        time.sleep(1)
    curr_state = sdk.get_service(service_id).result.current_state
    raise TimeoutError(
        f"Service did not enter {expected_state} after {timeout_s} seconds; current state: {curr_state}."
    )
sdk = AnyscaleSDK()
# Replace the ID below with your Service ID
service_id = "<SERVICE_ID>"
service = sdk.get_service(service_id).result
token = service.auth_token
wait_for_service_state(sdk, service_id, ServiceEventCurrentState.RUNNING)
base_url = f"https://{service.hostname}"
# Requests config
path = "/"
full_url = f"{base_url}{path}"
headers = {"Authorization": f"Bearer {token}"}
resp = requests.get(full_url, headers=headers)
print(f"Service query returns: {resp.text}")
Tear-down
To shut down the service, you can use the Anyscale Web Console, the CLI, or the Python SDK.
- Web
- CLI
- Python SDK
Find your Anyscale Service in the Web Console and click the Terminate button.
$ anyscale service terminate --name sentiment-service
Authenticating
Loaded Anyscale authentication token from ~/.anyscale/credentials.json.
Output
(anyscale +4.1s) Service service_zezrkzehhbrmem7mliq3xrnt has begun terminating...
(anyscale +4.1s) Current state of service: RUNNING. Goal state of service: TERMINATED
(anyscale +4.1s) Query the status of the service with `anyscale service list --service-id service_zezrkzehhbrmem7mliq3xrnt`.
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
# Replace the ID below with your Service ID
service_id = "<SERVICE_ID>"
service = sdk.terminate_service(service_id).result
print(f"You can view the status of Service at https://console.anyscale.com/services/{service.id}")