Develop and Deploy
This tutorial covers the end-to-end workflow of developing, testing, and deploying a Ray Serve application on Anyscale. In particular, it covers the following workflows:
- The development workflow using Anyscale Workspaces.
- The production workflow using Anyscale Services.
Throughout the tutorial, we will be using a HuggingFace natural language model to classify the sentiment of text. Even though we will only be serving a single model, you can adapt this tutorial to scale out the number of models being served.
Development workflow
Create a workspace
The development workflow is similar to that of any other Ray library. Use Anyscale Workspaces to iterate on your Ray application. Especially for the serving use case, Workspaces provide a persistent and scalable development environment in which to easily test machine learning models.
To start a workspace, you can either use the Web UI or the CLI.
- Web UI
- CLI
Go to https://console.anyscale.com/workspaces.
Click "Create" to create a new workspace for this tutorial. The default configuration should suffice.
The workspace should be created in a few minutes.
See Get Started for more details about Anyscale Workspaces.
On your laptop, make sure the anyscale CLI is installed:
pip install -U anyscale
Create a workspace using the anyscale workspace CLI command. To start, we need the following parameters:
- project-id: you can obtain this by calling anyscale project list.
- cloud-id: you can obtain this by calling anyscale cloud list.
- compute-config-id: you can create one using the following file (save it as compute_config.yaml) and command:
cloud_id: cld_xyz # TODO: fill in your cloud id
head_node_type:
  name: head_node
  instance_type: m5.2xlarge
worker_node_types:
  - name: cpu_worker
    instance_type: m5.4xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
  - name: gpu_worker
    instance_type: g4dn.4xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
$ anyscale cluster-compute create compute_config.yaml --name serve-tutorial-config
Authenticating
Loaded Anyscale authentication token from ~/.anyscale/credentials.json.
Output
(anyscale +0.6s) View this cluster compute at: https://console.anyscale.com/configurations/cluster-computes/cpt_Hsmn2dtxAiZytWZ3iPtwTD1y
(anyscale +0.6s) Cluster compute id: cpt_Hsmn2dtxAiZytWZ3iPtwTD1y
(anyscale +0.6s) Cluster compute name: serve-tutorial-config
- cluster-env-build-id: Anyscale uses the default build ID anyscaleray-ml270optimized-py310-gpu, which provides most of the necessary environment already. Learn more about crafting your own cluster environment here.
Now that we have all the parameters, let's create the workspace:
anyscale workspace create \
--name serve-tutorial \
--project-id "prj_a3cug4HTAiLFg2TiiY1K5ftZ" \
--cloud-id "cld_4F7k8814aZzGG8TNUGPKnc" \
--compute-config-id "cpt_Hsmn2dtxAiZytWZ3iPtwTD1y" \
--cluster-env-build-id "anyscaleray-ml270optimized-py310-gpu"
You can check the status of the workspace via the web console (https://console.anyscale.com/workspaces). The workspace should be ready in a few minutes.
You can access the workspace in several ways:
- Jupyter Lab
- VS Code Web
- VS Code Desktop
- SSH Terminal
Write your application
For Ray Serve applications, we recommend writing the program as Python scripts and modules instead of using Jupyter notebooks. Therefore, VS Code or SSH is preferred.
The development workflow of Ray Serve on Anyscale mirrors the one you use on your laptop with open source Ray Serve:
- Run the application with serve run
- Test the application with HTTP requests
- Update your code and repeat
Now we will show how to run the sentiment analysis model on Anyscale Workspaces:
- Workspace Development
- Local Development
Open the workspace through either the web browser version or your desktop app. You can start by selecting VS Code Desktop under the Tools menu.
If you don't have local VS Code installed, you can use the hosted VS Code option.
Once VS Code opens, you should see an empty workspace. In the VS Code terminal, type:
# Initialize the workspace folder with the example repo
git clone https://github.com/anyscale/docs_examples .
You can iterate on the code repo with standard git workflow. The files are persisted across workspace restarts as well.
Now let's open the sentiment_analysis folder and view the app.py file. You can edit the file directly, with proper type hints and auto-completion built in.
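If you want a preview of what that file contains, the following is a minimal sketch of a Serve deployment in the style of this tutorial. The class name LanguageModel, the task init parameter, and the /predict query pattern come from snippets and logs shown later on this page; the actual app.py in the docs_examples repo may differ in its details.

```python
# Hedged sketch of what sentiment_analysis/app.py roughly looks like;
# the real file in anyscale/docs_examples may differ.
from ray import serve
from starlette.requests import Request
from transformers import pipeline


@serve.deployment(route_prefix="/")
class LanguageModel:
    def __init__(self, task: str = "sentiment-analysis"):
        # Downloads the default HuggingFace model for the task on first use.
        self.classifier = pipeline(task)

    async def __call__(self, request: Request) -> dict:
        text = request.query_params["text"]
        # The pipeline returns a list like [{"label": "POSITIVE", "score": 0.99}].
        return self.classifier(text)[0]


# `serve run app:model` deploys this bound deployment.
model = LanguageModel.bind()
```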
You can then clone the workspace directory locally and set up the code directory:
anyscale workspace clone --name serve-tutorial
cd serve-tutorial
# Initialize the workspace folder with the example repo
git clone https://github.com/anyscale/docs_examples .
# sync the local file to the remote cluster
anyscale workspace push
# Optionally, you can open a SSH tunnel to the remote cluster to browse around.
anyscale workspace ssh
You can open and edit files locally and use the anyscale workspace push command to sync them to the remote cluster. Alternatively, you can edit files remotely via the SSH tunnel. Anyscale Workspaces adapt to your preferred development workflow.
Test your application
Now, let's run the Serve application.
- Workspace Development
- Local Development
Open the VS Code terminal:
cd sentiment_analysis && serve run app:model
In your local terminal:
anyscale workspace run 'cd sentiment_analysis && serve run app:model'
The output will be similar to the following:
2022-12-15 11:33:25,404 INFO scripts.py:304 -- Deploying from import path: "app:model".
2022-12-15 11:34:39,841 INFO worker.py:1342 -- Connecting to existing Ray cluster at address: 172.31.228.32:9031...
2022-12-15 11:34:39,849 INFO worker.py:1525 -- Connected to Ray cluster. View the dashboard at https://console.anyscale.com/api/v2/sessions/ses_HhL2FK5XwrNUkzsDHRXPndSj/services?redirect_to=dashboard
2022-12-15 11:34:39,852 INFO packaging.py:354 -- Pushing file package 'gcs://_ray_pkg_094d425b0eb2726023050ff58001a46e.zip' (0.04MiB) to Ray cluster...
2022-12-15 11:34:39,852 INFO packaging.py:367 -- Successfully pushed file package 'gcs://_ray_pkg_094d425b0eb2726023050ff58001a46e.zip'.
(raylet) 2022-12-15 11:34:39,863 INFO runtime_env_agent.py:377 -- Successfully created runtime env: {"working_dir": "gcs://_ray_pkg_094d425b0eb2726023050ff58001a46e.zip"}, the context: {"command_prefix": ["cd", "/tmp/ray/session_2022-12-15_09-29-44_780598_165/runtime_resources/working_dir_files/_ray_pkg_094d425b0eb2726023050ff58001a46e", "&&"], "env_vars": {"PYTHONPATH": "/tmp/ray/session_2022-12-15_09-29-44_780598_165/runtime_resources/working_dir_files/_ray_pkg_094d425b0eb2726023050ff58001a46e"}, "py_executable": "/home/ray/anaconda3/bin/python", "resources_dir": null, "container": {}, "java_jars": []}
(ServeController pid=25363) INFO 2022-12-15 11:34:41,988 controller 25363 http_state.py:132 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-0f19344b4199e1b8b3c4db4638afc1bd47e072dbe5dba896fac7c5e3' on node '0f19344b4199e1b8b3c4db4638afc1bd47e072dbe5dba896fac7c5e3' listening on '127.0.0.1:8000'
(HTTPProxyActor pid=25417) INFO: Started server process [25417]
(ServeController pid=25363) INFO 2022-12-15 11:34:43,933 controller 25363 deployment_state.py:1311 - Adding 1 replica to deployment 'LanguageModel'.
(ServeReplica:LanguageModel pid=25460) No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
Downloading: 0%| | 0.00/255M [00:00<?, ?B/s]
...
Downloading: 100%|██████████| 226k/226k [00:00<00:00, 799kB/s]
2022-12-15 11:35:03,897 SUCC scripts.py:333 -- Deployed successfully.
This is a blocking command; it runs in the terminal until you interrupt it with Ctrl+C.
You can now query it using any HTTP client. For this example, we will demonstrate using the Python requests module. Open a new terminal using VS Code or SSH:
# query.py
import requests

resp = requests.get(
    "http://localhost:8000/predict", params={"text": "Anyscale workspaces are great!"}
)
print(resp.json())
$ python query.py
{"label":"POSITIVE","score":0.9998451471328735}
(Optional) Make the endpoint accessible from your local machine's browser
- Workspace Development
- Local Development
Follow these steps on your local machine to expose the port on your localhost. Open a terminal and execute the following commands:
# Clone the workspace
anyscale workspace clone --name serve-tutorial
# Navigate to the workspace directory
cd serve-tutorial
# Set up SSH local port forwarding to expose port 8000
anyscale workspace ssh -- -L 8000:localhost:8000
In your local terminal, run the following command to port-forward the Serve endpoint to your local machine:
# Set up SSH local port forwarding to expose port 8000
anyscale workspace ssh -- -L 8000:localhost:8000
Once you've completed the above steps for your chosen development environment, you can visit the endpoint directly in your browser. For instance, http://localhost:8000/predict?text=Anyscale%20are%20great!.
(Optional) Edit your application
Now that you have a working application, you can edit the application for your own use case. Here are a few suggestions:
- Use Anyscale Workspaces to leverage multiple nodes and GPU nodes. Try changing @serve.deployment(route_prefix="/") to the following to see what happens!
  - Add a GPU using @serve.deployment(route_prefix="/", ray_actor_options={"num_gpus": 1})
  - Run multiple replicas using @serve.deployment(route_prefix="/", num_replicas=10)
- Try a more heavyweight HuggingFace pipeline. Anyscale Workspaces give you access to powerful instances in the cloud with faster network connections to cloud object storage, so there's no more waiting for the model to download!
  - Change the model to a heavier one, like text generation, by setting model = LanguageModel.bind(task="text-generation")
  - Add support for configuring the model by passing in init parameters similar to task.
The takeaway is that these tasks do not come easily on a local laptop; an Anyscale workspace is the natural place to develop your serving application with access to elastic resources. See the sketch below for an example that combines several of these suggestions. For the complete list of configurable options for your Serve deployment, refer to the Ray Serve deployment docs.
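Here is a hedged sketch combining several of the suggestions above into one deployment: multiple replicas, a GPU per replica, and a configurable task passed through bind(). The decorator arguments mirror the snippets in the list; the exact values are illustrative.

```python
# Hedged sketch combining the suggestions above; values are illustrative.
from ray import serve
from starlette.requests import Request
from transformers import pipeline


@serve.deployment(
    route_prefix="/",
    num_replicas=2,  # run several copies of the model behind one route
    ray_actor_options={"num_gpus": 1},  # schedule each replica on a GPU worker
)
class LanguageModel:
    def __init__(self, task: str = "sentiment-analysis"):
        self.pipe = pipeline(task)

    async def __call__(self, request: Request) -> dict:
        return self.pipe(request.query_params["text"])[0]


# Configure the model through an init parameter, as suggested above.
model = LanguageModel.bind(task="text-generation")
```

With min_workers set to 0 in the compute config above, requesting GPUs here is what prompts Anyscale to scale up the g4dn.4xlarge worker group on demand.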
Pausing and resuming a workspace
As long as you terminate the python command, Anyscale will automatically shut down idle clusters after a timeout (defaults to 2 hours).
When you come back to work on the same project, you can resume the workspace by starting it in the UI or by running anyscale workspace start in the project directory.
Moving to production
After completing the development of your application, you can deploy it using Anyscale production services, which offer the benefits of running your Ray Serve workload with high availability and fault tolerance. You can learn more here.
Set up your service
To use Anyscale services, you can use the CLI or the Python SDK. Either can be run from your personal laptop or within a development workspace.
- CLI
- Python SDK
First, create and update the YAML configuration file for the service.
name: "sentiment-service"
cluster_env: default_cluster_env_ml_2.9.0_py310
ray_serve_config:
  applications:
    - name: sentiment_analysis
      import_path: "sentiment_analysis.app:model"
      runtime_env:
        working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
For more information regarding the YAML schema, check out the API reference.
Next, you can deploy the service using the following CLI command. You should see relevant links in the output.
$ anyscale service rollout -f config.yaml
Authenticating
Loaded Anyscale authentication token from ~/.anyscale/credentials.json.
Output
(anyscale +4.5s) Service service2_v96jsyvutffntejfivh3inczcd has been deployed. Service is transitioning towards: RUNNING.
(anyscale +4.5s) View the service in the UI at https://console.anyscale.com/services/service2_v96jsyvutffntejfivh3inczcd
from anyscale.sdk.anyscale_client.models import *
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
apply_model = ApplyProductionServiceV2Model(
    name="sentiment-service",
    # The IDs below are examples and should be replaced with your own IDs.
    # project_id can be found in the URL by navigating to the project in Anyscale Console.
    project_id="<PROJECT_ID>",
    # IDs can be found on Anyscale Console under Configurations.
    compute_config_id="<COMPUTE_CONFIG_ID>",
    build_id="anyscaleray-ml29-py310-gpu",
    ray_serve_config={
        "applications": [
            {
                "name": "sentiment_analysis",
                "runtime_env": {
                    "working_dir": "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
                },
                "import_path": "sentiment_analysis.app:model",
            }
        ],
    },
)

service = sdk.rollout_service(apply_model).result
service_id = service.id
print(f"View the service in the UI at https://console.anyscale.com/services/{service_id}")
You can then visit the service link to:
- Observe the current status of the service (see the SDK sketch below)
- View service logs and metrics
- View the OpenAPI documentation page
- Query the API
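The current status in the first item can also be read programmatically. Below is a minimal sketch using the SDK's get_service call (the same call the query example later on this page uses); the service ID is a placeholder.

```python
from anyscale import AnyscaleSDK

sdk = AnyscaleSDK()

# Replace the ID below with your Service ID.
service_id = "<SERVICE_ID>"
service = sdk.get_service(service_id).result
print(f"Service {service.name} is currently {service.current_state}")
```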
After cloning the repository, you can set the working_dir to "." if the Service YAML definition file is in the same folder as the Python files you are deploying.
runtime_env:
- working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
+ working_dir: "."
If you inspect the service configuration in the created service page, you can see the runtime environment being automatically populated for you.
runtime_env:
  working_dir: >-
    s3://anyscale-test-data-cld-1234567890abcdefghijklmn/org_1234567890abcdefghijkl/cld_1234567890abcdefghijklmn/workspace_snapshots/expwrk_z6az4rxzspb1gce1bnucuntcar/_anyscale_pkg_d622760f82c2a6650e94b009fcb3196b.zip
You can also upload to a bucket of your choice by adding an upload_path line.
runtime_env:
- working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
+ working_dir: .
+ upload_path: "s3://your-bucket/path"
Query your service
To query your service, navigate to the service page and select the 'Query' button located at the top right corner. This will display both the curl and Python commands. Moreover, the Python SDK provides a programmatic way to retrieve the token and URL needed to query the service.
Anyscale services are exposed over the HTTP protocol, so you can use any HTTP client as long as you include the bearer token in the request headers.
- curl (Web)
- Python (Web)
- Python SDK
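For the Python (Web) option, the sketch below uses the requests module; the hostname and token placeholders are illustrative, so copy the real values from the 'Query' dialog on the service page. The curl command shown in that dialog follows the same pattern: the service URL plus an Authorization: Bearer header.

```python
import requests

# Illustrative placeholders: copy the real values from the "Query" dialog.
base_url = "https://<SERVICE_HOSTNAME>"
token = "<AUTH_TOKEN>"

resp = requests.get(
    f"{base_url}/predict",
    params={"text": "Anyscale services are great!"},
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.json())
```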
import requests
import time
from anyscale.sdk.anyscale_client.models import *
from anyscale import AnyscaleSDK
def wait_for_service_state(sdk, service_id: str, expected_state: ServiceEventCurrentState, timeout_s=1200):
    """
    This is a helper method to wait for the Service to enter an expected state.
    """
    start = time.time()
    while time.time() - start < timeout_s:
        service = sdk.get_service(service_id).result
        curr_state = service.current_state
        if curr_state == expected_state:
            return
        if expected_state != ServiceEventCurrentState.UNHEALTHY and curr_state == ServiceEventCurrentState.UNHEALTHY:
            raise RuntimeError(
                f"Service entered an unexpected state {curr_state}\n"
                f"Please check the error for {service.name} in the web UI."
            )
        time.sleep(1)
    curr_state = sdk.get_service(service_id).result.current_state
    raise TimeoutError(
        f"Service did not enter {expected_state} after {timeout_s} seconds; current state: {curr_state}."
    )
sdk = AnyscaleSDK()
# Replace the ID below with your Service ID
service_id = "<SERVICE_ID>"
service = sdk.get_service(service_id).result
token = service.auth_token
wait_for_service_state(sdk, service_id, ServiceEventCurrentState.RUNNING)
base_url = f"https://{service.hostname}"
# Requests config
path = "/"
full_url = f"{base_url}{path}"
headers = {"Authorization": f"Bearer {token}"}
resp = requests.get(full_url, headers=headers)
print(f"Service query returns: {resp.text}")
Tear-down
To shut down the service, you can use the Anyscale Web Console, the CLI, or the Python SDK.
- Web
- CLI
- Python SDK
Find your Anyscale Service in the Web Console and click the Terminate button.
$ anyscale service terminate --name sentiment-service
Authenticating
Loaded Anyscale authentication token from ~/.anyscale/credentials.json.
Output
(anyscale +4.1s) Service service_zezrkzehhbrmem7mliq3xrnt has begun terminating...
(anyscale +4.1s) Current state of service: RUNNING. Goal state of service: TERMINATED
(anyscale +4.1s) Query the status of the service with `anyscale service list --service-id service_zezrkzehhbrmem7mliq3xrnt`.
from anyscale import AnyscaleSDK
sdk = AnyscaleSDK()
# Replace the ID below with your Service ID
service_id = "<SERVICE_ID>"
service = sdk.terminate_service(service_id).result
print(f"You can view the status of Service at https://console.anyscale.com/services/{service.id}")