Production Services

info

Production services is a new feature that's currently under development.

Anyscale supports production services that are submitted as a standalone package containing a Ray Serve application. The service will be fully managed by the platform, providing fault tolerance, declarative upgrades, monitoring, and alerting.

Services consist of a single, long-running Ray cluster with a persistent DNS name that may be restarted over time.

If you're still in the development phase, you can refer to this tutorial for an example on how to develop Ray services interactively.

Using production services

Define

To create a production service, you must provide the following:

  • (Required on the SDK, optional on the CLI) compute_config A cluster compute config for the cluster the service will run on.
    • On the SDK, specify either compute_config_id or compute_config (a one-off compute config). Exactly one of these fields must be specified.
    • On the CLI, you may instead specify compute_config (the name of a cluster compute config, or a one-off config) or cloud (the name of an Anyscale cloud) for convenience. Both attributes are optional. If neither is specified, the service uses a default compute config associated with the default cloud. See the SDK example below.
  • (Required on the SDK, optional on the CLI) cluster_env A cluster environment for the cluster the service will run on. On the SDK, this can only be specified as build_id, which can be found in the Anyscale Console under Configurations. On the CLI, you may specify cluster_env as the name and version of the cluster environment, colon-separated; if you don't specify a version, the latest is used. See the SDK example below.
  • (Optional) runtime_env A runtime environment containing your application code and dependencies.
  • (Required) entrypoint The entrypoint command that will be run on the cluster to deploy the service. Although it is generally a Python script, the entrypoint can be any shell command.
  • (Required) healthcheck_url A health check that the platform will use to determine if your service is healthy. This should be an HTTP GET endpoint that returns status 200 if the service is healthy.
  • (Optional) access Access setting for the service. This is only available for Kubernetes clouds (e.g., GCP). If public access is specified, the endpoints of the service can be queried from the public internet but require a service access token. If private access is specified, the endpoints of the service can only be queried from within your Anyscale cloud (where VPC and security groups can be specified). For private access services, the HTTP server for the Serve deployment must listen on 0.0.0.0 (this can be specified in http_options of serve.start). By default, all services have public access.
  • (Optional) name Name of the service. Service names must be unique within projects.
  • (Optional) project_id The id of the project you want the service to deploy in. project_id can be found in the URL by navigating to the project in Anyscale Console.
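
The rules in the list above can be checked before deploying. The following is a minimal sketch, not part of the Anyscale CLI or SDK: parse_cluster_env and check_service_config are hypothetical helper names, the check covers only the CLI-style config, and it assumes that cluster environment versions are integers and that the access values are spelled "public" and "private".

```python
from typing import Optional, Tuple

# Fields marked "(Required)" for the CLI in the list above.
REQUIRED_FIELDS = ("entrypoint", "healthcheck_url")
# Assumption: the two access modes are spelled exactly like this.
ALLOWED_ACCESS = ("public", "private")


def parse_cluster_env(value: str) -> Tuple[str, Optional[int]]:
    """Split a CLI-style cluster_env string into (name, version).

    A missing version is returned as None, meaning "use the latest".
    Assumption: versions are integers, as in "my-cluster-env:5".
    """
    name, sep, version = value.partition(":")
    if not name:
        raise ValueError(f"cluster_env has no name: {value!r}")
    if not sep:
        return name, None
    return name, int(version)


def check_service_config(config: dict) -> list:
    """Return a list of problems found in a CLI-style service config."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not config.get(field)
    ]
    access = config.get("access", "public")  # public is the documented default
    if access not in ALLOWED_ACCESS:
        problems.append(f"access must be one of {ALLOWED_ACCESS}, got {access!r}")
    if "cluster_env" in config:
        try:
            parse_cluster_env(config["cluster_env"])
        except ValueError as exc:
            problems.append(str(exc))
    return problems
```

An empty list from check_service_config means the config passes these basic checks; the platform may still reject it for reasons this sketch does not model.
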
note

Make sure your cluster environment contains anyscale>=0.5.20 so that querying works correctly for services running on non-Kubernetes clouds. This is included in the default cluster environments for ray versions >= 0.12.0.

note

The working_dir option of the runtime_env can be specified in one of two ways.

The first option is to set working_dir to be a remote URI: either a zip file preexisting in an AWS S3 or Google Cloud Storage bucket, or a zip file accessible over HTTP/S (e.g., a GitHub download URL). The zip file must contain only a single top-level directory; see Remote URIs for details.

The second option is to set working_dir to be a local directory. If you do this, you are responsible for providing external storage: your runtime_env must include the field upload_path specifying an Amazon S3 or Google Cloud Storage bucket or subdirectory. Your local directory is then uploaded to remote storage and downloaded by the cluster before running the service. External storage allows Anyscale to start a new cluster with your working_dir in the case of a failure.

Example: runtime_env = {"working_dir": "/User/code", "upload_path": "s3://my-bucket/subdir"}

With either option, the cluster running the service must have permission to download from the bucket or URL. For more on permissions, see our Cloud Access Overview, or Using a Local Directory with Anyscale Jobs and Services on GCP if using GCP.

The upload_path field is not available in OSS Ray. See Runtime Environments in Anyscale Production Jobs and Services for more differences with runtime_env when using Production Jobs and Services.
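
The two working_dir options can be told apart mechanically. The sketch below is illustrative only: classify_working_dir is a hypothetical helper, and it assumes that remote URIs start with s3://, gs://, http:// or https:// and that anything else is a local path.

```python
# Assumption: these prefixes identify a remote working_dir URI.
REMOTE_SCHEMES = ("s3://", "gs://", "http://", "https://")
# upload_path must point at Amazon S3 or Google Cloud Storage.
UPLOAD_SCHEMES = ("s3://", "gs://")


def classify_working_dir(runtime_env: dict) -> str:
    """Return "remote" or "local" for a runtime_env, or raise ValueError."""
    working_dir = runtime_env.get("working_dir")
    if not working_dir:
        raise ValueError("runtime_env has no working_dir")
    if working_dir.startswith(REMOTE_SCHEMES):
        # First option: a zip file that already exists in remote storage.
        return "remote"
    # Second option: a local directory, which needs external storage.
    upload_path = runtime_env.get("upload_path", "")
    if not upload_path.startswith(UPLOAD_SCHEMES):
        raise ValueError(
            "a local working_dir needs an upload_path in S3 or Google Cloud Storage"
        )
    return "local"
```

The check does not verify that the bucket exists or that the cluster has permission to read it.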

These options can be specified in a configuration file:

my_production_service.yaml
name: "my-first-service"
project_id: "prj_7S7Os7XBvO6vdiVC1J0lgj"
compute_config: my-cluster-compute # You may specify `compute_config_id` or `cloud` instead
# Alternatively, a one-off compute config
# compute_config:
#   cloud_id: cld_4F7k8814aZzGG8TNUGPKnc
#   region: us-west-2
#   head_node_type:
#     name: head
#     instance_type: m5.large
#   worker_node_types: []
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_service_files.zip"
entrypoint: "python deploy.py"
healthcheck_url: "/healthcheck"
access: "public"

Deploy

Use the CLI or SDK to deploy your service to Anyscale.

anyscale service deploy my_production_service.yaml \
--name my-production-service \
--description "A production service running on Anyscale."
note

Service names must be unique within a project. This enables referencing services by name and project rather than needing to fetch the underlying ID.

Upgrade

To upgrade your service, update your configuration and deploy it again with the CLI or SDK, using the same service name and project ID. For example, to change the entrypoint from python deploy.py to echo update && python deploy.py:

my_production_service_update.yaml
name: "my-first-service"
project_id: "prj_7S7Os7XBvO6vdiVC1J0lgj"
compute_config: my-cluster-compute # You may specify `compute_config_id` or `cloud` instead
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_service_files.zip"
entrypoint: "echo update && python deploy.py"
healthcheck_url: "/healthcheck"
access: "public"
anyscale service deploy my_production_service_update.yaml \
--name my-production-service \
--description "A production service running on Anyscale."
note

The first time you deploy a service, Anyscale will create it. Subsequent deployments will update the existing service.
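
The create-or-update behavior can be pictured as an upsert keyed by project and service name. The toy model below only illustrates that idea; ServiceRegistry is a hypothetical in-memory class and says nothing about how Anyscale stores services.

```python
class ServiceRegistry:
    """Toy in-memory model of deploy-as-upsert, keyed by (project_id, name)."""

    def __init__(self):
        self._services = {}

    def deploy(self, project_id: str, name: str, config: dict) -> str:
        """Store the config and report whether it was created or updated."""
        key = (project_id, name)
        action = "updated" if key in self._services else "created"
        self._services[key] = dict(config)
        return action

    def entrypoint(self, project_id: str, name: str) -> str:
        """Return the entrypoint of the currently stored config."""
        return self._services[(project_id, name)]["entrypoint"]
```

Deploying twice under the same project and name yields "created" and then "updated", which is why names must be unique within a project.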

note

Services do not support updating their access. To change the access, please do either of the following:

  • start a new service.
  • terminate the service first and start it again.

Monitor

You can check the status of a service on the Anyscale Console, or query it using the CLI or the Python SDK:

anyscale service list --service-id [SERVICE_ID]
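
A deployment script often needs to wait until the service reaches a given state. The sketch below shows a generic polling loop; wait_for_state is a hypothetical helper, and get_state stands for whatever call you use to read the current state (it is passed in as a function because this sketch makes no assumption about the SDK field that holds it).

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
) -> None:
    """Poll get_state() until it returns target, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == target:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"service still in state {state!r} after {timeout_s}s"
            )
        time.sleep(interval_s)
```

The timeout and interval defaults are arbitrary; choose values that match how long your cluster usually takes to start.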

Additionally, for each deployment, Anyscale will generate a dashboard in which you can monitor the health and activity of your deployment:

Service monitoring dashboard

The following graphs are automatically generated for you:

  • CPU utilization (cluster-wide)
  • Memory utilization (cluster-wide)
  • P95 latency (for queries)
  • Exceptions (exceptions per second)
  • QPS (queries per second)

Querying Service

Depending on the access setting specified when the service was started, Anyscale creates a public or private URL that can be used to query the endpoints in your service. The URL and, for public services, the authorization token can be found in the service read model or through the "Query" button in the Anyscale Console. The Serve deployments table in the Anyscale Console also contains a link to the FastAPI docs page for your deployment.

The service URL and token can be obtained from the Anyscale SDK as follows:

from anyscale import AnyscaleSDK

service_id = "[SERVICE_ID]"  # replace with the ID of your service

sdk = AnyscaleSDK()
service = sdk.get_service(service_id).result
service_url = service.url
service_token = service.token

The service can then be queried with:

import requests
resp = requests.get(service_url, headers={"Authorization": f"Bearer {service_token}"})
Private services don't require a service token, but can only be queried from within your Anyscale cloud.
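
Since public services need a bearer token and private services do not, the request can be assembled the same way for both. The following is a small sketch using only the standard library; build_request is a hypothetical helper, not an Anyscale API.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urljoin


def build_request(
    service_url: str, path: str = "/", token: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a GET against one endpoint of the service."""
    # Normalize slashes so the base URL and the endpoint path join cleanly.
    url = urljoin(service_url.rstrip("/") + "/", path.lstrip("/"))
    # Public services need the token; private services send no auth header.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return url, headers
```

The result can be passed straight to requests.get(url, headers=headers).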

Terminate

You can terminate a service using the CLI or the Python SDK:

anyscale service terminate --service-id [SERVICE_ID]

Tutorial

Now let's write a simple production service and deploy it ourselves with the Anyscale CLI.

Write Simple Python Service

First, let's fork this repo so we have a place to start. The fork is created under your GitHub account name, like this:

github.com/{your-github-account}/docs_examples
note

Remember to replace every {your-github-account} with your own GitHub account name.

Go ahead and clone the repo to your local environment.

gh repo clone {your-github-account}/docs_examples 

Let's write the Python script below. It provides two HTTP endpoints. You can use the serve_hello.py file in the repo as a reference.

my_serve_hello.py
from fastapi import FastAPI
from ray import serve

serve.start(detached=True)

app = FastAPI()

@serve.deployment(route_prefix="/")
@serve.ingress(app)
class HelloWorld:
    @app.get("/")
    def hello(self):
        return "Hello world!"

    @app.get("/healthcheck")
    def healthcheck(self):
        return

HelloWorld.deploy()

Let's commit the file and push it to GitHub.

git add --all && git commit -m "write my test serve hello service" && git push

Then let's write a service config file for the python script. You can define the following service:

my_serve_hello.yaml
runtime_env:
  working_dir: "https://github.com/{your-github-account}/docs_examples/archive/refs/heads/main.zip"
entrypoint: python my_serve_hello.py
healthcheck_url: "/healthcheck"
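
If you prefer to generate this file rather than edit the placeholder by hand, the substitution can be scripted. This is a minimal sketch: render_service_yaml is a hypothetical helper, and the repo and branch defaults simply mirror the URL used in this tutorial.

```python
def render_service_yaml(
    github_account: str, repo: str = "docs_examples", branch: str = "main"
) -> str:
    """Return the tutorial's service config with the GitHub account filled in."""
    # GitHub serves a zip archive of a branch at this URL pattern.
    working_dir = (
        f"https://github.com/{github_account}/{repo}"
        f"/archive/refs/heads/{branch}.zip"
    )
    return (
        "runtime_env:\n"
        f'  working_dir: "{working_dir}"\n'
        "entrypoint: python my_serve_hello.py\n"
        'healthcheck_url: "/healthcheck"\n'
    )
```

Writing the returned text to my_serve_hello.yaml gives the same config as above with your account name in place.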

Deploy Service

Let's use the staging environment to test our service. First, go to the staging web page to get your credentials, then set the following variables to your credential values.

export ANYSCALE_HOST=... ANYSCALE_CLI_TOKEN=...

Now that everything is ready, you can run the Anyscale CLI to deploy your service. The following command creates a service whose name must be unique within the project.

# Deploy the service defined above using the CLI
anyscale service deploy my_serve_hello.yaml --name my_serve_hello

Output
(anyscale +1.6s) Query the status of the service with `anyscale service list --service-id service_DFBLGAGfrAdQeBMzPMDRJkv9`.
(anyscale +1.6s) View the service in the UI at https://console.anyscale-staging.com/services/service_DFBLGAGfrAdQeBMzPMDRJkv9.
note

Note the URL of your service in the UI from the output above.

Your service takes a short while to become ready. Check its status with the command from the previous output, repeating until the current state becomes RUNNING:

anyscale service list --service-id service_DFBLGAGfrAdQeBMzPMDRJkv9

Output
View your services in the UI at https://console.anyscale-staging.com/services
Name: my_serve_hello
Id: service_DFBLGAGfrAdQeBMzPMDRJkv9
Cost (dollars): 0.004300305912
Project name: default
Cluster name: cluster_for_service_DFBLGAGfrAdQeBMzPMDRJkv9_dEwht9WpTK
Current state: RUNNING
Creator: test-user
Entrypoint: python my_serve_hello.py
Access: public
URL: https://serve-session-p1erqcdx9zj4n2uztnwbcs3y.i.anyscaleuserdata-staging.com
Token: 44dgQNx_BmigQwsYXTbM_QEebq7Rz7P4ouQNJd6a-8Y
note

Note the URL and Token shown here; we will use them to access our service.
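
If you capture this output in a script, the "Key: value" lines can be turned into a dictionary. The sketch below assumes the layout shown above, which is not documented as a stable interface, and parse_service_listing is a hypothetical helper. For anything long-lived, reading the same fields through the SDK is likely the safer route.

```python
def parse_service_listing(text: str) -> dict:
    """Turn "Key: value" lines into a dict, splitting on the first ": " only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a "Key: value" shape, such as the header
            fields[key.strip()] = value.strip()
    return fields
```

Splitting on the first ": " keeps URLs intact, because the colon in "https://" is not followed by a space.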

Access Service

Now we have a running service, let's try the two endpoints provided in our service. Use the URL and Token you get from the anyscale service list command.

You can also get the curl command from your service's page in the UI.

Service query curl command

curl -H "Authorization: Bearer 44dgQNx_BmigQwsYXTbM_QEebq7Rz7P4ouQNJd6a-8Y" https://serve-session-p1erqcdx9zj4n2uztnwbcs3y.i.anyscaleuserdata-staging.com

"Hello world!"

curl -H "Authorization: Bearer 44dgQNx_BmigQwsYXTbM_QEebq7Rz7P4ouQNJd6a-8Y" https://serve-session-p1erqcdx9zj4n2uztnwbcs3y.i.anyscaleuserdata-staging.com/healthcheck

null

Awesome! We can see "Hello world!" and the expected responses from both endpoints.
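
The null body from /healthcheck is expected. The handler returns nothing (None in Python), and FastAPI serializes return values as JSON by default, so None becomes null while the HTTP status is still 200. The same encoding can be reproduced with the standard library:

```python
import json

# hello() returns a Python string, which is encoded as a JSON string.
hello_body = json.dumps("Hello world!")

# healthcheck() returns None, which is encoded as JSON null.
health_body = json.dumps(None)

print(hello_body)   # "Hello world!"
print(health_body)  # null
```

The health check only looks at the status code, so an empty handler is enough.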

Update Service

Now let's update the service.

Go back to our code in my_serve_hello.py and update it as follows:

diff --git a/my_serve_hello.py b/my_serve_hello.py
index 1bd7dc5..29e5cd4 100644
--- a/my_serve_hello.py
+++ b/my_serve_hello.py
@@ -1,5 +1,9 @@
 from fastapi import FastAPI
 from ray import serve
+import os
+
+# Use this var to test service inplace update. When the env var is updated, users see new return value.
+msg = os.getenv("SERVE_RESPONSE_MESSAGE", "Hello world!")

 serve.start(detached=True)

@@ -10,8 +14,8 @@ app = FastAPI()
 class HelloWorld:
     @app.get("/")
     def hello(self):
-        return "Hello world!"
-
+        return msg
+
     @app.get("/healthcheck")
     def healthcheck(self):
         return

Let's commit the file and push it to GitHub again.

git add --all && git commit -m "update my test serve hello service" && git push

Then let's update the service config file too:

my_serve_hello.yaml
runtime_env:
  working_dir: "https://github.com/{your-github-account}/docs_examples/archive/refs/heads/main.zip"
entrypoint: SERVE_RESPONSE_MESSAGE="Hello world again!" python my_serve_hello.py
healthcheck_url: "/healthcheck"
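
The new entrypoint relies on the shell form VAR=value command, which sets the variable only for that command. The script then reads it with a default, so the old message is still returned when the variable is absent. The lookup can be sketched in isolation; response_message is a hypothetical function that takes the environment as a parameter so it can be tried without changing real environment variables.

```python
import os
from typing import Mapping


def response_message(environ: Mapping[str, str] = os.environ) -> str:
    """Return the configured message, falling back to the original greeting."""
    return environ.get("SERVE_RESPONSE_MESSAGE", "Hello world!")


# Without the variable, the default applies.
print(response_message({}))
# With the variable set, as the updated entrypoint does, the new value wins.
print(response_message({"SERVE_RESPONSE_MESSAGE": "Hello world again!"}))
```

In my_serve_hello.py the value is read once at import time, so a change only takes effect when the entrypoint runs again, which is what redeploying does.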

Deploy again to update your service:

# Update the service defined above using the CLI
anyscale service deploy my_serve_hello.yaml --name my_serve_hello

Output
(anyscale +1.6s) Query the status of the service with `anyscale service list --service-id service_DFBLGAGfrAdQeBMzPMDRJkv9`.
(anyscale +1.6s) View the service in the UI at https://console.anyscale-staging.com/services/service_DFBLGAGfrAdQeBMzPMDRJkv9.

Wait for your service to become ready again, then try accessing your service endpoints:

curl -H "Authorization: Bearer 44dgQNx_BmigQwsYXTbM_QEebq7Rz7P4ouQNJd6a-8Y" https://serve-session-p1erqcdx9zj4n2uztnwbcs3y.i.anyscaleuserdata-staging.com

"Hello world again!"

curl -H "Authorization: Bearer 44dgQNx_BmigQwsYXTbM_QEebq7Rz7P4ouQNJd6a-8Y" https://serve-session-p1erqcdx9zj4n2uztnwbcs3y.i.anyscaleuserdata-staging.com/healthcheck

null

Great, the service now returns the new message, which means we successfully updated our service!

Terminate Service

Now that we have finished the tutorial, let's clean up the service. Run the following Anyscale CLI command to terminate your service:

anyscale service terminate --service-id service_DFBLGAGfrAdQeBMzPMDRJkv9

Output
(anyscale +12.0s) Service service_DFBLGAGfrAdQeBMzPMDRJkv9 has begun terminating...
(anyscale +12.0s) Current state of service: RUNNING. Goal state of service: TERMINATED
(anyscale +12.0s) Query the status of the service with `anyscale service list --service-id service_DFBLGAGfrAdQeBMzPMDRJkv9`.

Wait a while, then check the status again. It shows that the service is in the TERMINATED state.

anyscale service list --service-id service_DFBLGAGfrAdQeBMzPMDRJkv9

Output
View your services in the UI at https://console.anyscale-staging.com/services
Name: my_serve_hello
Id: service_DFBLGAGfrAdQeBMzPMDRJkv9
Cost (dollars): 0.0
Project name: default
Cluster name: cluster_for_service_DFBLGAGfrAdQeBMzPMDRJkv9_dEwht9WpTK
Current state: TERMINATED
Creator: test-user
Entrypoint: SERVE_RESPONSE_MESSAGE="Hello world again!" python my_serve_hello.py
Access: public
URL: None
Token: 44dgQNx_BmigQwsYXTbM_QEebq7Rz7P4ouQNJd6a-8Y

Congratulations! Now you have learned how to create, deploy, update, check status, access and terminate a production service.

note

If you have access to an S3 bucket (docs), you can store your code in S3 instead of GitHub. In that case, write your service YAML like this:

my_serve_hello.yaml
runtime_env:
  working_dir: .
  upload_path: s3://bucket/path/to/dir
entrypoint: SERVE_RESPONSE_MESSAGE="Hello world again!" python my_serve_hello.py
healthcheck_url: "/healthcheck"

Remember to replace s3://bucket/path/to/dir with your own path.