
Production Services

info

Production services is a new feature that's currently under development.

Anyscale supports production services that are submitted as a standalone package containing a Ray Serve application. The service will be fully managed by the platform, providing fault tolerance and declarative upgrades.

Each service consists of a single, long-running Ray cluster with a persistent DNS name. The cluster may be restarted over time.

If you're still in the development phase, you can refer to this tutorial for an example of how to develop Ray services interactively.

Using production services

Define

To create a production service, you must provide the following:

  • (Required in the SDK, optional in the CLI) compute_config A cluster compute config for the cluster the service will run on. In the SDK, this can only be specified as compute_config_id, which can be found on the Anyscale Console under Configurations. See the SDK example below. In the CLI, you may instead specify compute_config (the name of the cluster compute config) or cloud (the name of an Anyscale cloud) for convenience, or omit the attribute, in which case a default is used.
  • (Required in the SDK, optional in the CLI) cluster_env A cluster environment for the cluster the service will run on. In the SDK, this can only be specified as build_id, which can be found on the Anyscale Console under Configurations. See the SDK example below. In the CLI, you may instead specify cluster_env (the name and version of the cluster environment, colon-separated; if you don't specify a version, the latest is used), or omit the attribute, in which case a default is set based on your local Python and Ray versions.
  • (Optional) runtime_env A runtime environment containing your application code and dependencies.
  • (Required) entrypoint The entrypoint command that will be run on the cluster to deploy the service. Although it is generally a Python script, the entrypoint can be any shell command.
  • (Required) healthcheck_url A health check that the platform will use to determine if your service is healthy. This should be an HTTP GET endpoint that returns status 200 if the service is healthy.
  • (Optional) access Access setting for the service. If public access is specified, the endpoints of the service can be queried from the public internet but require a service access token. If private access is specified, the endpoints can only be queried from within your Anyscale cloud (where VPC and security groups can be specified). For private access services, the HTTP server for the Serve deployment must listen on 0.0.0.0 (this can be set in the http_options of serve.start). By default, all services have public access.
  • (Optional) name Name of the service. Service names must be unique within projects.
  • (Optional) project_id The id of the project you want the service to deploy in. project_id can be found in the URL by navigating to the project in Anyscale Console.
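
The fields above can be sanity-checked locally before deploying. The following is a minimal sketch and not part of the Anyscale CLI or SDK: validate_service_config and the field sets are hypothetical names that mirror the list above. It treats compute_config and cluster_env as optional (as in the CLI) and assumes the access values are the strings "public" and "private".

```python
# Hypothetical helper, not part of the Anyscale CLI or SDK.
# Field names mirror the list above; the CLI view is used, where
# compute_config and cluster_env may be omitted.
REQUIRED_FIELDS = {"entrypoint", "healthcheck_url"}
OPTIONAL_FIELDS = {
    "name", "project_id", "runtime_env", "access",
    "compute_config", "compute_config_id", "cloud",
    "cluster_env", "build_id",
}
# Assumption: access is spelled "public" or "private" in the config.
VALID_ACCESS = {"public", "private"}


def validate_service_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    present = set(config)
    for field in sorted(REQUIRED_FIELDS - present):
        problems.append(f"missing required field: {field}")
    for field in sorted(present - REQUIRED_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unknown field: {field}")
    # Services default to public access when the field is omitted.
    access = config.get("access", "public")
    if access not in VALID_ACCESS:
        problems.append(f"unsupported access value: {access!r}")
    return problems
```

A check like this only catches missing or misspelled keys; the platform remains the authority on whether a config is accepted.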
note

Please make sure your cluster environment contains anyscale>=0.5.20 so that querying your service behaves correctly for services run on non-Kubernetes clouds. This will be included in the default cluster environments for Ray versions >= 0.12.0.

note

The working_dir option of the runtime_env can be specified in one of two ways.

The first option is to set working_dir to be a remote URI: either a zip file preexisting in an AWS S3 or Google Cloud Storage bucket, or a zip file accessible over HTTP/S (e.g., a GitHub download URL). The zip file must contain only a single top-level directory; see Remote URIs for details.

The second option is to set working_dir to be a local directory. This is currently only supported when using the Services CLI, not the SDK. If you do this, you are responsible for providing external storage: your runtime_env must include the field upload_path specifying an Amazon S3 or Google Cloud Storage bucket or subdirectory. Your local directory will then be uploaded to remote storage and downloaded by the cluster before running the service. External storage allows Anyscale to start a new cluster with your working_dir in the case of a failure.

Example: runtime_env = {"working_dir": "/User/code", "upload_path": "s3://my-bucket/subdir"}

With either option, the cluster running the service must have permissions to download from the bucket or URL. For more on permissions, see our Cloud Access Overview, or Using a Local Directory with Anyscale Jobs and Services on GCP if using GCP.
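
When using a remote zip file, it can help to confirm the single top-level directory requirement before uploading. The following is a minimal sketch using only the Python standard library; has_single_top_level_dir is a hypothetical helper name, and the check is an approximation of the requirement rather than the exact validation Ray performs.

```python
import zipfile


def has_single_top_level_dir(zip_file):
    """Return True if every entry in the archive sits under one top-level directory.

    zip_file may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
    # The first path component of each entry is its top-level name.
    roots = {name.split("/", 1)[0] for name in names}
    # Entries without a "/" are files at the archive root, which is not allowed.
    return len(roots) == 1 and all("/" in name for name in names)
```

For example, an archive containing my_service/deploy.py and my_service/utils/helpers.py passes, while one containing deploy.py at the root does not.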

These options can be specified in a configuration file:

my_production_service.yaml
name: "my-first-service"
project_id: "prj_7S7Os7XBvO6vdiVC1J0lgj"
compute_config: my-cluster-compute # You may specify `compute_config_id` or `cloud` instead
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_service_files.zip"
entrypoint: "python deploy.py"
healthcheck_url: "/healthcheck"
access: "public"

Deploy

Your service can be deployed to Anyscale with the CLI or the Python SDK:

anyscale service deploy my_production_service.yaml \
--name my-production-service \
--description "A production service running on Anyscale."
note

Service names must be unique within a project. This enables referencing services by name and project rather than needing to fetch the underlying ID.

Monitor

You can check the status of a service on the Anyscale Console, or query it using the CLI or the Python SDK:

anyscale service list --service-id [SERVICE_ID]

Additionally, for each deployment, Anyscale will generate a dashboard in which you can monitor the health and activity of your deployment:

Service monitoring dashboard

The following graphs are automatically generated for you:

  • CPU utilization (cluster-wide)
  • Memory utilization (cluster-wide)
  • P95 latency (for queries)
  • Exceptions (exceptions per second)
  • QPS (queries per second)
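
To make the latency and throughput graphs concrete, the sketch below shows one common way to compute a P95 latency and an average QPS from raw samples. It is illustrative only: the function names are hypothetical, and the nearest-rank percentile method is an assumption, not a description of how the Anyscale dashboard computes its graphs.

```python
def p95_latency(latencies):
    """Nearest-rank 95th percentile of a non-empty list of latency samples."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies)
    # 1-based nearest rank, ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def qps(request_count, window_seconds):
    """Average queries per second over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return request_count / window_seconds
```

With 100 samples, the P95 is the 95th smallest value, so 5% of queries were slower than the number shown.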

Query

Based on the access settings specified when starting the service, Anyscale will create a public or private URL that can be used to query the endpoints in your service. The URL and (optionally) authorization token can be found in the service read model or through the "Query" button in the Anyscale Console. The Serve deployments table in the Anyscale Console also contains a link to the FastAPI docs page for your deployment.

The service URL and token can be obtained from the Anyscale SDK as follows:

from anyscale import AnyscaleSDK

# Replace with the ID of your deployed service.
service_id = "[SERVICE_ID]"

sdk = AnyscaleSDK()
service = sdk.get_service(service_id).result
service_url = service.url
service_token = service.token

The service can then be queried with:

import requests

resp = requests.get(service_url, headers={"Authorization": f"Bearer {service_token}"})

Private services don't require a service token, but can only be queried from within your Anyscale cloud.
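
The difference between public and private access can be captured in a small helper that attaches the bearer token only when one is given. This is a minimal sketch using the Python standard library; build_request and is_healthy are hypothetical names, and the /healthcheck path is an assumption taken from the example configuration, so substitute your own healthcheck_url.

```python
import urllib.request


def build_request(service_url, path="/", token=None):
    """Build a GET request for a service endpoint.

    Pass a token for public services; omit it for private services.
    """
    url = service_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(url, headers=headers, method="GET")


def is_healthy(service_url, token=None, healthcheck_path="/healthcheck", timeout_s=10):
    """Return True if the health check endpoint answers with HTTP 200."""
    request = build_request(service_url, healthcheck_path, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

For a public service this would be called as is_healthy(service_url, token=service_token); for a private service, call is_healthy(service_url) from within your Anyscale cloud.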

Terminate

You can terminate a service using the CLI or the Python SDK:

anyscale service terminate --service-id [SERVICE_ID]

Example

You can try out production services using a public GitHub repo. The following serve_hello.py example is located on Anyscale's docs_examples repo:

serve_hello.py
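from fastapi import FastAPI
from ray import serve

# Start a detached Serve instance so that it keeps running after this script exits.
serve.start(detached=True)

app = FastAPI()


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class HelloWorld:
    @app.get("/")
    def hello(self):
        return "Hello world!"

    # Health check endpoint: returning normally yields an HTTP 200 response.
    @app.get("/healthcheck")
    def healthcheck(self):
        return


HelloWorld.deploy()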

You can define the following service:

serve_hello.yaml
runtime_env:
  working_dir: "https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip"
entrypoint: "python serve_hello.py"
healthcheck_url: "/healthcheck"
access: "public"

You can run, monitor and terminate the service as follows:

# Deploy the service defined above using the CLI
anyscale service deploy serve_hello.yaml --name serve_hello
No cluster environment provided, setting default based on local Python and Ray versions.
No cloud or cluster compute config specified, using the default: cpt_js2dPghEYdJHafmTdUx9FsNf.
Service service_FZ87DmBwMnwdX6M14jzB8xms has been deployed. Current state of service: PENDING.
Query the status of the service with `anyscale service list --service-id service_FZ87DmBwMnwdX6M14jzB8xms`.
View the service in the UI at https://console.anyscale-staging.com/services/service_FZ87DmBwMnwdX6M14jzB8xms.

# Query the status of the service using the ID returned above
anyscale service list --service-id service_FZ87DmBwMnwdX6M14jzB8xms
View your services in the UI at https://console.anyscale-staging.com/services

Name: cli-job-2021-12-05T18:57:17.203774
Id: service_FZ87DmBwMnwdX6M14jzB8xms
Cost (dollars): 0
Project name: my-project
Cluster name: cluster_for_service_FZ87DmBwMnwdX6M14jzB8xms
Current state: RUNNING
Creator: user
Entrypoint: python serve_hello.py
Access: public
URL: https://serve-ses-cwp158hcjjenebspv7vgvy7w.anyscale-staging-k8wcxpg-0005.anyscale-test-staging.com/
Token: jK-3NSIIO7dviU50Ccd4RmP3YJLK_iQ9XxDqNUuf6Lg

# Terminate the service
anyscale service terminate --service-id service_FZ87DmBwMnwdX6M14jzB8xms
Service service_FZ87DmBwMnwdX6M14jzB8xms has begun terminating...
Current state of service: RUNNING. Goal state of service: TERMINATED
Query the status of the service with `anyscale service list --service-id service_FZ87DmBwMnwdX6M14jzB8xms`.

# Get the status again
anyscale service list --service-id service_FZ87DmBwMnwdX6M14jzB8xms
View your services in the UI at https://console.anyscale-staging.com/services

Name: cli-job-2021-12-05T18:57:17.203774
Id: service_FZ87DmBwMnwdX6M14jzB8xms
Cost (dollars): 0
Project name: my-project
Cluster name: cluster_for_service_FZ87DmBwMnwdX6M14jzB8xms
Current state: TERMINATED
Creator: user
Entrypoint: python serve_hello.py
Access: public
URL: https://serve-ses-cwp158hcjjenebspv7vgvy7w.anyscale-staging-k8wcxpg-0005.anyscale-test-staging.com/
Token: jK-3NSIIO7dviU50Ccd4RmP3YJLK_iQ9XxDqNUuf6Lg
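
If you script around the CLI, the "Key: value" lines printed by anyscale service list can be collected into a dictionary. The following is a minimal sketch: parse_service_status is a hypothetical helper, and it assumes the output keeps the layout shown above, which is not a stable interface. Prefer the SDK when you need the state programmatically.

```python
def parse_service_status(cli_output):
    """Collect `Key: value` lines from the CLI output into a dict."""
    fields = {}
    for line in cli_output.splitlines():
        # Split on the first ": " only, so URLs in the value stay intact.
        key, separator, value = line.partition(": ")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return fields
```

Applied to the output above, the resulting dictionary maps "Current state" to "TERMINATED" and "Access" to "public".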