
Production Services

Anyscale supports production services that are submitted as a standalone package containing a Ray Serve application. The service will be fully managed by the platform, providing fault tolerance, declarative upgrades, monitoring, and alerting.

A service consists of a single, long-running Ray cluster with a persistent DNS name. The cluster may be restarted over time.

If you're still in the development phase, refer to this tutorial for an example of how to develop Ray services interactively.

Using Production Services

YAML Definition

To create a production service, you must provide the following:

  • compute_config A cluster compute config for the cluster the service will run on.
    • On the SDK, this can be specified as compute_config_id or compute_config (a one-off compute config). This is required, and only one of these fields can be specified.
    • On the CLI, you may specify compute_config (the name of the cluster compute config or a one-off) or cloud (the name of an Anyscale cloud) instead for convenience. Both attributes are optional. If neither attribute is specified, the service will use a default compute config that is associated with the default cloud. See the SDK example below.
  • (Required on the SDK, optional on the CLI) cluster_env A cluster environment for the cluster the service will run on. On the SDK, this can only be specified as build_id, which can be found on the Anyscale Console under Configurations. On the CLI, you may specify cluster_env as the name and version of the cluster environment, colon-separated; if you don't specify a version, the latest will be used. See the SDK example below.
  • (Optional) runtime_env A runtime environment containing your application code and dependencies.
  • (Required) entrypoint The entrypoint command that will be run on the cluster to deploy the service. Although it is generally a Python script, the entrypoint can be any shell command.
  • (Required) healthcheck_url A health check that the platform will use to determine if your service is healthy. This should be an HTTP GET endpoint that returns status 200 if the service is healthy.
  • (Optional) access Access setting for the service. This is only available for Kubernetes clouds (e.g., GCP). If public access is specified, the endpoints of the service can be queried from the public internet but will require a service access token. If private access is specified, the endpoints of the service can only be queried from within your Anyscale cloud (where VPC and security groups can be specified). For private access services, the HTTP server for the Serve deployment must listen on 0.0.0.0 (this can be specified in http_options of serve.start). By default, all services have public access.
  • (Optional) name Name of the service. Service names must be unique within projects.
  • (Optional) project_id The id of the project you want the service to deploy in. project_id can be found in the URL by navigating to the project in Anyscale Console.
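The healthcheck_url contract described above (an HTTP GET endpoint that returns status 200 when the service is healthy) can be sketched with the Python standard library alone. This is a minimal illustration of the contract, not how a Ray Serve application would normally expose the endpoint. HealthHandler and probe are hypothetical names, and the /healthcheck path simply mirrors the example configuration.

```python
import http.client
from http.server import BaseHTTPRequestHandler


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers 200 on /healthcheck, 404 elsewhere."""

    def do_GET(self):
        if self.path == "/healthcheck":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def probe(host, port, path="/healthcheck", timeout=5.0):
    """Return True only if a GET to the endpoint answers with status 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection failures and malformed responses count as unhealthy.
        return False
    finally:
        conn.close()
```

To try it locally, serve HealthHandler with http.server.HTTPServer on a free port and call probe("127.0.0.1", port). Any status other than 200, or a failed connection, is treated as unhealthy.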
note

Make sure your cluster environment contains anyscale>=0.5.20 so that querying works correctly for services running on non-Kubernetes clouds. This is included in the default cluster environments for Ray versions >= 0.12.0.

note

The working_dir option of the runtime_env can be specified in one of two ways.

The first option is to set working_dir to be a remote URI: either a zip file preexisting in an AWS S3 or Google Cloud Storage bucket, or a zip file accessible over HTTP/S (e.g., a GitHub download URL). The zip file must contain only a single top-level directory; see Remote URIs for details.
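Before uploading a zip file for use as a remote working_dir, you may want to confirm that it has exactly one top-level directory. The following is a minimal sketch using only the standard library; has_single_top_level_dir is a hypothetical helper, not part of Ray or Anyscale.

```python
import zipfile
from pathlib import PurePosixPath


def has_single_top_level_dir(zip_path):
    """Return True if every entry in the archive sits under one directory."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

    # Collect the first path component of every non-empty entry name.
    roots = {PurePosixPath(n).parts[0] for n in names if PurePosixPath(n).parts}
    if len(roots) != 1:
        return False

    # The single root must be a directory: either an explicit "dir/" entry
    # or at least one entry stored beneath it.
    return any(n.endswith("/") or len(PurePosixPath(n).parts) > 1 for n in names)
```

For example, an archive containing only my_service/deploy.py passes the check, while one with deploy.py at the archive root, or with two sibling directories, does not.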

The second option is to set working_dir to be a local directory. If you do this, you are responsible for providing external storage: your runtime_env must include the field upload_path specifying an Amazon S3 or Google Cloud Storage bucket or subdirectory. Your local directory will then be uploaded to remote storage and downloaded by the cluster before running the service. External storage allows Anyscale to start a new cluster with your working_dir in the case of a failure.

Example: runtime_env = {"working_dir": "/User/code", "upload_path": "s3://my-bucket/subdir"}

With either option, the cluster running the service must have permissions to download from the bucket or URL. For more on permissions, see our Cloud Access Overview, or Using a Local Directory with Anyscale Jobs and Services on GCP if using GCP.

The upload_path field is not available in OSS Ray. See Runtime Environments in Anyscale Production Jobs and Services for more differences with runtime_env when using Production Jobs and Services.
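The two working_dir options above can be checked locally before you deploy. The sketch below is a hypothetical pre-flight helper, not an Anyscale API. It assumes that remote locations are recognized by their URI scheme, and that Google Cloud Storage locations use the gs:// scheme.

```python
from urllib.parse import urlparse

# Assumption: these schemes identify a remote working_dir (option one).
REMOTE_SCHEMES = {"s3", "gs", "http", "https"}
# Assumption: upload_path must point at S3 or Google Cloud Storage.
UPLOAD_SCHEMES = {"s3", "gs"}


def check_working_dir(runtime_env):
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    working_dir = runtime_env.get("working_dir")
    if working_dir is None:
        return problems  # Nothing to check.

    if urlparse(working_dir).scheme in REMOTE_SCHEMES:
        return problems  # Option one: remote URI, no upload_path needed.

    # Option two: a local directory, which needs external storage.
    upload_path = runtime_env.get("upload_path")
    if upload_path is None:
        problems.append("local working_dir requires upload_path")
    elif urlparse(upload_path).scheme not in UPLOAD_SCHEMES:
        problems.append("upload_path must be an s3:// or gs:// location")
    return problems
```

Called on the example runtime_env above, the helper returns an empty list. Dropping upload_path from that example yields one reported problem.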

These options can be specified in a configuration file:

my_production_service.yaml
name: "my-first-service"
project_id: "prj_7S7Os7XBvO6vdiVC1J0lgj"
compute_config: my-cluster-compute # You may specify `compute_config_id` or `cloud` instead
# Alternatively, a one-off compute config
# compute_config:
#   cloud_id: cld_4F7k8814aZzGG8TNUGPKnc
#   region: us-west-2
#   head_node_type:
#     name: head
#     instance_type: m5.large
#   worker_node_types: []
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_service_files.zip"
entrypoint: "python deploy.py"
healthcheck_url: "/healthcheck"
access: "public"
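The cluster_env value in this file uses the colon-separated name and version form described earlier. As an illustration of that format only, here is a small sketch of how such a string splits; split_cluster_env is a hypothetical helper, and treating the version as an integer is an assumption based on the example value.

```python
def split_cluster_env(value):
    """Split "name:version" into (name, version); version is None if omitted."""
    name, separator, version = value.partition(":")
    if not name:
        raise ValueError("cluster_env must start with a name")
    if not separator:
        # No version given: per the text above, the latest version is used.
        return name, None
    # Assumption: versions are integers, as in "my-cluster-env:5".
    return name, int(version)
```

With the value from the file above, split_cluster_env("my-cluster-env:5") gives the name my-cluster-env and version 5, while a bare name leaves the version unset.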

Deploy

Use the CLI or SDK to deploy your service to Anyscale.

anyscale service deploy my_production_service.yaml \
--name my-production-service \
--description "A production service running on Anyscale."
note

Service names must be unique within a project. This enables referencing services by name and project rather than needing to fetch the underlying ID.
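If you deploy from a script or CI job, the CLI call shown above can be assembled as an argument list. This is a sketch that uses only the flags appearing in this section; deploy_command is a hypothetical helper, and running the result requires the anyscale CLI to be installed and authenticated.

```python
import shlex


def deploy_command(config_path, name=None, description=None):
    """Build the argument list for `anyscale service deploy`."""
    command = ["anyscale", "service", "deploy", config_path]
    if name is not None:
        command += ["--name", name]
    if description is not None:
        command += ["--description", description]
    return command


command = deploy_command(
    "my_production_service.yaml",
    name="my-production-service",
    description="A production service running on Anyscale.",
)
# Print a shell-safe rendering for logs; pass `command` itself to
# subprocess.run(command, check=True) to actually deploy.
print(shlex.join(command))
```

Passing a list rather than a single string avoids shell-quoting problems with the description text.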

Upgrade

To upgrade your service, update your configuration, then use the CLI or SDK to deploy the new configuration with the same service name and project ID. For example, to update the entrypoint config from python deploy.py to echo update && python deploy.py:

my_production_service_update.yaml
name: "my-first-service"
project_id: "prj_7S7Os7XBvO6vdiVC1J0lgj"
compute_config: my-cluster-compute # You may specify `compute_config_id` or `cloud` instead
cluster_env: my-cluster-env:5 # You may specify `build_id` instead
runtime_env:
  working_dir: "s3://my_bucket/my_service_files.zip"
entrypoint: "echo update && python deploy.py"
healthcheck_url: "/healthcheck"
access: "public"

anyscale service deploy my_production_service_update.yaml \
--name my-production-service \
--description "A production service running on Anyscale."
note

The first time you deploy a service, Anyscale will create it. Subsequent deployments will update the existing service.

note

Services do not support updating their access. To change the access, please do either of the following:

  • start a new service.
  • terminate the service first and start it again.

Monitor

You can check the status of a service on the Anyscale Console, or query it using the CLI or the Python SDK:

anyscale service list --service-id [SERVICE_ID]

Additionally, for each deployment, Anyscale will generate a dashboard in which you can monitor the health and activity of your deployment:

Service monitoring dashboard

The following graphs are automatically generated for you:

  • CPU utilization (cluster-wide)
  • Memory utilization (cluster-wide)
  • P95 latency (for queries)
  • Exceptions (exceptions per second)
  • QPS (queries per second)

Querying Service

Based on the access settings specified when starting the service, Anyscale will create a public or private URL that can be used to query the endpoints in your service. The URL and (optionally) authorization token can be found in the service read model or through the "Query" button in the Anyscale Console. The Serve deployments table in the Anyscale Console also contains a link to the FastAPI docs page for your deployment.

The service URL and token can be obtained from the Anyscale SDK as follows:

from anyscale import AnyscaleSDK

sdk = AnyscaleSDK()
service_id = "[SERVICE_ID]"  # Replace with the ID of your service
service = sdk.get_service(service_id).result
service_url = service.url
service_token = service.token

The service can then be queried with:

import requests

resp = requests.get(service_url, headers={"Authorization": f"Bearer {service_token}"})

Private services don't require a service token, but can only be queried from within your Anyscale cloud.
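Because only public services need a token, a client can attach the Authorization header conditionally. The sketch below uses urllib from the standard library instead of requests so that it is self-contained; build_service_request is a hypothetical helper, and the URL in the example is a placeholder.

```python
import urllib.request


def build_service_request(service_url, service_token=None):
    """Build a GET request, adding a bearer token only when one is given."""
    headers = {}
    if service_token:
        # Public services: authenticate with the service access token.
        headers["Authorization"] = f"Bearer {service_token}"
    # Private services: no token, but the caller must be inside the cloud.
    return urllib.request.Request(service_url, headers=headers, method="GET")


# Placeholder URL and token for illustration only.
public_request = build_service_request("https://service.example.invalid/", "my-token")
private_request = build_service_request("https://service.example.invalid/")
```

Send the request with urllib.request.urlopen(public_request, timeout=...), choosing a client-side timeout that suits your workload.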
note

The timeout for service calls is currently 1 hour. If a single request exceeds this timeout, it will fail. Please reach out if this does not work for you.

Terminate

You can terminate a service using the CLI or the Python SDK:

anyscale service terminate --service-id [SERVICE_ID]