
Introduction to Anyscale Services

Deploy your machine learning apps into production with Anyscale Services for scalability, fault tolerance, high availability, and zero downtime upgrades.


⏱️ Time to complete: 10 min

Prerequisite: Intro to Workspaces

After implementing and testing your machine learning workloads, it’s time to move them into production. An Anyscale Service packages your application code, dependencies, and compute configurations, deploying them behind a REST endpoint for easy integration and scalability.

This interactive example takes you through a common development to production workflow with services:

  • Development
    • Develop a service in a workspace.
    • Run the app in a workspace.
    • Send a test request.
  • Production
    • Deploy as an Anyscale Service.
    • Check the status of the service.
    • Query the service.
    • Monitor the service.
    • Configure scaling.
    • Update the service.
    • Terminate the service.

Development

Start by writing your machine learning service using Ray Serve, an open source distributed serving library for building online inference APIs.

Develop a service in a workspace

This example begins in an Anyscale Workspace, which is a fully managed development environment connected to a Ray cluster. Look at the following simple Ray Serve app created in main.py:

import logging
import requests
from fastapi import FastAPI
from ray import serve

fastapi = FastAPI()
logger = logging.getLogger("ray.serve")

@serve.deployment
@serve.ingress(fastapi)
class FastAPIDeployment:
    # FastAPI automatically parses the HTTP request.
    @fastapi.get("/hello")
    def say_hello(self, name: str) -> str:
        logger.info("Handling request!")
        return f"Hello {name}!"

my_app = FastAPIDeployment.bind()

The following is a breakdown of this code, which integrates Ray Serve with FastAPI:

  • It defines a FastAPI app named fastapi and a logger named ray.serve.
  • @serve.deployment decorates the class FastAPIDeployment, indicating it's a Ray Serve deployment.
  • @serve.ingress(fastapi) marks fastapi as the entry point for incoming requests to this deployment.
  • say_hello handles GET requests to the /hello endpoint in FastAPI, taking a name parameter, printing a log message, and returning a greeting.
  • FastAPIDeployment.bind() binds the deployment to Ray Serve, making it ready to handle requests.

Run the app in a workspace

Execute the following command to run the Ray Serve app on the workspace cluster. The command takes an import path to the app, formatted as module:application.

!serve run main:my_app --non-blocking

Note: The --non-blocking flag starts the app and returns control to the notebook, so you can continue with the next cells while the app keeps running on the workspace cluster. If you omit the flag, the command blocks and streams logs to the console, which can help with debugging in development; stop it with the stop button in the notebook or Ctrl-C in the terminal.
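Optionally, before sending a request, you can confirm the app deployed and is healthy with the Ray Serve CLI:

!serve status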

Send a test request

Your app is accessible through localhost:8000 by default. Run the following to send a GET request to the /hello endpoint with query parameter name set to “Theodore.”

import requests

print(requests.get("http://localhost:8000/hello", params={"name": "Theodore"}).json())
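If the app is running, this prints the greeting returned by the deployment: Hello Theodore!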

Production

To move into production, use Anyscale Services to deploy your Ray Serve app to a new separate cluster without any code modifications. Anyscale handles the scaling, fault tolerance, and load balancing of your services to ensure uninterrupted service across node failures, heavy traffic, and rolling updates.

Deploy as an Anyscale Service

Use the following to deploy my_service in a single command:

!anyscale service deploy main:my_app --name=my_service

Note: This Anyscale Service pulls the associated dependencies, compute config, and service config from the workspace. To define these explicitly, you can deploy from a config.yaml file using the -f flag. See ServiceConfig reference for details.
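For reference, a minimal config.yaml for this example might look like the following sketch. The field names follow the ServiceConfig schema, but treat this as an assumption and check the ServiceConfig reference for the authoritative layout; any compute config or image fields you add are specific to your deployment.

name: my_service
applications:
  - import_path: main:my_app

You would then deploy it with anyscale service deploy -f config.yaml.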

Check the status of the service

To get the status of my_service, run the following:

!anyscale service status --name=my_service

Query the service

When you deploy, Anyscale exposes the service at a publicly accessible endpoint that you can send requests to.

In the preceding cell’s output, copy your API_KEY and BASE_URL. As an example, the values look like the following:

  • API_KEY: NMv1Dq3f2pDxWjj-efKKqMUk9UO-xfU3Lo5OhpjAHiI
  • BASE_URL: https://my-service-jrvwy.cld-w3gg9qpy7ft3ayan.s.anyscaleuserdata.com/

Paste your BASE_URL and API_KEY into the placeholders in the following Python snippet:

import requests

BASE_URL = "" # PASTE HERE
API_KEY = "" # PASTE HERE

def send_request(name: str) -> str:
    response: requests.Response = requests.get(
        f"{BASE_URL}/hello",
        params={"name": name},
        headers={
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    response.raise_for_status()
    return response.text

print(send_request("Theodore"))
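As a quick check outside of Python, you can send the same authenticated request from a terminal. This sketch assumes you export the same BASE_URL and API_KEY values as shell variables:

curl -H "Authorization: Bearer $API_KEY" "$BASE_URL/hello?name=Theodore"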

Monitor the service

To view the service, navigate to 🏠 > Services > my_service. On this page, inspect key metrics, events, and logs. With Anyscale’s monitoring dashboards, you can track performance and adjust configurations as needed without deep diving into infrastructure management. See Monitor a service.

By clicking the running service, you can view the status of its deployments and how many replicas each contains. For example, FastAPIDeployment has one replica.

In the Logs, you can search for the message “Handling request!” to view each request for easier debugging.

Configure scaling

Each Ray Serve deployment has one replica by default: a single worker process that runs the model and serves requests.

As a quick way to move from a single fixed replica to autoscaling, modify the original service script main.py. Add the num_replicas="auto" argument to the @serve.deployment decorator as follows:

import requests
from fastapi import FastAPI
from ray import serve

fastapi = FastAPI()

- @serve.deployment
+ @serve.deployment(num_replicas="auto")
@serve.ingress(fastapi)
class FastAPIDeployment:
    # FastAPI automatically parses the HTTP request.
    @fastapi.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"

my_app = FastAPIDeployment.bind()

Note: This approach is a quick way to modify scale for this example. As a best practice in production, define autoscaling behavior in the ServiceConfig contained in a config.yaml file. The number of worker nodes that Anyscale launches dynamically scales up and down in response to traffic and is scoped by the overall cluster compute config you define.
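If you want explicit control over the scaling range rather than num_replicas="auto", Ray Serve also accepts an autoscaling_config argument on the decorator. The following is a minimal sketch of main.py with illustrative bounds, not recommended values:

from fastapi import FastAPI
from ray import serve

fastapi = FastAPI()

@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,  # lower bound on replicas
        "max_replicas": 5,  # upper bound on replicas
    }
)
@serve.ingress(fastapi)
class FastAPIDeployment:
    @fastapi.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"

my_app = FastAPIDeployment.bind()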

Update the service

To deploy the update, execute the following command to trigger a staged rollout of the new service with zero downtime:

!anyscale service deploy main:my_app --name=my_service

On the service overview page, you can monitor the status of the rollout and watch Anyscale shut down the previous cluster.

Note: Using this command triggers an automatic rollout which gradually shifts traffic from the previous cluster, or primary version, to the incoming cluster, or canary version. To learn more about configuring rollout behavior, see Update a service.

Terminate the service

To tear down the service cluster, run the following command:

!anyscale service terminate --name=my_service

Summary

In this example, you learned the basics of Anyscale Services:

  • Development
    • Develop a service in a workspace.
    • Run the app in a workspace.
    • Send a test request.
  • Production
    • Deploy as an Anyscale Service.
    • Check the status of the service.
    • Query the service.
    • Monitor the service.
    • Configure scaling.
    • Update the service.
    • Terminate the service.