Restricting Internet Traffic to Ray Serve Deployments

Running Ray Serve on Anyscale allows for easy access to the deployment endpoints. Anyscale allows Ray Serve deployments to either (1) accept requests from the internet that contain an authentication token or (2) only accept requests from within your Anyscale cloud.

Enable traffic from public internet

To enable authenticated traffic to Serve endpoints from the public internet, set user_service_access="public" when creating a cluster through the SDK or select the "public" box in the "Access" section when starting a cluster from the UI. All clusters started from the Ray client API will only accept authenticated traffic from the public internet.

The Serve endpoint URLs can be found in the following places:

  • Cluster SDK read model: The URL is in the user_service_url field. The authentication token is in the user_service_token field and should be sent as a Bearer token in the Authorization header when making a request.
  • API docs link: The Serve deployments table contains an API docs link for each deployment. Following the link authenticates your request and redirects you to the OpenAPI docs (or anything that is deployed under <deployment_route>/docs). If the Serve deployment is a FastAPI app, the interactive docs are autogenerated here.

Make sure your cluster environment contains anyscale>=0.5.20 so that querying your service behaves correctly for services running on non-Kubernetes Anyscale clouds. This version is included in the default cluster environments for Ray versions >= 0.12.0.

Restrict traffic to within the Anyscale cloud


Currently, private Serve deployments are only available in Kubernetes Anyscale clouds (e.g., GCP).

To restrict traffic from the public internet, set user_service_access="private" when creating a cluster through the SDK, or select the "private" box in the "Access" section when starting a cluster from the UI. The deployment then accepts only traffic coming from within the Anyscale cloud (where the VPC and security groups can be specified). For private access, the HTTP server for the serve deployment must listen on (can be specified in http_options of serve.start). The Serve endpoint URL can be found in the user_service_url field of the SDK read model. An authentication token is not required for private Serve deployments.
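As a rough illustration of passing http_options to serve.start, the sketch below builds the options dictionary and hands it to Ray Serve. The helper names serve_http_options and start_private_serve are hypothetical, the host address is left as a parameter for you to supply, and the port default of 8000 is an assumption based on Ray Serve's usual default. Treat this as a sketch, not the definitive setup for a private deployment.

```python
from typing import Any, Dict


def serve_http_options(host: str, port: int = 8000) -> Dict[str, Any]:
    """Build the http_options dict for serve.start (hypothetical helper).

    `host` is the address the Serve HTTP server should listen on; supply
    the value required for your private deployment.
    """
    if not host:
        raise ValueError("host must be a non-empty address")
    return {"host": host, "port": port}


def start_private_serve(host: str, port: int = 8000) -> None:
    """Start Ray Serve with the given HTTP options (hypothetical helper)."""
    # Imported lazily so serve_http_options can be used without Ray installed.
    from ray import serve

    serve.start(http_options=serve_http_options(host, port))
```

Keeping the dictionary construction separate from the serve.start call makes the options easy to inspect or log before the cluster is started.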

Querying endpoints

Public Ray Serve endpoints require an authentication token. Before making a request to a Serve endpoint, obtain the URL and token from the Anyscale SDK:

from anyscale import AnyscaleSDK

sdk = AnyscaleSDK()
# cluster_id is the ID of the cluster that runs the Serve deployment.
cluster = sdk.get_cluster(cluster_id).result
user_service_url = cluster.user_service_url
user_service_token = cluster.user_service_token

The service can then be queried with:

import requests
resp = requests.get(user_service_url, headers={"Authorization": f"Bearer {user_service_token}"})

Private services don't require a service token, but can only be queried from within your Anyscale cloud.
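The difference between querying public and private services can be sketched as a small helper that adds the Authorization header only when a token is available. The function names build_serve_headers and query_serve are hypothetical and not part of the Anyscale SDK, and the 10-second timeout is an arbitrary choice.

```python
from typing import Dict, Optional


def build_serve_headers(user_service_token: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for a Serve endpoint (hypothetical helper).

    Public services need a Bearer token; private services need no token,
    so an empty dict is returned when no token is given.
    """
    if user_service_token:
        return {"Authorization": f"Bearer {user_service_token}"}
    return {}


def query_serve(user_service_url: str,
                user_service_token: Optional[str] = None,
                timeout: float = 10.0):
    """Send a GET request to a Serve endpoint (hypothetical helper)."""
    # Imported lazily so build_serve_headers works without requests installed.
    import requests

    resp = requests.get(
        user_service_url,
        headers=build_serve_headers(user_service_token),
        timeout=timeout,
    )
    resp.raise_for_status()  # raise on 4xx/5xx, e.g. a missing or invalid token
    return resp
```

For a public service, call query_serve(user_service_url, user_service_token); for a private service, call query_serve(user_service_url) from a machine inside your Anyscale cloud.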

The Serve authentication token is unique to a cluster. It can be shared with users who otherwise do not have access to Anyscale, without exposing other cluster services. The API docs URL contains the authentication token and can also be shared with other users. Making a request to this URL routes to <deployment_route>/docs and places the SERVE_TOKEN in the browser cookies to authenticate subsequent requests.

Editing privacy settings for existing clusters

You cannot update the value for user_service_access on a running cluster. You must stop and recreate a cluster in order to change its access settings.