Multi-zone services

The main motivation for multi-zone services is high availability. Running deployment replicas in multiple availability zones allows a service to tolerate the failure of a single zone.

Enable multi-zone services

Step 1

Select the availability zones that the cluster can use. The default is Any, meaning Anyscale can use any zone. To restrict the cluster to particular zones, see the Compute config API.

Step 2

When you're creating a new compute config, check Enable cross-zone scaling and copy the following into the Configurations passed to Cloud Provider for all nodes dialog to enable the feature:

{
  "TagSpecifications": [
    {
      "ResourceType": "instance",
      "Tags": [
        {
          "Key": "as-feature-enable-multi-az-serve",
          "Value": "true"
        }
      ]
    }
  ]
}

The console should look like the following:

[Figure: Multi-zone services config]
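
If you generate compute configs from a script rather than pasting JSON by hand, the tag fragment from Step 2 can be built and checked programmatically. The following is a minimal sketch: the function names are hypothetical helpers, and only the tag key, the tag value, and the JSON structure come from the step above.

```python
import json

# Tag key taken from Step 2; everything else here is an illustrative helper.
MULTI_AZ_TAG_KEY = "as-feature-enable-multi-az-serve"


def multi_az_instance_config() -> dict:
    """Build the instance configuration fragment that enables multi-zone services."""
    return {
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": MULTI_AZ_TAG_KEY, "Value": "true"}],
            }
        ]
    }


def has_multi_az_tag(config: dict) -> bool:
    """Return True if an instance tag specification carries the multi-zone tag."""
    for spec in config.get("TagSpecifications", []):
        if spec.get("ResourceType") != "instance":
            continue
        for tag in spec.get("Tags", []):
            if tag.get("Key") == MULTI_AZ_TAG_KEY and tag.get("Value") == "true":
                return True
    return False


if __name__ == "__main__":
    # Print JSON that can be pasted into the compute config dialog.
    print(json.dumps(multi_az_instance_config(), indent=2))
```

A check like `has_multi_az_tag` can catch a config that silently dropped the tag before you create the cluster.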

Once you enable multi-zone services, Anyscale launches nodes across availability zones as evenly as possible. Ray Serve then spreads replicas across these zones.

You can see where a deployment replica runs in the Ray dashboard, available with Ray 2.9.0 or later. Navigate to the Ray node where the replica runs, and view the anyscale.com/availability_zone node label.

[Figures: Multi-zone services replica node; Multi-zone services node label]

To benefit from multi-zone services, run at least two replicas per deployment so that the replicas can be placed in more than one zone.
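
To check this across many deployments at once, you could collect the zone label of each node and flag deployments whose replicas all sit in a single zone. The sketch below assumes you have already gathered two inputs by some means, such as reading them off the dashboard: a mapping of node IDs to labels and a list of replica placements. Both input shapes, the function names, and the example zone names are assumptions made for illustration, not an Anyscale or Ray API.

```python
from collections import defaultdict

# Label name from the dashboard description above.
ZONE_LABEL = "anyscale.com/availability_zone"


def zones_per_deployment(node_labels, replica_nodes):
    """Map each deployment to the set of zones its replicas run in.

    node_labels:   {node_id: {label_key: label_value}}  (assumed shape)
    replica_nodes: [(deployment_name, node_id), ...]    (assumed shape)
    """
    zones = defaultdict(set)
    for deployment, node_id in replica_nodes:
        deployment_zones = zones[deployment]  # register even if the zone is unknown
        zone = node_labels.get(node_id, {}).get(ZONE_LABEL)
        if zone is not None:
            deployment_zones.add(zone)
    return dict(zones)


def single_zone_deployments(node_labels, replica_nodes):
    """Return the sorted names of deployments covering fewer than two zones."""
    per_deployment = zones_per_deployment(node_labels, replica_nodes)
    return sorted(name for name, z in per_deployment.items() if len(z) < 2)


if __name__ == "__main__":
    # Example data with made-up node IDs and zone names.
    labels = {
        "node-a": {ZONE_LABEL: "us-west-2a"},
        "node-b": {ZONE_LABEL: "us-west-2b"},
    }
    placements = [("api", "node-a"), ("api", "node-b"), ("worker", "node-a")]
    print(single_zone_deployments(labels, placements))  # ['worker']
```

In the example, the worker deployment has a single replica, so it cannot survive the loss of its zone.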

Cost of multi-zone services

Multi-zone services improve availability, but they cost more because cloud providers charge for data transfer between zones. For example, AWS charges per GB in each direction. Most cross-zone traffic flows between Serve proxies and replicas. The Ray head node and worker nodes also exchange system messages, but the volume is low, in the tens of kilobits per second.

To reduce cost, Ray Serve implements zone-aware routing: Serve proxies prefer to forward requests to replicas in the same zone, which avoids cross-zone traffic. This optimization can eliminate a significant amount of cross-zone traffic, especially when the QPS is low.
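
The routing preference and the cost arithmetic can be illustrated with a short sketch. This isn't Ray Serve's implementation. It's a simplified model that picks a same-zone replica when one exists and otherwise falls back to any replica. All function names and the replica tuple shape are hypothetical, and the per-GB rate is a parameter you would fill in from your provider's current pricing.

```python
import random


def pick_replica(proxy_zone, replicas, rng=random):
    """Prefer a replica in the proxy's zone; otherwise fall back to any replica.

    replicas: list of (replica_id, zone) tuples (assumed shape).
    """
    if not replicas:
        raise ValueError("no replicas available")
    local = [r for r in replicas if r[1] == proxy_zone]
    return rng.choice(local or replicas)


def kbps_to_gb_per_month(kilobits_per_second, days=30):
    """Convert a steady bit rate into decimal gigabytes transferred over `days`."""
    bytes_per_second = kilobits_per_second * 1000 / 8
    return bytes_per_second * 86_400 * days / 1e9


def cross_zone_cost(gb_out, gb_in, usd_per_gb):
    """Cost when the provider bills each direction of cross-zone traffic per GB."""
    return (gb_out + gb_in) * usd_per_gb


if __name__ == "__main__":
    replicas = [("r1", "zone-a"), ("r2", "zone-b"), ("r3", "zone-b")]
    print(pick_replica("zone-a", replicas))  # ('r1', 'zone-a'), the only local replica
    # System traffic at a steady 50 kbit/s, within the range quoted above:
    print(kbps_to_gb_per_month(50))  # 16.2
```

At a steady 50 kbit/s, system messages amount to roughly 16 GB per month, which is why proxy-to-replica traffic tends to dominate the cross-zone bill.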