Multi-zone

Deprecated

Multi-zone is deprecated. Instead, use the Enable cross-zone scaling checkbox in the UI or the enable_cross_zone_scaling field in your compute configuration.

Compute capacity is often difficult to find, especially for specific instance types like those with GPUs. The Anyscale platform offers several features to configure your compute config for multiple zones.

Once configured, Anyscale uses internal metrics, including the instance type and the zones requested, among other signals, to increase the probability of provisioning the instances you want.

Multi-zone search

Usage

Step 1

In the Anyscale Console, the compute config screen has a dropdown where you can select multiple zones, plus an Any option.

Simply select two or more zones (or Any if you want Anyscale to look across the entire region).

Multizone Configuration Cluster Wide

Step 2

In addition to the zone configuration, you must add the following cluster-level Advanced configuration:

{
  "TagSpecifications": [
    {
      "ResourceType": "instance",
      "Tags": [
        {
          "Key": "as-feature-multi-zone",
          "Value": "true"
        }
      ]
    }
  ]
}
If you prefer to use the Anyscale CLI or SDK, use a compute config YAML like the following example:
cloud: anyscale_v2_default_cloud_vpn_us_east_2 # You may specify `cloud_id` instead
allowed_azs:
  - us-east-2a
  - us-east-2b
head_node_type:
  name: head_node_type
  instance_type: m5.2xlarge
worker_node_types:
  - name: cpu_worker
    instance_type: m5.4xlarge
    min_workers: 2
    max_workers: 10
    use_spot: true
  - name: gpu_worker
    instance_type: g4dn.4xlarge
    min_workers: 0
    max_workers: 10
aws_advanced_configurations_json:
  TagSpecifications:
    - ResourceType: instance
      Tags:
        - Key: as-feature-multi-zone
          Value: "true"

If you already created a compute configuration with the JSON block through the console, you can reference that compute config by name in your Anyscale cluster, job, or service YAML, for example compute_config: multi-az.

If you are using the Anyscale SDK, you can either pass cluster_compute_id with your compute configuration ID, or pass cluster_compute_config as a dictionary based on ClusterComputeConfig.
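As a minimal sketch of the dictionary form, the following Python mirrors the example YAML field for field. The key names are copied from that YAML and are an assumption for the SDK: check them against the ClusterComputeConfig model in your SDK version (for example, it may expect cloud_id rather than cloud). The has_multi_zone_tag helper is hypothetical and only checks that the feature tag is present before you submit the config.

```python
# Dictionary form of the example compute config. Key names are taken from the
# YAML example and are assumed, not verified against a specific SDK version.
multi_zone_compute_config = {
    "cloud": "anyscale_v2_default_cloud_vpn_us_east_2",
    "allowed_azs": ["us-east-2a", "us-east-2b"],
    "head_node_type": {
        "name": "head_node_type",
        "instance_type": "m5.2xlarge",
    },
    "worker_node_types": [
        {
            "name": "cpu_worker",
            "instance_type": "m5.4xlarge",
            "min_workers": 2,
            "max_workers": 10,
            "use_spot": True,
        },
        {
            "name": "gpu_worker",
            "instance_type": "g4dn.4xlarge",
            "min_workers": 0,
            "max_workers": 10,
        },
    ],
    "aws_advanced_configurations_json": {
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "as-feature-multi-zone", "Value": "true"}],
            }
        ]
    },
}


def has_multi_zone_tag(config: dict) -> bool:
    """Hypothetical helper: True if the instance-level multi-zone tag is set."""
    advanced = config.get("aws_advanced_configurations_json", {})
    for spec in advanced.get("TagSpecifications", []):
        if spec.get("ResourceType") != "instance":
            continue
        for tag in spec.get("Tags", []):
            if tag.get("Key") == "as-feature-multi-zone" and tag.get("Value") == "true":
                return True
    return False
```

A check such as has_multi_zone_tag(multi_zone_compute_config) can catch a missing or misspelled tag before the config reaches the platform.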

Monitoring

The autoscaler logs help you identify which availability zones your worker nodes were launched in:

Multizone Event Logs

FAQ

Question: How does it work?

The Anyscale platform examines all variables in the Compute Config and uses them to make the right sequence of provisioning requests to the underlying cloud service provider. These variables include, but are not limited to:

  • Desired instance type for your head/worker nodes.
  • On-demand, spot, or spot with fallback to on-demand.
  • Regional/zonal instance type availability.

For example, because GPUs are typically more difficult to acquire than pure CPU machines, the Anyscale platform tunes the above variables to seek GPU-based machines first. Then, depending on the success of provisioning and the zones configured, the Anyscale platform launches easier-to-acquire instance types in the same zone.

The exact algorithm is constantly evolving and may be different for every customer/use-case at any point in time.
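The ordering idea in the example above can be pictured with a toy model. This is not Anyscale's algorithm: NodeRequest and provisioning_order are hypothetical names, and the only rule modeled is "request GPU node types before CPU-only ones".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeRequest:
    """Hypothetical description of one node type to provision."""
    name: str
    instance_type: str
    has_gpu: bool


def provisioning_order(requests: List[NodeRequest]) -> List[NodeRequest]:
    """Return requests with GPU node types first.

    sorted() is stable, so node types keep their original relative order
    within the GPU group and within the CPU-only group.
    """
    return sorted(requests, key=lambda request: not request.has_gpu)


requests = [
    NodeRequest("cpu_worker", "m5.4xlarge", has_gpu=False),
    NodeRequest("gpu_worker", "g4dn.4xlarge", has_gpu=True),
]
ordered_names = [request.name for request in provisioning_order(requests)]
```

In this model the GPU worker is requested first, so the zone that can satisfy it is known before the easier-to-acquire CPU workers are placed.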

If you require or are interested in an in-depth answer, reach out to your Anyscale account team to coordinate a deeper conversation.

Question: For clusters powering services, does it take into account dimensions such as replicas and traffic?

It doesn't. Application-level resource utilization isn't considered as part of zone selection. The pattern of Any/Multizone configurations for Anyscale Services differs from the pattern for batch inference, training, or fine-tuning jobs.

Question: What happens if I specify Any but with Multi-zone turned “off”?

Any controls which zones Anyscale searches for capacity during initial cluster startup. Multi-zone controls how “sticky” your cluster is to the zone it initially starts in, specifically the zone where the head node runs.

With Multi-zone turned off, Anyscale only looks in the zone that your head node starts in, and all subsequent capacity is kept to the same zone. With Multi-zone on, the Anyscale platform has the ability to deploy worker nodes in the best zone with sufficient capacity.
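The interaction between the two settings can be sketched as a small model. This is an illustration of the behavior described in this answer, not platform code: worker_candidate_zones and its parameters are hypothetical, and passing None for selected_zones stands in for the Any option.

```python
from typing import List, Optional


def worker_candidate_zones(
    region_zones: List[str],
    selected_zones: Optional[List[str]],
    head_zone: str,
    multi_zone: bool,
) -> List[str]:
    """Return the zones where worker nodes may be placed in this model.

    selected_zones=None models the Any option: every zone in the region is
    searched at startup. With multi_zone off, workers stay in the head node's
    zone; with it on, any startup zone is a candidate.
    """
    startup_zones = list(region_zones) if selected_zones is None else list(selected_zones)
    if head_zone not in startup_zones:
        raise ValueError(f"head node zone {head_zone!r} is not among the searched zones")
    if not multi_zone:
        return [head_zone]
    return startup_zones


region = ["us-east-2a", "us-east-2b", "us-east-2c"]

# Any with Multi-zone off: startup searches the whole region, workers stay put.
sticky = worker_candidate_zones(region, None, "us-east-2b", multi_zone=False)

# Any with Multi-zone on: workers may land in any zone of the region.
flexible = worker_candidate_zones(region, None, "us-east-2b", multi_zone=True)
```

Here sticky contains only the head node's zone, while flexible contains all three zones in the region.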