On-Demand to Spot Fallback Compute Configs User Guide
Anyscale provides three types of worker instances: 1) On-demand, 2) Spot, and 3) Spot first, fall back to on-demand.
- On-demand: a virtual machine that is acquired when needed, stays allocated to the user, and is released only when the user chooses.
- Spot: a virtual machine that is acquired when needed and is significantly cheaper than its on-demand equivalent, but may be preempted by the cloud provider on short notice.
- Spot first, fall back to on-demand: a market type that Anyscale provides. It attempts to acquire a spot instance first and falls back to an on-demand instance when no spot instance is available from the cloud provider.
One pain point of Spot first, fall back to on-demand is spot instance availability: if no spot instances are available at acquisition time, on-demand instances are acquired instead. Spot instances may become available later, but the cluster cannot take advantage of them. To solve this pain point, we introduce On-Demand to Spot Fallback.
On-Demand to Spot Fallback
This compute configuration periodically attempts to launch equivalent spot instances for workloads running on-demand instances. When a spot instance becomes available, Anyscale replaces the on-demand instance with the newly launched spot instance and migrates the existing workload without interruption to service.
When the cloud provider reclaims the spot instance, Anyscale preempts the eviction by starting an equivalent on-demand instance to replace it.
In this way, you're able to reap the steep discounts of spot instances while retaining the reliability of on-demand.
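The cycle described above can be pictured with a small toy model. This is only an illustrative sketch, not Anyscale's implementation: the `Worker` class, `reconcile`, `handle_preemption`, and the `spot_available` callback are all hypothetical names, and real instance launches and workload migration are reduced to relabeling a worker's market type.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Worker:
    name: str
    market: str  # "spot" or "on-demand"


def reconcile(workers: List[Worker], spot_available: Callable[[], bool]) -> List[Worker]:
    """One periodic pass: swap each on-demand worker for spot if capacity exists."""
    result = []
    for w in workers:
        if w.market == "on-demand" and spot_available():
            result.append(Worker(w.name, "spot"))  # stand-in for launch + migrate
        else:
            result.append(w)
    return result


def handle_preemption(workers: List[Worker], preempted_name: str) -> List[Worker]:
    """Replace a reclaimed spot worker with an equivalent on-demand worker."""
    return [
        Worker(w.name, "on-demand")
        if w.name == preempted_name and w.market == "spot"
        else w
        for w in workers
    ]


fleet = [Worker("cpu-1", "on-demand"), Worker("cpu-2", "spot")]

# Spot capacity appears: the on-demand worker is replaced by a spot worker.
fleet = reconcile(fleet, spot_available=lambda: True)
print([w.market for w in fleet])  # ['spot', 'spot']

# The cloud provider reclaims cpu-2: it falls back to on-demand.
fleet = handle_preemption(fleet, "cpu-2")
print([w.market for w in fleet])  # ['spot', 'on-demand']
```

In this model a later `reconcile` pass would move `cpu-2` back to spot once capacity returns, which is the behavior the feature is meant to provide.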
How to use
The feature is enabled for Ray 2.7+ and nightly images built after Aug 22, 2023.
Via the Console
On-demand to spot fallback is configured in the Anyscale Console entirely through the Advanced Configurations input box. Specify the JSON that matches your cloud provider:
- AWS (EC2)

{
  "TagSpecifications": [
    {
      "Tags": [
        {
          "Key": "as-feature-enable-fallback-to-spot",
          "Value": "true"
        }
      ],
      "ResourceType": "instance"
    }
  ]
}

- GCP (GCE)

{
  "instance_properties": {
    "labels": {
      "as-feature-enable-fallback-to-spot": "true"
    }
  }
}
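If you generate the Advanced Configurations JSON from a script rather than typing it by hand, a small helper can produce either variant. This is a minimal sketch: the function name `fallback_advanced_config` and the `"aws"`/`"gcp"` provider strings are hypothetical conveniences, while the key and the JSON shapes are copied from the two snippets above.

```python
import json

FALLBACK_KEY = "as-feature-enable-fallback-to-spot"


def fallback_advanced_config(provider: str) -> dict:
    """Build the advanced configuration that enables the feature for one provider."""
    if provider == "aws":
        # EC2 expresses the flag as an instance tag.
        return {
            "TagSpecifications": [
                {
                    "Tags": [{"Key": FALLBACK_KEY, "Value": "true"}],
                    "ResourceType": "instance",
                }
            ]
        }
    if provider == "gcp":
        # GCE expresses the flag as an instance label.
        return {"instance_properties": {"labels": {FALLBACK_KEY: "true"}}}
    raise ValueError(f"Unsupported provider: {provider!r}")


# Print JSON that can be pasted into the Advanced Configurations box.
print(json.dumps(fallback_advanced_config("gcp"), indent=2))
```

Note that the value is the string `"true"` in both cases, not a JSON boolean.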
Via SDK
compute_config.yaml
cloud: anyscale_v2_default_cloud_vpn_us_east_2 # You may specify `cloud_id` instead
allowed_azs:
  - us-east-2a
  - us-east-2b
  - us-east-2c
head_node_type:
  name: head_node_type
  instance_type: m5.2xlarge
worker_node_types:
  - name: cpu_worker
    instance_type: c5.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
  - name: gpu_worker
    instance_type: g4dn.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
aws:
  TagSpecifications:
    - ResourceType: instance
      Tags:
        - Key: as-feature-enable-fallback-to-spot
          Value: "true" # Quoted so the tag value stays a string, as in the Console JSON
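Before submitting a config like this one, it can help to check it for the settings the example relies on. The sketch below works on the already-parsed config (a plain dict, for example the result of `yaml.safe_load`). The function name `find_fallback_problems` is hypothetical, and the rule that every worker node type needs both `use_spot` and `fallback_to_ondemand` is an assumption drawn from the example above, not a documented requirement. It covers the AWS tag only.

```python
from typing import List

FALLBACK_KEY = "as-feature-enable-fallback-to-spot"


def find_fallback_problems(config: dict) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []

    # Assumption: each worker node type should be spot-first with on-demand fallback.
    for node in config.get("worker_node_types", []):
        name = node.get("name", "<unnamed>")
        if not node.get("use_spot"):
            problems.append(f"{name}: use_spot is not true")
        if not node.get("fallback_to_ondemand"):
            problems.append(f"{name}: fallback_to_ondemand is not true")

    # Collect every tag attached to instances in the AWS advanced configuration.
    tag_specs = config.get("aws", {}).get("TagSpecifications", [])
    tags = [
        tag
        for spec in tag_specs
        if spec.get("ResourceType") == "instance"
        for tag in spec.get("Tags", [])
    ]
    # Accept both the string "true" and a YAML boolean true.
    enabled = any(
        tag.get("Key") == FALLBACK_KEY and str(tag.get("Value")).lower() == "true"
        for tag in tags
    )
    if not enabled:
        problems.append(f"aws.TagSpecifications: tag {FALLBACK_KEY} is missing")
    return problems


example = {
    "worker_node_types": [
        {"name": "cpu_worker", "use_spot": True, "fallback_to_ondemand": True},
        {"name": "gpu_worker", "use_spot": True, "fallback_to_ondemand": False},
    ],
    "aws": {"TagSpecifications": []},
}
for problem in find_fallback_problems(example):
    print(problem)
```

For the deliberately incomplete `example` dict, the check reports the `gpu_worker` fallback setting and the missing tag.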
Python SDK Code
import yaml
from anyscale import AnyscaleSDK
from anyscale.sdk.anyscale_client.models import CreateClusterCompute

sdk = AnyscaleSDK()

# Load the compute config defined in compute_config.yaml
with open("compute_config.yaml") as f:
    compute_configs = yaml.safe_load(f)

# If your config file contains `cloud`, use this to get the `cloud_id`
if "cloud" in compute_configs:
    compute_configs["cloud_id"] = sdk.search_clouds(
        {"name": {"equals": compute_configs["cloud"]}}
    ).results[0].id
    del compute_configs["cloud"]

# Register the compute config with Anyscale
config = sdk.create_cluster_compute(CreateClusterCompute(
    name="my-cluster-compute",
    config=compute_configs,
))
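The cloud lookup above indexes `.results[0]` directly, so a misspelled cloud name surfaces as an `IndexError`. One possible way to get a clearer message is to wrap the lookup in a helper. This is a sketch under assumptions: `resolve_cloud_id` is a hypothetical name, it relies only on the `search_clouds` call and `.results[...].id` shape used above, and `FakeSDK` with its `cld_123` ID is a stand-in so the example runs without Anyscale credentials.

```python
from types import SimpleNamespace


def resolve_cloud_id(sdk, compute_configs: dict) -> dict:
    """Return a copy of the config with `cloud` replaced by `cloud_id`."""
    resolved = dict(compute_configs)
    cloud_name = resolved.pop("cloud", None)
    if cloud_name is None:
        return resolved  # nothing to resolve, e.g. `cloud_id` was given directly

    results = sdk.search_clouds({"name": {"equals": cloud_name}}).results
    if not results:
        raise ValueError(f"No cloud named {cloud_name!r} was found")
    resolved["cloud_id"] = results[0].id
    return resolved


class FakeSDK:
    """Hypothetical stand-in that mimics only the response shape used above."""

    def search_clouds(self, query):
        known = {"my-cloud": "cld_123"}  # made-up name and ID for illustration
        name = query["name"]["equals"]
        matches = [SimpleNamespace(id=known[name])] if name in known else []
        return SimpleNamespace(results=matches)


print(resolve_cloud_id(FakeSDK(), {"cloud": "my-cloud", "allowed_azs": ["us-east-2a"]}))
```

With a real `AnyscaleSDK()` instance in place of `FakeSDK()`, the returned dict could be passed as `config` to `CreateClusterCompute` in the same way as `compute_configs` above.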