On-Demand to Spot Fallback
This version of the Anyscale docs is deprecated. Go to the latest version for up-to-date information.
Anyscale provides three types of worker instances: 1) On-demand, 2) Spot, and 3) Spot first, fall back to on-demand.
An on-demand instance is a virtual machine that is acquired when needed, stays allocated to the user, and is released only when the user chooses to release it.
A spot instance is a virtual machine that is acquired when needed and is significantly cheaper than its on-demand equivalent, but the cloud provider may preempt it with short notice.
Spot first, fall back to on-demand is a market type provided by Anyscale that first attempts to acquire a spot instance and falls back to an on-demand instance when no spot instance is available from the cloud provider.
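The three market types differ only in what happens when a worker is requested. The following sketch models that single decision. `acquire` and the `spot_available` probe are hypothetical stand-ins, not Anyscale or cloud-provider APIs. The behavior of pure spot when no capacity exists is modeled here as launching nothing, which is an assumption.

```python
from enum import Enum
from typing import Callable, Optional


class MarketType(Enum):
    ON_DEMAND = "on-demand"
    SPOT = "spot"
    SPOT_FIRST = "spot-first-fallback-to-on-demand"


def acquire(market: MarketType, spot_available: Callable[[], bool]) -> Optional[str]:
    """Return the kind of instance acquired for one worker launch request.

    `spot_available` is a hypothetical probe for the provider's spot capacity.
    """
    if market is MarketType.ON_DEMAND:
        return "on-demand"
    has_spot = spot_available()
    if market is MarketType.SPOT:
        # Assumption: with pure spot, nothing is launched when capacity is missing.
        return "spot" if has_spot else None
    # Spot first: prefer spot, fall back to on-demand when no spot capacity exists.
    return "spot" if has_spot else "on-demand"
```

Once a spot-first request has fallen back to on-demand, nothing in this decision revisits it later. That gap is the one On-Demand to Spot Fallback addresses.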
One pain point of Spot first, fall back to on-demand is spot availability: if no spot instances are available at acquisition time, on-demand instances are acquired instead. Spot instances may become available later, but the cluster cannot take advantage of them. On-Demand to Spot Fallback solves this problem.
On-Demand to Spot Fallback
With this compute configuration, Anyscale periodically attempts to launch equivalent spot instances for workloads running on on-demand instances. When a spot instance becomes available, Anyscale replaces the on-demand instance with the newly launched spot instance and migrates the existing workload without interrupting service.
When the cloud provider reclaims a spot instance, Anyscale gets ahead of the eviction by starting an equivalent on-demand instance to replace it.
In this way, you get the steep discounts of spot instances while retaining the reliability of on-demand instances.
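The lifecycle described above can be sketched as two transitions on a single node. This is a conceptual model, not Anyscale's implementation. `Node`, `on_spot_check`, and `on_spot_reclaim_notice` are hypothetical names, and the workload migration is reduced to a comment.

```python
from dataclasses import dataclass


@dataclass
class Node:
    market: str  # "spot" or "on-demand"


def on_spot_check(node: Node, spot_available: bool) -> Node:
    """Periodic check: swap an on-demand node for an equivalent spot node."""
    if node.market == "on-demand" and spot_available:
        # Launch the spot replacement, migrate the workload, then release on-demand.
        return Node(market="spot")
    return node


def on_spot_reclaim_notice(node: Node) -> Node:
    """Provider is reclaiming a spot node: start an equivalent on-demand node."""
    if node.market == "spot":
        return Node(market="on-demand")
    return node


node = Node(market="on-demand")
node = on_spot_check(node, spot_available=True)  # moves to spot
node = on_spot_reclaim_notice(node)              # back to on-demand
print(node.market)
```

A node can cycle between the two states indefinitely. It runs on spot whenever capacity allows and on on-demand otherwise.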
This feature is available on Ray 2.7+ and on nightly images built after Aug 22, 2023.
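The version requirement can be expressed as a simple predicate, which may be useful if you validate cluster images in CI. `fallback_supported` is a hypothetical helper. It assumes you know the Ray version of a release image or, for a nightly image, its build date.

```python
from datetime import date
from typing import Optional

NIGHTLY_CUTOFF = date(2023, 8, 22)  # nightlies must be built after this date


def fallback_supported(ray_version: str, nightly_build_date: Optional[date] = None) -> bool:
    """Sketch of the documented requirement: Ray 2.7+ or a nightly built after Aug 22, 2023."""
    if nightly_build_date is not None:
        return nightly_build_date > NIGHTLY_CUTOFF
    # Compare (major, minor) numerically so that 2.10 sorts after 2.7.
    major, minor = (int(part) for part in ray_version.split(".")[:2])
    return (major, minor) >= (2, 7)
```

Comparing integer tuples rather than strings avoids the classic bug where `"2.10" < "2.7"` lexically.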
Usage
In the Anyscale Console, on-demand to spot fallback is configured entirely through the Advanced Configurations input box. Specify the following JSON for your cloud provider:
AWS (EC2):
{
  "TagSpecifications": [
    {
      "Tags": [
        {
          "Key": "as-feature-enable-fallback-to-spot",
          "Value": "true"
        }
      ],
      "ResourceType": "instance"
    }
  ]
}
GCP (GCE):
{
  "instance_properties": {
    "labels": {
      "as-feature-enable-fallback-to-spot": "true"
    }
  }
}
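The two Advanced Configurations snippets above differ only in where the flag goes: an EC2 instance tag on AWS and a GCE instance label on GCP. In both cases the value is the string "true". If you template compute configs programmatically, a small helper like the one below can emit the exact JSON to paste into the input box. The name `advanced_config` is hypothetical and is not an Anyscale API.

```python
import json

FLAG = "as-feature-enable-fallback-to-spot"


def advanced_config(provider: str) -> dict:
    """Build the Advanced Configurations JSON for 'aws' or 'gcp'."""
    if provider == "aws":
        # EC2: the flag is an instance tag; tag values are strings.
        return {
            "TagSpecifications": [
                {"Tags": [{"Key": FLAG, "Value": "true"}], "ResourceType": "instance"}
            ]
        }
    if provider == "gcp":
        # GCE: the flag is an instance label; label values are strings.
        return {"instance_properties": {"labels": {FLAG: "true"}}}
    raise ValueError(f"unsupported provider: {provider!r}")


print(json.dumps(advanced_config("aws"), indent=2))
```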
If you prefer to use the Anyscale CLI or SDK, use a YAML compute config like the following examples:
AWS (EC2):
cloud: anyscale_v2_aws_useast2 # You may specify `cloud_id` instead
allowed_azs:
  - us-east-2a
  - us-east-2b
  - us-east-2c
head_node_type:
  name: head_node_type
  instance_type: m5.2xlarge
worker_node_types:
  - name: cpu_worker
    instance_type: c5.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
  - name: gpu_worker
    instance_type: g4dn.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
aws_advanced_configurations_json:
  TagSpecifications:
    - ResourceType: instance
      Tags:
        - Key: as-feature-enable-fallback-to-spot
          Value: "true"
GCP (GCE):
cloud: anyscale_v2_gcp_uswest1 # You may specify `cloud_id` instead
allowed_azs:
  - us-west1-a
  - us-west1-b
  - us-west1-c
head_node_type:
  name: head_node_type
  instance_type: n2-standard-32
worker_node_types:
  - name: gpu_worker_1
    instance_type: n1-standard-16-nvidia-t4-16gb-1
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
  - name: gpu_worker_2
    instance_type: n1-standard-16-nvidia-t4-16gb-4
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
gcp_advanced_configurations_json:
  instance_properties:
    labels:
      as-feature-enable-fallback-to-spot: "true"
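If you generate or lint compute configs in code, a check like the following can confirm that the flag is present before you submit a config. These are hypothetical helpers and are not part of the Anyscale CLI or SDK. They operate on the dict a YAML parser would return for the example YAML configs. They assume, as the examples suggest, that the feature targets worker node types with `use_spot: true` and `fallback_to_ondemand: true`. The examples quote "true"; an unquoted `true` would parse as a YAML boolean rather than a string, so the check below accepts only the string.

```python
FLAG = "as-feature-enable-fallback-to-spot"


def fallback_to_spot_enabled(config: dict) -> bool:
    """True if a parsed compute config sets the flag via AWS tags or GCP labels."""
    aws = config.get("aws_advanced_configurations_json") or {}
    for spec in aws.get("TagSpecifications", []):
        if spec.get("ResourceType") != "instance":
            continue
        for tag in spec.get("Tags", []):
            # Only the string "true" counts, matching the quoted value in the examples.
            if tag.get("Key") == FLAG and tag.get("Value") == "true":
                return True
    gcp = config.get("gcp_advanced_configurations_json") or {}
    labels = gcp.get("instance_properties", {}).get("labels", {})
    return labels.get(FLAG) == "true"


def spot_first_workers(config: dict) -> list:
    """Names of worker node types configured as spot first, fall back to on-demand."""
    return [
        worker["name"]
        for worker in config.get("worker_node_types", [])
        if worker.get("use_spot") and worker.get("fallback_to_ondemand")
    ]


# The AWS example, as a YAML parser would return it (abridged).
aws_example = {
    "cloud": "anyscale_v2_aws_useast2",
    "worker_node_types": [
        {"name": "cpu_worker", "instance_type": "c5.8xlarge",
         "use_spot": True, "fallback_to_ondemand": True},
        {"name": "gpu_worker", "instance_type": "g4dn.8xlarge",
         "use_spot": True, "fallback_to_ondemand": True},
    ],
    "aws_advanced_configurations_json": {
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": FLAG, "Value": "true"}]},
        ]
    },
}

print(fallback_to_spot_enabled(aws_example), spot_first_workers(aws_example))
```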
FAQ
Question: How often do you check for spot instance availability?
After an on-demand instance has been running for 60 minutes, the Anyscale smart instance manager checks for spot instances every 5 seconds.
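The timing in this answer can be illustrated with a small schedule calculation. This sketch assumes the first check happens exactly when the on-demand instance reaches 60 minutes of runtime and that checks then run on a fixed 5-second cadence. `check_times` is a hypothetical helper; the smart instance manager's internal scheduling is not documented here.

```python
from datetime import datetime, timedelta

WARMUP = timedelta(minutes=60)   # on-demand runtime before checks begin
INTERVAL = timedelta(seconds=5)  # spacing between spot-availability checks


def check_times(launched_at: datetime, n: int) -> list:
    """First n spot-availability check times for one on-demand instance."""
    first = launched_at + WARMUP
    return [first + i * INTERVAL for i in range(n)]


for t in check_times(datetime(2023, 9, 1, 12, 0, 0), 3):
    print(t.isoformat())
```

At a 5-second interval, that works out to 720 checks per hour once the 60-minute warm-up has passed.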
Question: Will you replace all of the on-demand instances with spot instances as soon as they are available?
No. The smart instance manager is rate-limited: every 10 minutes, it replaces at most 5 on-demand instances with spot instances.
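The rate limit can be modeled as a sliding-window limiter. The FAQ only states a maximum of 5 replacements per 10 minutes. Whether that window is fixed or sliding is an assumption here, and `ReplacementRateLimiter` is a hypothetical name.

```python
from collections import deque


class ReplacementRateLimiter:
    """Allow at most `limit` replacements in any sliding `window_s`-second window.

    The sliding window is an assumption; the docs only say 5 per 10 minutes.
    """

    def __init__(self, limit: int = 5, window_s: float = 600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._stamps: deque = deque()  # times of recent replacements, oldest first

    def try_replace(self, now_s: float) -> bool:
        # Forget replacements that have aged out of the window.
        while self._stamps and now_s - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now_s)
            return True
        return False


limiter = ReplacementRateLimiter()
print([limiter.try_replace(float(t)) for t in range(0, 35, 5)])
```

Under either a fixed or a sliding window, replacing 12 on-demand instances means the last replacement happens at least 20 minutes after the first: 5 in the first window, 5 in the second, and the remaining 2 in the third.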