On-Demand to Spot Fallback Compute Configs User Guide

Anyscale provides three types of worker instances: 1) On-demand, 2) Spot, and 3) Spot first, fall back to on-demand.

  1. An on-demand instance is a virtual machine that is acquired when needed and stays allocated to the user until the user releases it.
  2. A spot instance is a virtual machine that is acquired when needed and is significantly cheaper than its on-demand equivalent, but the cloud provider may preempt it on short notice.
  3. Spot first, fall back to on-demand is a market type that Anyscale provides. It attempts to acquire a spot instance first and falls back to an on-demand instance when no spot instance is available from the cloud provider.
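
The three market types above can be sketched as a small decision function. This is only an illustration of the behavior described in the list, not Anyscale's implementation: `MarketType`, `acquire`, and the `spot_available` flag are hypothetical names, and the sketch assumes on-demand capacity is always available.

```python
from enum import Enum
from typing import Optional


class MarketType(Enum):
    """Hypothetical labels for the three worker market types."""
    ON_DEMAND = "on-demand"
    SPOT = "spot"
    SPOT_FIRST = "spot first, fall back to on-demand"


def acquire(market: MarketType, spot_available: bool) -> Optional[str]:
    """Return the kind of instance acquired, or None if nothing can be acquired.

    Assumption: on-demand capacity is always available.
    """
    if market is MarketType.ON_DEMAND:
        return "on-demand"
    if spot_available:
        # Both SPOT and SPOT_FIRST take a spot instance when one exists.
        return "spot"
    if market is MarketType.SPOT_FIRST:
        # No spot capacity right now: fall back to on-demand.
        return "on-demand"
    # Pure spot with no spot capacity: nothing is acquired.
    return None


if __name__ == "__main__":
    for market in MarketType:
        for available in (True, False):
            print(market.value, "| spot available:", available, "->", acquire(market, available))
```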

One pain point of Spot first, fall back to on-demand is spot instance availability: if no spot instance is available at acquisition time, an on-demand instance is acquired instead. A spot instance may become available later, but the cluster cannot take advantage of it. To solve this pain point, we introduce On-Demand to Spot Fallback.

On-Demand to Spot Fallback

This compute configuration periodically attempts to launch equivalent spot instances for workloads running on on-demand instances. When a spot instance becomes available, Anyscale replaces the on-demand instance with the newly launched spot instance and migrates the existing workload without interrupting service.

When the cloud provider reclaims a spot instance, Anyscale gets ahead of the eviction by starting an equivalent on-demand instance to replace it.

This way, you get the steep discounts of spot instances while retaining the reliability of on-demand instances.
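
The two replacement directions described in this section can be pictured as one pass of a reconciliation loop. The sketch below is a toy simulation under stated assumptions, not Anyscale's scheduler: `Node`, `reconcile`, `spot_capacity`, and `preempted` are hypothetical names, and workload migration is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Node:
    """A worker node in the toy model; market is "spot" or "on-demand"."""
    name: str
    market: str


def reconcile(
    nodes: List[Node],
    spot_capacity: int,
    preempted: FrozenSet[str] = frozenset(),
) -> Tuple[List[Node], int]:
    """Run one pass of a hypothetical replacement loop.

    - Spot nodes named in `preempted` are replaced by on-demand nodes.
    - On-demand nodes are replaced by spot nodes while spot capacity remains.
    Returns the new node list and the spot capacity left over.
    """
    result = []
    for node in nodes:
        if node.market == "spot" and node.name in preempted:
            # The provider is reclaiming this node: swap in on-demand.
            result.append(Node(node.name, "on-demand"))
        elif node.market == "on-demand" and spot_capacity > 0:
            # Spot became available: swap the on-demand node for spot.
            result.append(Node(node.name, "spot"))
            spot_capacity -= 1
        else:
            result.append(node)
    return result, spot_capacity


if __name__ == "__main__":
    cluster = [Node("a", "on-demand"), Node("b", "on-demand"), Node("c", "spot")]
    cluster, left = reconcile(cluster, spot_capacity=1)
    print(cluster, left)
    cluster, left = reconcile(cluster, spot_capacity=0, preempted=frozenset({"c"}))
    print(cluster, left)
```

In this model a preempted node is not moved back to spot within the same pass; it becomes a candidate again on a later pass, which mirrors the periodic retry described above.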

How to use

The feature is enabled for Ray 2.7+ and nightly images built after Aug 22, 2023.

Via the Console

In the Anyscale Console, on-demand to spot fallback is configured entirely through the Advanced Configurations input box. Specify the following JSON:

{
  "TagSpecifications": [
    {
      "Tags": [
        {
          "Key": "as-feature-enable-fallback-to-spot",
          "Value": "true"
        }
      ],
      "ResourceType": "instance"
    }
  ]
}
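
Before pasting JSON into the input box, it can help to confirm that it parses and carries the tag. The following is a minimal sketch using only the Python standard library; `has_fallback_tag` is a hypothetical helper, and the check simply mirrors the structure of the snippet above (an `instance` tag specification whose tag value is the string `"true"`).

```python
import json

FALLBACK_TAG_KEY = "as-feature-enable-fallback-to-spot"


def has_fallback_tag(advanced_config_json: str) -> bool:
    """Return True if the JSON text contains the fallback tag on instances."""
    config = json.loads(advanced_config_json)
    for spec in config.get("TagSpecifications", []):
        if spec.get("ResourceType") != "instance":
            continue
        for tag in spec.get("Tags", []):
            if tag.get("Key") == FALLBACK_TAG_KEY and tag.get("Value") == "true":
                return True
    return False


if __name__ == "__main__":
    pasted = """
    {
      "TagSpecifications": [
        {
          "Tags": [
            {"Key": "as-feature-enable-fallback-to-spot", "Value": "true"}
          ],
          "ResourceType": "instance"
        }
      ]
    }
    """
    print(has_fallback_tag(pasted))
```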

Via SDK

compute_config.yaml

cloud: anyscale_v2_default_cloud_vpn_us_east_2 # You may specify `cloud_id` instead
allowed_azs:
  - us-east-2a
  - us-east-2b
  - us-east-2c
head_node_type:
  name: head_node_type
  instance_type: m5.2xlarge
worker_node_types:
  - name: cpu_worker
    instance_type: c5.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
  - name: gpu_worker
    instance_type: g4dn.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
aws:
  TagSpecifications:
    - ResourceType: instance
      Tags:
        - Key: as-feature-enable-fallback-to-spot
          Value: true

Python SDK Code

import yaml

from anyscale.sdk.anyscale_client.models import CreateClusterCompute
from anyscale import AnyscaleSDK

sdk = AnyscaleSDK()

with open('compute_config.yaml') as f:
    compute_configs = yaml.safe_load(f)

# If your config file contains `cloud`, use this to get the `cloud_id`
if "cloud" in compute_configs:
    compute_configs["cloud_id"] = sdk.search_clouds(
        {"name": {"equals": compute_configs["cloud"]}}
    ).results[0].id
    del compute_configs["cloud"]

config = sdk.create_cluster_compute(CreateClusterCompute(
    name="my-cluster-compute",
    config=compute_configs
))
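
If an existing compute config does not yet carry the tag, it could be added to the loaded dictionary before the `create_cluster_compute` call. The sketch below assumes the dictionary has the same shape as compute_config.yaml above; `enable_fallback_to_spot` is a hypothetical helper, not part of the Anyscale SDK, and it writes the string `"true"` to match the Console JSON.

```python
import copy

FALLBACK_TAG_KEY = "as-feature-enable-fallback-to-spot"


def enable_fallback_to_spot(compute_config: dict) -> dict:
    """Return a copy of the config with the fallback tag set on instances.

    The input is left unmodified, and calling the function twice
    does not add a duplicate tag.
    """
    updated = copy.deepcopy(compute_config)
    specs = updated.setdefault("aws", {}).setdefault("TagSpecifications", [])

    # Reuse the existing instance-level tag specification if there is one.
    for spec in specs:
        if spec.get("ResourceType") == "instance":
            instance_spec = spec
            break
    else:
        instance_spec = {"ResourceType": "instance", "Tags": []}
        specs.append(instance_spec)

    # Update the tag in place if present, otherwise append it.
    tags = instance_spec.setdefault("Tags", [])
    for tag in tags:
        if tag.get("Key") == FALLBACK_TAG_KEY:
            tag["Value"] = "true"
            break
    else:
        tags.append({"Key": FALLBACK_TAG_KEY, "Value": "true"})
    return updated


if __name__ == "__main__":
    base = {"cloud_id": "example-cloud-id", "worker_node_types": []}  # placeholder values
    print(enable_fallback_to_spot(base)["aws"])
```

Returning a copy keeps the original dictionary reusable, for example to create one compute config with the feature and one without.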