On-Demand to Spot Fallback Compute Configs User Guide

Anyscale provides three types of worker instances: 1) On-demand, 2) Spot, and 3) Spot first, fall back to on-demand.

An on-demand instance is a virtual machine that is acquired when needed, stays with the user, and is released only when the user chooses.
A spot instance is a virtual machine that is acquired when needed and is significantly cheaper than its on-demand equivalent, but the cloud provider may preempt it on short notice.
Spot first, fall back to on-demand is an Anyscale market type that first attempts to acquire a spot instance and falls back to an on-demand instance when the cloud provider has no spot capacity available.
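The three market types differ only in what happens at launch time. The following is a simplified model of that decision, assuming a single boolean `spot_available` stands in for the cloud provider's spot capacity. `MarketType` and `acquire_instance` are hypothetical names for illustration, not part of the Anyscale API.

```python
from enum import Enum
from typing import Optional


class MarketType(Enum):
    """The three worker market types (hypothetical names for illustration)."""
    ON_DEMAND = "on-demand"
    SPOT = "spot"
    SPOT_FIRST = "spot-first-fallback-to-on-demand"


def acquire_instance(market: MarketType, spot_available: bool) -> Optional[str]:
    """Return the kind of instance acquired at launch, or None if none was launched.

    Simplified model: `spot_available` stands in for the provider's spot capacity.
    """
    if market is MarketType.ON_DEMAND:
        return "on-demand"
    if market is MarketType.SPOT:
        # Pure spot has no fallback: without capacity, nothing is launched.
        return "spot" if spot_available else None
    # Spot first: prefer spot, otherwise fall back to on-demand.
    return "spot" if spot_available else "on-demand"


print(acquire_instance(MarketType.SPOT_FIRST, spot_available=False))
```

Note that once a spot-first node has fallen back to on-demand, nothing in this model ever revisits that choice, which is exactly the gap described next.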

One pain point of spot first, fall back to on-demand is spot instance availability: spot instances may not be available at acquisition time, so on-demand instances are acquired instead. Spot capacity may become available later, but the cluster cannot take advantage of it. To solve this pain point, we introduce On-Demand to Spot Fallback.

On-Demand to Spot Fallback

This compute configuration periodically attempts to launch equivalent spot instances for workloads running on on-demand instances. When a spot instance becomes available, Anyscale replaces the on-demand instance with the newly launched spot instance and migrates the existing workload without interrupting service.

When the cloud provider signals that it will reclaim a spot instance, Anyscale preempts the eviction by starting an equivalent on-demand instance to replace it.

In this way, you're able to reap the steep discounts of spot instances while retaining the reliability of on-demand.
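The two behaviors above (periodically swapping on-demand nodes for spot, and swapping a spot node back to on-demand when it is about to be reclaimed) can be sketched as a toy state model. This is a conceptual illustration only, assuming spot capacity arrives as a simple count per periodic pass; `WorkerGroup`, `rebalance`, and `on_spot_preemption_notice` are hypothetical names, and the sketch does not show how Anyscale actually migrates workloads.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_id: int
    market: str  # "spot" or "on-demand"


@dataclass
class WorkerGroup:
    """Hypothetical model of one worker node type with the fallback flag enabled."""
    nodes: List[Node] = field(default_factory=list)
    next_id: int = 0

    def _new_node(self, market: str) -> Node:
        node = Node(self.next_id, market)
        self.next_id += 1
        return node

    def add(self, market: str) -> Node:
        node = self._new_node(market)
        self.nodes.append(node)
        return node

    def rebalance(self, spot_capacity: int) -> int:
        """One periodic pass: replace on-demand nodes with spot while capacity lasts.

        Returns the number of nodes replaced.
        """
        replaced = 0
        for i, node in enumerate(self.nodes):
            if replaced >= spot_capacity:
                break
            if node.market == "on-demand":
                # The spot replacement is started first; in the real system the
                # workload is migrated before the on-demand node is released.
                self.nodes[i] = self._new_node("spot")
                replaced += 1
        return replaced

    def on_spot_preemption_notice(self, node_id: int) -> None:
        """Replace a spot node the provider is about to reclaim with on-demand."""
        for i, node in enumerate(self.nodes):
            if node.node_id == node_id and node.market == "spot":
                self.nodes[i] = self._new_node("on-demand")
                return


group = WorkerGroup()
for _ in range(3):
    group.add("on-demand")        # no spot capacity at launch time
group.rebalance(spot_capacity=2)  # later, two spot instances become available
print([n.market for n in group.nodes])  # ['spot', 'spot', 'on-demand']
group.on_spot_preemption_notice(group.nodes[0].node_id)
print([n.market for n in group.nodes])  # ['on-demand', 'spot', 'on-demand']
```

The model keeps the node count constant across every swap, which mirrors the goal of changing only the market type, never the cluster's capacity.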

How to use

This feature is available on Ray 2.7+ and on nightly images built after Aug 22, 2023.
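If you build images programmatically, the requirement above can be expressed as a small check. This is a sketch under stated assumptions: `supports_spot_fallback` is a hypothetical helper, release versions are assumed to start with `major.minor`, and for nightly images the build date alone decides (strictly after Aug 22, 2023).

```python
import re
from datetime import date
from typing import Optional

MIN_RAY = (2, 7)                     # "Ray 2.7+"
NIGHTLY_CUTOFF = date(2023, 8, 22)   # nightly images built *after* this date


def supports_spot_fallback(ray_version: str,
                           nightly_build_date: Optional[date] = None) -> bool:
    """Hypothetical helper: check an image against the stated requirements."""
    if nightly_build_date is not None:
        # Nightly images qualify by build date, not by version string.
        return nightly_build_date > NIGHTLY_CUTOFF
    match = re.match(r"(\d+)\.(\d+)", ray_version)
    if match is None:
        raise ValueError(f"unrecognized Ray version: {ray_version!r}")
    # Compare as integer tuples so that "2.10" correctly sorts after "2.7".
    return (int(match.group(1)), int(match.group(2))) >= MIN_RAY


print(supports_spot_fallback("2.7.0"))
print(supports_spot_fallback("3.0.0.dev0", nightly_build_date=date(2023, 9, 1)))
```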

Via the Console

In the Anyscale Console, on-demand to spot fallback is configured entirely through the Advanced Configurations input box. Specify the following JSON for your cloud provider:

AWS (EC2):

{
  "TagSpecifications": [
    {
      "Tags": [
        {
          "Key": "as-feature-enable-fallback-to-spot",
          "Value": "true"
        }
      ],
      "ResourceType": "instance"
    }
  ]
}

GCP (GCE):

{
  "instance_properties": {
    "labels": {
      "as-feature-enable-fallback-to-spot": "true"
    }
  }
}
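If the Advanced Configurations box already holds other settings, the flag has to be merged in rather than pasted over them. The sketch below shows one way to do that, assuming the existing JSON has the same shape as the snippets above; `with_spot_fallback` is a hypothetical helper. For AWS it reuses an existing `"instance"` tag specification instead of adding a second one, and it always writes the value as the string `"true"`.

```python
import copy
import json
from typing import Any, Dict

TAG_KEY = "as-feature-enable-fallback-to-spot"


def with_spot_fallback(advanced_config: Dict[str, Any], provider: str) -> Dict[str, Any]:
    """Return a copy of an Advanced Configurations dict with the fallback flag set.

    Hypothetical helper: `provider` is "aws" or "gcp"; other settings are kept.
    """
    config = copy.deepcopy(advanced_config)  # never mutate the caller's dict
    if provider == "gcp":
        labels = config.setdefault("instance_properties", {}).setdefault("labels", {})
        labels[TAG_KEY] = "true"
        return config
    if provider != "aws":
        raise ValueError(f"unsupported provider: {provider!r}")
    specs = config.setdefault("TagSpecifications", [])
    # Reuse an existing "instance" tag specification if there is one.
    spec = next((s for s in specs if s.get("ResourceType") == "instance"), None)
    if spec is None:
        spec = {"ResourceType": "instance", "Tags": []}
        specs.append(spec)
    tags = spec.setdefault("Tags", [])
    # Drop any earlier value for the key, then add the string "true".
    tags[:] = [t for t in tags if t.get("Key") != TAG_KEY]
    tags.append({"Key": TAG_KEY, "Value": "true"})
    return config


existing = {"TagSpecifications": [
    {"ResourceType": "instance", "Tags": [{"Key": "team", "Value": "ml"}]}
]}
# Paste the printed JSON into the Advanced Configurations input box.
print(json.dumps(with_spot_fallback(existing, "aws"), indent=2))
```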

Via SDK

compute_config.yaml

cloud: anyscale_v2_default_cloud_vpn_us_east_2 # You may specify `cloud_id` instead
allowed_azs:
  - us-east-2a
  - us-east-2b
  - us-east-2c
head_node_type:
  name: head_node_type
  instance_type: m5.2xlarge
worker_node_types:
  - name: cpu_worker
    instance_type: c5.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
  - name: gpu_worker
    instance_type: g4dn.8xlarge
    min_workers: 0
    max_workers: 10
    use_spot: true
    fallback_to_ondemand: true
aws:
  TagSpecifications:
    - ResourceType: instance
      Tags:
        - Key: as-feature-enable-fallback-to-spot
          Value: "true"
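One YAML detail is easy to miss: an unquoted `true` is loaded by `yaml.safe_load` as the boolean `True`, while EC2 tag values are strings, so quoting the value keeps the YAML equivalent to the JSON used in the Console. The check below is a sketch that operates on the dict `yaml.safe_load` would return; `fallback_tag_problems` is a hypothetical helper, and it inspects only the fallback tag, not the rest of the compute config.

```python
from typing import Any, Dict, List

TAG_KEY = "as-feature-enable-fallback-to-spot"


def fallback_tag_problems(compute_config: Dict[str, Any]) -> List[str]:
    """Check the `aws` section of a compute config dict (e.g. from yaml.safe_load).

    Hypothetical helper; it checks only the fallback tag, not other settings.
    """
    values = [
        tag.get("Value")
        for spec in compute_config.get("aws", {}).get("TagSpecifications", [])
        if spec.get("ResourceType") == "instance"
        for tag in spec.get("Tags", [])
        if tag.get("Key") == TAG_KEY
    ]
    if not values:
        return [f"no instance tag named {TAG_KEY!r} under 'aws'"]
    # EC2 tag values are strings; an unquoted YAML `true` loads as the bool True.
    return [f"tag value should be the string 'true', got {v!r}"
            for v in values if v != "true"]


# What yaml.safe_load returns for `Value: true` versus `Value: "true"`.
unquoted = {"aws": {"TagSpecifications": [
    {"ResourceType": "instance", "Tags": [{"Key": TAG_KEY, "Value": True}]}]}}
quoted = {"aws": {"TagSpecifications": [
    {"ResourceType": "instance", "Tags": [{"Key": TAG_KEY, "Value": "true"}]}]}}
print(fallback_tag_problems(unquoted))
print(fallback_tag_problems(quoted))
```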

Python SDK Code

import yaml

from anyscale import AnyscaleSDK
from anyscale.sdk.anyscale_client.models import CreateClusterCompute

sdk = AnyscaleSDK()

# Load the compute config defined in compute_config.yaml
with open("compute_config.yaml") as f:
    compute_configs = yaml.safe_load(f)

# If your config file contains `cloud`, use it to look up the `cloud_id`
if "cloud" in compute_configs:
    clouds = sdk.search_clouds(
        {"name": {"equals": compute_configs["cloud"]}}
    ).results
    if not clouds:
        raise ValueError(f"No cloud named {compute_configs['cloud']!r} found")
    compute_configs["cloud_id"] = clouds[0].id
    del compute_configs["cloud"]

# Register the compute config with Anyscale
config = sdk.create_cluster_compute(
    CreateClusterCompute(
        name="my-cluster-compute",
        config=compute_configs,
    )
)
