Tutorial: Create and use an Anyscale-managed machine pool
This tutorial provides an overview of creating an Anyscale-managed machine pool and using it to run jobs and launch workspaces.
You can also configure customer-managed machine pools.
Requirements
Configuring and deploying machine pools is a highly privileged task. Completing the steps in this tutorial requires the following:
- An Anyscale organization with one or more Anyscale clouds.
- You must have the following permissions:
  - Organization owner for your Anyscale organization.
  - Cloud collaborator or cloud owner for your Anyscale cloud.
- You must authenticate the Anyscale CLI to your Anyscale organization using these credentials. See CLI configuration.
Step 1: Create a machine pool
Run the following command to create an empty machine pool named capacity-reservations:
anyscale machine-pool create --name capacity-reservations
Step 2: Attach the machine pool to clouds
Machine pools are scoped to your Anyscale organization. You must attach an Anyscale cloud to your machine pool to use the machine pool in your workloads.
Use the following syntax to attach an Anyscale cloud to a machine pool:
anyscale machine-pool attach --name <pool-name> --cloud <cloud-name>
For example, the following commands attach a machine pool named capacity-reservations to clouds named dev-cloud and prod-cloud:
anyscale machine-pool attach --name capacity-reservations --cloud dev-cloud
anyscale machine-pool attach --name capacity-reservations --cloud prod-cloud
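When a pool must be attached to several clouds, the repeated commands above can be generated in a loop. The following is a minimal sketch, assuming the anyscale CLI is installed and authenticated. The helper names build_attach_commands and attach_all are hypothetical; they only wrap the attach syntax shown above and default to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess


def build_attach_commands(pool_name, cloud_names):
    """Return one `anyscale machine-pool attach` argv list per cloud."""
    return [
        ["anyscale", "machine-pool", "attach", "--name", pool_name, "--cloud", cloud]
        for cloud in cloud_names
    ]


def attach_all(pool_name, cloud_names, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for argv in build_attach_commands(pool_name, cloud_names):
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    attach_all("capacity-reservations", ["dev-cloud", "prod-cloud"])
```

Passing dry_run=False runs each command in turn and stops at the first failure.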
Step 3: Configure the machine pool
The machine pool configuration file specifies instance types, capacity reservation information, priorities, and quotas.
Use your preferred file editor to create a file named machine-pool-capacity-reservations.yaml in your current directory and copy the following definition into the file:
# machine-pool-capacity-reservations.yaml
kind: ANYSCALE_MANAGED
machine_types:
  - machine_type: RES-8CPU-32GB
    # Recycle policy configures instance rotation for cloud deployments on VMs.
    # For cloud deployments on Kubernetes, the pods rotate after each workload.
    recycle_policy:
      # Rotate instances after 10 workloads.
      max_workloads: 10
      # Rotate instances that have been running for
      # more than 60 minutes (note: rotation only
      # happens when workloads are completed).
      rotation_interval: 60m
      # Shut down idle instances after 10 minutes.
      max_idle_duration: 10m
    launch_templates:
      - instance_type: "m5.2xlarge"
        market_type: ON_DEMAND
        advanced_instance_config:
          CapacityReservationSpecification:
            CapacityReservationTarget:
              CapacityReservationId: <your-capacity-reservation-id>
        # 'zones' is optional. If not specified, the scheduler
        # will attempt to provision instances in any zone in the
        # cloud that is requesting this machine pool instance.
        zones:
          - us-west-2a
          - us-west-2b
    partitions:
      - name: job
        # The number of instances that this partition should contain.
        size: 6
        # Each rule can define a selector, priority, and optional quota.
        # Rules are evaluated in order, and the first rule that matches
        # will be used.
        #
        # Selectors can be used to select workloads based on labels
        # (these are Kubernetes-style selectors; see below for details).
        #
        # Priority is an integer that defines the priority of the rule.
        # The default priority is 0.
        #
        # Quotas are optional, and define a limit on the number of machines
        # that can be allocated to workloads that fall under this rule.
        rules:
          - selector: workload-type in (job)
            priority: 20
          - selector: workload-type in (workspace)
            priority: 10
            quota: 3
      - name: workspace
        size: 4
        rules:
          - selector: workload-type in (workspace)
            priority: 20
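The comments in the rules section describe a first-match evaluation order. The sketch below models that description in plain Python to show how the job partition above would classify a job versus a workspace. It is an illustration only, not Anyscale's scheduler: the helpers selector_matches and match_rule are hypothetical, and the parser handles only the `key in (a, b)` selector form used in this file.

```python
import re

# Rules of the "job" partition from machine-pool-capacity-reservations.yaml.
JOB_PARTITION_RULES = [
    {"selector": "workload-type in (job)", "priority": 20},
    {"selector": "workload-type in (workspace)", "priority": 10, "quota": 3},
]

# Matches selectors of the form: <key> in (<value>, <value>, ...)
_SELECTOR = re.compile(r"^\s*([\w.-]+)\s+in\s+\(([^)]*)\)\s*$")


def selector_matches(selector, labels):
    """Evaluate a `key in (v1, v2)` selector against a dict of labels."""
    parsed = _SELECTOR.match(selector)
    if parsed is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    key, raw_values = parsed.groups()
    values = {v.strip() for v in raw_values.split(",") if v.strip()}
    return labels.get(key) in values


def match_rule(rules, labels):
    """Return priority and quota of the first matching rule, or None."""
    for rule in rules:
        if selector_matches(rule["selector"], labels):
            # A rule without an explicit priority uses the default of 0.
            return {"priority": rule.get("priority", 0), "quota": rule.get("quota")}
    return None


if __name__ == "__main__":
    print(match_rule(JOB_PARTITION_RULES, {"workload-type": "job"}))
    print(match_rule(JOB_PARTITION_RULES, {"workload-type": "workspace"}))
```

In this model, a job matches the first rule (priority 20, no quota), while a workspace falls through to the second rule (priority 10, quota 3).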
Step 4: Apply the configuration
Run the following command to update the machine pool named capacity-reservations with the configuration specified in the machine-pool-capacity-reservations.yaml file:
anyscale machine-pool update --name capacity-reservations --spec-file machine-pool-capacity-reservations.yaml
Running this command initializes resources in your AWS account. To avoid unexpected AWS charges, make sure you scale down the pool when you finish the tutorial, as described in Step 8.
Step 5: Verify the machine pool
Run the following command to confirm that the machine pool has initialized with the specified capacity:
anyscale machine-pool describe --name capacity-reservations
If you see an error indicating scheduler cache is not ready, the scheduler is still initializing. Wait for the duration specified in the error message, run the command again, and proceed only after it succeeds.
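The wait-and-retry advice above can be scripted. The following sketch reruns anyscale machine-pool describe until the output no longer contains the text "scheduler cache is not ready". The names run_describe and describe_when_ready, the fixed delay_s pause, and the check against combined stdout and stderr are all assumptions. The real error message states its own wait duration, which this sketch doesn't attempt to parse.

```python
import subprocess
import time

# Text to look for, taken from the error described in this step.
NOT_READY = "scheduler cache is not ready"


def run_describe(pool_name):
    """Run the describe command and return its combined stdout and stderr."""
    result = subprocess.run(
        ["anyscale", "machine-pool", "describe", "--name", pool_name],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


def describe_when_ready(pool_name, runner=run_describe, delay_s=30,
                        max_attempts=10, sleep=time.sleep):
    """Retry describe until the not-ready text disappears or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        output = runner(pool_name)
        if NOT_READY not in output:
            return output
        if attempt < max_attempts:
            # Assumed fixed pause; prefer the duration the error message gives.
            sleep(delay_s)
    raise TimeoutError(
        f"{pool_name}: scheduler still initializing after {max_attempts} attempts"
    )


if __name__ == "__main__":
    print(describe_when_ready("capacity-reservations"))
```

The runner and sleep parameters exist so the loop can be exercised without calling the CLI or actually waiting.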
Step 6: Create a compute configuration and submit a job
In this step, you define a compute configuration and a simple script for a Ray job, then submit the job to your machine pool.
Create a file named ten-node-config.yaml and copy the following definition into the file:
# ten-node-config.yaml
cloud: CLOUD_NAME
head_node:
  instance_type: "m5.2xlarge"
worker_nodes:
  - name: reserved-cpu
    instance_type: RES-8CPU-32GB
    min_nodes: 10
    max_nodes: 10
    flags:
      cloud_deployment:
        machine_pool: capacity-reservations
flags:
  # If the cluster doesn't satisfy min_nodes for > 5 minutes while
  # in a RUNNING state, then terminate the workload (and re-queue
  # it if it's a job).
  workload_recovering_timeout: 5m
  # If the cluster doesn't satisfy min_nodes for > 24h while in
  # a STARTING state, then terminate the workload.
  workload_starting_timeout: 24h
Run the following command to create a compute configuration using this file in your Anyscale cloud:
anyscale compute-config create --name ten-node-config -f ten-node-config.yaml
Create a file named loop.py and copy the following Ray code into the file:
# loop.py
import ray
import time

ray.init()

@ray.remote
def foo(i):
    return i

for i in range(6000):
    time.sleep(1)
    print(f'Iteration {ray.get(foo.remote(i))}')
Run the following command to submit the job:
anyscale job submit --name ten-node-job --compute-config ten-node-config --working-dir=. -- python loop.py
Step 7: Create a workspace
In this step, you configure and create a workspace that requires four machines.
Create a file named four-node-config.yaml and copy the following example workspace configuration into the file:
# four-node-config.yaml
cloud: CLOUD_NAME
head_node:
  instance_type: "m5.2xlarge"
worker_nodes:
  - name: reserved-cpu
    instance_type: RES-8CPU-32GB
    min_nodes: 4
    max_nodes: 4
    flags:
      cloud_deployment:
        machine_pool: capacity-reservations
Create the compute configuration:
anyscale compute-config create --name four-node-config -f four-node-config.yaml
Creating a workspace with this configuration evicts four of the ten machines assigned to the job. To confirm the current assigned resources, run the following command:
anyscale machine-pool describe --name capacity-reservations
Run the following command to create the workspace:
anyscale workspace_v2 create --name four-node-workspace --compute-config four-node-config
Run the following command to describe the pool again and observe the scheduler behavior:
anyscale machine-pool describe --name capacity-reservations
The output shows the allocation of machines from the pool to current workloads.
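The machine counts in this step can be checked with simple arithmetic. The sketch below is only bookkeeping over the numbers in the three configuration files and doesn't query Anyscale or model the scheduler; the function name machines_after_eviction is hypothetical. Reading the files together suggests that the job, left with fewer machines than its min_nodes, would then be subject to the workload_recovering_timeout flag in ten-node-config.yaml, but treat that as an inference rather than documented behavior.

```python
# Partition sizes from machine-pool-capacity-reservations.yaml.
PARTITION_SIZES = {"job": 6, "workspace": 4}


def machines_after_eviction(held, evicted):
    """Return how many machines a workload keeps after some are evicted."""
    if evicted > held:
        raise ValueError("cannot evict more machines than the workload holds")
    return held - evicted


pool_capacity = sum(PARTITION_SIZES.values())  # total machines in the pool
job_min_nodes = 10                              # from ten-node-config.yaml
workspace_min_nodes = 4                         # from four-node-config.yaml

# The job initially holds every machine; the workspace then takes four.
job_remaining = machines_after_eviction(pool_capacity, workspace_min_nodes)
job_meets_min_nodes = job_remaining >= job_min_nodes

print(f"pool capacity: {pool_capacity}")
print(f"job machines after workspace starts: {job_remaining}")
print(f"job still satisfies min_nodes: {job_meets_min_nodes}")
```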
Step 8: Scale down and delete the machine pool
To clean up the demo, scale down the machine pool, detach it from your Anyscale cloud, and then delete it.
To scale down, create a file named machine-pool-cleanup.yaml and copy the following definition into the file:
kind: ANYSCALE_MANAGED
machine_types: [] # Set the machines to an empty list
Run the following command to update the machine pool in your Anyscale organization:
anyscale machine-pool update --name capacity-reservations --spec-file machine-pool-cleanup.yaml
Detach the machine pool from your Anyscale cloud using the anyscale machine-pool detach command. The following example detaches the pool from two clouds named dev-cloud and prod-cloud:
anyscale machine-pool detach --name capacity-reservations --cloud dev-cloud
anyscale machine-pool detach --name capacity-reservations --cloud prod-cloud
Run the following command to delete the machine pool:
anyscale machine-pool delete --name capacity-reservations
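Cleanup order matters here: scale down, detach from every cloud, then delete. The following sketch assembles those commands in that order so that none is skipped. The helper name build_cleanup_commands is hypothetical, and the script only prints the commands from this step rather than running them.

```python
import shlex


def build_cleanup_commands(pool_name, cloud_names,
                           cleanup_spec="machine-pool-cleanup.yaml"):
    """Return the cleanup commands in the order used in this step."""
    base = ["anyscale", "machine-pool"]
    # 1. Scale down by applying the empty machine_types spec.
    commands = [base + ["update", "--name", pool_name, "--spec-file", cleanup_spec]]
    # 2. Detach the pool from each cloud it was attached to.
    commands += [
        base + ["detach", "--name", pool_name, "--cloud", cloud]
        for cloud in cloud_names
    ]
    # 3. Delete the pool last.
    commands.append(base + ["delete", "--name", pool_name])
    return commands


if __name__ == "__main__":
    for argv in build_cleanup_commands("capacity-reservations",
                                       ["dev-cloud", "prod-cloud"]):
        print(shlex.join(argv))
```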