System cluster
This page provides details on the system cluster that powers Anyscale observability dashboards, including the task dashboard and actor dashboard.
What is the system cluster?
When you access a task or actor dashboard, Anyscale deploys a single-node system cluster in your Anyscale cloud. A single system cluster monitors all workloads in all projects for an Anyscale cloud, but each cloud deployment requires a separate system cluster.
Anyscale uses compute and storage in your cloud provider account for the system cluster. The system cluster serves dashboard data to the Anyscale console without passing task or actor details through the Anyscale control plane.
Requirements and limitations
The following requirements and limitations exist:
- The system cluster is available on Anyscale clouds deployed to AWS or Google Cloud that use virtual machines, or on Kubernetes clouds (AKS, EKS, and GKE).
- To terminate the system cluster with the Anyscale CLI or SDK, you must use version 0.26.32 or later.
- The system cluster only captures metrics when enabled. Jobs or workspaces launched before you enable the system cluster don't report metrics, even if you enable the system cluster while they're running.
For Kubernetes clouds, node sizing and optional node reservation requirements also apply.
Kubernetes requirements
For Kubernetes clouds (AKS, EKS, and GKE), the system cluster requires a node with at least 7 allocatable CPUs and 31 GiB of allocatable memory. An 8 CPU and 32 GiB RAM node satisfies this requirement when Kubernetes system reservations are the only other workloads consuming node resources.
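As a rough way to check a candidate node against these minimums, the following Python sketch compares allocatable CPU and memory values with the 7 CPU and 31 GiB thresholds. The function names are hypothetical, and the parser handles only the common Kubernetes quantity suffixes (for example `7910m`, `32Gi`, or `32505856Ki`), so treat it as an illustration rather than a complete quantity parser.

```python
from decimal import Decimal

# Minimums stated in this section for the system cluster node.
MIN_CPUS = Decimal("7")
MIN_MEMORY_BYTES = 31 * 1024**3  # 31 GiB

# Common Kubernetes quantity suffixes (a subset; exponent forms are not handled).
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '7910m' or '8' to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '32Gi' or '32505856Ki' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: plain bytes


def node_fits_system_cluster(allocatable_cpu: str, allocatable_memory: str) -> bool:
    """Return True if the allocatable values meet both minimums."""
    return (
        parse_cpu(allocatable_cpu) >= MIN_CPUS
        and parse_memory(allocatable_memory) >= MIN_MEMORY_BYTES
    )
```

The inputs are the `cpu` and `memory` values under `.status.allocatable` for the node, which you can read with `kubectl get node <node-name> -o jsonpath='{.status.allocatable}'`.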
To reserve a dedicated node for the system cluster, apply a node taint and label. Anyscale pre-configures the system cluster with a toleration for the node.anyscale.com/type=system-cluster:NoSchedule taint, so the system cluster can schedule on the tainted node while other pods can't. With the label, Anyscale uses node affinity to target the reserved node. Without a taint and label, the system cluster runs on any node with sufficient available resources.
To taint a node for the system cluster, run the following command:
kubectl taint nodes <node-name> node.anyscale.com/type=system-cluster:NoSchedule
To label a node for the system cluster, run the following command:
kubectl label nodes <node-name> node.anyscale.com/type=system-cluster
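If you reserve nodes from a script, the two commands above can be wrapped in a small helper. The following is a minimal sketch: `reservation_commands` and `reserve_node` are hypothetical names, and the helper defaults to a dry run that only renders the commands. It assumes `kubectl` is on your `PATH` and already points at the right cluster.

```python
import shlex
import subprocess

# Taint and label values taken from the commands in this section.
TAINT = "node.anyscale.com/type=system-cluster:NoSchedule"
LABEL = "node.anyscale.com/type=system-cluster"


def reservation_commands(node_name: str) -> list[list[str]]:
    """Build the kubectl argument lists that taint and label a node."""
    return [
        ["kubectl", "taint", "nodes", node_name, TAINT],
        ["kubectl", "label", "nodes", node_name, LABEL],
    ]


def reserve_node(node_name: str, dry_run: bool = True) -> list[str]:
    """Render the commands and, unless dry_run is set, execute them in order."""
    rendered = []
    for argv in reservation_commands(node_name):
        rendered.append(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if kubectl exits non-zero.
            subprocess.run(argv, check=True)
    return rendered
```

Calling `reserve_node("<node-name>")` returns the command strings for review. Pass `dry_run=False` to apply them.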
Ray version requirements
Enabling the system cluster enables both the task and actor dashboards for a cloud. However, Ray version requirements still apply:
| Dashboard | Ray version |
|---|---|
| Task dashboard | Ray 2.49.0 or later |
| Actor dashboard | Ray 2.50.0 or later |
Metrics only display for workloads running supported Ray versions.
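The version table can be expressed as a small lookup. The following Python sketch is illustrative only: the helper names are hypothetical, and it compares just the leading `major.minor.patch` numbers, so a pre-release string such as `2.50.0rc1` is treated the same as `2.50.0`.

```python
import re

# Minimum Ray versions from the table in this section.
MIN_RAY_VERSION = {
    "task": (2, 49, 0),
    "actor": (2, 50, 0),
}


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supported_dashboards(ray_version: str) -> list[str]:
    """Return the dashboards whose minimum Ray version is satisfied."""
    parsed = parse_version(ray_version)
    return [name for name, minimum in MIN_RAY_VERSION.items() if parsed >= minimum]
```

For a workload, you could pass the value of `ray.__version__` from its environment to `supported_dashboards`.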
Costs
See the Anyscale pricing page for details on costs associated with the system cluster.
The following table indicates the compute requirements for each supported cloud deployment:
| Cloud deployment | Instance type / Requirements |
|---|---|
| AWS VM | m5.2xlarge |
| Google Cloud VM | n2-standard-8 |
| Kubernetes (AKS, EKS, GKE) | Pod with 7 CPUs and 31 GiB memory |
Anyscale stores the metrics the system cluster collects in the default cloud object storage location configured for your Anyscale cloud deployment. Anyscale doesn't delete data when the system cluster terminates.
Enable the system cluster
An organization admin can enable the system cluster using the Anyscale console or CLI. Enabling the system cluster for an Anyscale cloud enables both the task and actor dashboards for that cloud.
To enable or disable the system cluster in the Anyscale console, complete the following steps:
- Log in to the Anyscale console.
- Click the user icon.
- Click Clouds.
- Click the name of your cloud.
- Click Settings.
- Under Observability > System cluster, use the toggle to turn the system cluster on or off.
To enable the system cluster with the Anyscale CLI, run the following command:
anyscale cloud config update --cloud-id <cloud-id> --enable-system-cluster
To disable the system cluster and turn off the dashboards for a cloud, run the following command:
anyscale cloud config update --cloud-id <cloud-id> --disable-system-cluster
Monitor the system cluster
You can monitor the system cluster using the Anyscale cloud dashboard. See Dashboard.
Automatic termination
By default, the system cluster terminates after 8 hours if no users are actively viewing a task or actor dashboard in your cloud deployment.
Anyscale automatically restarts the system cluster when a user views a task or actor dashboard.
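To reason about when an idle system cluster would shut down, the default behavior can be modeled as a simple timeout. This Python sketch is a simplified model for estimation, not Anyscale's implementation: the function names are hypothetical, and it assumes the 8-hour window is measured from the last time anyone viewed a task or actor dashboard.

```python
from datetime import datetime, timedelta

# Default idle window described in this section.
IDLE_TIMEOUT = timedelta(hours=8)


def expected_shutdown(last_dashboard_view: datetime) -> datetime:
    """Estimate when the system cluster terminates if nobody views a dashboard again."""
    return last_dashboard_view + IDLE_TIMEOUT


def should_terminate(last_dashboard_view: datetime, now: datetime) -> bool:
    """Return True once the idle window has fully elapsed."""
    return now >= expected_shutdown(last_dashboard_view)
```

Under this model, a dashboard last viewed at 09:00 would leave the system cluster running until about 17:00 the same day.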
Terminate the system cluster
Cloud owners can manually terminate the system cluster for a cloud.
Terminating the system cluster shuts down the task and actor dashboards for all clusters in this cloud. This action doesn't delete data, and Anyscale continues to export task and actor data for future dashboard use.
To terminate the system cluster in the Anyscale console, complete the following steps:
- Log in to the Anyscale console.
- Click the user icon.
- Click Clouds.
- Click the name of your cloud.
- Click Settings.
- Under Observability > System cluster, click Terminate cluster. A confirmation dialog displays.
- Click Terminate to shut down the system cluster.
The following CLI example uses the --wait option to block until the system cluster shuts down. If you omit --wait, the command returns after initiating termination. Requires CLI version 0.26.32 or later.
anyscale cloud terminate-system-cluster --cloud-id <cloud-id> --wait
The following SDK example sets wait=True to block until the system cluster shuts down. If you omit the wait argument, the call returns after initiating termination. Requires SDK version 0.26.32 or later.
import anyscale
anyscale.cloud.terminate_system_cluster("<cloud-id>", wait=True)
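Because termination requires version 0.26.32 or later, a script might check the installed package version before making the call. The following is a minimal sketch under a few assumptions: the package is installed under the distribution name `anyscale`, and `meets_minimum` and `terminate_if_supported` are hypothetical helper names. The SDK call itself is the one shown above.

```python
import re
from importlib.metadata import version

# Minimum version stated in the requirements for terminating the system cluster.
MIN_VERSION = (0, 26, 32)


def meets_minimum(installed: str, minimum: tuple[int, int, int] = MIN_VERSION) -> bool:
    """Compare the leading major.minor.patch numbers against the minimum."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", installed)
    if match is None:
        raise ValueError(f"Unrecognized version string: {installed!r}")
    return tuple(int(part) for part in match.groups()) >= minimum


def terminate_if_supported(cloud_id: str) -> None:
    """Terminate the system cluster only if the installed version is new enough."""
    # version() raises PackageNotFoundError if the package isn't installed.
    installed = version("anyscale")
    if not meets_minimum(installed):
        raise RuntimeError(
            f"anyscale {installed} is installed; 0.26.32 or later is required."
        )
    import anyscale  # imported here so the version check runs first

    anyscale.cloud.terminate_system_cluster(cloud_id, wait=True)
```

Comparing integer tuples avoids the string-comparison pitfall where `"0.26.100"` would sort before `"0.26.32"`.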