Anyscale task dashboard
This page provides details on using the Anyscale task dashboard for monitoring Ray tasks in your Anyscale jobs and workspaces.
This feature is in beta release. Using this feature has cost and performance implications.
The Anyscale task dashboard persists task details beyond the lifetime of the cluster for easy offline debugging. Anyscale uses compute and storage in your cloud provider account for the task dashboard, serving the dashboard to the Anyscale console without passing task details through the Anyscale control plane.
Anyscale has tested the task dashboard with millions of tasks. If you encounter limitations due to task count, contact Anyscale support.
Use the Anyscale task dashboard
The Anyscale task dashboard updates in near real-time and provides filtering and aggregate counts for tasks based on the following states:
State | Description |
---|---|
Finished | Tasks that completed successfully. |
Failed | Tasks that ended with an error. |
Running | Tasks actively running on your cluster. |
Pending dependencies | Tasks waiting for dependent tasks to complete. |
Pending scheduling | Submitted tasks waiting for scheduling on your cluster. |
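To make the state model concrete, the following Python sketch tallies task records by state, similar to the aggregate counts the dashboard shows. The TaskState enum and the count_by_state helper are hypothetical illustrations. They aren't part of the Anyscale or Ray API, and they don't read data from the dashboard.

```python
from collections import Counter
from enum import Enum


class TaskState(Enum):
    """Hypothetical enum mirroring the states listed in the table above."""
    FINISHED = "Finished"
    FAILED = "Failed"
    RUNNING = "Running"
    PENDING_DEPENDENCIES = "Pending dependencies"
    PENDING_SCHEDULING = "Pending scheduling"


def count_by_state(states):
    """Return a count for every state, including states with zero tasks."""
    counts = Counter(states)
    # Counter returns 0 for missing keys, so every state gets an entry.
    return {state: counts[state] for state in TaskState}


tasks = [TaskState.FINISHED, TaskState.FINISHED, TaskState.RUNNING, TaskState.PENDING_SCHEDULING]
summary = count_by_state(tasks)
for state, n in summary.items():
    print(f"{state.value}: {n}")
```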
The following table describes the information in each component of the task dashboard.
Dashboard component | Description |
---|---|
Task summary | A count of all tasks and tasks aggregated by state. |
Tasks by function | A view of all tasks by function name. |
Tasks by errors | A view of all errors raised by tasks. |
Tasks by jobs | A view of tasks by job ID. |
Task table | A detailed view of all tasks that includes options for filtering and search. |
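The aggregate views group the same task records along different keys. The following minimal Python sketch assumes each task is a plain dictionary with hypothetical function_name, job_id, and error keys. It illustrates the grouping only and isn't how Anyscale computes these views.

```python
from collections import Counter

# Hypothetical task records; the dashboard doesn't expose tasks in this structure.
tasks = [
    {"id": "t1", "function_name": "preprocess", "job_id": "01000000", "error": None},
    {"id": "t2", "function_name": "preprocess", "job_id": "01000000", "error": "ValueError"},
    {"id": "t3", "function_name": "train", "job_id": "02000000", "error": None},
]


def group_count(records, key):
    """Count records per value of `key`, skipping records where the value is None."""
    return Counter(r[key] for r in records if r[key] is not None)


by_function = group_count(tasks, "function_name")  # Tasks by function
by_error = group_count(tasks, "error")             # Tasks by errors
by_job = group_count(tasks, "job_id")              # Tasks by jobs
print(dict(by_function), dict(by_error), dict(by_job))
```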
The Task table displays details for each task using the following fields. Use the search and filter to limit displayed tasks.
Field | Description |
---|---|
ID | The unique ID for each task. |
Function name | The name of the function that triggered the task. |
State | The state of the task. This is the main field used for filtering the Task table. |
State details | Additional details about the state of the task. For most task states, you can click the More details link under State details in the Task table to show error messages or cluster event logs. |
Duration | The amount of time the task ran. |
Start time | The date and time the task began. |
End time | The date and time the task finished. |
Required resources | The type and amount of resources required for the task. |
Node ID | The unique ID for the node where the task ran. |
Worker PID | The process ID (PID) of the Ray worker process that ran the task. |
Worker ID | The ID of the worker that ran the task. |
Attempt no. | The number of retry attempts for the task. 0 for tasks that succeed on the initial attempt. |
Session | The session in which the task ran. |
Job ID | The ID of the job where the task ran. Workspaces assign job IDs when you run code that triggers compute on your Ray cluster. |
Type | Indicates whether the task ran on an actor. |
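The search and filter controls narrow the table by field values. The following Python sketch models a few of the fields above as a dataclass, filters by state and a function-name substring, and uses the attempt number to find retried tasks. The TaskRow class and filter_tasks helper are hypothetical and only mimic the behavior described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRow:
    """Hypothetical subset of the Task table fields."""
    id: str
    function_name: str
    state: str
    attempt_no: int
    job_id: str


def filter_tasks(rows, state: Optional[str] = None, search: str = ""):
    """Keep rows matching `state` (if given) whose function name contains `search`."""
    needle = search.lower()  # Case-insensitive search.
    return [
        r for r in rows
        if (state is None or r.state == state) and needle in r.function_name.lower()
    ]


rows = [
    TaskRow("t1", "load_data", "Finished", 0, "01000000"),
    TaskRow("t2", "train_model", "Failed", 2, "01000000"),
    TaskRow("t3", "train_model", "Running", 0, "02000000"),
]
failed_training = filter_tasks(rows, state="Failed", search="train")
# A nonzero attempt number means the task was retried at least once.
retried = [r.id for r in rows if r.attempt_no > 0]
print([r.id for r in failed_training], retried)
```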
Requirements and limitations
An organization admin must enable the task dashboard for an Anyscale cloud deployment. See Enable system cluster and task dashboard.
The following requirements and limitations exist:
- The task dashboard is available on Anyscale clouds deployed to AWS or Google Cloud that use virtual machines.
- The task dashboard isn't available for any clouds deployed using Kubernetes, including EKS and GKE.
- The task dashboard reports metrics for jobs and workspaces that use Ray 2.40.0 or later.
- To terminate the system cluster with the Anyscale CLI or SDK, you must use version 0.26.32 or later.
- The task dashboard captures metrics only while the system cluster is enabled. Jobs or workspaces launched before you enable the system cluster don't report metrics, even if you enable the system cluster while they're running.
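The following Python sketch encodes the requirements above as a pre-flight check. The function names and parameters are hypothetical: Anyscale doesn't provide these helpers, and the check doesn't query your cloud. The version comparison assumes plain X.Y.Z version strings without suffixes such as rc1.

```python
def parse_version(v: str) -> tuple:
    """Parse a plain 'X.Y.Z' version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split("."))


def task_dashboard_reports(provider: str, uses_kubernetes: bool, ray_version: str,
                           launched_after_enable: bool) -> bool:
    """Return True if a workload meets the listed task dashboard requirements."""
    if provider not in ("aws", "gcp"):
        return False  # Only AWS and Google Cloud deployments are supported.
    if uses_kubernetes:
        return False  # Kubernetes deployments, including EKS and GKE, aren't supported.
    if parse_version(ray_version) < (2, 40, 0):
        return False  # Requires Ray 2.40.0 or later.
    # Workloads launched before the system cluster was enabled don't report metrics.
    return launched_after_enable


def can_terminate_system_cluster(anyscale_version: str) -> bool:
    """Terminating the system cluster requires Anyscale CLI or SDK 0.26.32 or later."""
    return parse_version(anyscale_version) >= (0, 26, 32)
```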
Access the task dashboard
You can view the task dashboard in the Anyscale console for any job or workspace.
Complete the following steps to access the Anyscale task dashboard:
- Log in to the Anyscale console.
- Click Workspaces or Jobs.
- Click the name of a workspace or job.
- Click Ray Workloads.
- Click Tasks. The task dashboard appears.
Anyscale deploys a system cluster to power the task dashboard. If your cloud doesn't have an active system cluster, a screen with the message "Observability service is launching" appears while the system cluster starts.
Task dashboard system cluster
When you access the task dashboard, Anyscale deploys a single-node system cluster in the Anyscale cloud of your job or workspace. A single system cluster monitors all workloads in all projects for your Anyscale cloud, but each cloud deployment requires a separate system cluster. See the Anyscale pricing page for details on costs associated with this cluster.
The following table indicates the instance type used for each supported cloud deployment:
Cloud deployment | Instance type |
---|---|
AWS | m5.2xlarge |
Google Cloud | n2-standard-8 |
By default, the system cluster terminates after 8 hours if no users are actively viewing a task dashboard in your cloud deployment. You can use the Anyscale CLI or SDK to manually terminate the system cluster, but Anyscale automatically restarts the system cluster when a user views the task dashboard. See Terminate the system cluster.
Anyscale stores the metrics collected by the system cluster in the default cloud object storage location configured for your Anyscale cloud deployment. Anyscale doesn't delete data when the system cluster terminates.
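The following Python sketch models the default auto-termination rule and the instance types from the table above. The helper names are hypothetical, and the sketch simplifies "no users actively viewing a task dashboard" to a single timestamp for the most recent dashboard view in the cloud.

```python
from datetime import datetime, timedelta

# Instance types used for the system cluster, from the table above.
SYSTEM_CLUSTER_INSTANCE_TYPE = {"AWS": "m5.2xlarge", "Google Cloud": "n2-standard-8"}

# Default idle period before the system cluster terminates.
IDLE_TIMEOUT = timedelta(hours=8)


def would_auto_terminate(last_viewed: datetime, now: datetime) -> bool:
    """True once nobody has viewed a task dashboard for at least the idle timeout."""
    return now - last_viewed >= IDLE_TIMEOUT


start = datetime(2025, 1, 1, 9, 0)
print(would_auto_terminate(start, start + timedelta(hours=9)))
```

Terminating the cluster, whether automatically or manually, doesn't delete stored metrics, so the sketch only models cluster uptime.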
Enable system cluster and task dashboard
An organization admin must enable the system cluster using the Anyscale CLI. Enabling the system cluster for an Anyscale cloud enables the task dashboard for that cloud.
To enable the system cluster, run the following command:
```shell
anyscale cloud config update --cloud-id <cloud-id> --enable-system-cluster
```
To disable the system cluster and turn off the task dashboard for a cloud, run the following command:
```shell
anyscale cloud config update --cloud-id <cloud-id> --disable-system-cluster
```
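If you manage several Anyscale clouds, you might script these commands. The following Python sketch builds the anyscale cloud config update invocation shown above for each cloud ID. By default it only prints the commands; with dry_run=False it runs them with subprocess. The helper names and the placeholder cloud IDs are hypothetical, and running the commands assumes the Anyscale CLI is installed and authenticated with organization admin permissions.

```python
import subprocess


def system_cluster_command(cloud_id: str, enable: bool) -> list:
    """Build the CLI arguments to enable or disable the system cluster for one cloud."""
    flag = "--enable-system-cluster" if enable else "--disable-system-cluster"
    return ["anyscale", "cloud", "config", "update", "--cloud-id", cloud_id, flag]


def set_system_cluster(cloud_ids, enable: bool, dry_run: bool = True) -> list:
    """Print (dry run) or execute the command for each cloud ID."""
    commands = [system_cluster_command(cid, enable) for cid in cloud_ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the CLI exits with an error.
            subprocess.run(cmd, check=True)
    return commands


set_system_cluster(["cld_example_1", "cld_example_2"], enable=True)
```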
Monitor the system cluster
You can monitor the system cluster using the Anyscale cloud dashboard. See Dashboard.
Terminate the system cluster
Use the Anyscale CLI or SDK to terminate the system cluster. Termination requires Anyscale CLI or SDK version 0.26.32 or later.
The following examples use the wait option (--wait for the CLI, wait=True for the SDK) to wait until termination completes before returning. If you don't specify the wait option, the command returns as soon as termination begins.
CLI:
```shell
anyscale cloud terminate-system-cluster --cloud-id <cloud-id> --wait
```
SDK:
```python
import anyscale

# Block until the system cluster finishes terminating.
anyscale.cloud.terminate_system_cluster("<cloud-id>", wait=True)
```
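The following sketch extends the SDK example to terminate the system cluster in several clouds, one after another. The terminate_all helper is hypothetical. By default it calls anyscale.cloud.terminate_system_cluster with the same arguments as the SDK example above, and it accepts an optional stand-in function so you can dry-run it without the Anyscale SDK installed.

```python
from typing import Callable, Iterable, Optional


def terminate_all(cloud_ids: Iterable[str], wait: bool = True,
                  terminate: Optional[Callable[..., object]] = None) -> list:
    """Terminate the system cluster for each cloud ID and return the IDs processed."""
    if terminate is None:
        import anyscale  # Requires the Anyscale SDK, version 0.26.32 or later.
        terminate = anyscale.cloud.terminate_system_cluster
    done = []
    for cloud_id in cloud_ids:
        # With wait=True, each call returns only after termination completes.
        terminate(cloud_id, wait=wait)
        done.append(cloud_id)
    return done


# Dry run with a stand-in function instead of the real SDK call.
calls = []
terminate_all(["cld_a", "cld_b"], terminate=lambda cid, wait: calls.append((cid, wait)))
print(calls)
```

Remember that Anyscale restarts the system cluster the next time a user views a task dashboard in that cloud.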
Performance impact
Reporting task metrics to the task dashboard might introduce a small overhead to your Ray applications. This overhead might affect workloads that push the limits of Ray task throughput and latency.
If you notice performance degradation for your workload after enabling the task dashboard, disable the system cluster, and with it the task dashboard, for the Anyscale cloud that contains the workload. See Enable system cluster and task dashboard. Contact Anyscale support to learn about new features that might reduce performance impact.
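To judge whether the task dashboard is slowing a workload, you might time the same workload with the system cluster enabled and disabled. The following Python sketch computes the relative slowdown from two measured run times. The function and the numbers are hypothetical, and any threshold for "noticeable" degradation is your own judgment, not an Anyscale guideline.

```python
def relative_slowdown(baseline_seconds: float, with_dashboard_seconds: float) -> float:
    """Return the fractional increase in run time, for example 0.05 for 5% slower."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (with_dashboard_seconds - baseline_seconds) / baseline_seconds


# Hypothetical run times for the same workload without and with the task dashboard.
slowdown = relative_slowdown(200.0, 210.0)
print(f"{slowdown:.1%}")
```

Repeat each measurement several times before comparing, because run-to-run variance can be larger than the overhead you're trying to detect.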