Anyscale task dashboard

This page provides details on using the Anyscale task dashboard for monitoring Ray tasks in your Anyscale jobs and workspaces.

important

This feature is in beta release. Using it deploys a system cluster in your cloud provider account, which incurs cost, and might add a small overhead to your Ray applications.

The Anyscale task dashboard persists task details beyond the lifetime of the cluster for easy offline debugging. Anyscale uses compute and storage in your cloud provider account for the task dashboard, serving the dashboard to the Anyscale console without passing task details through the Anyscale control plane.

Anyscale has tested the task dashboard with millions of tasks. If you encounter limitations due to task count, contact Anyscale support.

Use the Anyscale task dashboard

The Anyscale task dashboard updates in near real-time and provides filtering and aggregate counts for tasks based on the following states:

| State | Description |
| --- | --- |
| Finished | Finished tasks. |
| Failed | Failed tasks. |
| Running | Tasks actively running on your cluster. |
| Pending dependencies | Tasks waiting for the tasks they depend on to complete. |
| Pending scheduling | Submitted tasks waiting for scheduling on your cluster. |

The following table describes the information in each component of the task dashboard.

| Dashboard component | Description |
| --- | --- |
| Task summary | A count of all tasks and tasks aggregated by state. |
| Tasks by function | A view of all tasks by function name. |
| Tasks by errors | A view of all errors raised by tasks. |
| Tasks by jobs | A view of tasks by job ID. |
| Task table | A detailed view of all tasks that includes options for filtering and search. |
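
To make the aggregate views concrete, the following sketch groups a handful of task records by state and by function name, similar to what the Task summary and Tasks by function components display. The records, field names, and the `summarize` helper are hypothetical and exist only for illustration. This isn't the dashboard's API or data format.

```python
from collections import Counter

# Hypothetical task records; the field names are illustrative only.
tasks = [
    {"function_name": "load", "state": "Finished"},
    {"function_name": "load", "state": "Failed"},
    {"function_name": "train", "state": "Running"},
    {"function_name": "train", "state": "Pending dependencies"},
    {"function_name": "evaluate", "state": "Pending scheduling"},
]


def summarize(records, key):
    """Count records grouped by the value of the given field."""
    return Counter(record[key] for record in records)


by_state = summarize(tasks, "state")
by_function = summarize(tasks, "function_name")

print("Total tasks:", len(tasks))
print("By state:", dict(by_state))
print("By function:", dict(by_function))
```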

The Task table displays details for each task using the following fields. Use the search and filter options to narrow the displayed tasks.

| Field | Description |
| --- | --- |
| ID | The unique ID for each task. |
| Function name | The name of the function that triggered the task. |
| State | The state of the task. This is the main field used for filtering the task table. |
| State details | Additional details about the state of the task. For most task states, you can click the More details link under State details in the Task table to show error messages or cluster event logs. |
| Duration | The amount of time the task ran. |
| Start time | The date and time the task began. |
| End time | The date and time the task finished. |
| Required resources | The type and amount of resources required for the task. |
| Node ID | The unique ID for the node where the task ran. |
| Worker PID | The process ID (PID) of the Ray worker process. |
| Worker ID | The ID of the worker that ran the task. |
| Attempt no. | The number of retry attempts. The value is 0 for tasks that succeed on the initial attempt. |
| Session | The session in which the task ran. |
| Job ID | The ID of the job where the task ran. Workspaces assign job IDs when you run code that triggers compute on your Ray cluster. |
| Type | Indicates whether the task ran on an actor. |

Requirements and limitations

An organization admin must enable the task dashboard for an Anyscale cloud deployment. See Enable system cluster and task dashboard.

The following requirements and limitations exist:

  • The task dashboard is available on Anyscale clouds deployed to AWS or Google Cloud that use virtual machines.
    • The task dashboard isn't available for any clouds deployed using Kubernetes, including EKS and GKE.
  • The task dashboard reports metrics for jobs and workspaces that use Ray version 2.40.0 and above.
  • To terminate the system cluster with the Anyscale CLI or SDK, you must use version 0.26.32 or above.
  • The task dashboard only captures metrics when the system cluster is enabled. Jobs or workspaces launched before the system cluster is enabled don't report metrics, even if you enable the system cluster while they're running.
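
The version requirements above can be checked before you launch a workload. The following is a minimal sketch, assuming the packages are installed under the distribution names `ray` and `anyscale`. The helper names are hypothetical, and pre-release suffixes such as `rc1` are ignored for simplicity.

```python
import re
from importlib import metadata

MIN_RAY = "2.40.0"        # minimum Ray version from the list above
MIN_ANYSCALE = "0.26.32"  # minimum Anyscale CLI or SDK version from the list above


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 2.9.3 sorts below 2.40.0."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that "2.41" compares like "2.41.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(package, minimum):
    """Return True or False, or None if the package isn't installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    for package, minimum in (("ray", MIN_RAY), ("anyscale", MIN_ANYSCALE)):
        print(package, ">=", minimum, "->", check_package(package, minimum))
```

Comparing the versions as tuples of integers rather than as strings avoids the common mistake of treating `2.9.3` as newer than `2.40.0`.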

Access the task dashboard

You can view the task dashboard in the Anyscale console for any job or workspace.

Complete the following steps to access the Anyscale task dashboard:

  1. Log in to the Anyscale console.
  2. Click Workspaces or Jobs.
  3. Click the name of a workspace or job.
  4. Click Ray Workloads.
  5. Click Tasks. The task dashboard appears.

important

Anyscale deploys a system cluster to power the task dashboard. If your cloud doesn't have an active system cluster, a screen with the message "Observability service is launching" appears while the system cluster starts.

Task dashboard system cluster

When you access the task dashboard, Anyscale deploys a single-node system cluster in the Anyscale cloud of your job or workspace. A single system cluster monitors all workloads in all projects for your Anyscale cloud, but each cloud deployment requires a separate system cluster. See the Anyscale pricing page for details on costs associated with this cluster.

The following table indicates the instance type used for each supported cloud deployment:

| Cloud deployment | Instance type |
| --- | --- |
| AWS | m5.2xlarge |
| Google Cloud | n2-standard-8 |

By default, the system cluster terminates after 8 hours if no users are actively viewing a task dashboard in your cloud deployment. You can use the Anyscale CLI or SDK to manually terminate the system cluster, but Anyscale automatically restarts the system cluster when a user views the task dashboard. See Terminate the system cluster.

Anyscale stores the metrics collected by the system cluster in the default cloud object storage location configured for your Anyscale cloud deployment. Anyscale doesn't delete data when the system cluster terminates.

Enable system cluster and task dashboard

An organization admin must enable the system cluster using the Anyscale CLI. Enabling the system cluster for an Anyscale cloud enables the task dashboard for that cloud.

To enable the system cluster, run the following command:

anyscale cloud config update --cloud-id <cloud-id> --enable-system-cluster

To disable the system cluster and turn off the task dashboard for a cloud, run the following command:

anyscale cloud config update --cloud-id <cloud-id> --disable-system-cluster

Monitor the system cluster

You can monitor the system cluster using the Anyscale cloud dashboard. See Dashboard.

Terminate the system cluster

Use the Anyscale CLI or SDK to terminate the system cluster. This requires version 0.26.32 or above.

note

The following example uses the --wait option to wait until termination completes before returning. If you don't specify the --wait option, the command returns after initiating termination.

anyscale cloud terminate-system-cluster --cloud-id <cloud-id> --wait
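
If you want to script termination, for example at the end of a maintenance window, one option is to wrap the CLI command above with Python's `subprocess` module. The following is a minimal sketch that only calls the command shown on this page. The helper names and the placeholder cloud ID are hypothetical, and the sketch doesn't use the Anyscale SDK.

```python
import shlex
import subprocess


def terminate_command(cloud_id, wait=True):
    """Build the CLI arguments; --wait blocks until termination completes."""
    args = ["anyscale", "cloud", "terminate-system-cluster", "--cloud-id", cloud_id]
    if wait:
        args.append("--wait")
    return args


def terminate_system_cluster(cloud_id, wait=True):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(terminate_command(cloud_id, wait), check=True)


if __name__ == "__main__":
    # "<cloud-id>" is a placeholder; substitute your own cloud ID.
    # This prints the command instead of running it.
    print(shlex.join(terminate_command("<cloud-id>")))
```

Building the argument list separately from running it lets you log or review the exact command before it executes.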

Performance impact

Reporting task metrics to the task dashboard might introduce a small overhead to your Ray applications. This overhead might impact performance for workloads that push the upper limits for throughput and latency for Ray tasks.

If you notice performance degradation for your workload after enabling the task dashboard, disable the task dashboard for the Anyscale cloud containing the workload. Contact Anyscale support to learn about new features that might reduce performance impact.