Train Dashboard

The Train dashboard streamlines debugging of Ray Train workloads. Use it to inspect the progress of individual workers, pinpoint stragglers, and identify the bottlenecks that slow training down.

Start with this dashboard when debugging any issue with your Train workload. It links to other pages of the Anyscale dashboard that show more detailed information about the workload, such as logs, metrics, tasks, actors, and nodes.

Accessing the Train Dashboard

To access the Train dashboard for a workload, click the Workloads tab on the Jobs or Workspaces page, then select the Train tab.

Train dashboard overview

The Train dashboard provides a high-level overview of the training workload and its progress. It starts with a list of Train runs, each of which can have multiple attempts. Each attempt can have multiple Train workers, which run your training code. A new attempt is created whenever the Train run retries after a failure, or scales up or down during elastic training.
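The run, attempt, and worker hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class names, fields, and the `new_attempt` helper are hypothetical and aren't part of the Ray Train or Anyscale API.

```python
from dataclasses import dataclass, field


# Hypothetical names for illustration only; not the Ray Train API.
@dataclass
class TrainWorker:
    rank: int  # position of the worker within its attempt


@dataclass
class TrainAttempt:
    attempt_id: int
    reason: str  # why the attempt started: "initial", "failure_retry", or "resize"
    workers: list[TrainWorker] = field(default_factory=list)


@dataclass
class TrainRun:
    name: str
    attempts: list[TrainAttempt] = field(default_factory=list)

    def new_attempt(self, reason: str, num_workers: int) -> TrainAttempt:
        """Append a new attempt with its own set of workers."""
        attempt = TrainAttempt(
            attempt_id=len(self.attempts),
            reason=reason,
            workers=[TrainWorker(rank=r) for r in range(num_workers)],
        )
        self.attempts.append(attempt)
        return attempt


# One run that starts with 4 workers, retries after a failure,
# and is then scaled down to 2 workers by elastic training.
run = TrainRun(name="example-run")
run.new_attempt("initial", num_workers=4)
run.new_attempt("failure_retry", num_workers=4)
run.new_attempt("resize", num_workers=2)
```

In this sketch a single run ends up with three attempts, which mirrors how the dashboard groups workers under the attempt they belong to.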

Compatibility

The Train dashboard requires Ray 2.30.0 or later. Data persistence in the dashboard requires Ray 2.44.0 or later. Ray Train V2 is not required, but it provides a better debugging experience because it supports controller logs and structured worker logs.
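As a quick sanity check, the version thresholds above can be compared against the Ray version installed in your cluster. The following is a minimal sketch: the helper names `parse_version` and `train_dashboard_support` are hypothetical, and the parser simply ignores pre-release suffixes such as `rc1`.

```python
import re

# Minimum Ray versions stated in the compatibility notes above.
MIN_DASHBOARD = (2, 30, 0)
MIN_DATA_PERSISTENCE = (2, 44, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from a version string.

    Assumption: any suffix after the patch number (for example "rc1")
    is ignored, so "2.44.0rc1" is treated the same as "2.44.0".
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def train_dashboard_support(version: str) -> dict[str, bool]:
    """Report which Train dashboard features a given Ray version meets."""
    parsed = parse_version(version)
    return {
        "dashboard": parsed >= MIN_DASHBOARD,
        "data_persistence": parsed >= MIN_DATA_PERSISTENCE,
    }
```

With Ray installed, you could call `train_dashboard_support(ray.__version__)` to see which of the two thresholds your environment meets.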