
Train dashboard profiling tools

On-demand GPU profiling

Note: This feature requires Ray 2.47.0 or later. It supports profiling PyTorch training runs that use PyTorch 2.0 or later.
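To check whether an environment meets these minimums, you can compare version strings. The following is a minimal sketch, not part of Ray's API. The helper names release_tuple and supports_gpu_profiling are hypothetical, and the sketch assumes versions of the form X.Y.Z with optional suffixes.

```python
import re

# Minimum versions stated in the note above.
MIN_RAY_VERSION = (2, 47, 0)
MIN_TORCH_VERSION = (2, 0, 0)


def release_tuple(version: str) -> tuple[int, int, int]:
    """Parse the leading X[.Y[.Z]] release numbers, ignoring suffixes like '+cu121' or 'rc1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # Missing minor/patch components default to 0, e.g. "2.10" -> (2, 10, 0).
    major, minor, patch = (int(part or 0) for part in match.groups())
    return (major, minor, patch)


def supports_gpu_profiling(ray_version: str, torch_version: str) -> bool:
    """Return True if both versions meet the documented minimums."""
    return (
        release_tuple(ray_version) >= MIN_RAY_VERSION
        and release_tuple(torch_version) >= MIN_TORCH_VERSION
    )
```

In a real environment, you would pass ray.__version__ and torch.__version__. Because suffixes are ignored, a pre-release such as 2.47.0rc1 counts as 2.47.0.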

The Train dashboard lets you capture an on-demand GPU profile of a running PyTorch training job. The resulting trace shows a timeline of CPU and GPU operations. Use it to diagnose training bottlenecks and to understand the computation and communication operations that happen under the hood.

The following example trace comes from on-demand GPU profiling on Anyscale. Its timeline of CPU operations and GPU kernels helps you locate bottlenecks on both the CPU and GPU sides. The trace includes an all-reduce collective, which Distributed Data Parallel (DDP) training uses to synchronize gradients across workers.
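To quantify how much GPU time goes to communication such as the all-reduce in the example trace, you can post-process the trace JSON. The following is a minimal sketch that rests on several assumptions. It assumes the Chrome Trace Event Format emitted by PyTorch's Kineto profiler: complete events have "ph": "X", "dur" is in microseconds, and GPU kernels carry "cat": "kernel". It also assumes that NCCL all-reduce kernel names contain "AllReduce". The function name allreduce_share is hypothetical.

```python
from typing import Any


def allreduce_share(trace: dict[str, Any]) -> float:
    """Fraction of summed GPU kernel time spent in all-reduce kernels (0.0 if no kernels)."""
    events = trace.get("traceEvents", [])
    # Keep only complete events ("ph" == "X") tagged as GPU kernels.
    kernels = [e for e in events if e.get("ph") == "X" and e.get("cat") == "kernel"]
    total_us = sum(e.get("dur", 0) for e in kernels)
    if total_us == 0:
        return 0.0
    # Assumption: NCCL all-reduce kernels have "AllReduce" in their names,
    # e.g. "ncclKernel_AllReduce_RING_LL_Sum_float".
    allreduce_us = sum(
        e.get("dur", 0)
        for e in kernels
        if "allreduce" in e.get("name", "").lower().replace("_", "")
    )
    return allreduce_us / total_us
```

Load the trace with json.load before passing it in. Communication and compute kernels can overlap on different CUDA streams. As a result, this ratio measures summed kernel time, not wall-clock time.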

Figure: Example GPU trace

Configure GPU profiling for Anyscale

This feature relies on Dynolog, a monitoring daemon that PyTorch's Kineto profiler connects to for on-demand tracing. Complete the following steps to set up the dependencies:

  1. Anyscale base images for Ray 2.47.0 and later include the Dynolog binaries. If you build your own image instead of extending an Anyscale base image, install the Dynolog binaries in your container image by following the installation instructions in the Dynolog repository.
  2. Set the KINETO_USE_DAEMON and KINETO_DAEMON_INIT_DELAY_S environment variables on the training workers. With Ray Train, pass them through the worker runtime environment:
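     import ray.train.torch

     trainer = ray.train.torch.TorchTrainer(
         ...,
         run_config=ray.train.RunConfig(
             ...,
             worker_runtime_env={
                 "env_vars": {
                     # Have PyTorch's Kineto profiler register with the Dynolog daemon.
                     "KINETO_USE_DAEMON": "1",
                     # Seconds to wait before Kineto initializes its daemon integration.
                     "KINETO_DAEMON_INIT_DELAY_S": "5",
                 }
             },
         ),
     )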
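You can sanity-check both setup steps from inside a training worker, for example at the top of your training function. The following is a minimal sketch, not an Anyscale or Ray API. The function name gpu_profiling_setup_problems is hypothetical, and the sketch assumes a Dynolog install places binaries named dynolog and dyno on PATH. If Dynolog lives elsewhere, a missing-binary report may be a false alarm.

```python
import os
import shutil
from collections.abc import Mapping
from typing import Optional

# Assumption: a Dynolog install puts the `dynolog` daemon and `dyno` CLI on PATH.
DYNOLOG_BINARIES = ("dynolog", "dyno")


def gpu_profiling_setup_problems(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable setup problems; an empty list means nothing obvious is missing."""
    env = os.environ if environ is None else environ
    problems = []
    # Step 1: the Dynolog binaries must be installed in the container image.
    for name in DYNOLOG_BINARIES:
        if shutil.which(name) is None:
            problems.append(f"{name!r} not found on PATH")
    # Step 2: Kineto must be told to use the daemon, with an init delay configured.
    if env.get("KINETO_USE_DAEMON") != "1":
        problems.append("KINETO_USE_DAEMON is not set to '1'")
    if "KINETO_DAEMON_INIT_DELAY_S" not in env:
        problems.append("KINETO_DAEMON_INIT_DELAY_S is not set")
    return problems
```

Calling it with no arguments inspects the worker's real environment. Printing or logging each returned problem then surfaces misconfiguration before you try to profile.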

Collect a GPU profile

Complete the following steps to generate a profile for your GPU training worker.

  1. Navigate to an active Train run page.
  2. Click the GPU Profiling button on one of the workers.
  3. Enter the profiling duration in the configuration window that appears.
  4. Wait for profiling to finish. Your browser downloads the result as a JSON trace file, which you can open in chrome://tracing in Google Chrome or in the Perfetto trace viewer.
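Besides inspecting the downloaded file visually, you can summarize it programmatically. The following is a minimal sketch that assumes the file follows the Chrome Trace Event Format. That format allows either a top-level object with a traceEvents list or a bare array of events. The sketch totals complete-event durations (in microseconds) by name. The helper names extract_events, load_trace_events, and top_events are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Optional


def extract_events(data: Any) -> list[dict]:
    """Return the event list from either layout allowed by the Trace Event Format."""
    # Layout 1: {"traceEvents": [...], ...}; layout 2: a bare JSON array of events.
    return data["traceEvents"] if isinstance(data, dict) else data


def load_trace_events(path: str) -> list[dict]:
    """Read a downloaded trace file and return its events."""
    with open(path, encoding="utf-8") as f:
        return extract_events(json.load(f))


def top_events(events: list[dict], n: int = 10, category: Optional[str] = None) -> list[tuple[str, float]]:
    """Total duration per event name, largest first, optionally filtered by category."""
    totals = defaultdict(float)
    for e in events:
        # Only complete events ("ph" == "X") carry a duration.
        if e.get("ph") != "X":
            continue
        if category is not None and e.get("cat") != category:
            continue
        totals[e.get("name", "<unnamed>")] += e.get("dur", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

For example, top_events(load_trace_events("profile.json"), category="kernel") lists the GPU kernels that take the most time, where "profile.json" stands in for the downloaded file's name. Passing category="cpu_op" does the same for CPU operators, assuming Kineto's category names.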