AudioDatasource API

info

If you want to access this feature, contact the Anyscale team.


AudioDatasource

A Datasource that reads audio files. Any audio format supported by ffmpeg can be read, including .mp3, .wav, .flac, and .ogg.

info

AudioDatasource requires the decord library. Install it with your preferred package manager (e.g., pip install --user decord).
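
One way to confirm the dependency is present before building a dataset is to probe for the module. This is a minimal sketch using only the standard library; the helper name has_module is hypothetical and not part of Ray or Anyscale.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None


# Warn early instead of failing later, when the datasource tries to decode audio.
if not has_module("decord"):
    print("decord is not installed; run: pip install --user decord")
```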


ray.data.read_datasource

ray.data.read_datasource(
    datasource: AudioDatasource,
    *,
    paths: Union[str, List[str]],
    sample_rate: Optional[int],
    mono_audio: Optional[bool],
) -> Dataset

Read the audio files at paths into a Ray Dataset.

Parameters

  • paths: A file path or list of file paths to read audio files from.
  • sample_rate: The sample rate, in Hz, at which to read audio. Defaults to 44100 Hz.
  • mono_audio: If True, read audio as a mono signal; if False, keep the original channel layout. Defaults to False.
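
To make the two options concrete, the sketch below shows how they would affect the shape of the decoded data. It is plain Python for illustration, not the AudioDatasource implementation: the helper names are hypothetical, and averaging channels is only one common way to produce a mono signal (the source does not say which method is used).

```python
from typing import List


def expected_num_samples(duration_seconds: float, sample_rate: int = 44100) -> int:
    """Samples per channel for a clip of the given length at `sample_rate` Hz."""
    return round(duration_seconds * sample_rate)


def downmix_to_mono(channels: List[List[float]]) -> List[float]:
    """Average equal-length channels sample by sample (assumed mono method)."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]


print(expected_num_samples(2.0))         # 88200 samples at the default 44100 Hz
print(expected_num_samples(2.0, 48000))  # 96000 samples at 48000 Hz

stereo = [[0.5, 1.0, -1.0], [0.5, 0.0, 1.0]]  # two channels, three samples each
print(downmix_to_mono(stereo))           # [0.5, 0.5, 0.0]
```

A higher sample_rate therefore yields proportionally longer arrays for the same clip, and mono_audio=True collapses the channel dimension to a single signal.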

Returns

A Ray Dataset that contains data from audio files as arrays of floats.

Examples

import ray
from ray.anyscale.data import AudioDatasource

# Build URIs for ten FLAC files from a LibriSpeech sample stored on S3.
audio_uri = [
    (
        "s3://anonymous@air-example-data-2/6G-audio-data-LibriSpeech-train-clean-100-flac"
        f"/train-clean-100/5022/29411/5022-29411-{n:04}.flac"
    )
    for n in range(10)
]

# Read the files at 48000 Hz as mono audio.
ds = ray.data.read_datasource(
    AudioDatasource(),
    paths=audio_uri,
    sample_rate=48000,
    mono_audio=True,
)
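
The path-building step in the example above can be factored into a small helper, which makes it easy to check the URIs before handing them to ray.data.read_datasource. This is a sketch only: librispeech_uris and PREFIX are hypothetical names, and the prefix is copied from the example above.

```python
from typing import List

# Prefix copied from the example above.
PREFIX = (
    "s3://anonymous@air-example-data-2/"
    "6G-audio-data-LibriSpeech-train-clean-100-flac"
    "/train-clean-100/5022/29411"
)


def librispeech_uris(count: int) -> List[str]:
    """Build URIs for files 5022-29411-0000.flac onward, `count` in total."""
    return [f"{PREFIX}/5022-29411-{n:04}.flac" for n in range(count)]


uris = librispeech_uris(10)
print(len(uris))                   # 10
print(uris[0].rsplit("/", 1)[-1])  # 5022-29411-0000.flac
```

The resulting list can be passed as paths= to ray.data.read_datasource. Once the dataset is created, standard Ray Dataset methods such as ds.schema() or ds.take(1) can be used to inspect what was read.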