StemGMD: A Large-Scale Multi-Kit Audio Dataset for Deep Drums Demixing

Mezza, Alessandro Ilic; Giampiccolo, Riccardo; Bernardini, Alberto; Sarti, Augusto
2023-01-01

Abstract

In music demixing, drums are conventionally treated as a single stem despite being an ensemble of percussive instruments in their own right. At the same time, publicly available datasets specifically tailored for drum demixing have been severely lacking. To address this gap, we present StemGMD, the first large-scale multi-kit audio dataset of isolated single-instrument drum stems. Each audio clip in StemGMD is synthesized from MIDI recordings of expressive drum performances taken from Magenta’s Groove MIDI Dataset (GMD) using ten real-sounding sample libraries from Logic Pro X. Totaling over 1224 hours, StemGMD is the largest drum dataset to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit, i.e., kick drum, snare, high tom, mid-low tom, floor tom, open hi-hat, closed hi-hat, ride cymbal, and crash cymbal. StemGMD also contains single hits for each drum piece at varying velocities and is inherently aligned with the MIDI content and metadata found in GMD, which supports a broad range of applications such as Automatic Drum Transcription. Furthermore, access to isolated audio stems enables a wide range of data augmentation methods inspired by common music production practices. Alongside the dataset, we release a reference model built upon a bank of parallel U-Nets that separates five stems (kick drum, snare, tom-toms, hi-hat, cymbals) from a stereo drum mixture through spectro-temporal soft masking. This model is meant to serve as a baseline for future research and might complement existing music demixing models.
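
As an illustration of the spectro-temporal soft masking mentioned in the abstract, the following is a minimal sketch and not the released reference model. It assumes a single-channel complex spectrogram, a hypothetical shape of 513 frequency bins by 100 frames, and random stand-in data. In place of U-Net outputs it uses oracle ratio masks computed from known stem magnitudes; the function names `oracle_ratio_masks` and `apply_soft_masks` are made up for this example.

```python
import numpy as np

# Five target stems named in the abstract.
STEMS = ["kick", "snare", "toms", "hihat", "cymbals"]


def oracle_ratio_masks(stem_specs, eps=1e-8):
    """Soft masks in [0, 1] from known stem magnitudes.

    Assumption: these oracle masks stand in for the masks that a trained
    network would predict; they are not part of the source.
    """
    total = sum(np.abs(spec) for spec in stem_specs.values()) + eps
    return {name: np.abs(spec) / total for name, spec in stem_specs.items()}


def apply_soft_masks(mixture_spec, masks):
    """Estimate each stem by element-wise masking of the mixture spectrogram."""
    return {name: mask * mixture_spec for name, mask in masks.items()}


# Random complex spectrograms as placeholder stems (hypothetical data).
rng = np.random.default_rng(0)
shape = (513, 100)  # (frequency bins, time frames), assumed for illustration
stem_specs = {
    name: rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    for name in STEMS
}
mixture_spec = sum(stem_specs.values())

masks = oracle_ratio_masks(stem_specs)
estimates = apply_soft_masks(mixture_spec, masks)
```

Because the ratio masks sum to approximately one in every time-frequency bin, the five estimates add back up to the mixture. A stereo input, as described in the abstract, could be handled by applying the same procedure per channel.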

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1274937