In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this letter, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 h, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to considerably outperform state-of-the-art nonnegative spectro-temporal factorization methods.

Toward deep drum source separation

Mezza, Alessandro Ilic;Giampiccolo, Riccardo;Bernardini, Alberto;Sarti, Augusto
2024-01-01

Abstract

In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this letter, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 h, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to considerably outperform state-of-the-art nonnegative spectro-temporal factorization methods.
2024
Deep learning, Drums, Music decomposition, Source separation, U-Net
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0167865524001351-main.pdf

accesso aperto

: Publisher’s version
Dimensione 1.04 MB
Formato Adobe PDF
1.04 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1265962
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact