RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano

Toward deep drum source separation

Mezza, Alessandro Ilic;Giampiccolo, Riccardo;Bernardini, Alberto;Sarti, Augusto

2024-01-01

Abstract

In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this letter, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 h, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to considerably outperform state-of-the-art nonnegative spectro-temporal factorization methods.
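The "bank of dedicated U-Nets" idea can be illustrated with a minimal sketch in which one model per stem is applied independently to the same stereo mixture. Everything specific here is an assumption made for illustration and is not taken from the article: the stem names, the use of magnitude-spectrogram masking, the array shapes, and the placeholder function that stands in for a trained U-Net.

```python
import numpy as np

# Hypothetical stem names; the abstract only states that there are five stems.
STEMS = ["kick", "snare", "toms", "hihat", "cymbals"]


def make_placeholder_model(seed):
    """Stand-in for one dedicated U-Net (assumption): returns a soft mask in [0, 1)."""
    rng = np.random.default_rng(seed)
    weight = rng.uniform(0.5, 2.0)

    def model(magnitude):
        # A trained network would go here; this only mimics the
        # input/output contract (same shape, values in [0, 1)).
        return magnitude / (magnitude + weight)

    return model


def separate(mixture_mag, bank):
    """Apply each stem's model independently to the same stereo mixture."""
    return {stem: model(mixture_mag) * mixture_mag for stem, model in bank.items()}


# One dedicated model per stem.
bank = {stem: make_placeholder_model(i) for i, stem in enumerate(STEMS)}

# Toy stereo magnitude spectrogram: (channels, frequency bins, time frames).
mixture_mag = np.abs(np.random.default_rng(0).normal(size=(2, 513, 100)))
estimates = separate(mixture_mag, bank)
```

Because each stem has its own model, the stems can be estimated independently of one another, so a single stem can be extracted without running the rest of the bank.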

Year of publication	2024
Journal title	PATTERN RECOGNITION LETTERS
Keywords	Deep learning, Drums, Music decomposition, Source separation, U-Net
Appears in types	01.1 Journal Article

Files in this item:

File	Size	Format
1-s2.0-S0167865524001351-main.pdf (open access, Publisher’s version)	1.04 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1265962
