RE.PUBLIC@POLIMI Research publications of Politecnico di Milano

Benchmarking Music Demixing Models for Deep Drum Source Separation

Mezza, Alessandro Ilic;Giampiccolo, Riccardo;Bernardini, Alberto;Sarti, Augusto

2024-01-01

Abstract

Traditionally, drum source separation has been tackled using nonnegative spectro-temporal factorization methods. Only recently has deep learning shown unprecedented performance in separating five stems from a drum mixture, namely kick drum, snare drum, toms, hi-hat, and cymbals. The literature, however, still lacks a thorough comparison of the techniques readily available in the context of music source separation. In this paper, we conduct a first benchmarking analysis of music demixing models tailored for deep drum source separation. We evaluate a range of state-of-the-art neural network architectures, including HT-Demucs, MDX23C, and BS-RoFormer, trained using StemGMD, a large-scale dataset of isolated single-instrument drum stems. Besides demonstrating that these architectures outperform the state-of-the-art method for drum source separation, we discuss their strengths and weaknesses, giving insights into their performance and ultimately offering guidance for researchers and practitioners looking to develop drum demixing models for different applications, notably those related to music making, personalized listening, and online music education.

Year of publication: 2024
Book title: 2024 IEEE 5th International Symposium on the Internet of Sounds (IS2)
Appears in types: 04.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
_IS2__Benchmarking_Music_Demixing_Models_for_Deep_Drum_Source_Separation.pdf (Restricted access: Post-Print (DRAFT or Author’s Accepted Manuscript-AAM))	646.36 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1274936
