Sound Event Localization and Classification using Wireless Acoustic Sensor Networks in Outdoor Environments
Bernardini, Alberto
2025-01-01
Abstract
The use of deep learning for sound event localization and classification with Wireless Acoustic Sensor Networks (WASNs) is an emerging research area. However, current methods for sound event localization and classification exhibit limitations in perceiving extensive soundscapes: they are typically effective over only a fraction of the soundscape and do not fully exploit the surrounding information. Moreover, in outdoor settings, accuracy degrades under signal attenuation and environmental noise. In this paper, we propose a deep learning-based method that integrates frequency-, temporal-, and spatial-domain features with attention mechanisms to estimate the location and the class of sound sources using a WASN in an outdoor setting. We introduce soundmap features to capture spatial information across multiple frequency bands and time frames. Furthermore, we integrate attention mechanisms to learn channel-wise relationships and temporal dependencies within the acoustic features. To evaluate the proposed method, we conduct experiments on simulated datasets with different noise levels, monitoring-area sizes, and array and source positions. Moreover, we conduct a real-world experiment in an outdoor environment with dimensions of 100 m × 80 m. The experimental results demonstrate the superiority of the proposed method over state-of-the-art methods in both sound event classification and sound source localization tasks.
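The abstract mentions attention mechanisms that learn channel-wise relationships within acoustic features, but the record does not describe the paper's actual architecture. As a purely illustrative sketch, one common form of channel-wise attention is squeeze-and-excitation-style gating over multi-channel spectrogram features; all function names, weight shapes, and dimensions below are hypothetical, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style channel gating (illustrative only).

    features: (channels, time, freq) array of per-node spectrogram features.
    w1, w2:   weights of a small bottleneck MLP (hypothetical shapes).
    """
    squeeze = features.mean(axis=(1, 2))          # (channels,) global average pool
    excite = sigmoid(w2 @ np.tanh(w1 @ squeeze))  # (channels,) gate values in (0, 1)
    return features * excite[:, None, None]       # reweight each channel

# Toy usage: 4 microphone channels, 16 time frames, 32 frequency bins.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 32))
w1 = rng.standard_normal((2, 4))   # bottleneck of 2 units
w2 = rng.standard_normal((4, 2))
out = channel_attention(feats, w1, w2)
```

Because each gate lies in (0, 1), the output preserves the input's shape while scaling every channel's magnitude down by its learned importance; a temporal-attention variant would pool over channels and frequency instead and gate the time axis.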


