Sound Event Localization and Classification using Wireless Acoustic Sensor Networks in Outdoor Environments
Bernardini, Alberto
2025-01-01
Abstract
The use of deep learning for sound event localization and classification with Wireless Acoustic Sensor Networks (WASNs) is an emerging research area. However, current methods for sound event localization and classification exhibit limitations in perceiving extensive soundscapes: they are typically effective over only a fraction of the soundscape and do not fully exploit the surrounding information. Moreover, in outdoor settings, accuracy degrades under signal attenuation and environmental noise. In this paper, we propose a deep learning-based method that integrates frequency-, temporal-, and spatial-domain features with attention mechanisms to estimate the location and the class of sound sources using a WASN in an outdoor setting. We introduce soundmap features to capture spatial information across multiple frequency bands and time frames. Furthermore, we integrate attention mechanisms to learn channel-wise relationships and temporal dependencies within the acoustic features. To evaluate the proposed method, we conduct experiments on simulated datasets with different noise levels, monitoring-area sizes, and array and source positions. Moreover, we conduct a real-world experiment in an outdoor environment with dimensions of 100 m × 80 m. The experimental results demonstrate the superiority of the proposed method over state-of-the-art methods in both sound event classification and sound source localization tasks.
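The abstract mentions attention mechanisms that learn channel-wise relationships within acoustic features, but the record does not describe the paper's actual architecture. As a purely illustrative sketch, one common form of channel-wise attention is squeeze-and-excitation-style gating over multi-channel spectrogram features; all function names, weight shapes, and dimensions below are hypothetical, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style channel gating (illustrative only).

    features: (channels, time, freq) array of per-node spectrogram features.
    w1, w2:   weights of a small bottleneck MLP (hypothetical shapes).
    """
    squeeze = features.mean(axis=(1, 2))          # (channels,) global average pool
    excite = sigmoid(w2 @ np.tanh(w1 @ squeeze))  # (channels,) gate values in (0, 1)
    return features * excite[:, None, None]       # reweight each channel

# Toy usage: 4 microphone channels, 16 time frames, 32 frequency bins.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 32))
w1 = rng.standard_normal((2, 4))   # bottleneck of 2 units
w2 = rng.standard_normal((4, 2))
out = channel_attention(feats, w1, w2)
```

Because each gate lies in (0, 1), the output preserves the input's shape while scaling every channel's magnitude down by its learned importance; a temporal-attention variant would pool over channels and frequency instead and gate the time axis.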


