ConvLSTM-based Sound Source Localization in a manufacturing workplace
Jalayer R.; Jalayer M.; Mor A.; Orsenigo C.; Vercellis C.
2024-01-01
Abstract
In this paper, Sound Source Localization (SSL) is explored as an approach to localize both human operators and machines emitting sound signals in a manufacturing workplace. In particular, a comprehensive analysis is presented of the source localization ability of a state-of-the-art deep learning architecture in environments of increasing complexity. Scenarios including single, dual, and multiple sound sources, in the form of both humans and Computerized Numerical Control (CNC) machines, are investigated, as well as configurations with a mix of stationary and moving sources. Our work contributes to the extant literature by extending previous research findings, which are primarily devoted to single stationary sources. Furthermore, by focusing on the simultaneous and centralized detection of sources of different nature and type, it diverges from traditional SSL studies in manufacturing, which emphasize the localization of humans by robots in human–robot interaction, and presents a localization approach that enables broader control over the workspace. For the localization task, a Convolutional LSTM architecture able to capture both spatial and temporal sound characteristics is also proposed, with each source assigned a dedicated model. Extensive experiments were carried out for each scenario in a simulated environment, where different levels of noise were also applied. The results showed the remarkable accuracy and robustness of the deep learning models in localizing single and dual stationary sources, as well as single moving sources. For multiple stationary and moving sources, a general decline in detection performance was observed, alongside a heightened sensitivity to noise.