Source Localization Using Distributed Microphones in Reverberant Environments Based on Deep Learning and Ray Space Transform
Comanducci L.; Borra F.; Bestagini P.; Antonacci F.; Tubaro S.; Sarti A.
2020-01-01
Abstract
In this article we present a methodology for source localization in reverberant environments from Generalized Cross Correlations (GCCs) computed between spatially distributed individual microphones. Reverberation tends to negatively affect localization based on Time Differences of Arrival (TDOAs), which become inaccurate due to the presence of spurious peaks in the GCC. We therefore adopt a data-driven approach based on a convolutional neural network, which, using the GCCs as input, estimates the source location in two steps. It first computes the Ray Space Transform (RST) from multiple arrays. The RST is a convenient representation of the acoustic rays impinging on the array in a parametric space, called Ray Space. Rays produced by a source appear in the RST as patterns whose position is uniquely related to the source location. The second step estimates the source location through a nonlinear fitting, which finds the coordinates that best approximate the RST pattern obtained in the first step. It is worth noting that training can be accomplished on simulated data only, thus relaxing the need to actually deploy microphone arrays in the acoustic scene. The localization accuracy of the proposed technique is similar to that of SRP-PHAT; however, our method demonstrates increased robustness to different distributed microphone configurations. Moreover, the use of the RST as an intermediate representation makes it possible for the network to generalize to data unseen during training.
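The GCCs that serve as the network input are typically computed with the PHAT weighting, which whitens the cross-spectrum so that only phase (i.e., delay) information remains. The sketch below is a minimal, generic GCC-PHAT implementation for illustration only, not the authors' code; the function name, the regularization constant, and the toy signals are all assumptions.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the TDOA between signals x and y via GCC-PHAT.

    Returns the delay of x relative to y in seconds, plus the
    cross-correlation sequence centered on zero lag.
    """
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12          # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so the zero-lag bin sits at index max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / float(fs)
    return tau, cc

# Toy check: x is y delayed by 5 samples, so the TDOA is 5/fs seconds.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delay = 5
x = np.concatenate((np.zeros(delay), s))
y = np.concatenate((s, np.zeros(delay)))
tau, _ = gcc_phat(x, y, fs)
```

In a distributed setup, one such GCC is computed per microphone pair and the stacked correlations form the network input; in reverberant rooms, reflections add spurious peaks to `cc`, which is precisely the ambiguity the learned RST-based mapping is meant to resolve.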