A DATA-DRIVEN APPROACH FOR ACOUSTIC PARAMETER SIMILARITY ESTIMATION OF SPEECH RECORDING
Borrelli C.;Bestagini P.;Antonacci F.;Sarti A.;Tubaro S.
2022-01-01
Abstract
Speech recordings exhibit different quality and reverberation properties depending on the recording setup and environment. For this reason, speech analysis systems that work correctly on certain recordings may fail on others acquired in different acoustic contexts. Being able to tell whether a track under analysis shares the acoustic characteristics of a reference track can therefore help determine whether it can be successfully processed by a given speech analysis system. Alternatively, in a forensic scenario, an estimate of acoustic parameter similarity between two tracks can be used to verify whether the recordings were likely acquired in the same environment. In this work, we propose two methods to estimate acoustic parameter similarity between a speech recording under analysis and a reference one. The first method relies on the estimation of channel-based acoustic indicators that are then compared to extract a similarity measure. The second method directly learns a parameter similarity measure through siamese neural networks.
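As a rough illustration of the first method's comparison step, the sketch below turns two sets of already-estimated acoustic indicators into a single similarity score. It is not the authors' implementation. The indicator names (`t60` for reverberation time in seconds, `c50` for clarity in dB), the per-indicator scales, and the exponential mapping from distance to similarity are all assumptions chosen for the example, and the estimation of the indicators from audio is not shown.

```python
import math


def parameter_similarity(params_a, params_b, scales):
    """Return a similarity in (0, 1] between two sets of acoustic indicators.

    params_a, params_b: dicts mapping indicator name -> estimated value.
    scales: dict mapping indicator name -> positive normalisation constant
            (hypothetical; makes indicators with different units comparable).
    """
    shared = sorted(params_a.keys() & params_b.keys())
    if not shared:
        raise ValueError("the two recordings share no indicators")
    # Scaled Euclidean distance over the shared indicators.
    squared = sum(
        ((params_a[name] - params_b[name]) / scales[name]) ** 2
        for name in shared
    )
    distance = math.sqrt(squared)
    # Map distance to similarity: 1.0 for identical values, approaching 0
    # as the indicators diverge. This mapping is an assumption.
    return math.exp(-distance)


# Hypothetical indicator estimates, for illustration only.
reference = {"t60": 0.40, "c50": 12.0}
same_room = {"t60": 0.45, "c50": 11.0}
other_room = {"t60": 1.20, "c50": 3.0}
scales = {"t60": 0.5, "c50": 10.0}

sim_same = parameter_similarity(reference, same_room, scales)
sim_other = parameter_similarity(reference, other_room, scales)
```

With these made-up values, the recording with indicators close to the reference scores higher than the one with a much longer reverberation time. A forensic or pre-screening decision would then compare the score against a threshold chosen on validation data.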