Exploring the Synthetic Speech Attribution Problem Through Data-Driven Detectors

Salvi D.; Bestagini P.; Tubaro S.
2022-01-01

Abstract

In recent years, numerous techniques for manipulating multimedia data and generating hyper-realistic synthetic content have been presented. Such inauthentic data are hazardous because they can cause serious harm when misused. This has led the forensic community to propose multiple approaches to both the detection and the attribution problems. Solving the detection problem consists in determining whether given data are genuine or fake. Solving the attribution problem consists in determining which specific technique was used to manipulate or generate the observed data. In this paper we address the attribution problem for synthetic speech. We consider a set of methods originally proposed for synthetic speech detection and adapt them to identify which speech generation algorithm was used to synthesize a speech track. Our goal is to probe the versatility of these systems and to assess how far apart the detection and attribution tasks are. We test the models in a closed-set scenario, where every test track comes from an algorithm seen during training, and compare their performance with that of a well-established baseline. Moreover, we propose different solutions to address the task in an open-set scenario, where test tracks may come from algorithms not seen during training. The encouraging results show that the considered methods can provide a representation of the input signal that is meaningful for both detection and attribution.
2022
2022 IEEE International Workshop on Information Forensics and Security, WIFS 2022
979-8-3503-0967-6
Keywords: Attribution; Audio; Deepfake; Forensics; Speech

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1233398