FakeMusicCaps: A Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models

Comanducci L.; Bestagini P.; Tubaro S.
2025-01-01

Abstract

Text-to-music (TTM) models have recently revolutionized automatic music generation research, both by generating music that sounds more plausible than that of previous state-of-the-art models and by lowering the technical proficiency needed to use them. For these reasons, they have quickly begun to be adopted for commercial use and in music production practice. This widespread diffusion of TTMs raises several concerns regarding copyright violation and rightful attribution, which call for serious consideration by the audio forensics community. In this paper, we tackle the problem of detecting and attributing TTM-generated data. We propose a dataset, FakeMusicCaps, that contains several versions of the music-caption pairs dataset MusicCaps, regenerated via several state-of-the-art TTM techniques. We evaluate the proposed dataset through initial experiments on the detection and attribution of TTM-generated audio, considering both closed-set and open-set classification.
2025
audio forensics
DeepFake
music generation
text-to-music


Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1298826