Automatic TV genre classification based on visually-conditioned deep audio features

A. I. Mezza; A. Sarti
2023-01-01

Abstract

Reliable tools for automatic genre classification (AGC) are highly sought after by the television industry, as they promise to cut the cost of manually annotating the ever-growing catalogs of media content providers. Metadata are indeed vital for a variety of tasks, including data analytics, database navigation, and recommender systems. In recent years, however, only a few works have focused on TV genre classification, possibly due to the lack of publicly available datasets of broadcast media. To bridge this gap and foster future research, we present ITTV, a manually annotated dataset of Italian TV programs gathered from YouTube. We then propose a novel AGC method based on deep audio features that build on the well-established "Look, Listen and Learn" (L3) paradigm. Evaluated on ITTV, the proposed method achieves state-of-the-art results, outperforming recent audio-based AGC methods.
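For readers unfamiliar with L3: the "Look, Listen and Learn" paradigm trains an audio subnetwork and a visual subnetwork jointly on an audio-visual correspondence (AVC) task, i.e., deciding whether a video frame and a short audio clip come from the same video. The following is only a minimal sketch of how balanced positive/negative AVC pairs could be sampled; it is not the authors' pipeline, and the data layout, the function name `sample_avc_pairs`, and the string IDs standing in for frames and clips are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical data layout: each video ID maps to a time-aligned list of
# (frame, audio_clip) pairs; frames and clips are plain string IDs here.
Video = List[Tuple[str, str]]


def sample_avc_pairs(
    videos: Dict[str, Video], n_pairs: int, seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Sample audio-visual correspondence (AVC) pairs, alternating labels.

    Label 1: frame and audio clip taken from the same video at the same time.
    Label 0: frame from one video, audio clip from a different video.
    """
    if len(videos) < 2:
        raise ValueError("need at least two videos to form negative pairs")
    if any(not clips for clips in videos.values()):
        raise ValueError("every video must contain at least one (frame, clip) pair")

    rng = random.Random(seed)  # local RNG for reproducible sampling
    ids = sorted(videos)
    pairs: List[Tuple[str, str, int]] = []
    for i in range(n_pairs):
        vid = rng.choice(ids)
        frame, audio = rng.choice(videos[vid])
        if i % 2 == 0:
            # Positive pair: keep the time-aligned audio clip.
            pairs.append((frame, audio, 1))
        else:
            # Negative pair: swap in an audio clip drawn from another video.
            other = rng.choice([v for v in ids if v != vid])
            _, wrong_audio = rng.choice(videos[other])
            pairs.append((frame, wrong_audio, 0))
    return pairs


if __name__ == "__main__":
    demo = {
        "news": [("news_f0", "news_a0"), ("news_f1", "news_a1")],
        "sport": [("sport_f0", "sport_a0"), ("sport_f1", "sport_a1")],
    }
    for frame, clip, label in sample_avc_pairs(demo, 4):
        print(frame, clip, label)
```

Once a network is trained on such pairs, activations of its audio subnetwork can serve as clip-level embeddings for a downstream genre classifier; the exact feature extraction and aggregation used for AGC are those described in the paper.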
Year: 2023
Published in: 2023 31st European Signal Processing Conference (EUSIPCO)
ISBN: 9789464593600
Files in this item:
File: 20230612021425_821935_1253.pdf
Access: Restricted
Description: Automatic TV Genre Classification Based on Visually-Conditioned Deep Audio Features
Type: Post-print (draft or Author's Accepted Manuscript, AAM)
Size: 583.36 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1261010