Automatic TV genre classification based on visually-conditioned deep audio features
A. I. Mezza;A. Sarti
2023-01-01
Abstract
Reliable tools for automatic genre classification (AGC) are highly sought after by the television industry because they promise to cut the cost of manually annotating the ever-growing catalogs of media content providers. Metadata are vital for a variety of tasks, including data analytics, database navigation, and recommender systems. In recent years, however, only a few works have focused on TV genre classification, possibly because publicly available datasets of broadcast media are lacking. To bridge this gap and foster future research, we present ITTV, a manually annotated dataset of Italian TV programs gathered on YouTube. Building on this dataset, we propose a novel AGC method based on deep audio features that rely on the well-established "Look, Listen and Learn" paradigm. Evaluated on ITTV, the proposed method is shown to provide state-of-the-art results, outperforming recent audio-based AGC methods.

File | Description | Version | Access | Size | Format |
---|---|---|---|---|---|
20230612021425_821935_1253.pdf | Automatic TV Genre Classification Based on Visually-Conditioned Deep Audio Features | Post-print (draft or Author's Accepted Manuscript, AAM) | Restricted access | 583.36 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.