RE.PUBLIC@POLIMI Research publications of the Politecnico di Milano

Speech/music discrimination based on discrete wavelet transform

Ntalampiras, Stavros; Fakotakis, Nikos

2008-01-01

Abstract

This paper presents an approach to speech/music discrimination. The architecture is designed to improve the performance of a speech recognition system by excluding non-speech information from processing. Multiresolution analysis is applied to the input signal, and the most significant statistical features are computed over a predefined texture size. These features are then modeled using Gaussian mixture models (GMM), a state-of-the-art technique for probability density function estimation. Classification follows a conventional maximum likelihood decision scheme. Although the system relies solely on wavelet signal processing, it achieved a recognition rate of 91.8%. © 2008 Springer-Verlag Berlin Heidelberg.
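The pipeline described in the abstract (wavelet decomposition, statistical features per texture window, GMM scoring, maximum likelihood decision) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the Haar wavelet, the three decomposition levels, the choice of mean absolute value and standard deviation as features, the diagonal covariances, and all function names. The GMM parameters are assumed to have been estimated beforehand (for example with EM); training is not shown.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail

def wavelet_features(window, levels=3):
    """Mean absolute value and standard deviation of every subband
    obtained from one texture window (hypothetical feature choice)."""
    approx, bands = list(window), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)  # coarsest approximation band goes last
    feats = []
    for band in bands:
        mean = sum(band) / len(band)
        mean_abs = sum(abs(c) for c in band) / len(band)
        std = math.sqrt(sum((c - mean) ** 2 for c in band) / len(band))
        feats += [mean_abs, std]
    return feats

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.
    gmm is a list of (weight, means, variances) tuples, assumed pre-trained."""
    logs = []
    for weight, means, variances in gmm:
        lp = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            lp += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(lp)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def classify(x, models):
    """Maximum likelihood decision: pick the class whose GMM scores x highest."""
    return max(models, key=lambda label: gmm_log_likelihood(x, models[label]))
```

In use, each texture window of the input would be passed through `wavelet_features`, and the resulting vector through `classify` with one GMM per class, for example `classify(feats, {"speech": speech_gmm, "music": music_gmm})`, where both models are placeholders for parameters fitted on labeled data.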

Year of publication: 2008

Book title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ISBN (International Standard Book Number): 3540878807

Keywords: Computer audition; Content-based audio classification; Discrete wavelet transform; Gaussian mixture model; Computer Science (all); Theoretical Computer Science

Appears in publication types: 02.1 Contribution in Volume

Files in this record:

File	Size	Format
03 SETN08.pdf (restricted access; publisher's version)	220.82 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1004326
