RE.PUBLIC@POLIMI Research Publications of the Politecnico di Milano

Parametric virtual microphone techniques for sound field reconstruction with early reflection modeling

Greco, Gioele;Pezzoli, Mirco;Antonacci, Fabio;Sarti, Augusto

2025-01-01

Abstract

Recent advances in immersive media and virtual environments have highlighted the crucial role of spatial audio in enhancing perceptual realism and enabling true six-degrees-of-freedom experiences in applications such as virtual reality, augmented reality, and advanced teleconferencing. Nonetheless, accurately reconstructing the direct sound, the early reflections, and the diffuse field, all of which are crucial for spatial depth and realism, remains challenging, especially with a limited number of measurements. To address this challenge, we introduce Parametric modeling of Direct sound, Early reflections, and Reverberation (ParaDER), a unified parametric framework that explicitly separates and reconstructs direct sound, early reflections, and diffuse reverberation from a minimal set of measurements. In the first stage, sound sources are localized by solving a sparse, regularized optimization problem that yields low-order spherical-harmonic coefficients and thus captures the direct component of the field. The second stage follows image-source theory: the estimated room impulse response is segmented into a small number of early reflections, each of which is modeled as an image source whose position and amplitude are fitted to the segmented data. This explicit treatment preserves the temporal and spatial characteristics of early reflections, which are critical for accurate depth perception and localization cues. In the final stage, the estimated direct and early-reflection components are analytically propagated to virtual microphone positions, and the remaining energy is synthesized as diffuse reverberation under an isotropic assumption. Because the entire pipeline is low-order, comprising only a few source and image-source parameters, ParaDER can reconstruct a spatial sound field with few physical microphones, reducing memory and computation relative to both other parametric methods and non-parametric approaches.
Extensive evaluations in 100 simulated shoebox rooms confirm that ParaDER markedly improves reconstruction accuracy. Compared with a state-of-the-art parametric model, the normalized mean-squared error of the acoustic metrics shows that explicitly estimating the early reflections improves accuracy. In addition, subjective listening tests on a real conference-room dataset show that our method yields higher mean MUSHRA scores for both speech and music. Listeners consistently report clearer spatial cues, more precise localization, and a more natural timbre, demonstrating that explicit modeling of early reflections confers perceptually significant benefits.
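
As an illustration of the image-source and propagation steps described in the abstract, the following minimal Python sketch mirrors a source across the walls of a shoebox room and propagates the direct and first-order image sources to a virtual microphone position. It is not the ParaDER implementation: the function names, the axis-aligned room with one corner at the origin, the free-field 1/r point-source model, the single broadband reflection coefficient of 0.8, and the nearest-sample delay rounding are all assumptions made here for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_image_sources(source, room_dims):
    """Mirror `source` across the six walls of an axis-aligned shoebox room.

    Assumption: the room spans [0, Lx] x [0, Ly] x [0, Lz].
    Returns an array of shape (6, 3), two images per axis.
    """
    source = np.asarray(source, dtype=float)
    images = []
    for axis, length in enumerate(room_dims):
        for wall in (0.0, float(length)):
            image = source.copy()
            image[axis] = 2.0 * wall - source[axis]  # reflection about the wall plane
            images.append(image)
    return np.array(images)


def propagate_to_virtual_mic(positions, amplitudes, mic, fs, n_samples):
    """Build a sparse impulse response at `mic` from point sources.

    Each source contributes one impulse with gain amplitude / distance,
    delayed by distance / c and rounded to the nearest sample.
    The microphone must not coincide with a source (distance > 0).
    """
    positions = np.atleast_2d(np.asarray(positions, dtype=float))
    amplitudes = np.asarray(amplitudes, dtype=float)
    distances = np.linalg.norm(positions - np.asarray(mic, dtype=float), axis=1)
    delays = np.rint(distances / SPEED_OF_SOUND * fs).astype(int)
    gains = amplitudes / distances
    rir = np.zeros(n_samples)
    keep = delays < n_samples  # drop arrivals beyond the response length
    np.add.at(rir, delays[keep], gains[keep])  # sum arrivals that share a sample
    return rir


# Hypothetical example values: a 4 m x 3 m x 2.5 m room.
source = [1.0, 1.0, 1.0]
room = [4.0, 3.0, 2.5]
images = first_order_image_sources(source, room)
positions = np.vstack([source, images])
amplitudes = np.concatenate([[1.0], np.full(6, 0.8)])  # assumed reflection coefficient
rir = propagate_to_virtual_mic(positions, amplitudes, [2.0, 1.0, 1.0],
                               fs=16000, n_samples=512)
```

In the framework described above, the image-source positions and amplitudes are fitted to segmented measurement data rather than derived from known room geometry; the sketch only shows why a handful of such parameters is enough to evaluate the early part of the response at any virtual position.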

Year of publication: 2025

Journal title: EURASIP Journal on Audio, Speech and Music Processing

Appears in the following types: 01.1 Journal Article

Files in this item:

File	Access	Description	Size	Format
s13636-025-00437-y_reference.pdf	Restricted access	Pre-print (or pre-refereeing)	2.86 MB	Adobe PDF
s13636-025-00437-y.pdf	Open access	Article: Publisher’s version	4.01 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1301661
