Parametric virtual microphone techniques for sound field reconstruction with early reflection modeling
Greco, Gioele; Pezzoli, Mirco; Antonacci, Fabio; Sarti, Augusto
2025-01-01
Abstract
Recent advances in immersive media and virtual environments have highlighted the crucial role of spatial audio in enhancing perceptual realism, enabling true six-degrees-of-freedom experiences in applications such as virtual reality, augmented reality, and advanced teleconferencing. Nonetheless, accurately reconstructing the direct sound, the diffuse field, and the early reflections, which are crucial for spatial depth and realism, remains challenging, especially with a limited number of measurements. To address this challenge, we introduce Parametric modeling of Direct sound, Early reflections, and Reverberation (ParaDER), a unified parametric framework that explicitly separates and reconstructs direct sound, early reflections, and diffuse reverberation from a minimal set of measurements. In the first stage, sound sources are localized by solving a sparse, regularized optimization problem that yields low-order spherical-harmonic coefficients and thus captures the direct component of the field. The second stage follows image-source theory: the estimated room impulse response is segmented into a small number of early reflections, each of which is modeled as an image source whose position and amplitude are fitted to the segmented data. This explicit treatment preserves the temporal and spatial characteristics of early reflections, which are critical for accurate depth perception and localization cues. In the final stage, the estimated direct and early-reflection components are analytically propagated to virtual microphone positions, and the remaining energy is synthesized as diffuse reverberation under an isotropic assumption. Because the entire pipeline is low-order, comprising only a few source and image-source parameters, ParaDER can reconstruct a spatial sound field with few physical microphones, which reduces memory and computation relative to both other parametric methods and non-parametric approaches.
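The propagation step described above can be illustrated with a minimal sketch. It assumes free-field point-source propagation with 1/(4πr) spherical spreading and nearest-sample delays, which is a textbook simplification and not necessarily the formulation used in the paper. The function name, the source positions, the reflection amplitude of 0.8, and the sampling rate are all hypothetical, and the diffuse reverberation tail is deliberately left out.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def propagate_to_virtual_mic(sources, mic_pos, fs, n_samples):
    """Render point sources (direct + image sources) at a virtual microphone.

    sources: list of (position, amplitude), position being (x, y, z) in metres.
    Returns n_samples floats: the direct-plus-early part of an impulse response.
    This is an illustrative helper, not the authors' implementation.
    """
    rir = [0.0] * n_samples
    for pos, amp in sources:
        dist = math.dist(pos, mic_pos)
        delay = int(round(dist / SPEED_OF_SOUND * fs))  # nearest-sample delay
        if delay < n_samples:
            # Free-field spherical spreading; clamp r to avoid division by zero.
            rir[delay] += amp / (4.0 * math.pi * max(dist, 1e-3))
    return rir


# Hypothetical scene: a source 2 m from the microphone, plus its first-order
# image obtained by mirroring across a wall at x = 5 m (image at x = 8 m).
direct = ((2.0, 0.0, 0.0), 1.0)
image = ((8.0, 0.0, 0.0), 0.8)  # 0.8 is an assumed reflection amplitude
virtual_mic = (0.0, 0.0, 0.0)

rir = propagate_to_virtual_mic([direct, image], virtual_mic, fs=16000, n_samples=512)
nonzero_taps = [n for n, value in enumerate(rir) if value != 0.0]
```

Because each component is described by a position and an amplitude only, moving the virtual microphone amounts to recomputing distances, which is what makes a low-order parametric description cheap to evaluate at arbitrary positions.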
Extensive evaluations in 100 simulated shoebox rooms confirm that ParaDER markedly improves reconstruction accuracy. Compared with a state-of-the-art parametric model, the normalized mean-squared error of the acoustic metrics shows that estimating the early reflections improves accuracy. In addition, subjective listening tests on a real conference-room dataset show that our method yields higher mean MUSHRA scores for both speech and music. Listeners consistently report clearer spatial cues, more precise localization, and a more natural timbre, demonstrating that explicit modeling of early reflections confers perceptually significant benefits.

| File | Size | Format | |
|---|---|---|---|
| s13636-025-00437-y_reference.pdf (restricted access; Pre-Print or Pre-Refereeing) | 2.86 MB | Adobe PDF | |
| s13636-025-00437-y.pdf (open access; Article; Publisher's version) | 4.01 MB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


