Diffusion Models for Recommendation: Reproducibility and Conceptual Mismatch

Benigni M.; Ferrari Dacrema M.
2025-01-01

Abstract

Recent studies have applied Denoising Diffusion Probabilistic Models (DDPMs) to recommender systems, reporting notable improvements. However, several reproducibility studies have shown that claims asserting the superiority of new methods are frequently not substantiated by rigorous evidence, as they often rely on non-reproducible experimental protocols, weak or untuned baselines, and questionable evaluation practices. This extended abstract presents key findings from the manuscript “Diffusion Recommender Models and the Illusion of Progress: A Concerning Study of Reproducibility and a Conceptual Mismatch”, which investigates whether the reported advancements of diffusion-based models in recommendation are supported by rigorous and reproducible experimental evaluation. The study re-executes the experiments of four DDPM-based models presented at SIGIR 2023 and 2024, revealing substantial methodological issues and limited reproducibility. In addition, it highlights a conceptual mismatch between the generative nature of DDPMs and the deterministic requirements of offline evaluation, underscoring the need for a reconsideration of evaluation procedures for generative models.
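
The mismatch between stochastic generation and deterministic offline evaluation can be illustrated with a toy sketch. It is not taken from the manuscript and does not implement a DDPM: the function `stochastic_scores` is a hypothetical stand-in that adds Gaussian noise to fixed item scores, and the scores, relevant items, and noise level are invented for illustration. The only point is that a model whose output depends on sampling can yield a different Recall@k for the same user on every run, so a single-run metric is not well defined unless seeds are fixed or results are aggregated.

```python
import random
import statistics

def top_k(scores, k):
    """Return the indices of the k highest-scoring items."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def recall_at_k(ranked, relevant):
    """Fraction of relevant items that appear in the ranked list."""
    return len(set(ranked) & relevant) / len(relevant)

def stochastic_scores(base_scores, noise_std, rng):
    """Hypothetical stand-in for a generative recommender (assumption):
    each call perturbs the same base scores with fresh Gaussian noise."""
    return [s + rng.gauss(0.0, noise_std) for s in base_scores]

def evaluate_over_seeds(base_scores, relevant, k, noise_std, seeds):
    """Recall@k of the same toy model on the same user, once per seed."""
    results = []
    for seed in seeds:
        rng = random.Random(seed)  # a fixed seed makes each run repeatable
        ranked = top_k(stochastic_scores(base_scores, noise_std, rng), k)
        results.append(recall_at_k(ranked, relevant))
    return results

# Toy data (invented): eight items, two of which are relevant to the user.
base = [0.90, 0.85, 0.80, 0.78, 0.75, 0.40, 0.30, 0.20]
relevant = {1, 3}

recalls = evaluate_over_seeds(base, relevant, k=3, noise_std=0.1, seeds=range(20))
print(f"mean={statistics.mean(recalls):.3f} std={statistics.pstdev(recalls):.3f}")
```

Reporting the mean and spread over several seeds, rather than one run, is one possible way to make such results comparable; whether this matches the procedure recommended in the manuscript is not stated in the abstract.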
2025
CEUR Workshop Proceedings: 15th Italian Information Retrieval Workshop, IIR 2025
Diffusion Models
Evaluation
Recommender Systems
Reproducibility
Files in this record:
diffusion-models-for-recommendation-reproducibility-and-conceptual-mismatch.pdf (open access, publisher’s version, Adobe PDF, 207.03 kB)

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1307673