Methodological Issues in Recommender Systems Research (Extended Abstract)

Ferrari Dacrema, Maurizio; Cremonesi, Paolo; Jannach, Dietmar

doi:10.24963/ijcai.2020/650

The development of continuously improved machine learning algorithms for personalized item ranking lies at the core of today's research in the area of recommender systems. Over the years, the research community has developed widely agreed best practices for comparing algorithms and demonstrating progress with offline experiments. Unfortunately, we find that this accepted research practice can easily lead to phantom progress for the following reasons: limited reproducibility, comparison with complex but weak and non-optimized baseline algorithms, and over-generalization from a small set of experimental configurations. To assess the extent of such problems, we analyzed 18 research papers published recently at top-ranked conferences. Only 7 were reproducible with reasonable effort, and 6 of these 7 could often be outperformed by relatively simple heuristic methods, e.g., nearest neighbors. In this paper, we discuss these observations in detail and reflect on the related fundamental problem of over-reliance on offline experiments in recommender systems research.
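
To give a sense of how simple such a nearest-neighbor heuristic can be, the following is a minimal sketch of an item-based nearest-neighbor recommender on implicit feedback. It is not the authors' implementation: the function names `item_knn_scores` and `recommend`, the choice of cosine similarity, and the neighborhood size `k` are assumptions made for illustration only.

```python
import numpy as np

def item_knn_scores(interactions, k=2):
    """Score all items for all users with item-based cosine nearest neighbors.

    interactions: users x items array, 1 where the user interacted with the item.
    k: number of most similar items kept per item (an illustrative default).
    """
    X = np.asarray(interactions, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    # Cosine similarity between item columns; the small constant avoids 0/0.
    sim = (X.T @ X) / (np.outer(norms, norms) + 1e-9)
    np.fill_diagonal(sim, 0.0)  # an item is not its own neighbor
    if k < sim.shape[0]:
        # In each column, zero out everything except the k largest similarities.
        drop = np.argsort(sim, axis=0)[:-k, :]
        np.put_along_axis(sim, drop, 0.0, axis=0)
    scores = X @ sim  # sum of similarities to the items the user already has
    scores[X > 0] = -np.inf  # never recommend items that were already seen
    return scores

def recommend(interactions, user, n=1, k=2):
    """Return the indices of the top-n unseen items for one user."""
    scores = item_knn_scores(interactions, k=k)
    return np.argsort(-scores[user])[:n].tolist()

if __name__ == "__main__":
    # Toy data: 4 users, 4 items (hypothetical, for demonstration only).
    toy = [[1, 1, 0, 0],
           [1, 1, 1, 0],
           [0, 0, 1, 1],
           [1, 0, 0, 0]]
    print(recommend(toy, user=3, n=1))
```

A baseline of this kind has very few hyperparameters (here only `k`), which is part of why tuning it properly before a comparison costs little.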