Data Preprocessing Techniques for Machine Learning Towards Improving Building Energy Performance: A Systematic Review

Mu W.;Cardelli R.;Ferrari S.
2026-01-01

Abstract

Enhancing building energy performance has become an essential goal, particularly as building energy management systems (BEMSs) increasingly rely on high-quality data and reliable predictive models. Although machine learning (ML) models have been widely applied to building energy prediction, optimisation, and management, their reliability in practice is often constrained by data preprocessing rather than by algorithm selection. Existing studies tend to emphasise algorithmic development and investigate preprocessing practices only to a limited extent, which leads to methodological misconceptions and reduces the robustness of ML-driven building energy management. As a novel contribution, this article presents a systematic review of 73 scientific articles on preprocessing practices published from 2020 to 2025. To this end, the review organises data preprocessing into a three-step workflow comprising data analysis, data preparation, and feature engineering. It synthesises the strengths, limitations, and recurring misconceptions of the preprocessing techniques adopted in the analysed studies, with emphasis on their impact on prediction accuracy, interpretability, and model robustness. As a result, the review reframes data preprocessing as a decision-making process in which data analysis and the energy improvement task constrain and inform the subsequent data preparation and feature engineering steps.
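The three-step workflow named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the reviewed article: the function names (`analyse`, `prepare`, `engineer`), the median imputation rule, the single lag feature, and the meter readings are all hypothetical. The sketch only shows how a summary produced in the analysis step might inform the choices made in the two later steps.

```python
from statistics import median


def analyse(readings):
    """Step 1, data analysis: count gaps and summarise the observed values."""
    present = [v for v in readings if v is not None]
    return {
        "n": len(readings),
        "missing": len(readings) - len(present),
        "median": median(present),
    }


def prepare(readings, summary):
    """Step 2, data preparation: fill gaps using the median found in step 1."""
    return [summary["median"] if v is None else v for v in readings]


def engineer(values, lag=1):
    """Step 3, feature engineering: pair each value with its lagged predecessor."""
    return [(values[i - lag], values[i]) for i in range(lag, len(values))]


# Hypothetical hourly electricity readings in kWh; None marks a missing value.
hourly_kwh = [12.0, 13.5, None, 14.0, 15.5]

summary = analyse(hourly_kwh)          # analysis result drives the next step
clean = prepare(hourly_kwh, summary)   # imputation uses the analysed median
features = engineer(clean, lag=1)      # (previous hour, current hour) pairs
```

In a real study the imputation rule and the features would depend on the data characteristics and on the energy improvement task, which is the decision-making framing the abstract describes.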
building energy management systems
building energy performance
data preprocessing
machine learning
systematic review
Files in this record:
energies-19-01561.pdf (open access, Adobe PDF, 4.28 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1312639