Data Preprocessing Techniques for Machine Learning Towards Improving Building Energy Performance: A Systematic Review
Mu W.; Cardelli R.; Ferrari S.
2026-01-01
Abstract
Enhancing building energy performance has become an essential goal, particularly as building energy management systems (BEMSs) increasingly rely on high-quality data and reliable predictive models. Although machine learning (ML) models have been widely applied to building energy prediction, optimisation, and management, their practical reliability is often constrained by data preprocessing rather than by algorithm selection. Existing studies tend to emphasise algorithmic development while offering limited systematic investigation of preprocessing practices, which leads to methodological misconceptions and reduces the robustness of ML-driven building energy management. As a novel contribution, this article presents a systematic review of 73 scientific articles, published from 2020 to 2025, on data preprocessing practices. To this end, the review is organised around a three-step data preprocessing workflow comprising data analysis, data preparation, and feature engineering. The strengths, limitations, and recurring misconceptions of the preprocessing techniques adopted in the analysed studies are synthesised, with emphasis on their impact on prediction accuracy, interpretability, and model robustness. As a result, this review reframes data preprocessing as a decision-making process in which data analysis and the targeted energy performance improvement task constrain and inform the subsequent data preparation and feature engineering steps.
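The abstract describes the three-step workflow only at a high level. Purely as an illustration, and not as the review's own method, the minimal Python sketch below shows one way the output of the data-analysis step can constrain the data-preparation step. It assumes an hourly load series stored as a list in which `None` marks a missing reading; the names `Series`, `analyse`, `interpolate`, `prepare`, and `engineer`, the 3-hour linear-interpolation threshold, and the 24-hour seasonal fill are all hypothetical choices made for this example.

```python
from bisect import bisect_left
from typing import Optional

# Hypothetical representation: one reading per hour, None marks a missing value.
Series = list[Optional[float]]


def analyse(series: Series) -> dict:
    """Step 1, data analysis: profile how much data is missing and how it is clustered."""
    missing = sum(v is None for v in series)
    longest_gap = run = 0
    for v in series:
        run = run + 1 if v is None else 0
        longest_gap = max(longest_gap, run)
    return {"missing_ratio": missing / len(series), "longest_gap": longest_gap}


def interpolate(values: Series) -> list[float]:
    """Linear interpolation for interior gaps; edge gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no observed values")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        pos = bisect_left(known, i)  # number of observed indices before i
        prev = known[pos - 1] if pos > 0 else None
        nxt = known[pos] if pos < len(known) else None
        if prev is None:
            out[i] = values[nxt]
        elif nxt is None:
            out[i] = values[prev]
        else:
            w = (i - prev) / (nxt - prev)
            out[i] = values[prev] + w * (values[nxt] - values[prev])
    return out


def prepare(series: Series, profile: dict, max_linear_gap: int = 3, period: int = 24) -> list[float]:
    """Step 2, data preparation: the imputation rule is chosen from the step-1 profile.

    max_linear_gap and period are illustrative assumptions, not values from the review.
    """
    values = list(series)
    if profile["longest_gap"] > max_linear_gap:
        # Long gaps: reuse the reading from one period earlier (assumes daily seasonality).
        for i in range(period, len(values)):
            if values[i] is None and values[i - period] is not None:
                values[i] = values[i - period]
    # Whatever is still missing (short gaps, or gaps in the first period) is interpolated.
    return interpolate(values)


def engineer(values: list[float], start_hour: int = 0, lag: int = 24) -> list[dict]:
    """Step 3, feature engineering: hour-of-day and lagged-load features per sample."""
    return [
        {"hour": (start_hour + i) % 24, f"lag_{lag}": values[i - lag], "target": values[i]}
        for i in range(lag, len(values))
    ]


if __name__ == "__main__":
    # Synthetic 48-hour load profile (kW) with a 2-hour gap and a 5-hour gap.
    load: Series = [10.0 + h % 24 for h in range(48)]
    for h in (5, 6, 30, 31, 32, 33, 34):
        load[h] = None
    profile = analyse(load)
    print(profile)  # longest_gap is 5, so the seasonal fill is triggered
    rows = engineer(prepare(load, profile))
    print(len(rows), rows[0])
```

In this sketch the seasonal fill is applied only when the profile reports gaps longer than the linear threshold, on the assumption that long gaps in building load can span part of a daily cycle that straight-line interpolation would flatten. In a real study, thresholds and fill rules would need to be justified from the data and fitted on training data only.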
File: energies-19-01561.pdf (open access, 4.28 MB, Adobe PDF)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


