
Forward and Backward Feature Selection Guided by Prior Biological Knowledge for Enhanced Interpretability

Mongardi, Sofia; Cascianelli, Silvia; Masseroli, Marco
2025-01-01

Abstract

In gene expression analysis, the high dimensionality of the data combined with limited sample sizes often leads to unstable, overfitted predictive models. Feature selection algorithms are commonly used to identify the most predictive genes. However, traditional approaches focus solely on each gene's quantitative contribution to predictive performance, which can limit the discovery of deeper biological insights. To address this, we propose a novel wrapper-based approach that integrates prior biological knowledge into the feature selection process. Our approach extends standard forward feature selection by iteratively adding the most promising gene, provided that it also contributes biological value. This value is computed from prior knowledge derived from publicly available data sources. We apply the same concept to backward selection, iteratively removing the features that contribute least to predictive performance and provide limited additional biological information.
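The following minimal Python sketch makes the two selection loops concrete. It is not the authors' implementation. The abstract does not say how biological value is computed, so this sketch assumes the following:

- Prior knowledge is a hypothetical mapping from each gene to a set of annotation terms (for example, pathways).
- A gene's biological value is the number of terms it adds beyond those already covered by the other selected genes.
- `score` stands in for any predictive criterion, such as cross-validated accuracy.

The names `biological_gain`, `forward_select`, `backward_select`, `min_gain`, and `max_gain` are illustrative only.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

# Hypothetical prior knowledge: gene -> set of annotation terms.
Annotations = Dict[str, Set[str]]
# Hypothetical predictive criterion: feature subset -> score (higher is better).
Scorer = Callable[[FrozenSet[str]], float]


def biological_gain(gene: str, others: Set[str], ann: Annotations) -> int:
    """Count the annotation terms of `gene` not already covered by `others`."""
    covered: Set[str] = set().union(*(ann.get(g, set()) for g in others))
    return len(ann.get(gene, set()) - covered)


def forward_select(genes: Sequence[str], score: Scorer, ann: Annotations,
                   k: int, min_gain: int = 1) -> List[str]:
    """Greedily add the best-scoring gene that also adds >= min_gain new terms."""
    selected: List[str] = []
    remaining = list(genes)
    while remaining and len(selected) < k:
        # Rank candidates by the predictive score of the enlarged subset.
        ranked = sorted(remaining,
                        key=lambda g: score(frozenset(selected + [g])),
                        reverse=True)
        # Take the highest-ranked candidate that passes the knowledge check.
        chosen = next((g for g in ranked
                       if biological_gain(g, set(selected), ann) >= min_gain),
                      None)
        if chosen is None:  # no candidate adds enough biological information
            break
        selected.append(chosen)
        remaining.remove(chosen)
    return selected


def backward_select(genes: Sequence[str], score: Scorer, ann: Annotations,
                    k: int, max_gain: int = 0) -> List[str]:
    """Greedily drop the gene whose removal costs least and adds <= max_gain terms."""
    selected = list(genes)
    while len(selected) > k:
        # Highest score after removal = smallest loss in predictive performance.
        ranked = sorted(selected,
                        key=lambda g: score(frozenset(s for s in selected if s != g)),
                        reverse=True)
        # Remove only a gene whose unique biological contribution is limited.
        victim = next((g for g in ranked
                       if biological_gain(g, set(selected) - {g}, ann) <= max_gain),
                      None)
        if victim is None:  # every remaining gene carries unique knowledge
            break
        selected.remove(victim)
    return selected


if __name__ == "__main__":
    # Toy data: B duplicates A's annotation; the score is a sum of fixed weights.
    weights = {"A": 0.5, "B": 0.45, "C": 0.3, "D": 0.1}
    ann = {"A": {"p1"}, "B": {"p1"}, "C": {"p2"}, "D": {"p3"}}

    def toy_score(subset: FrozenSet[str]) -> float:
        return sum(weights[g] for g in subset)

    print(forward_select(list(weights), toy_score, ann, k=2))   # ['A', 'C']
    print(backward_select(list(weights), toy_score, ann, k=2))  # ['A', 'C', 'D']
```

On this toy data, plain greedy forward selection would pick A and B. The knowledge-aware variant skips B because B only repeats A's annotation. Backward selection removes B and then stops short of `k`, since every remaining gene contributes a unique term. How to trade predictive loss against biological gain when no candidate passes the check is a design choice this sketch leaves open.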
Year: 2025
Published in: Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2024
ISBN: 9783031897030; 9783031897047
Keywords: biological interpretability; classification; feature selection; genomics; machine learning
Files in this record: C50_CIBB_2024_LNBI_2025_233-247.pdf (publisher's version, restricted access, Adobe PDF, 638.22 kB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1310282