Enhancing Functional Interpretability in Gene Expression Analysis Through Biologically-Guided Feature Selection

Mongardi, Sofia; Cascianelli, Silvia; Masseroli, Marco
2025-01-01

Abstract

Reducing the high dimensionality of the original feature space with feature selection algorithms is crucial in gene-expression-based predictive tasks: it can improve performance and provide a better understanding of each feature’s predictive power and biological meaning. Embedded feature selection techniques such as LASSO select small subsets of relevant features based solely on their quantitative contribution and predictive power, which often leads to the selection of features with limited biological relevance. This work first provides a broad exploratory analysis of LASSO feature selection, assessing how different hyper-parameters affect which features are selected as most relevant and how biologically significant those features are. It then introduces a new approach that guides LASSO to select features by considering both their predictive power and their biological relevance. To this end, it proposes a novel Gene Information Score that estimates each gene’s biological relevance and shows how the score can be used to enhance feature selection.
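To make the idea of guiding LASSO with a per-gene relevance score concrete, the following is a minimal sketch and not the authors' method. The abstract does not say how the Gene Information Score is computed or how it enters the selection, so everything here is an assumption for illustration: the toy data, the gene names, the `hypothetical_scores` values, and the choice to turn a score into a per-feature penalty weight of 1/score. The sketch uses a plain coordinate-descent LASSO for regression, where `gene_a` and `gene_b` are exact copies of each other (an extreme case of co-expression) and only the weights can break the tie.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, penalty_weights=None, n_iter=100):
    """Coordinate descent for
    (1 / 2n) * ||y - X b||^2 + lam * sum_j w_j * |b_j|.
    With all w_j = 1 this is the ordinary LASSO objective."""
    n, p = len(X), len(X[0])
    w = penalty_weights if penalty_weights is not None else [1.0] * p
    beta = [0.0] * p
    residual = list(y)  # equals y - X b, and b starts at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_b = soft_threshold(rho, lam * w[j]) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def selected(beta, names, tol=1e-8):
    """Names of features whose coefficient is non-zero up to rounding noise."""
    return [g for g, b in zip(names, beta) if abs(b) > tol]


# Toy expression matrix (assumption): gene_a and gene_b are identical columns,
# gene_c is unrelated to the response.
genes = ["gene_a", "gene_b", "gene_c"]
pattern = [1.0, -1.0] * 4
other = [1.0, 1.0, -1.0, -1.0] * 2
X = [[a, a, c] for a, c in zip(pattern, other)]
y = [2.0 * a for a in pattern]

# Hypothetical relevance scores in (0, 1]; a higher score means a lower penalty.
hypothetical_scores = {"gene_a": 0.2, "gene_b": 0.9, "gene_c": 0.5}
weights = [1.0 / hypothetical_scores[g] for g in genes]

plain = weighted_lasso(X, y, lam=0.1)
guided = weighted_lasso(X, y, lam=0.1, penalty_weights=weights)
print("unweighted:", selected(plain, genes))
print("score-guided:", selected(guided, genes))

# Effect of the regularization strength on the unweighted selection.
for lam in (0.05, 0.5, 3.0):
    print(lam, selected(weighted_lasso(X, y, lam), genes))
```

On this toy data the unweighted fit keeps `gene_a`, simply because it is visited first, while the score-guided fit moves the whole coefficient to `gene_b`, the copy with the higher hypothetical score. Raising `lam` far enough empties the selection, which is the kind of hyper-parameter effect an exploratory analysis would track.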
Year: 2025
Published in: Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2023
ISBN: 9783031907135; 9783031907142
Keywords: biological interpretability; classification; feature selection; genomics; machine learning
Files in this item:
C39_CIBB_2023_LNBI_2025_293-307.pdf (Publisher’s version, Adobe PDF, 486.49 kB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1310284