
Biologically-driven feature selection for improved functional interpretability of gene expression data analysis

S. Mongardi;S. Cascianelli;M. Masseroli
2023-01-01

Abstract

Feature selection techniques are widely used in prediction tasks based on gene expression data to reduce the high dimensionality of the original feature space and to understand the predictive importance and biological meaning of each feature. LASSO and equivalent embedded techniques select small subsets of relevant features based solely on their quantitative contribution and predictive power, which can lead to the selection of features with limited biological relevance. In this work, we first provide a wide exploratory analysis of LASSO feature selection to evaluate the effects of different hyper-parameters on the selection of predictive feature subsets and on their corresponding biological significance. Then, we propose a preliminary approach to guide LASSO in the selection of features by considering their biological relevance in addition to their predictive power. To this end, we define a novel Gene Information Score to estimate each gene's biological relevance, and show its use to enhance the feature selection.
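
The abstract does not say how the Gene Information Score enters the LASSO fit. One common way to let prior knowledge guide an L1-penalized model is to give each feature its own penalty weight, so that features judged more relevant are penalized less. The sketch below illustrates that general idea only and should not be read as the authors' method: the gene names, the scores, the score-to-weight mapping `1.5 - score`, and the toy data are all hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, penalty_weights, n_iter=50):
    """Minimise (1/2n)*||y - X b||^2 + lam * sum_j w_j*|b_j| by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column: nothing to fit
            # Correlation of feature j with the residual that excludes j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam * penalty_weights[j]) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


# Hypothetical toy data: three mutually orthogonal "expression" profiles.
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
# GENE_A and GENE_B are equally predictive; GENE_C carries no signal.
y = [2 * row[0] + 2 * row[1] for row in X]

# Hypothetical relevance scores in [0, 1]; higher means more biologically relevant.
scores = {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 0.5}
# Assumed mapping from score to penalty weight: more relevant, lighter penalty.
weights = [1.5 - scores[g] for g in genes]


def selected(beta):
    """Names of the genes with a non-zero coefficient."""
    return [g for g, b in zip(genes, beta) if b != 0.0]


plain = weighted_lasso(X, y, lam=1.5, penalty_weights=[1.0, 1.0, 1.0])
guided = weighted_lasso(X, y, lam=1.5, penalty_weights=weights)
print("plain LASSO :", selected(plain))
print("score-guided:", selected(guided))
```

In this toy setting the uniform penalty keeps both predictive genes, while the score-weighted penalty drops the one with the low hypothetical score even though its predictive contribution is identical. This is the kind of trade-off between predictive power and biological relevance that the abstract describes.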
2023
Proceedings of the 18th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2023
Machine learning
Feature selection
Biological interpretability
Genomics
Classification

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1249557