Biologically-driven feature selection for improved functional interpretability of gene expression data analysis
S. Mongardi, S. Cascianelli, M. Masseroli
2023-01-01
Abstract
Feature selection techniques are widely used in prediction tasks based on gene-expression data, both to reduce the high dimensionality of the original feature space and to understand the predictive importance and biological meaning of each feature. LASSO and similar embedded techniques select small subsets of relevant features based solely on their quantitative contribution and predictive power, which can lead to the selection of features with limited biological relevance. In this work, we first provide an extensive exploratory analysis of LASSO feature selection to evaluate the effects of different hyper-parameters on the selection of predictive feature subsets and on their corresponding biological significance. Then, we propose a preliminary approach to guide LASSO in selecting features by considering their biological relevance in addition to their predictive power. To this end, we define a novel Gene Information Score to estimate the biological relevance of each gene, and show how it can be used to enhance feature selection.
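The abstract does not state how the Gene Information Score enters the LASSO objective. One common way to let a per-gene relevance score guide LASSO is a weighted L1 penalty, in which more relevant genes are penalized less. The sketch below assumes that formulation; the functions `weighted_lasso` and `scores_to_weights`, the score-to-weight mapping `1 / (eps + score)`, and all data and scores are hypothetical illustrations, not the authors' method. It uses plain NumPy coordinate descent so that the regularization strength `alpha`, the main LASSO hyper-parameter, stays explicit.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|b|: shrink z toward zero by t (exactly 0 if |z| <= t)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def scores_to_weights(scores: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """HYPOTHETICAL mapping from relevance scores in [0, 1] to L1 penalty weights.

    Higher score -> smaller weight -> the gene is penalized less. Weights are
    rescaled to mean 1 so that alpha keeps a comparable meaning.
    """
    w = 1.0 / (eps + np.asarray(scores, dtype=float))
    return w / w.mean()


def weighted_lasso(X: np.ndarray, y: np.ndarray, alpha: float,
                   weights: np.ndarray, n_iter: int = 500,
                   tol: float = 1e-8) -> np.ndarray:
    """Cyclic coordinate descent for the weighted LASSO objective

        (1 / (2n)) * ||y - X b||^2 + alpha * sum_j weights[j] * |b_j|

    X is assumed to have centered columns; y is centered here.
    With all weights equal to 1 this is the ordinary LASSO.
    """
    n, p = X.shape
    residual = y - y.mean()              # r = y_c - X b, with b = 0 initially
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n    # curvature of the loss along each coordinate
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:         # constant feature: leave it at zero
                continue
            # Correlation of feature j with the partial residual (its own term added back)
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha * weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual -= delta * X[:, j]   # keep r = y_c - X b up to date
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(42)

    # 1) Two identical "probes": plain LASSO picks one by coordinate order, while
    #    the weighted version favors the gene with the higher (made-up) score.
    x = rng.normal(size=100)
    x = (x - x.mean()) / x.std()
    X_dup = np.column_stack([x, x])
    print("plain :", weighted_lasso(X_dup, x, 0.1, np.ones(2)).round(3))
    print("guided:", weighted_lasso(X_dup, x, 0.1,
                                    scores_to_weights(np.array([0.2, 0.9]))).round(3))

    # 2) Effect of the regularization strength alpha on the selected subset.
    n_samples, n_genes = 120, 40
    X = rng.normal(size=(n_samples, n_genes))
    X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize each gene
    y = X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2] + 0.3 * rng.normal(size=n_samples)
    weights = scores_to_weights(rng.uniform(size=n_genes))   # random stand-in scores
    for alpha in (0.01, 0.05, 0.1, 0.3):
        plain = np.flatnonzero(weighted_lasso(X, y, alpha, np.ones(n_genes)))
        guided = np.flatnonzero(weighted_lasso(X, y, alpha, weights))
        print(f"alpha={alpha}: plain {plain.tolist()} | guided {guided.tolist()}")
```

When two predictors are perfectly redundant, the ordinary LASSO has no reason to prefer one over the other, so its choice here depends only on coordinate order; under the assumed weighting, the penalty difference makes the higher-scoring gene the unique optimum. Normalizing the weights to mean 1 is a design choice so that a given `alpha` yields selections of broadly comparable size with and without guidance.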
File: Camera_ready_CIBB_2023_Short_Paper.pdf (restricted access; post-print, draft or Author's Accepted Manuscript (AAM); 208.67 kB, Adobe PDF)