Supervised Relevance-Redundancy assessment for feature selection in high-dimensional genotype data.
Tome' S; Cascianelli S; Masseroli M
2024-01-01
Abstract
Statistical analyses in standard Genome-Wide Association Studies (GWAS) still suffer from the missing heritability problem: genetic variants, when considered individually, are not directly associated with the aetiology of the disease. In complex diseases, the genetic component is only one among many factors; yet it could reveal key biological knowledge and open up further medical investigation. Machine learning (ML) models offer hypothesis-free pattern discovery, which could address the lack of non-linear modeling in GWAS analyses. However, the high-dimensional genomic features involved are poorly suited to ML because of the curse of dimensionality, so feature selection techniques are required to reduce the feature space. This study presents a preliminary benchmark of the supervised Relevance-Redundancy (ReRa) feature selection method on a public genotype dataset of Parkinson’s disease patients. The results show that the combination of the Fisher exact test and ReRa was the best-performing feature selection method among those tested.

File | Access | Version | Size | Format |
---|---|---|---|---|
CIBB_2024_paper_41.pdf | Restricted access | Publisher’s version | 281.64 kB | Adobe PDF |
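
The abstract combines a Fisher exact test with a relevance-redundancy criterion. As a rough illustration of that general idea, and not the authors' ReRa implementation, the sketch below ranks variants by a two-sided Fisher exact p-value (relevance) and then greedily skips variants that are highly correlated with ones already kept (redundancy). The function names, the carrier versus non-carrier coding of genotypes, the `alpha` and `max_abs_corr` thresholds, and the use of Pearson correlation as the redundancy measure are all assumptions made for this example.

```python
from math import comb, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return sxy / sqrt(sxx * syy)


def select_features(genotypes, labels, alpha=0.05, max_abs_corr=0.8):
    """Hypothetical relevance-redundancy filter.

    genotypes: dict mapping variant name -> list of 0/1/2 allele counts
    labels:    list of 0/1 phenotype labels (1 = case, 0 = control)
    """
    # Relevance: Fisher exact test on carrier status (genotype > 0) vs. phenotype.
    pvalues = {}
    for name, g in genotypes.items():
        pairs = list(zip(g, labels))
        a = sum(1 for gi, y in pairs if gi > 0 and y == 1)   # carrier cases
        b = sum(1 for gi, y in pairs if gi > 0 and y == 0)   # carrier controls
        c = sum(1 for gi, y in pairs if gi == 0 and y == 1)  # non-carrier cases
        d = sum(1 for gi, y in pairs if gi == 0 and y == 0)  # non-carrier controls
        pvalues[name] = fisher_exact_two_sided(a, b, c, d)

    # Redundancy: walk variants from most to least relevant and keep one only
    # if it is not strongly correlated with any variant already selected.
    selected = []
    for name in sorted(pvalues, key=pvalues.get):
        if pvalues[name] > alpha:
            break
        if all(abs(pearson(genotypes[name], genotypes[s])) < max_abs_corr
               for s in selected):
            selected.append(name)
    return selected, pvalues


# Toy data, invented for illustration only: 4 cases followed by 4 controls.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_genotypes = {
    "snp_a": [1, 2, 1, 1, 0, 0, 0, 0],  # carriers are all cases
    "snp_b": [2, 2, 1, 1, 0, 0, 0, 0],  # near-duplicate of snp_a
    "snp_c": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the phenotype
}
toy_selected, toy_pvalues = select_features(toy_genotypes, toy_labels)
print(toy_selected, toy_pvalues)
```

On real genotype data the pairwise correlation step would need a more scalable redundancy measure, and the p-value threshold would normally be corrected for multiple testing; both are left out here to keep the sketch short.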
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.