Supervised Relevance-Redundancy assessment for feature selection in high-dimensional genotype data.

Tome' S; Cascianelli S; Masseroli M
2024-01-01

Abstract

Standard Genome-Wide Association Study (GWAS) analyses still suffer from the missing heritability problem: genetic variants, when considered individually, are not directly associated with the aetiology of the disease. In complex diseases, the genetic component is only one of many factors, yet it can reveal key biological knowledge and open further medical investigation. Machine learning (ML) models offer hypothesis-free pattern discovery and could compensate for the lack of non-linear modeling in GWAS analyses. However, high-dimensional genomic features are poorly suited to ML because of the curse of dimensionality, so feature selection techniques are needed to reduce the feature space. This study presents a preliminary benchmark of the supervised Relevance-Redundancy (ReRa) feature selection method on a public genotype dataset of Parkinson’s disease patients. The results show that combining the Fisher exact test with ReRa was the best-performing feature selection method among those tested.
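The abstract does not spell out how ReRa scores relevance and redundancy, so the following is only a generic sketch of the two-stage idea, not the authors' method. It assumes genotypes coded as 0/1/2 minor-allele counts, collapses them to carrier versus non-carrier for a 2x2 Fisher exact test, and uses absolute Pearson correlation as a stand-in redundancy measure. The function names and the `alpha` and `max_abs_corr` thresholds are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def fisher_relevance(X, y):
    """Fisher exact p-value per variant (carrier vs non-carrier, case vs control).

    Assumption: X holds 0/1/2 allele counts and y holds 1 for cases, 0 for controls.
    """
    carriers = X > 0
    cases = y == 1
    pvals = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        c = carriers[:, j]
        table = [
            [np.sum(c & cases), np.sum(c & ~cases)],
            [np.sum(~c & cases), np.sum(~c & ~cases)],
        ]
        pvals[j] = fisher_exact(table)[1]
    return pvals


def select_features(X, y, alpha=0.05, max_abs_corr=0.8):
    """Keep relevant variants, then greedily drop those redundant with kept ones."""
    pvals = fisher_relevance(X, y)
    # Relevance step: variants passing the (hypothetical) alpha, most significant first.
    ranked = [j for j in np.argsort(pvals, kind="stable") if pvals[j] < alpha]
    kept = []
    for j in ranked:
        # Redundancy step: skip a variant that is highly correlated with a kept one.
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > max_abs_corr for k in kept
        )
        if not redundant:
            kept.append(int(j))
    return kept
```

Calling `select_features(X, y)` on a samples-by-variants matrix returns the column indices that survive both steps. A real pipeline would also need multiple-testing correction and a redundancy measure suited to linkage disequilibrium, both of which are outside this sketch.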
2024
Proceedings of the 19th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics
machine learning, feature selection, genotype, supervised learning
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1273683