Benchmark Study on Supervised Relevance-Redundancy Assessment for Feature Selection in Genomic Data

Tome' S; Cascianelli S; Masseroli M
2025-01-01

Abstract

Statistical analyses in standard Genome-Wide Association Study (GWAS) practice still suffer from the missing heritability problem: genetic variants, considered individually, are not directly associated with the aetiology of the disease. In complex diseases, the genetic component is only one of many factors; yet it may reveal key biological knowledge that opens new lines of medical investigation. Machine learning (ML) models offer hypothesis-free pattern discovery and could address a limitation of GWAS analyses, namely their lack of non-linear modeling. However, the high dimensionality of genomic features makes these data ill-suited to ML because of the curse of dimensionality, so feature selection techniques are required to reduce the feature space. This study presents a preliminary benchmark of the supervised Relevance-Redundancy (ReRa) feature selection method on a public genotype dataset of Parkinson’s disease patients. The results show that combining Fisher's exact test with ReRa yielded the best-performing feature selection method among those tested.
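The abstract names a two-stage pipeline (Fisher's exact test combined with ReRa) but does not describe how ReRa scores features. The following is only a minimal sketch under loudly stated assumptions, not the authors' implementation: SNPs are assumed to be coded 0/1/2 (minor-allele count), the Fisher stage is assumed to act as a filter on a carrier vs non-carrier (dominant-model) 2x2 case/control table, and ReRa is replaced by a plainly named stand-in, a greedy mRMR-style "relevance minus mean redundancy" criterion built on mutual information. The functions `fisher_exact_2x2`, `mutual_information`, `select_features` and the parameters `p_threshold` and `k` are hypothetical.

```python
import math
from collections import Counter


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x (margins fixed)
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables no more likely than the observed one (small float tolerance)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def mutual_information(x, y):
    """Empirical mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[u] * py[v])) for (u, v), c in joint.items())


def select_features(genotypes, labels, p_threshold=0.05, k=10):
    """Hypothetical two-stage selector: Fisher filter, then greedy relevance-redundancy.

    genotypes[j][i] is the 0/1/2 minor-allele count of sample i at SNP j;
    labels[i] is 1 for a case and 0 for a control.
    """
    # Stage 1 (assumed dominant coding): carrier (genotype > 0) vs non-carrier
    candidates = []
    for j, snp in enumerate(genotypes):
        pairs = list(zip(snp, labels))
        a = sum(1 for g, y in pairs if y == 1 and g > 0)   # case carriers
        b = sum(1 for g, y in pairs if y == 1 and g == 0)  # case non-carriers
        c = sum(1 for g, y in pairs if y == 0 and g > 0)   # control carriers
        d = sum(1 for g, y in pairs if y == 0 and g == 0)  # control non-carriers
        if fisher_exact_2x2(a, b, c, d) <= p_threshold:
            candidates.append(j)

    # Stage 2 (mRMR-style stand-in for ReRa): relevance = MI with the label,
    # redundancy = mean MI with the SNPs already selected
    relevance = {j: mutual_information(genotypes[j], labels) for j in candidates}
    selected = []
    while candidates and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(genotypes[j], genotypes[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        best = max(candidates, key=score)  # ties resolved by lowest index
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    labels = [1] * 6 + [0] * 6
    genotypes = [
        [1, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 0],  # SNP 0: strongly associated
        [1, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 0],  # SNP 1: exact copy of SNP 0
        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # SNP 2: unassociated
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1],  # SNP 3: weaker, distinct signal
    ]
    print(select_features(genotypes, labels, p_threshold=0.1, k=2))  # [0, 3]
```

On this toy data the Fisher filter discards SNP 2, and ranking by relevance alone would return SNPs 0 and 1, which carry identical information; the redundancy term instead picks SNP 3 as the second feature. This illustrates why a relevance-redundancy stage is paired with a univariate test, without claiming to reproduce the scoring used in the study.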
Year: 2025
Published in: Computational Intelligence Methods for Bioinformatics and Biostatistics. 19th International Meeting, CIBB 2024, Benevento, Italy, September 4–6, 2024, Revised Selected Papers
ISBN: 978-3-031-89704-7
Keywords: machine learning, feature selection, genotype, supervised learning

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1273683