Three-Stage Data Science Methodology to Explore Genetic Heterogeneity of Diseases

Cascianelli, Silvia; Masseroli, Marco
2025-01-01

Abstract

Genetic heterogeneity poses a significant challenge in understanding complex diseases, because variations in individuals' genetic makeup can lead to diverse disease manifestations and treatment responses. Here, we propose a three-stage data science methodology to systematically explore and analyze the genetic heterogeneity of a given disease, with a particular focus on critical patient subgroups. The approach consists of a feature engineering stage, in which several feature space options are devised and compared; a supervised learning framework for accurately classifying the patient subgroup of interest; and a final feature prioritization stage that identifies gene variants with predictive and, potentially, therapeutic value. To this end, the final stage includes feature importance analysis and further exploration of clinically relevant and actionable genes involved in the mutational events that contribute to patient differentiation. As a use case, we apply the methodology to the mutational landscape of the critical subgroup of Triple-Negative Breast Cancer patients, demonstrating its validity in uncovering significant gene variants with possible therapeutic implications. The methodology offers a robust approach for advancing research into the genetic heterogeneity of diseases and for helping to improve personalized treatment for patients.
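
The three stages described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic cohort, the two candidate feature spaces (per-gene mutation flags versus a single mutation-burden count), the nearest-centroid classifier, and permutation importance as the ranking method. The abstract does not state which features, models, or importance measures the authors used.

```python
import random


def make_cohort(n_patients=200, n_genes=12, driver_genes=(0, 1), seed=0):
    """Synthetic stand-in for real data: binary patient-by-gene mutation matrix
    plus a 0/1 label marking the (hypothetical) subgroup of interest."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_patients):
        in_subgroup = rng.random() < 0.3
        row = []
        for g in range(n_genes):
            # Assumed signal: driver genes are mutated more often in the subgroup.
            p = 0.7 if (in_subgroup and g in driver_genes) else 0.1
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(int(in_subgroup))
    return X, y


# Stage 1: candidate feature spaces (both hypothetical).
def gene_flags(X):
    return [list(row) for row in X]


def mutation_burden(X):
    return [[sum(row)] for row in X]


# Stage 2: a deliberately simple supervised model (nearest centroid).
def fit_centroids(X, y):
    centroids = {}
    for label in set(y):
        rows = [r for r, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == l for r, l in zip(X, y)) / len(y)


def cv_accuracy(X, y, k=5):
    """Mean accuracy over k interleaved folds."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k


# Stage 3: rank features by the mean accuracy drop when a column is shuffled.
def permutation_importance(model, X, y, n_repeats=20, seed=0):
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            total += base - accuracy(model, shuffled, y)
        drops.append(total / n_repeats)
    return drops


X_raw, y = make_cohort()
feature_spaces = {"gene_flags": gene_flags(X_raw),
                  "mutation_burden": mutation_burden(X_raw)}
scores = {name: cv_accuracy(F, y) for name, F in feature_spaces.items()}

F = feature_spaces["gene_flags"]
split = int(0.7 * len(y))  # hold out the last 30% for the importance analysis
model = fit_centroids(F[:split], y[:split])
importances = permutation_importance(model, F[split:], y[split:])
ranking = sorted(range(len(importances)), key=lambda j: -importances[j])

print("cross-validated accuracy per feature space:", scores)
print("gene indices ranked by permutation importance:", ranking)
```

In a real analysis, the genes at the top of such a ranking would then be checked against clinical and actionability annotations, which this sketch does not attempt.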
Year: 2025
Published in: Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2024
ISBN: 9783031897030, 9783031897047
Keywords: Actionability; Feature engineering; Feature importance; Gene variant prioritization; Supervised models
Files in this item:
File: C49_CIBB_2024_LNBI_2025_150-164.pdf
Access: Restricted
Description: Publisher's version
Size: 1.31 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1310283