Three-Stage Data Science Methodology to Explore Genetic Heterogeneity of Diseases
Cascianelli, Silvia; Masseroli, Marco
2025-01-01
Abstract
Genetic heterogeneity poses a significant challenge to understanding complex diseases, because variations in individuals' genetic makeup can lead to diverse disease manifestations and treatment responses. Here, we propose a three-stage data science methodology to systematically explore and analyze the genetic heterogeneity of a given disease, with a particular focus on critical patient subgroups. The approach consists of (i) a feature engineering phase, in which alternative feature spaces are devised and compared; (ii) a supervised learning framework for accurately classifying the patient subgroup of interest; and (iii) a feature prioritization stage that identifies gene variants with predictive and, potentially, therapeutic value. To this end, the final stage combines feature importance analysis with further exploration of the clinically relevant and actionable genes involved in the mutational events that contribute to differentiating patients. As a use case, we apply the methodology to investigate the mutational landscape of Triple-Negative Breast Cancer patients, a critical subgroup, and show that it can uncover significant gene variants with possible therapeutic implications. The three-stage methodology thus offers a robust approach for advancing research on disease genetic heterogeneity and for helping to improve personalized patient treatment.
Full text: C49_CIBB_2024_LNBI_2025_150-164.pdf (publisher's version, Adobe PDF, 1.31 MB; restricted access)
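The abstract describes the three stages only at a high level. The Python below is a minimal, hypothetical sketch of how they could fit together; it is not the authors' implementation. It (i) builds two alternative binary feature spaces from per-patient mutation events, one per gene and one per gene/variant-class pair; (ii) compares them using the leave-one-out accuracy of a Bernoulli naive Bayes classifier for the subgroup label; and (iii) ranks the features of the better space by their log odds ratio under the fitted model. The toy cohort, the placeholder genes `G1` to `G3`, the variant classes, and the choice of classifier and importance score are all illustrative assumptions. The paper's actual feature spaces, models, and importance measures may differ.

```python
"""Hypothetical sketch of a three-stage pipeline on toy mutation data.

Stage 1: build alternative binary feature spaces from mutation events.
Stage 2: compare them via leave-one-out accuracy of a subgroup classifier.
Stage 3: rank the features of the chosen space by a model-based score.
"""
import math

# Synthetic cohort (assumption): each patient is a set of (gene, variant_class)
# mutation events plus a label, 1 = subgroup of interest, 0 = other patients.
PATIENTS = [
    ({("G1", "truncating"), ("G3", "missense")}, 1),
    ({("G1", "truncating")}, 1),
    ({("G1", "missense"), ("G3", "missense")}, 0),
    ({("G2", "missense")}, 0),
    ({("G2", "missense"), ("G3", "missense")}, 0),
    ({("G1", "truncating"), ("G2", "missense")}, 1),
]


def gene_space(events):
    """Feature space A: one binary feature per mutated gene."""
    return {gene for gene, _ in events}


def variant_space(events):
    """Feature space B: one binary feature per (gene, variant class) pair."""
    return {f"{gene}:{vclass}" for gene, vclass in events}


class BernoulliNB:
    """Bernoulli naive Bayes with Laplace smoothing; samples are feature sets."""

    def fit(self, X, y):
        self.vocab = sorted(set().union(*X))
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature present | class), smoothed so it is never 0 or 1.
            self.p[c] = {
                f: (sum(f in x for x in rows) + 1) / (len(rows) + 2)
                for f in self.vocab
            }
        return self

    def predict(self, x):
        def log_score(c):
            s = self.log_prior[c]
            for f in self.vocab:  # features unseen in training are ignored
                p = self.p[c][f]
                s += math.log(p if f in x else 1 - p)
            return s
        return max(self.classes, key=log_score)


def loo_accuracy(X, y):
    """Stage 2: leave-one-out accuracy of the classifier on one feature space."""
    hits = 0
    for i in range(len(X)):
        model = BernoulliNB().fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += model.predict(X[i]) == y[i]
    return hits / len(X)


def rank_features(model, positive=1, negative=0):
    """Stage 3: sort features by |log odds ratio| between the two classes."""
    def log_or(f):
        p1, p0 = model.p[positive][f], model.p[negative][f]
        return math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
    scores = {f: log_or(f) for f in model.vocab}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    y = [label for _, label in PATIENTS]
    spaces = {"gene": gene_space, "gene:variant_class": variant_space}
    for name, encode in spaces.items():  # Stage 1: compare feature spaces
        X = [encode(events) for events, _ in PATIENTS]
        print(f"{name}: LOO accuracy = {loo_accuracy(X, y):.3f}")
    # Space B scores higher on this toy data, so rank its features.
    X_best = [variant_space(events) for events, _ in PATIENTS]
    for feature, score in rank_features(BernoulliNB().fit(X_best, y))[:3]:
        print(f"{feature}: log OR = {score:+.2f}")
```

On this synthetic cohort, the script reports a leave-one-out accuracy of 0.333 for the gene-only space and 0.833 for the gene:variant_class space. It then ranks `G1:truncating` first (log OR ≈ +2.77), which illustrates how the choice of feature space in stage one can change what the later stages are able to detect. With real cohorts, a library classifier evaluated with stratified cross-validation and a model-agnostic importance measure such as permutation importance would be a more typical choice.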


