Multi-site External Sets Harmonization with M-ComBat: An Application to Functional Connectivity in a Normative Framework
Sampaio, Inês Won;Tassi, Emma;Bianchi, Anna M.;Maggioni, Eleonora
2024-01-01
Abstract
Multi-site neuroimaging datasets are difficult to integrate because of confounding site effects. ComBat is a widely used statistical model for this type of harmonization. Nevertheless, it has drawbacks that prevent its use in external validation frameworks for machine learning (ML) analyses. First, ComBat relies on all available sites to estimate its parameters, so the model must be re-fitted whenever data from new, unseen sites are added. Second, it requires biological information of interest (e.g. diagnosis) to be included in the model fitting process, which is incompatible with harmonizing samples whose outcomes are unknown, a necessary condition for developing predictive AI applications intended for clinical settings. In this work, we propose to address these issues by employing modified ComBat (M-ComBat) in a normative framework (NM-ComBat). To assess its harmonization efficacy, we compared four ComBat variants in harmonizing a multi-site functional connectivity (FC) dataset: standard ComBat (S-ComBat), M-ComBat, and their normative counterparts, NS-ComBat and NM-ComBat. Our results show that NM-ComBat enabled the harmonization of external datasets, removing site effects from the data while preserving the effects of biological covariates of interest, such as age, sex, and diagnosis. These results pave the way for the application of ComBat in ML analyses and external validation frameworks, contributing to the generalizability of the developed models and their potential clinical applicability.
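The idea of harmonizing an unseen site toward a fixed reference, without using diagnosis labels, can be illustrated with a heavily simplified sketch. This is not the authors' implementation and not ComBat itself: it omits the empirical Bayes shrinkage and covariate modelling that ComBat performs, and it assumes, as one possible reading of the normative framework, that site parameters are estimated from healthy controls only. All function names, variable names, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_site_params(controls):
    """Per-feature mean and sample standard deviation of one site's controls."""
    controls = np.asarray(controls, dtype=float)
    return controls.mean(axis=0), controls.std(axis=0, ddof=1)


def harmonize_to_reference(data, site_params, ref_params):
    """Map a site's data onto the reference site's location and scale."""
    site_mean, site_std = site_params
    ref_mean, ref_std = ref_params
    standardized = (np.asarray(data, dtype=float) - site_mean) / site_std
    return standardized * ref_std + ref_mean


# Synthetic data (assumption): 5 features standing in for FC values.
rng = np.random.default_rng(0)
n_features = 5
site_shift, site_scale = 2.0, 3.0  # hypothetical additive and multiplicative site effect
diagnosis_effect = 1.0             # hypothetical group difference, in latent units

ref_controls = rng.normal(0.0, 1.0, size=(200, n_features))
new_controls = site_shift + site_scale * rng.normal(0.0, 1.0, size=(200, n_features))
new_patients = site_shift + site_scale * (
    rng.normal(0.0, 1.0, size=(50, n_features)) + diagnosis_effect
)

# The reference parameters are fitted once and then stay fixed.
ref_params = fit_site_params(ref_controls)

# The new site is handled on its own: only its controls are used to estimate
# site parameters, and no diagnosis label enters the fit.
new_params = fit_site_params(new_controls)
harmonized_controls = harmonize_to_reference(new_controls, new_params, ref_params)
harmonized_patients = harmonize_to_reference(new_patients, new_params, ref_params)

# Difference between patient and control means after harmonization.
group_gap = harmonized_patients.mean(axis=0) - harmonized_controls.mean(axis=0)
```

In this toy setting the harmonized controls match the reference mean and standard deviation by construction, while the patients keep their offset relative to controls because the transformation never sees diagnosis. Adding a further site would repeat only the last step, leaving the reference parameters untouched.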