Mining Medical Data to Develop Clinical Decision Making Tools in Hemodialysis: Prediction of Cardiovascular Events and Feature Selection using a Random Forest Approach
Ion Titapiccolo, Jasmine; Ferrario, Manuela; Cerutti, Sergio; Signorini, Maria Gabriella
2011-01-01
Abstract
The main objective of this work is to develop machine learning models for predicting patient outcomes in nephrology care, and to validate and optimize these models with a feature selection approach. Cardiovascular events are a major cause of morbidity and mortality in hemodialysis (HD) patients, with an incidence of 20% in the first year of renal replacement therapy. Real data routinely collected during HD administration were extracted from the Fresenius Medical Care database EuCliD (39 independent variables) and used to develop a random forest model to forecast cardiovascular events in the first year of HD treatment. Two feature selection methods were applied. Evaluated on an independent cohort of patients, the resulting models showed significant predictive ability. The reported results were obtained with a random forest built on only 6 variables (AUC: 77.1% ± 2.9%; MCE: 31.6% ± 3.5%), which were identified by the out-of-bag (OOB) variable importance estimate.
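The abstract reports performance as AUC and MCE. As a rough illustration of how such figures can be derived from a classifier's predicted probabilities, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical, and reading MCE as the misclassification error rate is an assumption, since the abstract does not expand the acronym.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case receives
    the higher score. Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def misclassification_error(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of cases whose thresholded prediction differs from the label.
    The 0.5 threshold is an assumption made for this sketch."""
    preds = [1 if s >= threshold else 0 for s in scores]
    errors = sum(1 for y, p in zip(labels, preds) if y != p)
    return errors / len(labels)


# Invented toy data: 1 = cardiovascular event, 0 = no event.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Invented predicted event probabilities from some classifier.
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

print(f"AUC: {auc(labels, scores):.3f}")
print(f"MCE: {misclassification_error(labels, scores):.3f}")
```

The pairwise formulation of AUC is quadratic in the number of cases, which is fine for illustration. For a cohort of realistic size, a rank-based implementation or a library routine would be the more practical choice.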