Computer-Aided Dementia Detection: How Informative Are Your Features?
G. W. Di Donato; M. D. Santambrogio
2022-01-01
Abstract
Alzheimer’s Disease (AD) is a progressive neurodegenerative disease that has no cure. Early detection is critical to slowing its progression, but the diagnostic process is lengthy and costly. Computer-Aided Dementia Detection through Natural Language Processing is emerging as a viable route to early diagnosis. Many works in the literature use transcripts of conversations from the widely used DementiaBank dataset to train and test Machine Learning models that detect Dementia automatically. However, the reproducibility and comparability of previous results have been a significant problem in this research domain. To address these problems, we propose a set of curated features, a modular and extensible Feature Extraction framework, and a Performance Evaluation framework. We then evaluate the baseline performance of 12 Machine Learning algorithms on three tasks: Regression, Binary Classification, and Multiclass Classification with 3, 4, and 5 classes. The top-performing model was Gradient Boosted Decision Trees, which achieved an RMSE of 4.3 on the Regression task, an Accuracy of 0.78 on the Binary Classification task, and Accuracies of 0.63, 0.64, and 0.49 on the 3-, 4-, and 5-class Multiclass Classification tasks, respectively.
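The abstract does not describe how the Feature Extraction and Performance Evaluation frameworks are built. As a rough illustration only, the Python sketch below shows one way a modular, extensible extractor could be organized: each feature is a function registered under a name, so a new feature can be added without modifying the pipeline. The registry, the `register_feature` decorator, and the two lexical features (`token_count`, `type_token_ratio`) are hypothetical and are not the authors' curated feature set. The sketch also computes RMSE and Accuracy, the two metrics reported above. Model training (for example, with Gradient Boosted Decision Trees) is deliberately left out.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical registry: maps a feature name to a function that takes a
# transcript (plain text) and returns a single numeric value.
FEATURE_EXTRACTORS: Dict[str, Callable[[str], float]] = {}


def register_feature(name: str):
    """Decorator that adds an extractor to the registry (illustrative only)."""
    def decorator(func: Callable[[str], float]) -> Callable[[str], float]:
        FEATURE_EXTRACTORS[name] = func
        return func
    return decorator


@register_feature("token_count")
def token_count(text: str) -> float:
    # Number of whitespace-separated tokens in the transcript.
    return float(len(text.split()))


@register_feature("type_token_ratio")
def type_token_ratio(text: str) -> float:
    # Distinct lowercase tokens divided by total tokens (0.0 for empty text).
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def extract_features(text: str) -> Dict[str, float]:
    # Apply every registered extractor; adding a feature means registering one function.
    return {name: func(text) for name, func in FEATURE_EXTRACTORS.items()}


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Root mean squared error, the metric reported for the Regression task.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of exactly matching labels, used for binary and multiclass tasks.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy transcript and toy predictions, not data from the paper.
    sample = "the boy is on the stool the stool is falling"
    print(extract_features(sample))
    print(rmse([24.0, 18.0], [20.0, 21.0]))
    print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
```

Keeping extractors as independent, named functions means every feature vector records which features produced it, which is one simple way to make results easier to reproduce and compare across studies.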