Computer-Aided Dementia Detection: How Informative Are Your Features?

G. W. Di Donato; M. D. Santambrogio
2022-01-01

Abstract

Alzheimer’s Disease (AD) is a progressive neurodegenerative disease that has no cure. Early detection is critical to slowing its progression, but the diagnostic process is lengthy and costly. Computer-Aided Dementia Detection through Natural Language Processing is emerging as a viable route to early diagnosis. Many works in the literature use transcripts of conversations from the widely used DementiaBank dataset to train and test Machine Learning models that detect dementia automatically. However, the reproducibility and comparability of previous results have been a significant problem in this research domain. To address these problems, we propose a set of curated features, a modular and extensible Feature Extraction framework, and a Performance Evaluation framework. We then evaluate the baseline performance of 12 Machine Learning algorithms on three tasks: Regression, Binary Classification, and Multiclass Classification with 3, 4, and 5 classes. The top-performing model was Gradient Boosted Decision Trees, which achieved an RMSE of 4.3 on the Regression task, an accuracy of 0.78 on the Binary Classification task, and accuracies of 0.63, 0.64, and 0.49 on the 3-, 4-, and 5-class Multiclass Classification tasks, respectively.
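To make the idea of a "modular and extensible Feature Extraction framework" and of a shared evaluation step more concrete, here is a minimal sketch in plain Python. It is not the authors' framework or their curated feature set: the registry (`EXTRACTORS`, `register`, `extract`), the two feature groups, the filler-word list, and the sample transcript are all hypothetical, chosen only to illustrate how new feature groups can be plugged in without touching existing code, and how RMSE (regression) and accuracy (classification) are computed.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a feature-group name to a function that turns a
# transcript (a list of utterances) into a dict of named numeric features.
FeatureFn = Callable[[List[str]], Dict[str, float]]
EXTRACTORS: Dict[str, FeatureFn] = {}


def register(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Decorator that adds a feature extractor to the registry under `name`."""
    def wrap(fn: FeatureFn) -> FeatureFn:
        EXTRACTORS[name] = fn
        return fn
    return wrap


def _tokens(utterances: List[str]) -> List[str]:
    # Naive lowercase whitespace tokenisation with trailing punctuation stripped;
    # a real pipeline would use a proper tokenizer and handle CHAT codes.
    words = (t.lower().strip(".,?!") for u in utterances for t in u.split())
    return [w for w in words if w]


@register("lexical")
def lexical_features(utterances: List[str]) -> Dict[str, float]:
    toks = _tokens(utterances)
    n = len(toks)
    return {
        "n_tokens": float(n),
        # Type-token ratio: distinct words / total words (0 for empty input).
        "type_token_ratio": len(set(toks)) / n if n else 0.0,
    }


@register("fluency")
def fluency_features(utterances: List[str]) -> Dict[str, float]:
    toks = _tokens(utterances)
    fillers = {"uh", "um", "er"}  # hypothetical filler-word list
    n_utt = len(utterances)
    return {
        "mean_utterance_len": len(toks) / n_utt if n_utt else 0.0,
        "filler_rate": sum(t in fillers for t in toks) / len(toks) if toks else 0.0,
    }


def extract(utterances: List[str]) -> Dict[str, float]:
    """Run every registered extractor and merge the results into one row,
    prefixing each feature with its group name so groups cannot collide."""
    row: Dict[str, float] = {}
    for group, fn in sorted(EXTRACTORS.items()):
        for key, value in fn(utterances).items():
            row[f"{group}.{key}"] = value
    return row


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error, the regression metric."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of exact label matches, used for binary and multiclass tasks."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Invented two-utterance transcript, for illustration only.
    sample = ["Uh the boy is on the stool.", "The um cookie jar."]
    print(extract(sample))
```

Adding a new feature group only requires decorating another function with `@register("...")`; `extract` picks it up automatically, which is one way to keep a feature set extensible while leaving earlier groups untouched. The resulting rows could then be passed to any learner (for example scikit-learn's `GradientBoostingRegressor` or `GradientBoostingClassifier`) and scored with the same `rmse` and `accuracy` functions, so that different models are compared under one evaluation procedure.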
Year: 2022
Published in: Proceedings of 2022 IEEE 7th Forum on Research and Technologies for Society and Industry Innovation (RTSI)
ISBN: 978-1-6654-9739-8; 978-1-6654-9740-4
Files in this item: Computer-Aided_Dementia_Detection_How_Informative_Are_Your_Features.pdf (publisher’s version, Adobe PDF, 750.84 kB, restricted access)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1231638
Citations: Scopus 2