Harnessing the Power of Voice: A Deep Neural Network Model for Alzheimer's Disease Detection

  • Chan-Young Park (Department of Neurology, Chung-Ang University College of Medicine)
  • Minsoo Kim (Research and Development, Baikal AI Inc.)
  • YongSoo Shim (Department of Neurology, Eunpyeong St. Mary's Hospital, The Catholic University of Korea)
  • Nayoung Ryoo (Department of Neurology, Eunpyeong St. Mary's Hospital, The Catholic University of Korea)
  • Hyunjoo Choi (Department of Communication Disorders, Korea Nazarene University)
  • Ho Tae Jeong (Department of Neurology, Chung-Ang University College of Medicine)
  • Gihyun Yun (Research and Development, Baikal AI Inc.)
  • Hunboc Lee (Research and Development, Baikal AI Inc.)
  • Hyungryul Kim (Research and Development, Baikal AI Inc.)
  • SangYun Kim (Department of Neurology, Seoul National University College of Medicine and Seoul National University Bundang Hospital)
  • Young Chul Youn (Department of Neurology, Chung-Ang University College of Medicine)
  • Received: 2023.10.11
  • Accepted: 2023.12.08
  • Published: 2024.01.31

Abstract

Background and Purpose: Because voice reflects cerebral function, it holds potential for analyzing and understanding brain function, especially in the context of cognitive impairment (CI) and Alzheimer's disease (AD). This study used voice data to distinguish normal cognition from CI or Alzheimer's disease dementia (ADD). Methods: This study enrolled 3 groups of subjects: 1) 52 subjects with subjective cognitive decline; 2) 110 subjects with mild CI; and 3) 59 subjects with ADD. Voice features were extracted using Mel-frequency cepstral coefficients and Chroma. Results: A deep neural network (DNN) model showed promising performance, with an accuracy of roughly 81% in 10 trials in predicting ADD, which increased to an average value of about 82.0%±1.6% when evaluated against an unseen test dataset. Conclusions: Although the results did not reach the level of accuracy necessary for a definitive clinical tool, they provide a compelling proof-of-concept for the potential use of voice data in cognitive status assessment. DNN algorithms using voice offer a promising approach to early detection of AD. They could improve the accuracy and accessibility of diagnosis, ultimately leading to better outcomes for patients.
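The Results report accuracy as a mean with a ± spread over repeated trials. The sketch below shows one common way such a summary could be computed. It is not the authors' code: the function names are hypothetical, the use of the sample standard deviation is an assumption (the abstract does not say which spread measure was used), and the per-trial accuracies are made-up placeholders, not study results. Only the cohort sizes are taken from the abstract.

```python
from statistics import mean, stdev

# Cohort sizes as reported in the abstract
# (SCD: subjective cognitive decline, MCI: mild CI, ADD: Alzheimer's disease dementia).
COHORT = {"SCD": 52, "MCI": 110, "ADD": 59}


def summarize_trials(accuracies):
    """Return (mean, sample standard deviation) of per-trial accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two trials to estimate a standard deviation")
    return mean(accuracies), stdev(accuracies)


def format_summary(accuracies):
    """Format per-trial accuracies (fractions in [0, 1]) as 'mean%±sd%'."""
    m, s = summarize_trials(accuracies)
    return f"{m * 100:.1f}%±{s * 100:.1f}%"


# HYPOTHETICAL placeholder accuracies for 10 trials; NOT data from the study.
example_trials = [0.80, 0.82, 0.84, 0.81, 0.83, 0.82, 0.80, 0.84, 0.83, 0.81]

print("Total subjects:", sum(COHORT.values()))
print("Accuracy over trials:", format_summary(example_trials))
```

Reporting the spread alongside the mean indicates how stable the model is across random splits or initializations, which matters for a cohort of this size.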

Acknowledgement

This research was supported by grants from the Ministry of SMEs and Startups (Project Number: S3079103) and the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (Project Number: NRF2017S1A6A3A01078538).
