Acknowledgement
This research was supported by the National Research Council of Science & Technology (NST) grant funded by the Korea government (MSIT) (No. CAP21052-300).
References
- R. Li and Y. Liu, Physical activity and prevention of Alzheimer's disease, J. Sport Health Sci. 5 (2016), 381-382. https://doi.org/10.1016/j.jshs.2016.10.008
- M. F. Folstein, S. E. Folstein, and P. R. McHugh, "Mini-mental state": a practical method for grading the cognitive state of patients for the clinician, J. Psychiatr. Res. 12 (1975), 189-198. https://doi.org/10.1016/0022-3956(75)90026-6
- Z. S. Nasreddine, N. A. Phillips, V. Bedirian, S. Charbonneau, V. Whitehead, I. Collin, J. L. Cummings, and H. Chertkow, The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment, J. Am. Geriatr. Soc. 53 (2005), 695-699. https://doi.org/10.1111/j.1532-5415.2005.53221.x
- S. Chen, D. Stromer, H. A. Alabdalrahim, S. Schwab, M. Weih, and A. Maier, Automatic dementia screening and scoring by applying deep learning on clock-drawing tests, Sci. Rep. 10 (2020). https://doi.org/10.1038/s41598-020-74710-9
- I. Vigo, L. Coelho, and S. Reis, Speech- and language-based classification of Alzheimer's disease: a systematic review, Bioengineering 9 (2022). https://doi.org/10.3390/bioengineering9010027
- S. Dong and H.-B. Jeon, Feature analysis and evaluation for estimation of mild cognitive impairment from the spontaneous speech of Korean, (Proc. International Congress on Acoustics, Gyeongju, Republic of Korea), 2022.
- E. Hussain, M. Hasan, S. Z. Hassan, T. H. Azmi, M. A. Rahman, and M. Z. Parvez, Deep learning based binary classification for Alzheimer's disease detection using brain MRI images, (15th IEEE Conference on Industrial Electronics and Applications-ICIEA, Kristiansand, Norway), 2020, pp. 1115-1120.
- S. de la Fuente Garcia, C. W. Ritchie, and S. Luz, Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer's disease: a systematic review, J. Alzheimers Dis. 78 (2020), 1547-1574. https://doi.org/10.3233/JAD-200888
- J. Chen, J. Ye, F. Tang, and J. Zhou, Automatic detection of Alzheimer's disease using spontaneous speech only, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3830-3834.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, (31st International Conference on Neural Information Processing Systems-NIPS, Long Beach, CA, USA), 2017, pp. 6000-6010.
- S. Luz, F. Haider, S. de la Fuente, D. Fromm, and B. MacWhinney, Alzheimer's dementia recognition through spontaneous speech: the ADReSS Challenge, (Proceedings of Interspeech, Shanghai, China), 2020, pp. 2172-2176.
- S. Luz, F. Haider, S. de la Fuente, D. Fromm, and B. MacWhinney, Detecting cognitive decline using speech only: the ADReSSo Challenge, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3780-3784.
- S. Luz, F. Haider, D. Fromm, I. Lazarou, I. Kompatsiaris, and B. MacWhinney, Multilingual Alzheimer's dementia recognition through spontaneous speech: a signal processing grand challenge, arXiv preprint (2023). https://doi.org/10.48550/arXiv.2301.05562
- H. Goodglass and E. Kaplan, The Boston diagnostic aphasia examination, Lea & Febiger, Philadelphia, 1983.
- A. Balagopalan and J. Novikova, Comparing acoustic-based approaches for Alzheimer's disease detection, arXiv preprint (2021). https://doi.org/10.48550/arXiv.2106.01555
- L. Gauder, L. Pepino, L. Ferrer, and P. Riera, Alzheimer disease recognition using speech-based embeddings from pre-trained models, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3795-3799.
- Y. Zhu, A. Obyat, X. Liang, J. A. Batsis, and R. M. Roth, WavBERT: exploiting semantic and non-semantic speech using Wav2vec and BERT for dementia detection, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3790-3794.
- Z. S. Syed, M. S. Syed, M. Lech, and E. Pirogova, Tackling the ADReSSo Challenge 2021: the MUET-RMIT system for Alzheimer's dementia recognition from spontaneous speech, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3815-3819.
- N. Wang, Y. Cao, S. Hao, Z. Shao, and K. P. Subbalakshmi, Modular multi-modal attention network for Alzheimer's disease detection using patient audio and language data, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3835-3839.
- R. Pappagari, J. Cho, S. Joshi, L. Moro-Velazquez, P. Zelasko, J. Villalba, and N. Dehak, Automatic detection and assessment of Alzheimer's disease using speech and language technologies in low-resource scenarios, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3825-3829.
- P. A. Perez-Toro, S. P. Bayerl, T. Arias-Vergara, J. C. Vasquez-Correa, P. Klumpp, M. Schuster, E. Noth, J. R. Orozco-Arroyave, and K. Riedhammer, Influence of the interviewer on the automatic assessment of Alzheimer's disease in the context of the ADReSSo Challenge, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3785-3789.
- Y. Pan, B. Mirheidari, J. M. Harris, J. C. Thompson, M. Jones, J. S. Snowden, D. Blackburn, and H. Christensen, Using the outputs of different automatic speech recognition paradigms for acoustic- and BERT-based Alzheimer's dementia detection through spontaneous speech, (Proceedings of Interspeech, Brno, Czechia), 2021, pp. 3810-3814.
- A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, Wav2vec 2.0: a framework for self-supervised learning of speech representations, (Advances in Neural Information Processing Systems), 2020, pp. 12449-12460.
- J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, (Proceedings of NAACL-HLT, Minneapolis, MN, USA), 2019, pp. 4171-4186.
- OpenAI, ChatGPT, 2023. Available from: https://chat.openai.com/chat [last accessed July 2023].
- A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, Robust speech recognition via large-scale weak supervision, (International Conference on Machine Learning, Honolulu, HI, USA), 2023, pp. 28492-28518.
- J. Shor, A. Jansen, R. Maor, O. Lang, O. Tuval, F. D. Quitry, M. Tagliasacchi, I. Shavitt, D. Emanuel, and Y. Haviv, Towards learning a universal non-semantic representation of speech, arXiv preprint (2020). https://doi.org/10.48550/arXiv.2002.12764
- X. Li, S. Dalmia, J. Li, M. Lee, P. Littell, J. Yao, A. Anastasopoulos, D. R. Mortensen, G. Neubig, A. W. Black, and F. Metze, Universal phone recognition with a multilingual allophone system, (IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain), 2020, pp. 8249-8253.
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, and A. Desmaison, PyTorch: an imperative style, high-performance deep learning library, (Advances in Neural Information Processing Systems, Vancouver, Canada), 2019.
- OpenAI, GPT-4 technical report, arXiv preprint (2023). https://doi.org/10.48550/arXiv.2303.08774
- T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, and J. Davison, Transformers: state-of-the-art natural language processing, (Proceedings of EMNLP: System Demonstrations), 2020, pp. 38-45.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, and J. Vanderplas, Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011), 2825-2830.
- B. MacWhinney, Tools for analyzing talk part 2: the CLAN program, TalkBank, 2017.
- F. Eyben, K. R. Scherer, B. W. Schuller, J. Sundberg, E. Andre, C. Busso, L. Y. Devillers, J. Epps, P. Laukka, S. S. Narayanan, and K. P. Truong, The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing, IEEE Trans. Affect. Comput. 7 (2016), 190-202. https://doi.org/10.1109/TAFFC.2015.2457417