http://dx.doi.org/10.4218/etrij.2017-0260

Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks

Farhadipour, Aref (Department of Media Engineering, IRI Broadcast University)
Veisi, Hadi (Faculty of New Sciences and Technologies, University of Tehran)
Asgari, Mohammad (Department of Media Engineering, IRI Broadcast University)
Keyvanrad, Mohammad Ali (Department of Computer Engineering and Information Technology, Amirkabir University of Technology)

Publication Information

ETRI Journal / v.40, no.5, 2018, pp. 643-652

Abstract

Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; therefore, it affects the uniqueness of the sound produced by the speaker. Hence, dysarthric speaker recognition is a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for the task of identifying a speaker suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with that of the well-known Mel-frequency cepstral coefficient (MFCC) features. For classification, a multi-layer perceptron neural network with two structures is proposed. Our evaluations using the Universal Access speech database produced promising results and outperformed the baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions is explored. The highest accuracy achieved using the proposed system is 97.3%.

Keywords

deep belief network; deep neural network; dysarthria; MFCC; speaker identification
