http://dx.doi.org/10.4218/etrij.2019-0364

Real-time implementation and performance evaluation of speech classifiers in speech analysis-synthesis

Kumar, Sandeep (Department of Electronics and Communication Engineering, National Institute of Technology)

Publication Information

ETRI Journal / v.43, no.1, 2021, pp. 82-94

Abstract

In this work, six voiced/unvoiced speech classifiers, based on the autocorrelation function (ACF), average magnitude difference function (AMDF), cepstrum, weighted ACF (WACF), zero-crossing rate and energy of the signal (ZCR-E), and neural networks (NNs), have been simulated and implemented in real time using the TMS320C6713 DSP starter kit. These classifiers have been integrated into a linear-predictive-coding-based speech analysis-synthesis system, and their performance has been compared in terms of voiced/unvoiced classification accuracy (in percent), speech quality, and computation time. The classification accuracy and speech quality results show that the NN-based classifier performs better than the ACF-, AMDF-, cepstrum-, WACF-, and ZCR-E-based classifiers in both clean and noisy environments. The computation time results show that the AMDF-based classifier is computationally simple and therefore has the shortest computation time of the six, whereas the NN-based classifier has the longest.
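To make three of the classifier families named above concrete, here is a minimal Python sketch of frame-level voiced/unvoiced decisions based on ZCR-E, ACF, and AMDF. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 8 kHz sampling rate, the 60 to 400 Hz pitch range, the 240-sample frame, and every threshold are hypothetical values chosen for the demo signals, and the cepstrum, WACF, and NN classifiers are not covered.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def acf_peak(frame, min_lag, max_lag):
    """Largest autocorrelation value in the lag range, normalized by lag 0."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / r0)
    return best


def amdf_ratio(frame, min_lag, max_lag):
    """Minimum AMDF divided by maximum AMDF; a deep dip suggests periodicity."""
    values = []
    for lag in range(min_lag, max_lag + 1):
        count = len(frame) - lag
        total = sum(abs(frame[n] - frame[n + lag]) for n in range(count))
        values.append(total / count)
    peak = max(values)
    return min(values) / peak if peak > 0.0 else 1.0


def lag_range(fs, f_min=60, f_max=400):
    """Pitch-lag search range in samples (assumed 60 to 400 Hz pitch range)."""
    return fs // f_max, fs // f_min


# All thresholds below are illustrative assumptions, not values from the paper.
def classify_zcr_energy(frame, zcr_max=0.25, energy_min=1e-3):
    """Voiced if the frame has few zero crossings and enough energy."""
    return zero_crossing_rate(frame) < zcr_max and short_time_energy(frame) > energy_min


def classify_acf(frame, fs=8000, threshold=0.4):
    """Voiced if the normalized ACF has a strong peak at a plausible pitch lag."""
    lo, hi = lag_range(fs)
    return acf_peak(frame, lo, hi) > threshold


def classify_amdf(frame, fs=8000, threshold=0.4):
    """Voiced if the AMDF has a deep valley at a plausible pitch lag."""
    lo, hi = lag_range(fs)
    return amdf_ratio(frame, lo, hi) < threshold


def demo_frames(fs=8000, length=240):
    """Synthetic test frames: a 120 Hz sine (voiced-like) and white noise."""
    voiced = [0.5 * math.sin(2 * math.pi * 120 * n / fs) for n in range(length)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 0.1) for _ in range(length)]
    return voiced, noise


if __name__ == "__main__":
    voiced, noise = demo_frames()
    for name, frame in (("sine", voiced), ("noise", noise)):
        print(name, classify_zcr_energy(frame), classify_acf(frame), classify_amdf(frame))
```

In this sketch the AMDF path uses only subtractions and absolute values inside its inner loop, while the ACF path needs a multiplication per sample pair, which is one plausible reading of why the abstract describes the AMDF-based classifier as computationally simple. Real speech would need thresholds tuned on labeled data rather than the fixed values used here.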

Keywords

ACF; AMDF; cepstrum; neural network; real-time system; speech classification; WACF; ZCR-E
