http://dx.doi.org/10.22156/CS4SMB.2021.11.06.001

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition  

Oh, SangYeob (Division of Computer Engineering, Gachon University)
Publication Information
Journal of Convergence for Information Technology / v.11, no.6, 2021, pp. 1-6
Abstract
With the adoption of DNNs, the use of speech recognition is increasing. However, parallel training of a DNN requires more computation than the conventional GMM, and overfitting occurs when the amount of data is small. To address this problem, we propose an efficient method for robust speech feature extraction and speech signal noise removal that works even when the amount of data is small. Speech feature extraction efficiently extracts speech energy by applying the difference in frame energy of the speech together with the zero-crossing ratio and level-crossing ratio, which are affected by the speech signal. Noise is removed with an average-prediction improved LMS filter, which loses little speech information and preserves the intrinsic characteristics of speech during speech signal detection. The improved LMS filter processes noise in the input speech signal by adjusting the active parameter threshold for the input signal. Compared with the conventional frame energy method, the proposed method improved the error rate by 7% at the start point of speech and by 11% at the end point.
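The abstract names three frame-level features (frame energy, zero-crossing ratio, level-crossing ratio) and an LMS-based noise filter but gives no formulas. The following is a minimal Python sketch of textbook versions of these building blocks, not the paper's method: the average-prediction improvement and the active parameter threshold are not reproduced. All function names, the frame size, the energy threshold, the filter order, the step size `mu`, and the availability of a separate noise reference signal are assumptions made for illustration.

```python
import math
import random


def split_frames(signal, size, hop):
    """Split a sample sequence into frames; a trailing partial frame is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def crossing_ratio(frame, level=0.0):
    """Fraction of adjacent sample pairs lying on opposite sides of `level`.

    level=0.0 gives a zero-crossing ratio, level>0 a level-crossing ratio.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a - level) * (b - level) < 0)
    return crossings / (len(frame) - 1)


def detect_endpoints(signal, size, hop, energy_threshold):
    """Return (first, last) indices of frames whose energy exceeds the
    threshold, or None if no frame does. The threshold is an assumption."""
    active = [i for i, f in enumerate(split_frames(signal, size, hop))
              if frame_energy(f) > energy_threshold]
    return (active[0], active[-1]) if active else None


def lms_noise_canceller(primary, reference, order=4, mu=0.05):
    """Textbook LMS adaptive noise canceller (NOT the paper's improved filter).

    primary   : speech plus noise
    reference : noise correlated with the noise in `primary` (assumed available)
    Returns the error signal, which serves as the cleaned-speech estimate.
    """
    weights = [0.0] * order
    cleaned = []
    for n in range(len(primary)):
        # Most recent `order` reference samples, zero-padded at the start.
        taps = [reference[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        noise_estimate = sum(w * x for w, x in zip(weights, taps))
        error = primary[n] - noise_estimate
        # LMS weight update: step along the instantaneous gradient estimate.
        weights = [w + mu * error * x for w, x in zip(weights, taps)]
        cleaned.append(error)
    return cleaned


# Synthetic stand-in for speech: silence, a tone burst, then silence.
random.seed(0)
clean = [0.5 * math.sin(2 * math.pi * 0.01 * n) if 500 <= n < 1500 else 0.0
         for n in range(2000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(2000)]
noisy = [c + 0.8 * noise[n] + (0.3 * noise[n - 1] if n > 0 else 0.0)
         for n, c in enumerate(clean)]
cleaned = lms_noise_canceller(noisy, noise)
print(detect_endpoints(cleaned, size=100, hop=100, energy_threshold=0.01))
```

In this sketch the endpoint decision uses frame energy alone; the crossing ratios are computed per frame in the same way and could be combined with energy in a decision rule, but how the paper combines them is not stated in the abstract.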
Keywords
DNN; GMM; Speech Feature Extraction; LMS; Noise Elimination