
http://dx.doi.org/10.22156/CS4SMB.2021.11.06.001

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition

Oh, SangYeob (Division of Computer Engineering, Gachon University)

Publication Information

Journal of Convergence for Information Technology, v.11, no.6, 2021, pp. 1-6

Abstract

Applying DNNs has broadened the use of speech recognition, but parallel training requires more computation than the conventional GMM, and overfitting occurs when the amount of data is small. To address this problem, we propose an efficient method for robust speech feature extraction and speech signal noise removal that works even with a small amount of data. For feature extraction, speech energy is extracted efficiently by applying the difference in frame energy together with the zero-crossing ratio and level-crossing ratio, both of which are affected by the speech signal. For noise removal, an improved average prediction LMS filter removes noise from the speech signal with little loss of speech information while preserving the intrinsic characteristics of speech during speech signal detection. The improved LMS filter processes noise in the input speech signal by adjusting the active parameter threshold for the input signal. In a comparison of the proposed method with the conventional frame energy method, the error rate at the start point of speech was 7%, and the error rate at the end point improved by 11%.
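The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the definitions below are common textbook ones and every function name, the crossing `level`, the filter `order`, and the step size `mu` are assumptions made for illustration. In particular, `lms_predict` is a plain LMS one-step linear predictor, not the improved average prediction LMS filter with an active parameter threshold that the paper proposes.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def energy_deltas(frames):
    """Difference in energy between each frame and the one before it."""
    energies = [frame_energy(f) for f in frames]
    return [b - a for a, b in zip(energies, energies[1:])]

def zero_crossing_ratio(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def level_crossing_ratio(frame, level):
    """Fraction of adjacent sample pairs that cross a fixed positive level (assumed definition)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= level) != (b >= level))
    return crossings / (len(frame) - 1)

def lms_predict(x, order=4, mu=0.05):
    """Textbook LMS one-step predictor: returns (predictions, prediction errors)."""
    w = [0.0] * order
    preds, errs = [], []
    for n in range(len(x)):
        # Most recent `order` past samples, zero-padded before the signal starts.
        past = [x[n - k] if n - k >= 0 else 0.0 for k in range(1, order + 1)]
        y = sum(wi * pi for wi, pi in zip(w, past))  # predicted sample
        e = x[n] - y                                 # prediction error
        w = [wi + mu * e * pi for wi, pi in zip(w, past)]  # LMS weight update
        preds.append(y)
        errs.append(e)
    return preds, errs
```

In a sketch like this, an endpoint detector would compare the per-frame energy differences and crossing ratios against thresholds to mark where speech starts and ends, while the predictor output serves as the predictable part of the signal. How the paper combines these quantities, and how its threshold adjustment works, is not stated in the abstract.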

Keywords

DNN; GMM; Speech Feature Extraction; LMS; Noise Elimination

