http://dx.doi.org/10.4218/etrij.10.1510.0024

Statistical Model-Based Noise Reduction Approach for Car Interior Applications to Speech Recognition

Lee, Sung-Joo (Software Research Laboratory, ETRI)
Kang, Byung-Ok (Software Research Laboratory, ETRI)
Jung, Ho-Young (Software Research Laboratory, ETRI)
Lee, Yun-Keun (Software Research Laboratory, ETRI)
Kim, Hyung-Soon (Department of Electronics Engineering, Pusan National University)

Publication Information

ETRI Journal / v.32, no.5, 2010, pp. 801-809

Abstract

This paper presents a statistical model-based noise suppression approach for speech recognition in a car environment. To alleviate the spectral whitening and signal distortion problems of the traditional decision-directed Wiener filter, we combine a decision-directed method with a method that reconstructs the original (clean) speech spectrum, and we develop a new two-stage noise reduction filter estimation scheme. When the tradeoff between performance and computational efficiency on resource-constrained automotive devices is considered, the ETSI standard advanced distributed speech recognition front-end (ETSI-AFE) can be an effective solution, and ETSI-AFE is also based on the decision-directed Wiener filter. We therefore conduct a series of speech recognition and computational complexity tests comparing the proposed approach with ETSI-AFE. The experimental results show that the proposed approach achieves higher speech recognition accuracy than the conventional method while significantly reducing computational cost and frame latency.
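The decision-directed Wiener filter that the abstract names as the basis of both the traditional approach and ETSI-AFE can be sketched as below. This is a minimal illustration of the generic Ephraim and Malah decision-directed a priori SNR rule followed by a Wiener gain; it is not the authors' two-stage scheme, their spectrum reconstruction step, or the ETSI-AFE implementation. The function name `decision_directed_wiener`, the smoothing factor `alpha = 0.98`, the -25 dB a priori SNR floor, and the fixed (stationary) noise power estimate are all assumptions made for illustration.

```python
def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, xi_min_db=-25.0):
    """Hypothetical sketch of a decision-directed Wiener filter (not the paper's method).

    noisy_power: list of frames; each frame lists |Y(k,t)|^2 per frequency bin.
    noise_power: per-bin noise power lambda_N(k), assumed fixed and nonzero.
    alpha, xi_min_db: assumed smoothing factor and a priori SNR floor (dB).
    Returns a list of frames of enhanced power |S_hat(k,t)|^2.
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    n_bins = len(noise_power)
    prev_clean = [0.0] * n_bins  # |S_hat(k,t-1)|^2, zero before the first frame
    enhanced = []
    for frame in noisy_power:
        out = []
        for k in range(n_bins):
            # A posteriori SNR of the current noisy frame.
            gamma = frame[k] / noise_power[k]
            # Decision-directed a priori SNR: blend the previous frame's
            # clean estimate with the instantaneous (ML) estimate.
            xi = alpha * prev_clean[k] / noise_power[k] + (1.0 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_min)  # floor bounds the minimum gain
            gain = xi / (1.0 + xi)  # Wiener gain H(k,t)
            out.append(gain * gain * frame[k])  # |H Y|^2 in the power domain
        enhanced.append(out)
        prev_clean = out
    return enhanced


if __name__ == "__main__":
    # Two identical frames: a noise-only bin and a bin roughly 20 dB above the noise.
    result = decision_directed_wiener([[1.0, 101.0], [1.0, 101.0]], [1.0, 1.0])
    for t, frame in enumerate(result):
        print(t, [round(p, 3) for p in frame])
```

In the second bin the gain rises from 2/3 in the first frame to about 0.98 in the second, because the a priori SNR draws on the previous frame's clean estimate; this illustrates the inter-frame smoothing and onset lag built into the decision-directed rule.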

Keywords

Speech enhancement; ETSI standard Aurora advanced front-end; two-stage mel-warped Wiener filter; clean spectrum reconstruction; Gaussian mixture model; speech recognition

Citations & Related Records

Times Cited By KSCI: 2
Times Cited By Web of Science: 1
Times Cited By SCOPUS: 3

References

1	H. Sameti et al., "HMM-Based Strategies for Enhancement of Speech Signals Embedded in Nonstationary Noise," IEEE Trans. Speech Audio Process., vol. 6, Sept. 1998, pp. 445-455.
2	Y. Ephraim, "Statistical-Model-Based Speech Enhancement Systems," Proc. IEEE, vol. 80, no. 10, Oct. 1992, pp. 1526-1555.
3	J. Wu et al., "A Noise-Robust ASR Front-End Using Wiener Filter Constructed from MMSE Estimation of Clean Speech and Noise," Proc. IEEE-ASRU Workshop, 2003, pp. 321-326.
4	T. Arakawa, M. Tsujikawa, and R. Isotani, "Model-Based Wiener Filter for Noise Robust Speech Recognition," Proc. ICASSP, 2006, pp. 537-540.
5	N. Wiener, The Extrapolation, Interpolation, and Smoothing of Stationary Time Series, Wiley: NY, 1949.
6	A. Kain and M. Macon, "Spectral Voice Conversion for Text-to-Speech Synthesis," Proc. ICASSP, 1998, pp. 285-288.
7	K. Park and H.S. Kim, "Narrowband to Wideband Conversion of Speech Using GMM-Based Transformation," Proc. ICASSP, vol. 3, June 2000, pp. 1843-1846.
8	B. Kang, H. Jung, and Y. Lee, "Discriminative Noise Adaptive Training Approach for an Environment Migration," Proc. INTERSPEECH, Aug. 2007, pp. 2085-2089.
9	H. Jung, B. Kang, and Y. Lee, "Model Adaptation Using Discriminative Noise Adaptive Approach for New Environments," ETRI J., vol. 30, no. 6, Dec. 2008, pp. 865-867.
10	S. Lee et al., "A Commercial Car Navigation System Using Korean Large Vocabulary Automatic Speech Recognizer," Proc. APSIPA ASC, Oct. 2009, pp. 286-289.
11	S. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. Acoustics, Speech, Signal Process., vol. 27, no. 2, Apr. 1979, pp. 113-120.
12	Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator," IEEE Trans. Acoustics, Speech, Signal Process., vol. 32, no. 6, Dec. 1984, pp. 1109-1121.
13	Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator," IEEE Trans. Acoustics, Speech, Signal Process., vol. 33, no. 2, Apr. 1985, pp. 443-445.
14	W. Wu and P. Chen, "Subband Kalman Filtering for Speech Enhancement," IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 45, no. 8, Aug. 1998, pp. 1072-1083.
15	J. Gibson, B. Koo, and S. Gray, "Filtering of Colored Noise for Speech Enhancement and Coding," IEEE Trans. Signal Process., vol. 39, no. 8, Aug. 1991, pp. 1732-1742.
16	N. Virag, "Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory System," IEEE Trans. Speech Audio Process., vol. 7, no. 2, Mar. 1999, pp. 126-137.
17	Y. Gong, "Speech Recognition in Noisy Environments: A Survey," Speech Commun., vol. 16, no. 3, Apr. 1995, pp. 261-291.
18	D. Macho et al., "Evaluation of a Noise-Robust DSR Front-End on Aurora Databases," Proc. ICSLP, Sept. 2002, pp. 17-20.
19	Y. Suh and H. Kim, "Feature Compensation Combining SNR-Dependent Feature Reconstruction and Class Histogram Equalization," ETRI J., vol. 30, no. 5, Oct. 2008, pp. 753-755.
20	J. Lim and A. Oppenheim, "Enhancement and Bandwidth Compression of Noisy Speech," Proc. IEEE, vol. 67, no. 12, Dec. 1979, pp. 1586-1604.
21	ETSI Std. Document, "Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithm," ETSI ES 202 050 V1.1.1 (2002-10).
22	M. Cheng et al., "A Robust Front-End Algorithm for Distributed Speech Recognition," Proc. EUROSPEECH, 2001, pp. 425-428.
23	A. Agarwal and Y. Cheng, "Two-Stage Mel-Warped Wiener Filter for Robust Speech Recognition," Proc. IEEE-ASRU Workshop, 1999, pp. 12-15.

Cited By

1	"Automatic Clustering of Speech Data Using Modified MAP Adaptation Technique," Phonetics and Speech Sciences, vol. 6, no. 1, 2010, p. 77.
2	"Intra- and Inter-Frame Features for Automatic Speech Recognition," ETRI Journal, vol. 36, no. 3, 2010, p. 514.
3	"Direction-of-Arrival Based SNR Estimation for Dual-Microphone Speech Enhancement," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, 2014, p. 2207.
4	"Decision Tree State Tying Modeling Using Bayesian Parameter Estimation," Journal of Digital Convergence, vol. 13, no. 1, 2010, p. 243.
5	"Hard component detection of transient noise and its removal using empirical mode decomposition and wavelet-based predictive filter," IET Signal Processing, vol. 12, no. 7, 2010, p. 907.
6	"Rank-weighted reconstruction feature for a robust deep neural network-based acoustic model," ETRI Journal, vol. 41, no. 2, 2019, p. 235.
7	"Convolutional Recurrent Neural Network-Based Event Detection in Tunnels Using Multiple Microphones," Sensors, vol. 19, no. 12, 2010, p. 2695.
8	"Auditory Device Voice Activity Detection Based on Statistical Likelihood-Ratio Order Statistics," Applied Sciences, vol. 10, no. 15, 2010, p. 5026.
9	"Wearable Hearing Device Spectral Enhancement Driven by Non-Negative Sparse Coding-Based Residual Noise Reduction," Sensors, vol. 20, no. 20, 2020, p. 5751.

Cited By KSCI

1	S.M. Ban, B.O. Kang, and H.S. Kim, "Automatic Clustering of Speech Data Using Modified MAP Adaptation Technique," Phonetics and Speech Sciences.
2	S.J. Lee, B.O. Kang, H. Chung, and Y. Lee, "Intra- and Inter-Frame Features for Automatic Speech Recognition," ETRI Journal.