http://dx.doi.org/10.4218/etrij.10.1510.0024

Statistical Model-Based Noise Reduction Approach for Car Interior Applications to Speech Recognition  

Lee, Sung-Joo (Software Research Laboratory, ETRI)
Kang, Byung-Ok (Software Research Laboratory, ETRI)
Jung, Ho-Young (Software Research Laboratory, ETRI)
Lee, Yun-Keun (Software Research Laboratory, ETRI)
Kim, Hyung-Soon (Department of Electronics Engineering, Pusan National University)
Publication Information
ETRI Journal / v.32, no.5, 2010, pp. 801-809
Abstract
This paper presents a statistical model-based noise suppression approach for voice recognition in a car environment. To alleviate the spectral whitening and signal distortion problems of the traditional decision-directed Wiener filter, we combine a decision-directed method with an original spectrum reconstruction method and develop a new two-stage noise reduction filter estimation scheme. When the tradeoff between performance and computational efficiency on resource-constrained automotive devices is considered, the ETSI standard advanced distributed speech recognition front-end (ETSI-AFE) can be an effective solution, and ETSI-AFE is also based on the decision-directed Wiener filter. We therefore conduct a series of voice recognition and computational complexity tests comparing the proposed approach with ETSI-AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.
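For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a conventional decision-directed Wiener filter gain computation. It is not the proposed two-stage scheme and not the ETSI-AFE implementation. The function name, the stationary noise estimate, the smoothing factor `alpha=0.98`, and the gain floor are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Per-frame Wiener gains from a decision-directed a priori SNR estimate.

    noisy_power: (frames, bins) noisy power spectrum |Y|^2
    noise_power: (bins,) noise power estimate (assumed stationary here)
    alpha, gain_floor: illustrative values, not taken from the paper
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.maximum(np.asarray(noise_power, dtype=float), 1e-12)
    gains = np.empty_like(noisy_power)
    prev_clean_power = np.zeros_like(noise_power)
    for t, frame in enumerate(noisy_power):
        # A posteriori SNR: noisy power over noise power.
        post_snr = frame / noise_power
        # A priori SNR: weighted mix of the previous frame's clean-speech
        # estimate and the current maximum-likelihood estimate.
        prio_snr = (alpha * prev_clean_power / noise_power
                    + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))
        # Wiener gain, floored to limit over-suppression.
        gain = np.maximum(prio_snr / (1.0 + prio_snr), gain_floor)
        gains[t] = gain
        # Clean-speech power estimate carried to the next frame.
        prev_clean_power = gain ** 2 * frame
    return gains
```

Multiplying each frame's noisy magnitude spectrum by the returned gains gives the enhanced spectrum. A gain floor is one common way to limit the kind of signal distortion the abstract mentions; the paper's own remedy is the spectrum reconstruction stage, which this sketch does not attempt.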
Keywords
Speech enhancement; ETSI standard Aurora advanced front-end; two-stage mel-warped Wiener filter; clean spectrum reconstruction; Gaussian mixture model; speech recognition