http://dx.doi.org/10.3837/tiis.2019.02.026

Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator  

Park, Hyung Woo (Information and Telecommunication Engineering Department, Soongsil University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.2, 2019, pp. 962-977
Abstract
The human voice is a convenient means of transferring information between people, between people and machines, and between machines. With the development of information and communication technology, the voice can now be transmitted over greater distances than before. To communicate, the voice is converted into another form, transmitted, and then converted back into sound. In this process, a vocoder is the method that converts and reconverts voice and sound. The CELP (Code-Excited Linear Prediction) type vocoder, one of the voice codecs, has been adopted as a standard codec because it provides high sound quality even at a relatively low transmission rate. EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), both variable bit rate vocoders, are used in mobile phones in the 3G environment. In real-time implementations of a vocoder, reduced sound quality is a typical problem. To improve sound quality, it is important to know the size and shape of the noise. Existing sound quality improvement methods either rely on voice activity detection or apply statistical methods to a large amount of data. However, they have the disadvantage that noise cannot be detected when the signal is continuous or when the noise changes greatly. This paper focuses on finding a better way to limit the loss of sound quality in low bit rate transmission environments. Based on simulation results, this study proposes a preprocessor that estimates the SNR (Signal to Noise Ratio) using a spectral SNR estimation method. The SNR estimation method adopts IMBE (Improved Multi-Band Excitation) so that it can operate on a continuous speech signal. Finally, this preprocessor improves the quality of the vocoder by enhancing the sound adaptively.
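As a rough illustration of what a spectral SNR estimate can look like, the sketch below splits the power spectrum of one frame into bins near the pitch harmonics (counted as speech) and all other bins (counted as noise). It is not the estimator proposed in the paper: the function names, the assumption that the pitch is already known, and the harmonic versus non-harmonic split are all hypothetical choices, only loosely inspired by multi-band excitation analysis.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def harmonic_snr_db(frame, sample_rate, pitch_hz, half_width=1):
    """Hypothetical estimator: energy near pitch harmonics is treated as
    speech, energy in the remaining bins as noise. Returns the ratio in dB."""
    power = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    signal = 0.0
    noise = 0.0
    for k in range(1, len(power)):  # skip the DC bin
        freq = k * bin_hz
        # Nearest harmonic of the (assumed known) pitch, at least the first one.
        nearest = max(1, round(freq / pitch_hz)) * pitch_hz
        if abs(freq - nearest) <= half_width * bin_hz:
            signal += power[k]
        else:
            noise += power[k]
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)
```

Because noise also falls inside the harmonic bins, this simple ratio overestimates the SNR at low signal levels. A preprocessor could still use the value frame by frame to decide how strongly to apply enhancement before the vocoder.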
Keywords
CELP; QCELP; Voice codec enhancement; SNR estimation