[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5573/ieie.2017.54.5.85

Speech Enhancement using RNN Phoneme based VAD

Lee, Kang (Dept. of Electronic, Computer Information Engineering, Inha University)
Kang, Sang-Ick (Dept. of Electronic, Computer Information Engineering, Inha University)
Kwon, Jang-woo (Dept. of Electronic, Computer Information Engineering, Inha University)
Lee, Samgmin (Dept. of Electronic, Computer Information Engineering, Inha University)

Publication Information

Journal of the Institute of Electronics and Information Engineers / v.54, no.5, 2017, pp. 85-89

Abstract

In this paper, we use high-performance hardware and a machine learning algorithm to build an improved voice activity detection (VAD) algorithm for speech enhancement. Because speech is a sequence of phonemes, a recurrent neural network (RNN), which takes previous frames into account, is a suitable way to model it. It is impossible to train on every noise type that occurs in the real world, so we train our algorithm on phonemes instead. We detect the voice-present frames in a noisy speech signal and then enhance the signal. The phoneme-based RNN model performs well on speech signals, which are highly correlated from frame to frame. To verify the performance of the proposed algorithm, we compare its VAD output with labeled data, and we compare its speech enhancement results in various noise environments with those of a previous speech enhancement algorithm.
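
The pipeline the abstract describes (an RNN produces per-frame phoneme evidence, a VAD decision is derived from it, and the decision steers the enhancement) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the network architecture, the input features, or the gain rule, so the Elman-style recurrence, the "class 0 is silence" convention, the untrained toy weights, and the Wiener-style gain are all stand-ins, and every function and parameter name is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_phoneme_posteriors(frames, w_in, w_rec, w_out):
    """Elman-style RNN forward pass (assumed architecture, no biases).

    Returns one posterior vector over phoneme classes per input frame.
    """
    hidden = [0.0] * len(w_rec)
    posteriors = []
    for frame in frames:
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]  # state carries context from earlier frames
        posteriors.append(softmax(matvec(w_out, hidden)))
    return posteriors


def vad_from_posteriors(posteriors, silence_index=0, threshold=0.5):
    """Flag a frame as speech when the non-silence probability exceeds the threshold.

    Assumption: one class (silence_index) represents silence or noise only.
    """
    return [1.0 - p[silence_index] > threshold for p in posteriors]


def enhance(noisy_power, vad, alpha=0.9, gain_floor=0.1):
    """VAD-gated noise tracking with a Wiener-style gain (a stand-in gain rule).

    noisy_power: per-frame lists of power-spectrum bins.
    Assumption: the first frame contains noise only and seeds the noise estimate.
    """
    noise = list(noisy_power[0])
    enhanced = []
    for frame, is_speech in zip(noisy_power, vad):
        if not is_speech:
            # Update the noise estimate only in frames the VAD marks as non-speech.
            noise = [alpha * n + (1 - alpha) * y for n, y in zip(noise, frame)]
        out = []
        for y, n in zip(frame, noise):
            snr_post = y / max(n, 1e-12)
            gain = max(1.0 - 1.0 / max(snr_post, 1e-12), gain_floor)
            out.append(gain * gain * y)  # amplitude gain applied to a power value
        enhanced.append(out)
    return enhanced
```

In a real system the weights would come from training on phoneme-labeled speech, and the frames would be spectral features of the noisy signal. The point of the sketch is the data flow: the noise estimate is frozen whenever the VAD reports speech, so a more accurate VAD leads directly to a cleaner noise estimate and a better gain.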

Keywords

RNN; GMM; Phoneme; VAD; MMSE



Korean title: 음소기반의 순환 신경망 음성 검출기를 이용한 음성 향상 (Speech Enhancement Using a Phoneme-Based Recurrent Neural Network Voice Detector)