http://dx.doi.org/10.5573/ieie.2017.54.5.85

Speech Enhancement using RNN Phoneme based VAD  

Lee, Kang (Dept. of Electronic, Computer Information Engineering, Inha University)
Kang, Sang-Ick (Dept. of Electronic, Computer Information Engineering, Inha University)
Kwon, Jang-woo (Dept. of Electronic, Computer Information Engineering, Inha University)
Lee, Samgmin (Dept. of Electronic, Computer Information Engineering, Inha University)
Publication Information
Journal of the Institute of Electronics and Information Engineers / v.54, no.5, 2017, pp. 85-89
Abstract
In this paper, we apply high-performance hardware and machine learning algorithms to build an advanced voice activity detection (VAD) algorithm for speech enhancement. Because speech is made up of a sequence of phonemes, a recurrent neural network (RNN), which takes previous frames into account, is a suitable way to build a speech model. Since it is impossible to train on every kind of noise found in the real world, our algorithm is instead built on phoneme-based training. We detect the frames of a noisy speech signal in which voice is present and use these decisions to enhance the speech signal. The phoneme-based RNN model shows improved performance on speech signals, whose successive frames are highly correlated. To verify the performance of the proposed algorithm, we compare its VAD results with labeled data, and we compare its speech enhancement results in various noise environments with those of a previous speech enhancement algorithm.
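The abstract does not spell out the network or the enhancement rule, so the following is only a minimal NumPy sketch, under stated assumptions, of how per-frame VAD decisions can drive enhancement. `rnn_vad_probs` is a plain Elman RNN forward pass with hypothetical, already-trained weights; it stands in for the paper's phoneme-based RNN, whose architecture and output layer are not given here. `vad_gated_wiener` updates a noise power estimate only in frames the VAD marks as non-speech and applies a Wiener gain driven by a decision-directed a priori SNR; this Wiener gain is a simpler swapped-in substitute for an MMSE estimator, not the authors' method. `frame_accuracy` compares thresholded VAD output with frame labels. All function names, the 0.5 threshold, and the smoothing constants are illustrative assumptions.

```python
import numpy as np


def rnn_vad_probs(features, W_x, W_h, b_h, w_o, b_o):
    """Elman-RNN forward pass giving one speech-presence probability per frame.

    features: (n_frames, n_feat). The weights are hypothetical placeholders for
    a trained model; the hidden state carries context from previous frames.
    """
    h = np.zeros(W_h.shape[0])
    probs = np.empty(features.shape[0])
    for t, x in enumerate(features):
        # Hidden state combines the current frame with the previous state.
        h = np.tanh(W_x @ x + W_h @ h + b_h)
        # Sigmoid output: probability that frame t contains speech.
        probs[t] = 1.0 / (1.0 + np.exp(-(w_o @ h + b_o)))
    return probs


def vad_gated_wiener(noisy_stft, speech_prob, threshold=0.5,
                     alpha_noise=0.9, alpha_dd=0.98, gain_floor=0.1):
    """Enhance a noisy STFT of shape (n_frames, n_bins) using per-frame VAD output.

    Assumption: offline processing, so the initial noise PSD may use every
    frame the VAD labels as noise. Wiener gain is a stand-in for an MMSE rule.
    """
    power = np.abs(noisy_stft) ** 2
    is_noise = speech_prob < threshold
    # Initial noise PSD: mean over noise-labeled frames (first frame if none).
    noise_psd = power[is_noise].mean(axis=0) if is_noise.any() else power[0].copy()
    prev_clean_power = np.zeros(power.shape[1])
    enhanced = np.empty_like(noisy_stft)
    for t in range(power.shape[0]):
        if is_noise[t]:
            # Recursively update the noise estimate only in non-speech frames.
            noise_psd = alpha_noise * noise_psd + (1 - alpha_noise) * power[t]
        noise = np.maximum(noise_psd, 1e-12)
        post_snr = power[t] / noise
        # Decision-directed a priori SNR estimate.
        prio_snr = (alpha_dd * prev_clean_power / noise
                    + (1 - alpha_dd) * np.maximum(post_snr - 1.0, 0.0))
        # Wiener gain, floored to limit over-suppression and musical noise.
        gain = np.maximum(prio_snr / (1.0 + prio_snr), gain_floor)
        enhanced[t] = gain * noisy_stft[t]
        prev_clean_power = np.abs(enhanced[t]) ** 2
    return enhanced


def frame_accuracy(speech_prob, labels, threshold=0.5):
    """Fraction of frames whose thresholded VAD decision matches the label."""
    return float(np.mean((speech_prob >= threshold) == labels.astype(bool)))
```

In use, one would compute frame features and an STFT of the noisy signal, pass the features to `rnn_vad_probs`, and hand the resulting probabilities together with the STFT to `vad_gated_wiener`. Gating the noise update on the VAD keeps speech energy from leaking into the noise estimate, which is the main reason a more accurate VAD can translate into better enhancement.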
Keywords
RNN; GMM; Phoneme; VAD; MMSE