
Improvement of Speech/Music Classification Based on RNN in EVS Codec for Hearing Aids

  • Received : 2017.04.20
  • Accepted : 2017.04.27
  • Published : 2017.05.31

Abstract

In this paper, we propose a method that uses a recurrent neural network (RNN) to improve the speech/music classification performance of the 3GPP Enhanced Voice Services (EVS) codec for hearing-aid systems. For efficient classification, the RNN is built using only the feature vectors already employed by the EVS speech/music classification algorithm. The proposed algorithm is evaluated on a large speech/music database under various conditions, including different music genres and noise environments, and it achieves better speech/music classification performance than the conventional scheme implemented in the EVS codec.
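The abstract describes the classifier only at a high level. As a rough illustration, the sketch that follows shows the forward pass of a single-layer Elman RNN that maps a sequence of per-frame feature vectors to per-frame speech/music decisions, with the hidden state carrying context from earlier frames. Everything concrete here is an assumption for illustration, not the configuration used in the paper: the plain Elman cell (the paper may use a different recurrent unit), the class name `ElmanSpeechMusicRNN`, the feature dimension of 8, the 16 hidden units, and the 0.5 decision threshold. The weights are random and training is not shown, so the decisions are meaningless until the network is trained on labeled EVS features.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ElmanSpeechMusicRNN:
    """Hypothetical single-layer Elman RNN: per-frame feature vector -> P(music).

    Weights are randomly initialized (untrained); this only illustrates
    how a recurrent classifier consumes frames one at a time.
    """

    def __init__(self, n_features: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            scale = 1.0 / math.sqrt(cols)
            return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        self.w_xh = init(n_hidden, n_features)  # input -> hidden
        self.w_hh = init(n_hidden, n_hidden)    # hidden -> hidden (recurrence)
        self.b_h = [0.0] * n_hidden
        self.w_hy = init(1, n_hidden)[0]        # hidden -> output logit
        self.b_y = 0.0

    def forward(self, frames: Sequence[Sequence[float]]) -> Vector:
        """Return P(music) for each frame; h_t depends on x_t and h_{t-1}."""
        h = [0.0] * self.n_hidden
        probs: Vector = []
        for x in frames:
            a = matvec(self.w_xh, x)
            r = matvec(self.w_hh, h)
            h = [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, self.b_h)]
            logit = sum(w * hi for w, hi in zip(self.w_hy, h)) + self.b_y
            probs.append(sigmoid(logit))
        return probs

    def classify(self, frames: Sequence[Sequence[float]], threshold: float = 0.5) -> List[str]:
        """Threshold the per-frame probabilities into 'speech' / 'music' labels."""
        return ["music" if p >= threshold else "speech" for p in self.forward(frames)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Five frames of hypothetical 8-dimensional feature vectors.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
    model = ElmanSpeechMusicRNN(n_features=8, n_hidden=16)
    print(model.classify(frames))
```

Because the hidden state is fed back at every frame, the decision for a frame depends on the preceding frames as well as its own features, which is the property that motivates using an RNN instead of a frame-independent classifier.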
