Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Based on GMM

Application of the Gaussian Mixture Model to Improve Real-Time Speech/Music Classification Performance in 3GPP2 SMV

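  • Ji-Hyun Song (송지현; School of Electronic and Electrical Engineering, Inha University) ;
  • Kye-Hwan Lee (이계환; School of Electronic and Electrical Engineering, Inha University) ;
  • Joon-Hyuk Chang (장준혁; School of Electronic and Electrical Engineering, Inha University)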
  • Published : 2007.11.30

Abstract

In this letter, we propose a novel approach to improving speech/music classification in the selectable mode vocoder (SMV) of 3GPP2 using the Gaussian mixture model (GMM), which is based on the expectation-maximization (EM) algorithm. We first analyze the features and the classification method adopted in the conventional SMV. Feature vectors for the GMM are then selected from the relevant parameters of the SMV for efficient speech/music classification. The proposed algorithm is evaluated under various conditions and yields better results than the conventional SMV scheme.

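In this paper, we propose a method for improving the real-time speech/music classification performance of the existing 3GPP2 Selectable Mode Vocoder (SMV) using the Gaussian mixture model (GMM), a pattern recognition technique based on the expectation-maximization (EM) algorithm that performs well in speech recognition and music recognition. We analyze the feature vectors and the classification method used in the SMV's real-time speech/music classification algorithm and, on that basis, introduce the GMM to improve classification performance. Specifically, we present a real-time classification technique that builds an effective GMM by selectively using only the feature vectors already used in the SMV speech/music classification algorithm. To evaluate the GMM applied to SMV speech/music classification, we compared it with the original SMV classification algorithm. Evaluated across a variety of music genres, the GMM-based approach showed better speech/music classification performance than the conventional SMV method.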
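The general idea of GMM-based two-class classification can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional synthetic features, the cluster centers, the number of mixture components, the diagonal covariances, and all function names are assumptions made for illustration, and no actual SMV parameters are used. One GMM is fitted per class with EM, and a block of frames is assigned to the class whose model gives the higher total log-likelihood.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_log_probs(x, gmm):
    """log(weight_j) + log N(x | mean_j, var_j) for every component j."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def log_likelihood(x, gmm):
    """Log-likelihood of one feature vector under the mixture."""
    return log_sum_exp(component_log_probs(x, gmm))

def fit_gmm(data, k, iters=30, var_floor=1e-3, seed=0):
    """Fit a k-component diagonal GMM with EM; returns [(weight, mean, var)]."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Initialisation: means drawn from the data, shared global variance.
    gmean = [sum(x[d] for x in data) / n for d in range(dim)]
    gvar = [max(sum((x[d] - gmean[d]) ** 2 for x in data) / n, var_floor)
            for d in range(dim)]
    gmm = [(1.0 / k, list(m), list(gvar)) for m in rng.sample(data, k)]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in data:
            lp = component_log_probs(x, gmm)
            norm = log_sum_exp(lp)
            resp.append([math.exp(v - norm) for v in lp])
        # M-step: re-estimate weights, means, and (floored) variances.
        updated = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                    for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, data)) / nj, var_floor)
                   for d in range(dim)]
            updated.append((nj / n, mean, var))
        gmm = updated
    return gmm

def classify(frames, speech_gmm, music_gmm):
    """Label a block of frames by the summed log-likelihood ratio."""
    score = sum(log_likelihood(x, speech_gmm) - log_likelihood(x, music_gmm)
                for x in frames)
    return "speech" if score > 0 else "music"

def make_frames(rng, centers, n):
    """Synthetic 2-D features around the given centers (illustrative only)."""
    return [[rng.gauss(c[0], 0.5), rng.gauss(c[1], 0.5)]
            for _ in range(n) for c in centers]

# Hypothetical cluster centers standing in for real feature statistics.
SPEECH_CENTERS = [(0.0, 0.0), (1.0, 3.0)]
MUSIC_CENTERS = [(5.0, 5.0), (6.0, 2.0)]

rng = random.Random(1)
speech_gmm = fit_gmm(make_frames(rng, SPEECH_CENTERS, 100), k=2)
music_gmm = fit_gmm(make_frames(rng, MUSIC_CENTERS, 100), k=2)

test_block = make_frames(rng, MUSIC_CENTERS, 10)
print(classify(test_block, speech_gmm, music_gmm))
```

Summing the log-likelihood ratio over a block of frames, rather than deciding frame by frame, is one simple way to smooth the decision; a real-time system would need to choose the block length to match its latency budget.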
