Improving SVM with Second-Order Conditional MAP for Speech/Music Classification

SVM Based on the Second-Order Conditional Maximum a Posteriori Technique for Improved Speech/Music Classification

  • Received : 2011.02.25
  • Accepted : 2011.06.14
  • Published : 2011.09.25

Abstract

Support vector machines (SVMs) are well known for their strong performance in pattern recognition. One application is speech/music classification in standardized codecs such as the 3GPP2 selectable mode vocoder (SMV). In this paper, we propose a novel scheme that improves SVM-based speech/music classification using the second-order conditional maximum a posteriori (MAP) technique. Whereas conventional SVM optimization techniques are applied during the training phase, the proposed technique is applied during the classification phase. The proposed approach can therefore be developed and used in parallel with conventional optimizations, yielding a combined gain in classification performance. Experimental results show that the proposed algorithm is compatible with conventional techniques and can improve the performance of SVMs.

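Support vector machines (SVMs) are widely used in pattern recognition; for example, they can be applied to standardized codecs such as the 3GPP2 selectable mode vocoder (SMV) to improve the codec's speech/music classification performance. This paper proposes a new method that improves the SVM to further raise speech/music classification performance. The frames of a speech/music signal are strongly correlated with one another, and on this basis the second-order conditional maximum a posteriori (MAP) technique is applied to the SVM to improve speech/music classification. In addition, unlike existing techniques, which are applied when the SVM is trained, the proposed technique is used when the SVM performs classification. It can therefore be developed and used independently of existing techniques, and can thus further improve classification performance. Experiments comparing against existing techniques demonstrate the independence of the proposed technique and its performance gain.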
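
The idea described in the abstract, letting the decision for the current frame depend on the decisions for the two preceding frames, can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the function name `classify_with_history`, the threshold table and its numeric values, and the convention that 1 means music and 0 means speech are all assumptions made for this example. The second-order conditional MAP rule is approximated here as a threshold on the raw SVM output that is selected by the two previous decisions, so it acts only at classification time and leaves SVM training untouched.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical decision thresholds on the raw SVM output, indexed by the two
# previous decisions (d[t-2], d[t-1]). Assumed convention: 1 = music, 0 = speech.
# The numeric values are illustrative only; in practice they would be tuned.
EXAMPLE_THRESHOLDS: Dict[Tuple[int, int], float] = {
    (0, 0): 0.4,   # two speech frames: require stronger evidence for music
    (0, 1): 0.0,   # mixed history: fall back to the usual SVM threshold
    (1, 0): 0.0,
    (1, 1): -0.4,  # two music frames: accept weaker evidence for music
}


def classify_with_history(
    scores: Sequence[float],
    thresholds: Dict[Tuple[int, int], float] = EXAMPLE_THRESHOLDS,
    initial: Tuple[int, int] = (0, 0),
) -> List[int]:
    """Classify frames one by one, choosing the threshold from the last two decisions."""
    prev2, prev1 = initial  # assumed decisions for the two frames before the first one
    decisions: List[int] = []
    for score in scores:
        decision = 1 if score > thresholds[(prev2, prev1)] else 0
        decisions.append(decision)
        prev2, prev1 = prev1, decision  # slide the two-frame history forward
    return decisions


# Per-frame SVM outputs (made-up numbers for illustration).
svm_scores = [0.5, 0.1, -0.2, -0.5, 0.2]
print(classify_with_history(svm_scores))  # [1, 1, 1, 0, 1]
print([1 if s > 0.0 else 0 for s in svm_scores])  # [1, 1, 0, 0, 1] with a fixed threshold
```

In this toy run the third frame is kept as music because the two preceding frames were music, whereas a fixed threshold at zero would flip it to speech. Whether such smoothing helps on real data depends on how the thresholds are chosen.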
