Speech/Music Classification Based on the Higher-Order Moments of Subband Energy

  • Seo, Jin Soo (Dept. of Electrical Eng., Gangneung-Wonju National University)
  • Received : 2018.03.31
  • Accepted : 2018.06.11
  • Published : 2018.07.31

Abstract

This paper studies the performance of higher-order moments of subband energy for speech/music classification. For a successful speech/music classifier, extracting features that capture speech- or music-specific information is crucial. In addition to the conventional variance-based features, we utilize higher-order moments of the features, such as skewness and kurtosis. Moreover, we investigate the subband decomposition parameters used in feature extraction, which further improves classification accuracy. Experiments on two publicly available speech/music datasets show that the higher-order moment features improve classification accuracy when combined with the conventional variance-based features.
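
As a rough illustration of the feature-extraction idea outlined above, the sketch below computes framewise subband energies and then summarizes each subband with its variance, skewness, and kurtosis (the second, third, and fourth standardized moments). This is a minimal sketch only; the subband decomposition parameters (number of bands, frame length, hop size) and the texture-window length are illustrative placeholders, not the settings used in the paper.

```python
# Minimal sketch: subband-energy moment features (variance, skewness, kurtosis).
# The subband layout and frame parameters are assumptions for illustration.
import numpy as np
from scipy.stats import skew, kurtosis

def subband_energies(x, n_fft=1024, hop=512, n_bands=8):
    """Framewise subband energies: Hann-windowed FFT per frame, with the power
    spectrum summed over n_bands equal-width bands. Shape: (n_frames, n_bands)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    power = np.empty((n_frames, n_fft // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop: i * hop + n_fft] * window
        power[i] = np.abs(np.fft.rfft(frame)) ** 2
    edges = np.linspace(0, power.shape[1], n_bands + 1, dtype=int)
    return np.stack([power[:, edges[b]:edges[b + 1]].sum(axis=1)
                     for b in range(n_bands)], axis=1)

def moment_features(energies):
    """Per-subband variance, skewness, and kurtosis over the frames of one
    texture window, concatenated into a single feature vector."""
    return np.concatenate([energies.var(axis=0),
                           skew(energies, axis=0),
                           kurtosis(energies, axis=0)])

# Example: featurize a one-second, 16 kHz mono excerpt (random stand-in data).
sr = 16000
x = np.random.randn(sr)
feat = moment_features(subband_energies(x))
print(feat.shape)   # (3 * n_bands,) -> (24,)
```

In a full classifier, such per-window feature vectors would typically be fed to a back-end such as an SVM; the random excerpt above is only a stand-in for real speech or music data.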

Keywords


Cited by

  1. A Smart System for Customer Queue Number Management in Offline Stores, Vol. 21, No. 8, 2018, https://doi.org/10.9717/kmms.2018.21.8.925