http://dx.doi.org/10.9717/kmms.2018.21.7.737

Speech/Music Classification Based on the Higher-Order Moments of Subband Energy

Seo, Jin Soo (Dept. of Electrical Eng., Gangneung-Wonju National University)

Publication Information

Journal of Korea Multimedia Society / v.21, no.7, 2018, pp. 737-744

Abstract

This paper studies the performance of higher-order moments for speech/music classification. A successful speech/music classifier depends on extracting features that give direct access to speech-specific or music-specific information. In addition to the conventional variance-based features, we use higher-order moments of the features, namely skewness and kurtosis. We also investigate the subband decomposition parameters used in feature extraction, which improves classification accuracy. Experiments on two publicly available speech/music datasets show that the higher-order moment features can improve classification accuracy when combined with the conventional variance-based features.
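
As a rough illustration of the kind of features the abstract describes, the following sketch computes the variance, skewness, and kurtosis of subband energies over time. It is not the paper's implementation: the frame length, hop size, number of subbands, equal-width band layout, use of log-energy, and the function names `subband_energy` and `moment_features` are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def subband_energy(signal, frame_len=512, hop=256, n_bands=8):
    """Per-frame energy in n_bands equal-width frequency bands.

    Frame length, hop size and band layout are assumed values,
    not parameters taken from the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, frame_len // 2 + 1)
    bands = np.array_split(power, n_bands, axis=1)          # near-equal-width bands
    return np.stack([b.sum(axis=1) for b in bands], axis=1)  # (n_frames, n_bands)

def moment_features(energy, eps=1e-12):
    """Variance, skewness and kurtosis of each subband's log-energy over time."""
    x = np.log(energy + eps)           # log compression is an assumption
    centered = x - x.mean(axis=0)
    var = (centered ** 2).mean(axis=0)
    std = np.sqrt(var) + eps
    skew = (centered ** 3).mean(axis=0) / std ** 3   # third standardized moment
    kurt = (centered ** 4).mean(axis=0) / std ** 4   # fourth standardized moment
    return np.concatenate([var, skew, kurt])

# One second of white noise stands in for a real audio clip.
rng = np.random.default_rng(0)
clip = rng.standard_normal(16000)
features = moment_features(subband_energy(clip))
print(features.shape)  # (24,)
```

With eight subbands, the sketch yields a 24-dimensional vector per clip: eight variances followed by eight skewness and eight kurtosis values. A vector of this form could then be passed to any standard classifier; which classifier and settings the paper uses are not stated in the abstract.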

Keywords

Speech/Music Classification; Audio Segmentation; Subband Energy; Skewness; Kurtosis
