A Study on the Performance of Music Retrieval Based on Emotion Recognition

  • Seo, Jin Soo (Department of Electronic Engineering, Gangneung-Wonju National University)
  • Received : 2015.02.02
  • Accepted : 2015.03.17
  • Published : 2015.05.31

Abstract

This paper presents a study on the performance of music search based on automatically recognized music-emotion labels. As with other media data, such as speech, image, and video, a song can evoke certain emotions in its listeners. When people look for songs to listen to, the emotions evoked by those songs can be an important consideration. However, very little work has been done on how useful music-emotion labels are for music search. In this paper, we utilize the three axes of human music perception (valence, activity, tension) and the five basic emotion labels (happiness, sadness, tenderness, anger, fear) in measuring music similarity for music search. Experiments were conducted on both genre and singer datasets. The search accuracy of the proposed emotion-based music search reached up to 75 % of that of the conventional feature-based music search. By combining the proposed emotion-based method with the feature-based method, we achieved up to a 14 % improvement in search accuracy.
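
The abstract does not spell out the similarity measure, so the following Python sketch is only an illustration of the general idea under stated assumptions: each song is represented by the eight recognized emotion scores named above, candidates are ranked by the Euclidean distance between emotion vectors, and a weighted sum hypothetically fuses the emotion-based distance with a feature-based one. All identifiers and numbers below are made up for illustration and are not taken from the paper.

```python
import numpy as np

# The eight emotion scores named in the abstract: three dimensional ratings
# plus five basic-emotion ratings.
EMOTION_DIMS = ["valence", "activity", "tension",
                "happiness", "sadness", "tenderness", "anger", "fear"]

def emotion_distance(query, candidate):
    """Euclidean distance between two emotion vectors (a simple stand-in
    for whatever similarity measure the paper actually employs)."""
    return float(np.linalg.norm(np.asarray(query) - np.asarray(candidate)))

def combined_distance(query, candidate, feature_distance, weight=0.5):
    """Hypothetical late fusion of the emotion-based distance with a
    feature-based distance; the abstract reports that such a combination
    improves search accuracy by up to 14 %."""
    return weight * emotion_distance(query, candidate) + (1.0 - weight) * feature_distance

def rank_by_emotion(query, database):
    """Rank song ids from most to least similar to the query emotion vector."""
    return sorted(database, key=lambda sid: emotion_distance(query, database[sid]))

if __name__ == "__main__":
    # Made-up emotion scores in [0, 1]; real scores would come from an
    # automatic emotion recognizer run on the audio.
    db = {
        "song_a": [0.80, 0.70, 0.20, 0.90, 0.10, 0.30, 0.00, 0.10],
        "song_b": [0.20, 0.30, 0.80, 0.10, 0.70, 0.20, 0.50, 0.60],
        "song_c": [0.60, 0.50, 0.40, 0.70, 0.30, 0.50, 0.20, 0.20],
    }
    query = [0.75, 0.65, 0.25, 0.85, 0.15, 0.35, 0.05, 0.10]
    print(rank_by_emotion(query, db))  # ['song_a', 'song_c', 'song_b'] for these scores
```

In practice the emotion scores would be predicted automatically from the audio, and the fusion weight between the emotion-based and feature-based distances would be tuned on held-out data.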

