A code-based chromagram similarity for cover song identification

  • Seo, Jin Soo (Department of Electronic Engineering, Gangneung-Wonju National University)
  • Received: 2019.02.08
  • Accepted: 2019.05.24
  • Published: 2019.05.31

Abstract

Computing chromagram similarity is indispensable in building a cover song identification system. This paper proposes a code-based chromagram similarity that reduces the computational and storage costs of cover song identification. A codebook is learned for each song and used to convert its chromagram sequence into a code sequence, which reduces the feature storage cost. We then build a lookup table of the distances between the codewords of the learned codebooks, so that chromagram similarity can be computed efficiently. Experiments on two music datasets compare the proposed code-based similarity with the conventional one in terms of cover song search accuracy, feature storage, and computational cost.
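The abstract does not state how the song-specific codebook is learned. The following is a minimal sketch, assuming plain k-means (Lloyd's algorithm) with Euclidean distance over 12-bin chroma frames, of how a chromagram becomes a code sequence. The function names `learn_codebook` and `encode` are hypothetical, and k-means is a stand-in rather than the paper's confirmed method.

```python
import random
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two chroma vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[Vector], v: Vector) -> int:
    """Index of the codeword closest to v (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda c: sq_dist(codebook[c], v))


def learn_codebook(frames: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Hypothetical song-specific codebook: plain k-means over one song's frames."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]  # initialise from k random frames
    dims = len(frames[0])
    for _ in range(iters):
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for f in frames:  # assignment step
            c = nearest(codebook, f)
            counts[c] += 1
            for d, x in enumerate(f):
                sums[c][d] += x
        for c in range(k):  # update step; an empty cluster keeps its old codeword
            if counts[c]:
                codebook[c] = [s / counts[c] for s in sums[c]]
    return codebook


def encode(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Replace each chroma frame by the index of its nearest codeword."""
    return [nearest(codebook, f) for f in frames]
```

Under this sketch, a song is stored as one small integer per frame plus its K × 12 codebook, instead of 12 real values per frame.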

Fig. 1. Music-similarity computation for cover song identification based on the optimal transposition index (OTI) and sequence alignment.
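Fig. 1 combines a global transposition step with local sequence alignment. A simplified sketch follows, assuming (1) the OTI is the circular chroma shift that maximizes the inner product of the two songs' global chroma vectors, (2) an OTI-based binary frame similarity with illustrative scores +1 and -0.9, and (3) a Smith-Waterman recursion with a hypothetical linear gap penalty of 0.5. The paper's exact scoring and gap rules may differ, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def rotate(v: Vector, k: int) -> List[float]:
    """Circularly shift a chroma vector up by k bins: result[d] = v[(d - k) % len(v)]."""
    k %= len(v)
    return list(v[-k:]) + list(v[:-k]) if k else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def oti(ga: Vector, gb: Vector) -> int:
    """Optimal transposition index: the shift of gb that best matches ga."""
    return max(range(len(ga)), key=lambda k: dot(ga, rotate(gb, k)))


def global_chroma(frames: List[Vector]) -> List[float]:
    """Sum of all frames, scaled so that the largest bin equals 1."""
    g = [sum(f[d] for f in frames) for d in range(len(frames[0]))]
    peak = max(g) or 1.0
    return [x / peak for x in g]


def binary_similarity(fa, fb, match=1.0, mismatch=-0.9):
    """A frame pair scores `match` if its own OTI is 0 (no shift needed), else `mismatch`."""
    return [[match if oti(a, b) == 0 else mismatch for b in fb] for a in fa]


def local_alignment_score(S, gap=0.5):
    """Smith-Waterman recursion on a precomputed similarity matrix S (linear gap cost)."""
    n, m = len(S), len(S[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + S[i - 1][j - 1],
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            best = max(best, H[i][j])
    return best


def cover_similarity(frames_a, frames_b):
    """Transpose song B by the global OTI, then align it locally against song A."""
    shift = oti(global_chroma(frames_a), global_chroma(frames_b))
    b_shifted = [rotate(f, shift) for f in frames_b]
    return local_alignment_score(binary_similarity(frames_a, b_shifted))
```

For a toy song and a copy of it transposed up by two semitones, the global OTI undoes the transposition, and the alignment score equals the number of matching frames along the diagonal.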

Fig. 2. Computation of the pairwise similarity matrix $S_C$ using the lookup table obtained from the learned codebooks.
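The lookup-table idea in Fig. 2 can be sketched as follows, assuming cosine similarity between codewords (the actual similarity or distance function is an assumption). The table holds one value per pair of codewords from the two songs' codebooks, and each entry of $S_C$ is then a lookup indexed by the two frames' codes. `build_lookup_table` and `code_similarity_matrix` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def build_lookup_table(cb_a: List[Vector], cb_b: List[Vector],
                       sim: Callable[[Vector, Vector], float] = cosine) -> List[List[float]]:
    """T[p][q] = sim(codeword p of song A, codeword q of song B): K_A * K_B evaluations."""
    return [[sim(p, q) for q in cb_b] for p in cb_a]


def code_similarity_matrix(codes_a: List[int], codes_b: List[int],
                           table: List[List[float]]) -> List[List[float]]:
    """S_C[i][j] = T[code of frame i in A][code of frame j in B]: lookups only."""
    return [[table[ca][cb] for cb in codes_b] for ca in codes_a]
```

With K_A and K_B codewords, building the table takes K_A K_B vector comparisons, after which the N × M matrix needs only lookups; the direct matrix needs N M comparisons of 12-dimensional vectors.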

Fig. 3. Pairwise similarity matrices between the original song (“More Than Words”) and its cover version: (a) $S$ from the OTI; (b) $S_C$ with $K = 16$; (c) $S_C$ with $K = 48$; (d) $S_C$ with $K = 80$.
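Fig. 3 varies the codebook size K. The back-of-the-envelope cost model below shows how K enters storage and computation. It assumes 32-bit floats for chroma values and codewords, codes packed at ⌈log2 K⌉ bits, 12 chroma bins, and a dot-product frame similarity. This is not the paper's definition of RS or RC, and the function names are hypothetical.

```python
import math
from typing import Optional


def storage_bits(n_frames: int, k: Optional[int] = None,
                 dims: int = 12, float_bits: int = 32) -> int:
    """Bits to store one song's features under an assumed cost model.

    k is None: the raw chromagram, n_frames * dims floats.
    otherwise: a ceil(log2 k)-bit code per frame plus a k x dims float codebook.
    """
    if k is None:
        return n_frames * dims * float_bits
    return n_frames * math.ceil(math.log2(k)) + k * dims * float_bits


def similarity_mults(n: int, m: int, k: Optional[int] = None, dims: int = 12) -> int:
    """Multiply-adds to fill an n x m similarity matrix with dot products.

    k is None: direct comparison of every frame pair.
    otherwise: only the k x k lookup table needs arithmetic; the n * m
    entries of S_C are then table lookups, which are not counted here.
    """
    if k is None:
        return n * m * dims
    return k * k * dims
```

For a hypothetical 1000-frame song, this model gives 384,000 bits for the raw chromagram, versus 10,144 bits at K = 16 and 37,720 bits at K = 80. For two such songs, the direct matrix costs 12,000,000 multiply-adds, versus 3,072 for the K = 16 lookup table plus one million table lookups.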

Table 1. Identification performance on the covers80 dataset. The accuracy measures are precision at one (P@1) and mean average precision (MAP). RP, RC, and RS denote relative precision, relative computational cost, and relative storage cost, respectively.

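The accuracy measures in Table 1 follow standard retrieval definitions. The sketch below assumes that each query comes with a ranked candidate list (with the query itself excluded) and a set of its true covers. P@1 is the fraction of queries whose top-ranked candidate is a cover. MAP averages, over queries, the mean of the precision values at the ranks where covers appear. This evaluation protocol is an assumption, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@r over the ranks r at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # covers never retrieved contribute 0


def precision_at_1(ranked: List[str], relevant: Set[str]) -> float:
    """1.0 if the top-ranked candidate is a cover of the query, else 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def evaluate(rankings: Dict[str, List[str]],
             relevance: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (P@1, MAP) averaged over all queries in `rankings`."""
    queries = list(rankings)
    p1 = sum(precision_at_1(rankings[q], relevance[q]) for q in queries) / len(queries)
    mean_ap = sum(average_precision(rankings[q], relevance[q]) for q in queries) / len(queries)
    return p1, mean_ap
```

When each query has exactly one cover in the collection, its average precision reduces to 1 / (rank of that cover).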

Table 2. Identification performance on the covers330 dataset. The accuracy measures are the mean number of covers identified within the first ten answers (MNCI10) and mean average precision (MAP). RP, RC, and RS denote relative precision, relative computational cost, and relative storage cost, respectively.

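MNCI10 in Table 2 can be sketched in the same assumed setting, with one ranked candidate list and one set of true covers per query. For each query, count the covers among the first ten candidates, then average these counts over all queries. The function names are hypothetical.

```python
from typing import Dict, List, Set


def covers_in_top(ranked: List[str], relevant: Set[str], n: int = 10) -> int:
    """Number of true covers among the first n ranked candidates."""
    return sum(1 for item in ranked[:n] if item in relevant)


def mnci(rankings: Dict[str, List[str]], relevance: Dict[str, Set[str]], n: int = 10) -> float:
    """Mean, over queries, of the covers identified within the top n (MNCI10 for n = 10)."""
    queries = list(rankings)
    return sum(covers_in_top(rankings[q], relevance[q], n) for q in queries) / len(queries)
```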
