Cover song search based on magnitude and phase of the 2D Fourier transform

Seo, Jin Soo

doi:10.7776/ASK.2018.37.6.518

한국음향학회지 (The Journal of the Acoustical Society of Korea)

Vol. 37, No. 6 / pp. 518-524 / 2018 / 1225-4428 (pISSN) / 2287-3775 (eISSN)

한국음향학회 (The Acoustical Society of Korea)

Seo, Jin Soo (Department of Electronic Engineering, Gangneung-Wonju National University)

Received: 2018.09.10
Reviewed: 2018.11.21
Published: 2018.11.30

Abstract

A cover song is a new version of an original song, such as a live recording or a remake. This paper studies the two-dimensional (2D) Fourier transform as a feature-dimension reduction method for fast cover song search. Because the 2D Fourier transform offers invariance to musical-key changes, it is well suited to feature-dimension reduction for cover song search. Previous Fourier-transform methods used only the magnitude; this paper extends them by introducing an invariant derived from the phase, under the assumption that adjacent chroma blocks undergo the same musical-key change. We compare the cover song retrieval accuracy of the Fourier-transform-based methods on two datasets. The experimental results show that adding the phase-based invariant improves cover song retrieval accuracy over the previous magnitude-only method.
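The invariance argument in the abstract can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it models a key change as a circular shift of the 12 pitch-class rows of a toy chroma block, and it assumes one possible phase-based quantity, the product of one block's 2D spectrum with the complex conjugate of the next block's spectrum. The paper's actual definition of H_i may differ. All function names and the toy data are hypothetical.

```python
import cmath

def dft2(block):
    """Naive 2D DFT of a matrix given as a list of rows (rows = pitch classes)."""
    n_rows, n_cols = len(block), len(block[0])
    out = [[0j] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            acc = 0j
            for r in range(n_rows):
                for c in range(n_cols):
                    angle = -2 * cmath.pi * (u * r / n_rows + v * c / n_cols)
                    acc += block[r][c] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out

def transpose_key(block, semitones):
    """Circularly shift the pitch-class axis (assumed model of a key change)."""
    n = len(block)
    return [block[(r - semitones) % n] for r in range(n)]

def magnitude(spec):
    return [[abs(z) for z in row] for row in spec]

def cross_spectrum(spec_a, spec_b):
    """Assumed phase-based quantity: spectrum of one block times conjugate of the next."""
    return [[a * b.conjugate() for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(spec_a, spec_b)]

def max_abs_diff(x, y):
    return max(abs(a - b) for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y))

# Two adjacent toy chroma blocks (12 pitch classes x 6 frames), filled deterministically.
block_a = [[((7 * r + 3 * c) % 5) / 4.0 for c in range(6)] for r in range(12)]
block_b = [[((5 * r + 11 * c + 2) % 7) / 6.0 for c in range(6)] for r in range(12)]

shift = 3  # both blocks are transposed by the same number of semitones
spec_a, spec_b = dft2(block_a), dft2(block_b)
spec_a_t = dft2(transpose_key(block_a, shift))
spec_b_t = dft2(transpose_key(block_b, shift))

raw_change = max_abs_diff(spec_a, spec_a_t)
magnitude_change = max_abs_diff(magnitude(spec_a), magnitude(spec_a_t))
cross_change = max_abs_diff(cross_spectrum(spec_a, spec_b),
                            cross_spectrum(spec_a_t, spec_b_t))

print("complex spectrum changes under transposition:", raw_change > 1e-3)
print("magnitude unchanged:", magnitude_change < 1e-8)
print("adjacent-block cross term unchanged:", cross_change < 1e-8)
```

The circular shift multiplies each spectrum by a linear phase term that depends only on the shift, so it vanishes from the magnitude and cancels in the conjugate product when both blocks share the same shift. That is why the same-key-change assumption matters for any phase-based invariant of this kind.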

Fig. 1. Overview of the cover song search system based on song-level chromagram summarization [8].

Fig. 2. Chromagram summarization using the 2D Fourier transform.

Fig. 4. Search accuracy (%) versus block size W for the covers80 dataset.

Fig. 5. Search accuracy (%) versus block size W for the kpop100 dataset.

Fig. 6. Search accuracy (%) versus PCA dimension for the covers80 dataset with W = 75.

Fig. 7. Search accuracy (%) versus PCA dimension for the kpop100 dataset with W = 75.

Fig. 3. (a) Chromagram of an excerpt of the original song "Between the bars". (b) Chromagram of an excerpt of the cover song "Between the bars". (c) Real part of H_i from (a) (solid line) and from (b) (dashed line). (d) Imaginary part of H_i from (a) (solid line) and from (b) (dashed line). (e) Real part of H_i from (a) (solid line) and from another song, "My heart will go on" (dashed line). (f) Imaginary part of H_i from (a) (solid line) and from another song, "My heart will go on" (dashed line). Panels (c) to (f) show the first 50 coefficients of the zigzag scan of H_i (i.e., the low-frequency components).
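The caption of Fig. 3 refers to the first 50 coefficients of a zigzag scan. As a rough sketch of what such a scan does, the code below orders the entries of a matrix along anti-diagonals starting from the (0, 0) corner, in the style used by JPEG. The exact scan path used in the paper is not stated on this page, so the JPEG-style ordering, the function names, and the 12 x 75 size in the usage lines are assumptions made for illustration.

```python
def zigzag_indices(n_rows, n_cols):
    """Return (row, col) pairs in JPEG-style zigzag order, starting at (0, 0)."""
    order = []
    for s in range(n_rows + n_cols - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n_cols + 1), min(s, n_rows - 1) + 1)
        diagonal = [(r, s - r) for r in rows]  # listed with the row index increasing
        if s % 2 == 0:
            diagonal.reverse()  # even anti-diagonals are traversed with the row index decreasing
        order.extend(diagonal)
    return order

def zigzag_scan(matrix, count):
    """Return the first `count` entries of `matrix` in zigzag order."""
    indices = zigzag_indices(len(matrix), len(matrix[0]))
    return [matrix[r][c] for r, c in indices[:count]]

# Small example that can be verified by hand.
example = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
print(zigzag_scan(example, 9))

# Hypothetical 12 x 75 matrix: keep only the first 50 coefficients of the scan.
toy = [[r * 75 + c for c in range(75)] for r in range(12)]
low_frequency = zigzag_scan(toy, 50)
print(len(low_frequency))
```

Entries near the (0, 0) corner come first in this ordering, so truncating the scan keeps the low-index coefficients and discards the rest.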

References

[1] Z. Fu, G. Lu, K. M. Ting, and D. Zhang, "A survey of audio-based music classification and annotation," IEEE Trans. Multimedia 13, 303-319 (2011). https://doi.org/10.1109/TMM.2010.2098858
[2] J. Seo, J. Kim, and J. Park, "Centroid-model based music similarity with alpha divergence" (in Korean), J. Acoust. Soc. Kr. 35, 83-91 (2016). https://doi.org/10.7776/ASK.2016.35.2.083
[3] J. Lee and H. Kim, "Audio fingerprinting using a robust hash function based on the MCLT peak-pair" (in Korean), J. Acoust. Soc. Kr. 34, 157-162 (2015). https://doi.org/10.7776/ASK.2015.34.2.157
[4] B. Logan and A. Salomon, "A music similarity function based on signal analysis," Proc. ICME-2001, 745-748 (2001).
[5] C. Charbuillet, D. Tardieu, and G. Peeters, "GMM supervector for content based music similarity," Proc. DAFX-2011, 425-428 (2011).
[6] J. Serra, E. Gomez, P. Herrera, and X. Serra, "Chroma binary similarity and local alignment applied to cover song identification," IEEE Trans. Audio Speech Lang. Process. 16, 1138-1151 (2008). https://doi.org/10.1109/TASL.2008.924595
[7] P. Foster, S. Dixon, and A. Klapuri, "Identifying cover songs using information-theoretic measures of similarity," IEEE Trans. Audio Speech Lang. Process. 23, 993-1005 (2015). https://doi.org/10.1109/TASLP.2015.2416655
[8] J. Seo, J. Kim, and J. Park, "An investigation of chroma n-gram selection for cover song search" (in Korean), J. Acoust. Soc. Kr. 36, 436-441 (2017).
[9] M. Muller and S. Ewert, "Towards timbre-invariant audio features for harmony-based music," IEEE Trans. Audio Speech Lang. Process. 18, 649-662 (2010). https://doi.org/10.1109/TASL.2010.2041394
[10] M. Muller and S. Ewert, "Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features," Proc. ISMIR-2011, 215-220 (2011).
[11] D. Silva, C. Yeh, G. Batista, and E. Keogh, "SIMPle: Assessing music similarity using subsequences joins," Proc. ISMIR-2016, 23-29 (2016).
[12] T. Bertin-Mahieux and D. Ellis, "Large-scale cover song recognition using the 2D Fourier transform magnitude," Proc. ISMIR-2012, 241-246 (2012).
[13] J. Bello, C. Duxbury, M. Davies, and M. Sandler, "On the use of phase and energy for musical onset detection in the complex domain," IEEE Signal Process. Letters 11, 553-556 (2004). https://doi.org/10.1109/LSP.2004.827951
[14] J. Seo, J. A. Haitsma, and T. Kalker, "Linear speed-change resilient audio fingerprinting," Proc. MPCA-2002, 45-48 (2002).
[15] D. Ellis and G. Poliner, "Identifying 'cover songs' with chroma features and dynamic programming beat tracking," Proc. ICASSP-2007, 1429-1432 (2007).
[16] B. Reddy and B. Chatterji, "An FFT-based technique for translation, rotation, and scale-invariant image registration," IEEE Trans. Image Process. 5, 1266-1271 (1996). https://doi.org/10.1109/83.506761
[17] The covers80 cover song data set, available at https://labrosa.ee.columbia.edu/projects/coversongs/covers80/ (2007).
[18] D. Ellis and C. Cotton, "The 2007 LabROSA cover song detection system," MIREX extended abstract (2007).
