Fig. 1. Overview of the cover song search system based on song-level chromagram summarization.[8]
Fig. 2. Chromagram summarization using the 2D Fourier transform.
Fig. 3. (a) Chromagram of an excerpt of the original song "Between the bars". (b) Chromagram of an excerpt of the cover version of "Between the bars". (c) Real part of Hi from (a) (solid line) and from (b) (dashed line). (d) Imaginary part of Hi from (a) (solid line) and from (b) (dashed line). (e) Real part of Hi from (a) (solid line) and from another song, "My heart will go on" (dashed line). (f) Imaginary part of Hi from (a) (solid line) and from another song, "My heart will go on" (dashed line). Panels (c) to (f) display the first 50 coefficients of the zigzag scan of Hi (i.e., the low-frequency components).
Fig. 4. Search accuracy (%) versus block size W for the covers80 dataset.
Fig. 5. Search accuracy (%) versus block size W for the kpop100 dataset.
Fig. 6. Search accuracy (%) versus PCA dimension for the covers80 dataset with W = 75.
Fig. 7. Search accuracy (%) versus PCA dimension for the kpop100 dataset with W = 75.
References
- Z. Fu, G. Lu, K. M. Ting, and D. Zhang, "A survey of audio-based music classification and annotation," IEEE Trans. Multimedia 13, 303-319 (2011). https://doi.org/10.1109/TMM.2010.2098858
- J. Seo, J. Kim, and J. Park, "Centroid-model based music similarity with alpha divergence" (in Korean), J. Acoust. Soc. Kr. 35, 83-91 (2016). https://doi.org/10.7776/ASK.2016.35.2.083
- J. Lee and H. Kim, "Audio fingerprinting using a robust hash function based on the MCLT peak-pair" (in Korean), J. Acoust. Soc. Kr. 34, 157-162 (2015). https://doi.org/10.7776/ASK.2015.34.2.157
- B. Logan and A. Salomon, "A music similarity function based on signal analysis," Proc. ICME-2001, 745-748 (2001).
- C. Charbuillet, D. Tardieu, and G. Peeters, "GMM supervector for content based music similarity," Proc. DAFX-2011, 425-428 (2011).
- J. Serra, E. Gomez, P. Herrera, and X. Serra, "Chroma binary similarity and local alignment applied to cover song identification," IEEE Trans. Audio Speech Lang. Process. 16, 1138-1151 (2008). https://doi.org/10.1109/TASL.2008.924595
- P. Foster, S. Dixon, and A. Klapuri, "Identifying cover songs using information-theoretic measures of similarity," IEEE Trans. Audio Speech Lang. Process. 23, 993-1005 (2015). https://doi.org/10.1109/TASLP.2015.2416655
- J. Seo, J. Kim, and J. Park, "An investigation of chroma n-gram selection for cover song search" (in Korean), J. Acoust. Soc. Kr. 36, 436-441 (2017).
- M. Muller and S. Ewert, "Towards timbre-invariant audio features for harmony-based music," IEEE Trans. Audio Speech Lang. Process. 18, 649-662 (2010). https://doi.org/10.1109/TASL.2010.2041394
- M. Muller and S. Ewert, "Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features," Proc. ISMIR-2011, 215-220 (2011).
- D. Silva, C. Yeh, G. Batista, and E. Keogh, "SIMPle: Assessing music similarity using subsequences joins," Proc. ISMIR-2016, 23-29 (2016).
- T. Bertin-Mahieux and D. Ellis, "Large-scale cover song recognition using the 2D Fourier transform magnitude," Proc. ISMIR-2012, 241-246 (2012).
- J. Bello, C. Duxbury, M. Davies, and M. Sandler, "On the use of phase and energy for musical onset detection in the complex domain," IEEE Signal Process. Letters 11, 553-556 (2004). https://doi.org/10.1109/LSP.2004.827951
- J. Seo, J. A. Haitsma, and T. Kalker, "Linear speed-change resilient audio fingerprinting," Proc. MPCA-2002, 45-48 (2002).
- D. Ellis and G. Poliner, "Identifying 'cover songs' with chroma features and dynamic programming beat tracking," Proc. ICASSP-2007, 1429-1432 (2007).
- B. Reddy and B. Chatterji, "An FFT-based technique for translation, rotation, and scale-invariant image registration," IEEE Trans. Image Process. 5, 1266-1271 (1996). https://doi.org/10.1109/83.506761
- The covers80 cover song data set, available at https://labrosa.ee.columbia.edu/projects/coversongs/covers80/ (2007).
- D. Ellis and C. Cotton, "The 2007 LabROSA cover song detection system," MIREX 2007 extended abstract (2007).