Fig. 1. Music-similarity computation for cover song identification based on the optimal transposition index (OTI) and sequence alignment.
Fig. 2. Computation of the pairwise similarity matrix SC by using the lookup table obtained from the learned codebooks.
Fig. 3. The pairwise similarity matrix between the original song (“More than words”) and its cover version. (a) S from OTI. (b) SC with K = 16. (c) SC with K = 48. (d) SC with K = 80.
Table 1. Identification performance on the covers80 dataset. Accuracy measures are precision at one, P@1, and the mean of average precision, MAP. RP, RC, and RS refer to relative precision, relative computational cost, and relative storage cost, respectively.
Table 2. Identification performance on the covers330 dataset. Accuracy measures are the mean number of covers identified within the first ten answers, MNCI10, and the mean of average precision, MAP. RP, RC, and RS refer to relative precision, relative computational cost, and relative storage cost, respectively.
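The accuracy measures named in Tables 1 and 2 can be illustrated with a minimal sketch. It assumes each query yields a ranked list of booleans (True where the returned song is a true cover of the query) and that the list spans the whole collection, so every cover appears somewhere in it. The function names and the input layout are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from typing import Sequence


def precision_at_one(ranked: Sequence[Sequence[bool]]) -> float:
    """P@1: fraction of queries whose top-ranked answer is a true cover."""
    return sum(1 for r in ranked if r and r[0]) / len(ranked)


def average_precision(relevance: Sequence[bool]) -> float:
    """Average of the precision values at each rank where a cover is found.

    Assumption: the ranked list contains every cover of the query, so the
    number of hits equals the number of relevant items.
    """
    hits = 0
    total = 0.0
    for rank, is_cover in enumerate(relevance, start=1):
        if is_cover:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked: Sequence[Sequence[bool]]) -> float:
    """MAP: mean of the per-query average precision."""
    return sum(average_precision(r) for r in ranked) / len(ranked)


def mean_covers_in_top_n(ranked: Sequence[Sequence[bool]], n: int = 10) -> float:
    """MNCI-n: mean number of covers found within the first n answers."""
    return sum(sum(r[:n]) for r in ranked) / len(ranked)


# Toy example with two queries and three ranked answers each (made-up data).
example = [[True, False, True], [False, True, False]]
p1 = precision_at_one(example)
map_score = mean_average_precision(example)
mnci10 = mean_covers_in_top_n(example, n=10)
```

With n = 10 the last function corresponds to MNCI10 as described in the Table 2 caption; the toy lists are shorter than ten entries, so all answers are counted.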