A music similarity function based on probabilistic linear discriminant analysis for cover song identification

Jin Soo Seo; Junghyun Kim; Hyemi Kim

doi:10.7776/ASK.2022.41.6.662

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 41, Issue 6 / Pages 662-667 / 2022 / 1225-4428 (pISSN) / 2287-3775 (eISSN)

The Acoustical Society of Korea (한국음향학회)

Jin Soo Seo (Department of Electronic Engineering, Gangneung-Wonju National University);
Junghyun Kim (Content Research Division, Electronics and Telecommunications Research Institute);
Hyemi Kim (Content Research Division, Electronics and Telecommunications Research Institute)

Received : 2022.09.30
Accepted : 2022.10.27
Published : 2022.11.30

Abstract

Computing music similarity is an indispensable component of a music search service. This paper focuses on learning a music similarity function that improves cover song identification performance. Using probabilistic linear discriminant analysis (PLDA), we construct a latent music space in which the distances between cover song pairs decrease while the distances between non-cover song pairs increase. Assuming that the observed music features are generated from the learned latent music space, we derive the similarity function from a probabilistic hypothesis test of whether two songs share the same latent variable. Experiments on two cover song datasets show that the proposed music similarity function improves cover song identification performance.
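
The hypothesis test described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact model, so the code below assumes the common two-covariance form of probabilistic linear discriminant analysis, in which a song feature is the sum of a global mean, a latent song identity drawn from a Gaussian with between-class covariance, and Gaussian within-class noise. The function names (`estimate_covariances`, `plda_llr`) and the simple moment-based covariance estimates are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def estimate_covariances(features, labels):
    """Moment estimates of the global mean and the between-class and
    within-class covariances (an assumed, simplified training step)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    mean = features.mean(axis=0)
    centered = features - mean
    dim = features.shape[1]
    classes = np.unique(labels)
    between = np.zeros((dim, dim))
    within = np.zeros((dim, dim))
    for c in classes:
        group = centered[labels == c]          # all versions of one song
        class_mean = group.mean(axis=0)
        between += np.outer(class_mean, class_mean)
        resid = group - class_mean
        within += resid.T @ resid
    between /= len(classes)
    within /= len(features)
    return mean, between, within


def _logpdf_zero_mean(z, cov):
    """Log density of a zero-mean multivariate Gaussian at z."""
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + quad)


def plda_llr(x1, x2, mean, between, within):
    """Log-likelihood ratio of 'same latent variable' vs. 'different
    latent variables' for a pair of feature vectors."""
    total = between + within
    zeros = np.zeros_like(between)
    z = np.concatenate([x1 - mean, x2 - mean])
    # Same song: the shared latent variable correlates the two features.
    cov_same = np.block([[total, between], [between, total]])
    # Different songs: the two features are independent.
    cov_diff = np.block([[total, zeros], [zeros, total]])
    return _logpdf_zero_mean(z, cov_same) - _logpdf_zero_mean(z, cov_diff)


if __name__ == "__main__":
    # Toy 1-D example with unit between-class and within-class variance.
    mu, B, W = np.zeros(1), np.array([[1.0]]), np.array([[1.0]])
    print(plda_llr(np.array([1.0]), np.array([1.0]), mu, B, W))   # agreeing pair
    print(plda_llr(np.array([1.0]), np.array([-1.0]), mu, B, W))  # opposed pair
```

Under these assumptions, a larger score means the pair is more likely to be two versions of the same song, so candidate songs can be ranked by this score against a query.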
