Investigation of Timbre-related Music Feature Learning using Separated Vocal Signals

  • Received : 2019.09.18
  • Accepted : 2019.11.26
  • Published : 2019.11.30

Abstract


Preference for music is determined by a variety of factors, and identifying features that reflect those factors is important for music recommendation. In this paper, we propose a method to extract singing-voice-related music features, reflecting one of several musical characteristics, using a model trained for singer identification. The model can be trained on music sources that contain background accompaniment, but the accompaniment may degrade singer identification performance. To mitigate this problem, this study performs a preliminary source separation step that removes the background accompaniment, and creates a data set of separated vocals using a model architecture validated in SiSEC (the Signal Separation Evaluation Campaign). Finally, we use the separated vocals to learn timbre-based music features that reflect the singer's voice, and we assess the effect of source separation by comparing against an existing method that uses music sources without separation.
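The pipeline above feeds separated vocal waveforms into a singer-identification network, which requires a time-frequency front end. The following is a minimal, dependency-light sketch of such a front end using only numpy; the function name `log_spectrogram`, the frame parameters, and the synthetic test signal are illustrative assumptions, not the paper's actual implementation (which, per its references, builds on librosa and TensorFlow).

```python
import numpy as np

def log_spectrogram(y, n_fft=1024, hop=512):
    """Frame a (separated) vocal signal, apply a Hann window, and return a
    log-magnitude STFT -- a common input representation for singer-ID CNNs.
    Parameter values here are illustrative, not the paper's settings."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # (frames, n_fft // 2 + 1)
    return np.log1p(mag)                        # log compression

# Stand-in for a separated vocal: one second of a 440 Hz tone at 22,050 Hz.
sr = 22050
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 440 * t).astype(np.float32)
feat = log_spectrogram(vocal)
print(feat.shape)  # (42, 513): 42 frames, 513 frequency bins
```

In the described system, features like `feat` would be computed from the separated vocals (rather than the full mix) before training the singer-identification model, so that the learned timbre representation is not contaminated by accompaniment.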

References

  1. J. Park, J. Lee, J. Park, J. Ha, J. Nam, "Representation Learning of Music Using Artist Labels", Proceeding of International Society for Music Information Retrieval Conference, Paris, France, pp. 717-724, 2018.
  2. B. Logan, A. Salomon, "A Music Similarity Function Based on Signal Analysis", Proceeding of International Conference on Multimedia and Expo (ICME), Tokyo, Japan, pp. 22-25, 2001.
  3. H. Eghbal-Zadeh, B. Lehner, M. Schedl, G. Widmer, "I-Vectors for Timbre-Based Music Similarity and Music Artist Classification", Proceeding of International Society for Music Information Retrieval Conference, Malaga, Spain, pp. 554-560, 2015.
  4. C. I. Wang, G. Tzanetakis, "Singing style investigation by residual siamese convolutional neural networks", Proceeding of International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, pp. 116-120, 2018.
  5. K. Lee, J. Nam, "Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice", Proceeding of International Society for Music Information Retrieval Conference, Delft, Netherlands, 2019.
  6. S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, Y. Mitsufuji, "Improving music source separation based on deep neural networks through data augmentation and network blending", Proceeding of International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, pp. 261-265, 2017.
  7. A. Jansson, E. Humphrey, N. Montecchio, R. Bittner, A. Kumar, T. Weyde, "Singing voice separation with deep U-Net convolutional networks", Proceeding of International Society for Music Information Retrieval Conference, Suzhou, China, pp. 745-751, 2017.
  8. D. Stoller, S. Ewert, S. Dixon, "Wave-U-Net: A multi-scale neural network for end-to-end source separation", Proceeding of International Society for Music Information Retrieval Conference, Paris, France, pp. 334-340, 2018.
  9. D. Ward, R. D. Mason, C. Kim, F.-R. Stöter, A. Liutkus, M. Plumbley, "SiSEC 2018: State of the art in musical audio source separation - subjective selection of the best algorithm", Proceeding of the 4th Workshop on Intelligent Music Production, 2018.
  10. Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, R. Bittner, "MUSDB18 - a corpus for music separation", 2017, <10.5281/zenodo.1117371>.
  11. R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, J. P. Bello, "MedleyDB: A multitrack dataset for annotation-intensive MIR research", Proceeding of International Society for Music Information Retrieval Conference, Taipei, Taiwan, pp. 155-160, 2014.
  12. J. Schlüter, T. Grill, "Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks", Proceeding of International Society for Music Information Retrieval Conference, Malaga, Spain, pp. 121-126, 2015.
  13. K. Lee, K. Choi, J. Nam, "Revisiting Singing Voice Detection: a quantitative review and the future outlook", Proceeding of International Society for Music Information Retrieval Conference, Paris, France, pp. 506-513, 2018.
  14. J. Schlüter, "Learning to pinpoint singing voice from weakly labeled examples", Proceeding of International Society for Music Information Retrieval Conference, New York, USA, pp. 44-50, 2016.
  15. B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, O. Nieto, "librosa: Audio and music signal analysis in Python", Proceeding of the 14th Python in Science Conference, 2015.
  16. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Kudlur, "TensorFlow: A system for large-scale machine learning", Proceeding of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.