http://dx.doi.org/10.3837/tiis.2017.12.019

Automatic melody extraction algorithm using a convolutional neural network  

Lee, Jongseol (Communications & Media R&D, Korea Electronics Technology Institute)
Jang, Dalwon (Communications & Media R&D, Korea Electronics Technology Institute)
Yoon, Kyoungro (Department of Computer Science and Engineering, Konkuk University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 11, no. 12, 2017, pp. 6038-6053
Abstract
In this study, we propose an automatic melody extraction algorithm based on deep learning. Feature images, generated from the energy of frequency bands, are extracted from polyphonic audio files, and a convolutional neural network (CNN) is applied to these images. In the training data, each short frame of polyphonic music is labeled with a musical note, and a CNN-based classifier is trained to determine the pitch value of a short frame of the audio signal. Our aim is a novel structure for melody extraction, so the proposed algorithm has a simple design: instead of combining various signal processing techniques, it uses only a CNN to find the melody in polyphonic audio. Despite this simple structure, promising results were obtained in our experiments. Compared with state-of-the-art algorithms, the proposed algorithm did not give the best result, but it achieved comparable results, which we believe could be improved with appropriate training data. In this paper, melody extraction and the proposed algorithm are introduced first, the algorithm is then explained in detail, and finally our experiments and a comparison of results are presented.
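The feature images described in the abstract can be illustrated with a minimal NumPy sketch: short windowed frames of the signal are transformed to the frequency domain, and the spectral energy is summed over a set of frequency bands, giving one column per frame and one row per band. The frame length, hop size, band count, and log band spacing below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def feature_image(audio, sr, frame_len=1024, hop=512, n_bands=40):
    """Band-energy feature image: rows are frequency bands, columns are frames.
    All parameters here are hypothetical, chosen only for illustration."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hanning(frame_len)
    # Log-spaced band edges between ~55 Hz (an A1 pitch) and the Nyquist frequency.
    edges = np.geomspace(55.0, sr / 2, n_bands + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    image = np.zeros((n_bands, n_frames))
    for t in range(n_frames):
        frame = audio[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
            image[b, t] = power[mask].sum()
    return image

# A pure 440 Hz tone should concentrate its energy in a single band,
# which is the kind of structure a frame-level pitch classifier can learn.
sr = 16000
t = np.arange(sr) / sr
img = feature_image(np.sin(2 * np.pi * 440 * t), sr)
peak_band = int(np.argmax(img.mean(axis=1)))
```

In the paper's framework, such an image (or a patch of it around each frame) would be the CNN's input, and the frame's labeled musical note its target class.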
Keywords
melody extraction; convolutional neural network; train-test framework;
References
1 D. Jang, M. Jin and C. D. Yoo, "Music genre classification using novel features and a weighted voting method," in Proc. of ICME, 2008.
2 T. LH. Li, A. B. Chan, and A. HW. Chun, "Automatic Musical Pattern Feature Extraction Using Convolutional Neural Network," in Proc. of IMECS, 2010.
3 X. Hu and J. S. Downie, "Improving mood classification in music digital libraries by combining lyrics and audio," in Proc. of the 10th annual joint conference on Digital libraries, pp. 159-168, 2010.
4 J. H. Kim, S. Lee, S. M. Kim and W. Y. Yoo, "Music mood classification model based on Arousal-Valence values," in Proc. of ICACT, pp. 292-295, 2011.
5 D. Jang, C. D. Yoo, S. Lee, S. Kim and T. Kalker, "Pairwise Boosted Audio Fingerprint," IEEE Trans. Information Forensics and Security, vol. 4, no. 4, pp. 995-1004, Dec. 2009.
6 J. Haitsma and T. Kalker, "A highly robust audio fingerprinting system," in Proc. of ISMIR, 2002.
7 S. Durand, J. P. Bello, B. David, and G. Richard, "Feature Adapted Convolutional neural Networks for Downbeat Tracking," in Proc. of ICASSP, 2016.
8 K. Choi, G. Fazekas, and M. Sandler, "Automatic tagging using deep convolutional neural networks," in Proc. of ISMIR, 2016.
9 S. Jo and C. D. Yoo, "Melody extraction from polyphonic audio based on particle filter," in Proc. of ISMIR, pp. 357-362, 2010.
10 D. P. W. Ellis and G. E. Poliner, "Classification-based melody transcription," Machine Learning, vol. 65, pp. 439-456, 2006.
11 J. Salamon, E. Gomez, D. P. W. Ellis, and G. Richard, "Melody extraction from polyphonic music signals: Approaches, applications, and challenges," IEEE Signal Processing Magazine, 2014.
12 T.-C. Yeh, M.-J. Wu, J.-S. Jang, W.-L. Chang and I.-B. Liao, "A hybrid approach to singing pitch extraction based on trend estimation and hidden Markov models," in Proc. of IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, pp. 457-460, Mar. 2012.
13 K. Dressler, "An auditory streaming approach for melody extraction from polyphonic music," in Proc. of 12th ISMIR, Miami, FL, pp. 19-24, Oct. 2011.
14 V. Rao and P. Rao, "Vocal melody extraction in the presence of pitched accompaniment in polyphonic music," IEEE Trans. Audio, Speech, Lang. Processing, vol. 18, no. 8, pp. 2145-2154, Nov. 2010.
15 S. Jo, S. Joo and C. D. Yoo, "Melody pitch estimation based on range estimation and candidate extraction using harmonic structure model," in Proc. of InterSpeech, Makuhari, Japan, Sept. 2010, pp. 2902-2905.
16 V. Arora and L. Behera, "On-line melody extraction from polyphonic audio using harmonic cluster tracking," IEEE Trans. Audio, Speech, Lang. Processing, vol. 21, no. 3, pp. 520-530, Mar. 2013.
17 C. Hsu and J. S. R. Jang, "Singing pitch extraction by voice vibrato/tremolo estimation and instrument partial deletion," in Proc. of 11th ISMIR, Utrecht, The Netherlands, Aug. 2010, pp. 525-530. http://ismir2010.ismir.net/proceedings/ismir2010-89.pdf
18 S. Kum, C. Oh, and J. Nam, "Melody Extraction on Vocal Segments using Multi-Column Deep Neural Networks," in Proc. of ISMIR, 2016.
19 E. J. Humphrey, J. P. Bello, and Y. LeCun, "Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics," in Proc. of ISMIR, 2012.
20 Music Information Retrieval Evaluation eXchange [Online], Available: http://www.music-ir.org/mirex/wiki/MIREX_HOME
21 R. B. Palm, "Prediction as a candidate for learning deep hierarchical models of data", Technical University of Denmark, 2012.
22 2015: MIREX2015 Results [Online], Available: http://www.music-ir.org/mirex/wiki/2015:MIREX2015_Results
23 A. Krizhevsky, I. Sutskever and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Proc. of NIPS, 2012.
24 D. C. Ciresan, U. Meier and L. M. Gambardella, "Convolutional neural network committees for handwritten character classification," in Proc. of International Conference on Document Analysis and Recognition, pp. 1250-1254, 2011.
25 C. Zhang and Z. Zhang, "Improving multiview face detection with multi-task deep convolutional neural networks," in Proc. of Applications of Computer Vision (WACV), 2014 IEEE Winter Conference on, pp. 1036-1041, 2014.
26 J. Zbontar and Y. LeCun, "Computing the stereo matching cost with a convolutional neural network," in Proc. of CVPR, 2015.
27 2016: Audio Melody Extraction [Online], Available: http://www.music-ir.org/mirex/wiki/2016:Audio_Melody_Extraction
28 D. Hermes, "Measurement of pitch by subharmonic summation," Journal of the Acoustical Society of America, vol. 83, pp. 257-264, 1988.
29 J. S. R. Jang and H. R. Lee, "A general framework of progressive filtering and its application to query by singing/humming," IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 350-358, Feb. 2008.
30 J. S. Downie, "Music information retrieval," Annual Review of Information Science and Technology, vol. 37, pp. 295-340, 2003.
31 R. Typke, F. Wiering and R. Veltkamp, "A survey of music information retrieval systems," in Proc. of ISMIR, pp. 153-160, 2005.
32 D. Jang, C.-J. Song, S. Shin, S.-J. Park, S.-J. Jang and S.-P. Lee, "Implementation of a matching engine for a practical query-by-singing/humming system," in Proc. of ISSPIT, pp. 258-263, 2011.
33 G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech Audio Process., vol. 10, no. 5, pp. 293-302, 2002.
34 S. W. Hainsworth and M. D. Macleod, "Particle filtering applied to musical tempo tracking," EURASIP J. Applied Signal Processing, vol. 15, pp. 2385-2395, 2004.
35 D. P. W. Ellis and G. E. Poliner, "Identifying cover songs with chroma features and dynamic programming beat tracking," in Proc. of Int. Conf. on Acoustics, Speech and Signal Processing, Honolulu, HI, 2007.