http://dx.doi.org/10.3745/KTSDE.2021.10.11.483

Creating Songs Using Note Embedding and Bar Embedding and Quantitatively Evaluating Methods  

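Lee, Young-Bae (Department of Future Convergence Consulting, Graduate School of Knowledge Service & Consulting, Hansung University)
Jung, Sung Hoon (Division of Mechanical and Electronic Engineering, Hansung University)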
Publication Information
KIPS Transactions on Software and Data Engineering, v.10, no.11, 2021, pp. 483-490
Abstract
To train an artificial neural network on existing songs and have it create new ones, the songs must first be converted, as a preprocessing step, into numerical data that the network can process; one-hot encoding has been the method used so far. In this paper, we propose a note embedding method that uses the note as the basic unit and a bar embedding method that uses the bar as the basic unit, and we compare their performance with that of the existing one-hot encoding. The comparison is based on a quantitative evaluation of which method produces songs more similar to those written by a human composer, using quantitative evaluation methods from the field of natural language processing. In this evaluation, the songs created with bar embedding scored best, followed by those created with note embedding. The result is significant in that both the note embedding and the bar embedding proposed here create songs that are more similar to the composer's songs than those created with the existing one-hot encoding.
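As a rough illustration of the two ideas in the abstract, the sketch below one-hot encodes a short note sequence and then scores a generated sequence against a reference with clipped n-gram precision, the building block of BLEU-style metrics from natural language processing. It is a minimal sketch under stated assumptions and not the authors' implementation: the "pitch_duration" token format, the two example sequences, and the helper names `one_hot`, `ngrams`, and `modified_precision` are all hypothetical.

```python
from collections import Counter

# Hypothetical note tokens in "pitch_duration" form (q = quarter, h = half).
# The token format actually used in the paper is not given in the abstract.
reference = ["C4_q", "D4_q", "E4_h", "C4_q", "D4_q", "G4_h"]
generated = ["C4_q", "D4_q", "E4_h", "E4_q", "D4_q", "G4_h"]


def one_hot(tokens):
    """Return the sorted vocabulary and one one-hot vector per token."""
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = [
        [1 if index[tok] == i else 0 for i in range(len(vocab))]
        for tok in tokens
    ]
    return vocab, vectors


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, ref, n):
    """Clipped n-gram precision: candidate n-gram counts are capped
    by how often the same n-gram occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(ref, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / total


vocab, vectors = one_hot(reference)
p1 = modified_precision(generated, reference, 1)  # note-level overlap
p2 = modified_precision(generated, reference, 2)  # overlap of note pairs

print(f"vocabulary size: {len(vocab)}")
print(f"1-gram precision: {p1:.3f}, 2-gram precision: {p2:.3f}")
```

A one-hot vector only marks which vocabulary entry a note is, so every pair of distinct notes is equally far apart; a learned embedding would replace each vector with a dense one in which related notes or bars can lie closer together. The same precision function could be applied to bar-level tokens by treating each bar as a single token.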
Keywords
Automatic Composition; One-Hot Encoding; Note Embedding; Bar Embedding; Quantitative Evaluation;