
Prediction of Music Generation on Time Series Using Bi-LSTM Model

  • Kwang-Jin Kim (Gent Information Technology Co., Ltd.)
  • Chil-Woo Lee (Dept. of Computer Information and Communication Engineering, Chonnam National University)
  • Received: 2022.10.25
  • Accepted: 2022.11.28
  • Published: 2022.11.30

Abstract

Deep learning has become a creative tool that overcomes the limitations of conventional analysis models and can generate output in many forms, including text, images, and music. In this paper, we preprocess 1,609 sound-source files from Niko's MIDI Pack as a data set and propose a preprocessing method and a prediction model, based on a bidirectional long short-term memory (Bi-LSTM) network, that can generate music efficiently. To produce new time-series data that fit the musical key implied by the generated root note, the hidden layers are stacked in multiple layers, and an attention mechanism is applied at the decoder's output gate to weight the elements of the encoder's input that influence the prediction. Configuration variables such as the loss function and the optimization method are tuned as parameters to improve the recognition rate of the LSTM model. The proposed model is a multi-channel Bi-LSTM with attention that, to raise the efficiency and predictive accuracy of MIDI learning, separates the treble clef from the bass clef and uses the extracted notes, note lengths, rests, rest lengths, and chords. The trained model generates notes and chords that suit the development of the music and are distinguishable from noise, aiming at harmonically stable music generation.
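The abstract describes extracting per-clef event channels (notes, note lengths, rests, rest lengths, chords), one-hot encoding them, and framing generation as next-event prediction over a time series. The following sketch is not the authors' code; it is a minimal, illustrative example (all names are assumptions) of how one such channel could be one-hot encoded and cut into sliding windows for sequence prediction.

```python
# Minimal sketch of the preprocessing idea described in the abstract:
# a single event channel (here, treble-clef pitches with "R" for a rest)
# is one-hot encoded and split into (input window, next event) pairs
# that a Bi-LSTM could be trained on. Names and window size are illustrative.

def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def make_windows(events, vocab, window=4):
    """Turn an event sequence into (input window, next-event) training pairs."""
    idx = {tok: i for i, tok in enumerate(vocab)}
    pairs = []
    for start in range(len(events) - window):
        x = [one_hot(idx[e], len(vocab)) for e in events[start:start + window]]
        y = one_hot(idx[events[start + window]], len(vocab))
        pairs.append((x, y))
    return pairs

# Example: one pitch channel; other channels (durations, rests, chords)
# would be encoded the same way and fed to the model in parallel.
pitches = ["C4", "E4", "G4", "R", "E4", "C4", "G4", "C4"]
vocab = sorted(set(pitches))
pairs = make_windows(pitches, vocab, window=4)
print(len(pairs))  # 4 windows from 8 events
```

In the multi-channel setup the paper describes, each channel (pitch, note length, rest length, chord) would get its own vocabulary and encoding, with the per-channel windows supplied to the Bi-LSTM as parallel inputs.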

Keywords

References

  1. S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, Nov. 1997. 
  2. J. Schmidhuber, "Deep Learning in Neural Networks," Neural Networks, Vol. 61, pp. 85-117, Jan. 2015.  https://doi.org/10.1016/j.neunet.2014.09.003
  3. Do Nhu Tai, Soo Hyung Kim et al., "Tracking by Detection of Multiple Faces using SSD and CNN Features," Smart Media Journal, Vol. 7, No. 2, pp. 65-66, Jun. 2018. 
  4. Thanh-Cong Do, Hyung Jeong Yang, Soo Hyung Kim et al., "Region of Interest Localization for Bone Age Estimation Using Whole-Body Bone Scintigraphy," Smart Media Journal, Vol. 10, No. 2, pp. 22-29, Jun. 2021. 
  5. Andries Van Der Merwe and Walter Schulze, "Music Generation with Markov Models," IEEE MultiMedia, Vol. 18, No. 3, pp. 78-84, Mar. 2011. 
  6. Sanidhya Mangal, Rahul Modak, and Poorva Joshi, "LSTM Based Music Generation System," arXiv:1908.01080v1, Aug. 2019. 
  7. Zheng Sun, Jiaqi Liu et al., "Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding," 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 2-5, Honolulu, HI, USA, Nov. 2018. 
  8. Ian J. Goodfellow et al., "Generative Adversarial Nets," Part of Advances in Neural Information Processing Systems 27 (NIPS 2014), pp. 3-5, Montreal, Canada, Dec. 2014. 
  9. Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang, "MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment," arXiv:1709.06298v2, Nov. 2017. 
  10. Junyoung Chung, Kyung Hyun Cho et al., "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling," arXiv:1412.3555v1, pp. 4-5, Dec. 2014. 
  11. Ashish Vaswani, Noam Shazeer et al., "Attention Is All You Need," Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 2-6, Dec. 2017. 
  12. Cheng-Zhi Anna Huang et al., "Music Transformer: Generating Music With Long-Term Structure," ICLR 2019, pp. 3-5, May, 2019. 
  13. Yongjie Huang et al., "Music Generation Based on Convolution-LSTM," Computer and Information Science, Vol. 11, pp. 51-52, Jun. 2018. 
  14. J. Bae, "A Study on the Korean Traditional Music Melody Generator Using Artificial Intelligence," Journal of the Korea Institute of Information and Communication Engineering, Vol. 25, No. 7, pp. 873-875, Jul. 2021. 
  15. Time-Domain versus Frequency-Domain, https://www.radartutorial.eu/10.processing/sp53.en.html, (accessed Oct. 2022). 
  16. Ke-Lin Du and M. N. S. Swamy, "Neural Networks and Statistical Learning," Springer, pp. 337-353, Oct. 2013. 
  17. Ikram Ul Haq, Iqbal Gondal et al., "Categorical Features Transformation with Compact One-Hot Encoder for Fraud Detection in Distributed Environment," 16th Australasian Conference, pp. 69-80, Bathurst, NSW, Australia, Nov. 2018. 
  18. Chuming Li et al., "AM-LFS: AutoML for Loss Function Search," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8410-8419, Seoul, Korea, Oct. 2019. 
  19. Ronald J. Williams and David Zipser, "Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity," Computer Science, pp. 444-445, Jan. 1995. 
  20. Paul J. Werbos, "Backpropagation Through Time: What It Does and How to Do It," Proceedings of the IEEE, Vol. 78, No. 10, pp. 1553-1560, Oct. 1990. 
  21. George Philipp, Dawn Song, and Jaime G. Carbonell, "The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions," arXiv:1712.05577v4, Apr. 2018. 
  22. Minh-Thang Luong, Hieu Pham, Christopher D. Manning, "Effective Approaches to Attention-based Neural Machine Translation," arXiv:1508.04025v5, Sep. 2015.