Music Genre Classification using Spikegram and Deep Neural Network

  • Jang, Woo-Jin (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Yun, Ho-Won (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Shin, Seong-Hyeon (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Cho, Hyo-Jin (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Jang, Won (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Park, Hochong (Dept. of Electronics Engineering, Kwangwoon University)
  • Received : 2017.08.23
  • Accepted : 2017.10.19
  • Published : 2017.11.30

Abstract

In this paper, we propose a new method for music genre classification using the spikegram and a deep neural network. The human auditory system encodes input sound in the time and frequency domains so as to deliver the maximum amount of auditory information to the brain with minimum energy and neural resources. The spikegram is a method for analyzing a waveform based on this encoding behavior of the auditory system. In the proposed method, we analyze the signal using the spikegram, extract a feature vector composed of the key information for genre classification, and use it as the input to a deep neural network. We measure genre classification performance on the GTZAN dataset, which consists of 10 music genres, and confirm that the proposed method achieves good performance with a low-dimensional feature vector compared to current state-of-the-art methods.
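The pipeline the abstract outlines can be made concrete with a short sketch: build a spikegram by greedy matching pursuit over a dictionary of gammatone kernels (the auditory-motivated atoms typically used in spikegram work), pool the resulting spikes into a low-dimensional feature vector, and hand that vector to a classifier. Everything below, from the kernel count and center frequencies to the spike budget and pooling statistics, is an illustrative assumption rather than the paper's exact configuration.

```python
# A minimal sketch, under assumed settings: gammatone kernels, greedy
# matching pursuit for the spikegram, and per-kernel pooling for the
# feature vector. Not the paper's exact configuration.
import numpy as np

def gammatone(fc, fs, duration=0.032, order=4):
    """Unit-norm gammatone kernel centered at fc Hz (ERB-based bandwidth)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * (24.7 + 0.108 * fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def spikegram(x, kernels, n_spikes=100):
    """Greedy matching pursuit: a list of (kernel index, onset, amplitude) spikes."""
    residual = x.astype(float)
    spikes = []
    for _ in range(n_spikes):
        # Correlate the residual with every kernel and pick the best-matching atom.
        corr = np.array([np.correlate(residual, k, mode="valid") for k in kernels])
        ki, ti = np.unravel_index(np.abs(corr).argmax(), corr.shape)
        a = corr[ki, ti]
        residual[ti:ti + len(kernels[ki])] -= a * kernels[ki]  # remove the atom
        spikes.append((ki, ti, a))
    return spikes

def feature_vector(spikes, n_kernels):
    """Pool spikes into per-kernel energy and spike-count statistics."""
    energy = np.zeros(n_kernels)
    count = np.zeros(n_kernels)
    for ki, _, a in spikes:
        energy[ki] += a * a
        count[ki] += 1
    n = max(len(spikes), 1)
    return np.concatenate([energy / n, count / n])

fs = 22050                                # GTZAN sampling rate
centers = np.geomspace(100, 8000, 32)     # 32 assumed center frequencies
kernels = [gammatone(fc, fs) for fc in centers]
x = np.random.randn(fs // 2)              # 0.5 s noise stand-in for a music excerpt
vec = feature_vector(spikegram(x, kernels), len(kernels))
print(vec.shape)                          # (64,) -> a low-dimensional classifier input
```

In the proposed method, vectors like this one, computed over the GTZAN excerpts, would serve as the input to the deep neural network; any standard fully connected classifier with dropout regularization fits that role in this sketch.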
