On-Line Audio Genre Classification using Spectrogram and Deep Neural Network

  • Yun, Ho-Won (Dept. of Electronics Engineering, Kwangwoon University)
  • Shin, Seong-Hyeon (Dept. of Electronics Engineering, Kwangwoon University)
  • Jang, Woo-Jin (Dept. of Electronics Engineering, Kwangwoon University)
  • Park, Hochong (Dept. of Electronics Engineering, Kwangwoon University)
  • Received: 2016.09.06
  • Accepted: 2016.10.07
  • Published: 2016.11.30

Abstract

In this paper, we propose a new method for on-line audio genre classification using a spectrogram and a deep neural network. For on-line processing, the proposed method takes the audio signal in 1-second segments and classifies each segment into one of three genres: speech, music, and effect. To keep the processing general-purpose, it uses the spectrogram as the feature vector instead of the MFCC, which has been widely used for audio analysis. We measure genre classification performance on real TV broadcast audio signals and confirm that the proposed method outperforms the conventional method for every genre. In particular, it reduces the misclassification between music and effect that often occurs with the conventional method.
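The processing chain summarized above (1-second input segments, a spectrogram feature vector, and a deep neural network choosing among speech, music, and effect) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 16 kHz sampling rate, the 512-sample Hann-windowed frames with a 256-sample hop, the log-magnitude scaling, the per-segment normalization, the two 256-unit ReLU hidden layers, and the helper names `segments`, `log_spectrogram`, `init_dnn`, and `classify` are all hypothetical. The weights are random rather than trained, so the predicted label carries no meaning until the network is trained on labelled audio.

```python
import numpy as np

GENRES = ("speech", "music", "effect")

# Hypothetical analysis parameters; the abstract does not specify them.
SAMPLE_RATE = 16000   # Hz (assumed)
SEGMENT_SEC = 1.0     # the method classifies 1-second segments
FRAME_LEN = 512       # samples per STFT frame (assumed)
HOP_LEN = 256         # samples between frame starts (assumed)


def segments(signal, sr=SAMPLE_RATE, seg_sec=SEGMENT_SEC):
    """Yield consecutive, non-overlapping 1-second segments, as an on-line system would receive them."""
    seg_len = int(sr * seg_sec)
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        yield signal[start:start + seg_len]


def log_spectrogram(segment, frame_len=FRAME_LEN, hop=HOP_LEN):
    """Log-magnitude spectrogram with shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(segment) - frame_len) // hop
    frames = np.stack([segment[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-8)  # small offset avoids log(0)


def init_dnn(input_dim, hidden=(256, 256), num_classes=len(GENRES), seed=0):
    """Hypothetical fully connected layers with random (untrained) He-style weights."""
    rng = np.random.default_rng(seed)
    dims = [input_dim, *hidden, num_classes]
    return [(rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]


def classify(segment, layers):
    """Flatten the spectrogram into one feature vector, run a ReLU MLP, return (label, softmax probs)."""
    x = log_spectrogram(segment).ravel()
    x = (x - x.mean()) / (x.std() + 1e-8)  # per-segment normalization (assumed)
    for w, b in layers[:-1]:
        x = np.maximum(x @ w + b, 0.0)     # hidden layers: affine + ReLU
    w_out, b_out = layers[-1]
    logits = x @ w_out + b_out
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return GENRES[int(np.argmax(probs))], probs


if __name__ == "__main__":
    t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE
    audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 3 s synthetic tone as a stand-in for TV audio
    layers = init_dnn(input_dim=log_spectrogram(np.zeros(SAMPLE_RATE)).size)
    for i, seg in enumerate(segments(audio)):
        label, probs = classify(seg, layers)
        print(f"segment {i}: {label} {np.round(probs, 3)}")
```

Flattening the whole 1-second spectrogram into a single vector keeps the sketch close to the abstract's description of the spectrogram as the feature vector; with the assumed parameters each segment yields 61 frames of 257 bins, i.e. a 15,677-dimensional input. An MFCC front end would replace `log_spectrogram` with a much lower-dimensional cepstral summary, which is the design choice the abstract contrasts against.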
