Convolutional neural network based amphibian sound classification using covariance and modulogram

  • Kyungdeuk Ko (Department of Electrical and Electronic Engineering, Korea University)
  • Sangwook Park (Department of Electrical and Electronic Engineering, Korea University)
  • Hanseok Ko (Department of Electrical and Electronic Engineering, Korea University)
  • Received : 2017.12.15
  • Accepted : 2018.01.30
  • Published : 2018.01.31

Abstract

In this paper, a covariance matrix and a modulogram are proposed for realizing amphibian sound classification with a CNN (Convolutional Neural Network). First, a database is established by collecting the sounds of nine amphibian species, including endangered species, in their natural environment. To apply the database to a CNN, acoustic signals of different lengths must be standardized to a fixed size. To this end, a covariance matrix, which captures the distribution of the signal, and a modulogram, which captures its change over time, are extracted and used as inputs to the CNN. Experiments are conducted while varying the numbers of convolutional and fully-connected layers. For performance assessment, several conventional methods representing various feature extraction and classification approaches are considered. The results confirm that the convolutional layers have a greater impact on performance than the fully-connected layers, and that the CNN attains the highest recognition rate, 99.07 %, among the considered methods.
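To make the standardization step concrete, the sketch below derives both fixed-size inputs from log-mel frames. It is a minimal illustration under assumptions, not the authors' implementation: the use of librosa, the log-mel front end, and the parameter values (n_mels, n_mod) are not specified by the abstract.

    import numpy as np
    import librosa  # assumed toolkit; the paper does not name one

    def covariance_feature(wav, sr, n_mels=40):
        """Fixed-size distribution feature: covariance of log-mel frames.
        A clip of any length yields T frames of n_mels coefficients, and
        their (n_mels x n_mels) covariance is independent of T."""
        mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=n_mels)
        logmel = librosa.power_to_db(mel)            # shape (n_mels, T)
        return np.cov(logmel)                        # shape (n_mels, n_mels)

    def modulogram_feature(wav, sr, n_mels=40, n_mod=32):
        """Fixed-size temporal feature: per-band modulation spectrum,
        i.e., how each mel band's envelope changes over time."""
        mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=n_mels)
        env = librosa.power_to_db(mel)               # band envelopes, (n_mels, T)
        env = env - env.mean(axis=1, keepdims=True)  # remove per-band DC offset
        spec = np.abs(np.fft.rfft(env, axis=1))      # modulation spectrum per band
        # keep n_mod evenly spaced modulation bins so the output size
        # does not depend on the clip length T
        idx = np.linspace(0, spec.shape[1] - 1, n_mod).astype(int)
        return spec[:, idx]                          # shape (n_mels, n_mod)

Either matrix can then be treated as a one-channel image of fixed size, which is the input format a CNN expects.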

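The layer-count sweep described in the abstract can be pictured with a small configurable network. The sketch below is a hypothetical Keras model, not the architecture from the paper: the framework, filter counts, and layer widths are assumptions, and only the nine output classes come from the text.

    import tensorflow as tf  # assumed framework; the paper does not name one

    def build_cnn(input_shape, n_classes=9, n_conv=2, n_fc=1):
        """Configurable CNN: n_conv conv/pool stages followed by n_fc
        hidden fully-connected layers, mirroring the reported sweep."""
        model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
        for _ in range(n_conv):
            model.add(tf.keras.layers.Conv2D(32, 3, padding="same",
                                             activation="relu"))
            model.add(tf.keras.layers.MaxPooling2D(2))
        model.add(tf.keras.layers.Flatten())
        for _ in range(n_fc):
            model.add(tf.keras.layers.Dense(128, activation="relu"))
        model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
        return model

    # e.g., a 40 x 40 covariance matrix treated as a one-channel image
    model = build_cnn(input_shape=(40, 40, 1), n_classes=9, n_conv=2, n_fc=1)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])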
