Temporal attention based animal sound classification

  • Jungmin Kim (School of Electrical Engineering, Korea University) ;
  • Younglo Lee (School of Electrical Engineering, Korea University) ;
  • Donghyun Kim (School of Electrical Engineering, Korea University) ;
  • Hanseok Ko (School of Electrical Engineering, Korea University)
  • Received : 2020.07.22
  • Accepted : 2020.08.24
  • Published : 2020.09.30

Abstract

In this paper, to improve the classification accuracy of bird and amphibian sounds, we utilize a Gated Linear Unit (GLU) and self-attention, which encourage the network to extract important features from the data and to discriminate the relevant frames among all input sequences. To this end, we first convert the 1-D acoustic signal into a log-Mel spectrogram. Undesirable components of the log-Mel spectrogram, such as background noise, are then suppressed by the GLU. Finally, we apply the proposed temporal self-attention to improve classification accuracy. The data consist of the calls of 6 bird species and 8 amphibian species, including endangered species, recorded in the natural environment. The proposed method achieves an accuracy of 91 % on the bird data and 93 % on the amphibian data, an improvement of about 6 % ~ 7 % over existing algorithms.
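The front end described above converts each 1-D recording into a log-Mel spectrogram. Below is a minimal sketch of such a conversion using librosa; the sample rate, FFT size, hop length, and number of Mel bands are illustrative assumptions, not values reported by the paper.

```python
import librosa
import numpy as np

def log_mel_spectrogram(wav_path, sr=22050, n_fft=1024, hop_length=512, n_mels=64):
    """Load a mono recording and return its log-scaled Mel spectrogram."""
    y, _ = librosa.load(wav_path, sr=sr, mono=True)
    # Power Mel spectrogram, shape (n_mels, n_frames).
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Log compression stabilizes the dynamic range for the network.
    return librosa.power_to_db(mel, ref=np.max)
```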

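The GLU stage can be realized as two parallel convolutions, one of which drives a sigmoid gate that scales the other elementwise, following the gated convolutional networks of Dauphin et al. The sketch below shows one such block; the channel counts, kernel size, and use of batch normalization are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GLUConvBlock(nn.Module):
    """Gated convolution: Y = BN(conv(X)) * sigmoid(gate(X)).
    The learned sigmoid gate can attenuate time-frequency regions
    dominated by background noise before they reach later layers."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):  # x: (batch, in_ch, n_mels, n_frames)
        return self.bn(self.conv(x)) * torch.sigmoid(self.gate(x))
```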

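For the temporal self-attention stage, a generic single-head scaled dot-product formulation in the spirit of Vaswani et al. is sketched below: attention is computed across the time frames of the feature sequence, and the attended frames are pooled into a clip-level embedding for the classifier. The paper's proposed module may differ in head count, pooling, and placement within the network.

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention along the time axis,
    followed by mean pooling to one clip-level feature vector."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):  # x: (batch, n_frames, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # (batch, n_frames, n_frames): how much each frame attends to the others.
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        ctx = attn @ v              # re-weighted frame features
        return ctx.mean(dim=1)      # clip-level embedding for the classifier
```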
