Wild Bird Sound Classification Scheme using Focal Loss and Ensemble Learning


  • Jaeseung Lee (Department of Electrical and Electronics Engineering, Korea University) ;
  • Jehyeok Rew (Department of Data Science, Duksung Women's University)
  • Received : 2024.01.18
  • Accepted : 2024.03.06
  • Published : 2024.04.30

Abstract

For effective analysis of animal ecosystems, technology that can automatically identify the current status of animal habitats is crucial. In particular, animal sound classification, which identifies species from their vocalizations, is attracting great attention in environments where video-based identification is impractical. Previous studies have relied on a single deep learning model to classify animal sounds. However, sounds collected outdoors often contain substantial background noise, which degrades the discriminative power of a single model, and data imbalance among species can bias model training. To address these challenges, this paper proposes an animal sound classification scheme that uses an ensemble to combine the predictions of multiple models trained with Focal Loss, which assigns penalties according to the amount of data in each class. Experiments on a public dataset demonstrate that the proposed scheme improves recall by up to 22.6% over the average performance of the individual models.
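The abstract describes Focal Loss as assigning penalties that account for the amount of data in each class. The canonical formulation from Lin et al. (2017), which the paper cites, down-weights easy, well-classified examples with a (1 - p_t)^γ factor so that training focuses on hard and minority-class samples. A minimal NumPy sketch follows; the function name and the α = 0.25, γ = 2 defaults come from Lin et al., not from this paper:

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0):
    """Mean focal loss over a batch (Lin et al., 2017).

    probs:   (N, C) array of predicted class probabilities
    targets: (N,) array of integer class labels
    """
    # Probability assigned to each sample's true class
    p_t = probs[np.arange(len(targets)), targets]
    # (1 - p_t)^gamma shrinks the loss of confident (easy) predictions,
    # so hard or under-represented examples dominate the gradient.
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

# A confidently correct sample contributes far less than an uncertain one:
easy = focal_loss(np.array([[0.9, 0.1]]), np.array([0]))
hard = focal_loss(np.array([[0.6, 0.4]]), np.array([0]))
```

Compared with plain cross-entropy, the extra modulating factor is what lets the loss counteract class imbalance without resampling the data.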

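The ensemble step combines the predictions of several classification models. The abstract does not specify the combination rule, so the following is a generic soft-voting sketch in NumPy; the function name and the uniform default weights are assumptions for illustration:

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Combine per-model class-probability matrices by weighted averaging.

    prob_list: list of (N, C) arrays, one per model
    weights:   optional (M,) array of model weights; uniform if omitted
    Returns the (N,) array of predicted class indices.
    """
    probs = np.stack(prob_list)                  # (M, N, C)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    avg = np.tensordot(weights, probs, axes=1)   # weighted mean -> (N, C)
    return avg.argmax(axis=1)

# Two models disagree; averaging their probabilities resolves the vote.
model_a = np.array([[0.6, 0.4]])
model_b = np.array([[0.2, 0.8]])
pred = soft_vote([model_a, model_b])             # averaged probs: [0.4, 0.6]
```

Averaging probabilities (soft voting) rather than majority-voting hard labels lets a confident model outweigh an uncertain one, which is one common reason ensembles are more robust to noisy inputs than any single member.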


References

  1. Aggarwal, C. C. (2014). Data Classification: Algorithms and Applications, CRC Press.
  2. Cramer, A., Lostanlen, V., Farnsworth, A., Salamon, J. and Bello, J. P. (2020). Chirping up the Right Tree: Incorporating Biological Taxonomies into Deep Bioacoustic Classifiers. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 04-08, Barcelona, Spain, pp. 901-905.
  3. Fernandez, A., Garcia, S., Galar, M., Prati, R. C., Krawczyk, B. and Herrera, F. (2018). Learning from Imbalanced Data Sets, Springer.
  4. Ganaie, M. A., Hu, M., Malik, A. K., Tanveer, M. and Suganthan, P. N. (2022). Ensemble Deep Learning: A Review. Engineering Applications of Artificial Intelligence, 115. https://doi.org/10.1016/j.engappai.2022.105151
  5. Gunawan, K. W., Hidayat, A. A., Cenggoro, T. W. and Pardamean, B. (2023). Repurposing Transfer Learning Strategy of Computer Vision for Owl Sound Classification. Procedia Computer Science, 216, 424-430. https://doi.org/10.1016/j.procs.2022.12.154
  6. Hidayat, A. A., Cenggoro, T. W. and Pardamean, B. (2021). Convolutional Neural Networks for Scops Owl Sound Classification. Procedia Computer Science, 179. https://doi.org/10.1016/j.procs.2020.12.010
  7. Huang, G., Liu, Z., Maaten, L. and Weinberger, K. Q. (2017). Densely Connected Convolutional Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jul. 21-26, Honolulu, HI, USA, pp. 4700-4708.
  8. Incze, A., Jancso, H., Szilagyi, Z., Farkas, A. and Sulyok, C. (2018). Bird Sound Recognition Using a Convolutional Neural Network. IEEE 16th International Symposium on Intelligent Systems and Informatics, Sep. 13-15, Subotica, Serbia, pp. 295-300.
  9. Jeong, H., Go, J. and Shin, C. (2021). Abnormal Detection with Microscope through Deep Learning. Journal of Korea Society of Industrial Information Systems, 26(2), https://doi.org/10.9723/jksiis.2021.26.2.001
  10. Kahl, S., Wood, C., Eibl, M. and Klinck, H. (2021). BirdNET: A Deep Learning Solution for Avian Diversity Monitoring. Ecological Informatics, 61, https://doi.org/10.1016/j.ecoinf.2021.101236
  11. Kim, C., Cho, Y., Jung, S., Rew, J. and Hwang, E. (2020). Animal Sounds Classification Scheme based on Multi-Feature Network with Mixed Datasets. KSII Transactions on Internet and Information Systems, 14(8), 3384-3398, https://doi.org/10.3837/tiis.2020.08.013
  12. Kim, E., Moon, J., Shim, J. and Hwang, E. (2023). DualDiscWaveGAN-Based Data Augmentation Scheme for Animal Sound Classification. Sensors, 23(4), https://doi.org/10.3390/s23042024
  13. Kim, J., Seok, C., Kim, M. and Kim, S. (2022). A System for Recommending Audio Devices based on Frequency Band Analysis of Vocal Component in Sound Source. Journal of Korea Society of Industrial Information Systems, 27(6), 1-12, https://doi.org/10.9723/jksiis.2022.27.6.001
  14. Kim, J., Lee, Y., Kim, D. and Ko, H. (2020). Temporal Attention based Animal Sound Classification. The Journal of the Acoustical Society of Korea, 39(5), 406-413, https://doi.org/10.7776/ASK.2020.39.5.406
  15. Koh, C., Chang, J., Tai, C., Huang, D., Hsieh, H. and Liu, Y. (2019). Bird Sound Classification using Convolutional Neural Networks. Conference and Labs of the Evaluation Forum, Sep. 9-12, Lugano, Switzerland, Vol. 2380.
  16. Korea Forest Service (2023). Changes in Forests due to Climate Change, https://www.forest.go.kr/ (Accessed on Jan. 3, 2024)
  17. Lee, W., Kim, Y., Kim, J. and Lee, C. (2020). Forecasting of Iron Ore Prices using Machine Learning. Journal of Korea Society of Industrial Information Systems, 25(2), 57-72, https://doi.org/10.9723/jksiis.2020.25.2.057
  18. Lin, T., Goyal, P., Girshick, R., He, K. and Dollar, P. (2017). Focal Loss for Dense Object Detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, Oct. 22-29, Venice, Italy, pp. 2980-2988.
  19. Martynov, E. and Uematsu, Y. (2022). Dealing with Class Imbalance in Bird Sound Classification. Conference and Labs of the Evaluation Forum, Sep. 5-8, Bologna, Italy, pp. 2151-2158.
  20. Mohammed, A. and Kora, R. (2023). A Comprehensive Review on Ensemble Deep Learning: Opportunities and Challenges. Journal of King Saud University - Computer and Information Sciences, 35(2), 757-774, https://doi.org/10.1016/j.jksuci.2023.01.014
  21. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J. and Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Proceedings of the Advances in Neural Information Processing Systems, Dec. 08-14, Vancouver, BC, Canada, pp. 8026-8037.
  22. Prusa, Z. and Holighaus, N. (2022). Phase Vocoder Done Right. arXiv, arXiv:2202.07382, https://doi.org/10.48550/arXiv.2202.07382
  23. Sun, Y., Maeda, T. M., Solis-Lemus, C., Pimentel-Alarcon, D. and Burivalova, Z. (2022). Classification of Animal Sounds in a Hyperdiverse Rainforest using Convolutional Neural Networks with Data Augmentation. Ecological Indicators, 145. https://doi.org/10.1016/j.ecolind.2022.109621
  24. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 26-Jul 01, Las Vegas, NV, USA, pp. 2818-2826.