http://dx.doi.org/10.3837/tiis.2020.08.013

Animal Sounds Classification Scheme Based on Multi-Feature Network with Mixed Datasets  

Kim, Chung-Il (School of Electrical Engineering, Korea University)
Cho, Yongjang (School of Electrical Engineering, Korea University)
Jung, Seungwon (School of Electrical Engineering, Korea University)
Rew, Jehyeok (School of Electrical Engineering, Korea University)
Hwang, Eenjun (School of Electrical Engineering, Korea University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 14, no. 8, 2020, pp. 3384-3398
Abstract
In recent years, as the environment has become an important issue in dealing with food, energy, and urban development, diverse environment-related applications such as environmental monitoring and ecosystem management have emerged. In such applications, automatic classification of animals from video or sound is very useful in terms of cost and convenience. So far, much work has been done on animal sound classification using artificial intelligence techniques such as convolutional neural networks. However, most of it has dealt only with the sounds of a specific class of animals, such as birds or insects, and is therefore not suitable for classifying various types of animal sounds. In this paper, we propose a sound classification scheme based on a multi-feature network for classifying the sounds of multiple species of animals. To do that, we first collected multiple animal sound datasets and grouped them into classes. Then, we extracted their audio features by generating mixed records and used those features for training. To evaluate the effectiveness of our scheme, we constructed an animal sound classification model and performed various experiments. We report some of the results.
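The abstract describes extracting audio features from recordings and feeding them to a convolutional network. The paper's exact feature pipeline is not given here, but a log-mel spectrogram is the kind of time-frequency feature commonly used as CNN input for such tasks. The sketch below is a minimal, numpy-only illustration (all parameter values, such as the sampling rate and filter counts, are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr=22050, n_fft=1024, hop=512, n_mels=40):
    # Frame the signal, window each frame, and take its power spectrum.
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + n_fft] * window)) ** 2
        frames.append(spectrum)
    power = np.array(frames).T                      # (n_fft//2 + 1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power  # project onto mel bands
    return np.log(mel + 1e-10)                      # (n_mels, n_frames)

# One second of a 440 Hz tone as a stand-in for an animal recording.
sr = 22050
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # 40 mel bands x 42 frames
```

The resulting 2-D array can be treated like a single-channel image, which is what makes convolutional architectures a natural fit for sound classification.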
Keywords
Environmental monitoring; Animal sound classification; Convolutional neural networks
References
1 G. M. Lovett, D. A. Burns, C. T. Driscoll, J. C. Jenkins, M. J. Mitchell, L. Rustad, J. B. Shanley, G. E. Likens, and R. Haeuber, "Who needs environmental monitoring?," Frontiers in Ecology and the Environment, vol. 5, no. 5, pp. 253-260, 2007.
2 F. Briggs, Y. Huang, R. Raich, K. Eftaxias, Z. Lei, W. Cukierski, S. F. Hadley, A. Hadley, M. Betts, and X. Z. Fern, "The 9th annual MLSP competition: new methods for acoustic classification of multiple simultaneous bird species in a noisy environment," in Proc. of 2013 IEEE international workshop on machine learning for signal processing (MLSP), pp. 1-8, 2013.
3 D. Pimentel, and M. Burgess, "Environmental and economic costs of the application of pesticides primarily in the United States," Integrated pest management, pp. 47-71, 2014.
4 M. Q. Benedict, and A. S. Robinson, "The first releases of transgenic mosquitoes: an argument for the sterile insect technique," Trends in parasitology, vol. 19, no. 8, pp. 349-355, 2003.
5 A. D. Garg, and R. V. Hippargi, "Significance of frogs and toads in environmental conservation," 2007.
6 F. Su, L. Yang, T. Lu, and G. Wang, "Environmental sound classification for scene recognition using local discriminant bases and HMM," in Proc. of the 19th ACM international conference on Multimedia, pp. 1389-1392, 2011.
7 P. Jancovic, and M. Kokuer, "Automatic detection and recognition of tonal bird sounds in noisy environments," EURASIP Journal on Advances in Signal Processing, vol. 2011, no. 1, Article 982936, 2011.
8 I. Potamitis, S. Ntalampiras, O. Jahn, and K. Riede, "Automatic bird sound detection in long real-field recordings: Applications and tools," Applied Acoustics, vol. 80, pp. 1-9, 2014.
9 C. M. Bishop, Neural networks for pattern recognition, Oxford University Press, 1995.
10 X. Zhang, Y. Zou, and W. Shi, "Dilated convolution neural network with LeakyReLU for environmental sound classification," in Proc. of 2017 22nd International Conference on Digital Signal Processing (DSP), pp. 1-5, 2017.
11 H. Zhang, I. McLoughlin, and Y. Song, "Robust sound event recognition using convolutional neural networks," in Proc. of 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 559-563, 2015.
12 T. Pellegrini, "Densely connected CNNs for bird audio detection," in Proc. of 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1734-1738, 2017.
13 E. Cakir, S. Adavanne, G. Parascandolo, K. Drossos, and T. Virtanen, "Convolutional recurrent neural networks for bird audio detection," in Proc. of 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1744-1748, 2017.
14 D. Stowell, M. Wood, Y. Stylianou, and H. Glotin, "Bird detection in audio: a survey and a challenge," in Proc. of 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6, 2016.
15 D. Stowell, M. D. Wood, H. Pamula, Y. Stylianou, and H. Glotin, "Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge," Methods in Ecology and Evolution, vol. 10, no. 3, pp. 368-380, 2019.
16 I. Sobieraj, Q. Kong, and M. D. Plumbley, "Masked non-negative matrix factorization for bird detection using weakly labeled data," in Proc. of 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1769-1773, 2017.
17 K. Ko, S. Park, and H. Ko, "Convolutional feature vectors and support vector machine for animal sound classification," in Proc. of 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 376-379, 2018.
18 E. Fonseca, J. Pons Puig, X. Favory, F. Font Corbera, D. Bogdanov, A. Ferraro, S. Oramas, A. Porter, and X. Serra, "Freesound datasets: a platform for the creation of open audio datasets," in Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, pp. 486-493, 2017.
19 B. L. Sullivan, C. L. Wood, M. J. Iliff, R. E. Bonney, D. Fink, and S. Kelling, "eBird: A citizen-based bird observation network in the biological sciences," Biological conservation, vol. 142, no. 10, pp. 2282-2292, 2009.
20 E. Fonseca, M. Plakal, F. Font, D. P. Ellis, X. Favory, J. Pons, and X. Serra, "General-purpose tagging of freesound audio with audioset labels: Task description, dataset, and baseline," arXiv preprint arXiv:1807.09902, 2018.
21 Y. Tokozume, and T. Harada, "Learning environmental sounds with end-to-end convolutional neural network," in Proc. of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2721-2725, 2017.
22 X. Li, V. Chebiyyam, and K. Kirchhoff, "Multi-stream network with temporal attention for environmental sound classification," in Proc. of Interspeech 2019, pp. 3604-3608, 2019.
23 Y. Su, K. Zhang, J. Wang, and K. Madani, "Environment sound classification using a two-stream CNN based on decision-level fusion," Sensors, vol. 19, no. 7, Article 1733, 2019.
24 M. Sahidullah, and G. Saha, "Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition," Speech communication, vol. 54, no. 4, pp. 543-565, 2012.
25 Z. Zhang, S. Xu, T. Qiao, S. Zhang, and S. Cao, "Attention based convolutional recurrent neural network for environmental sound classification," in Proc. of Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pp. 261-271, 2019.
26 S. Li, Y. Yao, J. Hu, G. Liu, X. Yao, and J. Hu, "An ensemble stacked convolutional neural network model for environmental event sound recognition," Applied Sciences, vol. 8, no. 7, Article 1152, 2018.
27 S. S. Stevens, J. Volkmann, and E. B. Newman, "A scale for the measurement of the psychological magnitude pitch," The Journal of the Acoustical Society of America, vol. 8, no. 3, pp. 185-190, 1937.
28 K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.