http://dx.doi.org/10.7776/ASK.2019.38.1.039

Bird sounds classification by combining PNCC and robust Mel-log filter bank features  

Badi, Alzahra (School of Electrical Engineering, Korea University)
Ko, Kyungdeuk (School of Electrical Engineering, Korea University)
Ko, Hanseok (School of Electrical Engineering, Korea University)
Abstract
In this paper, feature combination is proposed as a way to enhance the classification accuracy of sounds in noisy environments using a CNN (Convolutional Neural Network) structure. A robust log Mel-filter bank feature, computed with a Wiener filter, and PNCCs (Power Normalized Cepstral Coefficients) are extracted and combined to form a 2-dimensional feature that is used as input to the CNN. An eBird database is used to classify 43 bird species recorded in their natural environment. To evaluate the performance of the combined feature under noisy conditions, the database is augmented with 3 types of noise at 4 different SNRs (Signal to Noise Ratios) (20 dB, 10 dB, 5 dB, 0 dB). The combined feature is compared to the log Mel-filter bank with and without the Wiener filter, and to the PNCCs. In the clean environment, the combined feature outperforms the other features with a 1.34 % increase in overall average accuracy. Additionally, averaged over the 4 SNR levels, accuracy in noisy environments increases by 1.06 % and 0.65 % for shop and schoolyard noise backgrounds, respectively.
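The feature-combination idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Wiener-filter denoising step and the full PNCC chain (gammatone filtering, medium-time power-bias subtraction) are omitted, and the PNCC channel is approximated here by only its power-law nonlinearity stage (exponent 1/15, following Kim & Stern). All function names and parameter values are illustrative.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filter bank, shape (n_filters, n_fft//2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                  # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                  # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def power_spectrogram(x, n_fft=512, hop=256):
    """Framed power spectrum |STFT|^2 with a Hamming window."""
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * np.hamming(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    return np.array(frames)                    # (n_frames, n_fft//2 + 1)

def combined_features(x, sr=16000, n_mels=40, n_fft=512, hop=256):
    """Stack log-mel and a PNCC-style power-law feature as one 2-D CNN input."""
    P = power_spectrogram(x, n_fft, hop)
    fb = mel_filterbank(n_mels, n_fft, sr)
    mel_power = P @ fb.T                       # (n_frames, n_mels)
    log_mel = np.log(mel_power + 1e-10)        # log Mel-filter bank channel
    pncc_like = mel_power ** (1.0 / 15.0)      # power-law nonlinearity stage
    return np.stack([log_mel, pncc_like], axis=0)   # (2, n_frames, n_mels)

x = np.random.randn(16000)                     # 1 s of noise as a stand-in signal
feat = combined_features(x)
print(feat.shape)                              # (2, 61, 40)
```

In the paper's setting, each channel of this stacked map would be fed to the CNN jointly, letting the network exploit the complementary noise behavior of the two representations.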
Keywords
Acoustic event recognition; Environmental sound classification; CNN (Convolutional Neural Network); Wiener filter; PNCCs (Power Normalized Cepstral Coefficients);
Citations & Related Records
Times Cited By KSCI: 1
1 F. R. Gonzalez-Hernandez, L. P. Sanchez-Fernandez, S. Suarez-Guerra, and L. A. Sanchez-Perez, "Marine mammal sound classification based on a parallel recognition model and octave analysis," Applied Acoustics, 119, 17-28 (2017).   DOI
2 M. Malfante, J. I. Mars, M. Dalla Mura, and C. Gervaise, "Automatic fish sounds classification," J. Acoust. Soc. Am. 143, 2834-2846 (2018).   DOI
3 O. M. Aodha, R. Gibb, K. E. Barlow, E. Browning, M. Firman, R. Freeman, B. Harder, L. Kinsey, G. R. Mead, S. E. Newson, I. Pandourski, S. Parsons, J. Russ, A. Szodoray-Paradi, F. Szodoray-Paradi, E. Tilova, M. Girolami, G. Brostow, and K. E. Jones, "Bat detective-Deep learning tools for bat acoustic signal detection," PLoS Comput. Biol., 14, e1005995 (2018).   DOI
4 F. Briggs, B. Lakshminarayanan, L. Neal, X. Z. Fern, R. Raich, S. J. K. Hadley, A. S. Hadley, and M. G. Betts, "Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach," J. Acoust. Soc. Am. 131, 4640-4650 (2012).   DOI
5 K. Ko, S. Park, and H. Ko, "Convolutional feature vectors and support vector machine for animal sound classification," Proc. IEEE Eng. Med. Biol. Soc. 376-379 (2018).
6 R. Lu and Z. Duan, "Bidirectional GRU for sound event detection," Detection and Classification of Acoustic Scenes and Events (DCASE), (2017).
7 T. H. Vu and J.-C. Wang, "Acoustic scene and event recognition using recurrent neural networks," Detection and Classification of Acoustic Scenes and Events (DCASE), (2016).
8 Y. Miao, M. Gowayyed, and F. Metze, "EESEN: End-to-End speech recognition using deep RNN models and WFST-based decoding," 2015 IEEE Work. Autom. Speech Recognit. Understanding, ASRU 2015, 167-174 (2016).
9 D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, "End-to-End Attention-based large vocabulary speech recognition," Acoust. Speech Signal Process (ICASSP), 2016 IEEE Int. Conf., 4945-4949 (2016).
10 A. Ahmed, Y. Hifny, K. Shaalan, and S. Toral, "Lexicon free Arabic speech recognition recipe," Advances in Intelligent Systems and Computing, 533, 147-159 (2017).   DOI
11 C. Kim and R. M. Stern, "Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction," Proc. 10th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 28-31 (2009).
12 M. J. Alam, P. Kenny, and D. O'Shaughnessy, "Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique," Digit. Signal Process., 29, 147-157 (2014).   DOI
13 M. T. S. Al-Kaltakchi, W. L. Woo, S. S. Dlay, and J. A. Chambers, "Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification," 4th Int. Work. Biometrics Forensics (IWBF), 1-6 (2016).
14 S. Park, S. Mun, Y. Lee, D. K. Han, and H. Ko, "Analysis acoustic features for acoustic scene classification and score fusion of multi-classification systems applied to DCASE 2016 challenge," arXiv Prepr. arXiv1807.04970 (2018).
15 N. Upadhyay and R. K. Jaiswal, "Single channel speech enhancement: using Wiener filtering with recursive noise estimation," Procedia Comput. Sci., 84, 22-30 (2016).   DOI
16 A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in neural information processing systems, 1097-1105 (2012).
17 S. S. Stevens, "On the psychophysical law," Psychological Review, 64, 153 (1957).   DOI
18 P. M. Chauhan and N. P. Desai, "Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using Wiener filter," Green Computing, Communication and Electrical Engineering (ICGCCEE), 1-5 (2014).
19 S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation theory (PTR Prentice-Hall, Englewood Cliffs, 1993), pp. 400-409.
20 T. Gerkmann and R. C. Hendriks, "Noise power estimation based on the probability of speech presence," Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), 145-148 (2011).
21 L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geosci. Remote Sens. Mag., 4, 22-40 (2016).   DOI
22 K. Ko, S. Park, and H. Ko, "Convolutional neural network based amphibian sound classification using covariance and modulogram" (in Korean), J. Acoust. Soc. Kr. 37, 60-65 (2018).
23 J. Park, W. Kim, D. K. Han, and H. Ko, "Voice activity detection in noisy environments based on double-combined fourier transform and line fitting," Sci. World J., 2014, e146040 (2014).
24 ITU-T, ITU-T P.56, Objective Measurement of Active Speech Level, 2011.
25 J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Process. Lett., 24, 279-283 (2017).   DOI
26 R. Radhakrishnan, A. Divakaran, and A. Smaragdis, "Audio analysis for surveillance applications," Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., 158-161 (2005).