Fig. 1. Feature extraction for the robust log mel-filter bank and PNCC features.
Fig. 2. Power spectral density of the extracted features: (a) log mel-filter bank of the clean signal, (b) log mel-filter bank under shop noise (10 dB), (c) robust log mel-filter bank under shop noise (10 dB), (d) PNCC under shop noise, and (e) combined features under noise (10 dB).
Fig. 3. Convolutional neural network architecture for classification using (a) a single feature (baseline) and (b) combined features.
Fig. 4. Confusion matrices for shop noise at an SNR of 10 dB for (a) robust log mel-filter bank features, (b) PNCC features, and (c) combined features.
Table 1. Bird species classification in a clean environment.
Table 2. Bird species classification accuracy in a noisy environment.