Performance assessments of feature vectors and classification algorithms for amphibian sound classification

  • Sangwook Park (Department of Electrical and Electronic Engineering, Korea University) ;
  • Kyungdeuk Ko (Department of Electrical and Electronic Engineering, Korea University) ;
  • Hanseok Ko (Department of Electrical and Electronic Engineering, Korea University)
  • Received : 2017.09.12
  • Accepted : 2017.11.29
  • Published : 2017.11.30

Abstract

This paper presents a performance assessment of key algorithms for amphibian species sound classification. First, nine target species, including endangered species, were selected, and a database of their calls, recorded in the wild, was built. For the assessment, three feature vectors, MFCC (Mel Frequency Cepstral Coefficient), RCGCC (Robust Compressive Gammachirp filterbank Cepstral Coefficient), and SPCC (Subspace Projection Cepstral Coefficient), and three classifiers, GMM (Gaussian Mixture Model), SVM (Support Vector Machine), and DBN-DNN (Deep Belief Network - Deep Neural Network), were considered. In addition, an i-vector based classification system, widely used in speaker recognition, was evaluated on the same task. Experimental results indicate that SPCC-SVM achieved the best performance at 98.81 %, while the other methods also attained recognition rates close to 90 %.
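To make the evaluated pipeline concrete, the sketch below pairs one of the features (MFCC) with one of the classifiers (SVM). This is a minimal sketch only, assuming the librosa and scikit-learn libraries; the file names, labels, and frame-pooling choice are hypothetical illustrations, not the configuration used in the paper.

    import numpy as np
    import librosa                      # assumed audio feature library
    from sklearn.svm import SVC         # assumed classifier library

    def mfcc_features(path, n_mfcc=13):
        """Load one recording and pool its MFCC frames into a single vector."""
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)        # simple utterance-level average pooling

    # Hypothetical file list: (recording path, integer species label).
    # A real experiment would use many recordings for each of the 9 species.
    train_files = [("species0_call_001.wav", 0),
                   ("species1_call_001.wav", 1)]

    X = np.array([mfcc_features(p) for p, _ in train_files])
    y = np.array([label for _, label in train_files])

    clf = SVC(kernel="rbf")             # RBF-kernel support vector machine
    clf.fit(X, y)
    print(clf.predict([mfcc_features("unlabeled_call.wav")]))

Averaging MFCC frames into one utterance-level vector is just one simple way to feed variable-length recordings to an SVM; the paper's exact frame handling and classifier settings may differ.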
