Estimation of Optimal Mixture Number of GMM for Environmental Sounds Recognition

  • Han, Da-Jeong (Division of Electronic and Computer Engineering, Chonnam University) ;
  • Park, Aa-Ron (Division of Electronic and Computer Engineering, Chonnam University) ;
  • Baek, Sung-June (Division of Electronic and Computer Engineering, Chonnam University)
  • Received : 2011.12.05
  • Accepted : 2012.02.10
  • Published : 2012.02.29

Abstract

In this paper, we apply BIC (Bayesian information criterion) and MDL (minimum description length) as model selection criteria to estimate the optimal number of mixture components of a GMM (Gaussian mixture model) for environmental sound recognition. In the experiments, we extracted 27,747 feature vectors of 12th-order MFCCs (mel-frequency cepstral coefficients) from nine classes of environmental sounds and classified them with GMMs. BIC and MDL were used to estimate the optimal number of mixture components for each environmental sound class, and the results were compared with those obtained using a fixed number of mixture components. The experimental results show that model selection maintains nearly the same recognition performance while reducing the computational complexity by 17.8% with BIC and by 31.7% with MDL. This indicates that BIC and MDL can effectively reduce the computational complexity of GMM-based environmental sound recognition.
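
The abstract does not spell out the selection criteria. As a hedged sketch, the common textbook forms for a K-component GMM fitted to N feature vectors of dimension D (here D = 12) are shown below. The parameter count assumes diagonal covariance matrices, which the abstract does not state; with D = 12 it gives P(K) = 25K - 1.

```latex
p(\mathbf{x} \mid \lambda_K) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \sum_{k=1}^{K} w_k = 1

\ln L(K) = \sum_{n=1}^{N} \ln p(\mathbf{x}_n \mid \hat{\lambda}_K)

P(K) = \underbrace{KD}_{\text{means}} + \underbrace{KD}_{\text{diagonal variances}} + \underbrace{(K-1)}_{\text{weights}} = K(2D+1) - 1

\mathrm{BIC}(K) = -2 \ln L(K) + P(K) \ln N

\mathrm{MDL}(K) = -\ln L(K) + \tfrac{1}{2} P(K) \ln N

\hat{K} = \arg\min_{K} \mathrm{BIC}(K) \quad \text{or} \quad \hat{K} = \arg\min_{K} \mathrm{MDL}(K)
```

In exactly these forms MDL(K) = BIC(K)/2, so both criteria would select the same K for every class. The different reductions reported (17.8% vs. 31.7%) therefore suggest that the two criteria used in the paper carry different penalty terms, which the abstract does not specify.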

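The selection step and the complexity arithmetic can be sketched in Python as follows. This is a minimal, standard-library-only sketch, not the authors' implementation. It assumes diagonal covariances, the textbook BIC and two-part MDL forms, and that every candidate K has already been trained by EM so that its training log-likelihood is known. It also assumes recognition cost is proportional to the total number of Gaussian components scored per frame, which is one plausible reading of "computational complexity". The function names and all numeric values are hypothetical.

```python
import math
from typing import Callable, Dict, List

# criterion(loglik, k, d, n) -> score; lower is better
Criterion = Callable[[float, int, int, int], float]


def gmm_num_params(k: int, d: int) -> int:
    """Free parameters of a K-component GMM in D dimensions, assuming diagonal
    covariances: K*D means + K*D variances + (K - 1) independent weights."""
    return k * (2 * d + 1) - 1


def bic(loglik: float, k: int, d: int, n: int) -> float:
    """BIC = -2 ln L + P ln N (textbook form; lower is better)."""
    return -2.0 * loglik + gmm_num_params(k, d) * math.log(n)


def mdl(loglik: float, k: int, d: int, n: int) -> float:
    """Two-part MDL = -ln L + (P / 2) ln N (textbook form; equals BIC / 2)."""
    return -loglik + 0.5 * gmm_num_params(k, d) * math.log(n)


def select_k(loglik_by_k: Dict[int, float], d: int, n: int,
             criterion: Criterion) -> int:
    """Return the candidate K whose fitted log-likelihood minimizes the criterion."""
    return min(loglik_by_k, key=lambda k: criterion(loglik_by_k[k], k, d, n))


def complexity_reduction(fixed_k: int, selected: List[int]) -> float:
    """Fractional saving in Gaussian evaluations per frame versus a fixed K for
    every class, assuming cost is proportional to the total components scored."""
    if not selected or fixed_k <= 0:
        raise ValueError("need at least one class and a positive fixed K")
    return 1.0 - sum(selected) / (fixed_k * len(selected))


if __name__ == "__main__":
    # Hypothetical training log-likelihoods for one class (D = 12, N = 3000 frames)
    ll = {1: -60000.0, 2: -58000.0, 4: -57500.0, 8: -57400.0}
    print(select_k(ll, d=12, n=3000, criterion=bic))  # -> 4
    print(select_k(ll, d=12, n=3000, criterion=mdl))  # -> 4
    # Hypothetical per-class selections vs. a hypothetical fixed 16-component baseline
    print(complexity_reduction(16, [8, 16, 8, 16]))  # -> 0.25
```

In the made-up example, going from K = 4 to K = 8 improves the log-likelihood by 100, which lowers -2 ln L by 200. It also adds 100 parameters, which cost about 100 ln 3000 ≈ 801 in BIC. The net change is an increase, so K = 4 is kept.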
