Recognition of Overlapped Sound and Influence Analysis Based on Wideband Spectrogram and Deep Neural Networks

  • Kim, Young Eon (Graduate School of NID Fusion, Seoul National University of Science and Technology)
  • Park, Gooman (Graduate School of NID Fusion, Seoul National University of Science and Technology)
  • Received: 2018.03.12
  • Accepted: 2018.04.23
  • Published: 2018.05.30

Abstract

Many speech recognition systems recognize human speech using classification techniques such as MFCC and HMM. Because these systems are designed to recognize a single speech signal, they are well suited to one-to-one voice interaction between a human and a machine, but they are limited in recognizing a target sound within overlapped sounds composed of sound groups, such as pet sounds and indoor sounds, that are more diverse and span a wider frequency range than speech. The frequencies of such overlapped sounds extend over a wide range, up to 20 kHz, higher than the human voice. This paper proposes a new recognition method that widens the covered frequency range by combining a wideband sound spectrogram (WSS) with a Keras Sequential Model (KSM) based on deep neural networks (DNN). The wideband sound spectrogram is adopted to analyze and test diverse sounds over a wide frequency range within the feature-extraction and classification system designed in this paper. To improve the sound recognition rate, the Keras Sequential Model is employed to perform pattern recognition on the features extracted from the sound spectrogram. Experiments confirmed that the proposed feature-extraction and classification system, using the WSS and the KSM, classifies the target sound well within overlapped sounds composed of diverse frequencies, such as pet sounds and indoor sounds. Furthermore, the characteristics of the overlapped sounds and their influence on recognition are compared and analyzed stage by stage, in proportion to the level of the overlapped sound.
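
The pipeline described above, wideband spectrogram features feeding a Keras Sequential DNN classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the librosa-based feature extraction, the fixed clip length, the layer sizes, and the class labels are all illustrative choices.

```python
# Minimal sketch of the spectrogram-to-DNN pipeline described in the abstract.
# Assumptions (not from the paper): librosa for the STFT, 44.1 kHz sampling so
# the spectrogram covers the full 0-22.05 kHz band (above the 20 kHz upper
# bound of the overlapped sounds), fixed-length clips, and illustrative layer
# sizes and class labels.
import numpy as np
import librosa
from tensorflow import keras

SR = 44100       # 44.1 kHz sampling -> 22.05 kHz Nyquist, covering 20 kHz
N_FFT = 1024     # yields N_FFT // 2 + 1 = 513 frequency bins
HOP = 512
FRAMES = 100     # assume clips are padded/trimmed to a fixed number of frames

def wideband_spectrogram(path):
    """Return a flattened log-magnitude spectrogram for one audio clip."""
    y, _ = librosa.load(path, sr=SR)
    S = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP))
    log_S = librosa.amplitude_to_db(S, ref=np.max)
    log_S = librosa.util.fix_length(log_S, size=FRAMES, axis=1)  # pad/trim time axis
    return log_S.flatten()

NUM_CLASSES = 4  # e.g., target sound, dog barking, indoor sound, other (assumed)

model = keras.Sequential([
    keras.Input(shape=((N_FFT // 2 + 1) * FRAMES,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training would then be model.fit(X, y, epochs=..., batch_size=...), where X
# stacks flattened spectrograms and y holds integer class labels.
```

The 44.1 kHz sampling rate is what makes the spectrogram "wideband" in the sense used above: its 22.05 kHz Nyquist limit covers the up-to-20 kHz range of the overlapped sounds, whereas typical speech front ends cut off at a far lower frequency (e.g., 8 kHz).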
