Enhanced Sound Signal Based Sound-Event Classification

  • Yongju Choi (Information Strategy Team, CJ Logistics)
  • Jonguk Lee (Dept. of Computer Convergence Software, Korea University)
  • Daihee Park (Dept. of Computer Convergence Software, Korea University)
  • Yongwha Chung (Dept. of Computer Convergence Software, Korea University)
  • Received : 2018.12.20
  • Accepted : 2019.02.03
  • Published : 2019.05.31

Abstract

The explosion of data driven by improvements in sensor technology and computing performance has become the basis for analyzing conditions in industrial fields, and attempts to detect and classify events from such data have been increasing recently. In particular, sound signals collected from sensors serve as important information for classifying events in various application fields, since they capture field information without distortion at relatively low cost. However, the performance of sound-event classification in the field cannot be guaranteed unless noise is removed, and when a missed event corresponds to an abnormal situation, the resulting damage can be enormous. In other words, a practically deployable system must guarantee robust performance under various noise conditions. In this study, we propose a system that first generates an enhanced sound signal based on a deep learning algorithm and then classifies the corresponding sound event. Specifically, to remove noise from the sound signal itself, noise-robust enhanced sound data are generated using SEGAN, a GAN to which a VAE technique is applied. An end-to-end sound-event classification system is then designed that uses the enhanced sound signal directly as input to a CNN, without any data conversion process. The performance of the proposed method was verified experimentally using sound data obtained from industrial fields, achieving stable F1 scores of 99.29% (railway industry) and 97.80% (livestock industry).
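
To make the two-stage pipeline concrete, the sketch below shows the idea in minimal PyTorch. It is an illustration under stated assumptions, not the authors' implementation: the class names, layer counts, channel widths, kernel sizes, and the 16,384-sample clip length are all illustrative choices, and the SEGAN training loop (adversarial loss plus an L1 term toward the clean signal [15]) is omitted.

```python
import torch
import torch.nn as nn

class SEGANGenerator(nn.Module):
    """SEGAN-style denoiser: a 1-D convolutional encoder-decoder that maps a
    noisy raw waveform to an enhanced one (dimensions are illustrative)."""
    def __init__(self):
        super().__init__()
        # Encoder: strided 1-D convolutions downsample the noisy waveform.
        self.enc1 = nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15)
        self.enc2 = nn.Conv1d(16, 32, kernel_size=31, stride=2, padding=15)
        # Decoder: transposed convolutions reconstruct the waveform.
        self.dec1 = nn.ConvTranspose1d(32, 16, kernel_size=32, stride=2, padding=15)
        self.dec2 = nn.ConvTranspose1d(32, 1, kernel_size=32, stride=2, padding=15)
        self.act = nn.PReLU()

    def forward(self, noisy):                       # noisy: (batch, 1, samples)
        h1 = self.act(self.enc1(noisy))
        h2 = self.act(self.enc2(h1))
        d1 = self.act(self.dec1(h2))
        d1 = torch.cat([d1, h1], dim=1)             # skip connection, as in SEGAN
        return torch.tanh(self.dec2(d1))            # enhanced waveform in [-1, 1]

class RawWaveformCNN(nn.Module):
    """End-to-end classifier: a 1-D CNN applied to the enhanced waveform
    itself, with no spectrogram/MFCC conversion step."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1),                # collapse the time axis
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, wave):                        # wave: (batch, 1, samples)
        return self.classifier(self.features(wave).squeeze(-1))

# Inference sketch: enhance the noisy signal, then classify the result.
generator = SEGANGenerator()
classifier = RawWaveformCNN(num_classes=2)          # e.g., normal vs. abnormal event
noisy = torch.randn(8, 1, 16384)                    # a batch of raw sound clips
event_logits = classifier(generator(noisy))         # shape: (8, 2)
```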

Fig. 1. Overall Structure of the Proposed Method [image: JBCRJM_2019_v8n5_193_f0001.png]

Fig. 2. Sample Waveform and Spectrogram of Railway Point Machine Sound Data [image: JBCRJM_2019_v8n5_193_f0002.png]

Fig. 3. F1 Score of the Proposed Method on Railway Sound Data Under Various Noise Conditions [image: JBCRJM_2019_v8n5_193_f0003.png]

Fig. 4. Sample Waveform and Spectrogram of Porcine Sound Data [image: JBCRJM_2019_v8n5_193_f0004.png]

Fig. 5. F1 Score of the Proposed Method on Porcine Sound Data Under Various Noise Conditions [image: JBCRJM_2019_v8n5_193_f0005.png]
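
Figures 3 and 5 summarize performance with the F1 score. As a reference for reading them (the standard definition, cf. [30]), with TP, FP, and FN denoting per-class true positives, false positives, and false negatives:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```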

Table 1. Basic Statistics of Environmental Noise on Railway Point Machine Sound Data [image: JBCRJM_2019_v8n5_193_t0001.png]

Table 2. Results of Similarity Measurement Between Noisy Signal and Enhanced Signal on Railway Sound Data [image: JBCRJM_2019_v8n5_193_t0002.png]

Table 3. Basic Statistics of Environmental Noise on Porcine Sound Data [image: JBCRJM_2019_v8n5_193_t0003.png]

Table 4. Results of Similarity Measurement Between Noisy Signal and Enhanced Signal on Porcine Sound Data [image: JBCRJM_2019_v8n5_193_t0004.png]
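
Tables 2 and 4 compare noisy signals with their enhanced counterparts using objective similarity measures; PESQ [25] is the standard perceptual choice. The snippet below is a minimal sketch of such a measurement, assuming the open-source pesq and soundfile Python packages (the paper does not state its exact tooling) and hypothetical file names:

```python
# Minimal PESQ check between a reference recording and an enhanced one.
# Assumes the open-source 'pesq' and 'soundfile' packages; file names are
# hypothetical. PESQ requires an 8 kHz ('nb') or 16 kHz ('wb') sample rate.
import soundfile as sf
from pesq import pesq

reference, fs = sf.read("reference.wav")   # signal to score against
enhanced, _ = sf.read("enhanced.wav")      # output of the SEGAN-style denoiser

score = pesq(fs, reference, enhanced, 'wb' if fs == 16000 else 'nb')
print(f"PESQ: {score:.2f}")                # higher (up to ~4.5) is more similar
```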

Table 5. Quantitative and Qualitative Comparison Analysis Between the Proposed Method and Other Methods [image: JBCRJM_2019_v8n5_193_t0005.png]

References

  1. Y. Choi, J. Lee, D. Park, and Y. Chung, “Noise-Robust Porcine Respiratory Diseases Classification Using Texture Analysis and CNN,” KIPS Transactions on Software and Data Engineering, Vol. 7, No. 3, pp. 91-98, 2018. https://doi.org/10.3745/KTSDE.2018.7.3.91
  2. Y. Kim, J. Sa, Y. Chung, D. Park, and S. Lee, “Resource-Efficient Pet Dog Sound Events Classification Using LSTM-FCN Based on Time-Series Data,” Sensors, Vol. 18, No. 11, pp. 4019, 2018.
  3. J. Sa, Y. Choi, Y. Chung, H. Kim, D. Park, and S. Yoon, "Replacement Condition Detection of Railway Point Machines Using an Electric Current Sensor," Sensors, Vol. 17, No. 2, pp. 263, 2017. https://doi.org/10.3390/s17020263
  4. J. Salamon and J.P. Bello, “Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification,” IEEE Signal Processing Letters, Vol. 24, No. 3, pp. 279-283, 2017. https://doi.org/10.1109/LSP.2017.2657381
  5. J. Lee, H. Choi, D. Park, Y. Chung, H.Y. Kim, and S. Yoon, “Fault Detection and Diagnosis of Railway Point Machines by Sound Analysis,” Sensors, Vol. 16, No. 4, pp. 549, 2016. https://doi.org/10.3390/s16040549
  6. Y. Choi, J. Lee, D. Park, J. Lee, Y. Chung, H.Y. Kim, and S. Yoon, “Stress Detection of Railway Point Machine Using Sound Analysis,” KIPS Transactions on Software and Data Engineering, Vol. 5, No. 9, pp. 433-440, 2016. https://doi.org/10.3745/KTSDE.2016.5.9.433
  7. M. Guarino, P. Jans, A. Costa, J.M. Aerts, and D. Berckmans, “Field Test of Algorithm for Automatic Cough Detection in Pig Houses,” Computers and Electronics in Agriculture, Vol. 62, No. 1, pp. 22-28, 2008. https://doi.org/10.1016/j.compag.2007.08.016
  8. Y. Chung, S. Oh, J. Lee, D. Park, H. Chang, and S. Kim, “Automatic Detection and Recognition of Pig Wasting Diseases Using Sound Data in Audio Surveillance,” Sensors, Vol. 13, No. 10, pp. 12929-12942, 2013. https://doi.org/10.3390/s131012929
  9. J. Lee, L. Jin, D. Park, Y. Chung, and H. Chang, “Acoustic Features for Pig Wasting Disease Detection,” International Journal of Information Processing and Management, Vol. 6, No. 1, pp. 37-46, 2015.
  10. R. Zazo, T.N. Sainath, G. Simko, and C. Parada, "Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection," In Proceedings of Interspeech, pp. 3668-3672, 2016.
  11. H. Zhang, I. McLoughlin, and Y. Song, "Robust Sound Event Recognition Using Convolutional Neural Networks," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 559-563, 2015.
  12. Y. Choi, O. Atif, J. Lee, D. Park, and Y. Chung, “Noise-Robust Sound-Event Classification System with Texture Analysis,” Symmetry, Vol. 10, No. 9, pp. 402, 2018. https://doi.org/10.3390/sym10090402
  13. Z. Zhang, J. Geiger, J. Pohjalainen, A.E.D. Mousa, W. Jin, and B. Schuller, “Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments,” ACM Transactions on Intelligent Systems and Technology, Vol. 9, No. 5, pp. 49, 2018.
  14. Y. Choi, Y. Jung, Y. Kim, Y. Suh, and H. Kim, “An End-to-End Method for Korean Text-to-Speech Systems,” Phonetics and Speech Sciences, Vol. 10, No. 1, pp. 39-48, 2018.
  15. S. Pascual, A. Bonafonte, and J. Serra, "SEGAN: Speech Enhancement Generative Adversarial Network," In Proceedings of Interspeech, pp. 3642-3646, 2017.
  16. S. Dieleman and B. Schrauwen, "End-to-End Learning for Music Audio," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6964-6968, 2014.
  17. R. Collobert, C. Puhrsch, and G. Synnaeve, "Wav2Letter: An End-to-End ConvNet-Based Speech Recognition System," arXiv preprint arXiv:1609.03193, 2016.
  18. Y. Zhang, W. Chan, and N. Jaitly, "Very Deep Convolutional Networks for End-to-End Speech Recognition," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4845-4849, 2017.
  19. S. Kim, T. Hori, and S. Watanabe, "Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4835-4839, 2017.
  20. R.V. Sharan and T.J. Moir, "Noise Robust Audio Surveillance Using Reduced Spectrogram Image Feature and One-Against-All SVM," Neurocomputing, Vol. 158, pp. 90-99, 2015. https://doi.org/10.1016/j.neucom.2015.02.001
  21. J. Lee, Y. Choi, D. Park, and Y. Chung, “Sound Noise-Robust Porcine Wasting Diseases Detection and Classification System Using Convolutional Neural Network,” Journal of Korean Institute of Information Technology, Vol. 16, No. 5, pp. 1-13, 2018. https://doi.org/10.14801/jkiit.2018.16.5.1
  22. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Nets," In Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.
  23. C. Zhang and Y. Peng, "Stacking VAE and GAN for Context-aware Text-to-Image Generation," IEEE Fourth International Conference on Multimedia Big Data, pp. 1-5, 2018.
  24. T. Asada, C. Roberts, and T. Koseki, "An Algorithm for Improved Performance of Railway Condition Monitoring Equipment: Alternating-Current Point Machine Case Study," Transportation Research Part C: Emerging Technologies, Vol. 30, pp. 81-92, 2013. https://doi.org/10.1016/j.trc.2013.01.008
  25. A.W. Rix, J.G. Beerends, M.P. Hollier, and A.P. Hekstra, "Perceptual Evaluation of Speech Quality (PESQ)-A New Method for Speech Quality Assessment of Telephone Networks and Codecs," IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, pp. 749-752, 2001.
  26. J. Hansen and B. Pellom, "An Effective Quality Evaluation Protocol for Speech Enhancement Algorithms," International Conference on Spoken Language Processing, Vol. 7, pp. 2819-2822, 1998.
  27. B. Shao, D. Wang, T. Li, and M. Ogihara, “Music Recommendation Based on Acoustic Features and User Access Patterns,” IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 8, pp. 1602-1611, 2009. https://doi.org/10.1109/TASL.2009.2020893
  28. J. Han, M. Kamber, and J. Pei, "Data Mining: Concepts and Techniques," 3rd ed., Morgan Kaufmann, San Francisco, CA, USA, 2012.
  29. S. Theodoridis and K. Koutroumbas, "Pattern Recognition," 4th ed., Academic Press: Kidlington, Oxford, UK, 2009.
  30. D.M. Powers, “Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation,” Journal of Machine Learning Technologies, Vol. 2, No. 1, pp. 37-63, 2011.