
Environmental Sound Classification for Selective Noise Cancellation in Industrial Sites

  • Choi, Hyunkook (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Kim, Sangmin (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Park, Hochong (Dept. of Electronics Engineering, Kwangwoon University)
  • Received : 2020.09.07
  • Accepted : 2020.10.21
  • Published : 2020.11.30

Abstract

In this paper, we propose a method for classifying environmental sounds for selective noise cancellation in industrial sites. Noise in industrial sites causes hearing loss in workers, and noise cancellation has been widely studied as a remedy. Conventional methods, however, block all sounds indiscriminately and, because they apply a single cancellation scheme to every type of noise, cannot provide operation optimized for each noise type. To enable selective noise cancellation, we therefore propose a deep-learning-based method for environmental sound classification. The proposed method uses new sets of acoustic features consisting of temporal and statistical properties of the Mel-spectrogram, which overcome the limitations of conventional Mel-spectrogram features, and uses a convolutional neural network as the classifier. We apply the proposed method to five-class sound classification with three noise classes and two non-noise classes, and confirm that the proposed features improve classification accuracy by 6.6 percentage points over conventional Mel-spectrogram features.
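As a rough illustration of the feature idea described above, temporal and statistical properties can be derived from a Mel-spectrogram as sketched below. The paper's exact feature definitions are not reproduced here; the function name and the specific choices (frame-to-frame differences for the temporal property, per-band mean and standard deviation for the statistical properties) are assumptions for this sketch.

```python
import numpy as np

def temporal_statistical_features(mel_spec: np.ndarray):
    """Illustrative features from a Mel-spectrogram of shape (mel_bands, frames).

    Assumed definitions (not the paper's exact ones):
    - temporal property: first-order difference along the time axis
    - statistical properties: per-band mean and standard deviation over time
    """
    # Temporal property: frame-to-frame change per mel band.
    delta = np.diff(mel_spec, axis=1)                   # (mel_bands, frames - 1)
    # Statistical properties: summary statistics per mel band.
    stats = np.stack([mel_spec.mean(axis=1),
                      mel_spec.std(axis=1)], axis=1)    # (mel_bands, 2)
    return delta, stats

# Toy input: 4 mel bands x 5 frames.
mel = np.arange(20, dtype=float).reshape(4, 5)
delta, stats = temporal_statistical_features(mel)
print(delta.shape, stats.shape)  # (4, 4) (4, 2)
```

In practice the Mel-spectrogram itself would be computed from the waveform (e.g. with a standard audio library), and features like these would be stacked as input channels for the CNN classifier.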

Acknowledgement

This work was supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean government (Ministry of Science and ICT) in 2020 (No. 2018-0-01407).
