A Study on the Classification of Fault Motors using Sound Data

  • Il-Sik Chang (Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology) ;
  • Gooman Park (Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology)
  • Received : 2022.06.17
  • Accepted : 2022.09.29
  • Published : 2022.11.30

Abstract

Motor faults in manufacturing strongly affect future after-sales service (A/S) and product reliability. Faulty motors are typically detected by measuring sound, current, or vibration. The data used in this paper are sound recordings of automotive side-mirror motor gearboxes, labeled into three classes. The sound data are converted to mel-spectrograms before being fed into the network models. To improve fault-motor classification, we apply data augmentation together with several techniques for handling class imbalance: data resampling, re-weighting, changes to the loss function, and a two-stage scheme that separates representation learning from classifier training. In addition, curriculum learning and self-paced learning are compared across five network models, namely Bidirectional LSTM with Attention, Convolutional Recurrent Neural Network, Multi-Head Attention, Bidirectional Temporal Convolutional Network, and Convolutional Neural Network, and the optimal configuration for motor sound classification is identified.
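The pipeline described in the abstract converts each recording to a mel-spectrogram before it reaches a network model. The following is a minimal sketch of that conversion step, assuming librosa is available; the file name, sample rate, FFT size, and number of mel bands are illustrative assumptions, not the settings used in the paper.

```python
# Minimal sketch (not the authors' code): converting a motor-sound recording into a
# log-mel spectrogram suitable as network input. All parameters are illustrative.
import librosa
import numpy as np

def wav_to_logmel(path, sr=16000, n_fft=1024, hop_length=256, n_mels=64):
    y, _ = librosa.load(path, sr=sr, mono=True)            # load and resample waveform
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )                                                       # (n_mels, frames) power spectrogram
    logmel = librosa.power_to_db(mel, ref=np.max)           # log scaling for network input
    return logmel.astype(np.float32)

# Example (hypothetical file name):
# features = wav_to_logmel("side_mirror_gearbox.wav")
```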

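As one concrete example of the re-weighting and loss-function options mentioned for the imbalanced three-class motor data, the sketch below combines effective-number class weights in the spirit of reference 21 with the focal loss of reference 22, written in PyTorch. The class counts and hyperparameters are assumptions for illustration, not values reported in the paper.

```python
# Minimal sketch (assumed, not from the paper): class-balanced focal loss for an
# imbalanced classification problem such as the three motor-sound classes.
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits, targets, samples_per_class, beta=0.999, gamma=2.0):
    # Effective-number re-weighting: w_c = (1 - beta) / (1 - beta^{n_c})
    counts = torch.as_tensor(samples_per_class, dtype=torch.float, device=logits.device)
    weights = (1.0 - beta) / (1.0 - torch.pow(beta, counts))
    weights = weights / weights.sum() * len(counts)          # normalize to num_classes

    ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample cross-entropy
    pt = torch.exp(-ce)                                      # confidence for the true class
    focal = (1.0 - pt) ** gamma * ce                         # down-weight easy examples
    return (weights[targets] * focal).mean()

# Example with three classes (counts are hypothetical):
# loss = class_balanced_focal_loss(logits, labels, samples_per_class=[5000, 300, 120])
```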

References

  1. A. Khamparia, D. Gupta, N. G. Nguyen, A. Khanna, B. Pandey and P. Tiwari, "Sound Classification Using Convolutional Neural Network and Tensor Deep Stacking Network," in IEEE Access, vol. 7, pp. 7717-7727, 2019. doi: http://doi.org/10.1109/ACCESS.2018.2888882   
  2. K. Jaiswal and D. Kalpeshbhai Patel, "Sound Classification Using Convolutional Neural Networks," 2018 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 81-84, 2018. doi: http://doi.org/10.1109/CCEM.2018.00021   
  3. P. Tzirakis, J. Zhang and B. W. Schuller, "End-to-End Speech Emotion Recognition Using Deep Neural Networks," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5089-5093, 2018. doi: http://doi.org/10.1109/ICASSP.2018.8462677   
  4. Zhichao Zhang, Shugong Xu, Tianhao Qiao, Shunqing Zhang, Shan Cao, "Attention Based Convolutional Recurrent Neural Network for Environmental Sound Classification", 2019. arXiv:1907.02230   
  5. S. Wyatt et al., "Environmental Sound Classification with Tiny Transformers in Noisy Edge Environments," 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), pp. 309-314, 2021. doi: http://doi.org/10.1109/WF-IoT51360.2021.9596007   
  6. Alexey Dosovitskiy, Lucas Beyer et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," 2020. arXiv:2010.11929   
  7. Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yu Tong, and Bixiong Xu, "TS2Vec: Towards Universal Representation of Time Series," 2021. arXiv:2106.10466
  8. Sepp Hochreiter and Jurgen Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. doi: http://doi.org/10.1162/neco.1997.9.8.1735
  9. B. Shi, X. Bai and C. Yao, "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, 1 Nov. 2017. doi: http://doi.org/10.1109/TPAMI.2016.2646371   
  10. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, "Attention is All you Need," 31st Conference on Neural Information Processing Systems (NIPS), 2017.   
  11. Shaojie Bai, J. Zico Kolter, Vladlen Koltun, "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling," 2018. arXiv:1803.01271   
  12. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, Volume 60, Issue 6, pp 84-90, June 2017. doi: http://doi.org/10.1145/3065386   
  13. Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum Learning," Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41-48, June 2009.
  14. M. P. Kumar, B. Packer, and D. Koller, "Self-Paced Learning for Latent Variable Models," Advances in Neural Information Processing Systems 23 (NIPS 2010), pp. 1189-1197, 2010.
  15. Qingsong Wen, Liang Sun et al, "Time Series Data Augmentation for Deep Learning: A Survey," Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence Survey Track, pp. 4653-4660, 2021. doi: http://doi.org/10.24963/ijcai.2021/631   
  16. Park, Daniel S., et al. "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition," Proc. Interspeech 2019, pp. 2613-2617, 2019.   
  17. D. S. Park et al., "Specaugment on Large Scale Datasets," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6879-6883, 2020. doi: http://doi.org/10.1109/ICASSP40776.2020.9053205   
  18. Xingcheng Song, Zhiyong Wu, Yiheng Huang, Dan Su, Helen Meng, "SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition," 2019. arXiv:1912.05533    
  19. Helin Wang, Yuexian Zou, and Wenwu Wang, "SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification," 2021. arXiv:2103.16858
  20. Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis, "Decoupling Representation and Classifier for Long-Tailed Recognition," ICLR 2020.
  21. Y. Cui, M. Jia, T. -Y. Lin, Y. Song and S. Belongie, "Class-Balanced Loss Based on Effective Number of Samples," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9260-9269, 2019. doi: http://doi.org/10.1109/CVPR.2019.00949
  22. T. -Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollar, "Focal Loss for Dense Object Detection," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999-3007, 2017. doi: http://doi.org/10.1109/ICCV.2017.324
  23. Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma, "Learning imbalanced datasets with label-distribution-aware margin loss," Proceedings of the 33rd International Conference on Neural Information Processing Systems, Article No:140, pp.1567-1578, December 2019.
  24. W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj and L. Song, "SphereFace: Deep Hypersphere Embedding for Face Recognition," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6738-6746, 2017. doi: http://doi.org/10.1109/CVPR.2017.713
  25. H. Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5265-5274, 2018. doi: http://doi.org/10.1109/CVPR.2018.00552
  26. J. Deng, J. Guo, N. Xue and S. Zafeiriou, "ArcFace: Additive Angular Margin Loss for Deep Face Recognition," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4685-4694, 2019. doi: http://doi.org/10.1109/CVPR.2019.00482
  27. J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research", 22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014.
  28. S. R. Livingstone and F. A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English," PLoS ONE, vol. 13, no. 5, e0196391, 2018. doi: http://doi.org/10.1371/journal.pone.0196391
  29. J. J. Bosch, J. Janer, F. Fuhrmann, and P. Herrera, "A Comparison of Sound Segregation Techniques for Predominant Instrument Recognition in Musical Audio Signals," in Proc. ISMIR, pp. 559-564, 2012.