http://dx.doi.org/10.5909/JBE.2022.27.6.885

A Study on the Classification of Fault Motors using Sound Data  

Il-Sik Chang (Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology)
Gooman Park (Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology)
Publication Information
Journal of Broadcast Engineering / v.27, no.6, 2022, pp. 885-896
Abstract
Motor failure in manufacturing has a significant impact on after-sales service (A/S) and product reliability. Motor faults are typically detected by measuring sound, current, or vibration. The data used in this paper are sound recordings of a car side-mirror motor gearbox, labeled into three classes. The sound data are converted to Mel spectrograms before being fed into the network models. To improve fault-motor classification performance, we applied data augmentation together with several techniques for handling class imbalance: resampling, reweighting, modified loss functions, and a two-stage scheme that decouples representation learning from classification. In addition, curriculum learning and self-paced learning were compared across five network models (Bidirectional LSTM with Attention, Convolutional Recurrent Neural Network, Multi-Head Attention, Bidirectional Temporal Convolutional Network, and Convolutional Neural Network), and the optimal configuration for motor sound classification was identified.
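As a concrete illustration of the first step of the pipeline the abstract describes, here is a minimal sketch of the Mel-spectrogram conversion using librosa. The file name, sampling rate, and spectrogram parameters are illustrative assumptions, not the paper's exact settings.

```python
# Sketch: convert a motor-sound recording to a log-Mel spectrogram.
# Parameters below are assumed defaults, not the authors' configuration.
import librosa
import numpy as np

def to_mel_spectrogram(path, sr=22050, n_mels=128, n_fft=1024, hop_length=512):
    """Load an audio file and return its log-Mel spectrogram in dB."""
    y, sr = librosa.load(path, sr=sr)  # mono waveform
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # log scale (dB)

# Hypothetical usage; "side_mirror_motor.wav" is a placeholder file name.
# spec = to_mel_spectrogram("side_mirror_motor.wav")  # shape: (n_mels, frames)
```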
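The data augmentation the abstract mentions can be illustrated with SpecAugment-style masking on the resulting spectrogram. This is a generic sketch of the technique family (Park et al., 2019); the mask sizes and the choice of one mask per axis are assumptions.

```python
# Sketch: SpecAugment-style frequency and time masking.
import numpy as np

def spec_augment(spec, freq_mask=12, time_mask=20, rng=None):
    """Zero out one random frequency band and one random time span."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    f0 = rng.integers(0, max(1, n_mels - freq_mask))
    t0 = rng.integers(0, max(1, n_frames - time_mask))
    spec[f0:f0 + freq_mask, :] = 0.0  # frequency mask
    spec[:, t0:t0 + time_mask] = 0.0  # time mask
    return spec
```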
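For the reweighting side of the class-imbalance handling, one widely used scheme is the class-balanced loss of Cui et al. (CVPR 2019), which weights each class by the inverse effective number of samples. The sketch below assumes PyTorch; the three class counts and beta are made-up values for illustration, not the paper's data.

```python
# Sketch: class-balanced reweighting via effective number of samples.
import torch
import torch.nn as nn

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Weight each class by (1 - beta) / (1 - beta^n_c), then normalize."""
    counts = torch.tensor(samples_per_class, dtype=torch.float)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    return weights / weights.sum() * len(samples_per_class)

# Three classes, as in the paper; the counts are hypothetical.
weights = class_balanced_weights([1200, 300, 80])
criterion = nn.CrossEntropyLoss(weight=weights)  # reweighted loss
```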
Keywords
MelSpectrogram; Data Augmentation; Class Imbalance; Sound Classification