http://dx.doi.org/10.5909/JBE.2022.27.6.885

A Study on the Classification of Fault Motors using Sound Data  

Il-Sik Chang (Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology)
Gooman Park (Graduate School of Nano IT Design Fusion, Seoul National University of Science and Technology)
Publication Information
Journal of Broadcast Engineering / v.27, no.6, 2022, pp. 885-896
Abstract
Motor failure in manufacturing has a significant impact on after-sales service (A/S) and product reliability. Motor faults are typically detected by measuring sound, current, or vibration. The data used in this paper are sound recordings of a car side-mirror motor gearbox, labeled into three classes. The sound data are converted to Mel spectrograms before being fed into the network models. To improve fault-motor classification performance, we applied data augmentation together with several techniques for handling class imbalance: resampling, reweighting, modified loss functions, and a two-stage scheme that decouples representation learning from classification. In addition, curriculum learning and self-paced learning were compared across five network models (Bidirectional LSTM with Attention, Convolutional Recurrent Neural Network, Multi-Head Attention, Bidirectional Temporal Convolutional Network, and Convolutional Neural Network), and the optimal configuration for motor sound classification was identified.
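As a concrete illustration of the first step of the pipeline the abstract describes, here is a minimal sketch of the Mel-spectrogram conversion using librosa. The file name, sampling rate, and spectrogram parameters are illustrative assumptions, not the paper's exact settings.

```python
# Sketch: convert a motor-sound recording to a log-Mel spectrogram.
# Parameters below are assumed defaults, not the authors' configuration.
import librosa
import numpy as np

def to_mel_spectrogram(path, sr=22050, n_mels=128, n_fft=1024, hop_length=512):
    """Load an audio file and return its log-Mel spectrogram in dB."""
    y, sr = librosa.load(path, sr=sr)  # mono waveform
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # log scale (dB)

# Hypothetical usage; "side_mirror_motor.wav" is a placeholder file name.
# spec = to_mel_spectrogram("side_mirror_motor.wav")  # shape: (n_mels, frames)
```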
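The data augmentation the abstract mentions can be illustrated with SpecAugment-style masking on the resulting spectrogram. This is a generic sketch of the technique family (Park et al., 2019); the mask sizes and the choice of one mask per axis are assumptions.

```python
# Sketch: SpecAugment-style frequency and time masking.
import numpy as np

def spec_augment(spec, freq_mask=12, time_mask=20, rng=None):
    """Zero out one random frequency band and one random time span."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    f0 = rng.integers(0, max(1, n_mels - freq_mask))
    t0 = rng.integers(0, max(1, n_frames - time_mask))
    spec[f0:f0 + freq_mask, :] = 0.0  # frequency mask
    spec[:, t0:t0 + time_mask] = 0.0  # time mask
    return spec
```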
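For the reweighting side of the class-imbalance handling, one widely used scheme is the class-balanced loss of Cui et al. (CVPR 2019), which weights each class by the inverse effective number of samples. The sketch below assumes PyTorch; the three class counts and beta are made-up values for illustration, not the paper's data.

```python
# Sketch: class-balanced reweighting via effective number of samples.
import torch
import torch.nn as nn

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Weight each class by (1 - beta) / (1 - beta^n_c), then normalize."""
    counts = torch.tensor(samples_per_class, dtype=torch.float)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    return weights / weights.sum() * len(samples_per_class)

# Three classes, as in the paper; the counts are hypothetical.
weights = class_balanced_weights([1200, 300, 80])
criterion = nn.CrossEntropyLoss(weight=weights)  # reweighted loss
```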
Keywords
MelSpectrogram; Data Augmentation; Class Imbalance; Sound Classification