http://dx.doi.org/10.3745/KTSDE.2019.8.5.193

Enhanced Sound Signal Based Sound-Event Classification  

Choi, Yongju (Information Strategy Team, CJ Logistics)
Lee, Jonguk (Dept. of Computer Convergence Software, Korea University)
Park, Daihee (Dept. of Computer Convergence Software, Korea University)
Chung, Yongwha (Dept. of Computer Convergence Software, Korea University)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol. 8, No. 5, pp. 193-204, 2019
Abstract
Advances in sensor technology and computing performance have produced an explosion of data that now underpins situation analysis in industrial settings, and attempts to detect events from such data are increasing. In particular, sound signals collected from sensors serve as important information for classifying events in many application fields, since they allow field information to be gathered efficiently at relatively low cost. However, sound-event classification performance in the field cannot be guaranteed unless noise is removed; a practically deployable system must therefore perform robustly under diverse noise conditions. In this study, we propose a system that first generates an enhanced sound signal with a deep learning algorithm and then classifies the sound event. Specifically, to remove noise from the sound signal itself, enhanced sound data are generated using SEGAN, which applies a VAE-style encoder-decoder structure to a GAN. An end-to-end sound-event classification system then feeds the enhanced sound signal directly into a CNN, with no intermediate data-conversion step. The performance of the proposed method was verified experimentally on sound data collected from industrial sites, yielding F1-scores of 99.29% (railway industry) and 97.80% (livestock industry).
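The abstract outlines a two-stage pipeline: a SEGAN-style encoder-decoder generator enhances the raw noisy waveform, and an end-to-end 1D CNN classifies sound events directly from the enhanced signal without any spectrogram or MFCC conversion. The sketch below (PyTorch) is a minimal illustration of that flow, not the authors' published configuration: all layer counts, kernel sizes, and the class count are illustrative assumptions, and SEGAN's adversarial discriminator and latent-code input are omitted for brevity.

```python
# Hypothetical sketch of the enhance-then-classify pipeline described in the
# abstract. Layer sizes and class count are illustrative assumptions.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """SEGAN-style fully convolutional encoder-decoder for waveform enhancement
    (discriminator and latent-code injection omitted for brevity)."""
    def __init__(self):
        super().__init__()
        # Strided 1-D convolutions compress the noisy waveform ...
        self.enc1 = nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15)
        self.enc2 = nn.Conv1d(16, 32, kernel_size=31, stride=2, padding=15)
        # ... and transposed convolutions reconstruct an enhanced waveform.
        self.dec1 = nn.ConvTranspose1d(32, 16, kernel_size=32, stride=2, padding=15)
        self.dec2 = nn.ConvTranspose1d(32, 1, kernel_size=32, stride=2, padding=15)
        self.act = nn.PReLU()

    def forward(self, noisy):                      # noisy: (batch, 1, samples)
        e1 = self.act(self.enc1(noisy))
        e2 = self.act(self.enc2(e1))
        d1 = self.act(self.dec1(e2))
        d1 = torch.cat([d1, e1], dim=1)            # skip connection, as in SEGAN
        return torch.tanh(self.dec2(d1))           # enhanced waveform in [-1, 1]

class EventClassifier(nn.Module):
    """End-to-end 1-D CNN: raw enhanced waveform in, class scores out,
    with no hand-crafted spectrogram/MFCC conversion step."""
    def __init__(self, n_classes=4):               # class count is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                # global pooling -> fixed size
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, wave):                        # wave: (batch, 1, samples)
        return self.fc(self.features(wave).squeeze(-1))

# Usage: enhance first, then classify; the reported F1-scores would be computed
# on the predictions (e.g., with sklearn.metrics.f1_score).
noisy = torch.randn(8, 1, 16384)                    # a batch of short raw clips
enhanced = Generator()(noisy)
logits = EventClassifier()(enhanced)
```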
Keywords
Noise Robustness; Sound Signal Generation; End-to-End Architecture; Deep Learning
Citations & Related Records
Times Cited by KSCI: 3
1 Y. Choi, J. Lee, D. Park, and Y. Chung, "Noise-Robust Porcine Respiratory Diseases Classification Using Texture Analysis and CNN," KIPS Transactions on Software and Data Engineering, Vol. 7, No. 3, pp. 91-98, 2018.
2 Y. Kim, J. Sa, Y. Chung, D. Park, and S. Lee, "Resource-Efficient Pet Dog Sound Events Classification Using LSTM-FCN Based on Time-Series Data," Sensors, Vol. 18, p. 4019, 2018.
3 J. Sa, Y. Choi, Y. Chung, H. Kim, D. Park, and S. Yoon, "Replacement Condition Detection of Railway Point Machines Using an Electric Current Sensor," Sensors, Vol. 17, p. 263, 2017.
4 J. Salamon and J.P. Bello, "Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification," IEEE Signal Processing Letters, Vol. 24, No. 3, pp. 279-283, 2017.
5 J. Lee, H. Choi, D. Park, Y. Chung, H.Y. Kim, and S. Yoon, "Fault Detection and Diagnosis of Railway Point Machines by Sound Analysis," Sensors, Vol. 16, No. 4, p. 549, 2016.
6 Y. Choi, J. Lee, D. Park, J. Lee, Y. Chung, H.Y. Kim, and S. Yoon, "Stress Detection of Railway Point Machine Using Sound Analysis," KIPS Transactions on Software and Data Engineering, Vol. 5, No. 9, pp. 433-440, 2016.
7 M. Guarino, P. Jans, A. Costa, J.M. Aerts, and D. Berckmans, "Field Test of Algorithm for Automatic Cough Detection in Pig Houses," Computers and Electronics in Agriculture, Vol. 62, No. 1, pp. 22-28, 2008.
8 Y. Chung, S. Oh, J. Lee, D. Park, H. Chang, and S. Kim, "Automatic Detection and Recognition of Pig Wasting Diseases Using Sound Data in Audio Surveillance," Sensors, Vol. 13, No. 10, pp. 12929-12942, 2013.
9 J. Lee, L. Jin, D. Park, Y. Chung, and H. Chang, “Acoustic Features for Pig Wasting Disease Detection,” International Journal of Information Processing and Management, Vol. 6, No. 1, pp. 37-46, 2015.
10 R. Zazo, T.N. Sainath, G. Simko, and C. Parada, "Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection," In Proceedings of Interspeech, pp. 3668-3672, 2016.
11 H. Zhang, I. McLoughlin, and Y. Song, "Robust Sound Event Recognition Using Convolutional Neural Networks," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 559-563, 2015.
12 Y. Choi, O. Atif, J. Lee, D. Park, and Y. Chung, "Noise-Robust Sound-Event Classification System with Texture Analysis," Symmetry, Vol. 10, No. 9, p. 402, 2018.
13 Z. Zhang, J. Geiger, J. Pohjalainen, A.E.D. Mousa, W. Jin, and B. Schuller, "Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments," ACM Transactions on Intelligent Systems and Technology, Vol. 9, No. 5, p. 49, 2018.
14 Y. Choi, Y. Jung, Y. Kim, Y. Suh, and H. Kim, "An End-to-End Method for Korean Text-to-Speech Systems," Phonetics and Speech Sciences, Vol. 10, No. 1, pp. 39-48, 2018.
15 S. Pascual, A. Bonafonte, and J. Serra, "SEGAN: Speech Enhancement Generative Adversarial Network," In Proceedings of Interspeech, pp. 3642-3646, 2017.
16 S. Dieleman and B. Schrauwen, "End-to-End Learning for Music Audio," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6964-6968, 2014.
17 R.V. Sharan and T.J. Moir, "Noise Robust Audio Surveillance Using Reduced Spectrogram Image Feature and One-against-all SVM," Neurocomputing, Vol. 158, pp. 90-99, 2015.
18 R. Collobert, C. Puhrsch, and G. Synnaeve, "Wav2Letter: An End-to-End ConvNet-Based Speech Recognition System," arXiv preprint arXiv:1609.03193, 2016.
19 Y. Zhang, W. Chan, and N. Jaitly, "Very Deep Convolutional Networks for End-to-End Speech Recognition," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4845-4849, 2017.
20 S. Kim, T. Hori, and S. Watanabe, "Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4835-4839, 2017.
21 J. Lee, Y. Choi, D. Park, and Y. Chung, "Sound Noise-Robust Porcine Wasting Diseases Detection and Classification System Using Convolutional Neural Network," Journal of Korean Institute of Information Technology, Vol. 16, No. 5, pp. 1-13, 2018.
22 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Nets," In Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.
23 C. Zhang and Y. Peng, "Stacking VAE and GAN for Context-aware Text-to-Image Generation," IEEE Fourth International Conference on Multimedia Big Data, pp. 1-5, 2018.
24 T. Asada, C. Roberts, and T. Koseki, "An Algorithm for Improved Performance of Railway Condition Monitoring Equipment: Alternating-Current Point Machine Case Study," Transportation Research Part C: Emerging Technologies, Vol. 30, pp. 81-92, 2013.
25 J. Han, M. Kamber, and J. Pei, "Data Mining: Concepts and Techniques," 3rd ed., Morgan Kaufmann, San Francisco, CA, USA, 2012.
26 A.W. Rix, J.G. Beerends, M.P. Hollier, and A.P. Hekstra, "Perceptual Evaluation of Speech Quality (PESQ)-A New Method for Speech Quality Assessment of Telephone Networks and Codecs," IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, pp. 749-752, 2001.
27 J. Hansen and B. Pellom, "An Effective Quality Evaluation Protocol for Speech Enhancement Algorithms," International Conference on Spoken Language Processing, Vol. 7, pp. 2819-2822, 1998.
28 B. Shao, D. Wang, T. Li, and M. Ogihara, "Music Recommendation Based on Acoustic Features and User Access Patterns," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 8, pp. 1602-1611, 2009.
29 S. Theodoridis and K. Koutroumbas, "Pattern Recognition," 4th ed., Academic Press: Kidlington, Oxford, UK, 2009.
30 D.M. Powers, "Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness and Correlation," Journal of Machine Learning Technologies, Vol. 2, No. 1, pp. 37-63, 2011.