[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.6109/jkiice.2019.23.12.1506

CNN based dual-channel sound enhancement in the MAV environment

Kim, Young-Jin (Department of Computer Science & Engineering, Graduate School, Korea University of Technology and Education)
Kim, Eun-Gyung (School of Computer Science & Engineering, Korea University of Technology and Education)

Publication Information

Journal of the Korea Institute of Information and Communication Engineering / v.23, no.12, 2019, pp. 1506-1513

Abstract

As the industrial use of multi-rotor unmanned aerial vehicles (UAVs) has expanded rapidly, demand for collecting, processing, and analyzing data with UAVs has also increased. However, acoustic data collected by a UAV is heavily corrupted by the UAV's motor noise and by wind noise, which makes the data difficult to process and analyze. We therefore studied a method for enhancing the target sound in the acoustic signal received through microphones connected to the UAV. In this paper, we extend the densely connected dilated convolutional network, an existing single-channel acoustic enhancement technique, so that it takes the inter-channel characteristics of the acoustic signal into account. The extended model outperformed the existing model on all evaluation measures, including SDR, PESQ, and STOI.
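The two ideas named in the abstract, dense connectivity and dilated convolution applied to a two-channel input, can be illustrated with a minimal pure-Python sketch. This is not the authors' model: the function names, the fixed kernel size of 3, the placeholder weights, and the doubling dilation schedule are all assumptions made for illustration, and a real system would use trained weights on time-frequency features.

```python
from typing import List

Signal = List[float]


def dilated_conv1d(inputs: List[Signal], kernels: List[Signal], dilation: int) -> Signal:
    """Sum per-channel dilated 1-D convolutions (zero 'same' padding), then apply ReLU."""
    length = len(inputs[0])
    out = [0.0] * length
    for signal, kernel in zip(inputs, kernels):
        half = len(kernel) // 2  # odd kernel sizes assumed
        for t in range(length):
            for k, w in enumerate(kernel):
                src = t + (k - half) * dilation  # taps are spaced `dilation` samples apart
                if 0 <= src < length:
                    out[t] += w * signal[src]
    return [max(0.0, v) for v in out]


def dense_dilated_block(channels: List[Signal], num_layers: int, weight: float = 1.0) -> List[Signal]:
    """Each layer reads every earlier feature map (dense connectivity); dilation doubles per layer."""
    features = [list(c) for c in channels]
    for layer in range(num_layers):
        # Placeholder (untrained) weights: one size-3 kernel per existing feature map.
        kernels = [[weight] * 3 for _ in features]
        features.append(dilated_conv1d(features, kernels, dilation=2 ** layer))
    return features


# Two hypothetical microphone channels carrying the same impulse with a one-sample delay.
left = [0.0] * 16
right = [0.0] * 16
left[8] = 1.0
right[9] = 1.0

features = dense_dilated_block([left, right], num_layers=3)
print(len(features))  # 2 input channels + 3 layer outputs
```

Because the first layer reads both microphone channels at once, its output mixes information across channels, which is the sense in which a multi-channel input lets the network consider inter-channel characteristics. Doubling the dilation widens the receptive field without adding weights.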

Keywords

Dual-Channel Speech Enhancement; Unmanned Aerial Vehicle (UAV); Convolutional Neural Network (CNN); Dense Connectivity; Dilated Convolution

