http://dx.doi.org/10.6109/jkiice.2019.23.12.1506

CNN-based dual-channel sound enhancement in the MAV environment

Kim, Young-Jin (Department of Computer Science & Engineering, Graduate School, Korea University of Technology and Education)
Kim, Eun-Gyung (School of Computer Science & Engineering, Korea University of Technology and Education)
Abstract
Recently, as the industrial applications of multi-rotor unmanned aerial vehicles (UAVs) have expanded greatly, the demand for data collection, processing, and analysis using UAVs has also increased. However, acoustic data collected with a UAV are heavily corrupted by the UAV's motor noise and by wind noise, which makes the data difficult to process and analyze. We therefore studied a method for enhancing the target sound in the acoustic signal received through microphones connected to the UAV. In this paper, we extend the densely connected dilated convolutional network, an existing single-channel acoustic enhancement technique, to take the inter-channel characteristics of the acoustic signal into account. The extended model outperformed the existing model on all evaluation measures, including SDR, PESQ, and STOI.
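As an illustration of one of the evaluation measures named in the abstract, the sketch below computes a simplified signal-to-distortion ratio (SDR) in decibels between a clean reference signal and an enhanced estimate. It is an assumption-laden sketch rather than the evaluation code used in the paper: the function name `sdr_db` is hypothetical, and the definition is the plain energy ratio 10·log10(‖s‖² / ‖s − ŝ‖²), not the full decomposition some SDR toolkits use. PESQ and STOI require dedicated implementations and are not shown.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    `reference` is the clean target signal and `estimate` is the enhanced
    output, both given as equal-length sequences of samples. This is an
    illustrative definition, not necessarily the one used in the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    # Energy of the clean target signal.
    signal_energy = sum(r * r for r in reference)
    # Energy of everything in the estimate that is not the target.
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))

    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    if distortion_energy == 0.0:
        return math.inf  # a perfect estimate has no distortion

    return 10.0 * math.log10(signal_energy / distortion_energy)


# Toy example: the estimate is the reference attenuated by 10 percent.
clean = [1.0, -1.0, 1.0, -1.0]
enhanced = [0.9, -0.9, 0.9, -0.9]
print(f"SDR: {sdr_db(clean, enhanced):.2f} dB")
```

A higher value means less residual distortion relative to the target, so an enhancement model that improves on a baseline should yield a larger SDR on the same test signals.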
Keywords
Dual-Channel Speech Enhancement; Unmanned Aerial Vehicle (UAV); Convolutional Neural Network (CNN); Dense Connectivity; Dilated Convolution