http://dx.doi.org/10.6109/jkiice.2020.24.4.459

CNN based Complex Spectrogram Enhancement in Multi-Rotor UAV Environments  

Kim, Young-Jin (Department of Computer Science & Engineering, Korea University of Technology and Education)
Kim, Eun-Gyung (School of Computer Science & Engineering, Korea University of Technology and Education)
Abstract
Sound recorded from a multi-rotor unmanned aerial vehicle (UAV) is severely degraded by ego noise from the motors and propellers and by wind noise during flight. In a multi-rotor UAV environment, both the magnitude and the phase of the target sound are heavily corrupted, so enhancement must take both into account. Phase is difficult to improve, however, because it shows no clear structural characteristics. In this study, we propose a CNN-based complex spectrogram enhancement method that removes noise from the complex spectrogram, which represents both magnitude and phase. Experimental results show that the proposed method improves enhancement performance by considering both the magnitude and the phase of the complex spectrogram.
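To illustrate why a complex spectrogram can carry both magnitude and phase, the following is a minimal sketch, not the authors' implementation. It computes a naive short-time Fourier transform, splits it into real and imaginary channels (the kind of two-channel input a CNN could consume), and checks that magnitude and phase are recoverable from those channels. The function names, the frame length, the hop size, and the toy 440 Hz signal are all assumptions chosen for illustration; the CNN itself is omitted because the abstract does not describe its architecture.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Naive STFT with a Hann window (illustrative, not optimized).

    Returns a list of frames, each a list of complex one-sided DFT bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # One-sided DFT: bins 0 .. frame_len // 2
        bins = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def to_real_imag(spec):
    """Split a complex spectrogram into two real-valued channels."""
    real = [[c.real for c in frame] for frame in spec]
    imag = [[c.imag for c in frame] for frame in spec]
    return real, imag


def to_mag_phase(real, imag):
    """Recover magnitude and phase from the real and imaginary channels."""
    mag = [[math.hypot(r, i) for r, i in zip(fr, fi)]
           for fr, fi in zip(real, imag)]
    phase = [[math.atan2(i, r) for r, i in zip(fr, fi)]
             for fr, fi in zip(real, imag)]
    return mag, phase


# Toy input (assumed values): a 440 Hz tone sampled at 8 kHz, 256 samples.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]

spec = stft(signal)
real, imag = to_real_imag(spec)
mag, phase = to_mag_phase(real, imag)

# Rebuild the complex bins from magnitude and phase and measure the error,
# confirming that the two channels jointly encode both quantities.
rebuilt = [[m * cmath.exp(1j * p) for m, p in zip(fm, fp)]
           for fm, fp in zip(mag, phase)]
max_err = max(abs(a - b)
              for fa, fb in zip(spec, rebuilt)
              for a, b in zip(fa, fb))
print(len(spec), len(spec[0]), max_err < 1e-9)
```

Because the real and imaginary parts are ordinary real-valued arrays, a network trained to predict clean versions of both implicitly estimates magnitude and phase together, rather than enhancing magnitude alone and reusing the noisy phase.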
Keywords
Sound Enhancement; UAV; Deep Learning; Convolutional Neural Network (CNN); Acoustic Signal Processing
Citations & Related Records
Times Cited By KSCI: 2