http://dx.doi.org/10.7776/ASK.2020.39.4.292

Improved CycleGAN for underwater ship engine audio translation  

Ashraf, Hina (Foundation for Advancement of Science and Technology (NUCES-FAST))
Jeong, Yoon-Sang (Jeju National University)
Lee, Chong Hyun (Jeju National University)
Abstract
Machine learning algorithms have made immense contributions in various fields, including sonar and radar applications. The recently developed Cycle-Consistency Generative Adversarial Network (CycleGAN), a variant of the GAN, has been used successfully for unpaired image-to-image translation. We present a modified CycleGAN for translating underwater ship engine sounds with high perceptual quality. The proposed network is composed of an improved generator model trained to translate underwater audio from one vessel type to another, an improved discriminator that identifies data as real or fake, and a modified cycle-consistency loss function. Quantitative and qualitative analyses of the proposed CycleGAN are performed on the publicly available underwater dataset ShipsEar by evaluating and comparing Mel-cepstral distortion, pitch contour matching, nearest neighbor comparison, and mean opinion score against existing algorithms. The results demonstrate the effectiveness of the proposed network.
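As a point of reference for the evaluation described above, the sketch below shows how Mel-cepstral distortion (MCD) between a translated and a target MCEP sequence is commonly computed. This is a minimal illustration, not the authors' implementation; the MCEP order, the prior time alignment, and the exclusion of the 0th (energy) coefficient are assumptions.

```python
# Minimal sketch of Mel-cepstral distortion (MCD), a standard objective metric
# for audio/voice conversion quality. Illustrative only: the MCEP order, the
# prior time alignment (e.g., via DTW), and dropping the 0th (energy)
# coefficient are assumptions, not details taken from the paper.
import numpy as np

def mel_cepstral_distortion(mcep_ref: np.ndarray, mcep_conv: np.ndarray) -> float:
    """Average MCD in dB between two time-aligned MCEP sequences of shape (T, D)."""
    assert mcep_ref.shape == mcep_conv.shape, "sequences must be aligned frame-by-frame"
    diff = mcep_ref - mcep_conv                               # (T, D) per-frame differences
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))      # Euclidean term per frame
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))  # convert to dB and average

# Toy usage with random arrays standing in for extracted MCEP features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(200, 24))                    # 200 frames, 24-dim MCEP
    converted = reference + 0.1 * rng.normal(size=(200, 24))  # slightly perturbed copy
    print(f"MCD: {mel_cepstral_distortion(reference, converted):.2f} dB")
```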
Keywords
Generative Adversarial Networks (GANs); Style transfer; Cycle-Consistency GAN (CycleGAN); Mel-Cepstrum (MCEP); Mean Opinion Score (MOS);
Citations & Related Records
1 J. Luo and Y. Yang, "Simulation model of ship-radiated broadband noise," Proc. IEEE ICSPCC. 1-5 (2011).
2 C. Verron and G. Drettakis, "Procedural audio modeling for particle-based environmental effects," 133rd AES Convention (2012).
3 CycleGAN with Better Cycles, https://ssnl.github.io/better_cycles/report.pdf, (Last viewed July 21, 2020).
4 D. Santos-Dominguez, S. Torres-Guijarro, A. Cardenal-Lopez, and A. Pena-Gimenez, "ShipsEar: An underwater vessel noise database," Applied Acoustics, 113, 64-69 (2016).
5 J. Nirmal, P. Kachare, S. Patnaik, and M. Zaveri, "Cepstrum liftering based voice conversion using RBF and GMM," Proc. ICCSP. IEEE 570-575 (2013).
6 W. Zhou, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. on Image Processing, 13, 600-612 (2004).
7 M. Chu and H. Peng, "Objective measure for estimating mean opinion score of synthesized speech," U.S. Patent 7024362, 2006.
8 J. Choi, Y. Choo, and K. Lee, "Acoustic classification of surface and underwater vessels in the ocean using supervised machine learning," Sensors, 19, 3492 (2019).
9 A. Tesei, S. Fioravanti, V. Grandi, P. Guerrini, and A. Maguer, "Localization of small surface vessels through acoustic data fusion of two tetrahedral arrays of hydrophones," Proc. Meetings on Acoustics, 17, 070050 (2012).
10 R. Diamant and Y. Jin, "A machine learning approach for dead-reckoning navigation at sea using a single accelerometer," IEEE J. Oceanic Engineering, 39, 672-684 (2013).
11 Y. Tan, J. K. Tan, H. S. Kim, and S. Ishikawa, "Detection of underwater objects based on machine learning," Proc. The SICE Annual Conference 2013, IEEE 2104-2109 (2013).
12 H. Yang, K. Lee, Y. Choo, and K. Kim, "Underwater acoustic research trends with machine learning: Passive SONAR applications," JOET. 34, 227-236 (2020).
13 B. McFee, E. J. Humphrey, and J. P. Bello, "A software framework for musical data augmentation," Proc. ISMIR. 248-254 (2015).
14 C. Albaladejo, F. Soto, R. Torres, P. Sanchez, and J. A. Lopez, "A low-cost sensor buoy system for monitoring shallow marine environments," Sensors, 12, 9613-9634 (2012).
15 D. G. Hathaway and R. M. Bridges, "Underwater sonar array," U.S. Patent 4901287, 1990.
16 J. Schluter and T. Grill, "Exploring data augmentation for improved singing voice detection with neural networks," Proc. ISMIR. 121-126 (2015).
17 N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," JAIR. 16, 321-357 (2002).
18 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Proc. NIPS. 2672-2680 (2014).
19 C. Donahue, J. McAuley, and M. Puckette, "Adversarial audio synthesis," arXiv preprint arXiv:1802.04208 (2018).
20 S. Mangal, R. Modak, and P. Joshi, "LSTM Based Music Generation System," arXiv preprint arXiv:1908.01080 (2019).
21 F. H. K. dos S. Tanaka and C. Aranha, "Data augmentation using GANs," arXiv preprint arXiv:1904.09135 (2019).
22 Y. Qian, H. Hu, and T. Tan, "Data augmentation using generative adversarial networks for robust speech recognition," Speech Communication, 114, 1-9 (2019).
23 J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," Proc. IEEE Int. Conf. on Computer Vision, 2223-2232 (2017).
24 S. H. Dumpala, I. Sheikh, R. Chakraborty, and S. K. Kopparapu, "A Cycle-GAN approach to model natural perturbations in speech for ASR applications," arXiv preprint arXiv:1912.11151 (2019).
25 H. Yang, K. Lee, Y. Choo, and K. Kim, "Underwater Acoustic Research Trends with Machine Learning: General Background," JOET. 34, 147-154 (2020).
26 V. Sandfort, K. Yan, P. J. Pickhardt, and R. M. Summers, "Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks," Scientific Reports, 9, 1-9 (2019).
27 T. Kaneko and H. Kameoka, "Parallel data-free voice conversion using cycle-consistent adversarial networks," arXiv:1711.11293 (2017).
28 T. Kaneko and H. Kameoka, "CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks," Proc. 26th EUSIPCO. IEEE 2100-2104 (2018).
29 Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," Proc. ICML. 933-941 (2017).
30 Y. Taigman, A. Polyak, and L. Wolf, "Unsupervised cross-domain image generation," Proc. ICLR (2017).
31 D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, "Instance normalization: The missing ingredient for fast stylization," CoRR. abs/1607.08022 (2016).
32 F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," CoRR. abs/1511.07122 (2015).
33 A. Odena, V. Dumoulin, and C. Olah, "Deconvolution and checkerboard artifacts," Distill, 1, e3 (2016).
34 U. Demir and G. Unal, "Patch-based image inpainting with generative adversarial networks," arXiv preprint arXiv:1803.07422 (2018).