http://dx.doi.org/10.7471/ikeee.2020.24.4.1122

Deep Learning based Raw Audio Signal Bandwidth Extension System  

Kim, Yun-Su (Dept. of Information and Communication Engineering, Changwon National University)
Seok, Jong-Won (Dept. of Information and Communication Engineering, Changwon National University)
Publication Information
Journal of IKEEE / v.24, no.4, 2020, pp. 1122-1128
Abstract
Bandwidth extension refers to restoring a narrowband (NB) signal, which has been lost or damaged in the encoding and decoding process because of limited channel capacity or the characteristics of the codec installed in a mobile communication device, and expanding it into a wideband (WB) signal. Bandwidth extension research has mainly focused on speech signals. Conventional methods such as SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling) operate in the frequency domain and restore the missing or damaged high band through complex feature extraction processes. In this paper, we propose an autoencoder-based deep learning model that outputs a bandwidth-extended signal. Using residual connections in one-dimensional convolutional neural networks (CNNs), the model extends the bandwidth directly from a time-domain input of a fixed length, without complicated pre-processing. In addition, we confirmed that the damaged high band can be restored when the model is trained on a dataset containing various types of sound sources, including music, rather than speech alone.
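The abstract does not give the layer configuration, so the following is only a minimal sketch of the general idea it describes: a fixed-length time-domain frame passes through a one-dimensional convolutional encoder and decoder, and a residual connection adds the input back to the decoder output. The function names, kernel values, single-channel layout, and nearest-neighbour upsampling are all assumptions made for illustration and are not taken from the paper; the weights are untrained.

```python
def conv1d(x, kernel, stride=1):
    """Zero-padded 1-D convolution (cross-correlation) with an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(0, len(x), stride)
    ]


def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2 (an assumed choice)."""
    out = []
    for v in x:
        out.extend([v, v])
    return out


def relu(x):
    return [max(0.0, v) for v in x]


def forward(frame, enc_kernel, dec_kernel):
    """One encoder layer, one decoder layer, and a residual connection.

    `frame` is a time-domain signal segment with an even number of samples.
    """
    # Encoder: a stride-2 convolution halves the time resolution.
    hidden = relu(conv1d(frame, enc_kernel, stride=2))
    # Decoder: upsample back to the input length, then convolve.
    decoded = conv1d(upsample2(hidden), dec_kernel)
    # Residual connection: the layers only need to model what is missing
    # from the input, which is added back sample by sample.
    return [a + b for a, b in zip(frame, decoded)]


# Hypothetical 8-sample frame and hand-picked (untrained) kernels.
frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
output = forward(frame, [0.25, 0.5, 0.25], [0.1, 0.2, 0.1])
print(len(output))
```

With an all-zero decoder kernel the sketch returns the input frame unchanged, which shows why a residual connection gives the network a convenient starting point: it only has to learn the correction on top of the narrowband input.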
Keywords
Audio; Bandwidth Extension; Deep Learning; Convolutional Neural Network; Autoencoder;