[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7471/ikeee.2020.24.4.1122

Deep Learning based Raw Audio Signal Bandwidth Extension System

Kim, Yun-Su (Dept. of Information and Communication Engineering, Changwon National University)
Seok, Jong-Won (Dept. of Information and Communication Engineering, Changwon National University)

Publication Information

Journal of IKEEE / v.24, no.4, 2020, pp. 1122-1128

Abstract

Bandwidth extension refers to restoring and expanding a narrowband (NB) signal, which has been lost or damaged during encoding and decoding because of limited channel capacity or the characteristics of the codec installed in the mobile communication device, into a wideband (WB) signal. Bandwidth extension research has mainly focused on speech signals. Conventional methods such as SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling) transform the signal into the frequency domain and restore the missing or damaged high band based on complex feature extraction processes. In this paper, we propose an autoencoder-based deep learning model that outputs a bandwidth-extended signal. Using residual connections in a one-dimensional convolutional neural network (CNN), the model extends the bandwidth by taking a time-domain signal of a fixed length as input, without complicated pre-processing. In addition, we confirmed that the damaged high band can be restored even when the model is trained on a dataset containing various types of sound sources, including music, rather than speech alone.
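The idea in the abstract, a one-dimensional convolutional autoencoder with a residual connection operating directly on a fixed-length time-domain frame, can be illustrated with a minimal sketch. This is not the authors' architecture: the single encoder and decoder layer, the kernel values, the 16-sample frame, the nearest-neighbour upsampling, and the function names are all assumptions chosen for illustration, and the weights are placeholders rather than trained values.

```python
import math

def conv1d(x, kernel, stride=1):
    """1-D convolution (cross-correlation) with zero 'same' padding; odd kernel length assumed."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(0, len(x), stride)
    ]

def relu(x):
    return [max(0.0, v) for v in x]

def upsample(x, factor=2):
    """Nearest-neighbour upsampling: repeat each sample `factor` times."""
    return [v for v in x for _ in range(factor)]

def residual_autoencoder(frame, enc_kernel, dec_kernel):
    """Encoder (strided conv) -> decoder (upsample + conv), plus a skip from the input."""
    code = relu(conv1d(frame, enc_kernel, stride=2))  # bottleneck: half the frame length
    decoded = conv1d(upsample(code, 2), dec_kernel)   # back to the input length
    return [a + b for a, b in zip(frame, decoded)]    # residual connection

# Hypothetical fixed-length time-domain frame: 16 samples of a sine wave.
frame = [math.sin(2 * math.pi * 2 * n / 16) for n in range(16)]

# Placeholder weights; a trained model would learn these from NB/WB pairs.
out = residual_autoencoder(frame, enc_kernel=[0.25, 0.5, 0.25], dec_kernel=[0.0, 0.0, 0.0])
```

With an all-zero decoder kernel the network reduces to the identity through the skip path, which shows why the residual connection is convenient here: the convolutional branch only has to learn the correction to add to the input (the missing high band), not the whole waveform.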

Keywords

Audio; Bandwidth Extension; Deep Learning; Convolutional Neural Network; Autoencoder

Citations & Related Records

Times Cited By KSCI: 2

References

1	M. Dietz, L. Liljeryd, K. Kjorling, and O. Kunz, "Spectral Band Replication, a Novel Approach in Audio Coding," in Audio Engineering Society 112th Convention, p.553, 2002.
2	Volodymyr Kuleshov, S. Zayd Enam, Stefano Ermon, "Audio Super Resolution using Neural Networks," presented at the 5th International Conference on Learning Representations (ICLR), 2017, arXiv:1708.00853v1
3	Yu Gu, Z. Ling, Li-Rong Dai, "Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks," INTERSPEECH, 2016. DOI: 10.21437/Interspeech.2016-678
4	Ian Goodfellow et al., "Generative Adversarial Nets," Advances in Neural Information Processing Systems, vol.27, pp.2672-2680, 2014.
5	Hyo-Jin Cho, Seong-Hyeon Shin, Seung Kwon Beack, Taejin Lee, Hochong Park, "Audio High-Band Coding based on Autoencoder with Side Information," Journal of Broadcast Engineering (JBE), Vol.24, No.3, pp.387-394, 2019. DOI: 10.5909/JBE.2019.24.3.387
6	Pramod Bachhav, Massimiliano Todisco, Nicholas Evans, "Artificial Bandwidth Extension with Memory Inclusion Using Semi-supervised Stacked Auto-encoders," INTERSPEECH, pp.1185-1189, 2018. DOI: 10.21437/Interspeech.2018-2213
7	Olaf Ronneberger, Philipp Fischer, Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp.234-241, 2015.
8	C. Szegedy et al., "Going deeper with convolutions," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-9, 2015. DOI: 10.1109/CVPR.2015.7298594
9	Sung Kim, Visvesh Sathe, "Bandwidth Extension on Raw Audio via Generative Adversarial Networks," 2019, arXiv:1903.09027
10	W. Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1874-1883, 2016. DOI: 10.1109/CVPR.2016.207
11	Veaux, Christophe; Yamagishi, Junichi; MacDonald, Kirsten. "CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit," University of Edinburgh. The Centre for Speech Technology Research (CSTR), 2017. DOI: 10.7488/ds/1994
12	Diederik P Kingma, Max Welling, "Auto-Encoding Variational Bayes," 2014, arXiv preprint arXiv:1312.6114
