Deep Learning based Raw Audio Signal Bandwidth Extension System

Kim, Yun-Su; Seok, Jong-Won

doi:10.7471/ikeee.2020.24.4.1122

전기전자학회논문지 (Journal of IKEEE)

Vol. 24, No. 4 / Pages 1122-1128 / 2020 / 1226-7244 (pISSN) / 2288-243X (eISSN)

한국전기전자학회 (Institute of Korean Electrical and Electronics Engineers)

딥러닝 기반 음향 신호 대역 확장 시스템

Kim, Yun-Su (김윤수), Dept. of Information and Communication Engineering, Changwon National University
Seok, Jong-Won (석종원), Dept. of Information and Communication Engineering, Changwon National University

Received: 2020.11.30
Reviewed: 2020.12.16
Published: 2020.12.31

https://doi.org/10.7471/ikeee.2020.24.4.1122

Abstract

Bandwidth extension refers to restoring and extending a narrowband (NB) signal that has been band-limited or damaged during encoding and decoding, owing to insufficient channel capacity or the characteristics of the codec installed in a mobile communication device, and converting it into a wideband (WB) signal. Bandwidth extension research has focused mainly on speech signals. Methods such as Spectral Band Replication (SBR) and Intelligent Gap Filling (IGF) transform the signal into the frequency domain and, through a complex feature extraction process, restore the missing or damaged high band. In this paper, we propose an autoencoder-based deep learning model that uses residual connections between one-dimensional convolutional neural networks (CNNs). The model takes a fixed-length time-domain signal as input, without complicated pre-processing, and outputs a bandwidth-extended audio signal. We also confirmed that the damaged high band can be restored even when the model is trained on a dataset that is not limited to speech and contains various types of sound sources, including music.
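To make the architecture described in the abstract more concrete, the following is a minimal NumPy sketch of a 1D convolutional autoencoder with residual connections that maps a fixed-length time-domain frame to an output of the same length. It is not the authors' network: the layer count, channel widths, kernel size, frame length of 1024 samples, nearest-neighbour upsampling, and the names `conv1d`, `upsample`, and `forward` are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, stride=1):
    """Zero-padded 1D convolution (cross-correlation, as in most DL libraries).

    x: (channels_in, length); w: (channels_out, channels_in, kernel), odd kernel.
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows: (channels_in, length, kernel), then subsampled by the stride
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    windows = windows[:, ::stride, :]
    return np.einsum("oik,itk->ot", w, windows)

def relu(x):
    return np.maximum(x, 0.0)

def upsample(x, factor=2):
    """Nearest-neighbour upsampling along time (an assumption for this sketch)."""
    return np.repeat(x, factor, axis=1)

# Hypothetical layer shapes: (channels_out, channels_in, kernel)
params = {
    "enc1": rng.normal(0.0, 0.1, size=(8, 1, 9)),
    "enc2": rng.normal(0.0, 0.1, size=(16, 8, 9)),
    "dec1": rng.normal(0.0, 0.1, size=(8, 16, 9)),
    "dec2": rng.normal(0.0, 0.1, size=(1, 8, 9)),
}

def forward(x, p):
    """Encoder-decoder pass over one frame x of shape (1, length), length % 4 == 0."""
    h1 = relu(conv1d(x, p["enc1"], stride=2))    # (8, length / 2)
    h2 = relu(conv1d(h1, p["enc2"], stride=2))   # (16, length / 4), bottleneck
    u1 = relu(conv1d(upsample(h2), p["dec1"]))   # (8, length / 2)
    u1 = u1 + h1                                 # residual connection to the encoder
    y = conv1d(upsample(u1), p["dec2"])          # (1, length)
    return x + y                                 # add the predicted detail to the input

# A fixed-length time-domain frame standing in for a narrowband input
nb_frame = rng.normal(0.0, 1.0, size=(1, 1024))
wb_estimate = forward(nb_frame, params)
```

Because the final layer adds its output to the input frame, a network of this shape only has to learn the missing high-band detail instead of reproducing the entire waveform. In practice the weights would be trained on pairs of band-limited and wideband frames.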

