http://dx.doi.org/10.5909/JBE.2019.24.3.387

Audio High-Band Coding based on Autoencoder with Side Information  

Cho, Hyo-Jin (Dept. of Electronics Engineering, Kwangwoon University)
Shin, Seong-Hyeon (Dept. of Electronics Engineering, Kwangwoon University)
Beack, Seung Kwon (Electronics and Telecommunications Research Institute)
Lee, Taejin (Electronics and Telecommunications Research Institute)
Park, Hochong (Dept. of Electronics Engineering, Kwangwoon University)
Publication Information
Journal of Broadcast Engineering, v.24, no.3, 2019, pp. 387-394
Abstract
This study proposes a new method of audio high-band coding based on an autoencoder with side information. The method operates in the modified discrete cosine transform (MDCT) domain and improves performance by feeding the autoencoder additional side information consisting of the previous and current low bands, unlike a conventional autoencoder, whose only input is the information to be encoded. Because this side information spans the time-frequency domain, the high-band coder can also exploit the temporal characteristics of the signal. For each frame, the encoder transmits a 4-dimensional latent vector computed by the autoencoder and a gain variable, using 12 bits. The decoder reconstructs the high band by applying the decoded low bands of the previous and current frames, together with the transmitted information, to the autoencoder. Subjective evaluation confirms that the proposed method performs equivalently to spectral band replication (SBR) at approximately half the SBR bit rate.
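The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: only the 4-dimensional latent vector and the use of previous and current low bands as side information come from the abstract. The band sizes (`N_LOW`, `N_HIGH`), the single-layer networks with random weights, the RMS definition of the gain, and all function names are assumptions made for illustration, and the 12-bit quantization of the latent vector and gain is omitted.

```python
import math
import random

# LATENT_DIM = 4 is stated in the abstract; every other size is an assumption.
N_LOW = 16      # assumed number of MDCT coefficients in one low-band frame
N_HIGH = 16     # assumed number of MDCT coefficients in one high-band frame
LATENT_DIM = 4


def make_layer(n_in, n_out, rng):
    """Random dense layer standing in for trained parameters (hypothetical)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, activation=math.tanh):
    """Apply one fully connected layer to the input vector x."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def encode(enc_layer, high, low_prev, low_cur):
    """Encoder side: map the high band plus side information to a latent vector.

    The gain is taken here as the RMS of the high band (an assumption);
    the abstract only says that a gain variable is transmitted.
    """
    gain = math.sqrt(sum(v * v for v in high) / len(high)) or 1.0
    shape = [v / gain for v in high]
    latent = dense(enc_layer, shape + low_prev + low_cur)
    return latent, gain


def decode(dec_layer, latent, gain, low_prev, low_cur):
    """Decoder side: rebuild the high band from latent, gain, and side info."""
    shape = dense(dec_layer, latent + low_prev + low_cur,
                  activation=lambda z: z)
    return [gain * v for v in shape]


rng = random.Random(0)
enc_layer = make_layer(N_HIGH + 2 * N_LOW, LATENT_DIM, rng)
dec_layer = make_layer(LATENT_DIM + 2 * N_LOW, N_HIGH, rng)

# Random vectors standing in for decoded low bands and the original high band.
low_prev = [rng.gauss(0.0, 1.0) for _ in range(N_LOW)]
low_cur = [rng.gauss(0.0, 1.0) for _ in range(N_LOW)]
high = [rng.gauss(0.0, 0.5) for _ in range(N_HIGH)]

latent, gain = encode(enc_layer, high, low_prev, low_cur)
high_hat = decode(dec_layer, latent, gain, low_prev, low_cur)
print(len(latent), len(high_hat))
```

The point of the sketch is that the low bands enter both `encode` and `decode`. Since the decoder already has the decoded low bands, only the latent vector and the gain need to be transmitted. With untrained random weights the reconstruction `high_hat` is of course meaningless; only the shapes and the data flow are illustrative.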
Keywords
autoencoder; neural network; audio high-band coding; side information