[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5909/JBE.2019.24.3.387

Audio High-Band Coding based on Autoencoder with Side Information

Cho, Hyo-Jin (Dept. of Electronics Engineering, Kwangwoon University)
Shin, Seong-Hyeon (Dept. of Electronics Engineering, Kwangwoon University)
Beack, Seung Kwon (Electronics and Telecommunications Research Institute)
Lee, Taejin (Electronics and Telecommunications Research Institute)
Park, Hochong (Dept. of Electronics Engineering, Kwangwoon University)

Publication Information

Journal of Broadcast Engineering, v.24, no.3, 2019, pp. 387-394

Abstract

This study proposes a new audio high-band coding method based on an autoencoder with side information. The method operates in the MDCT domain and improves performance by using additional side information consisting of the previous and current low bands, unlike a conventional autoencoder, whose only input is the information to be encoded. Because this side information spans a time-frequency region, the high-band coder can also exploit the temporal characteristics of the signal. In each frame, the encoder transmits a 4-dimensional latent vector computed by the autoencoder and a gain variable using 12 bits. The decoder reconstructs the high band by feeding the decoded low bands of the previous and current frames, together with the transmitted information, to the autoencoder. Subjective evaluation confirms that the proposed method performs on par with spectral band replication (SBR) at approximately half the SBR bit rate.
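
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the band sizes, the single-layer networks with random (untrained) weights, the quantizer ranges, and the split of the 12 bits into 2 bits per latent dimension plus 4 bits for the gain are all assumptions made for illustration. The abstract does not state how the bits are allocated, nor whether the 12 bits cover both the latent vector and the gain. Only the 4-dimensional latent vector, the gain variable, and the use of the previous and current low bands as side information come from the source.

```python
import math
import random

# Assumed sizes: only N_LATENT = 4 comes from the abstract.
N_LOW, N_HIGH, N_LATENT = 8, 8, 4
# Hypothetical bit split: 4 * 2 + 4 = 12 bits per frame.
LATENT_BITS, GAIN_BITS = 2, 4

random.seed(0)

def random_matrix(rows, cols):
    """Random weights standing in for trained network parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

W_ENC = random_matrix(N_LATENT, N_HIGH + 2 * N_LOW)
W_DEC = random_matrix(N_HIGH, N_LATENT + 2 * N_LOW)

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def quantize(x, bits, lo, hi):
    """Uniform scalar quantizer returning an index in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * levels)

def dequantize(idx, bits, lo, hi):
    return lo + idx / (2 ** bits - 1) * (hi - lo)

def encode(high, low_prev, low_cur):
    """Encoder input = high band to be coded + side information (low bands)."""
    gain = math.sqrt(sum(h * h for h in high) / len(high)) + 1e-9
    x = [h / gain for h in high] + low_prev + low_cur
    latent = [math.tanh(z) for z in matvec(W_ENC, x)]  # each value in (-1, 1)
    latent_idx = [quantize(z, LATENT_BITS, -1.0, 1.0) for z in latent]
    gain_idx = quantize(math.log2(gain), GAIN_BITS, -10.0, 5.0)
    return latent_idx, gain_idx

def decode(latent_idx, gain_idx, low_prev, low_cur):
    """Decoder input = transmitted latent and gain + the same side information."""
    latent = [dequantize(i, LATENT_BITS, -1.0, 1.0) for i in latent_idx]
    gain = 2.0 ** dequantize(gain_idx, GAIN_BITS, -10.0, 5.0)
    return [gain * y for y in matvec(W_DEC, latent + low_prev + low_cur)]

# One frame of stand-in MDCT coefficients.
low_prev = [random.gauss(0.0, 1.0) for _ in range(N_LOW)]
low_cur = [random.gauss(0.0, 1.0) for _ in range(N_LOW)]
high = [random.gauss(0.0, 0.5) for _ in range(N_HIGH)]

latent_idx, gain_idx = encode(high, low_prev, low_cur)
high_hat = decode(latent_idx, gain_idx, low_prev, low_cur)
bits_per_frame = N_LATENT * LATENT_BITS + GAIN_BITS
```

In this sketch the low bands are never transmitted by the high-band coder. Both sides are assumed to obtain them from the core low-band codec, so only the latent indices and the gain index count toward the high-band bit budget. A real system would feed the encoder the decoded low bands as well, so that the encoder and decoder see identical side information.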

Keywords

autoencoder; neural network; audio high-band coding; side information
