A Sound Interpolation Method Using Deep Neural Network for Virtual Reality Sound

Choi, Jaegyu; Choi, Seung Ho

doi:10.5909/JBE.2019.24.2.227

Journal of Broadcast Engineering (방송공학회논문지)

Volume 24, Issue 2, pp. 227-233, 2019. pISSN 1226-7953, eISSN 2287-9137

The Korean Institute of Broadcast and Media Engineers (한국방송∙미디어공학회)

Choi, Jaegyu (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology) ;
Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)

Received : 2019.01.08
Accepted : 2019.03.18
Published : 2019.03.30

https://doi.org/10.5909/JBE.2019.24.2.227

Abstract

In this paper, we propose a deep neural network (DNN)-based sound interpolation method for realizing virtual reality sound. The method generates the sound at a point lying between two measurement points from the acoustic signals captured at those two points. Sound interpolation can also be performed with simple statistical methods such as the arithmetic mean or the geometric mean, but these do not adequately reflect the nonlinear characteristics of real acoustics. To address this, we train a deep neural network on the acoustic signals of the two points and of the target point. Experimental results show that the DNN-based sound interpolation method outperforms the statistical methods.

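The statistical baselines named in the abstract, and the RMSE measure used in the result tables, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the averaging is applied bin by bin to non-negative magnitude spectra (the abstract does not say which signal representation is averaged), and the function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def arithmetic_mean_interp(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Bin-wise arithmetic mean of two equal-length magnitude spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def geometric_mean_interp(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Bin-wise geometric mean; assumes non-negative magnitudes."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    return [math.sqrt(x * y) for x, y in zip(a, b)]


def rmse(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between an estimate and the target signal."""
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimate, target)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical magnitude spectra at the two measured points and the target point.
mag_a = [1.0, 4.0, 9.0]
mag_b = [4.0, 1.0, 1.0]
mag_target = [2.2, 2.1, 3.5]

am = arithmetic_mean_interp(mag_a, mag_b)
gm = geometric_mean_interp(mag_a, mag_b)
print("arithmetic mean RMSE:", rmse(am, mag_target))
print("geometric mean RMSE:", rmse(gm, mag_target))
```

Both baselines combine the two inputs with a fixed rule that is independent of the data. The proposed method instead learns the mapping from the two measured signals to the target signal.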

Fig. 1. System flowchart

Fig. 2. Structure of deep neural network

Fig. 3. Spectrum example based on the sound generated by head-related transfer function

Fig. 4. Array of speaker and microphone [6]

Fig. 5. Spectrum example based on the sound generated by room impulse response

Table 1. RMSE result of speech data based on the sound generated by head-related transfer function

Table 2. RMSE result of spectrum magnitude based on the sound generated by head-related transfer function

Table 3. RMSE result based on the sound generated by room impulse response

References

[1] C. Veaux, J. Yamagishi, and K. MacDonald, "CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit," The Centre for Speech Technology Research (CSTR), 2016.
[2] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. 27th Int. Conf. Machine Learning, pp. 807-814, 2010.
[3] V. Pham, T. Bluche, C. Kermorvant, and J. Louradour, "Dropout improves recurrent neural networks for handwriting recognition," in Proc. 14th Int. Conf. Frontiers in Handwriting Recognition (ICFHR), pp. 285-290, 2014.
[4] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[5] T. Qu, Z. Xiao, M. Gong, Y. Huang, X. Li, and X. Wu, "Distance dependent head-related transfer functions measured with high spatial resolution using a spark gap," IEEE Trans. on Audio, Speech and Language Processing, vol. 17, no. 6, pp. 1124-1132, 2009. https://doi.org/10.1109/TASL.2009.2020532
[6] J. Wen, N. Gaubitch, E. Habets, T. Myatt, and P. Naylor, "Evaluation of speech dereverberation algorithms using the MARDY database," in Proc. Int. Workshop Acoust. Echo Noise Control, pp. 1-4, 2006.