Search | Korea Science

Real-time 3D Audio Downmixing System based on Sound Rendering for the Immersive Sound of Mobile Virtual Reality Applications

Hong, Dukki;Kwon, Hyuck-Joo;Kim, Cheong Ghil;Park, Woo-Chan
- KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.12 / pp.5936-5954 / 2018
Since Facebook acquired Oculus, eight of the world's ten largest technology companies have become involved in some way with the coming mobile VR revolution. This trend has driven remarkable growth in mobile VR technology in both academia and industry. Reproducing realistic acoustics is therefore increasingly important, because auditory cues can enhance perception of a complicated surrounding environment in VR without relying on the visual system. This paper presents a hardware-based audio downmixing system for auralization, a stage of the sound rendering pipeline that can reproduce reality-like sound but incurs high computation costs. The proposed system is verified on an FPGA platform, with a special focus on hardware architectural designs for low power and real-time operation. The results show that the proposed system on an FPGA can downmix up to 5 sources at a real-time rate (52 FPS) with a low power consumption of 382 mW. Furthermore, the 3D sound generated by the proposed system was verified through a user evaluation, with satisfactory sound-quality results.
https://doi.org/10.3837/tiis.2018.12.018

A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering

Martin, J.C.;Jacquemin, C.;Pointal, L.;Katz, B.
- Proceedings of the Korea Information Convergence Society Conference (한국정보컨버전스학회:학술대회논문집) / 2008.06a / pp.53-56 / 2008
This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1/ perceptual experiments (e.g., perception of expressivity and 3D movements in both audio and visual channels); 2/ design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head in virtual scenes. The target application of this expressive ACA is RITEL, a real-time speech-based question-and-answer system developed at LIMSI. The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main components of the system are as follows. RITEL, a question-and-answer system that searches raw text, produces a text (the answer) and attitudinal information; this attitudinal information is then processed to deliver expressive tags; the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG-4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech is played in a 3D audio and visual scene. The project also puts considerable effort into realistic visual and audio 3D rendering. A new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move in the virtual scene with realistic 3D visual and audio rendering.

'EVE-Sound™' Toolkit for Interactive Sound in Virtual Environment (가상환경의 인터랙티브 사운드를 위한 'EVE-Sound™' 툴킷)

Nam, Yang-Hee;Sung, Suk-Jeong
- The KIPS Transactions:PartB / v.14B no.4 / pp.273-280 / 2007
This paper presents a new 3D sound toolkit called EVE-Sound™ that consists of a pre-processing tool for environment simplification that preserves sound effects and a 3D sound API for real-time rendering. It is designed to let users interact with complex 3D virtual environments through audio-visual modalities. The EVE-Sound™ toolkit serves two different types of users: high-level programmers who need an easy-to-use sound API for developing realistic 3D audio-visually rendered applications, and researchers in the 3D sound field who need to experiment with or develop new algorithms without rewriting all the required code from scratch. An interactive virtual environment application is created with a sound engine built on the EVE-Sound™ toolkit, and it demonstrates the real-time audio-visual rendering performance and the applicability of the proposed EVE-Sound™ for building interactive applications with complex 3D environments.
https://doi.org/10.3745/KIPSTB.2007.14-B.4.273

Standardization of MPEG-I Immersive Audio and Related Technologies (MPEG-I Immersive Audio 표준화 및 기술 동향)

Jang, D.Y.;Kang, K.O.;Lee, Y.J.;Yoo, J.H.;Lee, T.J.
- Electronics and Telecommunications Trends / v.37 no.3 / pp.52-63 / 2022
Immersive media, also known as spatial media, has become essential with the decrease in face-to-face activities in the COVID-19 pandemic era. Teleconferencing, the metaverse, and digital twins have been developed with high expectations as immersive media services, and the demand for hyper-realistic media is increasing. Under these circumstances, MPEG-I Immersive Media is being standardized as a technology for navigable virtual reality, which is expected to be launched in the first half of 2024, and the Audio Group is working to standardize the immersive audio technology. Following this trend, this article introduces the trend in MPEG-I immersive audio standardization. Further, it describes the features of the immersive audio rendering technology, focusing on the structure and function of the RM0 base technology, which was chosen after evaluating all the technologies proposed in the January 2022 "MPEG Audio Meeting."
https://doi.org/10.22648/ETRI.2022.J.370306

A Real Time 6 DoF Spatial Audio Rendering System based on MPEG-I AEP (MPEG-I AEP 기반 실시간 6 자유도 공간음향 렌더링 시스템)

Kyeongok Kang;Jae-hyoun Yoo;Daeyoung Jang;Yong Ju Lee;Taejin Lee
- Journal of Broadcast Engineering / v.28 no.2 / pp.213-229 / 2023
In this paper, we introduce a spatial sound rendering system that provides 6DoF spatial sound in real time in response to the movement of a listener located in a virtual environment. This system was implemented using MPEG-I AEP as a development environment for the CfP response of MPEG-I Immersive Audio and consists of an encoder and a renderer that includes a decoder. The encoder encodes metadata offline, such as the spatial audio parameters of the virtual space scene included in the EIF and the directivity information of the sound source provided in the SOFA file, and delivers them in the bitstream. The renderer receives the transmitted bitstream and performs 6DoF spatial sound rendering in real time according to the position of the listener. The main spatial sound processing technologies applied to the rendering system include the sound source effect and the obstacle effect; other system processing includes the Doppler effect, the sound field effect, and others. The results of a self-subjective evaluation of the developed system are introduced.
https://doi.org/10.5909/JBE.2023.28.2.213

MPEG-H 3D Audio Decoder Structure and Complexity Analysis (MPEG-H 3D 오디오 표준 복호화기 구조 및 연산량 분석)

Moon, Hyeongi;Park, Young-cheol;Lee, Yong Ju;Whang, Young-soo
- The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.432-443 / 2017
The primary goal of the MPEG-H 3D Audio standard is to provide immersive audio environments for high-resolution broadcasting services such as UHDTV. This standard incorporates a wide range of technologies, such as encoding/decoding technology for multi-channel, object, and scene-based signals, rendering technology for providing 3D audio in various playback environments, and post-processing technology. The reference software decoder of this standard combines several modules and can operate in various modes. Because each module is an independent executable file and the modules are executed sequentially, real-time decoding is impossible. In this paper, we build DLL libraries of the core decoder, format converter, object renderer, and binaural renderer of the standard and integrate them to enable frame-based decoding. In addition, by measuring the computational complexity of each mode of the MPEG-H 3D Audio decoder, this paper provides a reference for selecting the appropriate decoding mode for various hardware platforms. According to the measurements, the low complexity profiles included in the Korean broadcasting standard have a computational complexity of 2.8 to 12.4 times that of the QMF synthesis operation when rendering as channel signals, and 4.1 to 15.3 times that of the QMF synthesis operation when rendering as binaural signals.
https://doi.org/10.7840/kics.2017.42.2.432

Overview of MPEG 3D Audio Standard Activities for High-Order Multichannel Realistic Audio Service (고차 다채널 실감 오디오 서비스를 위한 MPEG 3D Audio 표준화 동향)

Seo, Jeongil;Kang, Kyeongok;Jeong, Dae-Gwon
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.171-173 / 2012
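This paper introduces the 3D Audio standardization trends currently under active discussion in the MPEG Audio subgroup, and surveys the state of related technology development at domestic and international organizations. MPEG 3D Audio characterizes realistic audio services such as NHK 22.2-channel broadcasting as High-Order Multichannel, and defines as its standardization targets the multichannel audio encoding and decoding technology for such services and the rendering technology that can adapt to various output channel environments.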

A Spatial Audio System Using Multiple Microphones on a Rigid Sphere

Lee, Tae-Jin;Jang, Dae-Young;Kang, Kyeong-Ok;Kim, Jin-Woong;Jeong, Dae-Gwon;Hamada, Hareo
- ETRI Journal / v.27 no.2 / pp.153-165 / 2005
The main purpose of a spatial audio system is to give a listener the same impression as if he/she were present in a recorded environment. A dummy head microphone is generally used for such purposes. Because of its human-like shape, we can obtain good spatial sound images. However, its shape is a restriction on its public use and it is difficult to convert a 2-channel recording into multi-channel signals for an efficient rendering over a multi-speaker arrangement. In order to solve the problems mentioned above, a spatial audio system is proposed that uses multiple microphones on a rigid sphere. The system has five microphones placed on special points of the rigid sphere, and it generates audio signals for headphone, stereo, stereo dipole, 4-channel, and 5-channel reproduction environments. Subjective localization experiments show that front/back confusion, which is a common limitation of spatial audio systems using the dummy head microphone, can be reduced dramatically in 4-channel and 5-channel reproduction environments and can be reduced slightly in a headphone reproduction.

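A Study on Voice Recognition-Based Interactive Media Art: Focusing on the Audio-Visual Interactive Installation "Water Music" (음성인식 기반 인터렉티브 미디어아트의 연구 - 소리-시각 인터렉티브 설치미술 "Water Music" 을 중심으로-)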

Lee, Myung-Hak;Jiang, Cheng-Ri;Kim, Bong-Hwa;Kim, Kyu-Jung
- Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2008.02a / pp.354-359 / 2008
This audio-visual interactive installation combines a video projection with digital interface technology that uses recognition of the viewer's voice. The viewer can interact with the computer-generated moving images growing on the screen by blowing or making sounds. This symbiotic audio and visual installation environment allows viewers to experience an illusionistic space physically as well as psychologically. The main programming technologies used to generate the moving water waves that interact with the viewer in this installation are Visual C++ and the DirectX SDK. Full-3D rendering technology and a particle system were used to make the water waves.

A 3D Audio Broadcasting Terminal for Interactive Broadcasting Services (대화형 방송을 위한 3차원 오디오 방송단말)

Park Gi Yoon;Lee Taejin;Kang Kyeongok;Hong Jinwoo
- Journal of Broadcast Engineering / v.10 no.1 s.26 / pp.22-30 / 2005
We implement an interactive 3D audio broadcasting terminal which synthesizes an audio scene according to the request of a user. Audio scene structure is described by the MPEG-4 AudioBIFS specifications. The user updates scene attributes and the terminal synthesizes the corresponding sound images in the 3D space. The terminal supports the MPEG-4 Audio top nodes and some visual nodes. Instead of using sensor nodes and route elements, we predefine node type-specific user interfaces to support BIFS commands for field replacement. We employ sound spatialization, directivity/shape modeling, and reverberation effects for 3D audio rendering and realistic feedback to user inputs. We also introduce a virtual concert program as an application scenario of the interactive broadcasting terminal.

Search Result 29, Processing Time 0.026 seconds