Integrated Search | Korea Science

Real-time 3D Audio Downmixing System based on Sound Rendering for the Immersive Sound of Mobile Virtual Reality Applications

Hong, Dukki;Kwon, Hyuck-Joo;Kim, Cheong Ghil;Park, Woo-Chan
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 12, No. 12, pp. 5936-5954, 2018
Eight of the world's ten largest technology companies have become involved in some way with the coming mobile VR revolution since Facebook acquired Oculus. This trend has driven remarkable growth in mobile VR technology in both academia and industry. Reproducing acoustic expression so that users have a more realistic experience is therefore increasingly important, because auditory cues can enhance perception of a complicated surrounding environment in VR without relying on the visual system. This paper presents a hardware-based audio downmixing system for auralization, a stage of the sound rendering pipeline that can reproduce realistic sound but requires high computation costs. The proposed system is verified on an FPGA platform, with a special focus on hardware architectural designs for low power and real-time operation. The results show that the proposed system on an FPGA can downmix up to 5 sources at a real-time rate (52 FPS) with a low power consumption of 382 mW. Furthermore, a user evaluation of the 3D sound generated by the proposed system yielded satisfactory sound quality results.
https://doi.org/10.3837/tiis.2018.12.018

A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering

Martin, J.C.;Jacquemin, C.;Pointal, L.;Katz, B.
- Korea Information Convergence Society: Conference Proceedings (한국정보컨버전스학회:학술대회논문집), 2008 International Conference on Information Convergence, pp. 53-56, 2008
This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: (1) perceptual experiments (e.g., perception of expressivity and 3D movements in both audio and visual channels); and (2) the design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head in virtual scenes. The target application of this expressive ACA is RITEL, a real-time speech-based question answering system developed at LIMSI. The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main components are as follows. RITEL, a question answering system that searches raw text, produces a text (the answer) and attitudinal information; this attitudinal information is then processed to deliver expressive tags; and the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG-4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech are played in a 3D audio and visual scene. The project also puts considerable effort into realistic visual and audio 3D rendering. A new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move in the virtual scene with realistic 3D visual and audio rendering.

'EVE-Sound™' Toolkit for Interactive Sound in Virtual Environment

남양희;성숙정
- The KIPS Transactions: Part B (정보처리학회논문지B), Vol. 14B, No. 4, pp. 273-280, 2007
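This paper presents the design and development results of the EVE-Sound™ toolkit for real-time synthesis of realistic sound in 2D/3D virtual environments. EVE-Sound™ consists of a preprocessing tool that distills the scene elements needed for sound-effect computation and a set of real-time rendering APIs. Its goal is to enable real-time sound reproduction while still allowing a high-end graphics environment in interactive content, such as virtual reality and games, that requires realistic multimodal reproduction. Because 3D sound computation is very complex, existing sound APIs provide only simple sound effects and playback. In contrast, this work focuses on reflecting the principles of 3D sound in complex virtual environments while making the rendering run in real time, and it proposes new scene simplification and spatial sound computation methods for this purpose. The usability and improved real-time performance of the toolkit were confirmed through application cases, experiments, and algorithm analysis.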
https://doi.org/10.3745/KIPSTB.2007.14-B.4.273

Standardization of MPEG-I Immersive Audio and Related Technologies

장대영;강경옥;이용주;유재현;이태진
- Electronics and Telecommunications Trends (전자통신동향분석), Vol. 37, No. 3, pp. 52-63, 2022
Immersive media, also known as spatial media, has become essential with the decrease in face-to-face activities during the COVID-19 pandemic. Teleconferencing, the metaverse, and digital twins have been developed with high expectations as immersive media services, and the demand for hyper-realistic media is increasing. Under these circumstances, MPEG-I Immersive Media is being standardized as a technology for navigable virtual reality, which is expected to be launched in the first half of 2024, and the Audio Group is working to standardize the immersive audio technology. Following this trend, this article introduces the trend in MPEG-I immersive audio standardization. It also describes the features of the immersive audio rendering technology, focusing on the structure and function of the RM0 base technology, which was chosen after evaluating all the technologies proposed at the January 2022 MPEG Audio Meeting.
https://doi.org/10.22648/ETRI.2022.J.370306

A Real Time 6 DoF Spatial Audio Rendering System based on MPEG-I AEP

강경옥;유재현;장대영;이용주;이태진
- Journal of Broadcast Engineering (방송공학회논문지), Vol. 28, No. 2, pp. 213-229, 2023
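This paper introduces a spatial audio rendering system that provides 6DoF spatial audio in real time in response to the movement of a listener located in a virtual environment. The system was implemented using MPEG-I AEP as the development environment in order to respond to the MPEG-I Immersive Audio CfP, and it consists of an encoder and a renderer that includes a decoder. The encoder encodes metadata offline, such as the spatial audio parameters of the virtual scene contained in the Encoder Input Format (EIF) file and the sound source directivity information provided as SOFA files, and delivers it as a bitstream. The renderer receives the bitstream and performs 6DoF spatial audio rendering in real time according to the listener's position. The main spatial audio processing technologies applied to the developed rendering system are sound source effect and obstacle effect processing; other technologies needed for system operation include Doppler effect and sound field effect processing. As a performance evaluation of the developed system, the results of an in-house subjective evaluation are presented.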
https://doi.org/10.5909/JBE.2023.28.2.213

MPEG-H 3D Audio Decoder Structure and Complexity Analysis

문현기;박영철;이용주;황영수
- The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지), Vol. 42, No. 2, pp. 432-443, 2017
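The MPEG-H 3D Audio standard aims to provide realistic audio services that match ultra-high-definition broadcasting services such as UHDTV. To this end, the standard integrates a wide range of technologies, including encoding/decoding of multichannel, object, and scene-based signals, rendering for delivering 3D audio in diverse playback environments, and post-processing. The reference software decoder of the standard combines several modules and can operate in various modes, but because each module runs sequentially as an independent executable, real-time processing is impossible. In this paper, the functions of the core decoder, format converter, object renderer, and binaural renderer of MPEG-H 3D Audio are turned into dynamic libraries and integrated to enable frame-based decoding. In addition, the computational complexity of each MPEG-H 3D Audio mode is measured to provide reference data for selecting a suitable mode on various hardware platforms. The complexity analysis shows that the low-complexity profile included in the Korean broadcasting standard requires 2.8 to 12.4 times the computation of the QMF synthesis operation when rendering to channel signals, and 4.1 to 15.3 times when performing binaural rendering.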
https://doi.org/10.7840/kics.2017.42.2.432

Overview of MPEG 3D Audio Standard Activities for High-Order Multichannel Realistic Audio Service

서정일;강경옥;정대권
- The Korean Institute of Broadcast and Media Engineers: Conference Proceedings (한국방송∙미디어공학회:학술대회논문집), 2012 Summer Conference (한국방송공학회 2012년도 하계학술대회), pp. 171-173, 2012
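This paper introduces the 3D Audio standardization activities currently under active discussion in the MPEG audio subgroup and reviews the related technology development status of domestic and international organizations. MPEG 3D Audio characterizes realistic audio services such as NHK 22.2-channel broadcasting as high-order multichannel, and it defines as its standardization targets the multichannel audio encoding and decoding technologies for such services and the rendering technology that can adapt to diverse output channel environments.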

A Spatial Audio System Using Multiple Microphones on a Rigid Sphere

Lee, Tae-Jin;Jang, Dae-Young;Kang, Kyeong-Ok;Kim, Jin-Woong;Jeong, Dae-Gwon;Hamada, Hareo
- ETRI Journal, Vol. 27, No. 2, pp. 153-165, 2005
The main purpose of a spatial audio system is to give a listener the same impression as if he/she were present in a recorded environment. A dummy head microphone is generally used for this purpose; because of its human-like shape, it yields good spatial sound images. However, its shape restricts its public use, and it is difficult to convert a 2-channel recording into multi-channel signals for efficient rendering over a multi-speaker arrangement. To solve these problems, a spatial audio system is proposed that uses multiple microphones on a rigid sphere. The system has five microphones placed on special points of the rigid sphere, and it generates audio signals for headphone, stereo, stereo dipole, 4-channel, and 5-channel reproduction environments. Subjective localization experiments show that front/back confusion, which is a common limitation of spatial audio systems using the dummy head microphone, can be reduced dramatically in 4-channel and 5-channel reproduction environments and can be reduced slightly in headphone reproduction.

A Study of Speech Recognition-Based Interactive Media Art: Focusing on the Sound-Visual Interactive Installation "Water Music"

이명학;강성일;김봉화;김규정
- HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집), HCI Society of Korea 2008 Conference, Part 1, pp. 354-359, 2008
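"Water Music", a sound-visual interactive installation, expresses water waves that change according to the audience's voice. Using a pitch recognition-based interface technology, the work appears as a visual image of water waves video-projected onto a wall. The wave image consists of footage rendered as waves drawn with an East Asian painting brush together with generated small circular particles. By blowing or making sounds, the audience can interact with the movement of the computer-generated waves that are continuously produced on the screen. This symbiotic sound-visual environment lets the audience experience an illusory space both mentally and physically. The main tools used to generate the moving waves that interact with the audience in this installation were Visual C++ and the DirectX SDK, and full-frame 3D rendering and a particle system were used.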

A 3D Audio Broadcasting Terminal for Interactive Broadcasting Services

박기윤;이태진;강경옥;홍진우
- Journal of Broadcast Engineering (방송공학회논문지), Vol. 10, No. 1, pp. 22-30, 2005
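This paper describes an interactive audio broadcasting terminal that can reconstruct a 3D audio scene under user control. The attributes of an audio scene, represented hierarchically according to the MPEG-4 AudioBIFS specification, are updated under user control, and the audio data is re-synthesized in 3D space with reference to the given attributes. The terminal supports the MPEG-4 Audio top-level nodes and several video nodes, and it supports the field-update BIFS commands by predefining a user interface for each node type instead of using sensor nodes and route elements. The ability to play back 3D audio data enriches the feedback to user input and thus plays an important role in maximizing the effect of interactive broadcasting and enhancing realism. In this terminal, the user can use 3D audio technology to control the position, directivity, shape, and reverberation characteristics of sound images. The paper presents a service model for the interactive broadcasting terminal through service examples such as a virtual ensemble program.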

