Search | Korea Science

Unified coding scheme of speech and music (음악 및 음성 신호의 융합 압축 기술)

O, Eun-Mi
- Broadcasting and Media Magazine, v.16 no.4, pp.59-71, 2011
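Although audio coding and speech coding rest on different technical foundations, the convergence of the mobile multimedia device market means that the signals to be compressed are increasingly mixed, and the two fields are converging on similar target bit rates and sound quality. Today a single device applies different compression technologies, but for multimedia devices that serve speech and music together, handling both with a single coding scheme has become a prominent issue. In particular, given the spread of smartphones and music content portal services, a unified coding technology that efficiently compresses both speech and music signals appears increasingly necessary. This article introduces the background and standardization status of Unified Speech and Audio Coding (USAC), the most recent work of the MPEG audio group. At 64 kbps and below, USAC delivers better sound quality than AMR-WB+ and HE-AAC v2, the technically best-performing codecs in that range, and it guarantees equivalent quality at higher bit rates. The article examines the switching structure of USAC that contributes to this quality, together with the main technically improved modules: parametric stereo coding, high-frequency coding, and entropy coding. Because it efficiently compresses a wide variety of audio signals, USAC is likely to be used in scenarios such as digital radio, mobile TV, and audiobooks. USAC also performs well in the presence of background noise or background music, so it can be useful for user-generated content such as YouTube videos and podcasts.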

Wavelet Based Video/Audio Player for Cellular Phone (휴대 전화를 위한 웨이블릿 기반의 비디오/오디오 플레이어)

Jeong, Jin-Hwan;Han, Sang-Beom;Ryu, Eun-Seok;Yoo, Hyuck;Kim, Il-Jin
- Proceedings of the Korean Information Science Society Conference, 2003.10b, pp.493-495, 2003
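Recent mobile phones serve not only as voice communication devices but also as data communication devices, and with the spread of CDMA-2000 networks the data bandwidth has grown enough to handle multimedia data. However, because mobile phone hardware is optimized for voice communication and uses low-power, low-performance processors for the sake of portability, software-based video/audio playback is very difficult. In particular, the widely used video/audio standards (MPEG-x, H.26x, and so on) prioritize compression above all and are therefore computationally very expensive, which makes them unsuitable for software-only playback on a mobile phone without hardware support. To address this problem, this paper first examines the problems of widely used general-purpose codecs and the hardware resources of mobile phones, and then proposes a video/audio codec suited to mobile phone systems, based on wavelet functions whose computational load can be adjusted effectively. The paper also measures the computation required for video decoding and verifies the performance on an actual mobile phone.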

Development of Audio Watermark Decoding Model Using Support Vector Machine (Support Vector Machine을 이용한 오디오 워터마크 디코딩 모델 개발)

Seo, Yejin;Cho, Sangjin
- The Journal of the Acoustical Society of Korea, v.33 no.6, pp.400-406, 2014
This paper describes a robust watermark decoding model using an SVM (Support Vector Machine). First, the embedding process is applied in reverse to a watermarked signal, and then the watermark is extracted using the proposed model. To train the SVM, data are generated from watermarks extracted from sounds that were watermarked by four different embedding schemes. The BER (Bit Error Rate) values of the data are used to determine the threshold employed to create the training set. To evaluate robustness, 14 attacks selected from the StirMark, SDMI, and STEP2000 benchmarks are applied. The proposed model outperformed the previous method in PSNR (Peak Signal to Noise Ratio) and BER. Notably, the proposed method achieves a BER below 1% when the PSNR is greater than 10 dB.
https://doi.org/10.7776/ASK.2014.33.6.400
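
The BER figure quoted in the abstract above is the fraction of watermark bits that are recovered incorrectly. As a minimal sketch, assuming the embedded and extracted watermarks are available as equal-length bit sequences (the function name `bit_error_rate` and the sample bits are illustrative and not taken from the paper), it could be computed as:

```python
def bit_error_rate(embedded, extracted):
    """Return the fraction of bit positions where the two watermarks differ."""
    if len(embedded) != len(extracted):
        raise ValueError("watermarks must have the same length")
    if not embedded:
        raise ValueError("watermarks must be non-empty")
    # Count positions where the extracted bit does not match the embedded bit.
    errors = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return errors / len(embedded)


# Illustrative data only: 2 of the 8 bits differ.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 0, 1, 0, 0, 1, 1]
print(bit_error_rate(embedded, extracted))
```

Under this definition, the abstract's "BER below 1%" corresponds to a return value smaller than 0.01.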

MPEG Surround for Multi-Channel Audio Coding-Part 2: Various Modes and Tools (다채널 오디오 코딩을 위한 MPEG Surround-2부: 다양한 모드 및 툴들)

Pang, Hee-Suk
- The Journal of the Acoustical Society of Korea, v.28 no.7, pp.610-617, 2009
An overview of various modes and tools of MPEG Surround is provided. Because the binaural mode of MPEG Surround supports virtual 5.1-channel playback based on HRTFs, it can be played via headphones and earphones on portable audio devices. MPEG Surround also supports the enhanced matrix mode, which converts stereo signals to 5.1-channel signals without side information; the 3D stereo mode, which deals with 3D-coded signals; and the low power version, which greatly reduces the computational load of the decoding process. In addition, MPEG Surround provides the arbitrary downmix gains (ADGs) tool, which is applied to artistic downmix signals; the matrix compatibility tool, which is applied to signals downmixed by conventional matrix-based methods; the residual coding tool, which can be used at high bit rates; and the GES tool, which is applied to specific sounds such as applause. Listening test results from various companies and organizations are also presented for the important modes and tools.
https://doi.org/10.7776/ASK.2009.28.7.610

A Study on RTP-based Lip Synchronization Control for Very Low Delay in Video Communication (초저지연 비디오 통신을 위한 RTP 기반 립싱크 제어 기술에 관한 연구)

Kim, Byoung-Yong;Lee, Dong-Jin;Kwon, Jae-Cheol;Sim, Dong-Gyu
- Journal of Korea Multimedia Society, v.10 no.8, pp.1039-1051, 2007
In this paper, a new lip synchronization control method is proposed to achieve very low delay in video communication. Lip synchronization is as vital in video communication as delay reduction. In general, lip synchronization is controlled using both the playtime and the capture time calculated from the RTP timestamp. The RTP timestamp is created by the stream sender and sent to the receiver along with the stream. The receiver extracts it from each received packet to calculate the playtime and capture time. This paper proposes a method that searches for the frame that most closely corresponds to the audio signal, which is assumed to be played at a uniform speed. The encoding buffer of the stream sender is removed to reduce the buffering delay. In addition, the decoder buffer of the receiver, which is used to correct damaged packets, is limited to processing only 3 frames. These mechanisms achieve an ultra-low delay of less than 100 ms, which is essential for video communication. In simulations, the proposed method kept the delay below 100 ms while maintaining lip synchronization between audio and video.
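
The timestamp handling described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the audio and video timestamps have already been mapped to a common timeline, a 90 kHz video clock and an 8 kHz audio clock, and it reads "most adjacent frame" as the video frame whose capture time is nearest to the current audio time. The helper names `capture_time` and `nearest_frame` are hypothetical.

```python
from bisect import bisect_left

RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def capture_time(rtp_ts, base_ts, clock_rate):
    """Seconds elapsed since base_ts, tolerant of one timestamp wraparound."""
    return ((rtp_ts - base_ts) % RTP_TS_MODULUS) / clock_rate


def nearest_frame(video_times, audio_time):
    """Index of the video frame whose capture time is closest to audio_time.

    video_times must be sorted in ascending order.
    """
    if not video_times:
        raise ValueError("no video frames available")
    i = bisect_left(video_times, audio_time)
    if i == 0:
        return 0
    if i == len(video_times):
        return len(video_times) - 1
    before, after = video_times[i - 1], video_times[i]
    # Pick whichever neighbour is closer; ties go to the earlier frame.
    return i if (after - audio_time) < (audio_time - before) else i - 1


# Assumed streams: 30 fps video on a 90 kHz clock, audio on an 8 kHz clock.
video_ts = [1000 + n * 3000 for n in range(5)]
video_times = [capture_time(ts, 1000, 90000) for ts in video_ts]
audio_time = capture_time(5000 + 560, 5000, 8000)  # 0.07 s into the stream
print(nearest_frame(video_times, audio_time))
```

Because the audio is treated as the steadily advancing reference, only the video frame choice changes; a binary search keeps the lookup cheap even when the receiver holds only a few frames.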

Visible Light Communication based Multi-hop Multimedia Data Transmission Networks System (VLC 기반 멀티 홉 멀티미디어 데이터 전송 네트워크 시스템)

Park, In-Chul;Shin, Jung-Jin;Park, Joo-Young;Dung, Le The;An, Beongku
- The Journal of the Institute of Internet, Broadcasting and Communication, v.14 no.3, pp.21-31, 2014
In this paper, we propose a VLC (visible light communication) based multi-hop multimedia data transmission system. The main contributions and features of the proposed system are as follows. First, the contribution of this research is to develop an LED communication based multi-hop transmission network system that can transmit multimedia data (audio and video) over a long distance. Second, the developed system has the following features. At the transmitter, audio data and video data are transmitted via multiple hops using two channels. The relay in the audio channel receives the digital audio signal with a photodiode and then transmits it to the receiver after error checking and amplification. The receiver receives the encoded audio data via a photodiode and converts it to an analog audio signal by decoding and amplification. The relay in the video channel receives the video signal with a photodiode, amplifies it using an op-amp, and transmits it to the receiver. The receiver amplifies the signal received from the photodiode and sends it to the monitor. The performance of the proposed system is evaluated in a laboratory with a fluorescent light source. The results confirm that the system can provide high-quality multimedia data transmission from transmitter to receiver via multi-hop relays over a long distance, although the quality of the transmitted audio and video differs according to the LED colors used.
https://doi.org/10.7236/JIIBC.2014.14.3.21

Salience of Envelope Interaural Time Difference of High Frequency as Spatial Feature (공간감 인자로서의 고주파 대역 포락선 양이 시간차의 유효성)

Seo, Jeong-Hun;Chon, Sang-Bae;Sung, Koeng-Mo
- The Journal of the Acoustical Society of Korea, v.29 no.6, pp.381-387, 2010
Both timbral and spatial features are important in the assessment of multichannel audio coding systems. Choi et al. proposed a prediction model that extends ITU-R Rec. BS.1387-1 to multichannel audio coding systems by using spatial features such as ITDDist (Interaural Time Difference Distortion), ILDDist (Interaural Level Difference Distortion), and IACCDist (InterAural Cross-correlation Coefficient Distortion). In that model, ITDDists were computed only for low frequency bands (below 1500 Hz), and ILDDists only for high frequency bands (over 2500 Hz), in accordance with classical duplex theory. However, in the high frequency range, information in the temporal envelope is also important for spatial perception, especially sound localization. This paper introduces a new model that computes the ITD distortions of temporal envelopes in high frequency components in order to investigate quantitatively the role of such ITDs in spatial perception. The computed ITD distortions of temporal envelopes in high frequency components were highly correlated with the perceived sound quality of multichannel audio.
https://doi.org/10.7776/ASK.2010.29.6.381

Bluetooth Audio Gateway and Headset including Connection Function to the Mobile Phone (휴대폰 접속 기능을 포함한 블루투스 오디오 게이트웨이 및 헤드셋)

Chung, J.S.;Chung, T.Y.;Jung, K.W.
- The KIPS Transactions: Part C, v.11C no.4, pp.539-544, 2004
This paper presents the implementation of a Bluetooth headset and an audio gateway connected to a mobile phone in an embedded environment. The Bluetooth module includes the BC02 processor chip, the BCSP02 firmware, and the BlueLab software, which includes the Bluetooth protocol stack. These components, developed by CSR, are used as the development environment. The application program, which uses API functions supported by BlueLab, is written in C and loaded into the flash ROM of the Bluetooth module. The call processing capacity, obtained by measuring the call setup time and the clearing time between the audio gateway and the headset, is taken as the performance parameter of the developed systems. Since the call setup and clearing time between the audio gateway and the headset is about 88.8 ms, the call processing capacity is about 11 calls per second. The performance is therefore satisfactory in terms of call processing time.
https://doi.org/10.3745/KIPSTC.2004.11C.4.539
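
The capacity figure in the abstract above follows from simple arithmetic, assuming calls are set up and cleared one after another with no overlap (an assumption; the abstract does not say how the figure was derived). A small sketch, with the hypothetical helper `calls_per_second`:

```python
def calls_per_second(call_time_ms):
    """Calls handled per second if each call occupies call_time_ms in sequence."""
    if call_time_ms <= 0:
        raise ValueError("call time must be positive")
    return 1000.0 / call_time_ms


# 88.8 ms per setup-and-clear cycle, as reported in the abstract.
print(f"{calls_per_second(88.8):.2f}")  # prints 11.26
```

Rounding down gives the "about 11 calls per second" stated in the abstract.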

Advanced Timing Model Design for MMT System (MMT 시스템을 위한 개선된 타이밍 모델 설계)

Jung, Tae-Jun;Lee, Hong-rae;Seo, Kwang-deok
- Proceedings of the Korean Society of Broadcast Engineers Conference, 2016.06a, pp.68-69, 2016
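The timing model of ISO/IEC 13818-1 MPEG-2 Systems is built so that each video and audio sample entering the encoder appears exactly once at the decoder after a constant delay. A conforming decoder receives a bitstream that complies with the timing model. This makes it easy to implement decoders that deliver properly synchronized, high-quality audio and video. The RTP timing model, by contrast, carries no timing information about the actual presentation time. Timestamps in data packets provide relative timing, and the RTCP sender provides information on inter-stream synchronization, but the RTP receiver is given no information about the amount of buffering or the decoding time of packets. RTP therefore has a flexible, transport-oriented timing model, whereas MPEG-2 Systems provides the receiver with an exact timing model. This paper proposes a timing model for the MMT system that takes the advantages of both the MPEG-2 Systems and RTP timing models.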
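
How RTP's relative timestamps combine with RTCP sender information for inter-stream synchronization can be illustrated with the mapping defined in RFC 3550: a sender report pairs an RTP timestamp with a wallclock time, and other timestamps of the same stream are offset from that pair. The sketch below is not part of the proposed MMT timing model; the function name `rtp_to_wallclock` and the sample values are assumptions for illustration.

```python
RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_wallclock_s, clock_rate):
    """Map an RTP timestamp to wallclock seconds using one sender report.

    sr_rtp_ts and sr_wallclock_s are the RTP timestamp and wallclock time
    carried together in the same RTCP sender report.
    """
    diff = (rtp_ts - sr_rtp_ts) % RTP_TS_MODULUS
    # Interpret the difference as signed so packets sampled slightly
    # before the sender report map to earlier wallclock times.
    if diff >= RTP_TS_MODULUS // 2:
        diff -= RTP_TS_MODULUS
    return sr_wallclock_s + diff / clock_rate


# Assumed values: video on a 90 kHz clock, audio on a 16 kHz clock.
video_time = rtp_to_wallclock(180000, 90000, 100.0, 90000)
audio_time = rtp_to_wallclock(24000, 16000, 100.5, 16000)
print(video_time, audio_time)
```

Both samples map to the same wallclock instant, so a receiver could present them together. Note that this mapping only aligns the streams with each other; as the abstract points out, it says nothing about how long the receiver should buffer or when a packet should be decoded.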

An efficient multichannel spatial audio coding method based on inter channel correlation (채널상관성에 기반한 효율적인 멀티채널 spatial audio coding 방법)

Lee Byonghwa;Beack Seungkwon;Seo Jeongil;Hahn Minsoo
- Proceedings of the Acoustical Society of Korea Conference, autumn, pp.157-160, 2004
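Binaural Cue Coding (BCC), one of the Spatial Audio Coding methods, transmits a mono or stereo downmix of a multichannel, multi-object audio signal together with spatial cues, from which the decoder reconstructs the signal; it thus transmits and restores multichannel audio at a low bit rate. This paper proposes a method that reduces the bit rate of the transmitted side information by varying the type of spatial cue according to the ICC parameter, which represents the inter-channel correlation in BCC coding.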
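
The ICC parameter mentioned in the abstract above measures how similar two channels are. One common simplified form is the zero-lag normalized cross-correlation of two channel frames, sketched below. This is an assumption for illustration: the abstract does not give the paper's exact ICC definition or the rule that selects cues from it, and the function name `inter_channel_correlation` is hypothetical.

```python
import math


def inter_channel_correlation(left, right):
    """Zero-lag normalized cross-correlation of two equal-length frames."""
    if len(left) != len(right):
        raise ValueError("frames must have the same length")
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    if energy == 0.0:
        return 0.0  # at least one silent frame: treat as uncorrelated
    return cross / energy


# Illustrative frames: the right channel is a scaled copy of the left,
# so the two channels are fully correlated.
left = [1.0, -2.0, 3.0, -4.0]
right = [0.5, -1.0, 1.5, -2.0]
print(inter_channel_correlation(left, right))
```

A value near 1 indicates that the channels differ mainly in level, while a value near 0 indicates largely independent channels; an encoder could use such a value to decide which cues are worth transmitting.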
