Search | Korea Science

A Scalable Audio Coder for High-quality Speech and Audio Services

Lee, Gil-Ho;Lee, Young-Han;Kim, Hong-Kook;Kim, Do-Young;Lee, Mi-Suk
MALSORI / no.61 / pp.75-86 / 2007
In this paper, we propose a scalable audio coder whose bandwidth varies from the narrowband speech bandwidth to the full audio bandwidth and whose bit rate ranges from 8 to 320 kbit/s, so that it can adapt the quality of service (QoS) to the network load. The proposed coder first splits the input audio into a narrowband signal, up to around 4 kHz, and the band above it. The narrowband signal is then compressed by a speech coding method compatible with an existing standard speech coder such as G.729, and the signal above the narrowband is compressed on the basis of a psychoacoustic model. Objective quality tests using the signal-to-noise ratio (SNR) and the perceptual evaluation of audio quality (PEAQ) show that the proposed scalable audio coder provides quality comparable to that of the MPEG-1 Layer III (MP3) audio coder.
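
The SNR figure mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the plain time-domain definition (signal energy over error energy, in dB); the function name `snr_db` is hypothetical, and the paper may well use a segmental or otherwise weighted variant.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a reference and a decoded signal.

    Assumption: plain (non-segmental) SNR over the whole signal.
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    noise_energy = sum((s - d) * (s - d) for s, d in zip(reference, decoded))
    if noise_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, non-zero error
    return 10.0 * math.log10(signal_energy / noise_energy)
```

A decoded signal whose samples are each off by 10% of the reference amplitude gives roughly 20 dB under this definition.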

Joint Spatial-Temporal Quality Improvement Scheme for H.264 Low Bit Rate Video Coding via Adaptive Frameskip

Cui, Ziguan;Gan, Zongliang;Zhu, Xiuchang
KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.1 / pp.426-445 / 2012
Conventional rate control (RC) schemes for H.264 video coding usually regulate the output bit rate to match the channel bandwidth by adjusting the quantization parameter (QP) at a fixed full frame rate. Passive frame skipping to avoid buffer overflow then tends to occur at scene changes or in high-motion sequences, especially at low bit rates, which degrades spatial-temporal quality and causes a jerky effect. In this paper, an active content-adaptive frame skipping scheme is proposed instead: it skips subjectively trivial frames, identified by measuring the structural similarity (SSIM) between the original frame and a frame interpolated via a motion vector (MV) copy scheme. The bits saved from skipped frames are allocated to the coded key frames to enhance their spatial quality, and the skipped frames are recovered at the decoder from adjacent key frames using the MV copy scheme, which maintains a constant frame rate. Experimental results show that the proposed active SSIM-based frameskip scheme achieves better and more consistent spatial-temporal quality in both the objective (PSNR) and subjective (SSIM) sense, with low complexity, compared to the classic fixed-frame-rate control method JVT-G012 and a prior objective-metric-based frameskip method.
https://doi.org/10.3837/tiis.2012.01.024
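
The skip decision described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: frames are flat lists of 8-bit luma values, SSIM is computed over a single global window rather than the usual sliding windows, the interpolated frame is assumed to be supplied by some MV copy step that is not implemented here, and the threshold of 0.95 is a hypothetical value, not one taken from the paper.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length lists of pixel values."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and of equal size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Standard SSIM stabilizing constants.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def should_skip(original, interpolated, threshold=0.95):
    """Skip the frame if the decoder-side interpolation is close enough.

    `threshold` is a hypothetical tuning parameter.
    """
    return global_ssim(original, interpolated) >= threshold
```

A frame that the interpolation reproduces exactly scores 1.0 and is skipped, while a frame whose interpolation is strongly anti-correlated with the original scores below zero and is coded.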

Human Perception of Asymmetrical Three-Dimensional Image (비대칭적 3차원 영상에 대한 인간의 인지 특성)

Ha, Chang-Woo;Lee, Wan-Jae;Jin, Soon-Jong;Jeong, Je-Chang
Journal of Broadcast Engineering / v.12 no.1 s.34 / pp.41-52 / 2007
3DTV services can be seen as a general case of multi-view video, which has been receiving significant attention lately. The key factors that influence the success of 3DTV are the availability of content, ease of use, quality of content, and reduction of cost. This paper deals primarily with perceptual improvement in image quality, based on human factors. An optimal asymmetrical coding method for binocular and multi-view images is presented, and the quantitative asymmetry rate that maintains optimized subjective image quality is explored. We also analyze how edges in 2D images affect 3D perception and propose an edge-preserving algorithm for perceptual improvement. Experimental results demonstrate that the proposed algorithm enhances subjective image quality much more than conventional methods.
https://doi.org/10.5909/JBE.2007.12.1.41

Visual-Attention-Aware Progressive RoI Trick Mode Streaming in Interactive Panoramic Video Service

Seok, Joo Myoung;Lee, Yonghun
ETRI Journal / v.36 no.2 / pp.253-263 / 2014
In the near future, traditional narrow, fixed-viewpoint video services will be replaced by high-quality panoramic video services. This paper proposes a visual-attention-aware progressive region of interest (RoI) trick mode streaming service (VA-PRTS) that prioritizes the video data to transmit according to visual attention and transmits the prioritized video data progressively. VA-PRTS enables the receiver to shorten the time to display without degrading the perceptual quality. For the proposed VA-PRTS, this paper defines a cutoff visual attention metric algorithm, which determines the quality of an encoded video slice based on the capability of visual attention, and a progressive streaming method based on the priority of RoI video data. Compared to conventional methods, VA-PRTS increases the bitrate saving by over 57% and decreases the interactive delay by over 66%, while maintaining the level of perceptual video quality. The experimental results show that the proposed VA-PRTS improves the quality of the viewer experience for interactive panoramic video streaming services. The development results show that VA-PRTS is highly feasible in practical field use.
https://doi.org/10.4218/etrij.14.2113.0012

Digital Audio Watermarking Scheme Using Perceptual Modeling (지각 모델링을 이용한 디지털 오디오 워터마킹 방법)

석종원;홍진우
Journal of Broadcast Engineering / v.6 no.2 / pp.195-202 / 2001
Digital watermarking technology is drawing attention as a solution for copyright protection of digital multimedia content. In this paper, we present two novel audio watermarking algorithms for protecting digital audio against unauthorized copying. The proposed watermarking schemes use the psychoacoustic model of MPEG audio coding to achieve perceptual transparency after watermark embedding, and apply a preprocessing procedure before correlation in watermark detection so that copyright information can be extracted without access to the original audio signal. Experimental results show that our watermarking scheme is robust to common signal processing attacks and introduces no audible distortion after watermark insertion.

Perceptual Video Coding using Deep Convolutional Neural Network based JND Model (심층 합성곱 신경망 기반 JND 모델을 이용한 인지 비디오 부호화)

Kim, Jongho;Lee, Dae Yeol;Cho, Seunghyun;Jeong, Seyoon;Choi, Jinsoo;Kim, Hui-Yong
Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.06a / pp.213-216 / 2018
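In this paper, we propose a perceptual video coding technique that uses the just noticeable difference (JND), one of the characteristics of human visual perception. JND-based perceptual coding improves coding efficiency by exploiting these characteristics to remove signal components that are visually hard to perceive. Instead of a conventional JND technique based on a mathematical model, the proposed method generates the JND model with a deep neural network, a data-driven modeling approach that has recently attracted attention. In the video coding process, the proposed deep-neural-network-based JND model removes perceptual redundancy from the input image through preprocessing. In coding experiments, the proposed method reduced the coded bits by 16.86% on average while maintaining the same or similar perceptual quality.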

Improvement of the TCX Module in AMR-WB+ Codec Using Pyramid VQ (Pyramid VQ를 이용한 AMR-WB+ 코덱 내 TCX 모듈의 성능 개선)

Park, Sang-Kuk;Park, Jung-Eun;Baik, Seung-Kweon;Seo, Jung-Il;Kang, Sang-Won
The Journal of the Acoustical Society of Korea / v.26 no.3 / pp.109-114 / 2007
In this paper, we propose a pyramid VQ to quantize the transform coefficients of the TCX module in order to improve the audio quality of the AMR-WB+ codec. The proposed pyramid VQ is compared to the $RE_8$ lattice VQ used in the standard AMR-WB+ codec. The 8-dimensional and 16-dimensional pyramid VQs improve the mean squared error (MSE) by 4% and 5.7%, respectively, and the Perceptual Evaluation of Audio Quality (PEAQ) score by 3.3% and 4.7%, respectively.
https://doi.org/10.7776/ASK.2007.26.3.109
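
For readers unfamiliar with pyramid VQ, the following is a minimal sketch of the general idea, not of the paper's quantizer. It assumes the usual definition of the pyramid codebook as the set of integer vectors whose absolute values sum to a fixed pulse count `k`. The function names are hypothetical, and the quantizer uses a simple largest-remainder rounding heuristic.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def pyramid_size(dim, k):
    """Number of integer vectors of length `dim` with sum(|x_i|) == k."""
    if k == 0:
        return 1
    if dim == 0:
        return 0
    return (pyramid_size(dim - 1, k)
            + pyramid_size(dim - 1, k - 1)
            + pyramid_size(dim, k - 1))


def quantize_to_pyramid(x, k):
    """Map `x` to an integer vector with sum(|x_i|) == k (greedy rounding)."""
    norm = sum(abs(v) for v in x)
    if norm == 0:
        point = [0] * len(x)
        point[0] = k  # arbitrary choice for an all-zero input
        return point
    scaled = [abs(v) * k / norm for v in x]
    point = [int(s) for s in scaled]  # floor, since values are non-negative
    remaining = k - sum(point)
    # Give the leftover pulses to the largest fractional parts.
    order = sorted(range(len(x)), key=lambda i: scaled[i] - point[i],
                   reverse=True)
    for i in order[:remaining]:
        point[i] += 1
    return [p if v >= 0 else -p for p, v in zip(point, x)]
```

With one pulse, an 8-dimensional pyramid holds 16 points (a single +1 or -1 in any position); the codebook size, and hence the bit cost, grows quickly as `k` increases.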

Audio Quality Enhancement at a Low-bit Rate Perceptual Audio Coding (저비트율로 압축된 오디오의 음질 개선 방법)

서정일;서진수;홍진우;강경옥
The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.566-575 / 2002
Low-bitrate audio coding makes a number of Internet and mobile multimedia streaming services more efficient. With the help of next-generation mobile telephone technologies and digital audio/video compression algorithms, we can enjoy real-time multimedia content on our mobile devices (cellular phone, PDA, notebook, etc.). However, the limited bandwidth of mobile communication networks prohibits the transmission of high-quality AV content, and most of the bandwidth is assigned to video. In this paper, we design a novel and simple method for reproducing high-frequency components. The spectrum of the high-frequency components, which are lost by down-sampling, is modeled by its energy ratio to the low-frequency band on the Bark scale, and these values are multiplexed with the conventionally coded bitstream. At the decoder side, the high-frequency components are reconstructed by duplicating the low-frequency band spectrum, scaled according to the decoded energy ratios. Segmental SNR and MOS tests confirmed that the proposed method enhances the subjective sound quality with only 10% to 20% additional bits. In addition, the proposed method can be applied to all kinds of frequency-domain audio compression algorithms, such as MPEG-1/2, AAC, and AC-3.
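
The encoder/decoder split described in the abstract above might look roughly like the sketch below. Several simplifications are assumptions made for illustration only: bands are uniform rather than Bark-scaled, the low and high halves of the spectrum have equal length, the ratios are sent unquantized, and all function names are hypothetical.

```python
import math


def band_energy(coeffs):
    """Sum of squared spectral coefficients in one band."""
    return sum(c * c for c in coeffs)


def encode_energy_ratios(low, high, band_size):
    """Encoder side: per-band ratio of high-band to low-band energy."""
    if len(low) != len(high):
        raise ValueError("this sketch assumes equal-length half spectra")
    ratios = []
    for start in range(0, len(low), band_size):
        e_low = band_energy(low[start:start + band_size])
        e_high = band_energy(high[start:start + band_size])
        ratios.append(e_high / e_low if e_low > 0 else 0.0)
    return ratios


def decode_high_band(low, ratios, band_size):
    """Decoder side: copy the low band upward, scaled to the sent energy."""
    high = []
    for i, ratio in enumerate(ratios):
        gain = math.sqrt(ratio)  # an energy ratio maps to an amplitude gain
        band = low[i * band_size:(i + 1) * band_size]
        high.extend(gain * c for c in band)
    return high
```

Only one ratio per band is transmitted, so the side information is small compared with coding the high-band coefficients themselves. The reconstructed high band matches the original in per-band energy but not in fine spectral detail.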

Effects of Association and Imagery on Word Recognition (단어재인에 미치는 연상과 심상성의 영향)

Kim, Min-Jung;Lee, Seung-Bok;Jung, Bum-Suk
Korean Journal of Cognitive Science / v.20 no.3 / pp.243-274 / 2009
Association, word frequency, and imagery have been considered the main factors that affect word recognition. The present study aimed to examine the imagery effect and its interaction with the association effect while controlling for the frequency effect. To explain the imagery effect, we compared two theories: the dual-coding theory and the context availability model. A lexical decision task using a priming paradigm was administered. The duration of the prime words was set to 20 ms, 50 ms, and 450 ms in experiments 1, 2, and 3, respectively. The association and imagery of the prime words were manipulated as the main factors in each of the three experiments. In experiment 1, the prime duration (20 ms) was chosen because it was not expected to activate the semantic context enough to affect word recognition. As a result, only the imagery effect was statistically significant. In experiment 2, the prime duration was 50 ms, which we expected to activate the semantic context without perceptual awareness. The results showed both the association and imagery effects, and the interaction between the two effects was also significant. In experiment 3, to activate the semantic context with perceptual awareness, the prime words were presented for 450 ms. Only the association effect was statistically significant in this condition. The results of the three experiments suggest that imagery exerts its influence at the early stages of word recognition, while the association effect appears later. These results imply that the two theories do not contradict each other: the dual-coding theory concerns the imagery effect, which affects the early stage of word recognition, and the context availability model concerns the semantic context effect, which affects a later stage.
To explain the word recognition process more completely, an integrated model needs to be developed that considers not only the three main effects but also the stages that extend along the time course of the process.

Selective Quantization Based on Band Property for Wideband Signal Codec (광대역 신호 압축기를 위한 주파수 대역 특성에 선택적인 양자화 방법)

송재종;박호종;김무영;김도석;김정수
The Journal of the Acoustical Society of Korea / v.20 no.7 / pp.76-82 / 2001
In this paper, a novel quantization method for a wideband signal codec with 7 kHz bandwidth is proposed. In transform-based wideband signal codecs, the signal is transformed to the frequency domain, and the spectral coefficients in each frequency band are quantized based on a human perceptual model, followed by Huffman coding. However, the properties of each band vary with frequency, and the codec performs poorly when all bands are quantized with the same method. Therefore, a selective quantization method is proposed, which analyzes the band properties and selects the quantization domain, either the frequency domain or the time domain, based on quantization efficiency. It is confirmed that the proposed method performs better than the quantizer of the G.722.1 codec.

