• Title/Summary/Keyword: memory bandwidth reduction

BLOCK-BASED ADAPTIVE BIT ALLOCATION FOR REFERENCE MEMORY REDUCTION

  • Park, Sea-Nae;Nam, Jung-Hak;Sim, Dong-Gy;Joo, Young-Hun;Kim, Yong-Serk;Kim, Hyun-Mun
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.258-262
    • /
    • 2009
  • In this paper, we propose an effective memory reduction algorithm to reduce the reference frame buffer size and memory bandwidth in video encoders and decoders. In general video codecs, previously decoded frames must be stored and referenced to reduce temporal redundancy. Recently, reference frames have been recompressed for memory efficiency and for bandwidth reduction between the main processor and external memory. However, these algorithms can hurt coding efficiency. Several algorithms have been proposed to reduce the amount of reference memory with minimal quality degradation, but they still suffer from quality degradation due to fixed-bit allocation. In this paper, we propose an adaptive block-based min-max quantization that considers the local characteristics of the image. In the proposed algorithm, the basic processing unit is an 8×8 block for memory alignment, and adaptive quantization is applied to each 4×4 block to minimize quality degradation. We found that the proposed algorithm improves coding efficiency by approximately 37.5% compared with an existing memory reduction algorithm at the same memory reduction rate.

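The min-max scheme described in this abstract can be pictured with a small sketch. The Python fragment below is only a rough illustration, not the authors' implementation: it quantizes each 4×4 sub-block of an 8×8 unit between that block's minimum and maximum using a fixed number of index bits, whereas the paper allocates the bit budget adaptively per 4×4 block.

```python
import numpy as np

def compress_minmax(block8x8, bits=4):
    """Toy min-max quantizer: each 4x4 sub-block keeps its min, max, and
    per-pixel indices of `bits` bits (fixed here; the paper adapts the
    bit budget per 4x4 block to local characteristics)."""
    out = []
    levels = (1 << bits) - 1
    for by in range(0, 8, 4):
        for bx in range(0, 8, 4):
            sub = block8x8[by:by + 4, bx:bx + 4].astype(np.float32)
            lo, hi = sub.min(), sub.max()
            step = (hi - lo) / levels if hi > lo else 1.0
            idx = np.round((sub - lo) / step).astype(np.uint8)
            out.append((lo, hi, idx))
    return out

def decompress_minmax(compressed, bits=4):
    """Rebuild the 8x8 unit from the stored (min, max, indices) triples."""
    levels = (1 << bits) - 1
    rec = np.zeros((8, 8), dtype=np.float32)
    it = iter(compressed)
    for by in range(0, 8, 4):
        for bx in range(0, 8, 4):
            lo, hi, idx = next(it)
            step = (hi - lo) / levels if hi > lo else 1.0
            rec[by:by + 4, bx:bx + 4] = lo + idx * step
    return np.clip(np.round(rec), 0, 255).astype(np.uint8)  # assumes 8-bit pixels
```

Round-tripping a block through these two functions shows the quality loss of a fixed bit budget, which is what an adaptive per-block allocation tries to reduce.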

A New Embedded Compression Algorithm for Memory Size and Bandwidth Reduction in Wavelet Transform Applicable to JPEG2000 (JPEG2000의 웨이블릿 변환용 메모리 크기 및 대역폭 감소를 위한 새로운 Embedded Compression 알고리즘)

  • Son, Chang-Hoon;Song, Sung-Gun;Kim, Ji-Won;Park, Seong-Mo;Kim, Young-Min
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.1
    • /
    • pp.94-102
    • /
    • 2011
  • To alleviate the memory size and bandwidth requirements of a JPEG2000 system, a new Embedded Compression (EC) algorithm with only a minor image quality drop is proposed. For both random accessibility and low latency, a very simple and efficient Hadamard-transform-based compression algorithm is devised. The LL intermediate memory and the code-block memory are reduced to about half their size, and significant memory bandwidth reductions (about 52~73%) are achieved through the proposed multi-mode algorithms, without requiring any modification of the JPEG2000 standard algorithm.
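
The abstract does not spell out the transform details, so as a hedged illustration of Hadamard-based block coding in general, the sketch below applies a 4×4 Walsh-Hadamard transform pair to an integer block; the quantization and multi-mode selection steps of the paper are not reproduced here.

```python
import numpy as np

# 4x4 Hadamard (Walsh) matrix; H4 is symmetric and H4 @ H4 = 4 * I
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]], dtype=np.int32)

def hadamard_4x4(block):
    """Forward 2-D Hadamard transform of a 4x4 integer block."""
    return H4 @ block.astype(np.int32) @ H4.T

def inv_hadamard_4x4(coeffs):
    """Inverse transform; dividing by 16 undoes the two unnormalized passes."""
    return (H4.T @ coeffs @ H4) // 16
```

Because the transform uses only additions and subtractions, it maps well to the simple, low-latency hardware this line of work targets.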

Memory-Efficient Belief Propagation for Stereo Matching on GPU (GPU 에서의 고속 스테레오 정합을 위한 메모리 효율적인 Belief Propagation)

  • Choi, Young-Kyu;Williem, Williem;Park, In Kyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2012.11a
    • /
    • pp.52-53
    • /
    • 2012
  • Belief propagation (BP) is a commonly used global energy minimization algorithm for solving the stereo matching problem in 3D reconstruction. However, it requires a large data size and memory bandwidth. In this paper, we propose a novel memory-efficient BP algorithm for stereo matching on the Graphics Processing Unit (GPU). The data size and transfer bandwidth are significantly reduced by storing only a part of each message. To maintain the accuracy of the matching result, the local messages are reconstructed using the shared memory available on the GPU. Experimental results show almost an order-of-magnitude reduction in global memory consumption and a 21 to 46% saving in memory bandwidth compared with the conventional algorithm. Implementation on a recent GPU shows a speedup of 22.8 times in execution time compared with execution on the CPU.

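For background on the messages whose storage is being reduced, the sketch below shows a standard BP message update for stereo matching with a truncated linear smoothness term. It is plain loopy BP, not the paper's memory-reduced variant; the partial-message storage and shared-memory reconstruction on the GPU are not modeled.

```python
import numpy as np

def bp_message(data_cost_p, incoming_msgs, lambda_s=1.0, trunc=2.0):
    """Standard BP message from pixel p to one neighbor q over all disparities.

    data_cost_p   : (D,) matching cost of pixel p for each disparity
    incoming_msgs : list of (D,) messages into p from neighbors other than q
    Smoothness is a truncated linear model: V(d, d') = min(|d - d'|, trunc).
    """
    D = len(data_cost_p)
    h = data_cost_p + np.sum(incoming_msgs, axis=0)   # aggregate belief at p
    d = np.arange(D)
    # m(d) = min over d' of ( h(d') + lambda * min(|d - d'|, trunc) )
    cost = h[None, :] + lambda_s * np.minimum(np.abs(d[:, None] - d[None, :]), trunc)
    msg = cost.min(axis=1)
    return msg - msg.mean()                           # normalize to avoid drift
```

Each pixel keeps four such D-element messages per iteration, which is why reducing the stored portion of a message translates directly into global memory and bandwidth savings.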

Multi-mode Embedded Compression Algorithm and Architecture for Code-block Memory Size and Bandwidth Reduction in JPEG2000 System (JPEG2000 시스템의 코드블록 메모리 크기 및 대역폭 감소를 위한 Multi-mode Embedded Compression 알고리즘 및 구조)

  • Son, Chang-Hoon;Park, Seong-Mo;Kim, Young-Min
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.41-52
    • /
    • 2009
  • In Motion JPEG2000 encoding, the huge bandwidth required for data memory access is the bottleneck that limits system performance. To alleviate this bandwidth requirement, a new embedded compression (EC) algorithm with only a slight image quality drop is devised. For both random accessibility and low latency, a very simple and efficient entropy coding algorithm is proposed. We achieve significant memory bandwidth reductions (about 53~81%) and reduce the code-block memory to about half its size through the proposed multi-mode algorithms, without requiring any modification of the JPEG2000 standard algorithm.
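
The random accessibility mentioned above typically comes from compressing every block to a fixed budget, so the location of any compressed block can be computed directly rather than looked up. The sketch below illustrates that addressing idea only; the block size and compression ratio are assumptions for illustration, not values from the paper.

```python
# Illustrative only: with a fixed compressed size per block, the location of
# block (by, bx) in the compressed buffer is simple arithmetic, which is what
# makes random access into compressed memory cheap.
BLOCK_BYTES_RAW = 64          # e.g. an 8x8 block of 8-bit samples (assumed)
COMPRESSION_RATIO = 2         # each block compressed to a fixed half size
BLOCK_BYTES_PACKED = BLOCK_BYTES_RAW // COMPRESSION_RATIO

def packed_offset(by, bx, blocks_per_row):
    """Byte offset of compressed block (by, bx) in the buffer."""
    return (by * blocks_per_row + bx) * BLOCK_BYTES_PACKED

def read_block(buffer, by, bx, blocks_per_row):
    """Fetch exactly one compressed block; decoding would follow here."""
    off = packed_offset(by, bx, blocks_per_row)
    return buffer[off:off + BLOCK_BYTES_PACKED]
```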

Block-based Adaptive Bit Allocation for Reference Memory Reduction (효율적인 참조 메모리 사용을 위한 블록기반 적응적 비트할당 알고리즘)

  • Park, Sea-Nae;Nam, Jung-Hak;Sim, Dong-Gy;Joo, Young-Hun;Kim, Yong-Serk;Kim, Hyun-Mun
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.46 no.3
    • /
    • pp.68-74
    • /
    • 2009
  • In this paper, we propose an effective memory reduction algorithm to reduce the reference frame buffer size and memory bandwidth in video encoders and decoders. In general video codecs, previously decoded frames must be stored and referenced to reduce temporal redundancy. Recently, reference frames have been recompressed for memory efficiency and for bandwidth reduction between the main processor and external memory. However, these algorithms can hurt coding efficiency. Several algorithms have been proposed to reduce the amount of reference memory with minimal quality degradation, but they still suffer from quality degradation due to fixed-bit allocation. In this paper, we propose an adaptive block-based min-max quantization that considers the local characteristics of the image. In the proposed algorithm, the basic processing unit is an 8×8 block for memory alignment, and adaptive quantization is applied to each 4×4 block to minimize quality degradation. We found that the proposed algorithm obtains around a 1.7% BD-bitrate gain and a 0.03 dB BD-PSNR gain compared with the conventional fixed-bit min-max algorithm, with a 37.5% memory saving.

A novel hardware design for SIFT generation with reduced memory requirement

  • Kim, Eung Sup;Lee, Hyuk-Jae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.13 no.2
    • /
    • pp.157-169
    • /
    • 2013
  • Scale Invariant Feature Transform (SIFT) generates image features widely used to match objects across different images. Previous work on hardware-based SIFT implementation requires excessive internal memory and hardware logic [1]. In this paper, a new hardware organization is proposed that implements SIFT with less memory and hardware cost than the previous work. To this end, a parallel Gaussian filter bank is adopted to eliminate the buffers that store intermediate results, because parallel operation makes all intermediate results available at the same time. Furthermore, the processing order is changed from raster-scan order to block-by-block order, so that the size of the line buffer storing the source image is also reduced. These techniques trade the reduction in memory size for a slight increase in execution time and external memory bandwidth. As a result, the memory size is reduced by 94.4%. The proposed SIFT hardware also includes the descriptor generation block, which is omitted in the previous work [1]. The addition of hardwired descriptor generation improves the computation speed by about 30 times compared with the previous work.
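
The filter-bank idea above can be pictured in software: instead of blurring the image repeatedly and buffering each intermediate result, several Gaussian kernels of increasing sigma are applied directly to the same source, so every scale (and hence every difference-of-Gaussians level) is derived from one input. The sketch below uses SciPy's gaussian_filter as a stand-in for the hardwired filters; the sigma values are illustrative, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_levels(image, sigmas=(1.6, 2.26, 3.2, 4.53)):
    """Blur the *same* source with each sigma (a 'filter bank'), then take
    differences of adjacent scales. No blurred image has to be buffered as
    the input of the next stage, which is the point of the parallel layout."""
    blurred = [gaussian_filter(image.astype(np.float32), s) for s in sigmas]
    return [b1 - b0 for b0, b1 in zip(blurred, blurred[1:])]
```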

Design of Memory-Access-Efficient H.264 Intra Predictor Integrated with Motion Compensator (H.264 복호기에서 움직임 보상기와 연계하여 메모리 접근면에서 효율적인 인트라 예측기 설계)

  • Park, Jong-Sik;Lee, Seong-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.6
    • /
    • pp.37-42
    • /
    • 2008
  • In an H.264/AVC decoder, the intra predictor, motion compensator, and deblocking filter all need to read reference images from external frame memory during decoding. They access external frame memory very frequently, which lowers the system operation speed and increases power consumption. This paper proposes an intra predictor that is integrated with the motion compensator and does not access external frame memory on its own. It reduces power consumption and minimizes memory bandwidth by reusing common and repeatedly accessed pixels. The proposed intra predictor achieves cycle-time reductions of 45% to 75% compared with conventional intra predictors.
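
For context on what the intra predictor computes, the sketch below shows three of the standard H.264 4×4 intra prediction modes (vertical, horizontal, DC), which build the prediction from already-reconstructed neighboring pixels. The data-reuse organization between the intra predictor and the motion compensator proposed by the paper is a hardware design choice and is not modeled here.

```python
import numpy as np

def intra4x4_vertical(top):
    """Mode 0: copy the 4 reconstructed pixels above the block down each column."""
    return np.tile(np.asarray(top, dtype=np.int32), (4, 1))

def intra4x4_horizontal(left):
    """Mode 1: copy the 4 reconstructed pixels left of the block across each row."""
    return np.tile(np.asarray(left, dtype=np.int32).reshape(4, 1), (1, 4))

def intra4x4_dc(top, left):
    """Mode 2: predict every pixel as the rounded mean of the 8 neighbors."""
    dc = (int(np.sum(top)) + int(np.sum(left)) + 4) >> 3
    return np.full((4, 4), dc, dtype=np.int32)
```

Because these modes only need a one-pixel border of previously decoded data, a predictor that shares the pixels already fetched by the motion compensator can avoid refetching them from external memory.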

A New Predictive EC Algorithm for Reduction of Memory Size and Bandwidth Requirements in Wavelet Transform (웨이블릿 변환의 메모리 크기와 대역폭 감소를 위한 Prediction 기반의 Embedded Compression 알고리즘)

  • Choi, Woo-Soo;Son, Chang-Hoon;Kim, Ji-Won;Na, Seong-Yu;Kim, Young-Min
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.7
    • /
    • pp.917-923
    • /
    • 2011
  • In this paper, a new prediction-based embedded compression (EC) codec algorithm for the JPEG2000 encoder system is proposed to reduce its excessive memory requirements. The EC technique can reduce the memory requirement for intermediate low-frequency coefficients by 50% across the multiple discrete wavelet transform (DWT) stages, compared with a direct implementation of the DWT engine in this paper. The LOCO-I and MAP predictors are widely used in lossless picture compression codecs. The proposed EC algorithm uses these predictors, which are very simple but surprisingly effective. The predictive EC scheme adopts forward adaptive quantization and fixed-length coding to encode the prediction error. Simulation results show that the LOCO-I and MAP based EC codecs incur average PSNR degradations of only 0.48 dB and 0.26 dB, respectively. The proposed algorithm improves the average PSNR by 1.39 dB compared with the previous work in [9].
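
The LOCO-I predictor mentioned above is the median edge detector used in JPEG-LS: it predicts a sample from its left (a), above (b), and above-left (c) neighbors. A minimal sketch of that predictor and the resulting prediction errors follows; the forward adaptive quantization and fixed-length packing described in the paper are not reproduced.

```python
def loco_i_predict(a, b, c):
    """LOCO-I / JPEG-LS median edge detector.
    a = left neighbor, b = above neighbor, c = above-left neighbor."""
    if c >= max(a, b):
        return min(a, b)        # edge detected above or to the left
    if c <= min(a, b):
        return max(a, b)
    return a + b - c            # smooth region: planar prediction

def prediction_errors(row_above, row_current):
    """Prediction residuals for one row (first column handled crudely)."""
    errors = []
    for x, pixel in enumerate(row_current):
        a = row_current[x - 1] if x > 0 else row_above[x]
        b = row_above[x]
        c = row_above[x - 1] if x > 0 else b
        errors.append(pixel - loco_i_predict(a, b, c))
    return errors
```

The residuals are small in smooth regions, which is what lets a coarse quantizer plus fixed-length codes compress them with little PSNR loss.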

Improving Parallel Testing Efficiency of Memory Chips using NOC Interconnect (NOC 인터커넥트를 활용한 메모리 반도체 병렬 테스트 효율성 개선)

  • Hong, Chaneui;Ahn, Jin-Ho
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.68 no.2
    • /
    • pp.364-369
    • /
    • 2019
  • Generally, since every memory chip must be tested and production volumes are large, reducing the test time needed to detect faults plays an important role in reducing the overall production cost. Testing multiple chips in parallel on a single ATE is a competitive solution. In this paper, an NOC is proposed as the test interface architecture between the DUTs and the ATE. Because the NOC can be extended freely, there is no limit on the number of DUTs tested at the same time; thus, more memory chips can be tested with the same ATE bandwidth. Furthermore, the proposed NOC-based parallel test method can increase channel utilization through packet-based data transmission.
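
Packet-based sharing of a single tester channel can be pictured with a toy scheduler: test data for several DUTs are wrapped in packets tagged with a destination ID and interleaved on one link, so no single DUT monopolizes or idles the channel. Everything below (packet fields, payload size) is an illustrative assumption, not the paper's protocol.

```python
from collections import deque
from itertools import cycle

def interleave_packets(test_streams, payload_words=4):
    """Round-robin packetizer for a shared tester channel.
    test_streams: dict mapping dut_id -> list of test words for that DUT.
    Yields (dut_id, payload) packets until every stream is drained."""
    queues = {dut: deque(words) for dut, words in test_streams.items() if words}
    for dut in cycle(list(queues)):
        if not queues:
            break
        q = queues.get(dut)
        if q is None:
            continue                      # this DUT's stream is already drained
        payload = [q.popleft() for _ in range(min(payload_words, len(q)))]
        yield (dut, payload)
        if not q:
            del queues[dut]
```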

Memory Access Reduction Scheme for H.264/AVC Decoder Motion Compensation (H.264/AVC 디코더의 움직임 보상을 위한 메모리 접근 감소 기법)

  • Park, Kyoung-Oh;Hong, You-Pyo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.4C
    • /
    • pp.349-354
    • /
    • 2009
  • In this paper, a new motion compensation scheme is proposed to reduce the external memory access frequency, which is one of the major bottlenecks for real-time decoding. Most H.264/AVC decoders store reference pictures in external memory because of their large size, and reference blocks are read into the decoder core as needed during decoding. If reference data are fetched separately for each reference block in decoding order, the memory bandwidth can become unacceptable for real-time decoding. This paper presents a memory access scheme for motion compensation that reads as much reference data as possible per access, reducing the memory access frequency by analyzing the reference data access pattern of each macroblock. Experimental results show that the proposed motion compensation scheme leads to approximately a 30% improvement in the memory bandwidth requirement.
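
One common way to read as much reference data as possible per access is to merge the reference rectangles of a macroblock's partitions and issue a single burst for their bounding box when the overlap is high. The sketch below illustrates only that merging step, with a made-up waste threshold and example coordinates, not the paper's exact pattern analysis.

```python
def bounding_box(rects):
    """Bounding box of a set of (x0, y0, x1, y1) reference rectangles."""
    xs0, ys0, xs1, ys1 = zip(*rects)
    return (min(xs0), min(ys0), max(xs1), max(ys1))

def plan_reads(partition_rects, waste_limit=0.3):
    """Merge per-partition reference reads into one burst when the unused
    area of the bounding box stays under `waste_limit`; otherwise fall back
    to per-partition reads. (Overlapping partitions are not double-counted
    carefully here; this is a simplification for illustration.)"""
    box = bounding_box(partition_rects)
    useful = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in partition_rects)
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    if box_area and (box_area - useful) / box_area <= waste_limit:
        return [box]                  # one large read covers all partitions
    return list(partition_rects)      # separate reads per partition

# Example: four 8x8 partitions of one macroblock pointing at adjacent areas
# collapse into a single 16x16 read.
reads = plan_reads([(10, 20, 18, 28), (18, 20, 26, 28),
                    (10, 28, 18, 36), (18, 28, 26, 36)])
```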