• Title/Summary/Keyword: Buffer(Memory)

An Efficient H.264/AVC Entropy Decoder Design (효율적인 H.264/AVC 엔트로피 복호기 설계)

  • Moon, Jeon-Hak;Lee, Seong-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.12 / pp.102-107 / 2007
  • This paper proposes an H.264/AVC entropy decoder that requires neither an embedded processor nor a memory fabrication process. Many existing H.264/AVC entropy decoders require a ROM or RAM fabrication process, which is difficult to implement in a general digital logic fabrication process. Furthermore, many rely on embedded processors for bitstream manipulation, which increases area and power consumption. The proposed hardwired entropy decoder needs no embedded processor, which improves data processing speed and reduces power consumption. Its CAVLC decoder also optimizes the lookup table and internal buffer so that no embedded memory is needed, which reduces hardware size and allows implementation in a general digital logic fabrication process without ROM or RAM. The designed entropy decoder was embedded in an H.264/AVC video decoder and verified to operate correctly in the system. Synthesized in a TSMC 90 nm fabrication process, its maximum operating frequency is 125 MHz. It supports the QCIF, CIF, and QVGA image formats. With slight modification of the nC register and other blocks, it also supports the VGA image format.

Accurate and efficient GPU ray-casting algorithm for volume rendering of unstructured grid data

  • Gu, Gibeom;Kim, Duksu
    • ETRI Journal / v.42 no.4 / pp.608-618 / 2020
  • We present a novel GPU-based ray-casting algorithm for volume rendering of unstructured grid data. Our volume rendering system uses a ray-casting method that guarantees accurate rendering results. We also employ the per-pixel intersection list concept in the Bunyk algorithm to guarantee an accurate result for non-convex meshes. For efficient memory access for the lists on the GPU, we represent the intersection lists for all faces as an array with our novel construction algorithm. With the intersection lists, we perform ray-casting on a GPU, and a GPU thread handles each ray. To increase ray-coherency in a thread block and improve memory access efficiency, we extend a prior image-tile-based work distribution method to fit modern GPU architectures. We also show that a prior approach using a per-thread local buffer to reduce redundant computation is not appropriate for modern GPU architectures. Instead, we take an on-demand calculation strategy that achieves better performance even though it allows duplicate computations. We applied our method to three unstructured grid datasets with different characteristics. With a GPU, our method achieved up to 36.5 times higher performance for the ray-casting process and 19.7 times higher performance for the whole volume rendering process compared with the Bunyk algorithm using a CPU core. Also, our approach showed up to 8.2 times higher performance than a GPU-based cell projection method while generating more accurate rendering results. These results demonstrate the efficiency and accuracy of our method.

Low-power VLSI Architecture Design for Image Scaler and Coefficients Optimization (영상 스케일러의 저전력 VLSI 구조 설계 및 계수 최적화)

  • Han, Jae-Young;Lee, Seong-Won
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.6 / pp.22-34 / 2010
  • Existing image scalers generally adopt either simple interpolation methods, such as bilinear interpolation, to keep costs low, or highly complex architectures to achieve high-quality output images. However, demand for low-power, low-cost, high-performance image scalers is growing with the emergence of high-quality mobile content. In this paper, we propose a novel low-power hardware architecture for a high-quality raster-scan image scaler. The proposed architecture enhances the existing cubic interpolation look-up table architecture by reducing and optimizing memory accesses and hardware components. The input data buffer of the existing image scaler is replaced with line memories to reduce the number of memory accesses, which is critical to power consumption. The cubic interpolation formula used in the existing look-up table architecture is also rearranged to reduce the number of multipliers and the look-up table size. Finally, we analyze optimized parameter sets for the look-up table, which trade off output image quality against hardware size.
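
As a rough illustration of the look-up-table idea in the abstract above, the sketch below precomputes cubic interpolation weights for a fixed number of sub-pixel phases and applies one row of weights to four neighbouring pixels. The Keys cubic convolution kernel with a = -0.5, the function names, and the phase count are assumptions made for illustration; the abstract does not give the paper's rearranged formula or its table parameters.

```python
def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is an assumed, commonly used value)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def build_cubic_lut(phases):
    """Precompute the four tap weights for each of `phases` sub-pixel offsets."""
    table = []
    for p in range(phases):
        t = p / phases  # fractional position between the two middle pixels
        table.append([cubic_kernel(t + 1), cubic_kernel(t),
                      cubic_kernel(1 - t), cubic_kernel(2 - t)])
    return table


def interpolate(pixels, weights):
    """Weighted sum of four neighbouring pixels using one LUT row."""
    return sum(p * w for p, w in zip(pixels, weights))


lut = build_cubic_lut(4)
# Phase 2 of 4 lies halfway between the middle pixels 20 and 30.
print(interpolate([10, 20, 30, 40], lut[2]))
```

A larger phase count gives finer sub-pixel positioning at the cost of a bigger table, which is the kind of quality versus hardware-size trade-off the abstract mentions.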

A Low Cost Instruction Set for Bit Stream Process (비트열 처리를 위한 저비용 명령어 세트)

  • Ham, Dong-Hyeon;Lee, Hyoung-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.2 / pp.41-47 / 2008
  • Most media compression codecs adopt variable-length coding. This paper proposes special registers and an instruction set for bit stream processing in order to accelerate variable-length code decoding. The instruction set shares the conventional data path to minimize additional cost, and the bit stream is read from memory instead of through a special port. The instruction set therefore minimizes changes to the processor and can be adopted without any additional input controller or buffer. Its data path needs an additional 65 bits of memory and 344 equivalent gates, with a delay of 0.19 ns in TSMC $0.25{\mu}m$ technology. The instruction set reduced the execution time of variable-length code decoding in H.264/AVC by about 55%.
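
For context, the following sketch shows the kind of bit-level work that variable-length decoding requires in software, which is what dedicated bit stream instructions aim to speed up. It reads bits MSB-first from a buffer in memory and decodes unsigned Exp-Golomb codes, one of the variable-length codes used in H.264/AVC. The `BitReader` class and its method names are hypothetical and do not represent the instruction set proposed in the paper.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object held in memory (illustrative helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bits(self, n: int) -> int:
        """Return the next n bits as an unsigned integer and advance the position."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def read_ue(self) -> int:
        """Decode one unsigned Exp-Golomb code."""
        leading_zeros = 0
        while self.read_bits(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


# The bit string 1 | 010 | 011 | 00100 (zero-padded) encodes the values 0, 1, 2, 3.
reader = BitReader(bytes([0xA6, 0x40]))
print([reader.read_ue() for _ in range(4)])
```

Each decoded symbol needs several shifts, masks, and position updates here, which suggests why folding these steps into a few instructions can shorten decoding time.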

Cache Coherency Schemes for Database Sharing Systems with Primary Copy Authority (주사본 권한을 지원하는 공유 데이터베이스 시스템을 위한 캐쉬 일관성 기법)

  • Kim, Shin-Hee;Cho, Haeng-Rae;Kim, Byeong-Uk
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1390-1403 / 1998
  • A database sharing system (DSS) refers to a system for high performance transaction processing. In a DSS, the processing nodes are locally coupled via a high speed network and share a common database at the disk level. Each node has a local memory, a separate copy of the operating system, and a DBMS. To reduce the number of disk accesses, each node caches database pages in its local memory buffer. However, since multiple nodes may cache a page simultaneously, cache consistency must be ensured so that every node can always access the latest version of each page. In this paper, we propose efficient cache consistency schemes for a DSS in which the database is logically partitioned using primary copy authority to reduce locking overhead. The proposed schemes can improve performance by reducing the disk access overhead and the message overhead due to maintaining cache consistency. Furthermore, they show good performance when database workloads vary dynamically.

Delayed Dual Buffering: Reducing Page Fault Latency in Demand Paging for OneNAND Flash Memory (지연 이중 버퍼링: OneNAND 플래시를 이용한 페이지 반입 비용 절감 기법)

  • Joo, Yong-Soo;Park, Jae-Hyun;Chung, Sung-Woo;Chung, Eui-Young;Chang, Nae-Hyuck
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.3 s.357 / pp.43-51 / 2007
  • OneNAND flash combines the advantages of NAND and NOR flash, and has become an alternative to the former. However, the advanced features of OneNAND flash are not utilized effectively in demand paging systems designed for NAND flash. We propose delayed dual buffering, a demand paging system that fully exploits the random-access I/O interface and dual page buffers of OneNAND flash. It effectively reduces the time needed to transfer a page from the OneNAND page buffer to main memory. On average, it achieves a 28.5% reduction in execution time and a 4.4% reduction in paging system energy consumption.

A New Architecture of High-Performance Digital Hologram Generator based on Independent Calculation of a Holographic Pixel (독립적 홀로그램 화소 연산 방식의 고성능 디지털 홀로그램 생성기의 하드웨어 구조)

  • Lee, Yoon-Huyk;Seo, Young-Ho;Choi, Hyun-Jun;Kim, Dong-Wook
    • Journal of Broadcast Engineering / v.16 no.3 / pp.403-415 / 2011
  • In this paper, we propose a hardware architecture that generates digital holograms at high speed. It uses a modified computer-generated hologram (CGH) algorithm and a pipeline-based hardware design to remove the memory bottleneck. Rather than generating a hologram by accumulating intermediate holograms, it generates each pixel of the final hologram independently, using a CGH algorithm suited to that method. Based on this CGH algorithm, we propose a digital hologram generator architecture consisting of an input interface part, a calculating part, and a normalizing part. The hardware reduces memory usage because it repeatedly uses the object light sources stored in its internal buffer. It is also parallelized by vertically adding unit cells. It can generate 86 HD digital hologram frames per second for 1K light sources.

Performance Analysis of Tree-based Indexing Scheme for Trajectories Processing of Moving Objects (이동객체의 궤적처리를 위한 트리기반 색인기법의 성능분석)

  • Shim, Choon-Bo;Shin, Yong-Won
    • Journal of the Korean Association of Geographic Information Studies / v.7 no.4 / pp.1-14 / 2004
  • In this study, we propose the Linktable-based extended TB-tree (LTB-tree), which improves the performance of the existing TB (Trajectory-Bundle)-tree proposed for indexing the trajectories of moving objects in GIS applications. To evaluate the proposed indexing scheme, we proceed as follows. First, we select the existing R*-tree, the TB-tree, and the LTB-tree as the subjects of the performance evaluation. Second, we use a random data set and a real data set as experimental data. Third, we evaluate performance while varying the memory buffer size, to account for the limited memory available on a given system. Fourth, we test the schemes using experimental data sets with varying data distributions. Finally, we use the insertion performance and the retrieval performance of trajectory queries and range queries as experimental measures. The experimental results show that the proposed LTB-tree outperforms the traditional schemes in the insertion and retrieval of trajectory queries.

A Cache Consistency Control for B-Tree Indices in a Database Sharing System (데이타베이스 공유 시스템에서 B-트리 인덱스를 위한 캐쉬 일관성 제어)

  • On, Gyeong-O;Jo, Haeng-Rae
    • The KIPS Transactions:PartD / v.8D no.5 / pp.593-604 / 2001
  • A database sharing system (DSS) refers to a system for high performance transaction processing. In a DSS, the processing nodes are coupled via a high speed network and share a common database at the disk level. Each node has a local memory and a separate copy of the operating system. To reduce the number of disk accesses, each node caches data pages and index pages in its memory buffer. In general, B-tree index pages are accessed more often than their corresponding data pages, and are thus cached at more processing nodes. The B-tree also involves complicated operations such as Fetch, Fetch Next, Insertion, and Deletion. Therefore, an efficient cache consistency scheme supporting a high level of concurrency is required. In this paper, we propose cache consistency schemes that use identifiers of index pages and the page_LSN of the leaf page. The proposed schemes can improve system throughput by reducing the required message traffic between nodes and index re-traversal.

A 8192-point pipelined FFT/IFFT processor using two-step convergent block floating-point scaling technique (2단계 수렴 블록 부동점 스케일링 기법을 이용한 8192점 파이프라인 FFT/IFFT 프로세서)

  • 이승기;양대성;신경욱
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.10C / pp.963-972 / 2002
  • An 8192-point pipelined FFT/IFFT processor core is designed for use in multi-carrier modulation systems such as DMT-based VDSL modems and OFDM-based DVB systems. To improve the signal-to-quantization-noise ratio (SQNR) of the FFT/IFFT results, two-step convergent block floating-point (TS_CBFP) scaling is employed. Since the proposed TS_CBFP scaling does not require additional buffer memory, it reduces memory by about 80% compared with conventional CBFP methods, resulting in an area- and power-efficient implementation. An SQNR of about 60 dB is achieved with 10-bit input, 14-bit internal data and twiddle factors, and 16-bit output. The core, synthesized using a 0.25-$\mu$m CMOS library, has about 76,300 gates, 390K bits of RAM, and a twiddle factor ROM of 39K bits. Simulation results show that it can safely operate at clock frequencies up to 50 MHz with a 2.5-V supply, so that an 8192-point FFT/IFFT can be computed every 164 $\mu$s. It was verified by Xilinx FPGA implementation.
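
To make the block floating-point idea concrete, here is a minimal sketch of plain block floating-point scaling, in which a block of integer samples shares one exponent so that every value fits a fixed mantissa width. It is not the paper's TS_CBFP scheme, whose two-step details the abstract does not describe; the function name and the 8-bit mantissa width in the example are assumptions.

```python
def block_float_quantize(block, mantissa_bits):
    """Scale a block of integers to signed mantissas that share one exponent.

    Returns (mantissas, shift), where each original value is approximately
    mantissa << shift.
    """
    peak = max(abs(x) for x in block)
    limit = (1 << (mantissa_bits - 1)) - 1  # largest positive signed mantissa
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    # Python's >> is an arithmetic shift, so negative values round toward -infinity.
    mantissas = [x >> shift for x in block]
    return mantissas, shift


mantissas, shift = block_float_quantize([1000, -300, 20, 5], mantissa_bits=8)
print(mantissas, shift)
```

Because the exponent is chosen from the largest sample in the block, small samples in the same block lose low-order bits, which is why the choice of block size affects the achievable SQNR.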