• Title/Summary/Keyword: Hardware Compression

Search Result 194, Processing Time 0.055 seconds

Two-dimensional DCT arcitecture for imprecise computation model (중간 결과값 연산 모델을 위한 2차원 DCT 구조)

  • 임강빈;정진군;신준호;최경희;정기현
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.9
    • /
    • pp.22-32
    • /
    • 1997
  • This paper proposes an imprecise compuitation model for DCT considering QOS of images and a two dimensional DCT architecture for imprecise computations. In case that many processes are scheduling in a hard real time system, the system resources are shared among them. Thus all processes can not be allocated enough system resources (such as processing power and communication bandwidth). The imprecise computtion model can be used to provide scheduling flexibility and various QOS(quality of service)levels, to enhance fault tolerance, and to ensure service continuity in rela time systems. The DCT(discrete cosine transform) is known as one of popular image data compression techniques and adopted in JPEG and MPEG algorithms since the DCT can remove the spatial redundancy of 2-D image data efficiently. Even though many commercial data compression VLSI chips include the DCST hardware, the DCT computation is still a very time-consuming process and a lot of hardware resources are required for the DCT implementation. In this paper the DCT procedure is re-analyzed to fit to imprecise computation model. The test image is simulated on teh base of this model, and the computation time and the quality of restored image are studied. The row-column algorithm is used ot fit the proposed imprecise computation DCT which supports pipeline operatiions by pixel unit, various QOS levels and low speed stroage devices. The architecture has reduced I/O bandwidth which could make its implementation feasible in VLSI. The architecture is proved using a VHDL simulator in architecture level.

  • PDF

An efficient Hardware Architecture of Lempel-Ziv Compressor for Real Time Data Compression (실시간 데이터 압축을 위한 Lempel-Ziv 압축기의 효과적인 구조의 제안)

  • 진용선;정정화
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.37 no.3
    • /
    • pp.37-44
    • /
    • 2000
  • In this paper, an efficient hardware architecture of Lempel-Ziv compressor for real time data compression is proposed. The accumulated shift operations in the Lempel-Ziv algorithm are the major problem, because many shift operations are needed to prepare a dictionary buffer and matching symbols. A new efficient architecture for the fast processing of Lempel-Ziv algorithm is presented in this paper. In this architecture, the optimization technique for dictionary size, a new comparing method of multi symbol and a rotational FIFO structure are used to control shift operations easily. For the functional verification, this architecture was modeled by C programming language, and its operation was verified by running on commercial DSP processor. Also, the design of overall architecture in VHDL was synthesized on commercial FPGA chip. The result of critical path analysis shows that this architecture runs well at the input bit rate of 256kbps with 33MHz clock frequency.

  • PDF

Structured Pruning for Efficient Transformer Model compression (효율적인 Transformer 모델 경량화를 위한 구조화된 프루닝)

  • Eunji Yoo;Youngjoo Lee
    • Transactions on Semiconductor Engineering
    • /
    • v.1 no.1
    • /
    • pp.23-30
    • /
    • 2023
  • With the recent development of Generative AI technology by IT giants, the size of the transformer model is increasing exponentially over trillion won. In order to continuously enable these AI services, it is essential to reduce the weight of the model. In this paper, we find a hardware-friendly structured pruning pattern and propose a lightweight method of the transformer model. Since compression proceeds by utilizing the characteristics of the model algorithm, the size of the model can be reduced and performance can be maintained as much as possible. Experiments show that the structured pruning proposed when pruning GPT-2 and BERT language models shows almost similar performance to fine-grained pruning even in highly sparse regions. This approach reduces model parameters by 80% and allows hardware acceleration in structured form with 0.003% accuracy loss compared to fine-tuned pruning.

VLSI Design of DWT-based Image Processor for Real-Time Image Compression and Reconstruction System (실시간 영상압축과 복원시스템을 위한 DWT기반의 영상처리 프로세서의 VLSI 설계)

  • Seo, Young-Ho;Kim, Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.102-110
    • /
    • 2004
  • In this paper, we propose a VLSI structure of real-time image compression and reconstruction processor using 2-D discrete wavelet transform and implement into a hardware which use minimal hardware resource using ASIC library. In the implemented hardware, Data path part consists of the DWT kernel for the wavelet transform and inverse transform, quantizer/dequantizer, the huffman encoder/huffman decoder, the adder/buffer for the inverse wavelet transform, and the interface modules for input/output. Control part consists of the programming register, the controller which decodes the instructions and generates the control signals, and the status register for indicating the internal state into the external of circuit. According to the programming condition, the designed circuit has the various selective output formats which are wavelet coefficient, quantization coefficient or index, and Huffman code in image compression mode, and Huffman decoding result, reconstructed quantization coefficient, and reconstructed wavelet coefficient in image reconstructed mode. The programming register has 16 stages and one instruction can be used for a horizontal(or vertical) filtering in a level. Since each register automatically operated in the right order, 4-level discrete wavelet transform can be executed by a programming. We synthesized the designed circuit with synthesis library of Hynix 0.35um CMOS fabrication using the synthesis tool, Synopsys and extracted the gate-level netlist. From the netlist, timing information was extracted using Vela tool. We executed the timing simulation with the extracted netlist and timing information using NC-Verilog tool. Also PNR and layout process was executed using Apollo tool. The Implemented hardware has about 50,000 gate sizes and stably operates in 80MHz clock frequency.

A hardware design of Rate control algorithm for H.264 (H.264 율제어 알고리듬의 하드웨어 설계)

  • Suh, Ki-Bum
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.1
    • /
    • pp.175-181
    • /
    • 2010
  • In this paper, we propose a novel hardware architecture for Rate control module for real time full HD video compression. In the proposed architecture, QP is updated by using the rate control algorithm to every the macroblock line(120MB for Full HD, 20MB for CIF image). Since there are many complex arithmetic and floating point arithmetic in rate control algorithm of JM for H.264, it is impossible to process the rate control algorithm using the integer arithmetic CPU core. So we adopted floating point arithmetic unit in our architecture, and implemented the rate control algorithm using the floating unit. With this implemented hardware, the implemented hardware is verified to be operated in real time.

A Study on Image Data Compression by using Hadamard Transform (Hadamard변환을 이용한 영상신호의 전송량 압축에 관한 연구)

  • 박주용;이문호;김동용;이광재
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.11 no.4
    • /
    • pp.251-258
    • /
    • 1986
  • There is much redundancy in image data such as TV signals and many techniques to redice it have been studied. In this paper, Hadamard transform is studied through computer simulation and experimental model. Each element of hadamard matrix is either +1 or -1, and the row vectors are orthogonal to another. Its hardware implementation is the simplest of the usual orthogonal transforms because addition and sulbraction are necessary to calculate transformed signals, while not only addition but multiplication are necessary in digital Fourier transform, etc. Linclon data (64$ imes$64) are simulated using 8th-order and 16th-order Hadamard transform, and 8th-order is implemented to hardware. Theoretical calculation and experimental result of 8th-order show that 2.0 bits/sample are required for good quality.

  • PDF

A Study on FPGA utilization For PC-based Full-HD DVR System Implementation (Full-HD급 PC기반 DVR System 구현을 위한 FPGA 활용에 관한 연구)

  • Kim, Ki-Hwa
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.4
    • /
    • pp.2363-2369
    • /
    • 2014
  • The DVR system supports multiple cameras and should be able to receive images at 30 frames per channel in real time. Thus, The system is using Full-HD-grade Multiplexer and Hardware compression codec. In this paper, Describing the design and implementation for the 4-channel Full-HD-grade PC-based DVR using FPGA and GPU inside CPU without Multiplexer and Hardware codec. The existing DVR system for Full-HD-grade has drawbacks to acquire images of about only 20 frames per channel in real time. The system to acquire images of multiple channel in real time was designed using FPGA. The software for the system was implemented using Intel Media SDK. At the result of performance evaluation, It was satisfied all for the required conditions. The practicality of the system was confirmed as implementation the system without using hardware compression.

An Efficient 2D Discrete Wavelet Transform Filter Design Using Lattice Structure (Lattice 구조를 갖는 효율적인 2차원 이산 웨이블렛 변환 필터 설계)

  • Park, Tae-Geun;Jeong, Seon-Gyeong
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.6
    • /
    • pp.59-68
    • /
    • 2002
  • In this paper, we design the two-dimensional Discrete Wavelet Transform (2D DWT) filter that is widely used in various applications such as image compression because it has no blocking effects and relatively high compression rate. The filter that we used here is two-channel four-taps QMF(Quadrature Mirror Filter) Lattice filter with PR (Perfect Reconstruction) property. The proposed DWT architecture, with two consecutive inputs shows an efficient performance with a minimum of such hardware resources as multipliers, adders, and registers due to a simple scheduling. The proposed architecture was verified by the RTL simulation, and utilizes the hardware 100%. Our architecture shows a relatively high performance with a minimum hardware when compared with other approaches. An efficient memory mapping and address generation techniques are introduced and the fixed-point arithmetic analysis for minimizing the PSNR degradation due to quantization is discussed.

An Efficient Test Data Compression/Decompression for Low Power Testing (저전력 테스트를 고려한 효율적인 테스트 데이터 압축 방법)

  • Chun Sunghoon;Im Jung-Bin;Kim Gun-Bae;An Jin-Ho;Kang Sungho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.2 s.332
    • /
    • pp.73-82
    • /
    • 2005
  • Test data volume and power consumption for scan vectors are two major problems in system-on-a-chip testing. Therefore, this paper proposes a new test data compression/decompression method for low power testing. The method is based on analyzing the factors that influence test parameters: compression ratio, power reduction and hardware overhead. To improve the compression ratio and the power reduction ratio, the proposed method is based on Modified Statistical Coding (MSC), Input Reduction (IR) scheme and the algorithms of reordering scan flip-flops and reordering test pattern sequence in a preprocessing step. Unlike previous approaches using the CSR architecture, the proposed method is to compress original test data, not $T_{diff}$, and decompress the compressed test data without the CSR architecture. Therefore, the proposed method leads to better compression ratio with lower hardware overhead and lower power consumption than previous works. An experimental comparison on ISCAS '89 benchmark circuits validates the proposed method.

Medical Image Compression Using JPEG International Standard (JPEG 표준안을 이용한 의료 영상 압축)

  • Ahn, Chang-Beom;Han, Sang-Woo;Kim, Il-Yoen
    • Proceedings of the KIEE Conference
    • /
    • 1993.07a
    • /
    • pp.504-506
    • /
    • 1993
  • The Joint Photographic Experts Group (JPEG) standard was proposed by the International Standardization Organization (ISO/SC 29/WG 10) and the CCITT SG VIII as an international standard for digital continuous-tone still image compression. The JPEG standard has been widely accepted in electronic imaging, computer graphics, and multi-media applications, however, due to the lossy character of the JPEG compression its application in the field of medical imaging has been limited. In this paper, the JPEG standard was applied to a series of head sections of magnetic resonance (MR) images (256 gray levels, $256{\times}256$ size) and its performance was investigated. For this purpose, DCT-based sequential mode of the JPEG standard was implemented using the CL550 compression chip and progressive and lossless coding was implemented by software without additional hardware. From the experiment, it appears that the compression ratio of about 10 to 20 was obtained for the MR images without noticeable distortion. It is also noted that the error signal between the reconstructed image by the JPEG and the original image was nearly random noise without causing any special-pattern-related artifact. Although the coding efficiency of the progressive and hierarchical coding is identical to that of the sequential coding in compression ratio and SNR, it has useful features In fast search of patient Image from huge image data base and in remote diagnosis through slow public communication channel.

  • PDF