• Title/Summary/Keyword: 곱셈적 구조

Search Result 227, Processing Time 0.025 seconds

The Design of Transform and Quantization Hardware for High-Performance HEVC Encoder (고성능 HEVC 부호기를 위한 변환양자화기 하드웨어 설계)

  • Park, Seungyong;Jo, Heungseon;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.2
    • /
    • pp.327-334
    • /
    • 2016
  • In this paper, we propose a hardware architecture of transform and quantization for high-perfornamce HEVC(High Efficiency VIdeo Coding) encoder. HEVC transform decides the transform mode by comparing RDCost to search for the best mode of them. But, RDCost is computed using the bit-rate and distortion which is computed by transform, quantization, de-quantization, and inverse transform. Due to the many calculations and encoding time, it is hard to process high resolution and high definition image in real-time. This paper proposes the method of transform mode decision by comparing sum of coefficient after transform only. We use BD-PSNR and BD-Bitrate which is performance indicator. Based on the experimental result, We confirmed that the decision of transform mode can process images with no significant change in the image quality. We reduced hardware area by assigning different values at the same output according to the transform mode and overlapping coefficient multiplied as much as possible. Also, we raise performance by implementing sequential pipeline operation. In view of the larger process that we used compared with the process of reference paper, Our design has reduced by half the hardware area and has increased performance 2.3 times.

An Efficient Array Algorithm for VLSI Implementation of Vector-radix 2-D Fast Discrete Cosine Transform (Vector-radix 2차원 고속 DCT의 VLSI 구현을 위한 효율적인 어레이 알고리듬)

  • 신경욱;전흥우;강용섬
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.12
    • /
    • pp.1970-1982
    • /
    • 1993
  • This paper describes an efficient array algorithm for parallel computation of vector-radix two-dimensional (2-D) fast discrete cosine transform (VR-FCT), and its VLSI implementation. By mapping the 2-D VR-FCT onto a 2-D array of processing elements (PEs), the butterfly structure of the VR-FCT can be efficiently importanted with high concurrency and local communication geometry. The proposed array algorithm features architectural modularity, regularity and locality, so that it is very suitable for VLSI realization. Also, no transposition memory is required, which is invitable in the conventional row-column decomposition approach. It has the time complexity of O(N+Nnzp-log2N) for (N*N) 2-D DCT, where Nnzd is the number of non-zero digits in canonic-signed digit(CSD) code, By adopting the CSD arithmetic in circuit desine, the number of addition is reduced by about 30%, as compared to the 2`s complement arithmetic. The computational accuracy analysis for finite wordlength processing is presented. From simulation result, it is estimated that (8*8) 2-D DCT (with Nnzp=4) can be computed in about 0.88 sec at 50 MHz clock frequency, resulting in the throughput rate of about 72 Mega pixels per second.

  • PDF

Design of Partial Product Accumulator using Multi-Operand Decimal CSA and Improved Decimal CLA (다중 피연산자 십진 CSA와 개선된 십진 CLA를 이용한 부분곱 누산기 설계)

  • Lee, Yang;Park, TaeShin;Kim, Kanghee;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.11
    • /
    • pp.56-65
    • /
    • 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of the parallel decimal multiplier, a tree architecture that composed by multi-operand decimal CSAs and improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial product quickly. Since the input range of the recoder of CSA is limited, CSA can get the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have limited range in specific locations of the specific architecture can reduce the partial products efficiently. Also, final BCD result can be received faster by improving the logic of the decimal CLA. In order to evaluate the performance of the proposed partial product accumulation, synthesis is implemented by using Design Complier with 180 nm COMS technology library. Synthesis results show the delay of the proposed partial product accumulation is reduced by 15.6% and area is reduced by 16.2% comparing with which uses general method. Also, the total delay and area are still reduced despite the delay and area of the CLA are increased.

Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.12
    • /
    • pp.20-35
    • /
    • 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of the parallel decimal multiplier, a tree architecture that composed by multi-operand decimal CSAs and improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial product quickly. Since the input range of the recoder of CSA is limited, CSA can get the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have limited range in specific locations of the specific architecture can reduce the partial products efficiently. Also, final BCD result can be received faster by improving the logic of the decimal CLA. In order to evaluate the performance of the proposed partial product accumulation, synthesis is implemented by using Design Complier with 180 nm COMS technology library. Synthesis results show the delay of the proposed partial product accumulation is reduced by 15.6% and area is reduced by 16.2% comparing with which uses general method. Also, the total delay and area are still reduced despite the delay and area of the CLA are increased.

Low-power Lattice Wave Digital Filter Design Using CPL (CPL을 이용한 저전력 격자 웨이브 디지털 필터의 설계)

  • 김대연;이영중;정진균;정항근
    • Journal of the Korean Institute of Telematics and Electronics D
    • /
    • v.35D no.10
    • /
    • pp.39-50
    • /
    • 1998
  • Wide-band sharp-transition filters are widely used in applications such as wireless CODEC design or medical systems. Since these filters suffer from large sensitivity and roundoff noise, large word-length is required for the VLSI implementation, which increases the hardware size and the power consumption of the chip. In this paper, a low-power implementation technique for digital filters with wide-band sharp-transition characteristics is proposed using CPL (Complementary Pass-Transistor Logic), LWDF (Lattice Wave Digital Filter) and a modified DIFIR (Decomposed & Interpolated FIR) algorithm. To reduce the short-circuit current component in CPL circuits due to threshold voltage reduction through the pass transistor, three different approaches can be used: cross-coupled PMOS latch, PMOS body biasing and weak PMOS latch. Of the three, the cross-coupled PMOS latch approach is the most realistic solution when the noise margin as well as the energy-delay product is considered. To optimize CPL transistor size with insight, the empirical formulas for the delay and energy consumption in the basic structure of CPL circuits were derived from the simulation results. In addition, the filter coefficients are encoded using CSD (Canonic Signed Digit) format and optimized by a coefficient quantization program. The hardware cost is minimized further by a modified DIFIR algorithm. Simulation result shows that the proposed method can achieve about 38% reductions in power consumption compared with the conventional method.

  • PDF

Implementation of Analog Signal Processing ASIC for Vibratory Angular Velocity Detection Sensor (진동형 각속도 검출 센서를 위한 애널로그 신호처리 ASIC의 구현)

  • 김청월;이병렬;이상우;최준혁
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.4
    • /
    • pp.65-73
    • /
    • 2003
  • This paper presents the implementation of an analog signal-processing ASIS to detect an angular velocity signal from a vibrator angular velocity detection sensor. The output of the sensor to be charge appeared as the variation of the capacitance value in the structure of the sensor was detected using charge amplifiers and a self oscillation circuit for driving the sensor was implemented with a sinusoidal self oscillation circuit using the resonance characteristics of the sensor. Specially an automatic gain control circuit was utilized to prevent the deterioration of self-oscillation characteristics due to the external elements such as the characteristic variation of the sensor process and the temperature variation. The angular velocity signal, amplitude-mod)Hated in the operation characteristics of the sensor, was demodulated using a synchronous detection circuit. A switching multiplication circuit was used in the synchronous detection circuit to prevent the magnitude variation of detected signal caused by the amplitude variation of the carrier signal. The ASIC was designed and implemented using 0.5${\mu}{\textrm}{m}$ CMOS process. The chip size was 1.2mm x 1mm. In the experiment under the supply voltage of 3V, the ASIC consumed the supply current of 3.6mA and noise spectrum density from dc to 50Hz was in the range of -95 dBrms/√Hz and -100 dBrms/√Hz when the ASIC, coupled with the sensor, was in normal operation.

Exploring fraction knowledge of the stage 3 students in proportion problem solving (단위 조정 3단계 학생의 비례 문제 해결에서 나타나는 분수 지식)

  • Lee, Jin Ah;Lee, Soo Jin
    • The Mathematical Education
    • /
    • v.61 no.1
    • /
    • pp.1-28
    • /
    • 2022
  • The purpose of this study is to explore how students' fractional knowledge is related to their solving of proportion problems. To this end, 28 clinical interviews with four middle-grade students, each lasting about 30~50 minutes, were carried out from May 2021 to August 2021. The present study focuses on two 7th grade students who exhibited their ability to coordinate three levels of units prior to solving whole number problems. Although the students showed interiorization of three levels of units in solving whole number problems, how they coordinated three levels of units were different in solving proportion problems depending on whether the problems required reasoning with whole numbers or fractions. The students could coordinate three levels of units prior to solving the problems involving whole numbers, they coordinated three levels of units in activity for the problems involving fractions. In particular, the ways the two students employed partitioning operations and how they coordinated quantitative unit structures were different in solving proportion problems involving improper fractions. The study contributes to the field by adding empirical data corroborating the hypotheses that students' ability to transform one three levels of units structure into another one may not only be related to their interiorization of recursive partitioning operations, but it is an important foundation for their construction of splitting operations for composite units.