Search | Korea Science

Design of 1-D DCT processor using a new efficient computation sharing multiplier (새로운 연산 공유 승산기를 이용한 1차원 DCT 프로세서의 설계)

Lee, Tae-Wook;Cho, Sang-Bock
- The KIPS Transactions:PartA
- /
- v.10A no.4
- /
- pp.347-356
- /
- 2003
The OCT algorithm needs efficient hardware architecture to compute inner product. The conventional methods have large hardware complexity. Because of this reason. a computation sharing multiplier was proposed for implementing inner product. However, the existing multiplier has inefficient hardware architecture in precomputer and select units. Therefore it degrades the performance of the multiplier. In this paper, we proposed a new efficient computation sharing multiplier and applied it to implementation of 1-D DCT processor. The comparison results show that the new multiplier is more efficient than an old one when hardware architectures and logic synthesis results were compared. The designed 1-D DCT processor by using the proposed multiplier is more high performance than typical design methods.
https://doi.org/10.3745/KIPSTA.2003.10A.4.347 인용 PDF KSCI

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder (HEVC 복호기의 연산 복잡도 감소를 위한 화면내 예측 하드웨어 구조 설계)

Jung, Hongkyun;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.17 no.5
- /
- pp.1203-1212
- /
- 2013
In this paper, an intra prediction hardware architecture is proposed to reduce computational complexity of intra prediction in HEVC decoder. The architecture uses shared operation units and common operation units and adopts a fast smoothing decision algorithm and a fast algorithm to generate coefficients of a filter. The shared operation unit shares adders processing common equations to remove the computational redundancy. The unit computes an average value in DC mode for reducing the number of execution cycles in DC mode. In order to reduce operation units, the common operation unit uses one operation unit generating predicted pixels and filtered pixels in all prediction modes. In order to reduce processing time and operators, the decision algorithm uses only bit-comparators and the fast algorithm uses LUT instead of multiplication operators. The proposed architecture using four shared operation units and eight common operation units which can reduce execution cycles of intra prediction. The architecture is synthesized using TSMC 0.13um CMOS technology. The gate count and the maximum operating frequency are 40.5k and 164MHz, respectively. As the result of measuring the performance of the proposed architecture using the extracted data from HM 7.1, the execution cycle of the architecture is about 93.7% less than the previous design.
https://doi.org/10.6109/jkiice.2013.17.5.1203 인용 PDF KSCI

An Efficient Hardware Design of Intra Predictor for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 화면내 예측기의 효율적인 하드웨어 설계)

Jung, Hongkyun;Kang, Sukmin;Ryoo, Kwangki
- Proceedings of the Korea Information Processing Society Conference
- /
- 2012.11a
- /
- pp.668-671
- /
- 2012
본 논문에서는 차세대 비디오 압축 표준인 HEVC(High Efficiency Video Coding) 복호기의 연산량과 하드웨어 면적을 감소시키기 위하여 화면내 예측 하드웨어 구조를 제안한다. 제안하는 하드웨어 구조는 공통 수식에 대한 연산을 공유하는 공유 연산기를 사용하여 연산량 및 연산기 개수를 감소시키고, $4{\times}4$ PU와 $64{\times}64$ PU의 필터링 수행 여부에 대한 연산을 수행하지 않고 나머지 PU에 대해서는 LUT를 이용하여 연산을 수행하기 때문에 연산량 및 연산 시간을 감소시킨다. 또한 하나의 공통 연산기만을 사용하여 예측 픽셀을 생성하기 때문에 하드웨어 면적이 감소한다. 제안하는 구조를 TSMC 0.18um 공정을 이용하여 합성한 결과 최대 동작 주파수는 100MHz이고, 게이트 수는 140,697이다. $4{\times}4$ PU를 기준으로 제안하는 구조의 처리 사이클 수는 11 사이클로 기존 구조 대비 54% 감소하였고, 16개 참조 픽셀의 필터링 처리를 기준으로 제안하는 구조의 덧셈 연산기 개수는 37개로 표준 draft 6에 비해 22.9% 감소하였다.
https://doi.org/10.3745/PKIPS.y2012m11a.668 인용 PDF

An ASIC implementation of Phasor Measurement Unit based on Sliding-DFT (순환 DFT에 기초한 페이저 연산 장치의 ASIC 구현)

김종윤;김석훈;장태규;김재화
- Proceedings of the IEEK Conference
- /
- 2001.06d
- /
- pp.143-146
- /
- 2001
본 논문에서는 다 채널 페이저 연산 장치를 전용하드웨어로 구현하기 위한 설계 구조에 대하여 제시하였으며, 이를 연산량이 많은 곱셈기를 시분할에 의해 공유하는 구조를 제시하였다. 또한 페이저 측정을 위한 Sliding-DFT 알고리즘을 순환 구현할 경우의 근사구현 오차에 관한 정량적인 연구를 수행하였다. 이러한 오차 영향의 해석을 기반으로 하여 곱셈기 공유 구조를 적용한 페이저 연산 장치를 설계하고, 설계한 하드웨어의 내부동작을 보여주는 시뮬레이션을 통해 설계의 정확성을 확인하였다
PDF

Adder-based Distributed Arithmetic DWT Processor Design (가산기-기반 분산연산 DWT 프로세서 설계)

김영진;장영진;이현수
- Proceedings of the Korean Information Science Society Conference
- /
- 2001.04a
- /
- pp.16-18
- /
- 2001
DWT(Discrete Wavelet Transform) 연산을 하는데 있어서, 가장 많은 연산을 수행하는 부분은 계수(Coefficient)값과 입력값의 내적 연산을 하는 부분이다. 내적 연산을 효율적으로 줄이기 위해서 시스톨릭, 파이프라인, 병렬구조등이 연구되었으나, 이러한 기존의 방법들은 내적 연산에 들어가는 곱셈의 수는 줄이지 못했다. 본 연구에서 가산기 기반 분산연산을 이용하여 곱셈연산을 제거하고, 동일한 연산과정을 공유함으로써 가산기의 수를 최대한 줄일 수 있었다. 또한, 한 개의 1-레벨 분해 모듈을 재사용하기 위해서 스케줄링을 사용하였다. 그 결과 기존의 구조보다 게이트 수를 50%이상 줄일 수 있었으며, 속도의 향상을 얻을 수 있었다.

Design of SIMD-DSP/PPU for a High-Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서를 위한 SIMD-DSP/FPU의 설계)

정우경;홍인표;이용주;이용석
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.27 no.4C
- /
- pp.388-397
- /
- 2002
We designed a SIMD-DSP/FPU that can efficiently improve multimedia processing performance when integrated into high-performance embedded microprocessors. We proposed partitioned architectures and new schemes for several functional units to reduce chip area. Sharing functional units reduces the area of FPU significantly. The proposed architecture is modeled in HDL and synthesized with a 0.35$\mu\textrm{m}$ standard cell library. The chip area is estimated to be about 100,000 equivalent gates. The designed unit can run at higher than 50MHz clock frequency of CPU core under the worst-case operating conditions.
PDF KSCI

Development of an FPGA-based mum-channel phase measurement system (FPGA기반 다채널 위상 측정 시스템 개발)

정선용;안병선;최원섭;장태규
- Proceedings of the IEEK Conference
- /
- 2003.07e
- /
- pp.2160-2163
- /
- 2003
본 논문에서는 FPGA를 기반으로 하는 DFT 연산알고리즘을 적용한 다채널 위상 및 HDR(Harmonic Distortion Ratio) 측정 시스템을 설계하였다. DFT 연산 알고리즘은 많은 연산량이 요구되는데, 기존에는 고가의 DSP 프로세서를 사용하여 소프트웨어적으로 처리하였지만, FPGA를 기반으로 하는 전용의 하드웨어로 구현할 경우 DSP의 연산량에 대한 부담을 감소시킬 수 있다. DFT 연산 알고리즘은 전용 ASIC으로 구현 시 경제성을 고려하기 위해서 곱셈기 공유 구조를 적용하고, 효과적인 시스템 Integration울 위해서 범용인터페이스 방식을 채택하고 이렇게 설계한 시스템을 실제 다채널 톤 신호를 입력으로 하는 동작 시험을 통하여 검증하였다.
PDF

A New Arithmetic Unit Over GF(2$^{m}$ ) for Low-Area Elliptic Curve Cryptographic Processor (저 면적 타원곡선 암호프로세서를 위한 GF(2$^{m}$ )상의 새로운 산술 연산기)

김창훈;권순학;홍춘표
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.28 no.7A
- /
- pp.547-556
- /
- 2003
This paper proposes a novel arithmetic unit over GF(2$^{m}$ ) for low-area elliptic curve cryptographic processor. The proposed arithmetic unit, which is linear feed back shift register (LFSR) architecture, is designed by using hardware sharing between the binary GCD algorithm and the most significant bit (MSB)-first multiplication scheme, and it can perform both division and multiplication in GF(2$^{m}$ ). In other word, the proposed architecture produce division results at a rate of one per 2m-1 clock cycles in division mode and multiplication results at a rate of one per m clock cycles in multiplication mode. Analysis shows that the computational delay time of the proposed architecture, for division, is less than previously proposed dividers with reduced transistor counts. In addition, since the proposed arithmetic unit does not restrict the choice of irreducible polynomials and has regularity and modularity, it provides a high flexibility and scalability with respect to the field size m. Therefore, the proposed novel architecture can be used for both division and multiplication circuit of elliptic curve cryptographic processor. Specially, it is well suited to low-area applications such as smart cards and hand held devices.
PDF KSCI

The Optimal Extraction Method of Adder Sharing Component for Inner Product and its Application to DCT Design (내적연산을 위한 가산기 공유항의 최적 추출기법 제안 및 이를 이용한 DCT 설계)

Im, Guk-Chan;Jang, Yeong-Jin;Lee, Hyeon-Su
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.38 no.7
- /
- pp.503-512
- /
- 2001
The general DSP algorithm, like orthogonal transform or filter processing, needs efficient hardware architecture to compute inner product. The typical MAC architecture has high cost of silicon. Because of this reason, the distributed arithmetic without multiplier is widely used for implementing inner product. This paper presents the optimization to reduce required hardware in distributed arithmetic by using extraction method of adder sharing component. The optimization process uses Boltzmann-machine which is one of the neural network. This proposed method can solve problem that is increasing complexity depending on depth of inner product and compose optimal summation-network with the minimum FA and FF in a few time. The designed DCT by using Proposed method is more efficient than a ROM-based distributed arithmetic.
PDF

Design of an Efficient AES-ARIA Processor using Resource Sharing Technique (자원 공유기법을 이용한 AES-ARIA 연산기의 효율적인 설계)

Koo, Bon-Seok;Ryu, Gwon-Ho;Chang, Tae-Joo;Lee, Sang-Jin
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.18 no.6A
- /
- pp.39-49
- /
- 2008
AEA and ARIA are next generation standard block cipher of US and Korea, respectively, and these algorithms are used in various fields including smart cards, electronic passport, and etc. This paper addresses the first efficient unified hardware architecture of AES and ARIA, and shows the implementation results with 0.25um CMOS library. We designed shared S-boxes based on composite filed arithmetic for both algorithms, and also extracted common terms of the permutation matrices of both algorithms. With the $0.25-{\mu}m$ CMOS technology, our processor occupies 19,056 gate counts which is 32% decreased size from discrete implementations, and it uses 11 clock cycles and 16 cycles for AES and ARIA encryption, which shows 720 and 1,047 Mbps, respectively.
https://doi.org/10.13089/JKIISC.2008.18.6A.39 인용 PDF KSCI HTML

Search Result 60, Processing Time 0.021 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)