• Title/Summary/Keyword: arithmetic processors

Search Result 30, Processing Time 0.029 seconds

Efficient SAD Processor for Motion Estimation of H.264 (H.264 움직임 추정을 위한 효율적인 SAD 프로세서)

  • Jang, Young-Beom;Oh, Se-Man;Kim, Bee-Chul;Yoo, Hyeon-Joong
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.2 s.314
    • /
    • pp.74-81
    • /
    • 2007
  • In this paper, an efficient SAD(Sum of Absolute Differences) processor structure for motion estimation of H.264 is proposed. SAD processors are commonly used both in full search methods for motion estimation and in fast search methods for motion estimation. Proposed structure consists of SAD calculator block, combinator block, and minimum value calculator block. Especially, proposed structure is simplified by using Distributed Arithmetic for addition operation. The Verilog-HDL(Hard Description Language) coding and FPGA(Field Programmable Gate Array) implementation results for the proposed structure show 39% and 32% gate count reduction in comparison with those of the conventional structure, respectively. Due to its efficient processing scheme, the proposed SAD processor structure can be widely used in size dominant H.264 chip.

VLSI Architecture for High Speed Implementation of Elliptic Curve Cryptographic Systems (타원곡선 암호 시스템의 고속 구현을 위한 VLSI 구조)

  • Kim, Chang-Hoon
    • The KIPS Transactions:PartC
    • /
    • v.15C no.2
    • /
    • pp.133-140
    • /
    • 2008
  • In this paper, we propose a high performance elliptic curve cryptographic processor over $GF(2^{163})$. The proposed architecture is based on a modified Lopez-Dahab elliptic curve point multiplication algorithm and uses Gaussian normal basis for $GF(2^{163})$ field arithmetic. To achieve a high throughput rates, we design two new word-level arithmetic units over $GF(2^{163})$ and derive a parallelized elliptic curve point doubling and point addition algorithm with uniform addressing based on the Lopez-Dahab method. We implement our design using Xilinx XC4VLX80 FPGA device which uses 24,263 slices and has a maximum frequency of 143MHz. Our design is roughly 4.8 times faster with 2 times increased hardware complexity compared with the previous hardware implementation proposed by Shu. et. al. Therefore, the proposed elliptic curve cryptographic processor is well suited to elliptic curve cryptosystems requiring high throughput rates such as network processors and web servers.

Flexible Prime-Field Genus 2 Hyperelliptic Curve Cryptography Processor with Low Power Consumption and Uniform Power Draw

  • Ahmadi, Hamid-Reza;Afzali-Kusha, Ali;Pedram, Massoud;Mosaffa, Mahdi
    • ETRI Journal
    • /
    • v.37 no.1
    • /
    • pp.107-117
    • /
    • 2015
  • This paper presents an energy-efficient (low power) prime-field hyperelliptic curve cryptography (HECC) processor with uniform power draw. The HECC processor performs divisor scalar multiplication on the Jacobian of genus 2 hyperelliptic curves defined over prime fields for arbitrary field and curve parameters. It supports the most frequent case of divisor doubling and addition. The optimized implementation, which is synthesized in a $0.13{\mu}m$ standard CMOS technology, performs an 81-bit divisor multiplication in 503 ms consuming only $6.55{\mu}J$ of energy (average power consumption is $12.76{\mu}W$). In addition, we present a technique to make the power consumption of the HECC processor more uniform and lower the peaks of its power consumption.

MoTE-ECC Based Encryption on MSP430

  • Seo, Hwajeong;Kim, Howon
    • Journal of information and communication convergence engineering
    • /
    • v.15 no.3
    • /
    • pp.160-164
    • /
    • 2017
  • Public key cryptography (PKC) is the basic building block for the cryptography applications such as encryption, key distribution, and digital signature scheme. Among many PKC, elliptic curve cryptography (ECC) is the most widely used in IT systems. Recently, very efficient Montgomery-Twisted-Edward (MoTE)-ECC was suggested, which supports low complexity for the finite field arithmetic, group operation, and scalar multiplication. However, we cannot directly adopt the MoTE-ECC to new PKC systems since the cryptography is not fully evaluated in terms of performance on the Internet of Things (IoT) platforms, which only supports very limited computation power, energy, and storage. In this paper, we fully evaluate the MoTE-ECC implementations on the representative IoT devices (16-bit MSP processors). The implementation is highly optimized for the target platform and compared in three different factors (ROM, RAM, and execution time). The work provides good reference results for a gradual transition from legacy ECC to MoTE-ECC on emerging IoT platforms.

A fully digitized Vector Control of PMSM using 80296SA (80296SA를 이용한 영구자석 동기전동기 벡터제어의 완전 디지털화)

  • 안영식;배정용;이홍희
    • Proceedings of the KIPE Conference
    • /
    • 1998.11a
    • /
    • pp.5-8
    • /
    • 1998
  • The adaptation to vector control theory is so generalized that it is widely used for implementing the high-performance of AC machine. Nowadays, One-Chip microprocessors or DSP chips are being well-used to implement Vector Control algorithm. DSP Chip have less flexibility for memory decoding and I/O rather than One-Chip microprocessor so that is requires more additional circuit and high cost. And the past One-Chip micro processors have difficult of implementation the complex algorithm because of small memory capacity and low arithmetic performance. Therefore we implemented the vector control algorithm of PMSM(Permanent Magnetic Synchronous Motors) using 80296SA form intel , which have many features as 6M memory space, 500MHz clock frequency, including memory decoding circuit and general I/O, Special I/O(EPA, Interrupt controller, Timer/Count, PWM generator) which is proper controller for the complex algorithm or operation program requiring so much memory capacity, So in this paper we fully digitized the vector control of PMSM included SVPWM Voltage controller using the intel 80296SA

  • PDF

Low Power SAD Processor Architecture for Motion Estimation of K264 (K264 Motion Estimation용 저전력 SAD 프로세서 설계)

  • Kim, Bee-Chul;Oh, Se-Man;Yoo, Hyeon-Joong;Jang, Young-Beom
    • Proceedings of the IEEK Conference
    • /
    • 2007.07a
    • /
    • pp.263-264
    • /
    • 2007
  • In this paper, an efficient SAD(Sum of Absolute Differences) processor structure for motion estimation of 0.264 is proposed. SAD processors are commonly used both in full search methods for motion estimation or in fast search methods for motion estimation. Proposed structure consists of SAD calculator block, combinator block, and minimum value calculator block. Especially, proposed structure is simplified by using Distributed Arithmetic for addition operation. The Verilog-HDL(Hard Description Language) coding and FPGA implementation results for the proposed structure show 39% and 32% gate count reduction comparison with those of the conventional structure, respectively. Due to its efficient processing scheme, the proposed SAD processor structure can be widely used in size dominant H.264 chip.

  • PDF

Parallel algorithm of global routing for general purpose associative processign system (법용 연합 처리 시스템에서의 전역배선 병렬화 기법)

  • Park, Taegeun
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.32A no.4
    • /
    • pp.93-102
    • /
    • 1995
  • This paper introduces a general purpose Associative Processor(AP) which is very efficient for search-oriented applications. The proposed architecture consists of three main functional blocks: Content-Addressable Memory(CAM) arry, row logic, and control section. The proposed AP is a Single-Instruction, Multiple-Data(SIMD) device based on a CAM core and an array of high speed processors. As an application for the proposed hardware, we present a parallel algorithm to solve a global routing problem in the layout process utilizing the processing capabilities of a rudimentary logic and the selective matching and writing capability of CAMs, along with basic algorithms such a minimum(maximum) search, less(greater) than search and parallel arithmetic. We have focused on the simultaneous minimization of the desity of the channels and the wire length by sedking a less crowded channel with shorter wire distance. We present an efficient mapping technique of the problem into the CAM structure. Experimental results on difficult examples, on randomly generated data, and on benchmark problems from MCNC are included.

  • PDF

Hardware Design of Special-Purpose Arithmetic Unit for 3-Dimensional Graphics Processor (3차원 그래픽프로세서용 특수 목적 연산장치의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.140-142
    • /
    • 2011
  • In this paper, special purpose arithmetic unit for mobile graphics accelerator is designed. The designed processor supports six operations, such as $1/{\chi}$, $\frac{1}{{\sqrt{x}}$, $log_2x$, $2^x$, $sin(x)$, $cos(x)$. The processor adopts 2nd-order polynomial minimax approximation scheme based on IEEE floating point data format to satisfy accuracy conditions and has 5-stage pipeline structure to meet high operational rates. The SFAU processor consists of 23,000 gates and its estimated operating frequency is about 400 Mhz at operating condition of 65nm CMOS technology. Because the processor can execute all operations with 5-stage pipeline scheme, it has about 400 MOPS(million operations per second) execution rate. Thus, it can be applicable to the 3D mobile graphics processors.

  • PDF

Design of Programmable and Configurable Elliptic Curve Cryptosystem Coprocessor (재구성 가능한 타원 곡선 암호화 프로세서 설계)

  • Lee Jee-Myong;Lee Chanho;Kwon Woo-Suk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.6 s.336
    • /
    • pp.67-74
    • /
    • 2005
  • Crypto-systems have difficulties in designing hardware due to the various standards. We propose a programmable and configurable architecture for cryptography coprocessors to accommodate various crypto-systems. The proposed architecture has a 32 bit I/O interface and internal bus width, and consists of a programmable finite field arithmetic unit, an input/output unit, a register file, and a control unit. The crypto-system is determined by the micro-codes in memory of the control unit, and is configured by programming the micro-codes. The coprocessor has a modular structure so that the arithmetic unit can be replaced if a substitute has an appropriate 32 bit I/O interface. It can be used in many crypto-systems by re-programming the micro-codes for corresponding crypto-system or by replacing operation units. We implement an elliptic curve crypto-processor using the proposed architecture and compare it with other crypto-processors

Implementation of MPEG-4 BSAC Audio Decoder using ARM926EJ-S Processors (ARM926EJ-S 프로세서를 이용한 MPEG-4 BSAC 오디오 복호화기의 구현)

  • Jeon, Young-Taek;Park, Young-Cheol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.1 no.2
    • /
    • pp.91-98
    • /
    • 2008
  • Domestic standard for Korean T-DMB includes MPEG-4 BSAC (Bit Sliced Arithmetic Coding) audio coding that has been established in 2003. This paper presents an implementation and optimization of MPEG-4 BSAC Audio Decoder on ARM926EJ-S processor. Tools and modules of the BSAC audio decoder were implemented with 32-bit fixed point operations. Further optimization was accomplished using ARM926EJ-S Inline Assembly. The optimization was based on the total number of multiplications and MAC (Multiply and Accumulation) operations causing most of core cycles of ARM926EJ-S, and also based on analysis of ARMv5 instructions. The result of optimization was evaluated on the basis of MIPS (Million Instruction per second). Implementation results show that BSAC bitstream at 96kbps can be decoded in real-time at 65MHz CPU clocks.

  • PDF