• Title/Summary/Keyword: Coprocessor

Search Result 56, Processing Time 0.072 seconds

An Integrated Design and Implementation of 128-bit block cipher SEED and UART with a low-cost FPGA (128비트 블록 암호 알고리즘 SEED와 UART의 저비용 FPGA를 이용한 통합 설계 및 구현)

  • Park, Ye-Chul;Yi, Kang
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.205-207
    • /
    • 2003
  • 본 논문에서는 국내 표준 128비트 블록 암호화 알고리즘인 SEED와 UART를 통합하여 최저가의 FPGA로 구현하는 방법을 제안한다. 논문[11베서 구현한 면적 요구량이 최소로 구현된 SEED암호화 모듈의 유용성을 실제 내장형 시스템에 적응하여 그 실효성을 보여주는 것이 본 논문의 목적이다. 우리가 구현한 회로는 SEED 를 통해 암호화를 한 후 UART를 이용하여 외부와의 통신할 수도 있고, SEED를 건너뛰고 UART 단독만 이웅하여 외부와 통신을 할 수도 있다. 또한, SEED 자체를 coprocessor로 이용하여 암호화/복호화 가능만 사용할 수도 있도록 설계하였다. 구현 결과, 10만 게이트를 갖는 Xilinx사의 Spartan-ll 계열의 xc2s100시리 즈 칩을 사용하였을 때, SEED와 UART와 주변 논리 회로를 합하여 84% 이하의 면적을 차지 하였고, 최대 41.3Mhz클럭에서 동작하였으며, SEED의 암호화 처리 Througput은 54.SSMbps로서 UART를 이용하여 통신하는데 전혀 문제가 없었다.

  • PDF

Fast Packet Filter ing using Network Coprocessor (네트워크 보조 프로세서를 사용한 고속 패킷 필터링)

  • Yi, Hong-Seok;Kim, Jong-Su;Chung, Ki-Hyun;Choi, Kyung-Hee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2003.05b
    • /
    • pp.1129-1132
    • /
    • 2003
  • 사용자의 인터넷 서비스 고속화 요구가 증대되면서 스위치나 라우터 보안 장비와 같은 인터네트워킹 장비들의 성능 향상 요구도 커지고 있는데 이는 패킷을 처리하는 양과 속도를 향상시켜야 하다는 것을 의미한다. 스위치와 같은 라인 인터페이스 장치들은 주로 전용 하드웨어로 설계되므로 네트워크의 트래픽 처리 성능을 보장받을 수 있으나, 보안 장비나 네트워크 응용 장비들은 일반 서버 기반에서 트래픽을 처리하는 경우가 많아 시스템적으로 성능 향상에 제약을 받거나 성능 향상을 위해서 높은 비용을 지불해야 한다. 근래의 이러한 통신 관련 장비들에서의 패킷 처리 방식은 단순한 연결 차원을 넘어 패킷을 분석하고 연결을 제어하는 모습을 보이고 있는데 이러한 추가 작업 때문에라도 시스템에 많은 부하가 발생한다. 이러한 트래픽의 분석 처리를 빠르게 하기 위해서는 입력된 데이터와 설정된 규칙간의 비교와 판단이 빨라질 필요가 있는데, 본 논문에서는 이를 위해 기존에 연구된 몇 가지 S/W 적인 해결 방법과 H/W 적인 방법들을 분석하고, 더 나은 검색 성능을 위해 H/W 기반 네트워크 보조 프로세서를 이용한 방식을 제안하고 실험을 통하여 검증하였다.

  • PDF

A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices (Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계)

  • Kwon Do-All;Chung Tae-Sang
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.12
    • /
    • pp.723-731
    • /
    • 2005
  • The $4{\times}4$ homogeneous transformation matrix is a compact representation of orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through the successive multiplications of homogeneous matrices, each of which represents the orientation and position of each corresponding link. Thus, for real time control applications in robotics or animation in computer graphics, the fast multiplication of homogeneous matrices is quite demanding. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For the accuracy of computation for real application, the operands of the processors are floating point numbers based on the IEEE Standard 754. For the parallelism and reduction of hardware redundancy, the processor takes column vectors of homogeneous matrices as multiplication unit. To further improve the throughput, the processor structure and its control is based on a pipe-lined structure. Since the designed processor can be used as a special purpose coprocessor in robotics and computer graphics, additionally to special matrix/matrix or matrix/vector multiplication, several other useful instructions for various transformation algorithms are included for wide application of the new design. The suggested instruction set will serve as standard in future processor design for Robotics and Computer Graphics. The design is verified using FPGA implementation. Also a comparative performance improvement of the proposed design is studied compared to a uni-processor approach for possibilities of its real time application.

Implementation of the Squared-Error Pattern Clustering Processor Using the Residue Number System (剩餘數體系를 이용한 자승오차 패턴 클러스터링 프로세서의 실현)

  • Kim, Hyeong-Min;Cho, Won-Kyung
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.2
    • /
    • pp.87-93
    • /
    • 1989
  • Squared-error Pattern Clustering algorithm used in unsupervised pattern recognition and image processing application demands substantial processing time for operation of feature vector matrix. So, this paper propose the fast squared-error Pattern Clustering Processor using the Residue Number System which have been the nature of parallel processing and pipeline. The proposed Squared-error Pattern Clustering Processor illustrate satisfiable error rate for Cluster number which can be divide meaningful region and about 200 times faster than 80287 coprocessor from experiments result of image segmentation. In this result, it is useful to real-time processing application for large data.

  • PDF

Enhancement of H.264/AVC Encoding Speed and Reduction of CPU Load through Parallel Programming Based on CUDA (CUDA 기반의 병렬 프로그래밍을 통한 H.264/AVC 부호화 속도 향상 및 CPU 부하 경감)

  • Jang, Eun-Been;Ha, Yun-Su
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.34 no.6
    • /
    • pp.858-863
    • /
    • 2010
  • In order to enhance encoding speed in dynamic image encoding using H.264/AVC, reducing the time for motion estimation which takes a large portion of the processing time is very important. An approach using graphics processing unit(GPU) as a coprocessor to assist the central processing unit(CPU) in computing massive data, will be a way to reduce the processing time. In this paper, we present an efficient block-level parallel algorithm for the motion estimation(ME) on a computer unified device architecture(CUDA) platform developed in general-purpose computation on GPU. Experiments are carried out to verify the effectiveness of the proposed algorithm.

Memory-Efficient High Performance Parallelization of Aho-Corasick Algorithm on Intel Xeon Phi (Intel Xeon Phi 에서의 Aho-Corasick 알고리즘을 위한 메모리 친화적인 고성능 병렬화)

  • Tran, Nhat-Phuong;Jeong, Yosang;Lee, Myungho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.04a
    • /
    • pp.87-89
    • /
    • 2014
  • Aho-Corasick (AC) algorithm is a multiple patterns string matching algorithm commonly used in many applications with real-time performance requirements. In this paper, we parallelize the AC algorithm on the Intel's Many Integrated Core (MIC) Architecture, Xeon Phi Coprocessor. We propose a new technique to compress the Deterministic Finite Automaton structure which represents the set of pattern strings again which the input data is inspected for possible matches. The new technique reduces the cache misses and leads to significantly improved performance on Xeon Phi.

Design of an Image Processing ASIC Architecture using Parallel Approach with Zero or Little (통신부담을 감소시킨 영상처리를 위한 병렬처리 방식 ASIC구조 설계)

  • 안병덕;정지원;선우명훈
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.10
    • /
    • pp.2043-2052
    • /
    • 1994
  • This paper proposes a new parallel ASIC architecture for real-time image processing to reduce inter-processing element (inter-PE) communication overhead, called a Sliding Memory Plane (SliM) Image Processor. The Slim Image Processor consists of $3\times3$ processing elements (PEs) connected by a mesh topology. With easy scalability due to the topology. a set of SliM Image Processors can form a mesh-connected SIMD parallel architecture. called the SliM Array Processor. The idea of sliding means that all pixels are slided into all neighboring PEs without interrupting PEs and without a coprocessor or a DMA controller. Since the inter-PE communication and computation occur simultaneously. the inter-PE communication overhead, significant disadvantage of existing machines greatly diminishes. Two I/O planes provide a buffering capability and reduce the date I/O overhead. In addition, using the by-passing path provides eight-way connectivity even with four links. with these salient features. SliM shows a significant performance improvement. This paper presents architectures of a PE and the SliM Image Processor, and describes the design of an instruction set.

  • PDF

Hardware Design of Elliptic Curve processor Resistant against Simple Power Analysis Attack (단순 전력분석 공격에 대처하는 타원곡선 암호프로세서의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.1
    • /
    • pp.143-152
    • /
    • 2012
  • In this paper hardware implementation of GF($2^{191}$) elliptic curve cryptographic coprocessor which supports 7 operations such as scalar multiplication(kP), Menezes-Vanstone(MV) elliptic curve cipher/decipher algorithms, point addition(P+Q), point doubling(2P), finite-field multiplication/division is described. To meet structure resistant against simple power analysis, the ECC processor adopts the Montgomery scalar multiplication scheme which main loop operation consists of the key-independent operations. It has operational characteristics that arithmetic units, such GF_ALU, GF_MUL, and GF_DIV, which have 1, (m/8), and (m-1) fixed operation cycles in GF($2^m$), respectively, can be executed in parallel. The processor has about 68,000 gates and its simulated worst case delay time is about 7.8 ns under 0.35um CMOS technology. Because it has about 320 kbps cipher and 640 kbps rate and supports 7 finite-field operations, it can be efficiently applied to the various cryptographic and communication applications.

Design and Implementation of a Host Interface for a Regular Expression Processor (정규표현식 프로세서를 위한 호스트 인터페이스 설계 및 구현)

  • Kim, JongHyun;Yun, SangKyun
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.2
    • /
    • pp.97-103
    • /
    • 2017
  • Many hardware-based regular expression matching architectures have been proposed for high-performance matching. In particular, regular expression processors, which perform pattern matching by treating the regular expressions as the instruction sequence like general purpose processors, have been proposed. After instruction sequence and data are provided in the instruction memory and data memory, respectively, a regular expression processor can perform pattern matching. To use a regular expression processor as a coprocessor, we need the host interface to transfer the instruction and data into the memory of a regular expression processor. In this paper, we design and implement the host interface between a host and a regular expression processor in the DE1-SoC board and the application program interface. We verify the operations of the host interface and a regular expression processor by executing the application programs which perform pattern matching using the application program interface.

AE32000B: a Fully Synthesizable 32-Bit Embedded Microprocessor Core

  • Kim, Hyun-Gyu;Jung, Dae-Young;Jung, Hyun-Sup;Choi, Young-Min;Han, Jung-Su;Min, Byung-Gueon;Oh, Hyeong-Cheol
    • ETRI Journal
    • /
    • v.25 no.5
    • /
    • pp.337-344
    • /
    • 2003
  • In this paper, we introduce a fully synthesizable 32-bit embedded microprocessor core called the AE32000B. The AE32000B core is based on the extendable instruction set computer architecture, so it has high code density and a low memory access rate. In order to improve the performance of the core, we developed and adopted various design options, including the load extension register instruction (LERI) folding unit, a high performance multiply and accumulate (MAC) unit, various DSP units, and an efficient coprocessor interface. The instructions per cycle count of the Dhrystone 2.1 benchmark for the designed core is about 0.86. We verified the synthesizability and the area and time performances of our design using two CMOS standard cell libraries: a 0.35-${\mu}m$ library and a 0.18-${\mu}m$ library. With the 0.35-${\mu}m$ library, the core can be synthesized with about 47,000 gates and operate at 70 MHz or higher, while it can be synthesized with about 53,000 gates and operate at 120 MHz or higher with the 0.18-${\mu}m$ library.

  • PDF