• Title/Summary/Keyword: High-performance processor

A design of compact and high-performance AES processor using composite field based S-Box and hardware sharing (합성체 기반의 S-Box와 하드웨어 공유를 이용한 저면적/고성능 AES 프로세서 설계)

  • Yang, Hyun-Chang;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.8 / pp.67-74 / 2008
  • A compact and high-performance AES (Advanced Encryption Standard) encryption/decryption processor is designed by applying various hardware sharing and optimization techniques. To minimize hardware complexity, the S-Boxes for round transformation are shared with the key scheduler, and the datapaths for encryption and decryption are merged and reused, which reduces the area of the S-Boxes by 25%. In addition, the S-Boxes, which require the largest hardware in the AES processor, are designed using composite field arithmetic on $GF(((2^2)^2)^2)$, which further reduces their area compared with designs based on $GF(2^8)$ or $GF((2^4)^2)$. By optimizing the operation of the 64-bit round transformation and round key scheduling, the round transformation is processed in 3 clock cycles and a 128-bit data block is encrypted in 31 clock cycles. The designed AES processor has about 15,870 gates, and the estimated throughput is 412.9 Mbps at a 100 MHz clock frequency.
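
The throughput figure in this abstract follows directly from the block size, cycle count, and clock frequency it quotes. A minimal Python sketch of that arithmetic, where the function name `throughput_mbps` is illustrative and not taken from the paper:

```python
def throughput_mbps(block_bits: int, cycles_per_block: int, clock_hz: float) -> float:
    """Throughput in Mbps for a core that finishes one block every `cycles_per_block` cycles."""
    blocks_per_second = clock_hz / cycles_per_block
    return block_bits * blocks_per_second / 1e6


# Figures quoted in the abstract: 128-bit block, 31 cycles per block, 100 MHz clock.
aes_mbps = throughput_mbps(128, 31, 100e6)
print(f"{aes_mbps:.1f} Mbps")  # prints 412.9 Mbps
```

The same function can be used to see how the cycle count per block trades off against clock frequency for other iterative block-cipher cores.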

High Performance Elliptic Curve Cryptographic Processor for $GF(2^m)$ ($GF(2^m)$의 고속 타원곡선 암호 프로세서)

  • Kim, Chang-Hoon;Kim, Tae-Ho;Hong, Chun-Pyo
    • Journal of KIISE:Computer Systems and Theory / v.34 no.3 / pp.113-123 / 2007
  • This paper presents a high-performance elliptic curve cryptographic processor over $GF(2^m)$. The proposed design adopts the Lopez-Dahab Montgomery algorithm for elliptic curve point multiplication and uses a Gaussian normal basis for $GF(2^m)$ field arithmetic operations. We select m=163, which is the smallest of the five $GF(2^m)$ field sizes recommended by NIST and has a Gaussian normal basis of type 4. The proposed elliptic curve cryptographic processor consists of a host interface, data memory, instruction memory, and control. We implement the proposed design on a Xilinx XCV2000E FPGA device. The FPGA implementation results show that our design is 2.6 times faster and requires significantly fewer hardware resources than the best previously proposed hardware implementation.
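
The control flow of a Montgomery ladder, on which the point multiplication named in this abstract is based, can be sketched generically. The Python below is only a structural illustration: it does not implement the Lopez-Dahab projective formulas or normal-basis field arithmetic, and the stand-in group (integers modulo 1009 under addition) and all function names are assumptions made for the example.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def montgomery_ladder(k: int, point: T, add: Callable[[T, T], T],
                      double: Callable[[T], T], identity: T) -> T:
    """Compute k * point with exactly one add and one double per scalar bit."""
    if k < 0:
        raise ValueError("scalar must be non-negative")
    r0, r1 = identity, point  # invariant: r1 - r0 == point
    for bit in bin(k)[2:]:    # scan scalar bits from the most significant
        if bit == "0":
            r1 = add(r0, r1)
            r0 = double(r0)
        else:
            r0 = add(r0, r1)
            r1 = double(r1)
    return r0


# Stand-in group (assumption): integers modulo 1009 under addition.
N = 1009
result = montgomery_ladder(163, 5,
                           lambda a, b: (a + b) % N,
                           lambda a: (2 * a) % N,
                           0)
print(result == (163 * 5) % N)
```

Because every bit triggers the same pair of operations, the ladder maps naturally onto a fixed hardware schedule, which is one reason it is popular in processor designs of this kind.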

Comparison of Parallel Computation Performances for 3D Wave Propagation Modeling using a Xeon Phi x200 Processor (제온 파이 x200 프로세서를 이용한 3차원 음향 파동 전파 모델링 병렬 연산 성능 비교)

  • Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.21 no.4 / pp.213-219 / 2018
  • In this study, we performed 3D wave propagation modeling on a Xeon Phi x200 processor and compared its parallel computation performance with that of a Xeon CPU. Unlike the 1st generation Xeon Phi coprocessor, codenamed Knights Corner, the 2nd generation Xeon Phi x200 processor requires no additional communication between the internal memory and the main memory, since it can run an operating system directly. The Xeon Phi x200 processor can run large-scale computations independently, using its large main memory and high-bandwidth memory. To compare parallel computation performance, we performed the modeling using the MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) libraries. Numerical examples using the SEG/EAGE salt model demonstrated that the Xeon Phi, with its large number of computational cores and high-bandwidth memory, achieves modeling 2.69 to 3.24 times faster than the 12-core CPU.

Design of Architecture of Programmable Stack-based Video Processor with VHDL (VHDL을 이용한 프로그램 가능한 스택 기반 영상 프로세서 구조 설계)

  • 박주현;김영민
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.4 / pp.31-43 / 1999
  • The main goal of this paper is to design a high-performance SVP (Stack-based Video Processor) for network applications. The SVP is a comprehensive scheme, 'better' in the sense that it is an optimal selection of previously proposed enhancements of a stack machine and a video processor. It can effectively process object-based video data using an S-RISC (Stack-based Reduced Instruction Set Computer) with a semi-general-purpose architecture that has a stack buffer for OOP (Object-Oriented Programming), in which running programs contain many small procedures. It also includes a vector processor that can improve the MPEG coding speed. The vector processor in the SVP can execute advanced-mode motion compensation, half-pixel motion prediction, and the SA-DCT (Shape Adaptive-Discrete Cosine Transform) of MPEG-4. Absolutors and halfers in the vector processor make this architecture extensible to an encoder. We also designed a VLSI stack-oriented video processor using the proposed stack-oriented video decoding architecture. It was designed with 0.5 $\mu\textrm{m}$ 3LM standard-cell technology, and has 110K logic gates and a 12 Kbit SRAM internal buffer. The operating frequency is 50 MHz. It executes video decoding algorithms for QCIF at 15 fps (frames per second), the maximum rate of VLBV (Very Low Bitrate Video) in MPEG-4.

DSP-Based Digital Controller for Multi-Phase Synchronous Buck Converters

  • Kim, Jung-Hoon;Lim, Jeong-Gyu;Chung, Se-Kyo;Song, Yu-Jin
    • Journal of Power Electronics / v.9 no.3 / pp.410-417 / 2009
  • This paper presents the design and implementation of a digital controller for a multi-phase synchronous buck converter (SBC) using a digital signal processor (DSP). The multi-phase SBC is generally used as the voltage regulation module (VRM) of a microprocessor because of its high current handling capability at a low output voltage. The VRM requires high control performance: tight output regulation, a high slew rate, and load sharing among multiple converters. To meet these requirements, a digital control system for a multi-phase SBC is designed and implemented. Digital PWM generation, current sensing, and voltage and current controllers using a TMS320F2812 DSP are considered. Experimental results are provided to show the validity of the implemented digital control system.

Center Compensation Servo Control for High Speed CD-RW System (고배속 CD-RW 시스템을 위한 중점 서보 제어)

  • Seo, Sam-Jun;Kim, Dong-Sik
    • Proceedings of the KIEE Conference / 2003.07d / pp.2438-2440 / 2003
  • This paper presents a design methodology for a Digital Servo Signal Processor for high-speed CD-ROM drive systems. The proposed Digital Servo Signal Processor, one of the key components of CD-ROM systems, enables the development of CD-related systems for very high-speed applications. The proposed center compensation servo control is designed to address the actuator shaking caused by the fast response of the step motor during a long-distance jump. Experimental results show that the performance of the control system is greatly improved. The proposed servo algorithm shows a shorter settling time, including the pull-in time, and a faster access time.

Meshfree/GFEM in hardware-efficiency prospective

  • Tian, Rong
    • Interaction and multiscale mechanics / v.6 no.2 / pp.197-210 / 2013
  • A fundamental trend of processor architecture evolving towards exaflops is rapidly increasing floating-point performance (so-called "free" flops) accompanied by much more slowly increasing memory and network bandwidth. To fully exploit the "free" flops, a numerical algorithm for PDEs should demand more flops per byte, that is, have a higher arithmetic intensity. A meshfree/GFEM approximation can be one such class of algorithm. It is shown for a GFEM without extra dof that this kind of approximation takes advantage of the high performance of manycore GPUs through its high accuracy of approximation; the "expensive" method is found to be, conversely, hardware-efficient on the emerging manycore architecture.
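
The flops-per-byte argument in this abstract can be made concrete with the standard roofline bound, which the abstract itself does not mention and is used here only as an illustration. In the Python sketch below, the machine figures (1 Tflop/s peak, 100 GB/s memory bandwidth) and both kernel intensities are hypothetical values chosen for the example, not results from the paper.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Flops performed per byte moved to or from memory."""
    return flops / bytes_moved


def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     intensity: float) -> float:
    """Roofline bound: limited either by compute peak or by memory traffic."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity)


# Hypothetical machine (assumption): 1 Tflop/s peak, 100 GB/s memory bandwidth.
PEAK = 1e12
BANDWIDTH = 1e11

low = attainable_flops(PEAK, BANDWIDTH, arithmetic_intensity(1.0, 4.0))    # 0.25 flops/byte
high = attainable_flops(PEAK, BANDWIDTH, arithmetic_intensity(80.0, 4.0))  # 20 flops/byte
print(low, high)
```

Under these assumed figures, the low-intensity kernel is bandwidth-bound at a small fraction of peak, while the high-intensity kernel reaches the compute peak, which is the sense in which extra flops become "free".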

Parallel Deblocking Filter Based on Modified Order of Accessing the Coding Tree Units for HEVC on Multicore Processor

  • Lei, Haiwei;Liu, Wenyi;Wang, Anhong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.3 / pp.1684-1699 / 2017
  • The deblocking filter (DF) reduces blocking artifacts in encoded video sequences, and thereby significantly improves the subjective and objective quality of videos. Statistics show that the DF accounts for 5-18% of the total decoding time in high-efficiency video coding. Therefore, speeding up the DF will improve codec performance, especially for the decoder. In view of the rapid development of multicore technology, we propose a parallel DF scheme based on a modified order of accessing the coding tree units (CTUs) by analyzing the data dependencies between adjacent CTUs. This enables the DF to run in parallel, providing accelerated performance and more flexibility in the degree of parallelism, as well as finer parallel granularity. We additionally solve the problems of variable privatization and thread synchronization in the parallelization of the DF. Finally, the DF module is parallelized based on the HM16.1 reference software using OpenMP technology. The acceleration performance is experimentally tested under various numbers of cores, and the results show that the proposed scheme is very effective at speeding up the DF.
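
The abstract does not spell out the modified CTU access order, so the following Python sketch shows only a generic diagonal wavefront schedule, not the authors' scheme. It assumes, purely for illustration, that each CTU depends on its left and upper neighbours, so that all CTUs with the same x + y can be processed in parallel; the function name `ctu_wavefronts` is hypothetical.

```python
from typing import List, Tuple


def ctu_wavefronts(cols: int, rows: int) -> List[List[Tuple[int, int]]]:
    """Group CTU coordinates (x, y) into waves whose members share x + y."""
    waves: List[List[Tuple[int, int]]] = [[] for _ in range(cols + rows - 1)]
    for y in range(rows):
        for x in range(cols):
            # Left (x-1, y) and upper (x, y-1) neighbours fall in wave x + y - 1,
            # so every CTU in a wave is independent of the others in that wave.
            waves[x + y].append((x, y))
    return waves


# Example: a picture that is 4 CTUs wide and 3 CTUs high.
for step, wave in enumerate(ctu_wavefronts(4, 3)):
    print(step, wave)
```

In an OpenMP implementation, each wave would typically become one parallel loop with a synchronization point before the next wave starts.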

A Specialized Reader for High Speed UHF RFID Tag Inlay Inspection Equipment (고속 UHF RFID 태그 검사 장비를 위한 전용 리더)

  • Bae, Sung Woo;Park, Jun-Seok;Seong, Yeong Rak;Oh, Ha-Ryoung
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.1 / pp.63-69 / 2014
  • RFID has not become as widespread as expected, partly because of the cost, size, read range, and reliability problems of tags. The read success rate must be improved for RFID to be widely adopted, and quality control of tags is crucial to meeting this requirement. In this study, we designed and implemented a high-performance reader for inspection equipment that inspects tags in advance. To improve the performance of the developed reader, the baseband modem and command processor (CP) were designed as hardware logic and implemented on an FPGA. The antenna shielding device and the hardware command processor function made it possible to inspect small-pitch inlays. This equipment enables accurate performance evaluation and identification of tags that satisfy a given read range. By helping to sort out defective tags, the results can ultimately lead to more stable RFID services.

Design and Implementation of FPGA-based High Speed Multimedia Data Reassembly Processor (FPGA 기반의 고속 멀티미디어 데이터 재조합 프로세서 설계 및 구현)

  • Kim, Won-Ho
    • Journal of the Institute of Convergence Signal Processing / v.9 no.3 / pp.213-218 / 2008
  • This paper describes a hardware-based high-speed multimedia data reassembly processor for the remote multimedia set-top box (MSTB) of an interactive satellite multimedia communication system. The conventional multimedia data reassembly scheme relies on software processing in the MSTB. As the transmission rate of multimedia data services increases, the CPU load of the remote MSTB rises and its reassembly performance becomes limited. To provide high-speed multimedia data service to end users, we propose a hardware-based high-speed multimedia data reassembly processor. It is implemented using an FPGA, a PCI interface chip, and RAMs, and has been integrated into the MSTB and tested. It has been confirmed to provide all required functions and a processing rate of up to 116 Mbps.
