• Title/Summary/Keyword: register (레지스터)

High Speed Implementation of LEA on ARMv8 (ARMv8 상에서 LEA 암호화 고속 구현)

  • Seo, Hwa-jeong
    • Journal of the Korea Institute of Information and Communication Engineering, v.21 no.10, pp.1929-1934, 2017
  • The lightweight block cipher LEA (Lightweight Encryption Algorithm) is a highly promising block cipher because of its efficient implementation and high security level. LEA is widely used in real-world applications, and many efforts have been made to reduce its execution time so that it remains highly available under any circumstances. In this paper, we improve the performance of LEA, particularly on ARMv8 processors. The implementation is optimized with the new SIMD instructions of the NEON engine, and 24 LEA encryptions are performed simultaneously in parallel. To reduce the number of memory accesses, all NEON registers are used to hold intermediate results. Finally, we evaluate the performance of the implementation: on the Apple A7 and Apple A9, the proposed implementations achieve 2.4 cycles/byte and 2.2 cycles/byte, respectively.
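The round function that such a parallel implementation vectorizes can be sketched in Python. The sketch follows the published LEA encryption round (three XOR-add-rotate branches over the four 32-bit words of a block) but omits the key schedule, so any round key passed to it is an arbitrary placeholder. `lea_round_sliced` illustrates one plausible word-sliced layout: if each 128-bit NEON register holds the same word of four blocks, then 24 blocks × 4 words / 4 lanes = 24 registers. Whether the paper uses exactly this layout is an assumption, and both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rol32(x, r):
    """Rotate a 32-bit word left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def ror32(x, r):
    """Rotate a 32-bit word right by r bits (0 < r < 32)."""
    return ((x >> r) | (x << (32 - r))) & MASK32

def lea_round(x, rk):
    """One LEA encryption round on one block x = [x0, x1, x2, x3] with a 6-word round key."""
    return [
        rol32(((x[0] ^ rk[0]) + (x[1] ^ rk[1])) & MASK32, 9),
        ror32(((x[1] ^ rk[2]) + (x[2] ^ rk[3])) & MASK32, 5),
        ror32(((x[2] ^ rk[4]) + (x[3] ^ rk[5])) & MASK32, 3),
        x[0],
    ]

def lea_round_sliced(words, rk):
    """The same round on many blocks at once in a word-sliced layout.

    words[j][b] is word j of block b. Each list comprehension stands in for
    one SIMD operation (XOR, 32-bit add, rotate) applied to every lane.
    """
    w0, w1, w2, w3 = words
    return [
        [rol32(((a ^ rk[0]) + (b ^ rk[1])) & MASK32, 9) for a, b in zip(w0, w1)],
        [ror32(((a ^ rk[2]) + (b ^ rk[3])) & MASK32, 5) for a, b in zip(w1, w2)],
        [ror32(((a ^ rk[4]) + (b ^ rk[5])) & MASK32, 3) for a, b in zip(w2, w3)],
        list(w0),
    ]
```

Because every lane runs the same instruction sequence, the sliced version reproduces the per-block results exactly; a NEON version would replace each comprehension with vector instructions and keep the 24 state registers resident across all rounds.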

FSM Designs with Control Flow Intensive Cycle-C Descriptions (Cycle-C를 이용한 제어흐름 중심의 FSM 설계)

  • Yun, Chang-Ryul; Jhang, Kyoung-Son
    • Journal of KIISE:Computing Practices and Letters, v.11 no.1, pp.26-35, 2005
  • FSMs are generally used to design the controllers of digital systems. They are implemented from state diagrams derived from the control flow, and with an HDL we design and verify FSMs based on those state diagrams. As the number of states in the system increases, verification and modification become complicated, error-prone, and time-consuming. In this paper, we propose Cycle-C, a control-flow-oriented hardware description language at the register transfer level. Cycle-C describes FSMs with timing information as well as control-flow-intensive algorithms. A Cycle-C description is automatically converted into FSMs in the form of synthesizable RTL VHDL. In experiments, we design FSMs for control-intensive interface circuits. The area difference between the Cycle-C designs and the manual designs is small, and the Cycle-C designs require only 10 to 50% of the number of lines of the manual RTL VHDL designs.
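The contrast between a control-flow description and an explicit state diagram can be illustrated with a hypothetical request/acknowledge controller. This is not Cycle-C syntax: `handshake_control_flow` writes the behavior as sequential loops in which every appended output is one clock cycle, while `handshake_fsm` encodes the same behavior as a Mealy transition table, similar in spirit to the state machine that a conversion to RTL has to produce. The protocol and all names are assumptions made for illustration.

```python
def handshake_control_flow(req):
    """Control-flow style: each value appended to ack is one clock cycle."""
    ack, t, n = [], 0, len(req)
    while t < n:
        while t < n and not req[t]:   # wait for a request
            ack.append(0)
            t += 1
        if t == n:
            break
        ack.append(1)                 # acknowledge for exactly one cycle
        t += 1
        while t < n and req[t]:       # wait until the request is released
            ack.append(0)
            t += 1
    return ack

IDLE, BUSY = "IDLE", "BUSY"

# (state, req) -> (next_state, ack): the state diagram written out explicitly
TRANSITIONS = {
    (IDLE, 0): (IDLE, 0),
    (IDLE, 1): (BUSY, 1),
    (BUSY, 1): (BUSY, 0),
    (BUSY, 0): (IDLE, 0),
}

def handshake_fsm(req):
    """Explicit Mealy FSM: one table lookup per clock cycle."""
    state, ack = IDLE, []
    for r in req:
        state, out = TRANSITIONS[(state, r)]
        ack.append(out)
    return ack
```

Even in this tiny case the control-flow version states the intent (wait, acknowledge, wait) directly, while the table must be kept consistent by hand; checking the two against each other over all short input sequences is a cheap way to confirm the conversion.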

Implementation of Optimizing Compiler for Bus-based VLIW Processors (버스기반의 VLIW형 프로세서를 위한 최적화 컴파일러 구현)

  • Hong, Seung-Pyo; Moon, Soo-Mook
    • Journal of KIISE:Computer Systems and Theory, v.27 no.4, pp.401-407, 2000
  • Modern microprocessors exploit instruction-level parallelism to increase performance. In particular, VLIW processors supported by parallelizing compilers are increasingly used in specific applications such as high-end DSP and graphics processing. A bus-based VLIW architecture was proposed for these applications; it was designed to reduce the overhead of the forwarding unit and the instruction width. In this paper, we introduce an optimizing scheduling compiler developed for the proposed bus-based VLIW processor. First, we describe how the interconnections between buses and the resource usage patterns are modeled. Then, based on this model, we implement machine-dependent optimizations such as bus-to-register promotion, copy coalescing, and operand substitution. Optimizations for general-purpose VLIW microprocessors, such as selective scheduling and enhanced pipelining scheduling (EPS), were also implemented. The experimental results show a performance gain of about 20% on multimedia application benchmarks.
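To make VLIW instruction scheduling concrete, the following sketch packs straight-line operations into bundles of a fixed issue width. It is a plain greedy list scheduler, not the selective scheduling or EPS implemented in the paper, and it assumes unit latency, operands in SSA form (so only read-after-write dependences matter), and no bus or register-port limits; `schedule_bundles` is a hypothetical name.

```python
def schedule_bundles(ops, width):
    """Greedy list scheduling of straight-line code into VLIW bundles.

    ops: list of (dest, srcs) in program order, assumed in SSA form.
    Every operation takes one cycle; width is the number of issue slots.
    """
    bundles = []
    ready = {}  # value -> first bundle index in which it can be read
    for dest, srcs in ops:
        # earliest bundle after all producers of this op's sources
        cycle = max((ready.get(s, 0) for s in srcs), default=0)
        # skip bundles whose issue slots are already full
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(dest)
        ready[dest] = cycle + 1
    return bundles
```

With width 2, the sequence a, b, c = f(a, b), d = g(a), e = h(c, d) packs into three bundles, [[a, b], [c, d], [e]]. A bus-based machine would add a per-bundle check on bus usage, which is presumably where the paper's modeling of buses and resource usage comes in.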

A Register-Based Caching Technique for the Advanced Performance of Multithreaded Models (다중스레드 모델의 성능 향상을 위한 가용 레지스터 기반 캐슁 기법)

  • Go, Hun-Jun; Gwon, Yeong-Pil; Yu, Won-Hui
    • The KIPS Transactions:PartA, v.8A no.2, pp.107-116, 2001
  • A multithreaded model is a hybrid that combines the execution locality of the von Neumann model with the asynchronous data availability and implicit parallelism of the dataflow model. Much of the research on improving the performance of multithreaded models concerns cache memory, which has proven effective in the von Neumann model. To use an instruction cache or operand cache, a multithreaded model must have cache memory, and adding cache memory to the model may incur a high implementation cost. To solve this problem, instead of adding cache memory, we perform caching with the available registers of the multithreaded model. The available-register-based caching method uses the registers that are not used during the execution of a thread, and it can achieve the same effect as a cache memory. Because the multithreaded model can compute the number of available registers during register optimization, the method is easy to apply to such models. It also removes access conflicts and the bottleneck at the frame memories. Applying the proposed method improved the performance of the multithreaded model, and compared with cache-based caching, it showed almost the same execution overhead.
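The idea of caching frame-memory operands in registers that a thread does not use can be sketched as a small simulation that counts frame-memory reads. The LRU replacement policy, the read-only trace, and the name `frame_accesses` are assumptions; the abstract does not say how cached slots are chosen or evicted.

```python
from collections import OrderedDict

def frame_accesses(trace, total_regs, used_regs):
    """Count frame-memory reads when unused registers cache frame slots.

    trace: sequence of frame-slot indices read by a thread.
    The total_regs - used_regs registers left free by the thread hold
    recently read slots; the least recently used slot is evicted (assumed policy).
    """
    free = max(total_regs - used_regs, 0)
    cached = OrderedDict()  # slot -> None, ordered from least to most recently used
    misses = 0
    for slot in trace:
        if slot in cached:
            cached.move_to_end(slot)    # hit: served from a free register
            continue
        misses += 1                     # miss: the read goes to frame memory
        if free == 0:
            continue
        if len(cached) >= free:
            cached.popitem(last=False)  # evict the least recently used slot
        cached[slot] = None
    return misses
```

For the trace [0, 1, 0, 1, 2, 0] on a machine with eight registers, a thread using all eight sends all six reads to frame memory; leaving two registers free cuts that to four, and three free registers cut it to three.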

Design and Implementation of Hand-Held Inspection Device for High Performance Mobile TFT LCD/OLED Module (고성능 모바일 TFT LCD/OLED 모듈을 위한 헨드헬드 검사장비 설계 및 구현)

  • Moon, Seung-Jin; Kim, Hong-Kyu
    • The Journal of Korean Institute of Communications and Information Sciences, v.34 no.6B, pp.630-640, 2009
  • This thesis proposes a hand-held inspection device for high-performance mobile TFT LCD/OLED modules. Existing module inspection equipment outputs video data to the module to check for flicker, but this is not possible with a low-performance system. The proposed system supports various additional inspection functions for high-performance modules of various sizes without changing the FPGA design or the hardware of the inspection device. The system consists of the hand-held inspection device, embedded test software, and PC-based software, and it is designed to output, save, and verify all module test results of the hand-held device over a Universal Serial Bus (USB) interface. Nine items were set up for efficient verification of the proposed system, and its operation was confirmed with high-performance TFT LCD/OLED modules, covering scan-time setting, gamma generation, register changes, interface support, and multi-inch modules.

Multi-Dimensional Record Scan with SIMD Vector Instructions (SIMD 벡터 명령어를 이용한 다차원 레코드 스캔)

  • Cho, Sung-Ryong; Han, Hwan-Soo; Lee, Sang-Won
    • Journal of KIISE:Computing Practices and Letters, v.16 no.6, pp.732-736, 2010
  • Processing large amounts of data is more important than ever. In particular, information queries that require multi-dimensional record scans can be implemented efficiently with SIMD instruction sets. In this article, we present a SIMD record scan technique based on row-based scanning. It differs from existing SIMD techniques for predicate processing and aggregate operations, which apply SIMD instructions to the attributes in the same column of the database and thereby exploit the column-based record organization of in-memory database systems. In contrast, our SIMD technique is suited to multi-dimensional record scanning. As register and memory sizes grow, our row-based SIMD scan can have a greater impact on performance. Moreover, since our technique is orthogonal to parallelization techniques for multi-core processors, it can be applied to both uniprocessors and multi-core processors without major changes to the software architecture.
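Row-based comparison of a whole multi-dimensional record can be illustrated without SIMD hardware using SIMD-within-a-register (SWAR) arithmetic on Python integers. This is a swapped-in stand-in for real SIMD instructions, not the paper's code: each attribute is assumed to fit in 7 bits and is packed into an 8-bit lane whose top bit acts as a guard, so a single subtraction tests `lo_i <= x_i` for every attribute of a row at once. All names are hypothetical.

```python
LANE_BITS = 8
GUARD = 0x80  # top bit of each 8-bit lane

def pack(values):
    """Pack 7-bit attribute values into one integer, one 8-bit lane per attribute."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < GUARD:
            raise ValueError("each attribute must fit in 7 bits")
        word |= v << (LANE_BITS * i)
    return word

def guard_mask(dims):
    """Integer with only the guard bit of each of the first `dims` lanes set."""
    return sum(GUARD << (LANE_BITS * i) for i in range(dims))

def row_in_box(row, lo, hi, guards):
    """True if lo_i <= row_i <= hi_i holds in every lane of the packed row."""
    ge = ((row | guards) - lo) & guards  # guard survives iff row_i >= lo_i
    le = ((hi | guards) - row) & guards  # guard survives iff row_i <= hi_i
    return ge == guards and le == guards

def scan(packed_rows, lo_vals, hi_vals):
    """Return the indices of packed rows inside the query box [lo_vals, hi_vals]."""
    guards = guard_mask(len(lo_vals))
    lo, hi = pack(lo_vals), pack(hi_vals)
    return [i for i, row in enumerate(packed_rows) if row_in_box(row, lo, hi, guards)]
```

Each lane computes 128 + x_i - lo_i, which stays between 1 and 255, so no borrow crosses a lane boundary; that is what makes the single subtraction safe. A wider register simply holds more lanes, in line with the abstract's point that larger registers favor the row-based scan.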

Analysis of Code Sequence Generating Algorithm and Implementation of Code Sequence Generator using Boolean Functions (부울함수를 이용한 부호계열 발생알고리즘 분석 부호계열발생기 구성)

  • Lee, Jeong-Jae
    • Journal of the Institute of Convergence Signal Processing, v.13 no.4, pp.194-200, 2012
  • In this paper, we analyze the code sequence generating algorithm defined on $GF(2^n)$ proposed by S. Boztaş and V. Kumar [7], and derive the implementation functions of a code sequence generator using Boolean functions, which map the vector space $F_2^n$ of all binary vectors of length n to the two-element finite field $F_2$. From the trace function, we find the code-sequence-generating Boolean functions based on primitive polynomials of degree n=5 and n=7. We then design and implement code sequence generators using these functions and produce two code sequence groups. The two groups have periods 31 and 127, and their out-of-phase ($\tau \neq 0$) autocorrelation and cross-correlation values are {-9, -1, 7} and {-17, -1, 15}, respectively, satisfying the period $L=2^n-1$ and the correlation values $R_{ij}(\tau) \in \{-2^{(n+1)/2}-1, -1, 2^{(n+1)/2}-1\}$. These results confirm that the code sequence generators using Boolean functions are designed and implemented correctly.
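The period and three-valued correlation quoted above can be checked numerically for n=5. The sketch uses a swapped-in construction, the classical Gold family built from an m-sequence and its decimation by $d = 2^1 + 1 = 3$, not the construction analyzed in the paper; it is chosen only because it has the same correlation values {-9, -1, 7} for n=5. The m-sequence (a shift of the trace sequence $\mathrm{Tr}(\alpha^t)$) is generated by an LFSR for the primitive polynomial $x^5 + x^2 + 1$, which is an assumed choice, and the names are hypothetical.

```python
N = 31  # period 2^5 - 1

def m_sequence():
    """One period of the m-sequence for the primitive polynomial x^5 + x^2 + 1."""
    s = [1, 0, 0, 0, 0]                # any nonzero initial state
    while len(s) < N:
        t = len(s) - 5
        s.append(s[t + 2] ^ s[t])      # s[t+5] = s[t+2] XOR s[t]
    return s

def correlation(a, b, tau):
    """Periodic correlation: sum over t of (-1)^(a[t] XOR b[t + tau])."""
    return sum(1 - 2 * (a[t] ^ b[(t + tau) % N]) for t in range(N))

u = m_sequence()
v = [u[(3 * t) % N] for t in range(N)]  # decimation by 3 gives a preferred pair for odd n
# Gold family members: u XOR (v cyclically shifted by j)
gold = [[u[t] ^ v[(t + j) % N] for t in range(N)] for j in range(N)]
```

The m-sequence alone has the ideal out-of-phase autocorrelation -1, whereas the cross-correlation of the pair, and the out-of-phase autocorrelation of every family member, take only the three values -9, -1, and 7.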

A new design method of m-bit parallel BCH encoder (m-비트 병렬 BCH 인코더의 새로운 설계 방법)

  • Lee, June; Woo, Choong-Chae
    • Journal of the Institute of Convergence Signal Processing, v.11 no.3, pp.244-249, 2010
  • Low-complexity error-correcting codes are attractive for next-generation multi-level-cell flash memory. Sharing sub-expressions is an effective way to reduce complexity and chip size. This paper proposes a new low-complexity design method for an m-bit parallel BCH encoder, based on the serial linear feedback shift register (LFSR) structure and using sub-expressions. In addition, a general algorithm for obtaining the sub-expressions is introduced. A sub-expression can be expressed as a matrix operation between a sub-matrix of the generator matrix and the sum of two different variables. The number of sub-expressions is restricted by. The obtained sub-expressions can be shared when implementing different m-bit parallel BCH encoders. This paper does not address the delay caused by large fan-out; it focuses on complexity reduction, especially the number of gates.
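The relation between the serial LFSR encoder and its m-bit parallel form can be sketched for a small code. Assumptions: the double-error-correcting BCH(15, 7) code with $g(x) = x^8 + x^7 + x^6 + x^4 + 1$, message bits fed most significant first, and hypothetical function names. Because one LFSR step is linear over GF(2), m steps collapse into fixed XOR masks (one per state bit and one per input bit); in hardware these masks become the XOR network whose common terms sub-expression sharing would reduce. The sketch derives the masks but does not perform that sharing.

```python
R = 8                   # number of parity bits (degree of g)
G_FULL = 0x1D1          # g(x) = x^8 + x^7 + x^6 + x^4 + 1 (BCH(15, 7), t = 2)
G_LOW = G_FULL & 0xFF   # feedback taps: g(x) without its x^8 term
MASK = (1 << R) - 1

def serial_step(reg, bit):
    """One clock of the serial LFSR that computes x^8 m(x) mod g(x)."""
    fb = bit ^ (reg >> (R - 1))
    reg = (reg << 1) & MASK
    return reg ^ G_LOW if fb else reg

def serial_parity(bits):
    reg = 0
    for b in bits:
        reg = serial_step(reg, b)
    return reg

def parallel_tables(m):
    """XOR masks for m unrolled steps: A[j] for state bit j, B[i] for input bit i."""
    def run(reg, chunk):
        for b in chunk:
            reg = serial_step(reg, b)
        return reg
    A = [run(1 << j, [0] * m) for j in range(R)]
    B = [run(0, [1 if k == i else 0 for k in range(m)]) for i in range(m)]
    return A, B

def parallel_parity(bits, m):
    """m-bit parallel encoder: each clock consumes m message bits via the masks."""
    A, B = parallel_tables(m)
    bits = [0] * (-len(bits) % m) + list(bits)  # leading zeros leave the remainder unchanged
    reg = 0
    for s in range(0, len(bits), m):
        nxt = 0
        for j in range(R):
            if (reg >> j) & 1:
                nxt ^= A[j]
        for i, b in enumerate(bits[s:s + m]):
            if b:
                nxt ^= B[i]
        reg = nxt
    return reg

def gf2_mod(a, g):
    """Remainder of polynomial a modulo g over GF(2) (bit i = coefficient of x^i)."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a
```

The parity appended to a 7-bit message gives a 15-bit codeword divisible by g(x), so `gf2_mod` returns 0 on it; the parallel encoder returns the same parity for every m, trading clock cycles for a wider XOR network.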

Efficient Radio Resource Measurement System in IEEE 802.11 Networks (IEEE 802.11 네트워크에서 효율적인 라디오 자원 측정 시스템 연구)

  • Yang, Seung-Chur; Lee, Sung-Ho; Kim, Jong-Deok
    • Journal of the Korea Institute of Information and Communication Engineering, v.16 no.11, pp.2437-2445, 2012
  • This paper presents an efficient method for measuring radio resources by analyzing the various elements that occupy the medium. The medium occupancy time consists of 802.11 frames, wireless interference, and the protocol waiting time of the wireless nodes on the current channel, and it is used as a performance metric. Existing research measures only some of these elements and lacks validation of the measurement unit and of scalability across various IEEE 802.11 radios. This paper presents a method that measures each occupancy element separately. To achieve this, we modified the 802.11n-based OpenHAL device driver to collect register information from the wireless chipset and to analyze received frames in a virtual monitor mode. Through various validation methods, we conclude that the system measures medium occupancy time accurately.
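Splitting medium occupancy into components from chipset counters can be sketched as arithmetic on counter deltas between two register reads. The counter names, their 32-bit wrap-around, and the assumption that the busy counter covers both received and transmitted frames are hypothetical, not the OpenHAL register layout; protocol waiting time (inter-frame spaces, backoff) is not modeled here.

```python
from dataclasses import dataclass

WRAP = 1 << 32  # assume 32-bit hardware counters that wrap around

@dataclass
class Counters:
    cycles: int    # clock cycles elapsed on the channel (hypothetical counter)
    busy: int      # cycles the medium was sensed busy
    rx_frame: int  # cycles spent receiving 802.11 frames
    tx_frame: int  # cycles spent transmitting 802.11 frames

def delta(cur, prev):
    """Counter difference that survives one wrap-around."""
    return (cur - prev) % WRAP

def occupancy(prev: Counters, cur: Counters) -> dict:
    """Fractions of the sampling interval spent on frames, interference, and idle."""
    total = delta(cur.cycles, prev.cycles)
    busy = delta(cur.busy, prev.busy)
    frames = delta(cur.rx_frame, prev.rx_frame) + delta(cur.tx_frame, prev.tx_frame)
    interference = max(busy - frames, 0)  # busy time not explained by 802.11 frames
    return {
        "frames": frames / total,
        "interference": interference / total,
        "idle": (total - busy) / total,
    }
```

For example, over 1000 cycles with 600 busy, 300 receiving, and 100 transmitting, the split is 40% frames, 20% interference, and 40% idle.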

An Embedded Software Debugger Using an Instruction Set Simulator (명령어 집합 시뮬레이터를 이용한 임베디드 소프트웨어 디버거)

  • Jung, Hun; Son, Sung-Hoon; Shin, Dong-Ha
    • Journal of the Korea Society for Simulation, v.15 no.4, pp.51-58, 2006
  • Debugging embedded software is very different from debugging general software. For example, debugging embedded software requires information that is not needed for general software, such as power consumption, the distribution of executed instructions, the distribution of used registers, and the number of clock cycles consumed during program execution. In this paper, we propose a more effective method for debugging embedded software using an instruction set simulator of the microprocessor that executes it. We develop a debugger based on an instruction set simulator for SE1608, a domestic embedded microprocessor, and demonstrate an effective debugging method using a MiBench program, MiBench being widely used to benchmark embedded software. The proposed debugging method is relatively easy to implement and has many advantages over existing debugging methods.
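The extra information an ISS-based debugger can report (instruction mix, register usage, and clock counts) falls out of the simulator's fetch-decode-execute loop. The toy ISA, its cycle costs, and the 16-bit register width below are hypothetical and are not the SE1608 instruction set.

```python
from collections import Counter

# Hypothetical per-instruction cycle costs for a toy ISA (not SE1608)
CYCLES = {"li": 1, "add": 1, "mul": 3, "dec": 1, "bnez": 2, "halt": 1}

def simulate(program, num_regs=8):
    """Run a toy program and collect debugger statistics alongside the result."""
    regs = [0] * num_regs
    op_mix, reg_use = Counter(), Counter()
    cycles, pc = 0, 0
    while True:
        op, *args = program[pc]
        op_mix[op] += 1              # distribution of executed instructions
        cycles += CYCLES[op]         # clock cycles consumed
        pc += 1
        if op == "halt":
            break
        if op == "li":
            rd, imm = args
            regs[rd] = imm & 0xFFFF  # assume 16-bit registers
            reg_use.update([rd])
        elif op in ("add", "mul"):
            rd, rs, rt = args
            val = regs[rs] + regs[rt] if op == "add" else regs[rs] * regs[rt]
            regs[rd] = val & 0xFFFF
            reg_use.update([rd, rs, rt])  # distribution of used registers
        elif op == "dec":
            (rd,) = args
            regs[rd] = (regs[rd] - 1) & 0xFFFF
            reg_use.update([rd])
        elif op == "bnez":
            rs, target = args
            reg_use.update([rs])
            if regs[rs] != 0:
                pc = target
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs, op_mix, reg_use, cycles
```

A program computing 3! (load 1 into r1 and 3 into r2, then loop over mul r1, r1, r2; dec r2; bnez r2 back to the mul; then halt) ends with r1 = 6 and reports li×2, mul×3, dec×3, bnez×3, halt×1, and 21 cycles under these assumed costs.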