• 제목/요약/키워드: hardware coprocessor

검색결과 25건 처리시간 0.021초

Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계 (A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices)

  • 권두올;정태상
    • 대한전기학회논문지:시스템및제어부문D
    • /
    • 제54권12호
    • /
    • pp.723-731
    • /
    • 2005
  • The $4{\times}4$ homogeneous transformation matrix is a compact representation of orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through the successive multiplications of homogeneous matrices, each of which represents the orientation and position of each corresponding link. Thus, for real time control applications in robotics or animation in computer graphics, the fast multiplication of homogeneous matrices is quite demanding. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For the accuracy of computation for real application, the operands of the processors are floating point numbers based on the IEEE Standard 754. For the parallelism and reduction of hardware redundancy, the processor takes column vectors of homogeneous matrices as multiplication unit. To further improve the throughput, the processor structure and its control is based on a pipe-lined structure. Since the designed processor can be used as a special purpose coprocessor in robotics and computer graphics, additionally to special matrix/matrix or matrix/vector multiplication, several other useful instructions for various transformation algorithms are included for wide application of the new design. The suggested instruction set will serve as standard in future processor design for Robotics and Computer Graphics. The design is verified using FPGA implementation. Also a comparative performance improvement of the proposed design is studied compared to a uni-processor approach for possibilities of its real time application.

단순 전력분석 공격에 대처하는 타원곡선 암호프로세서의 하드웨어 설계 (Hardware Design of Elliptic Curve processor Resistant against Simple Power Analysis Attack)

  • 최병윤
    • 한국정보통신학회논문지
    • /
    • 제16권1호
    • /
    • pp.143-152
    • /
    • 2012
  • 본 논문은 스칼라 곱셈, Menezes-Vanstone 타원곡선 암호 및 복호 알고리즘, 점-덧셈, 점-2배 연산, 유한체상 곱셈, 나눗셈 등의 7가지 동작을 수행하는 GF($2^{191}$) 타원곡선 암호프로세서를 하드웨어로 설계하였다. 단순 전력 분석에 대비하기 위해 타원곡선 암호프로세서는 주된 반복 동작이 키 값에 무관하게 동일한 연산 동작으로 구성되는 몽고메리 스칼라 곱셈 기법을 사용하며, GF($2^m$)의 유한체에서 각각 1, (m/8), (m-1)개의 고정된 사이클에 완료되는 GF-ALU, GF-MUL, GF-DIV 연산장치가 병렬적으로 수행되는 동작 특성을 갖는다. 설계된 프로세서는 0.35um CMOS 공정에서 약 68,000개의 게이트로 구성되며, 시뮬레이션을 통한 최악 지연시간은 7.8 ns로 약 125 MHz의 동작속도를 갖는다. 설계된 프로세서는 320 kps의 암호율, 640 kbps을 복호율 갖고 7개의 유한체 연산을 지원하므로 다양한 암호와 통신 분야에 적용할 수 있다.

정규표현식 프로세서를 위한 호스트 인터페이스 설계 및 구현 (Design and Implementation of a Host Interface for a Regular Expression Processor)

  • 김종현;윤상균
    • 정보과학회 컴퓨팅의 실제 논문지
    • /
    • 제23권2호
    • /
    • pp.97-103
    • /
    • 2017
  • 정규표현식 패턴 매칭을 고속으로 수행하기 위하여 하드웨어 기반의 정규표현식 매칭 회로들이 제시되었으며, 특히 보통 프로세서처럼 정규표현식에 대한 프로그램을 실행하여 패턴 매칭을 수행하는 정규표현식 프로세서가 제시되었다. 정규표현식 프로세서가 패턴 매칭을 수행하기 위해서는 명령어 메모리에 정규표현식 패턴에 대한 명령어가, 데이터 메모리에는 매칭 대상이 되는 데이터가 미리 저장되어야 한다. 정규표현식 프로세서를 호스트의 보조프로세서로 사용하려면 호스트에서 정규표현식 프로세서의 명령어 메모리와 데이터 메모리를 초기화하는 기능을 제공해야 하며 이를 위한 호스트 인터페이스가 필요하다. 본 논문에서는 Altera사의 DE1-SoC 보드에서 호스트와 정규표현식 프로세서 간의 인터페이스를 설계하였고, 이를 사용하기 위한 응용 프로그램 인터페이스도 구현하였다. 응용 프로그램에서 응용프로그램 인터페이스를 사용하여 정규표현식 프로세서를 이용한 패턴 매칭을 수행하여 호스트 인터페이스의 동작을 확인하였다.

Design of 32 bit Parallel Processor Core for High Energy Efficiency using Instruction-Levels Dynamic Voltage Scaling Technique

  • Yang, Yil-Suk;Roh, Tae-Moon;Yeo, Soon-Il;Kwon, Woo-H.;Kim, Jong-Dae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • 제9권1호
    • /
    • pp.1-7
    • /
    • 2009
  • This paper describes design of high energy efficiency 32 bit parallel processor core using instruction-levels data gating and dynamic voltage scaling (DVS) techniques. We present instruction-levels data gating technique. We can control activation and switching activity of the function units in the proposed data technique. We present instruction-levels DVS technique without using DC-DC converter and voltage scheduler controlled by the operation system. We can control powers of the function units in the proposed DVS technique. The proposed instruction-levels DVS technique has the simple architecture than complicated DVS which is DC-DC converter and voltage scheduler controlled by the operation system and a hardware implementation is very easy. But, the energy efficiency of the proposed instruction-levels DVS technique having dual-power supply is similar to the complicated DVS which is DC-DC converter and voltage scheduler controlled by the operation system. We simulate the circuit simulation for running test program using Spectra. We selected reduced power supply to 0.667 times of the supplied power supply. The energy efficiency of the proposed 32 bit parallel processor core using instruction-levels data gating and DVS techniques can improve about 88.4% than that of the 32 bit parallel processor core without using those. The designed high energy efficiency 32 bit parallel processor core can utilize as the coprocessor processing massive data at high speed.

Memory Organization for a Fuzzy Controller.

  • Jee, K.D.S.;Poluzzi, R.;Russo, B.
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 1993년도 Fifth International Fuzzy Systems Association World Congress 93
    • /
    • pp.1041-1043
    • /
    • 1993
  • Fuzzy logic based Control Theory has gained much interest in the industrial world, thanks to its ability to formalize and solve in a very natural way many problems that are very difficult to quantify at an analytical level. This paper shows a solution for treating membership function inside hardware circuits. The proposed hardware structure optimizes the memoried size by using particular form of the vectorial representation. The process of memorizing fuzzy sets, i.e. their membership function, has always been one of the more problematic issues for the hardware implementation, due to the quite large memory space that is needed. To simplify such an implementation, it is commonly [1,2,8,9,10,11] used to limit the membership functions either to those having triangular or trapezoidal shape, or pre-definite shape. These kinds of functions are able to cover a large spectrum of applications with a limited usage of memory, since they can be memorized by specifying very few parameters ( ight, base, critical points, etc.). This however results in a loss of computational power due to computation on the medium points. A solution to this problem is obtained by discretizing the universe of discourse U, i.e. by fixing a finite number of points and memorizing the value of the membership functions on such points [3,10,14,15]. Such a solution provides a satisfying computational speed, a very high precision of definitions and gives the users the opportunity to choose membership functions of any shape. However, a significant memory waste can as well be registered. It is indeed possible that for each of the given fuzzy sets many elements of the universe of discourse have a membership value equal to zero. It has also been noticed that almost in all cases common points among fuzzy sets, i.e. points with non null membership values are very few. More specifically, in many applications, for each element u of U, there exists at most three fuzzy sets for which the membership value is ot null [3,5,6,7,12,13]. Our proposal is based on such hypotheses. Moreover, we use a technique that even though it does not restrict the shapes of membership functions, it reduces strongly the computational time for the membership values and optimizes the function memorization. In figure 1 it is represented a term set whose characteristics are common for fuzzy controllers and to which we will refer in the following. The above term set has a universe of discourse with 128 elements (so to have a good resolution), 8 fuzzy sets that describe the term set, 32 levels of discretization for the membership values. Clearly, the number of bits necessary for the given specifications are 5 for 32 truth levels, 3 for 8 membership functions and 7 for 128 levels of resolution. The memory depth is given by the dimension of the universe of the discourse (128 in our case) and it will be represented by the memory rows. The length of a world of memory is defined by: Length = nem (dm(m)+dm(fm) Where: fm is the maximum number of non null values in every element of the universe of the discourse, dm(m) is the dimension of the values of the membership function m, dm(fm) is the dimension of the word to represent the index of the highest membership function. In our case then Length=24. The memory dimension is therefore 128*24 bits. If we had chosen to memorize all values of the membership functions we would have needed to memorize on each memory row the membership value of each element. Fuzzy sets word dimension is 8*5 bits. Therefore, the dimension of the memory would have been 128*40 bits. Coherently with our hypothesis, in fig. 1 each element of universe of the discourse has a non null membership value on at most three fuzzy sets. Focusing on the elements 32,64,96 of the universe of discourse, they will be memorized as follows: The computation of the rule weights is done by comparing those bits that represent the index of the membership function, with the word of the program memor . The output bus of the Program Memory (μCOD), is given as input a comparator (Combinatory Net). If the index is equal to the bus value then one of the non null weight derives from the rule and it is produced as output, otherwise the output is zero (fig. 2). It is clear, that the memory dimension of the antecedent is in this way reduced since only non null values are memorized. Moreover, the time performance of the system is equivalent to the performance of a system using vectorial memorization of all weights. The dimensioning of the word is influenced by some parameters of the input variable. The most important parameter is the maximum number membership functions (nfm) having a non null value in each element of the universe of discourse. From our study in the field of fuzzy system, we see that typically nfm 3 and there are at most 16 membership function. At any rate, such a value can be increased up to the physical dimensional limit of the antecedent memory. A less important role n the optimization process of the word dimension is played by the number of membership functions defined for each linguistic term. The table below shows the request word dimension as a function of such parameters and compares our proposed method with the method of vectorial memorization[10]. Summing up, the characteristics of our method are: Users are not restricted to membership functions with specific shapes. The number of the fuzzy sets and the resolution of the vertical axis have a very small influence in increasing memory space. Weight computations are done by combinatorial network and therefore the time performance of the system is equivalent to the one of the vectorial method. The number of non null membership values on any element of the universe of discourse is limited. Such a constraint is usually non very restrictive since many controllers obtain a good precision with only three non null weights. The method here briefly described has been adopted by our group in the design of an optimized version of the coprocessor described in [10].

  • PDF