• Title/Summary/Keyword: Vector Processor

Architecture Exploration of Optimal Many-Core Processors for a Vector-based Rasterization Algorithm (래스터화 알고리즘을 위한 최적의 매니코어 프로세서 구조 탐색)

  • Son, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.9 no.1 / pp.17-24 / 2014
  • In this paper, we implement a vector-based rasterization algorithm for 3D graphics on a SIMD (single instruction, multiple data) many-core processor architecture and evaluate its performance. In addition, we evaluate the impact of the data-per-processing-element (DPE) ratio, defined as the amount of data directly mapped to each processing element (PE) within the many-core processor, on performance, energy efficiency, and area efficiency. For the experiment, we use seven different PE configurations obtained by varying the DPE ratio (or the number of PEs), all implemented in the same 130 nm CMOS technology with a 500 MHz clock frequency. Experimental results indicate that the optimal PE configurations are those with a DPE ratio in the range from 16,384 to 256 (that is, from 16 to 1,024 PEs), which meet the requirements of mobile devices in terms of performance and efficiency.

GPU-Based ECC Decode Unit for Efficient Massive Data Reception Acceleration

  • Kwon, Jisu;Seok, Moon Gi;Park, Daejin
    • Journal of Information Processing Systems / v.16 no.6 / pp.1359-1371 / 2020
  • When transmitting and receiving large amounts of data, reliable data communication is crucial for the normal operation of a device and for preventing abnormal operations caused by errors. Therefore, this paper assumes that an error correction code (ECC), which can detect and correct errors by itself, is used in an environment where massive data is received sequentially. Because an embedded system has limited resources, such as a low-performance processor or a small memory, its applications must operate efficiently. In this paper, we propose an accelerated ECC-decoding technique that uses a graphics processing unit (GPU) built into the embedded system when receiving a large amount of data. In the matrix-vector multiplication that forms the Hamming code used in the ECC operation, the matrix is expressed in compressed sparse row (CSR) format, and a sparse matrix-vector product is used. The multiplication is performed in a GPU kernel, and the Hamming code computation is also accelerated so that the ECC operation can be performed in parallel. The proposed technique is implemented with CUDA on a GPU-embedded target board, the NVIDIA Jetson TX2, and its execution time is compared with that of the CPU.
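The CSR-based matrix-vector product mentioned in this abstract can be illustrated with a small sketch. It is not the authors' CUDA kernel: it assumes a Hamming(7,4) parity-check matrix and plain Python, and the names `H_DENSE`, `to_csr`, `spmv_mod2`, and `correct` are hypothetical. Each output row is computed independently, which is the property a GPU kernel could exploit by assigning rows to threads.

```python
# Assumed Hamming(7,4) parity-check matrix: column j (1-based) is the binary
# representation of j, with row 0 holding the least significant bit.
H_DENSE = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def to_csr(dense):
    """Convert a dense 0/1 matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def spmv_mod2(csr, x):
    """Sparse matrix-vector product over GF(2): y = H * x mod 2."""
    values, col_idx, row_ptr = csr
    y = []
    for i in range(len(row_ptr) - 1):      # rows are independent of each other
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc ^= values[k] & x[col_idx[k]]
        y.append(acc)
    return y

def correct(csr, received):
    """Return (corrected word, 1-based error position or 0 if no error)."""
    s = spmv_mod2(csr, received)
    pos = s[0] + 2 * s[1] + 4 * s[2]       # syndrome read as a binary number
    fixed = list(received)
    if pos:
        fixed[pos - 1] ^= 1                # flip the single erroneous bit
    return fixed, pos
```

For example, `correct(to_csr(H_DENSE), [1, 1, 1, 0, 1, 0, 0])` flips bit 5 and recovers the codeword `[1, 1, 1, 0, 0, 0, 0]`.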

Computer Hardware Design for the Thrust Vector Control Actuator of a Launch Vehicle (발사체 추력백터제어 구동장치용 컴퓨터 하드웨어 설계)

  • Park, Moon-Su;Lee, Hee-Joong;Min, Byeong-Joo;Choi, Hyung-Don
    • Aerospace Engineering and Technology / v.3 no.2 / pp.56-64 / 2004
  • This paper describes the design of the computer hardware that controls the thrust vector control (TVC) actuator of the solid-motor movable nozzle for the Korea Space Launch Vehicle I (KSLV-I). The TVC computer hardware receives control commands from the Navigation Guidance Unit (NGU) and drives the TVC actuator accordingly. It can also communicate with other on-board and ground equipment. The main processor is a digital signal processor capable of computing the control algorithm at high speed, so the hardware offers more stability, reliability, and flexibility than the previous analog controller of KSR-III. A target board was designed for on-board program development, and then the first prototype hardware was developed. The top-level system design criteria, hardware configurations, and ground support equipment of the TVC computer system are described.

Common-mode Voltage Reduction of Three Level Four Leg PWM Converter (3레벨 4레그 PWM 컨버터의 커먼 모드 전압 저감)

  • Chee, Seung-Jun;Ko, Sanggi;Kim, Hyeon-Sik;Sul, Seung-Ki
    • The Transactions of the Korean Institute of Power Electronics / v.19 no.6 / pp.488-493 / 2014
  • This paper presents a carrier-based pulse-width modulation (PWM) method for reducing the common-mode voltage of a three-level four-leg converter. The idea of the proposed PWM method is intuitive and easy to implement in digital signal processor-based converter control systems. On the basis of an analysis of space-vector PWM (SVPWM) and sinusoidal PWM (SPWM) switching patterns, the fourth leg pole voltage of the three-phase converter, called the "f leg pole voltage", is manipulated to reduce the common-mode voltage. To synthesize the f leg pole voltage for suppression of the common-mode voltage, positive and negative pole voltage references of the f leg are calculated. An offset voltage is also derived to prevent distortion of the a, b, and c phase voltages. The feasibility of the proposed PWM method is verified by simulation and experimental results. The peak-to-peak common-mode voltage of the proposed PWM method is 33% of that of the conventional SVPWM method. The number of transitions of the common-mode voltage is also reduced to 25%.

Efficient Implementation of SVM-Based Speech/Music Classifier by Utilizing Temporal Locality (시간적 근접성 향상을 통한 효율적인 SVM 기반 음성/음악 분류기의 구현 방법)

  • Lim, Chung-Soo;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.2 / pp.149-156 / 2012
  • Support vector machines (SVMs) are well known for their pattern recognition capability, but proper care should be taken to alleviate their inherent implementation cost, which results from high computational intensity and memory requirements, especially in embedded systems where only limited resources are available. Since the memory requirement, determined by the dimensionality and the number of support vectors, is generally too high for a cache in an embedded system to accommodate, frequent accesses to the main memory inevitably occur whenever the cache cannot provide the requested data to the processor. These frequent main memory accesses degrade overall performance and increase energy consumption, because a memory access typically takes longer and consumes more energy than a cache access or a register access. In this paper, we propose a technique that reduces the number of main memory accesses by optimizing the data access pattern of the SVM-based classifier so that the temporal locality of the accesses increases, fully utilizing the data loaded into the processor chip. Through experiments, we confirm the improvement achieved by the proposed technique in terms of the number of memory accesses, overall execution time, and energy consumption.
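One generic way to raise temporal locality in an SVM decision function is to reorder the loops so that each support vector, once loaded, is reused across a batch of input frames. The sketch below illustrates that idea only; it is not necessarily the scheme proposed in the paper. The RBF kernel, the batching of frames, and all function names are assumptions made for illustration.

```python
import math

def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decide_frame_major(frames, svs, coefs, bias, gamma):
    """Baseline order: every frame walks the whole support-vector set."""
    out = []
    for x in frames:
        acc = bias
        for sv, c in zip(svs, coefs):      # support vectors re-fetched per frame
            acc += c * rbf(sv, x, gamma)
        out.append(acc)
    return out

def decide_sv_major(frames, svs, coefs, bias, gamma):
    """Reordered: each support vector is fetched once and reused for all frames."""
    acc = [bias] * len(frames)
    for sv, c in zip(svs, coefs):
        for n, x in enumerate(frames):     # small frame batch stays cache-resident
            acc[n] += c * rbf(sv, x, gamma)
    return acc
```

Both functions compute the same decision values; only the order in which memory is touched differs, which is where a cache-limited embedded processor would see the benefit.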

Wind Power System using Doubly-Fed Induction Generator and Matrix Converter (매트릭스컨버터와 이중여자유도발전기를 사용한 풍력발전시스템)

  • Lee, Dong-Geun;Kwon, Gi-Hyun;Han, Byung-Moon;Li, Yu-Long;Choi, Nam-Sup;Choy, Young-Do
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.6 / pp.985-993 / 2008
  • This paper proposes a new DFIG (Doubly-Fed Induction Generator) system using a matrix converter, which can be used effectively for interconnecting a wind power system to the power grid. The operation of the proposed system was verified by computer simulations with PSCAD/EMTDC software. The feasibility of hardware implementation was confirmed by experiments with a laboratory scaled model of a wind power system. The scaled model was built using a motor-generator set with a vector drive system and a matrix converter with a DSP (Digital Signal Processor). The operation of the scaled model was tested by modeling a specific variable-speed wind turbine using real wind data, so that the scaled model simulates the real wind power system as closely as possible. The simulation and experimental results confirm that the matrix converter can be applied to the DFIG system.

The Implementation of a Processor for a Linearly Shift Knapsack Public Key Cryptosystem (선형이동 Knapsack 공개키 암호시스템을 위한 프로세서 구현)

  • 백인천;차균현
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.11 / pp.2291-2302 / 1994
  • This paper presents the design and implementation of a special-purpose processor for the linearly shift knapsack public key cryptography system. We heighten the density of the existing knapsack vector and shift the vectors linearly in order to implement the structure of the linearly shift knapsack system, which yields a stronger cryptosystem. Because the characteristics of this system require parallel processing on each path, we propose a pipelined parallel structure and implement the system in VLSI. We also evaluate this system and compare it with other systems. The processing speed of this system is 550kb/s when the dimension is 100. By enlarging its structure, the system can be used where high-speed security is required.

The Postprocessor Technology for 5-axis Control Machining (5축가공을 위한 포스트프로세서 기술)

  • Jung, Hyoun-Chul;Hwang, Jong-Dae;Kim, Sang-Myung;Jung, Yoon-Gyo
    • Journal of the Korean Society of Manufacturing Process Engineers / v.10 no.2 / pp.9-15 / 2011
  • In order to develop a practical postprocessor for 5-axis machining, the general equations of numerically controlled (NC) data for 5-axis configurations with both orthogonal and non-orthogonal rotary axes were expressed exactly by inverse kinematics, and a Windows-based postprocessor written in Visual Basic was developed according to the proposed algorithm. The developed postprocessor is a general system that is suitable for all kinds of 5-axis machine tools with orthogonal and non-orthogonal rotary axes. The effectiveness of the proposed algorithm is confirmed through implementation of the developed postprocessor and verification by a cutting simulation and a machining experiment. Compatibility is improved by allowing exchange of data formats such as rotational tool center position (RTCP) controlled NC data, vector post NC data, and program object file (POF) cutter location (CL) data, and convenience is increased by adding a work-piece origin offset function. Consequently, a practical postprocessor technology for 5-axis machining is developed.

Design of a DSP Controller and Driver for the Power-by-wire(PBW) Driving System Using BLDC Servo Motor Pump (BLDC 서보 모터 펌프를 이용하는 직동력(PBW) 구동시스템의 DSP 제어기 및 구동기 설계)

  • Joo, Jae-Hun;Sim, Dong-Seouk;Choi, Jung-Keyng
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.5 / pp.1207-1212 / 2011
  • This paper presents a study on a DSP (Digital Signal Processor) controller for a PBW (power-by-wire) system using a BLDC (Brushless Direct Current) servo motor pump. The PBW hydraulic actuator was realized with a hydraulic pump driven by a BLDC servo motor, a hydraulic cylinder, and a controller. This PBW system requires speed control of the servo motor for the linear thrust action of the hydraulic cylinder. This paper implements a servo controller with a vector control algorithm and the MIN-MAX PWM technique. The TMS320F2812 DSP was adopted as the CPU of the controller because it has a PWM waveform generator, an A/D converter, an SPI (Serial Peripheral Interface) port, and many input/output ports.
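The abstract names the MIN-MAX PWM technique without giving details. A minimal sketch follows, assuming it refers to the widely used min-max offset injection, in which half the sum of the largest and smallest phase references is subtracted from all three phases. The function name, the DC-link parameter `vdc`, and the duty-cycle mapping are illustrative assumptions, not taken from the paper.

```python
def min_max_pwm(va, vb, vc, vdc):
    """Return (offset, shifted references, duty cycles) for three phase references.

    Assumes the common min-max offset injection:
        v_offset = -(max(v) + min(v)) / 2
    The same offset is added to every phase, so line-to-line voltages are unchanged.
    """
    v_offset = -(max(va, vb, vc) + min(va, vb, vc)) / 2.0
    refs = (va + v_offset, vb + v_offset, vc + v_offset)
    # Map each pole-voltage reference in [-vdc/2, +vdc/2] to a duty in [0, 1].
    duties = tuple(0.5 + r / vdc for r in refs)
    return v_offset, refs, duties
```

With references of 100 V, -30 V, and -70 V and a 400 V DC link, the offset is -15 V and the shifted references become 85 V, -45 V, and -85 V, centered around zero.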

Fast GPU Implementation for the Solution of Tridiagonal Matrix Systems (삼중대각행렬 시스템 풀이의 빠른 GPU 구현)

  • Kim, Yong-Hee;Lee, Sung-Kee
    • Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.692-704 / 2005
  • With the improvement of computer hardware, GPUs (Graphics Processor Units) now offer tremendous memory bandwidth and computation power. This has led to the use of GPUs for general-purpose computation. In particular, GPU implementation of compute-intensive physics-based simulations is being actively studied. In solving the differential equations that underlie physics simulations, tridiagonal matrix systems arise repeatedly from finite-difference approximation. For physics-based simulations, the fast solution of tridiagonal matrix systems is therefore an important research topic. We propose a fast GPU implementation for the solution of tridiagonal matrix systems. In this paper, we implement the cyclic reduction (also known as odd-even reduction) algorithm, which is a popular choice for vector processors. We obtained a considerable performance improvement in solving tridiagonal matrix systems over the Thomas method and the conjugate gradient method. The Thomas method is well known as a method for solving tridiagonal matrix systems on the CPU, and the conjugate gradient method has shown good results on the GPU. We tested the proposed method by applying it to heat conduction, advection-diffusion, and shallow water simulations. These simulations ran at over 35 frames per second on a 1024x1024 grid.
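The contrast between the Thomas method and cyclic reduction can be sketched in a few lines. This is a sequential Python illustration under assumed conventions (sub-diagonal `a`, diagonal `b`, super-diagonal `c`, right-hand side `d`, with `a[0]` and `c[-1]` ignored), not the paper's GPU implementation. In `cyclic_reduction`, every iteration of the reduction loop is independent of the others, which is what makes the method attractive for vector processors and GPUs, whereas the Thomas recurrences are inherently serial.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm: forward elimination, then back substitution."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                    # each step depends on the previous one
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def cyclic_reduction(a, b, c, d):
    """Odd-even reduction: eliminate even-indexed unknowns, recurse on the rest."""
    n = len(d)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):                 # iterations are mutually independent
        alpha = -a[i] / b[i - 1]
        nb, nc, nd = b[i] + alpha * c[i - 1], 0.0, d[i] + alpha * d[i - 1]
        if i + 1 < n:                        # a lower neighbour exists
            gamma = -c[i] / b[i + 1]
            nb += gamma * a[i + 1]
            nc = gamma * c[i + 1]
            nd += gamma * d[i + 1]
        ra.append(alpha * a[i - 1])
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)
    for i in range(0, n, 2):                 # back-substitute the even unknowns
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Both solvers assume a system for which no pivoting is needed, such as the diagonally dominant matrices that typically arise from finite-difference discretizations.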