• Title/Summary/Keyword: SIMD Processor

Search Result 65, Processing Time 0.022 seconds

A Design of a Shader Processor based on a dual-phase pipeline architecture (듀얼 페이즈 명령어 파이프라인구조의 쉐이더 프로세서 설계)

  • Jeong, Hyung-Ki;Nam, Ki-Hun;Lee, Gwang-Yeob
    • Journal of IKEEE
    • /
    • v.12 no.4
    • /
    • pp.246-254
    • /
    • 2008
  • This paper represents a design of a 4 way SIMD processor with multi-thread and dual phase instruction pipeline. 8 threads can be performing in round-robin order, so any hazards can’t occur. The dual phase pipeline makes a pipeline operate as two pipelines, and it can fetch maximum 4 unit instructions at once. This variable length instruction set divide into first phase and second phase instructions, and with this function, complex branch and addressing can be executed at one clock cycle. This processor reduces the code size to quarter, pull out the doubled performance improvement than normal SIMD architecture.

  • PDF

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia (모바일 멀티미디어의 효율적 처리를 위한 재구성형 병렬 프로세서의 구조)

  • Yoo, Se-Hoon;Kim, Ki-Chul;Yang, Yil-Suk;Roh, Tae-Moon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.10
    • /
    • pp.23-32
    • /
    • 2007
  • This paper proposes a reconfigurable parallel processor architecture which can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The proposed architecture directly connects memories and processors so that memory access time and power consumption are reduced. It supports floating-point operations needed in the geometry stage of 3D graphics. It adopts partitioned SIMD to reduce hardware costs. Conditional execution of instructions is used for easy development of parallel algorithms.

A Fast SIFT Implementation Based on Integer Gaussian and Reconfigurable Processor

  • Su, Le Tran;Lee, Jong Soo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.2 no.3
    • /
    • pp.39-52
    • /
    • 2009
  • Scale Invariant Feature Transform (SIFT) is an effective algorithm in object recognition, panorama stitching, and image matching, however, due to its complexity, real time processing is difficult to achieve with software approaches. This paper proposes using a reconfigurable hardware processor with integer half kernel. The integer half kernel Gaussian reduces the Gaussian pyramid complexity in about half [] and the reconfigurable processor carries out a parallel implementation of a full search Fast SIFT algorithm. We use a low memory, fine grain single instruction stream multiple data stream (SIMD) pixel processor that is currently being developed. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and I/O capabilities of the processor which results in a system that can perform real time image and video compression. We apply this novel implementation to images and measure the effectiveness. Experimental simulation results indicate that the proposed implementation is capable of real time applications.

  • PDF

Implementation of Multi-Core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 알고리즘을 위한 멀티코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.2
    • /
    • pp.45-52
    • /
    • 2011
  • In the past, a patient went to the room where an ultrasound image diagnosis device was set, and then he or she was examined by a doctor. However, currently a doctor can go and examine the patient with a handheld ultrasound device who stays in a room. However, it was implemented with only fundamental functions, and can not meet the high performance required by the focusing algorithm of ultrasound beam which determines the quality of ultrasound image. In addition, low energy consumption was satisfied for the mobile ultrasound device. To satisfy these requirements, this paper proposes a high-performance and low-power single instruction, multiple data (SIMD) based multi-core processor that supports a representative beamforming algorithm out of several focusing methods of mobile ultrasound image signals. The proposed SIMD multi-core processor, which consists of 16 processing elements (PEs), satisfies the high-performance required by the beamforming algorithm by exploiting considerable data-level parallelism inherent in the echo image data of ultrasound. Experimental results showed that the proposed multi-core processor outperforms a commercial high-performance processor, TI DSP C6416, in terms of execution time (15.8 times better), energy efficiency (6.9 times better), and area efficiency (10 times better).

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.

A Study on the 3 Dimension Graphics Accelerator for Phong Shading Algorithm (Phong Shading 알고리즘을 적용한 3차원 영상을 위한 고속 그래픽스 가속기 연구)

  • Park, Youn-Ok;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.5
    • /
    • pp.97-103
    • /
    • 2010
  • There are many algorithms for 2D to 3D graphic conversion technology which have the high complexity and large scale of iterative computation. So in this paper propose parallel algorithm and high speed graphics accelerator architecture using Park's MAMS(Multiple Access Memory System) for Phong Shading, one of many 3D algorithms. The Proposed SIMD processor architecture is simulated by HDL and simulated and got 30 times faster result. It means any kinds of 3D algorithm can make parallel algorithm and accelerated by SIMD processor with Park's MAMS for real time processing.

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.2
    • /
    • pp.100-107
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements(PEs) and operates on a 3-stage pipelining. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with verilog HDL and a FPGA prototype system.

Parallel Simulation of Bounded Petri Nets using Data Packing Scheme (데이터 중첩을 통한 페트리네트의 병렬 시뮬레이션)

  • 김영찬;김탁곤
    • Journal of the Korea Society for Simulation
    • /
    • v.11 no.2
    • /
    • pp.67-75
    • /
    • 2002
  • This paper proposes a parallel simulation algorithm for bounded Petri nets in a single processor, which exploits the SIMD(Single Instruction Multiple Data)-type parallelism. The proposed algorithm is based on a data packing scheme which packs multiple bytes data in a single register, thereby being manipulated simultaneously. The parallelism can reduce simulation time of bounded Petri nets in a single processor environment. The effectiveness of the algorithm is demonstrated by presenting speed-up of simulation time for two bounded Petri nets.

  • PDF

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.305-317
    • /
    • 2008
  • As a mobile computing environment is rapidly changing, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and sire. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which considers that color components are perceptually less significant, supports parallel operations on two-packed compressed 16-bit YCbCr (6 bit Y and 5 bits Cb, Cr) data in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel's multimedia extensions), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves the performance and efficiency with a mere 3% increase in the system area and a 5% increase in the system power, while MMX requires a 14% increase in the system area and a 16% increase in the system power.

Parallel Factorization using Quadratic Sieve Algorithm on SIMD machines (SIMD상에서의 이차선별법을 사용한 병렬 소인수분해 알고리즘)

  • Kim, Yang-Hee
    • The KIPS Transactions:PartA
    • /
    • v.8A no.1
    • /
    • pp.36-41
    • /
    • 2001
  • In this paper, we first design an parallel quadratic sieve algorithm for factoring method. We then present parallel factoring algorithm for factoring a large odd integer by repeatedly using the parallel quadratic sieve algorithm based on the divide-and-conquer strategy on SIMD machines with DMM. We show that this algorithm is optimal in view of the product of time and processor numbers.

  • PDF