• Title/Summary/Keyword: SIMD Processor

Search Result 65, Processing Time 0.298 seconds

Control Unit Design and Implementation for SIMD Programmable Unified Shader (SIMD 프로그래머블 통합 셰이더를 위한 제어 유닛 설계 및 구현)

  • Kim, Kyeong-Seob;Lee, Yun-Sub;Yu, Byung-Cheol;Jung, Jin-Ha;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.7
    • /
    • pp.37-47
    • /
    • 2011
  • Real picture like high quality computer graphic is widely used in various fields and shader processor, a key part of a graphic processor, has been advanced to programmable unified shader. However, The existing graphic processors have been optimized to commercial algorithms, so development of an algorithm which is not based on it requires an independent shader processor. In this paper, we have designed and implemented a control unit to support high quality 3 dimensional computer graphic image on programmable integrated shader processor. We have done evaluation through functional level simulation of designed control unit. Hardware resource usage rate are measured by implementing directly on FPGA Virtex-4 and execution speed are verified by applying ASIC library. the result of an evaluation shows that the control unit has the commands more about 1.5 times compared to the other shader processors that is a behavior similar to the control unit and with a number of processing units used in a shader processor, compared with the other processors, overall performance of the control unit is improved about 3.1 GFLOPS.

An Analytical Evaluation of 2D Mesh-connected SIMD Architecture for Parallel Matrix Multiplication (2D Mesh SIMD 구조에서의 병렬 행렬 곱셈의 수치적 성능 분석)

  • Kim, Cheong-Ghil
    • Journal of The Institute of Information and Telecommunication Facilities Engineering
    • /
    • v.10 no.1
    • /
    • pp.7-13
    • /
    • 2011
  • Matrix multiplication is a fundamental operation of linear algebra and arises in many areas of science and engineering. This paper introduces an efficient parallel matrix multiplication scheme on N ${\times}$ N mesh-connected SIMD array processor, called multiple hierarchical SIMD architecture (HMSA). The architectural characteristic of HMSA is the hierarchically structured control units which consist of a global control unit, N local control units configured diagonally, and $N^2$ processing elements (PEs) arranged in an N ${\times}$ N array. PEs are communicating through local buses connecting four adjacent neighbor PEs in mesh-torus networks and global buses running across the rows and columns called horizontal buses and vertical buses, respectively. This architecture enables HMSA to have the features of diagonally indexed concurrent broadcast and the accessibility to either rows (row control mode) or columns (column control mode) of 2D array PEs alternately. An algorithmic mapping method is used for performance evaluation by mapping matrix multiplication on the proposed architecture. The asymptotic time complexities of them are evaluated and the result shows that paralle matrix multiplication on HMSA can provide significant performance improvement.

  • PDF

Design of Compiler & Variable-Length Instructions for SIMD Structured Shader (가변길이 SIMD구조 쉐이더 명령어 및 컴파일러 설계)

  • Kwak, Jae-Chang;Park, Tae-Ryoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.12
    • /
    • pp.2691-2697
    • /
    • 2010
  • Shader instructions and Compiler are designed for supporting 3D graphic shader 3.0 API. Variable-length instructions are proposed to reduce the size of hardware of graphic processor in SIMD structure by shortening the length of instructions. The designed shader compiler supports variable and two phased structured instructions, and can be programmable at ESSL level. Conformance Test proposed by Khronos group is accomplished to verify the design result of instructions and complier. The test result shows overall average 37% performance improvement at the 16 functions of basic GL shader.

Design of Special Function Unit for Vectorized SIMD Programmable Unified Shader (벡터화된 SIMD 프로그램어블 통합 셰이더를 위한 특수 함수 유닛 설계)

  • Jung, Jin-Ha;Kim, Kyeong-Seob;Yun, Jeong-Hee;Seo, Jang-Won;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.5
    • /
    • pp.56-70
    • /
    • 2010
  • Rendering technique generating 2 dimensional image to give reality and high performance graphical processor for efficient processing of massive data are necessary to support realistic 3 dimensional graphical image. Recently, graphical hardwares have evolved rapidly. This enables high quality rendering effect that we were unable to process in realtime. Improving shading technique enabled us to render realistic images but still much time is required for this process. Multiple operational units are being integrated in a graphical processor for effective floating point operation using massive data to process almost real looking images. In this paper, we have designed and implemented a special functional unit to support high quality 3 dimensional computer graphic image on programmable integrated shader processor. We have done evaluation through functional level simulation of designed special functional unit. Hardware resource usage rate and execution speed are measured implementing directly on FPGA Virtex-4(xc4vlx200).

Implementation of an Optimal Many-core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 기법을 위한 최적의 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.8
    • /
    • pp.119-128
    • /
    • 2011
  • This paper introduces design space exploration of many-core processors that meet high performance and low power required by the beamforming algorithm of image signals of mobile ultrasound. For the design space exploration of the many-core processor, we mapped different number of ultrasound image data to each processing element of many-core, and then determined an optimal many-core processor architecture in terms of execution time, energy efficiency and area efficiency. Experimental results indicate that PE=4096 and 1024 provide the highest energy efficiency and area efficiency, respectively. In addition, PE=4096 achieves 46x and 10x better than TI DSP C6416, which is widely used for ultrasound image devices, in terms of energy efficiency and area efficiency, respectively.

NTGST-Based Parallel Computer Vision Inspection for High Resolution BLU (NTGST 병렬화를 이용한 고해상도 BLU 검사의 고속화)

  • 김복만;서경석;최흥문
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.6
    • /
    • pp.19-24
    • /
    • 2004
  • A novel fast parallel NTGST is proposed for high resolution computer vision inspection of the BLUs in a LCD production line. The conventional computation- intensive NTGST algorithm is modified and its C codes are optimized into fast NTGST to be adapted to the SIMD parallel architecture. And then, the input inspection image is partitioned and allocated to each of the P processors in multi-threaded implementation, and the NTGST is executed on SIMD architecture of N data items simultaneously in each thread. Thus, the proposed inspection system can achieve the speedup of O(NP). Experiments using Dual-Pentium III processor with its MMX and extended MMX SIMD technology show that the proposed parallel NTGST is about Sp=8 times faster than the conventional NTGST, which shows the scalability of the proposed system implementation for the fast, high resolution computer vision inspection of the various sized BLUs in LCD production lines.

A Low Power Design of H.264 Codec Based on Hardware and Software Co-design

  • Park, Seong-Mo;Lee, Suk-Ho;Shin, Kyoung-Seon;Lee, Jae-Jin;Chung, Moo-Kyoung;Lee, Jun-Young;Eum, Nak-Woong
    • Information and Communications Magazine
    • /
    • v.25 no.12
    • /
    • pp.10-18
    • /
    • 2008
  • In this paper, we present a low-power design of H.264 codec based on dedicated hardware and software solution on EMP(ETRI Multi-core platform). The dedicated hardware scheme has reducing computation using motion estimation skip and reducing memory access for motion estimation. The design reduces data transfer load to 66% compared to conventional method. The gate count of H.264 encoder and the performance is about 455k and 43Mhz@30fps with D1(720x480) for H.264 encoder. The software solution is with ASIP(Application Specific Instruction Processor) that it is SIMD(Single Instruction Multiple Data), Dual Issue VLIW(Very Long Instruction Word) core, specified register file for SIMD, internal memory and data memory access for memory controller, 6 step pipeline, and 32 bits bus width. Performance and gate count is 400MHz@30fps with CIF(Common Intermediated format) and about 100k per core for H.264 decoder.

A Study on Tools for Implementing High-speed Neural Network (신경회로망의 고속 구현 방법에 관한 연구)

  • Kim, Pyong-Kun;Kim, Doo-Sik;Lee, Sang-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.11a
    • /
    • pp.377-380
    • /
    • 2002
  • 신경회로망은 문자인식, 자동제어 등의 여러 분야에 널리 쓰이는 방식이다. 그러나 신경회로망을 구현하는데는 연산량이 많아서 실시간으로 구현하기에 어려움이 많이 따른다. 본 논문은 신경회로망을 구현하는데 필요한 연산을 살펴보고 그 연산을 구현하는 방법을 비교 분석하였다. 신경회로망을 구현하기 위해 DSP(Digital Signal Processor), PC의 FPU(Floating Point Unit), Intel사의 Pentium 계열 프로세서에서 지원하는 SIMD(Single Instruction Multiple Data) 기술을 사용하여 결과를 비교 분석 하였다. 신경회로망의 핵심인 MLP(Multi Layer Perceptron) 연산에 대해 실험한 결과 SIMD 기술을 이용하는 방법이 다른 방법에 비해 2배이상 좋은 결과를 나타내었다.

  • PDF

Acceleration of Radial Gradient Paint Processor for Mobile Device (모바일 기기에서의 방사형 그라디언트 페인트 가속)

  • Kim, Jin-Woo;Park, Jin-Hong;Han, Tack-Don
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.530-533
    • /
    • 2011
  • 방사형 그라디언트 페인트(radial gradient paint)는 벡터 그래픽스(vector graphics)에서 적은 정보로 다양한 효과를 적용시킬 수 있는 방법이다. 기본적으로 이 방법은 곱하기, 나누기, 제곱근 등의 복잡한 연산이 필요하기 때문에 모바일 같은 저성능 환경에 적합하지 않았다. 하지만 최근 모바일 기기들은 SIMD 연산 지원 및 고성능의 GPU 탑재 등으로 성능이 향상됨에 따라 이러한 문제를 해결할 수 있게 되었다. 본 논문은 ARM의 SIMD연산인 NEON을 이용하여 최대 2.6배의 성능을 가속시켰으며 GPU의 쉐이더를 이용하여 4.9배의 성능을 가속하였다.

An Echo Processor for Medical Ultrasound Imaging Using a GPU with Massively Parallel Processing Architecture (병렬 처리 구조의 GPU를 이용한 의료 초음파 영상용 에코 신호 처리기)

  • Seo, Sin-Hyeok;Sohn, Hak-Yeol;Song, Tai-Kyong
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.871-872
    • /
    • 2008
  • The method and results of the software implementation of a echo processor for medical ultrasound imaging using a GPU (NVIDIA G80) is presented. The echo signal processing functions are modified in a SIMD manner suitable for the GPU's massively parallel processing architecture so that the GPU's 128 ALUs are utilized nearly 100%. The preliminary result for a frame of image composed of 128 scan lines, each having 10240 16-bit samples, shows that the echo processor can be inplemented at a high rate of 30 frames per second when implemented in C, which is close to the optimized assembly codes running on the TI's TMS320C6416 DSP.

  • PDF