• Title/Summary/Keyword: SIMD instruction

Search Result 81, Processing Time 0.024 seconds

Parallel Speedup of NTGST on SIMD type Multiprocessor (SIMD 구조의 다중 프로세서를 이용한 NTGST의 병렬고속화)

  • 김복만;서경석;김종화;최흥문
    • Proceedings of the IEEK Conference
    • /
    • 2001.06d
    • /
    • pp.127-130
    • /
    • 2001
  • 본 논문에서는 SIMD (Single Instruction stream and Multiple Data stream)형 병렬 구조의 다중 프로세서를 이용하여 NTGST (noise-tolerant generalized symmetry transform)를 병렬 고속화하였다. 먼저 NTGST의 화소 및 영상 영역간의 계산 독립성을 이용하여 영상을 분할하여 P개의 프로세서에 할당하고, 이들 각각을 N개의 데이터를 한번에 처리하는 SIMD 구조로 병렬화하여 NP에 비례하는 속도 향상을 얻었다. 실험에서 MMX 기술의 펜티엄 Ⅲ 프로세서를 2개 사용하여 제안한 알고리즘이 기존의 NTGST 보다 8배 가까이 고속으로 처리됨을 확인하였다.

  • PDF

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.1
    • /
    • pp.1-9
    • /
    • 2011
  • Recently, as mobile multimedia devices are used more and more, the needs for high-performance and low-energy multimedia processors are increasing. Application-specific integrated circuits (ASIC) can meet the needed high performance for mobile multimedia, but they provide limited, if any, generality needed for various application requirements. DSP based systems can used for various types of applications due to their generality, but they require higher cost and energy consumption as well as less performance than ASICs. To solve this problem, this paper proposes a single instruction multiple data (SIMD) based many-core processor which supports high-performance and low-power image data processing while keeping generality. The proposed SIMD based many-core processor composed of 16 processing elements (PEs) exploits large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conversional commercial high-performance processors.

Efficient Maximum Intensity Projection using SIMD Instruction and Streaming Memory Transfer (단일 명령 복수 데이터 연산과 순차적 메모리 참조를 이용한 효율적인 최대 휘소 투영 볼륨 가시화)

  • Kye, Hee-Won
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.4
    • /
    • pp.512-520
    • /
    • 2009
  • Maximum intensity projection (MIP) is a volume rendering method which extracts maximum values along the viewing direction through volume data. It visualizes high-density structures, such as angio-graphic datasets so that it is frequently used in medical imaging systems. We have proposed an efficient two-step MIP acceleration method that uses the recent CPUs. First, we exploited SIMD instructions to reduce conditional branch instructions which take up a considerable part of whole rendering process, so that we improved rendering speed. Second, we proposed a new method, which accesses volume and image data successively by modifying the shear-warp rendering. This method improves memory access patterns so that cache misses are reduced. Using the current CPUs, our method improved the rendering speed by a factor of 7 than that of the shear-warp rendering.

  • PDF

Fast implementation of Hadamard transformation of HEVC with SIMD (SIMD를 이용한 HEVC 하다마드 트랜스폼의 고속 구현)

  • You, Jong-Hun;Jo, Hyun-Ho;Sim, Dong-Gyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2011.11a
    • /
    • pp.307-309
    • /
    • 2011
  • 본 논문에서는 SIMD(Single Instruction Multiple Data) 프로세서를 사용한 HEVC 부호화기의 하다마트 트랜스폼 고속화를 제안한다. 본 논문에서는 MMX와 SSE 레지스터를 사용하여 하다마드 트랜스폼을 SIMD 연산으로 대체함으로써 메모리 접근 횟수와 명령어의 수를 줄여 부호화기를 고속화하였다. 또한, HEVC의 10비트 입력에 따른 SIMD 구조의 비효율적인 구현을 해결하기 위하여 하다마드 트랜스폼의 입력 픽셀 비트수를 감소시키는 IBDD(Internal Bit Depth Decreasing)를 제안했다. HEVC 부호화기에 하다마드 트랜스폼을 SIMD 연산으로 대체한 결과 부호화 효율의 저하 없이, 부호화기의 수행 시간은 10% 감소되었다.

  • PDF

Design of Compiler & Variable-Length Instructions for SIMD Structured Shader (가변길이 SIMD구조 쉐이더 명령어 및 컴파일러 설계)

  • Kwak, Jae-Chang;Park, Tae-Ryoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.12
    • /
    • pp.2691-2697
    • /
    • 2010
  • Shader instructions and Compiler are designed for supporting 3D graphic shader 3.0 API. Variable-length instructions are proposed to reduce the size of hardware of graphic processor in SIMD structure by shortening the length of instructions. The designed shader compiler supports variable and two phased structured instructions, and can be programmable at ESSL level. Conformance Test proposed by Khronos group is accomplished to verify the design result of instructions and complier. The test result shows overall average 37% performance improvement at the 16 functions of basic GL shader.

A Fast SAD Algorithm for Area-based Stereo Matching Methods (영역기반 스테레오 영상 정합을 위한 고속 SAD 알고리즘)

  • Lee, Woo-Young;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.7 no.2
    • /
    • pp.8-12
    • /
    • 2012
  • Area-based stereo matchng algorithms are widely used for image analysis for stereo vision. SAD (Sum of Absolute Difference) algorithm is one of well known area-based stereo matchng algorithms with the characteristics of data intensive computing application. Therefore, it requires very high computation capabilities and its processing speed becomes very slow with software realization. This paper proposes a fast SAD algorithm utilizing SSE (Streaming SIMD Extensions) instructions based on SIMD (Single Instruction Multiple Data) parallism. CPU supporing SSE instructions has 16 XMM registers with 128 bits. For the performance evaluation of the proposed scheme, we compare the processing speed between SAD with/without SSE instructions. The proposed scheme achieves four times performance improvement over the general SAD, which shows the possibility of the software realization of real time SAD algorithm.

A Low Power Design of H.264 Codec Based on Hardware and Software Co-design

  • Park, Seong-Mo;Lee, Suk-Ho;Shin, Kyoung-Seon;Lee, Jae-Jin;Chung, Moo-Kyoung;Lee, Jun-Young;Eum, Nak-Woong
    • Information and Communications Magazine
    • /
    • v.25 no.12
    • /
    • pp.10-18
    • /
    • 2008
  • In this paper, we present a low-power design of H.264 codec based on dedicated hardware and software solution on EMP(ETRI Multi-core platform). The dedicated hardware scheme has reducing computation using motion estimation skip and reducing memory access for motion estimation. The design reduces data transfer load to 66% compared to conventional method. The gate count of H.264 encoder and the performance is about 455k and 43Mhz@30fps with D1(720x480) for H.264 encoder. The software solution is with ASIP(Application Specific Instruction Processor) that it is SIMD(Single Instruction Multiple Data), Dual Issue VLIW(Very Long Instruction Word) core, specified register file for SIMD, internal memory and data memory access for memory controller, 6 step pipeline, and 32 bits bus width. Performance and gate count is 400MHz@30fps with CIF(Common Intermediated format) and about 100k per core for H.264 decoder.

Scalable Application Mapping for SIMD Reconfigurable Architecture

  • Kim, Yongjoo;Lee, Jongeun;Lee, Jinyong;Paek, Yunheung
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.6
    • /
    • pp.634-646
    • /
    • 2015
  • Coarse-Grained Reconfigurable Architecture (CGRA) is a very promising platform that provides fast turn-around-time as well as very high energy efficiency for multimedia applications. One of the problems with CGRAs, however, is application mapping, which currently does not scale well with geometrically increasing numbers of cores. To mitigate the scalability problem, this paper discusses how to use the SIMD (Single Instruction Multiple Data) paradigm for CGRAs. While the idea of SIMD is not new, SIMD can complicate the mapping problem by adding an additional dimension of iteration mapping to the already complex problem of operation and data mapping, which are all interdependent, and can thus significantly affect performance through memory bank conflicts. In this paper, based on a new architecture called SIMD reconfigurable architecture, which allows SIMD execution at multiple levels of granularity, we present how to minimize bank conflicts considering all three related sub-problems, for various RA organizations. We also present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of mapping problem.

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

  • Jung, Yong-Bum;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.99-108
    • /
    • 2011
  • Processor technology is currently continued to parallel processing techniques, not by only increasing clock frequency of a single processor due to the high technology cost and power consumption. In this paper, a SIMD (Single Instruction Multiple Data) based parallel processor is introduced that efficiently processes massive data inherent in multimedia. In addition, this paper proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that efficiently operate on the image and video pixels. The proposed pixel subword parallel processing instructions store and process four 8-bit pixels on the partitioned four 12-bit registers in a 48-bit datapath architecture. This solves the overflow problem inherent in existing multimedia extensions and reduces the use of many packing/unpacking instructions. Experimental results using the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance. This is in contrast to MMX-type instructions (a representative Intel multimedia extension), which achieve a speedup of only $1.4{\times}$ over the same baseline SIMD array performance. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency than the baseline program.

The PC Clustering of the SIMD Structure for a Distributed Process of On-line Contingency (온라인 선로상정사고 분산처리를 위한 SIMD 구조의 PC 클러스터링)

  • Jang, Se-Hwan;Kim, Jin-Ho;Park, June-Ho
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.57 no.7
    • /
    • pp.1150-1156
    • /
    • 2008
  • This paper introduces the PC clustering of the SIMD structure for a distributed processing of on-line contingency to assess a static security of a power system. To execute on-line contingency analysis of a large-scale power system, we need to use high-speed execution device. Therefore, we constructed PC-cluster system using PC clustering method of the SIMD structure and applied to a power system, which relatively shows high quality on the high-speed execution and has a low price. SIMD(single instruction stream, multiple data stream) is a structure that processes are controlled by one signal. The PC cluster system is consisting of 8 PCs. Each PC employs the 2 GHz Pentium 4 CPU and is connected with the others through ethernet switch based fast ethernet. Also, we consider N-1 line contingency that have high potentiality of occurrence realistically. We propose the distributed process algorithm of the SIMD structure for reducing too much execution time on the on-line N-1 line contingency analysis in the large-scale power system. And we have verified a usefulness of the proposed algorithm and the constructed PC cluster system through IEEE 39 and 118 bus system.