• Title/Summary/Keyword: single-instruction multiple-data

Search Result 77, Processing Time 0.022 seconds

Optimization of Image Averaging Filter Using SIMD Instructions (SIMD 명령어를 이용한 영상의 평균화 필터 최적화)

  • Kim, Jun-Chul;Cui, Xuenan;Park, Eun-Soo;Kim, Hak-Il
    • Proceedings of the KIEE Conference
    • /
    • 2007.10a
    • /
    • pp.431-432
    • /
    • 2007
  • 본 연구에서는 영상의 잡음 제거에 우수한 평균화 필터 알고리즘을 SIMD(Single Instruction Multiple Data) 명령어를 이용하여 고속화하였다. 먼저 순차 알고리즘의 속도 향상을 위한 최적화를 수행하고, SIMD에 적합 하도록 변환 하여 4 또는 16 개의 데이터를 동시에 연산이 가능하도록 하였다. 또한 나누기 연산을 줄이기 위하여 8비트의 데이터 타입과 함께 룩업 테이블(Lookup Table)을 이용하였다. 각각의 데이터 타입에 적합하게 알고리즘을 변환하여 비교하였고 실험을 통하여 기존의 순차 처리방식에 비해 평균적으로 2.5배 이상의 속도 향상을 보았다.

  • PDF

A Study on High-speed Image Binarization Using SIMD (SIMD를 이용한 영상의 고속 이진화에 관한 연구)

  • Kim, Doo-Sik;Lee, Sang-Ho;Kim, Byeong-Geun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.11a
    • /
    • pp.775-778
    • /
    • 2002
  • 영상 이진화란 명도 영상(gray-scaled image)을 이진 영상(bi-leveled image)으로 변환하는 것을 말한다. 영상 이진화는 문서 인식, 비디오 영상 분석 등과 같이 영상처리 분야에서 많이 사용되는 기본적인 영상 처리 과정에 해당한다. 본 논문은 Intel 사의 Pentium 계열 프로세서에서 지원하는 SIMD(Single-Instruction Multiple-Data) 기술을 이용하여 영상 이진화를 고속으로 수행하는 방법을 소개한다. 우편영상에 대하여 실험한 결과, SSE2 명령어로 구현된 프로그램은 기존의 C 언어로 구현된 프로그램에 비하여 4배 이상의 속도 향상을 보였다.

  • PDF

A Study on Tools for Implementing High-speed Neural Network (신경회로망의 고속 구현 방법에 관한 연구)

  • Kim, Pyong-Kun;Kim, Doo-Sik;Lee, Sang-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.11a
    • /
    • pp.377-380
    • /
    • 2002
  • 신경회로망은 문자인식, 자동제어 등의 여러 분야에 널리 쓰이는 방식이다. 그러나 신경회로망을 구현하는데는 연산량이 많아서 실시간으로 구현하기에 어려움이 많이 따른다. 본 논문은 신경회로망을 구현하는데 필요한 연산을 살펴보고 그 연산을 구현하는 방법을 비교 분석하였다. 신경회로망을 구현하기 위해 DSP(Digital Signal Processor), PC의 FPU(Floating Point Unit), Intel사의 Pentium 계열 프로세서에서 지원하는 SIMD(Single Instruction Multiple Data) 기술을 사용하여 결과를 비교 분석 하였다. 신경회로망의 핵심인 MLP(Multi Layer Perceptron) 연산에 대해 실험한 결과 SIMD 기술을 이용하는 방법이 다른 방법에 비해 2배이상 좋은 결과를 나타내었다.

  • PDF

A Fast SIFT Implementation Based on Integer Gaussian and Reconfigurable Processor

  • Su, Le Tran;Lee, Jong Soo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.2 no.3
    • /
    • pp.39-52
    • /
    • 2009
  • Scale Invariant Feature Transform (SIFT) is an effective algorithm in object recognition, panorama stitching, and image matching, however, due to its complexity, real time processing is difficult to achieve with software approaches. This paper proposes using a reconfigurable hardware processor with integer half kernel. The integer half kernel Gaussian reduces the Gaussian pyramid complexity in about half [] and the reconfigurable processor carries out a parallel implementation of a full search Fast SIFT algorithm. We use a low memory, fine grain single instruction stream multiple data stream (SIMD) pixel processor that is currently being developed. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and I/O capabilities of the processor which results in a system that can perform real time image and video compression. We apply this novel implementation to images and measure the effectiveness. Experimental simulation results indicate that the proposed implementation is capable of real time applications.

  • PDF

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.2
    • /
    • pp.100-107
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements(PEs) and operates on a 3-stage pipelining. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with verilog HDL and a FPGA prototype system.

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.1
    • /
    • pp.1-9
    • /
    • 2014
  • This paper proposes a low-latency Sample Adaptive Offset filter (SAO) architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), SAO operation is performed only at the picture level. Most realtime decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce the latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly less memory requirements, and 2) low-latency property enabling efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture with significantly less memory usage, produces a similar decoding time as a picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.

Architecture Exploration of Optimal Many-Core Processors for a Vector-based Rasterization Algorithm (래스터화 알고리즘을 위한 최적의 매니코어 프로세서 구조 탐색)

  • Son, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.1
    • /
    • pp.17-24
    • /
    • 2014
  • In this paper, we implement and evaluate the performance of a vector-based rasterization algorithm for 3D graphics by using a SIMD (single instruction multiple data) many-core processor architecture. In addition, we evaluate the impact of a data-per-processing elements (DPE) ratio that is defined as the amount of data directly mapped to each processing element (PE) within many-core in terms of performance, energy efficiency, and area efficiency. For the experiment, we utilize seven different PE configurations by varying the DPE ratio (or the number PEs), which are implemented in the same 130 nm CMOS technology with a 500 MHz clock frequency. Experimental results indicate that the optimal PE configuration is achieved as the DPE ratio is in the range from 16,384 to 256 (or the number of PEs is in the range from 16 and 1,024), which meets the requirements of mobile devices in terms of the optimal performance and efficiency.

Optimal Economic Load Dispatch using Parallel Genetic Algorithms in Large Scale Power Systems (병렬유전알고리즘을 응용한 대규모 전력계통의 최적 부하배분)

  • Kim, Tae-Kyun;Kim, Kyu-Ho;Yu, Seok-Ku
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.4
    • /
    • pp.388-394
    • /
    • 1999
  • This paper is concerned with an application of Parallel Genetic Algorithms(PGA) to optimal econmic load dispatch(ELD) in power systems. The ELD problem is to minimize the total generation fuel cost of power outputs for all generating units while satisfying load balancing constraints. Genetic Algorithms(GA) is a good candidate for effective parallelization because of their inherent principle of evolving in parallel a population of individuals. Each individual of a population evaluates the fitness function without data exchanges between individuals. In application of the parallel processing to GA, it is possible to use Single Instruction stream, Multiple Data stream(SIMD), a kind of parallel system. The architecture of SIMD system need not data communications between processors assigned. The proposed ELD problem with C code is implemented by SIMSCRIPT language for parallel processing which is a powerfrul, free-from and versatile computer simulation programming language. The proposed algorithms has been tested for 38 units system and has been compared with Sequential Quadratic programming(SQP).

  • PDF

Multi-Core Processor for Real-Time Sound Synthesis of Gayageum (가야금의 실시간 음 합성을 위한 멀티코어 프로세서 구현)

  • Choi, Ji-Won;Cho, Sang-Jin;Kim, Cheol-Hong;Kim, Jong-Myon;Chong, Ui-Pil
    • The KIPS Transactions:PartA
    • /
    • v.18A no.1
    • /
    • pp.1-10
    • /
    • 2011
  • Physical modeling has been widely used for sound synthesis since it synthesizes high quality sound which is similar to real-sound for musical instruments. However, physical modeling requires a lot of parameters to synthesize a large number of sounds simultaneously for the musical instrument, preventing its real-time processing. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based multi-core processor that supports real-time processing of sound synthesis of gayageum which is a representative Korean traditional musical instrument. The proposed SIMD-base multi-core processor consists of 12 processing elements (PE) to control 12 strings of gayageum in which each PE supports modeling of the corresponding string. The proposed SIMD-based multi-core processor can generate synthesized sounds of 12 strings simultaneously after receiving excitation signals and parameters of each string as an input. Experimental results using a sampling reate 44.1 kHz and 16 bits quantization show that synthesis sound using the proposed multi-core processor was very similar to the original sound. In addition, the proposed multi-core processor outperforms commercial processors(TI's TMS320C6416, ARM926EJ-S, ARM1020E) in terms of execution time ($5.6{\sim}11.4{\times}$ better) and energy efficiency (about $553{\sim}1,424{\times}$ better).

Implementation of Parallel Processor for Sound Synthesis of Guitar (기타의 음 합성을 위한 병렬 프로세서 구현)

  • Choi, Ji-Won;Kim, Yong-Min;Cho, Sang-Jin;Kim, Jong-Myon;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.3
    • /
    • pp.191-199
    • /
    • 2010
  • Physical modeling is a synthesis method of high quality sound which is similar to real sound for musical instruments. However, since physical modeling requires a lot of parameters to synthesize sound of a musical instrument, it prevents real-time processing for the musical instrument which supports a large number of sounds simultaneously. To solve this problem, this paper proposes a single instruction multiple data (SIMD) parallel processor that supports real-time processing of sound synthesis of guitar, a representative plucked string musical instrument. To control six strings of guitar, we used a SIMD parallel processor which consists of six processing elements (PEs). Each PE supports modeling of the corresponding string. The proposed SIMD processor can generate synthesized sounds of six strings simultaneously when a parallel synthesis algorithm receives excitation signals and parameters of each string as an input. Experimental results using a sampling rate 44.1 kHz and 16 bits quantization indicate that synthesis sounds using the proposed parallel processor were very similar to original sound. In addition, the proposed parallel processor outperforms commercial TI's TMS320C6416 in terms of execution time (8.9x better) and energy efficiency (39.8x better).