• Title/Summary/Keyword: SIMD

Implementation of Multi-Core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 알고리즘을 위한 멀티코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • The KIPS Transactions:PartA / v.18A no.2 / pp.45-52 / 2011
  • In the past, a patient had to go to the room where an ultrasound diagnostic device was installed in order to be examined by a doctor. Today, a doctor can bring a handheld ultrasound device to a patient who stays in his or her room. However, such handheld devices implement only basic functions and cannot meet the high performance required by the ultrasound beam focusing algorithm, which determines the quality of the ultrasound image. In addition, a mobile ultrasound device must have low energy consumption. To satisfy these requirements, this paper proposes a high-performance, low-power single instruction, multiple data (SIMD) based multi-core processor that supports a representative beamforming algorithm among the several focusing methods for mobile ultrasound image signals. The proposed SIMD multi-core processor, which consists of 16 processing elements (PEs), delivers the high performance required by the beamforming algorithm by exploiting the considerable data-level parallelism inherent in ultrasound echo image data. Experimental results showed that the proposed multi-core processor outperforms a commercial high-performance processor, the TI DSP C6416, in execution time (15.8 times better), energy efficiency (6.9 times better), and area efficiency (10 times better).

Parallel Simulation of Bounded Petri Nets using Data Packing Scheme (데이터 중첩을 통한 페트리네트의 병렬 시뮬레이션)

  • 김영찬;김탁곤
    • Journal of the Korea Society for Simulation / v.11 no.2 / pp.67-75 / 2002
  • This paper proposes a parallel simulation algorithm for bounded Petri nets on a single processor that exploits SIMD (Single Instruction Multiple Data)-type parallelism. The proposed algorithm is based on a data packing scheme that packs multiple bytes of data into a single register so that they can be manipulated simultaneously. This parallelism reduces the simulation time of bounded Petri nets in a single-processor environment. The effectiveness of the algorithm is demonstrated by the simulation speed-up obtained for two bounded Petri nets.
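
The data packing idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it only shows how four byte-sized counters packed into one 32-bit word can be added with a single word-wide operation. The function names and the reading of the lanes as token counts of four places are assumptions made for illustration.

```python
LOW7 = 0x7F7F7F7F   # low seven bits of every byte lane
HIGH = 0x80808080   # top bit of every byte lane


def pack_bytes(values):
    """Pack four byte-sized counters into one 32-bit word (lane 0 is lowest)."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack_bytes(word):
    """Split a 32-bit word back into its four byte lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def packed_add(a, b):
    """Add four byte lanes at once, wrapping modulo 256 inside each lane."""
    # Adding only the low 7 bits keeps every carry inside its own lane.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is restored with XOR, which adds without carry.
    return low ^ ((a ^ b) & HIGH)


marking = pack_bytes([1, 2, 3, 4])      # hypothetical token counts of four places
delta = pack_bytes([10, 20, 30, 40])    # hypothetical tokens produced by a firing
print(unpack_bytes(packed_add(marking, delta)))
```

Because the net is bounded, each token count can be assumed to fit in one byte, which is what makes this kind of packing applicable.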

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.10 / pp.11-21 / 2011
  • This paper introduces an SIMD (Single Instruction Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia. It also implements MMX (MultiMedia eXtension)-type instructions on this data-parallel processor and evaluates and analyzes their performance. The reference data-parallel processor consists of 16 processors, each with a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that the MMX-type instructions achieve a 50% performance improvement over the baseline instructions on the same data-parallel architecture. They also achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia-specific instructions, including MMX-type instructions, have potential for widely used many-core GPUs (Graphics Processing Units) and other types of parallel processors.
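
As a rough illustration of what an MMX-type instruction does, the sketch below models a packed unsigned saturating byte addition in the style of the MMX instruction PADDUSB. The abstract does not say which instructions the paper implements, so the choice of operation, the function name, and the 64-bit, eight-lane layout are assumptions.

```python
def packed_add_saturate_u8(a, b, lanes=8):
    """Model a packed unsigned byte add with saturation on two packed words.

    Each 8-bit lane is added independently and clamped to 255 instead of
    wrapping, which is the behavior pixel arithmetic usually needs.
    """
    result = 0
    for lane in range(lanes):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= min(lane_sum, 0xFF) << shift
    return result


# Brightening two pixels at once: 0xF0 saturates, 0x10 does not.
print(hex(packed_add_saturate_u8(0x10F0, 0x2020)))
```

A hardware implementation performs all lanes in one instruction; the loop here only models the lane-wise semantics.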

Multiaccess Memory System supporting Local Buffer Memory System to Processing Elements (처리기에 지역 버퍼 메모리 시스템을 지원하는 다중접근기억장치)

  • Lee, Hyung
    • The Journal of the Korea Contents Association / v.12 no.1 / pp.30-37 / 2012
  • A memory system with the linear skewing scheme has been regarded as one of the suitable memory systems for a single instruction, multiple data (SIMD) architecture. The memory system supports simultaneous access to $n$ data elements in $m$ memory modules, with various access types and a constant interval, at an arbitrary position in a two-dimensional data array of size $M \times N$. Although $m \times \mathit{cells}$ memory cells are physically required for the memory system to support a logical two-dimensional $M \times N$ data array, at least $(m-n) \times \mathit{cells}$ memory cells remain unused, where $\mathit{cells} = (M-1)/q + (N-1)/p \times \lceil M/q \rceil + 1$. This paper proposes using $(n \times t) \times N/p$ of the unused memory cells, where $t > 0$, as local buffer memories for the $n$ processing elements while preserving the functionality that the memory system supports.
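
The general idea of linear skewing can be sketched as follows. The module assignment rule, the coefficients a and b, and the sizes m = 5 and n = 4 are illustrative assumptions and are not taken from the paper; the sketch only shows why a linear mapping with more modules than simultaneously accessed elements lets row, column, and block accesses reach distinct modules.

```python
def module_index(i, j, m=5, a=2, b=1):
    """Assumed linear skewing rule: element (i, j) lives in module (a*i + b*j) mod m."""
    return (a * i + b * j) % m


def is_conflict_free(cells, m=5):
    """True when every accessed element falls in a different memory module."""
    modules = [module_index(i, j, m) for i, j in cells]
    return len(set(modules)) == len(modules)


row_access = [(3, j) for j in range(2, 6)]             # 4 elements along a row
column_access = [(i, 7) for i in range(1, 5)]          # 4 elements along a column
block_access = [(4, 4), (4, 5), (5, 4), (5, 5)]        # a 2x2 block

print(is_conflict_free(row_access),
      is_conflict_free(column_access),
      is_conflict_free(block_access))
```

With n = 4 elements per access and m = 5 modules, one module is idle in every access, which is the kind of spare capacity the abstract proposes to reuse as local buffers.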

A Study of Printed Score Recognition and its Parallel Algorithm (인쇄 악보의 인식과 병렬 알고리즘에 관한 연구)

  • 황영길;김성천
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.5 / pp.959-970 / 1994
  • In this thesis, a printed score is read with a handheld scanner and the recognition process is executed in parallel on a Mesh-Connected Computer. The scanned input is classified into patterns and recognized using knowledge about those patterns. The proposed algorithm minimizes the preprocessing steps and uses simple operations. Score symbols can be recognized irrespective of their size, but their diversity makes it difficult to recognize them all, so the program recognizes only the symbols that are essential and frequently used. The recognition result is converted into the standard MIDI file format. Because high-speed image processing is required, a multiprocessor parallel processing system is needed. A digitized two-dimensional image is well suited to processing on an SIMD Mesh-Connected Computer (MCC). We therefore describe this architecture and present a parallel algorithm for an SIMD MCC with n processors that achieves time complexity O(n).

A k-Tree-Based Resource (CU/PE) Allocation for Reconfigurable MSIMD/MIMD Multi-Dimensional Mesh-Connected Architectures

  • Srisawat, Jeeraporn;Surakampontorn, Wanlop;Alexandridis, Nikitas A.
    • Proceedings of the IEEK Conference / 2002.07a / pp.58-61 / 2002
  • In this paper, we present a new generalized k-Tree-based (CU/PE) allocation model that makes dynamic resource (CU/PE) allocation and deallocation decisions for reconfigurable MSIMD/MIMD multi-dimensional (k-D) mesh-connected architectures. These reconfigurable multi-SIMD/MIMD systems allow tasks to be executed dynamically in either SIMD or MIMD mode. An MIMD task requires only a free sub-system, whereas an SIMD task needs both a free sub-system and a corresponding free CU. In our new k-Tree-based (CU/PE) allocation model, we introduce two best-fit heuristics for the CU allocation decision: 1) the CU depth-first search (CU-DFS), which runs in $O(kN_f)$ time, and 2) the CU adjacent search (CU-AS), which runs in $O(k2^k)$ time. The system performance of these two CU allocation strategies was also investigated in a simulation study. The simulation results showed that the CU-AS and CU-DFS strategies achieved the same system performance when applied to reconfigurable MSIMD/MIMD 2-D and 3-D mesh-connected architectures.

Parallelization method of IDCT with SIMD for fast HEVC decoding (HEVC 고속 복호화를 위한 SIMD 기반의 IDCT 병렬 프로그래밍 기법)

  • Hong, Seungbo;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2013.06a / pp.113-116 / 2013
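  • Recently, many industries, including broadcasting, medicine, aerospace, gaming, UCC, and mobile phones, have been demanding video that approaches reality, and this is becoming feasible with the advent of 3D and Ultra High Definition (UHD) video. To achieve a compression ratio suited to UHD, the Joint Collaborative Team on Video Coding (JCT-VC) began developing High Efficiency Video Coding (HEVC) as the next-generation codec to succeed MPEG-4 Part 10 AVC/H.264. HEVC improves the compression ratio by more than 40% compared with the existing MPEG-4 Part 10 AVC/H.264 codec, but its complexity has also increased. Complexity is especially important in the decoder, and the inverse discrete cosine transform (IDCT) accounts for 8% to 16% of the total decoding time. To reduce the execution time of the IDCT, this paper proposes techniques for parallelizing it efficiently with SIMD instructions, one form of parallel programming. The proposed techniques reduced the IDCT execution time by 59% on average.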

Pipelined Parallel Processing System for Image Processing (영상처리를 위한 Pipelined 병렬처리 시스템)

  • Lee, Hyung;Kim, Jong-Bae;Choi, Sung-Hyk;Park, Jong-Won
    • Journal of IKEEE / v.4 no.2 s.7 / pp.212-224 / 2000
  • In this paper, a parallel processing system is proposed for improving the processing speed of image-related applications. The proposed system is a fully synchronous SIMD computer with a pipelined architecture and consists of processing elements and a multi-access memory system. The multi-access memory system is made up of memory modules and a memory controller, which consists of a memory module selection module, a data routing module, and an address calculating and routing module, and it performs parallel memory accesses of several types: block, horizontal, and vertical. A morphological filter was applied to verify the parallel processing system, and the system delivered satisfactory processing speed.

An Efficient high-speed reverse conversion method of the SIMD base for the decoder of the H.264 (H.264의 복호화기를 위한 SIMD기반의 효율적인 고속 역 변환 방법)

  • Yu Sang-Jun;Kim Seong-Hoon;Oh Seoung-Jun;Sohn Chae-Bong;Ahn Chang-Beom;Park Ho-Chong
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2004.11a / pp.99-102 / 2004
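  • This paper proposes a method that uses SIMD instructions to perform the inverse integer transform and inverse quantization of an H.264 decoder at high speed. The proposed fast inverse transform method gains speed by skipping the inverse transform and inverse quantization for zero blocks. For the low-motion Akiyo sequence, it achieves a speed-up of 7.52 times at QP=0 and 8.1 times at QP=24 over the inverse integer transform and inverse quantization of the reference code. For the high-motion Stefan sequence, it achieves a speed-up of 6.7 times at QP=0 and 7.83 times at QP=36 over the inverse integer transform and inverse quantization of the reference code.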
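
The zero-block shortcut described in this abstract can be sketched in a few lines. This is a scalar model and not the authors' SIMD code: it applies the standard H.264 4x4 inverse core transform and returns early when every coefficient is zero. Inverse quantization is left out, and the function names are assumptions.

```python
def inverse_1d(d0, d1, d2, d3):
    """One-dimensional H.264 4x4 inverse core transform (butterfly form)."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(block):
    """Inverse-transform a 4x4 block of scaled coefficients.

    A block whose coefficients are all zero yields an all-zero residual,
    so the transform is skipped for it entirely.
    """
    if not any(coeff for row in block for coeff in row):
        return [[0] * 4 for _ in range(4)]
    rows = [inverse_1d(*row) for row in block]
    cols = [inverse_1d(*[rows[r][c] for r in range(4)]) for c in range(4)]
    # Final rounding and scaling by 1/64, as in the standard.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


zero_block = [[0] * 4 for _ in range(4)]
dc_block = [[64, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
print(inverse_transform_4x4(zero_block))
print(inverse_transform_4x4(dc_block))
```

The gain from the shortcut depends on how many blocks are all zero in a given sequence and at a given QP.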

Parallel Processing System with combined Architecture of SIMD with MIMD (SIMD와 MIMD가 결합된 구조를 갖는 병렬처리시스템)

  • Lee, Hyung;Choi, Sung-Hyuk;Kim, Jung-Bae;Park, Jong-Won
    • The KIPS Transactions:PartA / v.8A no.1 / pp.9-15 / 2001
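  • Although much research has been devoted to implementing a variety of image-related application systems, slow processing speed causes many difficulties in implementing them. Among the methods that have emerged to address this problem, hardware approaches have recently attracted considerable interest and research. This paper describes a parallel processing system with a hardware architecture for processing images in real time, and reports the processing speed and experimental results obtained after applying the system to a face retrieval system. Because the parallel processing system combines SIMD and MIMD architectures, it provides flexibility and efficiency for a variety of image applications. It consists of 144 processing elements, 12 multi-access memory systems, an interface for external memory modules, and an interface for communicating with an external processor (i960Kx). The multi-access memory system consists of a memory module selection circuit, a data routing circuit, and an address calculation and routing circuit. To parallelize the face retrieval system in a way that suits the parallel processing system, it was organized with the mesh method into preprocessing, normalization, extraction of four feature values, and classification. The functionality and performance of the parallel processing system were verified by simulation with Verilog-XL, a hardware simulation package from CADENCE.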
