• Title/Summary/Keyword: PE(Processing Element

Search Result 72, Processing Time 0.031 seconds

Design and implementation of the SliM image processor chip (SliM 이미지 프로세서 칩 설계 및 구현)

  • 옹수환;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.10
    • /
    • pp.186-194
    • /
    • 1996
  • The SliM (sliding memory plane) array processor has been proposed to alleviate disadvantages of existing mesh-connected SIMD(single instruction stream- multiple data streams) array processors, such as the inter-PE(processing element) communication overhead, the data I/O overhead and complicated interconnections. This paper presents the deisgn and implementation of SliM image processor ASIC (application specific integrated circuit) chip consisting of mesh connected 5 X 5 PE. The PE architecture implemented here is quite different from the originally proposed PE. We have performed the front-end design, such as VHDL (VHSIC hardware description language)modeling, logic synthesis and simulation, and have doen the back-end design procedure. The SliM ASIC chip used the VTI 0.8$\mu$m standard cell library (v8r4.4) has 55,255 gates and twenty-five 128 X 9 bit SRAM modules. The chip has the 326.71 X 313.24mil$^{2}$ die size and is packed using the 144 pin MQFP. The chip operates perfectly at 25 MHz and gives 625 MIPS. For performance evaluation, we developed parallel algorithms and the performance results showed improvement compared with existing image processors.

  • PDF

Numerical Prediction of Incompressible Flows Using a Multi-Block Finite Volume Method on a Parellel Computer (병렬 컴퓨터에서 다중블록 유한체적법을 이용한 비압축성 유동해석)

  • Kang, Dong-Jin;Sohn, Jeong-Lak
    • The KSFM Journal of Fluid Machinery
    • /
    • v.1 no.1 s.1
    • /
    • pp.72-80
    • /
    • 1998
  • Computational analysis of incompressible flows by numerically solving Navier-Stokes equations using multi-block finite volume method is conducted on a parallel computing system. Numerical algorithms adopted in this study $include^{(1)}$ QUICK upwinding scheme for convective $terms,^{(2)}$ central differencing for other terms $and^{(3)}$ the second-order Euler differencing for time-marching procedure. Structured grids are used on the body-fitted coordinate with multi-block concept which uses overlaid grids on the block-interfacing boundaries. Computational code is parallelized on the MPI environment. Numerical accuracy of the computational method is verified by solving a benchmark test case of the flow inside two-dimensional rectangular cavity. Computation in the axial compressor cascade is conducted by using 4 PE's md, as results, no numerical instabilities are observed and it is expected that the present computational method can be applied to the turbomachinery flow problems without major difficulties.

  • PDF

Efficient Scientific Computation on Cray T3E (Cray T3E에서 효과적인 과학계산의 수행)

  • 김선경
    • Proceedings of the Korea Society for Industrial Systems Conference
    • /
    • 2000.11a
    • /
    • pp.483-489
    • /
    • 2000
  • 슈퍼컴퓨터는 여러 분야에서 많이 이용되고 있으며 특히 과학과 공학 분야에서 해결하려는 응용문제들은 더욱 빠른 컴퓨터에 대한 요구가 보다 많아지고 있다. 이미 단일 프로세서로는 그 요구를 충족시킬 수 없으며 따라서 병렬처리 기법의 도입이 불가피하다. 컴퓨터는 하드웨어만으로 모든 것이 해결되지 않는다. 하드웨어적인 특징을 극대화할 수 있는 알고리즘과 프로그램 등 소프트웨어 개발이 필수적이다. 본 논문에서는 아주 큰 행렬의 극한의 고유치(extreme eigenvalue)를 구하는 란초스(Lanczos) 알고리즘, 또한 아주 큰 선형시스템의 해를 구하는 GMRES방법에 대하여 병렬알고리즘을 제안하고 message-passing 병렬처리 컴퓨터에서 얼마나 효과적으로 수행할 수 있는지 분석한다. 초병렬 컴퓨터(MPP)인 Cray T3E는 128개의 PE(Processing Element)로 구성되어 있는데 사용하는 PE의 수에 따라 병렬알고리즘의 성능분석을 하였다.

  • PDF

A Real-Time Implementation of the Vision System for SMT Automation (SMT자동화를 위한 시각 시스템의 실시간 구현)

  • 전병환;윤일동;김용환;황신환;이상욱;최종수
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.6
    • /
    • pp.944-953
    • /
    • 1990
  • This paper describes design and implementation of a real-time high-precision vision system for an automation of SMT(surface mounting technology ). Also, a part inspection algorithm which calculates the position and direction of SMD(surface mounted device) accurately and performs the ruling using those information are presented, and a parallel processing technique for implementing those algorithms is also described. For a real-time implementation of iage acquisition and processing, several hardware modules, namely, multi-functional A/D-D/A board, frame memory board are developed. Particularly, a PE (processing element) board which employs the DSP56001 DSP (digital signal processor) is developed for the purpose of concurrent processing of part inspection algorithms. A stand-alone vision system is built by integration of the developed hardware modules and related softwares.

  • PDF

Parallel Processing Implementation of Discrete Hartley Transform using Systolic Array Processor Architecture (Systolic Array Processor Architecture를 이용한 Discrete Hartley Transform 의 병렬 처리 실행)

  • Kang, J.K.;Joo, C.H.;Choi, J.S.
    • Proceedings of the KIEE Conference
    • /
    • 1988.07a
    • /
    • pp.14-16
    • /
    • 1988
  • With the development of VLSI technology, research on special processors for high-speed processing is on the increase and studies are focused on designing VLSI-oriented processors for signal processing. This paper processes a one-dimensional systolic array for Discrete Hartley Transform implementation and also processes processing element which is well described for algorithm. The discrete Hartley Transform(DHT) is a real-valued transform closely related to the DFT of a real-valued sequence can be exploited to reduce both the storage and the computation requried to produce the transform of real-valued sequence to a real-valued spectrum while preserving some of the useful properties of the DFT is something preferred. Finally, the architecture of one-dimensional 8-point systolic array, the detailed diagram of PE, total time units concept on implementation this arrays, and modularity are described.

  • PDF

Distributed Stream Processing System with apache Hadoop for PTAM on Xeon Phi Cluster (PTAM을 위한 제온파이 기반 하둡 분산 스트림 프로세싱 시스템)

  • Seo, Jae Min;Cho, Kyu Nam;Kim, Do Hyung;Jeong, Chang-Sung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.10a
    • /
    • pp.184-186
    • /
    • 2015
  • 본 논문에서는 PTAM을 위한 새로운 분산 스트림 프로세싱 시스템을 제안한다. PTAM은 하나의 시스템에서 동작하도록 설계되었다. 이는 PTAM이 가지고 있는 한계점을 말해주는 부분인데, PTAM은 Bundle Adjustment의 계산 부하가 커지는 경우에 map을 구축하는데 있어 많은 시간과 리소스가 필요하다. 이에 하둡을 통해 계산 부하를 분산하고, PE(Processing Element)를 Xeon phi 시스템을 통해 동작되는 시스템을 제안한다.

A Latency Optimization Mapping Algorithm for Hybrid Optical Network-on-Chip (하이브리드 광학 네트워크-온-칩에서 지연 시간 최적화를 위한 매핑 알고리즘)

  • Lee, Jae Hun;Li, Chang Lin;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.7
    • /
    • pp.131-139
    • /
    • 2013
  • To overcome the limitations in performance and power consumption of traditional electrical interconnection based network-on-chips (NoCs), a hybrid optical network-on-chip (HONoC) architecture using optical interconnects is emerging. However, the HONoC architecture should use circuit-switching scheme owing to the overhead by optical devices, which worsens the latency unfairness problem caused by frequent path collisions. This resultingly exert a bad influence in overall performance of the system. In this paper, we propose a new task mapping algorithm for optimizing latency by reducing path collisions. The proposed algorithm allocates a task to a certain processing element (PE) for the purpose of minimizing path collisions and worst case latencies. Compared to the random mapping technique and the bandwidth-constrained mapping technique, simulation results show the reduction in latency by 43% and 61% in average for each $4{\times}4$ and $8{\times}8$ mesh topology, respectively.

A VLSI Architecture of Systolic Array for FET Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.9
    • /
    • pp.1115-1124
    • /
    • 1988
  • A two-dimensional systolic array for fast Fourier transform, which has a regular and recursive VLSI architecture is presented. The array is constructed with identical processing elements (PE) in mesh type, and due to its modularity, it can be expanded to an arbitrary size. A processing element consists of two data routing units, a butterfly arithmetic unit and a simple control unit. The array computes FFT through three procedures` I/O pipelining, data shuffling and butterfly arithmetic. By utilizing parallelism, pipelining and local communication geometry during data movement, the two-dimensional systolic array eliminates global and irregular commutation problems, which have been a limiting factor in VLSI implementation of FFT processor. The systolic array executes a half butterfly arithmetic based on a distributed arithmetic that can carry out multiplication with only adders. Also, the systolic array provides 100% PE activity, i.e., none of the PEs are idle at any time. A chip for half butterfly arithmetic, which consists of two BLC adders and registers, has been fabricated using a 3-um single metal P-well CMOS technology. With the half butterfly arithmetic execution time of about 500 ns which has been obtained b critical path delay simulation, totla FFT execution time for 1024 points is estimated about 16.6 us at clock frequency of 20MHz. A one-PE chip expnsible to anly size of array is being fabricated using a 2-um, double metal, P-well CMOS process. The chip was layouted using standard cell library and macrocell of BLC adder with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on 3.4x2.8mm\ulcornerarea. A built-i self-testing circuit, BILBO (Built-In Logic Block Observation), was employed at the expense of 3% hardware overhead.

  • PDF

Hardware Module for Real-time Integer Pel Motion Estimation of H.264 (H.264 의 실시간 부호화를 위한 정수 단위 화소 움직임 예측 모듈 구조)

  • Shin, Ji-Yong;Lee, In-Jik;Kim, Shin-Dug
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.05a
    • /
    • pp.330-332
    • /
    • 2007
  • 본 논문에서는 정수 단위 화소 움직임 예측(ME: Motion Estimation)을 위한 Unsymmetrical-cross Multi-Hexagon-grid Search (UMHexagonS) 알고리즘을 기반으로, CIF 크기의 영상을 실시간으로 부호화 하기 위한 정수 단위 화소 움직임 예측 모듈을 제안한다. 제안하는 정수 단위 화소 움직임 예측 모듈은 32 개의 1 차원 연산유닛(PE: Processing Element) 배열, 데이터 선택/재배열 유닛, 내부버퍼, 그리고 트리 구조의 덧셈기로 구성되며, CIF 크기의 영상 100 프레임을 부호화 하기 위한 클럭 사이클을 계산하여 실험결과로 제시하였다. 그 결과 제안하는 구조는 400MHz 의 클럭 속도에서 CIF 크기의 영상을 실시간으로 부호화 할 수 있다.

  • PDF

AN IMPLEMENTATION AND EVALUATION OF RANDOMIZED-ANN SIMULATOR USING A PC CLUSTER

  • Morita, Yoshiharu;Nakagawa, Tohru;Kitagawa, Hajime
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 2001.10a
    • /
    • pp.99-102
    • /
    • 2001
  • We propose a PC cluster using general-purpose microprocessors and a high-speed network for simulating ANN (Artificial Neural Network) processes on Linux OS. We apply this cluster to intelligent information processing such as ANN simulation. The elapsed time for simulating ANNs can be reduced from 7,295 seconds by a PE (Processing Element) to 1,226 seconds by six PEs. The reliability of a pattern-classification using ANNs can be improved by the proposed ANN, Randomized-ANN. In order to generate a Randomized-ANN, we choose three ANNs and combine the output results from three huts by means of logical AND. Results are as follows: The mean correct answer rate is 94.4%, the mean wrong answer rate is only 0.1 %, and the mean unknown answer rate is 5.5 %. We make sure that Randomized-ANN approach reduces the mean wrong answer rate within a tenth part and improves the reliability of Japanese coin classification.

  • PDF