• Title/Summary/Keyword: Parallel computer architecture

Search Result 231, Processing Time 0.021 seconds

Transputer-based Pyramidal Parallel Array Computer(TPPAC) architecture (Prelimineary Version) (트랜스퓨터를 사용한 피라미드형 병렬 어레이 컴퓨터 (TPPAC) 구조)

  • Jeong, Chang-Sung;Jeong, Chul-Hwan
    • Proceedings of the KIEE Conference
    • /
    • 1988.07a
    • /
    • pp.647-650
    • /
    • 1988
  • This paper proposes and sketches out a new parallel architecture of transputer-based pyramidal parallel array computer (TPPAC) used to process computationally intensive problems for geometric processing applications such as computer vision, image processing etc. It explores how efficiently the pyramid computer architecture is designed using transputer chips, and poses a new interconnection scheme for TPPAC without using additional transputers.

  • PDF

A Study on Efficient Executions of MPI Parallel Programs in Memory-Centric Computer Architecture

  • Lee, Je-Man;Lee, Seung-Chul;Shin, Dongha
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.1
    • /
    • pp.1-11
    • /
    • 2020
  • In this paper, we present a technique that executes MPI parallel programs, that are developed on processor-centric computer architecture, more efficiently on memory-centric computer architecture without program modification. The technique we present here improves performance by replacing low-speed data communication over the network of MPI library functions with high-speed data communication using the property called fast large shared memory of memory-centric computer architecture. The technique we present in the paper is implemented in two programs. The first program is a modified MPI library called MC-MPI-LIB that runs MPI parallel programs more efficiently on memory-centric computer architecture preserving the semantics of MPI library functions. The second program is a simulation program called MC-MPI-SIM that simulates the performance of memory-centric computer architecture on processor-centric computer architecture. We developed and tested the programs on distributed systems environment deployed on Docker based virtualization. We analyzed the performance of several MPI parallel programs and showed that we achieved better performance on memory-centric computer architecture. Especially we could see very high performance on the MPI parallel programs with high communication overhead.

Design and Implementation of a Massively Parallel Multithreaded Architecture: DAVRID

  • Sangho Ha;Kim, Junghwan;Park, Eunha;Yoonhee Hah;Sangyong Han;Daejoon Hwang;Kim, Heunghwan;Seungho Cho
    • Journal of Electrical Engineering and information Science
    • /
    • v.1 no.2
    • /
    • pp.15-26
    • /
    • 1996
  • MPAs(Massively Parallel Architectures) should address two fundamental issues for scalability: synchronization and communication latency. Dataflow architecture faces problems of excessive synchronization overhead and inefficient execution of sequential programs while they offer the ability to exploit massive parallelism inherent in programs. In contrast, MPAs based on von Neumann computational model may suffer from inefficient synchronization mechanism and communication latency. DAVRID (DAtaflow/Von Neumann RISC hybrID) is a massively parallel multithreaded architecture which takes advantages of von Neumann and dataflow models. It has good single thread performance as well as tolerates synchronization and communication latency. In this paper, we describe the DAVRID architecture in detail and evaluate its performance through simulation runs over several benchmarks.

  • PDF

David II: A new architecture for parallel rendering processors with effective memory system (David II: 효과적인 메모리 시스템을 가지는 병렬 렌더링 프로세서)

  • Lee, Kil-Whan;Park, Woo-Chan;Kim, Il-San;Han, Tack-Don
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.05a
    • /
    • pp.1655-1658
    • /
    • 2004
  • Current rendering processors are organized mainly to process a triangle as fast as possible and recently parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, begin to appear. For high performance in processing triangles, it is desirable for each rasterizer have its own local pixel cache. However, the consistency problem may occur in accessing the data at the same address simultaneously by more than one rasterizer. In this paper, we propose a parallel rendering processor architecture, called DAVID II, resolving such consistency problem effectively. Moreover, the proposed architecture reduces the latency due to a pixel cache miss significantly. The experimental results show that DAVID II achieves almost linear speedup at best case even in sixteen rasterizers.

  • PDF

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems
    • /
    • v.12 no.4
    • /
    • pp.724-740
    • /
    • 2016
  • The performance issues of screening large database compounds and multiple query compounds in virtual screening highlight a common concern in Chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions in parallel platforms, specifically, the multi-core architecture of CPU and many-core architecture of graphical processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely, similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. The sequential, optimized sequential and parallel OpenMP of group fusion were implemented and evaluated. The outcome of the analysis from these three different design approaches influenced the design of parallel CUDA version in order to optimize and achieve high computation intensity. The proposed parallel CUDA performed better than sequential and parallel OpenMP in terms of both execution time and speedup. The parallel CUDA was 5-10x faster than sequential and parallel OpenMP as both similarity search and fusion MAX stages had been CUDA-optimized.

An Extended Evaluation Algorithm in Parallel Deductive Database (병렬 연역 데이타베이스에서 확장된 평가 알고리즘)

  • Jo, U-Hyeon;Kim, Hang-Jun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.7
    • /
    • pp.1680-1686
    • /
    • 1996
  • The deterministic update method of intensional predicates in a parallel deductive database that deductive database is distributed in a parallel computer architecture in needed. Using updated data from the deterministic update method, a strategy for parallel evaluation of intensional predicates is required. The paper is concerned with an approach to updating parallel deductive database in which very insertion or deletion can be performed in a deterministic way, and an extended parallel semi-naive evaluation algorithm in a parallel computer architecture. After presenting an approach to updating intensional predicates and strategy for parallel evaluation, its implementation is discussed. A parallel deductive database consists of the set of facts being the extensional database and the set of rules being the intensional database. We assume that these sets are distributed in each processor, research how to update intensional predicates and evaluate using the update method. The parallel architecture for the deductive database consists of a set of processors and a message passing network to interconnect these processors.

  • PDF

Efficient Executions of MPI Parallel Programs in Memory-Centric Computer Architecture (메모리 중심 컴퓨터 구조에서 MPI 병렬 프로그램의 효율적인 수행)

  • Lee, Je-Man;Lee, Seung-Chul;Shin, Dong-Ha
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.07a
    • /
    • pp.257-258
    • /
    • 2019
  • 본 논문에서는 "프로세서 중심 컴퓨터 구조"에서 개발된 MPI 병렬 프로그램을 수정하지 않고 "메모리 중심 컴퓨터 구조"에서 더 효율적으로 수행시키는 기술을 제안한다. 본 연구에서 제안하는 기술은 메모리 중심 컴퓨터 구조가 가지는 "빠른 대용량 공유 메모리" 특징을 이용하여 MPI 표준 라이브러리가 수행하는 네트워크 통신을 통한 느린 데이터 전달을 공유 메모리를 통한 빠른 데이터 전달로 대체하여 효율성을 얻는다. 본 연구에서 제안한 기술은 도커 가상화 기술을 사용한 분산 시스템 환경에서 MC-MPI-LIB 라이브러리 및 MC-MPI-SIM 시뮬레이터로 구현되었으며 다수의 MPI 병렬 프로그램으로 시험 수행하여 효율성이 있음을 보였다.

  • PDF

PERFORMANCE OF A KNIGHT TOUR PARALLEL ALGORITHM ON MULTI-CORE SYSTEM USING OPENMP

  • VIJAYAKUMAR SANGAMESVARAPPA;VIDYAATHULASIRAMAN
    • Journal of applied mathematics & informatics
    • /
    • v.41 no.6
    • /
    • pp.1317-1326
    • /
    • 2023
  • Today's computers, desktops and laptops were build with multi-core architecture. Developing and running serial programs in this multi-core architecture fritters away the resources and time. Parallel programming is the only solution for proper utilization of resources available in the modern computers. The major challenge in the multi-core environment is the designing of parallel algorithm and performance analysis. This paper describes the design and performance analysis of parallel algorithm by taking the Knight Tour problem as an example using OpenMP interface. Comparison has been made with performance of serial and parallel algorithm. The comparison shows that the proposed parallel algorithm achieves good performance compared to serial algorithm.

Pipelined Parallel Processing System for Image Processing (영상처리를 위한 Pipelined 병렬처리 시스템)

  • Lee, Hyung;Kim, Jong-Bae;Choi, Sung-Hyk;Park, Jong-Won
    • Journal of IKEEE
    • /
    • v.4 no.2 s.7
    • /
    • pp.212-224
    • /
    • 2000
  • In this paper, a parallel processing system is proposed for improving the processing speed of image related applications. The proposed parallel processing system is fully synchronous SIMD computer with pipelined architecture and consists of processing elements and a multi-access memory system. The multi-access memory system is made up of memory modules and a memory controller, which consists of memory module selection module, data routing module, and address calculating and routing module, to perform parallel memory accesses with the variety of types: block, horizontal, and vertical access way. Morphological filter had been applied to verify the parallel processing system and resulted in faithful processing speed.

  • PDF

Design and Performance Analysis of the H/V-bus Parallel Computer (H/V-버스 병렬컴퓨터의 설계 및 성능 분석)

  • 김종현
    • Journal of the Korea Society for Simulation
    • /
    • v.3 no.1
    • /
    • pp.29-42
    • /
    • 1994
  • The architecture of a MIMD-type parallel computer system is specified: a simulator is developed to support design and evaluation of systems based on the architecture: and conducted with the simulator to evaluate system performance. The horizontal/vertical-bus(H/V-bus) system architecture provides an NxN array of processing elements which communicate with each other through a network of N horizontal buses and N vertical buses. The simulator, written in SLAM II and FORTRAN, is designed to provide high-resolution in simulating the IPC mechanism. Parameters provide the user with independent control of system size, PE speed and IPC mechanism speed. Results generated by the simulator include execution times, PE utilizations, queue lengths, and other data. The simulator is used to study system performance when a partial differential equation is solved by parallel Gauss-Seidel method. For comparisons, the benchmark is also executed on a single-bus system simulator that is derived from the H/V-bus system simulator. The benchmark is also solved on a single PE to obtain data for computing speedups. An extensive analysis of results is presented.

  • PDF