• Title/Abstract/Keywords: Parallel Matrix Solver

19 search results, processing time 0.025 s

병렬 컴퓨터를 이용한 형상 압연공정 유한요소 해석의 분산병렬처리에 관한 연구 (Finite Element Analysis of Shape Rolling Process using Distributive Parallel Algorithms on Cray T3E)

  • 권기찬;윤성기
    • 대한기계학회논문집A (Transactions of the Korean Society of Mechanical Engineers A) / Vol. 24, No. 5 / pp.1215-1230 / 2000
  • Parallel approaches on the Cray T3E, an MPP (Massively Parallel Processors) machine, are presented for efficient finite element analysis of 3-D shape rolling processes. A domain decomposition method coupled with a parallel linear equation solver is used. Domain decomposition is applied to obtain the element tangent stiffness matrices and residual vectors. Direct and iterative parallel algorithms are used to solve the linear equations. The direct algorithm is a parallel version of a direct banded matrix solver. For the iterative algorithms, the well-known preconditioned conjugate gradient solver with a Jacobi preconditioner is employed. Moreover, a new, effective iterative scheme with a block inverse matrix preconditioner, so named by the present authors, is presented, and its results are compared with those obtained using the Jacobi preconditioner. PVM and MPI are used for message passing and synchronization between processors. The performance and efficiency of each algorithm are discussed, and the algorithms are compared.
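The Jacobi-preconditioned conjugate gradient iteration named in this abstract can be sketched as follows. This is a minimal serial illustration, not the authors' parallel implementation: it assumes a small dense symmetric positive definite matrix stored as a list of lists, and the name `jacobi_pcg` and its parameters are hypothetical.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (list of lists)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Jacobi preconditioner: M^-1 is the inverse of the diagonal of A.
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # z = M^-1 r
    p = list(z)
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # stop on small residual norm
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

In a distributed setting the matrix-vector product and the dot products are the steps that would need communication between processors; the Jacobi scaling itself is purely local.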

AN ASYNCHRONOUS PARALLEL SOLVER FOR SOME MATRIX PROBLEMS

  • Park, Pil-Seong
    • Journal of applied mathematics & informatics / Vol. 7, No. 3 / pp.1045-1058 / 2000
  • In conventional synchronous parallel computing, workload balance is a crucial factor in reducing the idle time of processors that finish their jobs earlier than others. However, it is difficult to achieve on heterogeneous workstation clusters, where the available computing power of each processor is unpredictable. To overcome this problem, asynchronous methods have emerged and are being increasingly used and studied, but none exists yet for eigenvalue problems. In this paper, we suggest a new asynchronous method for solving some singular matrix problems, which can also be used to find a certain eigenvector of some matrices.

A Fast Poisson Solver of Second-Order Accuracy for Isolated Systems in Three-Dimensional Cartesian and Cylindrical Coordinates

  • Moon, Sanghyuk;Kim, Woong-Tae;Ostriker, Eve C.
    • 천문학회보 (The Bulletin of The Korean Astronomical Society) / Vol. 44, No. 1 / pp.46.1-46.1 / 2019
  • We present an accurate and efficient method to calculate the gravitational potential of an isolated system in three-dimensional Cartesian and cylindrical coordinates subject to vacuum (open) boundary conditions. Our method consists of two parts: an interior solver and a boundary solver. The interior solver adopts an eigenfunction expansion method together with a tridiagonal matrix solver to solve the Poisson equation subject to the zero boundary condition. The boundary solver employs James's method to calculate the boundary potential due to the screening charges required to keep the zero boundary condition for the interior solver. A full computation of gravitational potential requires running the interior solver twice and the boundary solver once. We develop a method to compute the discrete Green's function in cylindrical coordinates, which is an integral part of the James algorithm to maintain second-order accuracy. We implement our method in the Athena++ magnetohydrodynamics code, and perform various tests to check that our solver is second-order accurate and exhibits good parallel performance.

영역 분할에 의한 SIMPLER 모델의 병렬화와 성능 분석 (Implementation and Performance Analysis of a Parallel SIMPLER Model Based on Domain Decomposition)

  • 곽호상;이상산
    • 한국전산유체공학회지 (Journal of the Korean Society for Computational Fluids Engineering) / Vol. 3, No. 1 / pp.22-29 / 1998
  • A SIMPLER finite volume model is implemented in parallel. The parallelism is based on domain decomposition and explicit message passing using MPI and SHMEM. Two parallel solvers for tridiagonal matrix equations are employed. The implementation is verified on the Cray T3E system for a benchmark problem of natural convection in a sidewall-heated cavity. The test results show good scalability of the parallel models. Performance issues are discussed in terms of convergence as well as conventional parallel overheads and single-processor performance. The effectiveness of a localized matrix solution algorithm is demonstrated.
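For reference, the serial baseline for a tridiagonal matrix equation is the Thomas algorithm, sketched below. The abstract does not say which two parallel tridiagonal solvers were used, so this is only the standard sequential method that such solvers are usually compared against. The function name and the array convention are assumptions: `lower[i]` multiplies `x[i-1]` in row `i`, `upper[i]` multiplies `x[i+1]`, and `lower[0]` and `upper[-1]` are ignored.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system; all inputs have length n.

    No pivoting is performed, so the matrix is assumed to be diagonally
    dominant or otherwise safe to eliminate in order.
    """
    n = len(diag)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both sweeps carry a dependency from one row to the next, which is why a tridiagonal solve needs a different formulation to run across domain-decomposed processors.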

고성능 병렬 유한요소 솔버를 이용한 3차원 주시와 진폭계산 (3-D Traveltime and Amplitude Calculation using High-performance Parallel Finite-element Solver)

  • 양동우;김정호
    • 지구물리와물리탐사 (Geophysics and Geophysical Exploration) / Vol. 7, No. 4 / pp.234-244 / 2004
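  • Three-dimensional frequency-domain finite-element modeling of the wave equation requires solving the impedance matrix, a very large sparse matrix. For this reason, 3-D wave-equation modeling has mostly been carried out in the time domain. As part of research on 3-D frequency-domain finite-element modeling of the wave equation, this study attempts frequency-domain 3-D modeling by combining a parallel finite-element solver with the SWEET (Suppressed Wave Equation Estimation of Traveltime) algorithm, which computes traveltimes and amplitudes from the wave-equation solution at a single frequency in the Laplace domain. Unlike traveltimes and amplitudes computed from ray theory, those computed in this way remain accurate in steeply dipping structures and where lateral velocity contrasts are large, and they are useful for Kirchhoff migration. To verify the results, the method was applied to traveltime and amplitude calculation for the SEG/EAGE 3D salt model.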

Parallel Algorithm of Conjugate Gradient Solver using OpenGL Compute Shader

  • Va, Hongly;Lee, Do-keyong;Hong, Min
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 26, No. 1 / pp.1-9 / 2021
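  • The OpenGL compute shader operates differently from the other shader stages and can be used to compute arbitrary data in parallel. This paper proposes a GPU-based parallel algorithm for solving sparse linear systems with the iterative conjugate gradient method in an OpenGL compute shader. The proposed sparse linear solver is intended for large linear systems such as those with symmetric positive definite matrices. Using this algorithm, the paper provides a CPU versus GPU performance comparison on eight examples with different matrix formats. Four well-known matrix formats (Dense, COO, ELL, and CSR) are used for matrix storage. In the performance comparison on eight sparse matrices, the GPU-based linear solver is much faster than the CPU-based one, with average computing times of 0.64 ms on the GPU and 15.37 ms on the CPU.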
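Two of the storage formats named in this abstract, COO and CSR, can be illustrated with a small sketch. It is written in plain Python rather than as a compute shader, and the function names are hypothetical. It converts coordinate triplets to compressed sparse row arrays and performs the sparse matrix-vector product that dominates each conjugate gradient iteration.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets (row, col, value) to CSR arrays."""
    # Count the entries in each row, then prefix-sum into row pointers.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]

    # Scatter each triplet into the next free slot of its row.
    next_slot = row_ptr[:-1]          # slicing makes an independent copy
    col_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        data[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, data


def csr_matvec(row_ptr, col_idx, data, x):
    """Compute y = A x for A stored in CSR form."""
    return [
        sum(data[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]
```

Each output row of `csr_matvec` is independent of the others, so a GPU version could plausibly assign one row per thread or invocation; whether the paper does so is not stated in the abstract.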

PERFORMANCE ENHANCEMENT OF PARALLEL MULTIFRONTAL SOLVER ON BLOCK LANCZOS METHOD

  • Byun, Wan-Il;Kim, Seung-Jo
    • Journal of the Korean Society for Industrial and Applied Mathematics / Vol. 13, No. 1 / pp.13-20 / 2009
  • IPSAP, a finite element analysis program, has been developed for high-performance parallel computing. The program consists of various analysis modules, including stress, vibration, and thermal analysis modules. The M-orthogonal block Lanczos algorithm with shift-invert transformation is used to solve eigenvalue problems in the vibration module. The multifrontal algorithm, one of the most efficient direct linear equation solvers, is applied to the factorization and triangular system solving phases in this block Lanczos iteration routine. In this study, the performance enhancement of IPSAP consists of the following stages: 1) minimizing communication volume in the factorization phase by modifying the parallel matrix subroutines; 2) minimizing idle time in the triangular system solving phase by using a partial inverse of the frontal matrix and the LCM (least common multiple) concept.

대형비대칭 이산행렬의 CRAY-T3E에서의 해법을 위한 확장가능한 병렬준비행렬 (A Scalable Parallel Preconditioner on the CRAY-T3E for Large Nonsymmetric Sparse Linear Systems)

  • 마상백
    • 정보처리학회논문지A (The KIPS Transactions: Part A) / Vol. 8A, No. 3 / pp.227-234 / 2001
  • In this paper we propose a block-type parallel preconditioner for solving large sparse nonsymmetric linear systems, which we expect to be scalable. It is a Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver. For the Laplacian matrix, the SOR method is known to have a nondeteriorating rate of convergence when used with Multi-Color ordering. Since most of the time is spent on the diagonal inversion, which is done on each processor, we expect it to be a good scalable preconditioner. We compared it with four other preconditioners: ILU(0) with wavefront ordering, ILU(0) with Multi-Color ordering, SPAI (SParse Approximate Inverse), and the SSOR preconditioner. Experiments were conducted for the finite difference discretizations of two problems with various mesh sizes up to $1025 \times 1024$. A CRAY-T3E with 128 nodes was used, and the MPI library was used for interprocess communication. The results show that Multi-Color Block SOR is scalable and gives the best performance.

클러스터 시스템에서 3차원 강소성 유한요소법의 병렬처리 (Parallel Processing of 3D Rigid-Plastic FEM on a Cluster System)

  • 최영;서용위
    • 한국정밀공학회지 (Journal of the Korean Society for Precision Engineering) / Vol. 22, No. 1 / pp.122-129 / 2005
  • A parallel rigid-plastic FEM code has been developed on a cluster system. The cluster system, Simforge, has 15 processors and 4.5 GB of total memory. In the developed parallel code, the distributed data of the column-wise partitioned stiffness matrix are stored in compressed row storage format, and a diagonal preconditioned conjugate gradient solver is applied. Block upsetting is analyzed with the parallel code on the Simforge cluster system. In this paper, the analysis results are compared and discussed.

BioFET 시뮬레이션을 위한 CUDA 기반 병렬 Bi-CG 행렬 해법 (CUDA-based Parallel Bi-Conjugate Gradient Matrix Solver for BioFET Simulation)

  • 박태정;우준명;김창헌
    • 전자공학회논문지CI (Journal of the Institute of Electronics Engineers of Korea CI) / Vol. 48, No. 1 / pp.90-100 / 2011
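  • For Bio-FET simulation, which carries a very heavy computational load, this study proposes a parallel Bi-CG (Bi-Conjugate Gradient) method for solving linear equations on modern graphics processors (GPUs), which make it possible to build a massively parallel processing environment at low cost. The proposed method solves in parallel the Poisson equation, which accounts for much of the computation and lengthens the overall simulation time in many fields, including semiconductor device simulation, computational fluid dynamics (CFD), and heat transfer simulation. As a result, on the 3-D FDM problem space used in the paper's tests, computation was up to more than 30 times faster than on a single CPU. The implementation was carried out in NVIDIA's CUDA (Compute Unified Device Architecture) environment, which allows general-purpose parallel programming on GPUs based on NVIDIA's Tesla architecture. Unlike previous studies, which were mostly performed in 32-bit precision (single floating point), this study was performed in 64-bit precision (double floating point), which improved the convergence of the Bi-CG solver. In particular, because CUDA is relatively easy to code in but difficult to optimize, the paper also discusses optimization directions for the proposed Bi-CG solver.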
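The Bi-CG iteration referred to in this abstract can be sketched in a few lines. This is an unpreconditioned, serial, dense-matrix illustration in Python rather than the CUDA implementation described in the paper; the function name `bicg` and its parameters are hypothetical.

```python
def bicg(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned Bi-CG for a square, possibly nonsymmetric A."""
    n = len(b)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    At = [[A[j][i] for j in range(n)] for i in range(n)]   # transpose of A

    x = [0.0] * n
    r = list(b)          # residual of A x = b with x = 0
    rt = list(r)         # shadow residual for the transposed system
    p, pt = list(r), list(rt)
    rho = dot(rt, r)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        q = matvec(A, p)
        qt = matvec(At, pt)
        denom = dot(pt, q)
        if rho == 0.0 or denom == 0.0:
            raise ArithmeticError("Bi-CG breakdown")
        alpha = rho / denom
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * q[i] for i in range(n)]
        rt = [rt[i] - alpha * qt[i] for i in range(n)]
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [r[i] + beta * p[i] for i in range(n)]
        pt = [rt[i] + beta * pt[i] for i in range(n)]
        rho = rho_new
    return x
```

Each iteration needs one product with the matrix and one with its transpose, plus a handful of dot products and vector updates; these are the operations a GPU implementation would parallelize.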