• Title/Abstract/Keywords: parallel library

188 search results (processing time: 0.028 s)

Parallel Finite Element Analysis of the Drag of a Car under Road Condition

  • Choi H. G.;Kim B. J.;Kim S. W.;Yoo J. Y.
    • Korean Society for Computational Fluids Engineering: Conference Proceedings / Korean Society for Computational Fluids Engineering, 2003, The Fifth Asian Computational Fluid Dynamics Conference / pp.84-85 / 2003
  • A parallelized FEM code based on the domain decomposition method has recently been developed for large-scale computational fluid dynamics. A 4-step splitting finite element algorithm is adopted for unsteady computation of the incompressible Navier-Stokes equations, and the Smagorinsky LES (Large Eddy Simulation) model is chosen for turbulent flow computation. The METIS and MPI libraries are used for domain partitioning and for data communication between processors, respectively. The Hyundai Motor Tiburon is chosen as the computational model at $Re=7.5{\times}10^{5}$, based on the car height. It is confirmed that the drag under road conditions is smaller than that under wind tunnel conditions.
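
The Smagorinsky model named in this abstract computes an eddy viscosity from the resolved strain rate, $\nu_t = (C_s \Delta)^2 \sqrt{2 S_{ij} S_{ij}}$. The following is a minimal sketch of that standard formula only, not the authors' code; the function name, the constant `c_s = 0.17`, and the sample values are assumptions for illustration.

```python
import math

def smagorinsky_viscosity(grad_u, delta, c_s=0.17):
    """Eddy viscosity nu_t = (c_s * delta)**2 * sqrt(2 * S_ij * S_ij).

    grad_u[i][j] holds du_i/dx_j; delta is the filter width.
    c_s = 0.17 is an assumed textbook value, not taken from the abstract.
    """
    n = len(grad_u)
    s_sq = 0.0
    for i in range(n):
        for j in range(n):
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])  # strain-rate tensor
            s_sq += s_ij * s_ij
    return (c_s * delta) ** 2 * math.sqrt(2.0 * s_sq)

# Hypothetical simple shear du/dy = 10 1/s with a 0.01 m filter width.
shear = [[0.0, 10.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
nu_t = smagorinsky_viscosity(shear, delta=0.01)
```

For simple shear the strain-rate magnitude reduces to the shear rate itself, so the result is just $(C_s \Delta)^2$ times 10.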

부구조법에 의한 영역 분할 및 강소성 유한요소해석의 병렬 계산 (Domain Decomposition using Substructuring Method and Parallel Computation of the Rigid-Plastic Finite Element Analysis)

  • 박근;양동열
    • Korean Society for Technology of Plasticity: Conference Proceedings / Korean Society for Technology of Plasticity, 1998 Spring Conference Proceedings / pp.246-249 / 1998
  • In the present study, domain decomposition using the substructuring method is developed to improve the computational efficiency of finite element analysis of metal forming processes. To avoid calculating an inverse matrix during the substructuring procedure, the modified Cholesky decomposition method is implemented. Because the substructuring method provides data independence, the program is easily parallelized using the Parallel Virtual Machine (PVM) library on a network-connected workstation cluster. A numerical example of simple upsetting is calculated, and the speed-up ratio is evaluated for various domain decompositions and numbers of processors. Comparing the results, it is concluded that the proposed method improves performance.
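
The data independence mentioned in this abstract can be illustrated with static condensation: each subdomain eliminates its interior unknowns on its own, and only the interface system couples them. The sketch below is a toy with scalar interior blocks and one interface unknown, so plain division stands in for the modified Cholesky factorization used in the paper; the function name and the spring-chain numbers are assumptions.

```python
def condense_and_solve(subdomains, k_bb, f_b):
    """Solve a system with one interface DOF shared by scalar interior DOFs.

    Each subdomain is (k_ii, k_ib, f_i): interior stiffness, coupling to the
    interface, and interior load. Each contribution depends only on its own
    subdomain, which is what makes the first loop parallelizable.
    """
    schur = k_bb
    rhs = f_b
    for k_ii, k_ib, f_i in subdomains:   # independent per subdomain
        schur -= k_ib * k_ib / k_ii      # condensed interface stiffness
        rhs -= k_ib * f_i / k_ii         # condensed interface load
    u_b = rhs / schur                    # interface solve
    # Back-substitution, again independent per subdomain.
    u_i = [(f_i - k_ib * u_b) / k_ii for k_ii, k_ib, f_i in subdomains]
    return u_b, u_i

# Hypothetical chain of four unit springs with fixed ends and a unit load
# on the middle (interface) node.
u_b, u_i = condense_and_solve(
    [(2.0, -1.0, 0.0), (2.0, -1.0, 0.0)], k_bb=2.0, f_b=1.0
)
```

In a real substructured solver the interior blocks are matrices, and the divisions become triangular solves against a factorization of each block.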

Chimera 기법의 병렬처리에 관한 연구 (A Study of Parallel Implementations of the Chimera Method)

  • 조금원;권장혁;이승수
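    • Korean Society for Computational Fluids Engineering: Conference Proceedings / Korean Society for Computational Fluids Engineering, 1999 Spring Conference Proceedings / pp.35-47 / 1999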
  • The development of a parallelized aerodynamic simulation process involving moving bodies is presented. The implementation of this process is demonstrated using a fully systemized Chimera methodology for steady and unsteady problems. This methodology consists of Chimera hole-cutting, a new cut-paste algorithm for optimal mesh interface generation, and a two-step search method for donor cell identification. It is fully automated and requires minimal user input. All procedures of the Chimera technique are parallelized on the Cray T3E using the MPI library. Two- and three-dimensional examples are chosen to demonstrate the effectiveness and parallel performance of this procedure.
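
Hole-cutting, the first step listed in this abstract, blanks out background-grid points that fall inside a solid body so they are excluded from the flow solution. The sketch below shows only that basic idea on a uniform Cartesian grid with a circular body; it is not the paper's cut-paste or donor-search algorithm, and the function name and geometry are assumptions.

```python
def cut_holes(nx, ny, dx, center, radius):
    """Return iblank[j][i]: 0 for grid points inside the body, else 1."""
    cx, cy = center
    iblank = []
    for j in range(ny):
        row = []
        for i in range(nx):
            x, y = i * dx, j * dx
            inside = (x - cx) ** 2 + (y - cy) ** 2 < radius ** 2
            row.append(0 if inside else 1)
        iblank.append(row)
    return iblank

# Hypothetical 5 x 5 background grid with a circular body at its center.
iblank = cut_holes(nx=5, ny=5, dx=1.0, center=(2.0, 2.0), radius=1.5)
holes = sum(row.count(0) for row in iblank)
```

Each point is tested independently of the others, so the loop can be split across processors by grid block.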

병렬 연산을 이용한 축류 블레이드의 역설계 (The Inverse Design Technique of Axial Blade Using the Parallel Calculation)

  • 조장근;안재성;박원규
    • Korean Fluid Machinery Association: Conference Proceedings / Korean Fluid Machinery Association, Proceedings of the 1999 Fluid Machinery R&D Conference / pp.200-207 / 1999
  • An efficient inverse design technique based on the MGM (Modified Garabedian-McFadden) method has been developed. The 2-D Navier-Stokes equations are solved to obtain the surface pressure distributions and are coupled with the MGM method to perform the inverse design. The solver is parallelized using the domain decomposition method and the standard MPI library for communication between processors. The MGM method is a residual-correction technique in which the residuals are the differences between the desired and the computed pressure distributions. The developed code was applied to several airfoil shapes and an axial blade. The designs were found to converge well to their target pressure distributions.

AN ASSESSMENT OF PARALLEL PRECONDITIONERS FOR THE INTERIOR SPARSE GENERALIZED EIGENVALUE PROBLEMS BY CG-TYPE METHODS ON AN IBM REGATTA MACHINE

  • Ma, Sang-Back;Jang, Ho-Jong
    • Journal of applied mathematics & informatics / Vol. 25, No. 1_2 / pp.435-443 / 2007
  • Computing the interior spectrum of large sparse generalized eigenvalue problems $Ax\;=\;{\lambda}Bx$, where A and B are large, sparse, and SPD (Symmetric Positive Definite), is often required in areas such as structural mechanics and quantum chemistry, to name a few. Recently, CG-type methods have been found useful and very amenable to parallel computation for very large problems. Also, as in the case of linear systems, a proper choice of preconditioner is known to accelerate the rate of convergence. After the smallest eigenpair is found, we use the orthogonal deflation technique to find the next m-1 eigenvalues, which is also suitable for parallelization. This offers advantages over Jacobi-Davidson methods with partial shifts, which require re-computation of the preconditioner matrix with new shifts. We consider as preconditioners Incomplete LU (ILU)(0) in two variants, over-relaxation (SOR), and Point-symmetric SOR (SSOR). We set m to be 5. We conducted our experiments on matrices from discretizations of partial differential equations by the finite difference method. The generated matrices have dimensions up to 4 million, and the total number of processors is 32. The MPI (Message Passing Interface) library was used for interprocessor communications. Our results show that in general the Multi-Color ILU(0) gives the best performance.
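
Multi-color orderings help parallel preconditioners because unknowns of the same color do not depend on each other and can be updated at once. The sketch below checks this for the simplest case, a red-black (two-color) ordering on a 2-D five-point finite-difference grid; the abstract does not state the grid type or the number of colors used, so both are assumptions here, as are the function names.

```python
def red_black_colors(nx, ny):
    """Color grid point (i, j) by parity: 0 = red, 1 = black."""
    return [[(i + j) % 2 for i in range(nx)] for j in range(ny)]

def neighbors_differ(colors):
    """Check that every five-point-stencil neighbor has the opposite color."""
    ny, nx = len(colors), len(colors[0])
    for j in range(ny):
        for i in range(nx):
            # Checking right and upper neighbors covers every grid edge once.
            for di, dj in ((1, 0), (0, 1)):
                ii, jj = i + di, j + dj
                if ii < nx and jj < ny and colors[jj][ii] == colors[j][i]:
                    return False
    return True

colors = red_black_colors(4, 3)
independent = neighbors_differ(colors)
```

Since no two neighbors share a color, all red unknowns can be processed in parallel, followed by all black unknowns.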

MPI를 이용한 판재성형해석 프로그램의 병렬화 (Parallelization of sheet forming analysis program using MPI)

  • 김의중;서영성
    • Transactions of the Korean Society of Mechanical Engineers A / Vol. 22, No. 1 / pp.132-141 / 1998
  • A parallel version of a sheet forming analysis program was developed. This version is compatible with any parallel computer that supports MPI, one of the most recent and popular message passing libraries. For this purpose, SERI-SFA, a vector version that runs on the Cray Y-MP C90, a sequential vector computer, was used as the source code. For efficiency, the parallelization focused on parts selected by ranking the CPU time consumed in a sample calculation on the Cray Y-MP C90. The subroutines associated with the contact algorithm were selected as target parts. For this work, MPI was used as the message passing library. For performance verification, oil pan and S-rail forming simulations were carried out. Performance was assessed by kernel and total CPU time, along with the theoretical performance given by Amdahl's Law. The results showed some performance improvement within the limits of the selective parallelization.
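
Amdahl's Law, used in this abstract as the theoretical bound, gives the speed-up on N processors as $1 / ((1 - p) + p/N)$ when a fraction $p$ of the run time is parallelized. A minimal sketch follows; the value $p = 0.6$ is a hypothetical figure, not one reported in the abstract.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Theoretical speed-up when only part of the run time is parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)

# Hypothetical case: 60% of the CPU time is in the parallelized subroutines.
for n in (2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.6, n), 3))

# Upper bound as the processor count grows without limit.
limit = 1.0 / (1.0 - 0.6)
```

The bound explains why selective parallelization yields limited gains: the untouched serial part caps the speed-up regardless of processor count.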

선형 어레이 SliM-II 이미지 프로세서 칩 (A linear array SliM-II image processor chip)

  • 장현만;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C / Vol. 35C, No. 2 / pp.29-35 / 1998
  • This paper describes the architecture and design of a SIMD-type parallel image processing chip called SliM-II. The chip has a linear array of 64 processing elements (PEs), operates at 30 MHz in the worst-case simulation, and gives at least 1.92 GIPS. In contrast to existing array processors, such as IMAP, MGAP-2, VIP, etc., each PE has a multiplier that is quite effective for convolution, template matching, etc. The instruction set can execute an ALU operation, data I/O, and inter-PE communication simultaneously in a single instruction cycle. In addition, during the ALU/multiplier operation, SliM-II provides parallel move between the register file and on-chip memory, as in DSP chips. SliM-II can greatly reduce the inter-PE communication overhead due to the idea of sliding, which is a technique of overlapping inter-PE communication with computation. Moreover, the bandwidth of data I/O and inter-PE communication increases due to bit-parallel data paths. We used the COMPASS$^{TM}$ 3.3 V 0.6 $\mu$m standard cell library (v8r4.10). The total number of transistors is about 1.5 million, the core size is 13.2 $\times$ 13.0 mm$^{2}$, and the package type is 208 pin PQ2 (Power Quad 2). The performance evaluation shows that, compared to existing array processors, the proposed architecture gives a significant improvement for algorithms requiring multiplications.
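
The 1.92 GIPS figure is consistent with 64 PEs each issuing one instruction per cycle at 30 MHz. The short check below assumes exactly that issue rate, which the abstract implies but does not state outright; the function name is hypothetical.

```python
def peak_gips(num_pes, clock_hz, instr_per_cycle=1):
    """Peak throughput in giga-instructions per second for a SIMD array."""
    return num_pes * clock_hz * instr_per_cycle / 1e9

# Figures from the abstract: 64 PEs at a 30 MHz worst-case clock.
peak = peak_gips(num_pes=64, clock_hz=30e6)
```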

병렬구조를 이용한 증강현실 구현 (Implementation of augmented reality using parallel structure)

  • 박태룡;허훈;곽재창
    • Journal of IKEEE / Vol. 17, No. 3 / pp.371-377 / 2013
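  • This paper proposes an efficient parallel structure for implementing augmented reality based on the FAST and BRIEF algorithms. The SURF algorithm, well known as an object recognition algorithm, is robust for object recognition but is difficult to implement in real time because of its heavy computational load. Objects were recognized using the FAST and BRIEF algorithms, and to improve performance in an embedded environment, the existing parallel structure based on the OpenMP library was improved, raising the speed by roughly 70% to 100%.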
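
BRIEF produces binary descriptors, so matching reduces to Hamming distance, which is cheap and independent per candidate. The sketch below illustrates that matching step only, with toy 8-bit descriptors; real BRIEF descriptors are much longer, and the function names and values are assumptions rather than the paper's implementation.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def best_match(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(
        range(len(candidates)),
        key=lambda k: hamming_distance(query, candidates[k]),
    )

# Hypothetical 8-bit descriptors for illustration.
descriptors = [0b10110010, 0b01101100, 0b10110110]
idx = best_match(0b10110011, descriptors)
```

Because each distance is computed independently, the candidate loop is a natural target for a parallel-for construct such as the OpenMP one the abstract mentions.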

JPEG2000 이산웨이블릿변환의 컨볼루션기반 non-cascaded 아키텍처를 위한 pipelined parallel 최적화 설계 (A Pipelined Parallel Optimized Design for Convolution-based Non-Cascaded Architecture of JPEG2000 DWT)

  • 이승권;공진흥
    • Journal of the Institute of Electronics Engineers of Korea SD / Vol. 46, No. 7 / pp.29-38 / 2009
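  • In this study, a high-performance parallel pipelined computation circuit consisting of a parallel multiplier, an intermediate buffer, and a parallel accumulator is designed to implement a convolution-based non-cascaded architecture for the real-time discrete wavelet transform (DWT). The convolution multiplications of the DWT can be reduced to about 1/4 through optimization that accounts for the symmetry of the filter coefficients and for up/down sampling, and implementing the multiplication between image data and multiple filter coefficients with a LUT-based parallel-coefficient DA multiplier structure enables 3 to 5 times faster computation. In addition, storing the convolution multiplication results in the intermediate buffer and reusing them during accumulation halves the total number of multiplications and saves computation power. The intermediate buffer stores the products of image data and filter coefficients in an order aligned for the convolution accumulation, and the buffer management structure is designed so that this aligned parallel storage supports fast sequential retrieval by the parallel accumulator. The parallel multiplier and the parallel accumulator form a pipeline through the intermediate buffer, and to raise pipeline efficiency, the parallel structure of the accumulator and the intermediate buffer is determined to match the throughput of the parallel multiplier. To verify the performance of the designed DWT processor, back-end design was carried out using a 0.18 um library, and real-time processing of SVGA ($800{\times}600$) images at 30 fps at 90 MHz was confirmed.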
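
The saving from filter-coefficient symmetry can be seen in a few lines: for a symmetric odd-length filter, mirrored input samples are added first and multiplied once. The sketch below uses a hypothetical 5-tap filter, not the JPEG2000 coefficients, and covers only the symmetry step; the further reduction the abstract attributes to up/down sampling is not modeled.

```python
def direct_fir(x, h, n):
    """Output at index n with one multiplication per tap."""
    c = len(h) // 2
    return sum(h[k] * x[n + k - c] for k in range(len(h)))

def folded_fir(x, h, n):
    """Same output for symmetric odd-length h, adding mirrored samples first."""
    c = len(h) // 2
    acc = h[c] * x[n]                      # center tap
    for k in range(1, c + 1):
        acc += h[c + k] * (x[n - k] + x[n + k])  # one multiply per pair
    return acc

# Hypothetical symmetric filter and input samples.
h = [1, 2, 3, 2, 1]
x = [4, 7, 1, 8, 2, 9, 3]
y_direct = direct_fir(x, h, 3)
y_folded = folded_fir(x, h, 3)
```

Here the folded form uses 3 multiplications instead of 5 for the same result.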

고성능 H.264/AVC 디블로킹 필터를 위한 4-병렬 스케줄링 아키텍처 (A 4-parallel Scheduling Architecture for High-performance H.264/AVC Deblocking Filter)

  • 고병수;공진흥
    • Journal of the Institute of Electronics Engineers of Korea SD / Vol. 49, No. 8 / pp.63-72 / 2012
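  • In this study, a high-performance H.264/AVC deblocking filter that processes high-resolution Quad FHD video in real time is designed. To improve computational throughput, 16 line edge filters are arranged in parallel as 4 block edge filters, and to reduce the internal buffer size and the number of computation cycles, the H.264/AVC deblocking filter order is scheduled as a 4-stage parallel zigzag scan order. A 1-cycle delay is placed between block edge filter operations to prevent data conflicts, and the internal buffers between block edge filters are implemented as interleaving buffers to reduce the internal buffer size. Simulation in a 0.18 um process shows a maximum operating frequency of 90 MHz and a gate count of 140.16 Kgates. At an operating frequency of 90 MHz, the proposed H.264/AVC deblocking filter can process Quad FHD video ($3840{\times}2160$) in real time at 113.17 frames per second.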
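
The reported frame rate implies a cycle budget per macroblock, which can be derived from the figures in the abstract and the standard 16x16 H.264 macroblock size. The calculation below is a back-of-the-envelope check, not a figure stated by the authors; the function name is hypothetical.

```python
def cycles_per_macroblock(clock_hz, fps, width, height, mb_size=16):
    """Average clock cycles available per macroblock at a given frame rate."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return clock_hz / fps / mbs_per_frame

# Figures from the abstract: 90 MHz, 113.17 fps, 3840 x 2160 frames.
budget = cycles_per_macroblock(90e6, 113.17, 3840, 2160)
```

A Quad FHD frame holds 240 x 135 = 32,400 macroblocks, so the budget works out to roughly 24.5 cycles per macroblock.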