• Title/Summary/Keyword: Parallel computing model

Implementation and Performance Evaluation of an Object-Oriented Parallel Programming Environment with Multithreaded Computational Model

  • Song, Jong-Hun; Kim, Heung-Hwan; Han, Sang-Yeong
    • Journal of KIISE:Computing Practices and Letters / v.5 no.6 / pp.708-718 / 1999
  • In this paper, we propose an object-oriented parallel programming environment based on a multithreaded computation model for the hardware architecture of general parallel processing systems. To implement the environment efficiently, we developed several techniques: in the compiler, member-variable access analysis and method parallelism analysis; in the runtime system, run-time thread/message merging and frame sharing. The environment is implemented with the MPI message interface, and its performance is analyzed by running benchmark programs. The results show that the runtime-system techniques contribute substantially to performance improvement and that these results are applicable to general parallel processing systems.

EFFICIENT COMPUTATION OF COMPRESSIBLE FLOW BY HIGHER-ORDER METHOD ACCELERATED USING GPU

  • Chang, T.K.; Park, J.S.; Kim, C.
    • Journal of computational fluids engineering / v.19 no.3 / pp.52-61 / 2014
  • The present paper deals with the efficient computation of higher-order CFD methods for compressible flow using graphics processing units (GPUs). Higher-order CFD methods, such as discontinuous Galerkin (DG) methods and correction procedure via reconstruction (CPR) methods, can achieve arbitrarily high-order accuracy with a compact stencil on unstructured meshes. However, they are much more computationally expensive than the widely used finite volume methods (FVM). A GPU, consisting of hundreds or thousands of small cores, is well suited to massively parallel computation of compressible flow with higher-order CFD methods and can greatly reduce computational time. A higher-order multi-dimensional limiting process (MLP) is applied to robustly control numerical oscillations around shock discontinuities and is implemented efficiently on the GPU. The program is written and optimized using NVIDIA's CUDA. All algorithms are implemented to ensure accurate and efficient parallel computation on the shared-memory model of the GPU. Extensive numerical experiments validate that the GPU successfully accelerates the computation of compressible flow with the higher-order method.

Efficient Processing of Huge Airborne Laser Scanned Data Utilizing Parallel Computing and Virtual Grid

  • Han, Soo-Hee; Heo, Joon; Lkhagva, Enkhbaatar
    • Journal of Korea Spatial Information System Society / v.10 no.4 / pp.21-26 / 2008
  • A method for processing huge airborne laser scanned data using parallel computing and a virtual grid is proposed, and it is tested by generating a raster DSM (Digital Surface Model) with IDW (Inverse Distance Weighting) interpolation. Parallelism is used for fast interpolation of the huge point data, and the virtual grid is adopted to improve the efficiency of searching irregularly distributed points. Processing time was measured on a cluster of one master node and six slave nodes, yielding a parallel efficiency close to 1 and scalability with data load. In addition, data too large to be processed on a single system were processed on the cluster.
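The IDW interpolation and virtual-grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: points are bucketed into square cells of a hypothetical size `cell_size`, each raster cell centre is interpolated from the points in the surrounding 3×3 block of grid cells with IDW power 2, and raster rows are split into contiguous blocks handled by separate workers. Threads stand in for the cluster's slave nodes here; a real cluster would distribute the blocks with processes or MPI. All function names are hypothetical.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def build_virtual_grid(points, cell_size):
    """Bucket (x, y, z) points into square cells keyed by integer cell index."""
    grid = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[key].append((x, y, z))
    return grid

def idw_at(grid, cell_size, qx, qy, radius_cells=1, power=2.0):
    """IDW estimate at (qx, qy) using only points in nearby grid cells."""
    ci, cj = math.floor(qx / cell_size), math.floor(qy / cell_size)
    num = den = 0.0
    for di in range(-radius_cells, radius_cells + 1):
        for dj in range(-radius_cells, radius_cells + 1):
            # .get() does not insert empty cells into the defaultdict
            for x, y, z in grid.get((ci + di, cj + dj), ()):
                d = math.hypot(x - qx, y - qy)
                if d == 0.0:
                    return z  # query coincides with a sample point
                w = 1.0 / d ** power
                num += w * z
                den += w
    return num / den if den > 0.0 else float("nan")  # no points nearby

def dsm_rows(grid, cell_size, x0, y0, res, ncols, rows):
    """Interpolate every cell centre in the given raster rows."""
    return [[idw_at(grid, cell_size, x0 + (c + 0.5) * res, y0 + (r + 0.5) * res)
             for c in range(ncols)] for r in rows]

def make_dsm(points, x0, y0, res, nrows, ncols, cell_size, workers=4):
    """Build the grid once, then give each worker a contiguous block of rows."""
    grid = build_virtual_grid(points, cell_size)
    step = math.ceil(nrows / workers)
    blocks = [range(s, min(s + step, nrows)) for s in range(0, nrows, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda rows: dsm_rows(grid, cell_size, x0, y0, res, ncols, rows), blocks)
    return [row for part in parts for row in part]  # map() preserves block order
```

Each raster cell reads only the shared, read-only grid, so row blocks need no communication with one another, which is what makes this step straightforward to parallelize.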

Three Dimensional FE Analysis of Acoustic Emission of Composite Plate

  • Paik, Seung-Hoon; Park, Si-Hyong; Kim, Seung Jo
    • Composites Research / v.18 no.5 / pp.15-20 / 2005
  • In this paper, damage-induced acoustic emission in a composite plate is numerically simulated using the three-dimensional finite element method with explicit time integration. The acoustic source is modeled as an equivalent volume source. To verify the proposed method, dynamic displacements due to the elastic wave are compared with experiment for a fiber break in an isotropic plate with a single embedded fiber. For laminated composite plates, results from a homogenized model are compared with a DNS approach that models fibers and matrix separately. Capturing the high frequencies in the elastic wave requires a small time step and a large number of elements, so parallel computing is introduced to solve the resulting large-scale problem efficiently.
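Why explicit time integration and high frequencies together demand a small time step and a fine mesh can be seen from the standard stability (Courant) limit for explicit finite element dynamics. The relations below are general textbook forms for an isotropic material, not the paper's own criterion: $L_{\min}$ is the smallest element dimension, $c_d$ the dilatational wave speed, $\lambda$ and $\mu$ the Lamé constants, $\rho$ the mass density, $f_{\max}$ the highest frequency to be resolved, and $n$ an assumed rule-of-thumb number of elements per wavelength.

```latex
\Delta t \le \Delta t_{\mathrm{cr}} \approx \frac{L_{\min}}{c_d},
\qquad c_d = \sqrt{\frac{\lambda + 2\mu}{\rho}},
\qquad L_{\min} \lesssim \frac{c_d}{n\, f_{\max}}
\quad\Longrightarrow\quad \Delta t \lesssim \frac{1}{n\, f_{\max}} .
```

Raising $f_{\max}$ shrinks the admissible element size, which in turn shrinks the stable time step, so in 3D the element count grows roughly as $L_{\min}^{-3}$ while the number of time steps grows as well.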

Numerical simulation on fluid-structure interaction of wind around super-tall building at high Reynolds number conditions

  • Huang, Shenghong; Li, Rong; Li, Q.S.
    • Structural Engineering and Mechanics / v.46 no.2 / pp.197-212 / 2013
  • With more and more high-rise buildings being constructed in recent decades, bluff-body flow at high Reynolds number and large scale has become an important topic in theoretical research and engineering applications. From a mechanics standpoint, the key problems in such flow are high-Reynolds-number turbulence and fluid-solid interaction. To address these problems, a parallel fluid-structure interaction (FSI) method based on a socket parallel architecture was established and combined with the large eddy simulation (LES) methods and models recently developed by the authors. The new method is validated by full two-way FSI simulations of a 1:375 CAARC building model at Re = 70,000 and a full-scale Taipei 101 high-rise building at Re = 10^8. The results show that the proposed method and models have the potential to perform high-Reynolds-number LES and efficient two-way coupling between detailed fluid dynamics and structural dynamics computations, so that the detailed wind-induced responses of high-rise buildings can be resolved in practice.

Implementation of Fast Rasterizer processing using GPGPU based on SIMT structure

  • Kim, Chiyong
    • Journal of IKEEE / v.21 no.3 / pp.276-279 / 2017
  • In this paper, a SIMT-based GPGPU (General-Purpose computing on Graphics Processing Units) is used to accelerate the rasterizer, which composes the screen of a display device pixel by pixel. A GPU has a large number of ALUs and processes data very quickly in parallel. We therefore implemented a rasterizer that renders a 3D graphics model on both a CPU, which performs operations sequentially, and a GPU, which performs them in parallel. The proposed GPU rasterizer performed 1.45 times better than the rasterizer running on an Intel CPU when generating one frame.
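The per-pixel independence that makes rasterization a good fit for SIMT hardware can be illustrated with the common edge-function (half-space) test. This Python sketch assumes that general technique and is not the paper's CUDA code: `pixel_kernel` is the work one GPU thread would do for one pixel, while `rasterize_triangle` runs it sequentially over the triangle's bounding box, as a CPU would. Pixels exactly on an edge are counted inclusively, so a production rasterizer would add a tie-breaking (top-left) rule for shared edges.

```python
import math

def edge(ax, ay, bx, by, px, py):
    """Edge function: positive when P lies to the left of the directed edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def pixel_kernel(x, y, v0, v1, v2, area):
    """Per-pixel coverage test; on a SIMT GPU each (x, y) would map to one thread."""
    px, py = x + 0.5, y + 0.5  # sample at the pixel centre
    w0 = edge(*v1, *v2, px, py)
    w1 = edge(*v2, *v0, px, py)
    w2 = edge(*v0, *v1, px, py)
    if area > 0:  # counter-clockwise triangle
        return w0 >= 0 and w1 >= 0 and w2 >= 0
    return w0 <= 0 and w1 <= 0 and w2 <= 0  # clockwise triangle

def rasterize_triangle(v0, v1, v2, width, height):
    """Return the set of pixels whose centres fall inside the triangle (edges inclusive)."""
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return set()  # degenerate triangle covers nothing
    xs, ys = (v0[0], v1[0], v2[0]), (v0[1], v1[1], v2[1])
    # Clip the bounding box to the screen so only candidate pixels are tested.
    x_lo, x_hi = max(math.floor(min(xs)), 0), min(math.ceil(max(xs)), width - 1)
    y_lo, y_hi = max(math.floor(min(ys)), 0), min(math.ceil(max(ys)), height - 1)
    # Sequential loop as on a CPU; every iteration is independent of the others.
    return {
        (x, y)
        for y in range(y_lo, y_hi + 1)
        for x in range(x_lo, x_hi + 1)
        if pixel_kernel(x, y, v0, v1, v2, area)
    }
```

For example, the right triangle with vertices (0, 0), (4, 0), (0, 4) on a 4×4 screen covers the 10 pixels with x + y ≤ 3, whichever winding order is used.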

Numerical Study on the Drag of a Car Model under Road Condition

  • Kim, Beom-Jun; Kang, Sung-Woo; Choi, Hyoung-gwon; Yoo, Jung-Yul
    • Transactions of the Korean Society of Mechanical Engineers B / v.27 no.8 / pp.1182-1190 / 2003
  • A parallelized FEM code based on the domain decomposition method has recently been developed for large-scale computational fluid dynamics. A 4-step splitting finite element algorithm is adopted for unsteady computation of the incompressible Navier-Stokes equations, and the Smagorinsky LES model is chosen for turbulent flow computation. The METIS and MPI libraries are used for domain partitioning and for data communication between processors, respectively. The Tiburon model of Hyundai Motor Company is chosen as the computational model at $Re = 7.5 \times 10^{5}$, based on the car height. The calculation is carried out under both the wind-tunnel condition and the road condition on the IBM SP parallel architecture at the KISTI Supercomputing Center. Compared with existing experimental data, both the velocity and pressure fields are predicted reasonably well, and the drag coefficient is in good agreement. Furthermore, the drag under the road condition is confirmed to be smaller than that under the wind-tunnel condition.
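For reference, the Smagorinsky model closes the filtered equations with an eddy viscosity computed from the resolved strain rate, and the quoted Reynolds number uses the car height as the length scale. The relations below are the standard textbook forms; the abstract does not give the Smagorinsky constant $C_s$, the filter width $\Delta$, or the free-stream velocity $U_\infty$, so these symbols are generic. Here $\bar{u}_i$ is the filtered velocity, $H$ the car height, and $\nu$ the kinematic viscosity.

```latex
\nu_t = (C_s \Delta)^2 \lvert \bar{S} \rvert,
\qquad \lvert \bar{S} \rvert = \sqrt{2\, \bar{S}_{ij} \bar{S}_{ij}},
\qquad \bar{S}_{ij} = \frac{1}{2}\left( \frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i} \right),
\qquad Re = \frac{U_\infty H}{\nu} = 7.5 \times 10^{5} .
```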

Adaptive Application Component Mapping for Parallel Computation Offloading in Variable Environments

  • Fan, Wenhao; Liu, Yuan'an; Tang, Bihua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.11 / pp.4347-4366 / 2015
  • Unlike traditional strategies that offload an application's computation to a single server, parallel computation offloading can improve performance by delivering the computation simultaneously to multiple computing resources around the mobile terminal. However, because communication and computation environments vary, static application component multi-partitioning algorithms have difficulty keeping their solutions optimal in time-varying scenarios, whereas re-running the algorithm too often in response to environmental changes may incur excessive algorithm costs. To this end, this paper proposes an adaptive application component mapping algorithm for parallel computation offloading in variable environments, which aims to minimize computation costs and inter-resource communication costs. It provides the terminal with a solution suited to the current environment at a low incremental algorithm cost. We represent the application component multi-partitioning problem as a graph mapping model and then convert it into a pathfinding problem. A genetic algorithm enhanced by an elite-based immigrants mechanism is designed to obtain the solution adaptively; it dynamically adjusts the precision of the solution and speeds up the search as transmission and processing speeds change. Simulation results demonstrate that our algorithm improves performance efficiently and is largely superior to traditional approaches in variable environments.

Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs

  • Kim, Youngtae; Hwang, Gyuhyeon
    • Journal of KIISE / v.42 no.12 / pp.1467-1473 / 2015
  • In this paper, we implemented a parallel random number generation program on GPUs, which are known for high-performance computing, using an LCG (Linear Congruential Generator). Random numbers are important in all fields that require randomness, and the LCG is one of the most widely used methods for generating pseudo-random numbers. We explained the parallel program using the NVIDIA CUDA model and MPI (Message Passing Interface) and showed uniform-distribution and performance results. We also used a Monte Carlo algorithm to calculate pi ($\pi$), comparing the parallel random number generator with cuRAND, a CUDA library, and showed that our program is much more efficient. Finally, we compared performance results on multiple GPUs with ideal speedups.
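One common way to parallelize an LCG, sketched here as an assumption because the abstract does not say which scheme the authors used, is block splitting with jump-ahead: since $x_{n+k}$ is again an affine function of $x_n$ modulo $M$, each thread can compute its own starting state in $O(\log k)$ steps and then generate a disjoint, contiguous block of the single sequential stream. The constants are the widely published Numerical Recipes values, chosen only for illustration, and the loop over threads stands in for CUDA threads or MPI ranks. All function names are hypothetical.

```python
M = 2**32          # modulus (assumed 32-bit state)
A = 1664525        # multiplier (Numerical Recipes constants, an assumption)
C = 1013904223     # increment

def lcg_next(x):
    """One LCG step: x_{n+1} = (A * x_n + C) mod M."""
    return (A * x + C) % M

def lcg_jump(k):
    """Coefficients (A_k, C_k) with x_{n+k} = (A_k * x_n + C_k) mod M, by repeated squaring."""
    a_k, c_k = 1, 0          # accumulated map, starts as the identity
    a, c = A, C              # map for 2**i steps
    while k:
        if k & 1:
            a_k, c_k = (a * a_k) % M, (a * c_k + c) % M  # apply 2**i steps after the accumulated map
        a, c = (a * a) % M, (a * c + c) % M              # square: 2**(i+1) steps
        k >>= 1
    return a_k, c_k

def thread_block(seed, thread_id, block_len):
    """Numbers x_{t*n+1} .. x_{t*n+n} for thread t, found by jumping ahead from the seed."""
    a_k, c_k = lcg_jump(thread_id * block_len)
    x = (a_k * seed + c_k) % M
    out = []
    for _ in range(block_len):
        x = lcg_next(x)
        out.append(x)
    return out

def sequential(seed, count):
    """Serial reference stream x_1 .. x_count."""
    x, out = seed, []
    for _ in range(count):
        x = lcg_next(x)
        out.append(x)
    return out

def monte_carlo_pi(seed, threads, block_len):
    """Each loop iteration stands in for one thread; hit counts are summed as in a reduction."""
    hits = pairs = 0
    for t in range(threads):
        u = [x / M for x in thread_block(seed, t, block_len)]  # map to [0, 1)
        for i in range(0, block_len - 1, 2):
            pairs += 1
            if u[i] * u[i] + u[i + 1] * u[i + 1] <= 1.0:
                hits += 1
    return 4.0 * hits / pairs
```

Because the blocks concatenate to exactly the serial stream, the generated numbers are the same however the stream is split, which makes it easy to check parallel output against a serial reference.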

A topology optimization method of multiple load cases and constraints based on element independent nodal density

  • Yi, Jijun; Rong, Jianhua; Zeng, Tao; Huang, X.
    • Structural Engineering and Mechanics / v.45 no.6 / pp.759-777 / 2013
  • In this paper, a topology optimization method based on element independent nodal density (EIND) is developed for continuum solids with multiple load cases and multiple constraints. The optimization problem is formulated as minimizing volume subject to displacement constraints. The nodal densities of the finite element mesh are used as the design variables and are interpolated to any point in the design domain by the Shepard interpolation scheme and the Heaviside function. Without additional constraints (such as a filtering technique), mesh-independent, checkerboard-free, distinct optimal topologies can be obtained. Adopting the rational approximation of material properties (RAMP), the topology optimization procedure is implemented using a solid isotropic material with penalization (SIMP) method and a dual programming optimization algorithm. Computational efficiency is greatly improved by multithreaded parallel computing with OpenMP on the shared-memory model of parallel computation. Finally, several examples are presented to demonstrate the effectiveness of the developed techniques.
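The density pipeline in the abstract (nodal design variables, Shepard interpolation to an arbitrary point, a Heaviside projection, then a RAMP-type material law) can be sketched as below. The specific forms are assumptions for illustration and may differ from the paper: a common modified Shepard weight $w_i = \left((R - d_i)_+ / (R\, d_i)\right)^2$ with a hypothetical influence radius `radius`, the smooth Heaviside approximation $H(\rho) = 1 - e^{-\beta\rho} + \rho e^{-\beta}$, and the RAMP law $E(\rho) = E_{\min} + (E_0 - E_{\min})\,\rho / (1 + q(1-\rho))$ with a hypothetical penalty `q = 8`. All function names are hypothetical.

```python
import math

def shepard_density(px, py, nodes, densities, radius):
    """Modified Shepard interpolation of nodal densities at point (px, py)."""
    num = den = 0.0
    for (nx, ny), rho in zip(nodes, densities):
        d = math.hypot(px - nx, py - ny)
        if d == 0.0:
            return rho                      # point coincides with a node
        if d < radius:                      # only nodes inside the influence radius
            w = ((radius - d) / (radius * d)) ** 2
            num += w * rho
            den += w
    return num / den if den > 0.0 else 0.0  # no nodes in range: treat as void

def heaviside(rho, beta):
    """Smooth Heaviside approximation: 0 at rho = 0, 1 at rho = 1, sharper as beta grows."""
    return 1.0 - math.exp(-beta * rho) + rho * math.exp(-beta)

def ramp_modulus(rho, e0=1.0, q=8.0, e_min=1e-9):
    """RAMP interpolation of Young's modulus; e_min keeps void regions non-singular."""
    return e_min + (e0 - e_min) * rho / (1.0 + q * (1.0 - rho))

def point_modulus(px, py, nodes, densities, radius, beta=4.0):
    """Density at an arbitrary point -> projected density -> material stiffness."""
    rho = shepard_density(px, py, nodes, densities, radius)
    return ramp_modulus(heaviside(rho, beta))
```

Because each point's density depends only on nearby nodes, evaluating many integration points is independent work, which is the kind of loop that OpenMP-style shared-memory threading parallelizes.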