• Title/Summary/Keyword: Parallel computation

An Optimal Parallel Algorithm for Generating Computation Tree Form on Linear Array with Slotted Optical Buses (LASOB 상에서 계산 트리 형식을 생성하기 위한 최적 병렬 알고리즘)

  • Kim, Young-Hak
    • Journal of KIISE:Computer Systems and Theory / v.27 no.5 / pp.475-484 / 2000
  • Recently, many processor arrays that use optical buses instead of electronic buses have been proposed in the literature to increase bus bandwidth and reduce hardware complexity. In this paper, we first propose a constant-time algorithm for the parentheses matching problem on a linear array with slotted optical buses (LASOB). Then, given an algebraic expression of length n, we propose a cost-optimal parallel algorithm that constructs the computation tree form in constant time on a LASOB with n processors by using the parentheses matching algorithm. No cost-optimal parallel algorithm for this problem that runs in constant time was previously known on any parallel computation model.
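
For readers unfamiliar with the parentheses matching problem, the following is a minimal sequential sketch in Python. It uses an ordinary stack and runs in O(n) time on one processor; it is not the constant-time LASOB algorithm from the paper, and the function name `match_parentheses` is an illustrative assumption.

```python
def match_parentheses(expr):
    """Map the index of every parenthesis in expr to the index of its partner.

    Sequential stack-based baseline (assumption: only '(' and ')' are
    treated as brackets; all other characters are ignored).
    """
    stack = []   # indices of '(' that are still waiting for a partner
    match = {}
    for i, ch in enumerate(expr):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            j = stack.pop()
            match[i] = j
            match[j] = i
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return match


pairs = match_parentheses("(a+b)*(c-(d/e))")
print(pairs[0], pairs[6], pairs[9])  # prints "4 14 13"
```

Once every parenthesis knows its partner, the extent of each subexpression is known, which is the information a computation tree builder needs to attach operands to operators.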

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs (GP-GPU의 캐시메모리를 활용하기 위한 병렬 블록 LU 분해 프로그램의 구현)

  • Kim, Youngtae;Kim, Doo-Han;Yu, Myoung-Han
    • Journal of Internet Computing and Services / v.14 no.6 / pp.41-47 / 2013
  • GP-GPUs are general-purpose GPUs: processors originally designed for graphics processing that are used for multithreaded numerical computation. GP-GPUs provide cache memory in the form of shared memory that user programs can access directly, unlike typical cache memory. In this research, we implemented a parallel blocked LU decomposition program that utilizes this cache memory. The parallel blocked LU decomposition program, written in Nvidia CUDA C, runs 7 to 8 times faster than the non-blocked LU decomposition program in the same GP-GPU computing environment.
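
To make the idea of blocking concrete, here is a minimal sequential sketch of a right-looking blocked LU factorization in pure Python. It is not the authors' CUDA C program: it has no pivoting, no threads, and no shared memory, and the names `lu_blocked` and `reconstruct` and the block size `bs` are illustrative assumptions. It only shows the three phases that operate on one tile at a time, which is the access pattern that lets a GPU kernel stage a tile in fast on-chip memory.

```python
def lu_blocked(a, bs):
    """In-place blocked LU without pivoting (assumes nonzero pivots).

    On return, the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle holds U.
    """
    n = len(a)
    for k in range(0, n, bs):
        e = min(k + bs, n)
        # Phase 1: factor the panel (columns k..e-1, rows k..n-1).
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # Phase 2: block row of U, by forward substitution with L11.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Phase 3: trailing update A22 -= L21 * U12 (the bulk of the work).
        for i in range(e, n):
            for j in range(k, e):
                l_ij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= l_ij * a[j][c]
    return a


def reconstruct(lu):
    """Multiply the packed factors back together and return L * U."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out


# Diagonally dominant test matrix, so skipping pivoting is safe here.
A = [[4.0, 1.0, 2.0, 0.0],
     [1.0, 5.0, 1.0, 2.0],
     [2.0, 1.0, 6.0, 1.0],
     [0.0, 2.0, 1.0, 7.0]]
packed = lu_blocked([row[:] for row in A], bs=2)
product = reconstruct(packed)
residual = max(abs(product[i][j] - A[i][j]) for i in range(4) for j in range(4))
```

With `bs=1` the same routine degenerates to the ordinary non-blocked elimination, which makes it easy to compare the two variants on the same input.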

COMPUTATIONAL EFFICIENCY OF A MODIFIED SCATTERING KERNEL FOR FULL-COUPLED PHOTON-ELECTRON TRANSPORT PARALLEL COMPUTING WITH UNSTRUCTURED TETRAHEDRAL MESHES

  • Kim, Jong Woon;Hong, Ser Gi;Lee, Young-Ouk
    • Nuclear Engineering and Technology / v.46 no.2 / pp.263-272 / 2014
  • Scattering source calculations using the conventional spherical harmonic expansion may require a large amount of computation time to treat fully coupled three-dimensional photon-electron transport in a highly anisotropic scattering medium, where the scattering cross sections must be expanded with very high-order (e.g., $P_7$ or higher) Legendre expansions. In this paper, we introduce a modified scattering kernel approach that avoids the unnecessarily repeated calculations involved in the scattering source calculation, and we use it with parallel computing to effectively reduce the computation time. Its computational efficiency was tested on three-dimensional fully coupled photon-electron transport problems using our computer program, which solves the multi-group discrete ordinates transport equation by the discontinuous finite element method with unstructured tetrahedral meshes for problems with complicated geometry. The numerical tests show that the modified scattering kernel reduces the elapsed time per iteration by a factor of 17 to 42, not only in single-CPU calculations but also in parallel computing with several CPUs.

Parallel Finite Element Simulation of the Incompressible Navier-Stokes Equations (병렬 유한요소 해석기법을 이용한 유동장 해석)

  • Choi H. G.;Kim B. J.;Kang S. W.;Yoo J. Y.
    • 한국전산유체공학회:학술대회논문집 / 2002.05a / pp.8-15 / 2002
  • For large-scale computation of turbulent flows around an arbitrarily shaped body, a parallel LES (large eddy simulation) code has recently been developed in which a domain decomposition method is adopted. The METIS and MPI (Message Passing Interface) libraries are used for domain partitioning and for data communication between processors, respectively. For unsteady computation of the incompressible Navier-Stokes equations, a 4-step splitting finite element algorithm [1] is adopted, and either the Smagorinsky or the dynamic LES model can be chosen for modeling the small eddies in turbulent flows. For validation and performance estimation of the parallel code, a three-dimensional laminar flow generated by natural convection inside a cube has been solved. Then, we solved the turbulent flow around the MIRA (Motor Industry Research Association) model at $Re = 2.6\times10^6$, based on the model height and the inlet free-stream velocity, using 32 processors on an IBM SMP cluster, and compared the results with existing experimental data.

A topology optimization method of multiple load cases and constraints based on element independent nodal density

  • Yi, Jijun;Rong, Jianhua;Zeng, Tao;Huang, X.
    • Structural Engineering and Mechanics / v.45 no.6 / pp.759-777 / 2013
  • In this paper, a topology optimization method based on the element independent nodal density (EIND) is developed for continuum solids with multiple load cases and multiple constraints. The optimization problem is formulated as minimizing the volume subject to displacement constraints. Nodal densities of the finite element mesh are used as the design variables. The nodal densities are interpolated to any point in the design domain by the Shepard interpolation scheme and the Heaviside function. Without using additional constraints (such as the filtering technique), a mesh-independent, checkerboard-free, distinct optimal topology can be obtained. Adopting the rational approximation for material properties (RAMP), the topology optimization procedure is implemented using a solid isotropic material with penalization (SIMP) method and a dual programming optimization algorithm. The computational efficiency is greatly improved by multithreaded parallel computing with OpenMP, which targets the shared-memory model of parallel computation. Finally, several examples are presented to demonstrate the effectiveness of the developed techniques.
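
As an illustration of the interpolation step only, the sketch below evaluates a density at an arbitrary point from nodal densities using the classical inverse-distance form of Shepard interpolation. The weight function `1 / d**power`, the default `power=2.0`, and the name `shepard_density` are assumptions for illustration; the abstract does not state which weights the authors use, and the Heaviside step they apply is omitted here.

```python
import math


def shepard_density(point, nodes, densities, power=2.0):
    """Interpolate a density at `point` from nodal values.

    Classical Shepard (inverse-distance) weighting, assumed here:
        rho(x) = sum_i w_i(x) * rho_i / sum_i w_i(x),  w_i(x) = 1 / |x - x_i|**power
    """
    num = 0.0
    den = 0.0
    for node, rho in zip(nodes, densities):
        d = math.dist(point, node)
        if d == 0.0:
            return rho          # exactly on a node: return the nodal value
        w = 1.0 / d ** power
        num += w * rho
        den += w
    return num / den


# Four corner nodes of a unit square, solid on the right edge, void on the left.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
densities = [0.0, 1.0, 1.0, 0.0]
center = shepard_density((0.5, 0.5), nodes, densities)
near_right = shepard_density((0.9, 0.5), nodes, densities)
```

Because the weights are normalized, the interpolated value always lies between the smallest and largest nodal density, and it reproduces the nodal value exactly at a node.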

Efficient Scientific Computation on MP Parallel Computer (MP 병렬컴퓨터에서 효과적인 과학계산의 수행)

  • 김선경
    • Journal of Korea Society of Industrial Information Systems / v.8 no.4 / pp.26-30 / 2003
  • The Lanczos algorithm is the method most commonly used to approximate a small number of extreme eigenvalues of large sparse symmetric matrices. Global communication on an MP (Message Passing) parallel computer decreases the computation speed. In this paper, we introduce the s-step Lanczos method, which generates reduction matrices similar to those generated by the standard Lanczos method. One iteration of the s-step Lanczos algorithm corresponds to s iterations of the standard Lanczos algorithm. The s-step method minimizes global communication and has better parallel properties than the standard method. These algorithms are implemented on a Cray T3E, and performance results are presented.

Comparison of Parallel Computation Performances for 3D Wave Propagation Modeling using a Xeon Phi x200 Processor (제온 파이 x200 프로세서를 이용한 3차원 음향 파동 전파 모델링 병렬 연산 성능 비교)

  • Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.21 no.4 / pp.213-219 / 2018
  • In this study, we performed 3D wave propagation modeling using a Xeon Phi x200 processor and compared its parallel computation performance with that of a Xeon CPU. Unlike the 1st generation Xeon Phi coprocessor, codenamed Knights Corner, the 2nd generation Xeon Phi x200 processor requires no additional communication between the internal memory and the main memory, since it can run an operating system directly. The Xeon Phi x200 processor can run large-scale computations independently, using its large main memory and its high-bandwidth memory. To compare parallel computation performance, we performed the modeling using the MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) libraries. Numerical examples using the SEG/EAGE salt model demonstrated that the Xeon Phi, with its large number of computational cores and high-bandwidth memory, performs the modeling 2.69 to 3.24 times faster than the 12-core CPU.

Parallel VHDL Simulation on IBM SP2 and SGI Origin 2000 (IBM SP2와 SGI Origin 2000에서의 병렬 VHDL 시뮬레이션)

  • 정영식
    • Journal of the Korea Society for Simulation / v.7 no.1 / pp.69-83 / 1998
  • In this paper, we present the results of running parallel VHDL simulation on typical MPP (Massively Parallel Processor) systems such as the IBM SP2 and the SGI Origin 2000. The parallel simulation uses a synchronous protocol, and the parallel program is implemented using MPI (Message Passing Interface), which is based on the message passing model, so that it can run in any parallel programming environment that supports MPI, a standard communication library. GVT (Global Virtual Time) computation for the parallel simulation is based on global broadcasting with MPI_Bcast(), a standard MPI function, and on piggybacking. Our benchmark shows that as the size of the VHDL design grows, the parallel simulation performs better than the sequential simulation. In addition, we show the results of a comparison between the IBM SP2 and the SGI Origin 2000 obtained by applying the same application to both indirectly.

Parallel Rendering of High Quality Animation based on a Dynamic Workload Allocation Scheme (작업영역의 동적 할당을 통한 고화질 애니메이션의 병렬 렌더링)

  • Rhee, Yun-Seok
    • Journal of the Korea Society of Computer and Information / v.13 no.1 / pp.109-116 / 2008
  • Even though many studies on parallel rendering based on PC clusters have been done, most of them did not cope with non-uniform scenes, in which the locations of 3D models are biased. In this work, we built a PC cluster system with POV-Ray, a free rendering software package in the public domain, and developed an adaptive load balancing scheme to optimize the parallel efficiency. In particular, we noticed that a frame of a 3D animation is closely coherent with its adjacent frames, and thus we could estimate the distribution of the computation amount based on the computation time of the previous frame. The experimental results with 2 real animation data sets show that the proposed scheme reduces execution time by 40% compared to the simple static partitioning scheme.
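
The frame-coherence idea can be sketched as follows, under the assumption that a frame is split into contiguous bands of image rows and that the render time of each row in the previous frame has been measured. The function `partition_rows` and this row-band decomposition are hypothetical illustrations; the abstract does not describe the authors' actual partitioning scheme.

```python
def partition_rows(row_costs, workers):
    """Split rows into contiguous bands of roughly equal estimated cost.

    row_costs: per-row render times measured on the previous frame.
    Returns half-open (start, end) row ranges. If a single row crosses
    several cost thresholds, fewer than `workers` bands are returned.
    """
    total = sum(row_costs)
    bands = []
    start = 0
    acc = 0.0
    for row, cost in enumerate(row_costs):
        acc += cost
        # Close the current band once the running cost reaches its share.
        if len(bands) < workers - 1 and acc >= total * (len(bands) + 1) / workers:
            bands.append((start, row + 1))
            start = row + 1
    bands.append((start, len(row_costs)))  # the last band takes the remainder
    return bands


# Previous frame: the expensive geometry is concentrated in the top rows.
previous_frame_costs = [6, 6, 2, 2, 2, 2, 1, 1, 1, 1]
bands = partition_rows(previous_frame_costs, workers=2)
print(bands)  # prints "[(0, 2), (2, 10)]"
```

In this example a static split into two bands of five rows would give the workers estimated costs of 18 and 6, while the cost-based split gives each an estimated cost of 12.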
