• Title/Summary/Keyword: Parallel Computing Method

Performance Enhancement of Parallel Prime Sieving with Hybrid Programming and Pipeline Scheduling (혼합형 병렬처리 및 파이프라이닝을 활용한 소수 연산 알고리즘)

  • Ryu, Seung-yo;Kim, Dongseung
    • KIPS Transactions on Computer and Communication Systems / v.4 no.10 / pp.337-342 / 2015
  • We develop a new parallelization method for the Sieve of Eratosthenes algorithm that improves both computation speed and energy efficiency. After the workload is partitioned, pipeline scheduling is applied for better load balancing. The method runs on multicore CPUs under a hybrid parallel programming model that combines message passing and multithreading. Experiments on small-scale clusters and on a PC with a mobile processor show significant improvements in execution time and energy consumption.
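
The workload partitioning mentioned in this abstract can be illustrated with a segmented sieve. This is only a minimal sequential sketch, not the authors' hybrid message-passing and multithreading implementation: the function names and the `segment_size` parameter are assumptions made for illustration. Each call to `sieve_segment` depends only on the shared list of base primes, so the segments are the units that could be handed to separate processes or threads.

```python
from math import isqrt


def base_primes(limit):
    """Plain Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # Clear every multiple of p starting at p*p.
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, f in enumerate(flags) if f]


def sieve_segment(lo, hi, primes):
    """Primes in [lo, hi), given all base primes p with p*p < hi."""
    flags = bytearray([1]) * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break
        # First multiple of p inside the segment, but never below p*p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = 0
    return [lo + i for i, f in enumerate(flags) if f and lo + i >= 2]


def segmented_primes(n, segment_size=1000):
    """All primes <= n, computed one independent segment at a time."""
    primes = base_primes(isqrt(n))
    result = []
    for lo in range(2, n + 1, segment_size):
        hi = min(lo + segment_size, n + 1)
        # Each segment is independent work; a parallel version would
        # distribute these calls instead of running them in a loop.
        result.extend(sieve_segment(lo, hi, primes))
    return result
```

Equal-sized segments do not cost the same amount of work, which is one reason a scheduling step for load balancing, as the abstract describes, is useful on top of the partitioning.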

Parallel Finite Element Analysis System Based on Domain Decomposition Method Bridges (영역분할법에 기반을 둔 병렬 유한요소해석 시스템)

  • Lee, Joon-Seong;Shioya, Ryuji;Lee, Eun-Chul;Lee, Yang-Chang
    • Journal of the Computational Structural Engineering Institute of Korea / v.22 no.1 / pp.35-44 / 2009
  • This paper describes an application of the domain decomposition method to parallel finite element analysis, which is required for large-scale 3D structural analysis. A parallel finite element system that adopts the domain decomposition method is developed. A node is generated if its distance from existing nodes is similar to the value of the node spacing function at that point. The node spacing function is controlled by fuzzy knowledge processing. The Delaunay triangulation method is used as the basic tool for element generation. Combining the domain decomposition method with an automatic mesh generation system is of great benefit for 3D analyses. As the parallel numerical algorithm for the finite element analyses, the domain decomposition method is combined with an iterative solver, the conjugate gradient (CG) method, in which the whole analysis domain is fictitiously divided into a number of non-overlapping subdomains. The practical performance of the present system is demonstrated through several examples.
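
The combination of non-overlapping subdomains with a CG solver can be sketched on a toy problem. The example below is an assumption-laden simplification, not the system described in the paper: it uses a 1D bar with both ends fixed (unit element stiffness), splits the elements into contiguous subdomains, and lets each subdomain compute its own contribution to the matrix-vector product that CG needs. All function names are hypothetical.

```python
def subdomain_matvec(x, e_lo, e_hi):
    """Contribution of elements e_lo..e_hi-1 to y = A x.

    Element e joins interior nodes e-1 and e; nodes -1 and len(x)
    are the fixed ends and carry zero displacement.
    """
    n = len(x)
    y = [0.0] * n
    for e in range(e_lo, e_hi):
        left, right = e - 1, e
        x_left = x[left] if left >= 0 else 0.0
        x_right = x[right] if right < n else 0.0
        if left >= 0:
            y[left] += x_left - x_right
        if right < n:
            y[right] += x_right - x_left
    return y


def make_matvec(n, num_subdomains):
    """Build A x as a sum of per-subdomain products (no element overlap)."""
    num_elements = n + 1
    bounds = [num_elements * s // num_subdomains
              for s in range(num_subdomains + 1)]

    def matvec(x):
        # In a parallel run each subdomain product would be computed by
        # a different process; here they are evaluated one after another.
        parts = [subdomain_matvec(x, bounds[s], bounds[s + 1])
                 for s in range(num_subdomains)]
        return [sum(col) for col in zip(*parts)]

    return matvec


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Standard CG for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient(make_matvec(5, 2), [1.0] * 5)` solves the five-node bar under a uniform load using two subdomains. Only the matrix-vector product is decomposed here; production domain decomposition solvers typically iterate on the subdomain interface unknowns instead.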

A topology optimization method of multiple load cases and constraints based on element independent nodal density

  • Yi, Jijun;Rong, Jianhua;Zeng, Tao;Huang, X.
    • Structural Engineering and Mechanics / v.45 no.6 / pp.759-777 / 2013
  • In this paper, a topology optimization method based on the element independent nodal density (EIND) is developed for continuum solids with multiple load cases and multiple constraints. The optimization problem is formulated as minimizing the volume subject to displacement constraints. Nodal densities of the finite element mesh are used as the design variables. The nodal densities are interpolated to any point in the design domain by the Shepard interpolation scheme and the Heaviside function. Without additional constraints (such as a filtering technique), a mesh-independent, checkerboard-free, distinct optimal topology can be obtained. Adopting the rational approximation for material properties (RAMP), the topology optimization procedure is implemented using a solid isotropic material with penalization (SIMP) method and a dual programming optimization algorithm. Computational efficiency is greatly improved by multithreaded parallel computing with OpenMP, which runs parallel programs under the shared-memory model of parallel computation. Finally, several examples are presented to demonstrate the effectiveness of the developed techniques.
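
For orientation, the interpolation schemes named in this abstract are commonly written as follows. These are the textbook forms, given here as an assumption; the paper's exact weights, exponents, and Heaviside projection may differ. Here $\rho_i$ are the nodal densities, $S_{\mathbf{x}}$ is the set of nodes influencing point $\mathbf{x}$, $w_i$ are Shepard weights, and $p$, $q$ are penalization parameters.

```latex
% Shepard (inverse-distance weighted) interpolation of nodal densities
\rho(\mathbf{x}) = \frac{\sum_{i \in S_{\mathbf{x}}} w_i(\mathbf{x})\,\rho_i}
                        {\sum_{i \in S_{\mathbf{x}}} w_i(\mathbf{x})}

% RAMP material interpolation
E(\rho) = E_{\min} + \frac{\rho}{1 + q\,(1 - \rho)}\,\bigl(E_0 - E_{\min}\bigr)

% SIMP material interpolation
E(\rho) = E_{\min} + \rho^{\,p}\,\bigl(E_0 - E_{\min}\bigr)
```

Both material models give $E(0) = E_{\min}$ and $E(1) = E_0$ and penalize intermediate densities. RAMP keeps a nonzero slope at $\rho = 0$, whereas the SIMP slope there vanishes for $p > 1$.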

Implementation of Efficient Power Method on CUDA GPU (CUDA 기반 GPU에서 효율적인 Power Method의 구현)

  • Kim, Jung-Hwan;Kim, Jin-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.9-16 / 2011
  • GPU computing is emerging in high-performance applications because it can exploit massive parallelism cost-effectively. The power method, which finds the dominant eigenvector of a given matrix, is widely used in applications such as PageRank, which calculates the importance of web pages. In this research we parallelize the power method efficiently on a GPU and suggest how its performance can be improved further. The power method consists mainly of matrix-vector products, which are easily parallelized. However, each iteration must also test the eigenvector for convergence and then scale the vector. These operations incur several GPU kernel calls and data movement between host and GPU memory. We improved the performance of the power method by reducing the number of GPU kernel calls, optimizing thread allocation, and improving the convergence test.
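
The three per-iteration stages named in this abstract (matrix-vector product, scaling, convergence test) can be seen in a minimal CPU sketch. This is plain sequential Python rather than the authors' CUDA code; the function name, the infinity-norm scaling, and the tolerance are assumptions, and the sketch assumes the dominant eigenvalue is positive.

```python
def power_method(matrix, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Assumes a positive dominant eigenvalue, so the scaled iterate
    does not flip sign from one iteration to the next.
    """
    n = len(matrix)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Stage 1: matrix-vector product (the easily parallelized part).
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        # Stage 2: scaling by the largest magnitude (a reduction).
        scale = max(abs(c) for c in w)
        if scale == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        w = [c / scale for c in w]
        # Stage 3: convergence test (another reduction).
        diff = max(abs(a - b) for a, b in zip(w, v))
        v, eigenvalue = w, scale
        if diff < tol:
            break
    return eigenvalue, v
```

On a GPU, stages 2 and 3 are reductions that would each need a kernel launch and possibly a device-to-host copy, which is the overhead the abstract says the authors reduced.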

A parallel tasks Scheduling heuristic in the Cloud with multiple attributes

  • Wang, Qin;Hou, Rongtao;Hao, Yongsheng;Wang, Yin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.287-307 / 2018
  • There are two goals when scheduling parallel jobs in the Cloud: (1) scheduling as many jobs as possible, and (2) reducing the average execution time of the jobs. Most previous work focuses on the computing speed of resources without considering other attributes, such as bandwidth and memory. In particular, past work does not consider the supply-demand condition of those attributes. Resources have different attributes, and considering those attributes together makes the scheduling problem more difficult. This is the problem we address in this paper. First, we propose a new parallel job scheduling method based on classifying resources by their different attributes; we then propose a scheduling method, CPLMT (Cloud parallel scheduling based on the lists of multiple attributes), for the parallel tasks. The classification method categorizes resources into different kinds according to the number of resources that satisfy the job with respect to each attribute, such as resource speed and memory. Different kinds have different scheduling priorities. For jobs that belong to the same kind, we propose CPLMT to schedule them. Our method is compared with FIFO (First in first out), ASJS (Adaptive Scoring Job Scheduling), Fair, and CMMS (Cloud-Minmin) under different environments. The simulation results show that the proposed CPLMT reduces both the number of unfinished jobs and the average execution time.

Parallel Structure Design Method for Mass Spring Simulation (질량스프링 시뮬레이션을 위한 병렬 구조 설계 방법)

  • Sung, Nak-Jun;Choi, Yoo-Joo;Hong, Min
    • Journal of the Korea Computer Graphics Society / v.25 no.3 / pp.55-63 / 2019
  • Recently, GPU computing has been used to improve the performance of physics simulation. In particular, deformable object simulation requires a large amount of computation, so a GPU-based parallel processing algorithm is needed to guarantee real-time performance. We studied a parallel structure design method to improve the performance of mass-spring simulation, one of the methods for implementing deformable object simulation. We used GLSL, the OpenGL shading language, which allows direct access to the GPU, and implemented the GPGPU environment using the compute shader, an independent pipeline. To verify the effectiveness of the parallel structure design method, the mass-spring system was implemented on both the CPU and the GPU. Experimental results show that the proposed method improves computation speed by about 6,000% compared to the CPU environment. We expect that this lightweight simulation technology can be applied effectively to augmented reality and virtual reality by using the proposed design method.
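
A single time step of a mass-spring system shows where the parallelism comes from. The sketch below is a sequential 1D illustration under stated assumptions, not the paper's GLSL compute shader: function names, the damping model, and the semi-implicit Euler integrator are choices made here, and each spring `(i, j, rest)` is assumed to have mass `j` to the right of mass `i`.

```python
def spring_forces(positions, springs, stiffness):
    """Accumulate Hooke's-law forces for springs given as (i, j, rest_length)."""
    forces = [0.0] * len(positions)
    for i, j, rest in springs:
        stretch = (positions[j] - positions[i]) - rest
        f = stiffness * stretch
        forces[i] += f      # a stretched spring pulls i toward j
        forces[j] -= f      # and pulls j back toward i
    return forces


def step(positions, velocities, masses, springs,
         stiffness, damping, dt, pinned=()):
    """Advance every mass by one semi-implicit Euler step."""
    forces = spring_forces(positions, springs, stiffness)
    new_positions, new_velocities = [], []
    # Given the forces, each mass is updated independently of the others;
    # this loop body is what one GPU thread per mass would execute.
    for k in range(len(positions)):
        if k in pinned:
            new_positions.append(positions[k])
            new_velocities.append(0.0)
            continue
        accel = (forces[k] - damping * velocities[k]) / masses[k]
        v = velocities[k] + dt * accel
        new_velocities.append(v)
        new_positions.append(positions[k] + dt * v)
    return new_positions, new_velocities


def simulate(positions, velocities, masses, springs,
             stiffness, damping, dt, steps, pinned=()):
    """Run the requested number of steps and return the final state."""
    for _ in range(steps):
        positions, velocities = step(positions, velocities, masses, springs,
                                     stiffness, damping, dt, pinned)
    return positions, velocities
```

For example, `simulate([0.0, 1.5], [0.0, 0.0], [1.0, 1.0], [(0, 1, 1.0)], 10.0, 2.0, 0.01, 5000, pinned=(0,))` releases a stretched spring with one end pinned and lets the free mass settle near the rest length.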

Three Dimensional FE Analysis of Acoustic Emission of Composite Plate (복합재료 파손 시 발생하는 음향방출의 3차원 유한요소 해석)

  • Paik, Seung-Hoon;Park, Si-Hyong;Kim, Seung Jo
    • Composites Research / v.18 no.5 / pp.15-20 / 2005
  • In this paper, damage-induced acoustic emission in a composite plate is numerically simulated using the three-dimensional finite element method and explicit time integration. The acoustic source is modeled by an equivalent volume source. To verify the proposed method, dynamic displacements due to the elastic wave are compared with experimental results for fiber breakage in an isotropic plate with a single embedded fiber. For laminated composite plates, the results of a homogenized model are compared with those of a DNS approach that models fibers and matrix separately. Capturing the high frequencies in the elastic wave requires a small time step and a large number of mesh elements. Parallel computing is introduced to solve this large-scale problem efficiently.

Molecular Dynamics Free Energy Simulation Study to Rationalize the Relative Activities of PPAR δ Agonists

  • Lee, Woo-Jin;Park, Hwang-Seo;Lee, Sangyoub
    • Bulletin of the Korean Chemical Society / v.29 no.2 / pp.363-371 / 2008
  • As a computational method for discovering effective agonists for PPARδ, we assess the usefulness of molecular dynamics free energy (MDFE) simulation with explicit solvent in terms of accuracy and computing cost. For this purpose, we establish an efficient computational protocol of thermodynamic integration (TI) that is superior to the free energy perturbation (FEP) method in a parallel computing environment. Using this protocol, the relative binding affinities of GW501516 and its derivatives for PPARδ are calculated. The accuracy of our protocol was evaluated in two steps. First, we devise a thermodynamic cycle to calculate the absolute and relative hydration free energies of test molecules. This allows a self-consistency check of the accuracy of the calculation protocol. Second, the calculated relative binding affinities of the selected ligands are compared with experimental IC50 values. The average deviation of the calculated binding free energies from the experimental results is at most 1 kcal/mol. The computational efficiency of the current protocol is also assessed by comparing its execution times with those of the sequential version of the TI protocol. The results show that the calculation runs about 4 times faster than the sequential run. Based on calculations with the parallel computational protocol, a new potential agonist derived from GW501516 is proposed.
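
The standard textbook forms of the two free energy estimators help explain why TI suits a parallel environment. These are general definitions, not the specific protocol of the paper; $H(\lambda)$ is the coupling-parameter Hamiltonian, $\lambda_k$ and $w_k$ are quadrature points and weights, and the labels A and B for the two ligands are introduced here for illustration.

```latex
% Thermodynamic integration (TI)
\Delta G = \int_0^1 \left\langle \frac{\partial H(\lambda)}{\partial \lambda}
           \right\rangle_{\lambda} \mathrm{d}\lambda
         \;\approx\; \sum_k w_k \left\langle \frac{\partial H}{\partial \lambda}
           \right\rangle_{\lambda_k}

% Free energy perturbation (FEP) between neighboring states
\Delta G_{k \to k+1} = -k_{\mathrm{B}} T \,
  \ln \left\langle \exp\!\left( -\frac{H(\lambda_{k+1}) - H(\lambda_k)}
  {k_{\mathrm{B}} T} \right) \right\rangle_{\lambda_k}

% Relative binding free energy of ligands A and B from a thermodynamic cycle
\Delta\Delta G_{\mathrm{bind}} =
  \Delta G_{\mathrm{A \to B}}^{\mathrm{bound}} -
  \Delta G_{\mathrm{A \to B}}^{\mathrm{free}}
```

In the TI sum, each ensemble average at a fixed $\lambda_k$ comes from its own simulation, so the $\lambda$ windows can be run concurrently and combined at the end.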

Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing (병렬 연산을 이용한 방출 단층 영상의 재구성 속도향상 기초연구)

  • Park, Min-Jae;Lee, Jae-Sung;Kim, Soo-Mee;Kang, Ji-Yeon;Lee, Dong-Soo;Park, Kwang-Suk
    • Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.443-450 / 2009
  • Purpose: Conventional image reconstruction uses simplified physical models of projection. However, realistic physics, for example 3D reconstruction, takes too long to process all the data in the clinic and cannot run on a common reconstruction machine because complex physical models require large memory. We propose a realistic distributed-memory model for fast reconstruction using parallel processing on personal computers to enable large-scale techniques. Materials and Methods: Preliminary feasibility tests on virtual machines and various performance tests on a commercial supercomputer, Tachyon, were performed. The expectation maximization algorithm was tested with common 2D projection and with realistic 3D lines of response. Because the processing time slowed down (by up to 6 times) after a certain iteration, compiler optimization was performed to maximize the efficiency of parallelization. Results: Parallel processing of a program on multiple computers was possible on Linux with MPICH and NFS. We verified that the differences between the parallel-processed image and the single-processed image at the same iterations were below the significant digits of a floating-point number (about 6 bit). Two processors showed good parallel efficiency (a 1.96-fold speedup). The delay phenomenon was resolved by vectorization using SSE. Conclusion: Through this study, a realistic parallel computing system for the clinic was established that has enough memory to reconstruct images using realistic physical models, which were impossible to simplify.

Extracting Maximum Parallelism for Parallel Computing (병렬 계산을 위한 최대 병렬성 추출 방법)

  • Park, Doo-Soon
    • The Journal of Korean Association of Computer Education / v.8 no.1 / pp.93-103 / 2005
  • Since most program execution time is spent in loop structures, extracting parallelism from sequential loops is critical for faster program execution. Conventional studies on extracting parallelism focus mostly on uniform data dependence distances. In this paper, we propose a data dependency elimination method for nested loops and an extended data dependency elimination method that extracts parallelism from loops containing procedure calls. Both methods can be applied to uniform and non-uniform data dependence distances. For performance evaluation, we compared our method with conventional methods on a CRAY-T3E. The results show that the proposed algorithms are very effective.
