• Title/Summary/Keyword: Parallel Computing


A Linear Clustering Method for the Scheduling of the Directed Acyclic Graph Model with Multiprocessors Using Genetic Algorithm (다중프로세서를 갖는 유방향무환그래프 모델의 스케쥴링을 위한 유전알고리즘을 이용한 선형 클러스터링 해법)

  • Sung, Ki-Seok;Park, Jee-Hyuk
    • Journal of Korean Institute of Industrial Engineers / v.24 no.4 / pp.591-600 / 1998
  • Scheduling a parallel computing system consists of two procedures: assigning tasks to the available processors and ordering the tasks on each processor. The assignment procedure is equivalent to clustering. A clustering is classified as linear or nonlinear according to the precedence relationships among the tasks in each cluster. A parallel computing system can be modeled as a Directed Acyclic Graph (DAG). By granularity theory, DAGs are categorized into Coarse Grain Type (CDAG) and Fine Grain Type (FDAG). We propose a linear clustering method for scheduling CDAGs using a genetic algorithm. The method exploits the property that the optimal schedule of a CDAG is a linear clustering. We present computational comparisons between the proposed method for CDAGs and an existing method for general DAGs, which include both CDAGs and FDAGs.

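The linear/nonlinear distinction in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition, in which a clustering is linear when no cluster contains two independent tasks (every pair of tasks in a cluster is ordered by precedence). The task names, the example DAG, and the function names are hypothetical and are not taken from the paper; the genetic algorithm itself is not shown.

```python
from itertools import combinations


def reachable(dag, start):
    """Return the set of tasks reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for succ in dag.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def is_linear_clustering(dag, clusters):
    """True if every pair of tasks sharing a cluster is ordered by precedence."""
    reach = {task: reachable(dag, task) for task in dag}
    for cluster in clusters:
        for a, b in combinations(cluster, 2):
            # Two tasks are independent if neither can reach the other.
            if b not in reach[a] and a not in reach[b]:
                return False
    return True


# Hypothetical diamond-shaped DAG: t1 -> {t2, t3} -> t4.
dag = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"], "t4": []}
linear = [["t1", "t2", "t4"], ["t3"]]      # each cluster is a chain
nonlinear = [["t1", "t2", "t3"], ["t4"]]   # t2 and t3 are independent

print(is_linear_clustering(dag, linear))
print(is_linear_clustering(dag, nonlinear))
```

A check like this could serve as a feasibility test inside a search procedure, so that only linear clusterings are considered as candidates.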

Accelerating 2D DCT in Multi-core and Many-core Environments (멀티코어와 매니코어 환경에서의 2 차원 DCT 가속)

  • Hong, Jin-Gun;Jung, Sung-Wook;Kim, Cheong-Ghil;Burgstaller, Bernd
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.250-253 / 2011
  • Chip manufacturers have turned their attention from accelerating uniprocessors to integrating multiple cores on a chip. Moreover, desktop graphics hardware is now starting to support general-purpose computation, so desktop users can use multi-core CPUs and GPUs as high-performance computing resources. However, exploiting parallel computing resources is still challenging because of the lack of higher-level abstractions for parallel programming. The 2-dimensional discrete cosine transform (2D-DCT) is the most computationally intensive part of JPEG encoding, and many fast 2D-DCT algorithms have already been studied. We implemented several algorithms and estimated their runtimes in multi-core CPU and GPU environments. Experiments show that data parallelism can be fully exploited on CPU and GPU architectures. We expect the parallelized DCT to bring performance benefits to applications such as JPEG and MPEG.
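To make the data parallelism mentioned above concrete, here is a minimal sketch of a separable 2D DCT-II with orthonormal scaling. It is a naive row-column formulation, not one of the fast algorithms the authors implemented, and the function names are hypothetical. Each row transform (and then each column transform) is independent of the others, which is the property that would let rows or columns be distributed across CPU cores or GPU threads.

```python
import math


def dct_1d(x):
    """Naive orthonormal DCT-II of a sequence (O(n^2), for illustration only)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]              # rows are independent
    cols = [dct_1d(list(col)) for col in zip(*rows)]   # columns are independent
    return [list(row) for row in zip(*cols)]           # transpose back


# A constant 8x8 block has all of its energy in the DC coefficient.
block = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(block)
print(round(coeffs[0][0], 6))
```

In a parallel version, the two list comprehensions in `dct_2d` are the natural places to hand work to a thread pool or a GPU kernel, with a synchronization point between the row pass and the column pass.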

Development of a flux emergence simulation using parallel computing

  • Lee, Hwanhee;Magara, Tetsuya
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.71.1-71.1 / 2019
  • The solar magnetic field comes from the solar interior and is related to various phenomena on the Sun. To understand this process, many studies have been conducted to reproduce its evolution using a single flux rope. In this study, we are interested in the emergence of two flux ropes and their evolution, which takes longer than the emergence of a single flux rope. To model this, we develop a flux emergence simulation that applies parallel computing to reduce the computation time in a wider domain. The original simulation code was written in Fortran 77. We port it to Fortran 90 with the Message Passing Interface (MPI). The results of the original and new simulations are compared on the NEC SX-Aurora TSUBASA, which is a vector engine processor. The parallelized version is faster than the run on a single core, which shows that it can handle large amounts of calculation. Based on this model, we can construct a complex flux emergence system, such as the evolution of two magnetic flux ropes.


Three-dimensional FE Analysis of Acoustic Emission of Fiber Breakage using Explicit Time Integration Method (외연적 시간적분법을 이용한 복합재료 섬유 파단 시 음향방출의 3차원 유한요소 해석)

  • Paik, Seung-Hoon;Park, Si-Hyong;Kim, Seung-Jo
    • Proceedings of the Korean Society For Composite Materials Conference / 2005.04a / pp.172-175 / 2005
  • A numerical simulation of the acoustic emission and wave propagation due to fiber breakage in single-fiber composite plates is performed by finite element transient analysis. The acoustic emission and the subsequent wave motion from a fiber breakage under static loading are simulated to investigate the applicability of the explicit finite element method as a simulation tool for wave propagation and of the equivalent volume force model as a modeling technique for acoustic emission. For this simple case of a damage event under static loading, various parameters affecting the wave motion are investigated for reliable simulations of impact damage events. The high velocity and short wavelength of the acoustic emission require a refined analysis with a dense finite element mesh and a small time step. To meet the requirements for capturing the wave propagation accurately and to cover the 3-D simulation, we use a parallel FE transient analysis code and parallel computing technology.


Hologram Generation Acceleration Method Using GPGPU (GPGPU를 이용한 홀로그램 생성 가속화 방법)

  • Lee, Yoon-Hyuk;Kim, Dong-Wook;Seo, Young-Ho
    • Journal of Broadcast Engineering / v.22 no.6 / pp.800-807 / 2017
  • Generating a hologram by computer requires a large amount of computation. To accelerate the computation, many acceleration methods based on parallel programming with GPGPU (General-Purpose computing on Graphics Processing Units) have been studied. In this paper, we propose a method of reducing the bottleneck caused by hologram-pixel-based parallel processing and using shareable variables. We also show how to use the Visual Profiler provided with NVIDIA's CUDA to tune the code so that threads work optimally. The experimental results show that the proposed method reduces the calculation time by up to 40% compared with existing research.

Crack Identification Using Evolutionary Algorithms in Parallel Computing Environment (병렬 환경하의 진화 이론을 이용한 결함인식)

  • Sim, Mun-Bo;Seo, Myeong-Won
    • Transactions of the Korean Society of Mechanical Engineers A / v.26 no.9 / pp.1806-1813 / 2002
  • It is well known that a crack has an important effect on the dynamic behavior of a structure. This effect depends mainly on the location and depth of the crack. To identify the location and depth of a crack in a structure, previous researchers adopted a classical optimization technique. That technique overcame the difficulty of finding the intersection point of the superposed contours that correspond to the eigenfrequencies caused by the presence of the crack. However, it is hard to select an initial trial solution for the optimization because the objective function is heavily multimodal. This paper presents a method that uses continuous evolutionary algorithms (CEAs). CEAs are effective for solving inverse problems and are implemented on PC clusters to shorten the calculation time. With a finite element model of the structure to calculate eigenfrequencies, the inverse problem can be formulated as an optimization problem. CEAs are used to identify the crack location and depth that minimize the difference from the measured frequencies. We tried this idea on a simple beam structure, and the results are promising, with a high parallel efficiency of over about 94%.
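For readers unfamiliar with the parallel efficiency figure quoted above, the following sketch shows the standard definition: speedup is the serial runtime divided by the parallel runtime, and efficiency is speedup divided by the number of processors. The timings and processor count below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E = S / p, where p is the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings (seconds) for illustration only.
t1, tp, p = 100.0, 26.6, 4
eff = parallel_efficiency(t1, tp, p)
print(f"speedup = {speedup(t1, tp):.2f}, efficiency = {eff:.1%}")
```

An efficiency close to 100% means that adding processors shortens the runtime almost proportionally, which is typical when fitness evaluations in an evolutionary algorithm are independent and dominate the cost.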

Parallelized Topology Design Optimization of the Frame of Human Powered Vessel (인력선 프레임의 병렬화 위상 최적설계)

  • Kim, Hyun-Suk;Lee, Ki-Myung;Kim, Min-Geun;Cho, Seon-Ho
    • Journal of the Society of Naval Architects of Korea / v.47 no.1 / pp.58-66 / 2010
  • Topology design optimization is a method for determining the optimal distribution of material that yields the minimal compliance of a structure while satisfying a constraint on the allowable material volume. The method is easy to implement and widely used, which makes it a powerful design tool in various disciplines. In this paper, a large-scale topology design optimization method is developed using the efficient adjoint sensitivity and optimality criteria methods. Parallel computing is required for efficient topology optimization as well as for the precise analysis of large-scale problems. The parallelized finite element analysis consists of domain decomposition and boundary communication. The preconditioned conjugate gradient method is employed for the analysis of the decomposed sub-domains. The developed parallel computing method for topology optimization is used to determine the optimal structural layout of a human-powered vessel.
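The preconditioned conjugate gradient method named above can be sketched in a few lines. The abstract does not say which preconditioner the authors used, so this sketch assumes a simple Jacobi (diagonal) preconditioner and a small dense symmetric positive definite matrix; the function names and the test system are hypothetical. In a domain-decomposed setting, the matrix-vector product and dot products are the steps that would require boundary communication between sub-domains.

```python
def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A with Jacobi-preconditioned conjugate gradients."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner (assumed)
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small hypothetical SPD system; the exact solution is [1/11, 7/11].
solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

The Jacobi preconditioner is a common first choice in parallel codes because applying it needs no communication; stronger preconditioners usually trade extra communication for fewer iterations.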

A Parallel Pipeline Execution Algorithm for H.264/AVC Intra Prediction (H.264/AVC의 인트라 예측 병렬 파이프라인 실행 알고리즘)

  • Xu, Jia-Yue;Cho, Hyo-Moon;Cho, Sang-Bock
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.5 / pp.79-86 / 2008
  • H.264/AVC is the newest international video coding standard, developed jointly by the ITU-T and ISO/IEC standards organizations. It offers much higher coding efficiency than H.261, H.263, and MPEG-4, but it suffers from high computational complexity and wasteful use of H/W resources. This paper describes a two-unit parallel pipeline structure. Compared with the standard model, the new structure decreases the computational complexity by 67% and the H/W resource waste by 3%.

An Aggressive Register Allocation Algorithm for EPIC Architectures (EPIC 아키텍쳐를 위한 적극적 레지스터 할당 알고리듬)

  • Choe, Jun-Gi;Lee, Sang-Jeong
    • The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.497-511 / 1999
  • Recently, many parallel processing technologies have been developed, and the performance of ILP (Instruction-Level Parallelism) processors has grown very rapidly. In particular, EPIC (Explicitly Parallel Instruction Computing) architectures attempt to enhance performance through predicated execution and speculative execution with hardware support. In this paper, a new register allocation algorithm is proposed that improves code scheduling opportunities by exploiting the characteristics of EPIC architectures. We show by experiments that the proposed register allocation algorithm is more efficient than the conventional scheme when predicated execution is applied. In the experimental results, the proposed scheme shows a performance improvement of about 19% over the conventional scheme, which verifies that it is an effective register allocation method.


Numerical procedures for extreme impulsive loading on high strength concrete structures

  • Danielson, Kent T.;Adley, Mark D.;O'Daniel, James L.
    • Computers and Concrete / v.7 no.2 / pp.159-167 / 2010
  • This paper demonstrates numerical techniques for complex large-scale modeling with microplane constitutive theories for reinforced high strength concrete, which, for these applications, is defined as having a strength of around 7000 psi (48 MPa), as frequently found in protective structural design. Applications involve highly impulsive loads, such as an explosive detonation or impact-penetration event. These capabilities were implemented in the authors' finite element code, ParaAble, and in the PRONTO 3D code from Sandia National Laboratories. All materials are explicitly modeled with eight-noded hexahedral elements. The concrete is modeled with a microplane constitutive theory, the reinforcing steel is modeled with the Johnson-Cook model, and the high explosive material is modeled with a JWL equation of state and a programmed burn model. Damage evolution, which can be used for erosion of elements and/or for post-analysis examination of damage, is extracted from the microplane predictions and computed by a modified Holmquist-Johnson-Cook approach that relates damage to levels of inelastic strain increment and pressure. Computation is performed with MPI on parallel processors. Several practical analyses demonstrate that large-scale analyses of this type can be reasonably run on large parallel computing systems.