• Title/Summary/Keyword: Multi-core GPU

Search Result 35, Processing Time 0.027 seconds

Accelerated Large-Scale Simulation on DEVS based Hybrid System using Collaborative Computation on Multi-Cores and GPUs (멀티 코어와 GPU 결합 구조를 이용한 DEVS 기반 대규모 하이브리드 시스템 모델링 시뮬레이션의 가속화)

  • Kim, Seongseop;Cho, Jeonghun;Park, Daejin
    • Journal of the Korea Society for Simulation
    • /
    • v.27 no.3
    • /
    • pp.1-11
    • /
    • 2018
  • Discrete event system specification (DEVS) has been used in many simulations including hybrid systems featuring both discrete and continuous behavior that require a lot of time to get results. Therefore, in this study, we proposed the acceleration of a DEVS-based hybrid system simulation using multi-cores and GPUs tightly coupled computing. We analyzed the proposed heterogeneous computing of the simulation in terms of the configuration of the target device, changing simulation parameters, and power consumption for efficient simulation. The result revealed that the proposed architecture offers an advantage for high-performance simulation in terms of execution time, although more power consumption is required. With these results, we discovered that our approach is applicable in hybrid system simulation, and we demonstrated the possibility of optimized hardware distribution in terms of power consumption versus execution time via experiments in the proposed architecture.

Numerical simulation on jet breakup in the fuel-coolant interaction using smoothed particle hydrodynamics

  • Choi, Hae Yoon;Chae, Hoon;Kim, Eung Soo
    • Nuclear Engineering and Technology
    • /
    • v.53 no.10
    • /
    • pp.3264-3274
    • /
    • 2021
  • In a severe accident of light water reactor (LWR), molten core material (corium) can be released into the wet cavity, and a fuel-coolant interaction (FCI) can occur. The molten jet with high speed is broken and fragmented into small debris, which may cause a steam explosion or a molten core concrete interaction (MCCI). Since the premixing stage where the jet breakup occurs has a large impact on the severe accident progression, the understanding and evaluation of the jet breakup phenomenon are highly important. Therefore, in this study, the jet breakup simulations were performed using the Smoothed Particle Hydrodynamics (SPH) method which is a particle-based Lagrangian numerical method. For the multi-fluid system, the normalized density approach and improved surface tension model (CSF) were applied to the in-house SPH code (single GPU-based SOPHIA code) to improve the calculation accuracy at the interface of fluids. The jet breakup simulations were conducted in two cases: (1) jet breakup without structures, and (2) jet breakup with structures (control rod guide tubes). The penetration depth of the jet and jet breakup length were compared with those of the reference experiments, and these SPH simulation results are qualitatively and quantitatively consistent with the experiments.

Face Detection using Skin Color Information and Parallel Processing Method on Multi-Core (멀티코어에서 피부색상 정보와 병렬처리 방법을 이용한 얼굴 검출)

  • Kim, Hong-Hee;Lee, Jae-Heung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.219-222
    • /
    • 2012
  • 최근 얼굴검출에 관한 연구는 FPGA를 통한 H/W설계부터 DSP, GPU, ARM Core에 효율적인 S/W 설계까지 다양하게 연구되고 있다. 본 연구에서는 Multi-Core에 효과적인 얼굴검출 방법을 제안한다. 피부색을 통한 얼굴 후보를 추출하고 그 외의 배경 이미지는 삭제하여 연산처리를 빠르게 하였다. Viola-Jones가 제안한 얼굴검출 알고리즘을 POSIX Thread를 사용하여 병렬 처리하였고 그 성능을 단일 코어와 멀티코어에서 측정하였다. 단일 코어에서는 성능의 향상이 없었으나 멀티코어에서는 약 1.8배 속도가 향상되었고 검출 성공률은 기존과 동일하였다.

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.275-284
    • /
    • 2010
  • GPUs are stream processors based on multi-cores, which can process large data with a high speed and a large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, usage of GPUs in general purpose computing has been wide spread. The CUDA architecture from Nvidia is one of efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm which uses a simple nested loop structure. In order to employ the CUDA programming model, we apply our optimization techniques to make our skyline algorithm fit into the performance restrictions of the CUDA architecture. According to our experimental results, we improve the original skyline algorithm by 80% with our optimization techniques.

Computing Performance Comparison of CPU and GPU Parallelization for Virtual Heart Simulation (가상 심장 시뮬레이션에서 CPU와 GPU 병렬처리의 계산 성능 비교)

  • Kim, Sang Hee;Jeong, Da Un;Setianto, Febrian;Lim, Ki Moo
    • Journal of Biomedical Engineering Research
    • /
    • v.41 no.3
    • /
    • pp.128-137
    • /
    • 2020
  • Cardiac electrophysiology studies often use simulation to predict how cardiac will behave under various conditions. To observe the cardiac tissue movement, it needs to use the high--resolution heart mesh with a sophisticated and large number of nodes. The higher resolution mesh is, the more computation time is needed. To improve computation speed and performance, parallel processing using multi-core processes and network computing resources is performed. In this study, we compared the computational speeds of CPU parallelization and GPU parallelization in virtual heart simulation for efficiently calculating a series of ordinary differential equations (ODE) and partial differential equations (PDE) and determined the optimal CPU and GPU parallelization architecture. We used 2D tissue model and 3D ventricular model to compared the computation performance. Then, we measured the time required to the calculation of ODEs and PDEs, respectively. In conclusion, for the most efficient computation, using GPU parallelization rather than CPU parallelization can improve performance by 4.3 times and 2.3 times in calculations of ODEs and PDE, respectively. In CPU parallelization, it is best to use the number of processors just before the communication cost between each processor is incurred.

Numerical simulation on LMR molten-core centralized sloshing benchmark experiment using multi-phase smoothed particle hydrodynamics

  • Jo, Young Beom;Park, So-Hyun;Park, Juryong;Kim, Eung Soo
    • Nuclear Engineering and Technology
    • /
    • v.53 no.3
    • /
    • pp.752-762
    • /
    • 2021
  • The Smoothed Particle Hydrodynamics is one of the most widely used mesh-free numerical method for thermo-fluid dynamics. Due to its Lagrangian nature and simplicity, it is recently gaining popularity in simulating complex physics with large deformations. In this study, the 3D single/two-phase numerical simulations are performed on the Liquid Metal Reactor (LMR) centralized sloshing benchmark experiment using the SPH parallelized using a GPU. In order to capture multi-phase flows with a large density ratio more effectively, the original SPH density and continuity equations are re-formulated in terms of the normalized-density. Based upon this approach, maximum sloshing height and arrival time in various experimental cases are calculated by using both single-phase and multi-phase SPH framework and the results are compared with the benchmark results. Overall, the results of SPH simulations show excellent agreement with all the benchmark experiments both in qualitative and quantitative manners. According to the sensitivity study of the particle-size, the prediction accuracy is gradually increasing with decreasing the particle-size leading to a higher resolution. In addition, it is found that the multi-phase SPH model considering both liquid and air provides a better prediction on the experimental results and the reality.

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems
    • /
    • v.12 no.4
    • /
    • pp.724-740
    • /
    • 2016
  • The performance issues of screening large database compounds and multiple query compounds in virtual screening highlight a common concern in Chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions in parallel platforms, specifically, the multi-core architecture of CPU and many-core architecture of graphical processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely, similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. The sequential, optimized sequential and parallel OpenMP of group fusion were implemented and evaluated. The outcome of the analysis from these three different design approaches influenced the design of parallel CUDA version in order to optimize and achieve high computation intensity. The proposed parallel CUDA performed better than sequential and parallel OpenMP in terms of both execution time and speedup. The parallel CUDA was 5-10x faster than sequential and parallel OpenMP as both similarity search and fusion MAX stages had been CUDA-optimized.

Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL (OpenCL을 활용한 CPU와 GPU 에서의 CMMB LDPC 복호기 병렬화)

  • Park, Joo-Yul;Hong, Jung-Hyun;Chung, Ki-Seok
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.11 no.6
    • /
    • pp.325-334
    • /
    • 2016
  • Recently, Open Computing Language (OpenCL) has been proposed to provide a framework that supports heterogeneous computing platforms. By using an OpenCL framework, digital communication systems can support various protocols in a unified computing environment to achieve both high portability and high performance. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics. In this paper, steps suitable for task-level parallelization are executed on the CPU, and steps suitable for data-level parallelization are processed by the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding operations, explicit thread scheduling, loop-unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors on a unified computing framework.

A Study on GPGPU Performance for the Configurations of Threads (GPGPU에서 쓰레드 구성을 위한 성능에 관한 연구)

  • Kim, Hyun Kyu;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.146-148
    • /
    • 2012
  • 최근 GPGPU를 활용한 병렬처리가 각광을 받고 있는 가운데 GPU의 구조적 특성인 매니코어(many core)기반에서 쓰레드(thread)의 구성이 성능에 얼마나 영향을 미치는지에 관해 수치적 해답을 얻고자 하였다. 이는 멀티코어 (multi core)기반으로 작성된 프로그램을 GPGPU로 변환하는 과정에서 쓰레드의 최대활용도를 빠르게 추측 할 수 있도록 도움을 얻고자 하는데 일차적인 목적이 있다. 현재 GPGPU의 쓰레드 구성은 입력되는 데이터의 양을 고려하여 충분한 테스트를 거쳐 경험적인 최적화 수를 지정해 주워야 한다. 이번 연구를 통해 GPGPU로 변환하는 과정에서 최적의 쓰레드 수구성 방법을 추측 할 수 있으며 더 나아가 동적으로 최적의 수를 구할 수 있도록 하는데 목적이 있다.

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.6
    • /
    • pp.2648-2668
    • /
    • 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.