Search | Korea Science

Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
- Transactions of the Korean Society of Mechanical Engineers B
- /
- v.40 no.9
- /
- pp.597-604
- /
- 2016
A parallel algorithm of bi-conjugate gradient method was developed based on CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using splitting P2P1 finite element method. Asymmetric stenotic flow problem was solved to validate the proposed algorithm, and then the parallel performance of the GPU was examined by measuring the elapsed times. Further, the GPU performance for sparse matrix-vector multiplication was also investigated with a matrix of fluid-structure interaction problem. A kernel was generated to simultaneously compute the inner product of each row of sparse matrix and a vector. In addition, the kernel was optimized to improve the performance by using both parallel reduction and memory coalescing. In the kernel construction, the effect of warp on the parallel performance of the present CUDA was also examined. The present GPU computation was more than 7 times faster than the single CPU by double precision.
https://doi.org/10.3795/KSME-B.2016.40.9.597 인용 PDF KSCI

Jeon, Byoung Jin;Lee, Jae Ryong;Yoon, Han Young;Choi, Hyoung Gwon
- Transactions of the Korean Society of Mechanical Engineers B
- /
- v.38 no.1
- /
- pp.71-79
- /
- 2014
A parallelization of the bi-conjugate gradient solver for the pressure equation of the CUPID (component unstructured program for interfacial dynamics) code, which was developed for analyzing the components of a pressurized water-cooled reactor, was studied in a symmetric multi-processing system. The parallel performance was investigated for three typical parallel programming models (MPI, OpenMP, Hybrid) by solving incompressible backward-facing step flow at various grid resolutions. It was confirmed that parallel performance was low when problem size was small or the memory requirement for each thread was considerably higher than the cache memory. Furthermore, it was shown that MPI was better than OpenMP regardless of the problem size, and Hybrid was the best when the number of threads was relatively small.
https://doi.org/10.3795/KSME-B.2014.38.1.071 인용 PDF KSCI

Jeon, B.J.;Choi, H.G.
- Journal of computational fluids engineering
- /
- v.19 no.2
- /
- pp.24-29
- /
- 2014
In this study, a parallel performance of CUPID-code has been investigated for both Ethernet and Infiniband network system to examine the effect of cache memory and network-speed. Bi-conjugate gradient solver of CUPID-code has been parallelised by using domain decomposition method and message passing interface (MPI). It is shown that the parallel performance of Ethernet-network system is worse than that of Infiniband-network system due to the slow network-speed and a small cache memory. It is also found that the parallel performance of each system deteriorates for a small problem due to the communication overhead, but the performance of Infiniband-network system is better than Ethernet-network system due to a much faster network-speed. For a large problem, the parallel performance depends less on network system.
https://doi.org/10.6112/kscfe.2014.19.2.024 인용 PDF KSCI

Jeon, Byoung Jin;Choi, Hyoung Gwon
- Transactions of the Korean Society of Mechanical Engineers B
- /
- v.39 no.2
- /
- pp.191-197
- /
- 2015
The message passing interface (MPI) and hybrid programming models for the parallel computation of a pressure equation were compared in a distributed memory system. Both models were based on domain decomposition, and two numbers of the sub-domain were selected by considering the efficiency of the hybrid model. The parallel performances for various problem sizes were measured using up to 96 threads. It was found that in addition to the cache-memory size, the overhead of the MPI communication/OpenMP directives affected the parallel performance. For small problems, the parallel performance was low because the percentage of the overhead of the MPI communication/OpenMP directives increased as the number of threads increased, and MPI was better than the hybrid model because it had a smaller communication overhead. For large problems, the parallel performance was high because, in addition to the cache effect, the percentage of the communication overhead was relatively low compared to that for small problems, and the hybrid model was better than MPI because the communication overhead of MPI was more dominant than that of the OpenMP directives in the hybrid model.
https://doi.org/10.3795/KSME-B.2015.39.2.191 인용 PDF KSCI