• Title/Summary/Keyword: Parallelization method

Search Result 92, Processing Time 0.024 seconds

Development of a Parallel Cell-Based DSMC Method Using Unstructured Meshes (비정렬격자에서 병렬화된 격자중심 직접모사 기법 개발)

  • Kim, Hyeong-Sun;Kim, Min-Gyu;Gwon, O-Jun
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.30 no.2
    • /
    • pp.1-11
    • /
    • 2002
  • In the present study, a parallel DSCM technique based on a cell-based data structure is developed for the efficient simulation of rarefied gas flows especially od PC clusters. Dynamic load balancing is archieved by decomposing the computational domain into several sub-domains and accounting for the number of particles and the number cells of each domain. Mesh adaptation algorithm is also applied to improve the resolution of the solution and to reduce the grid dependency. It was demonstrated that accurate solutions can be obtained after several levels of mesh adapation starting from a coars initial grid. The method was applied to a two-dimensioanal supersonic leading-edge flow and the axi-symmetric Rothe nozzle flow to validate the efficiency of the present method. It was found that the present method is a very effective tool for the efficient simulation of rarefied gas flow on PC-based parallel machines.

Strongly Coupled Method for 2DOF Flutter Analysis (강성 결합 기법을 통한 2계 자유도 플러터 해석)

  • Ju, Wan-Don;Lee, Gwan-Jung;Lee, Dong-Ho;Lee, Gi-Hak
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.34 no.1
    • /
    • pp.24-31
    • /
    • 2006
  • In the present study, a strongly coupled analysis code is developed for transonic flutter analysis. For aerodynamic analysis, two dimensional Reynolds-Averaged Navier-Stokes equation was used for governing equation, and ε-SST for turbulence model, DP-SGS(Data Parallel Symmetric Gauss Seidel) Algorithm for parallelization algorithm. 2 degree-of-freedom pitch and plunge model was used for structural analysis. To obtain flutter response in the time domain, dual time stepping method was applied to both flow and structure solver. Strongly coupled method was implemented by successive iteration of fluid-structure interaction in pseudo time step. Computed results show flutter speed boundaries and limit cycle oscillation phenomena in addition to typical flutter responses - damped, divergent and neutral responses. It is also found that the accuracy of transonic flutter analysis is strongly dependent on the methodology of fluid-structure interaction as well as on the choice of turbulence model.

Method for Applying Wavefront Parallel Processing on Cubemap Video (큐브맵 영상에 Wavefront 병렬 처리를 적용하는 방법)

  • Hong, Seok Jong;Park, Gwang Hoon
    • Journal of Broadcast Engineering
    • /
    • v.22 no.3
    • /
    • pp.401-404
    • /
    • 2017
  • The 360 VR video has a format of a stereoscopic shape such as an isometric shape or a cubic shape or a cubic shape. Although these formats have different characteristics, they have in common that the resolution is higher than that of a normal 2D video. Therefore, it takes much longer time to perform coding/decoding on 360 VR video than 2D Video, so parallel processing techniques are essential when it comes to coding 360 VR video. HEVC, the state of art 2D video codec, uses Wavefront Parallel Processing (WPP) technology as a standard for parallelization. This technique is optimized for 2D videos and does not show optimal performance when used in 3D videos. Therefore, a suitable method for WPP is required for 3D video. In this paper, we propose WPP coding/decoding method which improves WPP performance on cube map format 3D video. The experiment was applied to the HEVC reference software HM 12.0. The experimental results show that there is no significant loss of PSNR compared with the existing WPP, and the coding complexity of 15% to 20% is further reduced. The proposed method is expected to be included in the future 3D VR video codecs.

GPU-based Acceleration of Particle Filter Signal Processing for Efficient Moving-target Position Estimation (이동 목표물의 효율적인 위치 추정을 위한 파티클 필터 신호 처리의 GPU 기반 가속화)

  • Kim, Seongseop;Cho, Jeonghun;Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.12 no.5
    • /
    • pp.267-275
    • /
    • 2017
  • Time of difference of arrival (TDOA) method using passive sonar sensor array has normally been used to estimate the location of a concealed moving target in underwater environment. Particle filter has been introduced for effective target estimation for non-Gaussian and nonlinear systems. In this paper, we propose a GPU-based acceleration of target position estimation using particle filter and propose efficient embedded system and software architecture. For the TDOA measurement from the passive sonar sensor, we use the generalized cross correlation phase transform (GCC-PHAT) method to obtain the correlation coefficient of the signal using FFT and we try to accelerate the calculation of GCC-PHAT based TDOA measurements using FFT with GPU CUDA. We also propose parallelization method of the target position estimation algorithm using the GPU CUDA to update the state of each particle for the target position estimation using the measured values. The target estimation algorithm was verified using Matlab and implemented using GPU CUDA. Then, we realized the proposed signal processing acceleration system using NVIDIA Jetson TX1 as the target board to analyze in terms of the execution time. The execution time of the algorithm is reduced by 55% to the CPU standalone-operation on the target board. Experiment results show that the proposed architecture is a feasible solution in terms of high-performance and area-efficient architecture.

SIMD Instruction-based Fast HEVC RExt Decoder (SIMD 명령어 기반 HEVC RExt 복호화기 고속화)

  • Mok, Jung-Soo;Ahn, Yong-Jo;Ryu, Hochan;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.20 no.2
    • /
    • pp.224-237
    • /
    • 2015
  • In this paper, we introduce the fast decoding method with the SIMD (Single Instruction Multiple Data) instructions for HEVC RExt (High Efficiency Video Coding Range Extensions). Several tools of HEVC RExt such as intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules can be classified as the proper modules for applying the SIMD instructions. In consideration of bit-depth increasement of RExt, intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules are accelerated by SSE (Streaming SIMD Extension) instructions. In addition, we propose effective implementations for interpolation filter, inverse-quantization, and clipping modules by utilizing a set of AVX2 (Advanced Vector eXtension 2) instructions that can use 256 bits register. The evaluation of the proposed methods were performed on the private HEVC RExt decoder developed based on HM 16.0. The experimental results show that the developed RExt decoder reduces 12% average decoding time, compared with the conventional sequential method.

BCDR algorithm for network estimation based on pseudo-likelihood with parallelization using GPU (유사가능도 기반의 네트워크 추정 모형에 대한 GPU 병렬화 BCDR 알고리즘)

  • Kim, Byungsoo;Yu, Donghyeon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.2
    • /
    • pp.381-394
    • /
    • 2016
  • Graphical model represents conditional dependencies between variables as a graph with nodes and edges. It is widely used in various fields including physics, economics, and biology to describe complex association. Conditional dependencies can be estimated from a inverse covariance matrix, where zero off-diagonal elements denote conditional independence of corresponding variables. This paper proposes a efficient BCDR (block coordinate descent with random permutation) algorithm using graphics processing units and random permutation for the CONCORD (convex correlation selection method) based on the BCD (block coordinate descent) algorithm, which estimates a inverse covariance matrix based on pseudo-likelihood. We conduct numerical studies for two network structures to demonstrate the efficiency of the proposed algorithm for the CONCORD in terms of computation times.

Development of a 3-D Parallel DSMC Method for Rarefied Gas Flows Using Unstructured Meshes (비정렬 격자계를 이용한 희박기체 영역의 3차원 병렬 직접모사법 개발)

  • Kim, Min Gyu;Gwon, O Jun
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.31 no.2
    • /
    • pp.1-9
    • /
    • 2003
  • In the present study, a 3-D Parallel DSMC method in developed on unstructured meshes for the efficient simulation of rarefied gas flows. Particle tracing between cells in achieved based on a linear shape function extended to three dimensions. For high parallel efficiency, successive domain decomposition is applied to achieve load balancing between processors by accounting for the number of particles. A particle weighting technique is also adopted to handle flows containing gases of significantly dirrerent number densities in the same flow domain. Application is made for flow past a 3-D delta wing and the result is compared with that from experiment and other calculation. Flow around a rocket payload at 100km altitude is also solved and the effect of plume back flow from the nozzle in studied.

Development of High Performance Massively Parallel Processing Simulator for Semiconductor Etching Process (건식 식각 공정을 위한 초고속 병렬 연산 시뮬레이터 개발)

  • Lee, Jae-Hee;Kwon, Oh-Seob;Ban, Yong-Chan;Won, Tae-Young
    • Journal of the Korean Institute of Telematics and Electronics D
    • /
    • v.36D no.10
    • /
    • pp.37-44
    • /
    • 1999
  • This paper report the implementation results of Monte Carlo numerical calculation for ion distributions in plasma dry etching chamber and of the surface evolution simulator using cell removal method for topographical evolution of the surface exposed to etching ion. The energy and angular distributions of ion across the plasma sheath were calculated by MC(Monte Carlo) algorithm. High performance MPP(Massively Parallel Processing) algorithm developed in this paper enables efficient parallel and distributed simulation with an efficiency of more than 95% and speedup of 16 with 16 processors. Parallelization of surface evolution simulator based on cell removal method reduces simulation time dramatically to 15 minutes and increases capability of simulation required enormous memory size of 600Mb.

  • PDF

Fast CU Encoding Schemes Based on Merge Mode and Motion Estimation for HEVC Inter Prediction

  • Wu, Jinfu;Guo, Baolong;Hou, Jie;Yan, Yunyi;Jiang, Jie
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.3
    • /
    • pp.1195-1211
    • /
    • 2016
  • The emerging video coding standard High Efficiency Video Coding (HEVC) has shown almost 40% bit-rate reduction over the state-of-the-art Advanced Video Coding (AVC) standard but at about 40% computational complexity overhead. The main reason for HEVC computational complexity is the inter prediction that accounts for 60%-70% of the whole encoding time. In this paper, we propose several fast coding unit (CU) encoding schemes based on the Merge mode and motion estimation information to reduce the computational complexity caused by the HEVC inter prediction. Firstly, an early Merge mode decision method based on motion estimation (EMD) is proposed for each CU size. Then, a Merge mode based early termination method (MET) is developed to determine the CU size at an early stage. To provide a better balance between computational complexity and coding efficiency, several fast CU encoding schemes are surveyed according to the rate-distortion-complexity characteristics of EMD and MET methods as a function of CU sizes. These fast CU encoding schemes can be seamlessly incorporated in the existing control structures of the HEVC encoder without limiting its potential parallelization and hardware acceleration. Experimental results demonstrate that the proposed schemes achieve 19%-46% computational complexity reduction over the HEVC test model reference software, HM 16.4, at a cost of 0.2%-2.4% bit-rate increases under the random access coding configuration. The respective values under the low-delay B coding configuration are 17%-43% and 0.1%-1.2%.

An Optimization Method for Hologram Generation on Multiple GPU-based Parallel Processing (다중 GPU기반 홀로그램 생성을 위한 병렬처리 성능 최적화 기법)

  • Kook, Joongjin
    • Smart Media Journal
    • /
    • v.8 no.2
    • /
    • pp.9-15
    • /
    • 2019
  • Since the computational complexity for hologram generation increases exponentially with respect to the size of the point cloud, parallel processing using CUDA and/or OpenCL library based on multiple GPUs has recently become popular. The CUDA kernel for parallelization needs to consist of threads, blocks, and grids properly in accordance with the number of cores and the memory size in the GPU. In addition, in case of multiple GPU environments, the distribution in grid-by-grid, in block-by-block, or in thread-by-thread is needed according to the number of GPUs. In order to evaluate the performance of CGH generation, we compared the computational speed in CPU, in single GPU, and in multi-GPU environments by gradually increasing the number of points in a point cloud from 10 to 1,000,000. We also present a memory structure design and a calculation method required in the CUDA-based parallel processing to accelerate the CGH (Computer Generated Hologram) generation operation in multiple GPU environments.