• Title/Summary/Keyword: Compute unified device architecture

Search Result 61, Processing Time 0.029 seconds

A PRICING METHOD OF HYBRID DLS WITH GPGPU

  • YOON, YEOCHANG;KIM, YONSIK;BAE, HYEONG-OHK
    • Journal of the Korean Society for Industrial and Applied Mathematics
    • /
    • v.20 no.4
    • /
    • pp.277-293
    • /
    • 2016
  • We develop an efficient numerical method for pricing the Derivative Linked Securities (DLS). The payoff structure of the hybrid DLS consists with a standard 2-Star step-down type ELS and the range accrual product which depends on the number of days in the coupon period that the index stay within the pre-determined range. We assume that the 2-dimensional Geometric Brownian Motion (GBM) as the model of two equities and a no-arbitrage interest model (One-factor Hull and White interest rate model) as a model for the interest rate. In this study, we employ the Monte Carlo simulation method with the Compute Unified Device Architecture (CUDA) parallel computing as the General Purpose computing on Graphic Processing Unit (GPGPU) technology for fast and efficient numerical valuation of DLS. Comparing the Monte Carlo method with single CPU computation or MPI implementation, the result of Monte Carlo simulation with CUDA parallel computing produces higher performance.

Computationally Efficient Implementation of a Hamming Code Decoder Using Graphics Processing Unit

  • Islam, Md Shohidul;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of Communications and Networks
    • /
    • v.17 no.2
    • /
    • pp.198-202
    • /
    • 2015
  • This paper presents a computationally efficient implementation of a Hamming code decoder on a graphics processing unit (GPU) to support real-time software-defined radio, which is a software alternative for realizing wireless communication. The Hamming code algorithm is challenging to parallelize effectively on a GPU because it works on sparsely located data items with several conditional statements, leading to non-coalesced, long latency, global memory access, and huge thread divergence. To address these issues, we propose an optimized implementation of the Hamming code on the GPU to exploit the higher parallelism inherent in the algorithm. Experimental results using a compute unified device architecture (CUDA)-enabled NVIDIA GeForce GTX 560, including 335 cores, revealed that the proposed approach achieved a 99x speedup versus the equivalent CPU-based implementation.

Implementation of Particle Swarm Optimization Method Using CUDA (CUDA를 이용한 Particle Swarm Optimization 구현)

  • Kim, Jo-Hwan;Kim, Eun-Su;Kim, Jong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.5
    • /
    • pp.1019-1024
    • /
    • 2009
  • In this paper, particle swarm optimization(PSO) is newly implemented by CUDA(Compute Unified Device Architecture) and is applied to function optimization with several benchmark functions. CUDA is not CPU but GPU(Graphic Processing Unit) that resolves complex computing problems using parallel processing capacities. In addition, CUDA helps one to develop GPU softwares conveniently. Compared with the optimization result of PSO executed on a general CPU, CUDA saves about 38% of PSO running time as average, which implies that CUDA is a promising frame for real-time optimization and control.

Kinematic Wave Rainfall-Runoff Model Using CUDA FORTRAN (CUDA FORTRAN을 이용한 운동파 강우유출모형)

  • Kim, Boram;Kim, Dae-Hong
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2018.05a
    • /
    • pp.271-271
    • /
    • 2018
  • 그래픽 처리 장치(GPU: Graphic Processing Units)는 그래픽 처리에 특화된 수많은 산술논리연산자 (ALU: Arithmetic Logic Unit)와 이에 관련된 인스트럭션Instruction)으로 인해 중앙 처리 장치(CPU: Central Processing Units) 보다 훨씬 빠른 계산 처리를 수행할 수 있다. 최근에는 FORTRAN에 의해 구현된 많은 수치모형들이 현실적인 모델링 방법의 발달로 인해 더 많은 계산량과 계산시간을 필요로 한다. 이 연구에서는 GPU 상의 범용 계산GPGPU : General-Purpose computing on Graphics Processing Units) 기반 운동파 강우유출모형(Kinematic Wave Rainfall-Runoff Model)이 CUDA(Compute Unified Device Architecture) FORTRAN을 사용하여 구현되었다. CUDA FORTRAN 운동파 강우유출모형의 계산 결과는 검증된 CPU 기반 운동파 강우유출모형의 계산 결과와 비교하여 검증되었으며, 잘 일치함을 보여 주었다. CUDA FORTRAN 운동파 강우유출모형은 CPU 기반 모형에 비해 약 20 배 더 빠른 계산 시간을 보였다. 또한 계산 영역이 커짐에 따라 CPU 버전에 비해 CUDA FORTRAN 버전의 계산 효율이 향상되었다.

  • PDF

Application Analysis of GPU-Accelerated Kinematic Wave Model Using CUDA Fortran (CUDA FORTEAN을 이용한 GPU 가속 운동파모형 적용성 분석)

  • Kim, Boram;Kim, Hyung-Jun;Kim, Sooyoung;Yoon, Kwang Seok
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.346-346
    • /
    • 2022
  • 본 연구에서는 GPU(Graphic Processing Unit) 가속 분포형모형을 실제 유역에 적용하여 강우 유출모의 결과의 정확성과 모의시간의 효율성에 대한 분석을 수행하였다. 분포형모형의 지배방정식은 운동파모형과 Green-Ampt모형으로 구성되어 있으며, 운동파모형은 유한체적법을 이용하여 이산화 하였다. GPU 가속 모형은 CUDA(Compute Unified Device Architecture) 포트란(Fortran)을 사용하여 개발된 모형으로 수치모의시 연산시간 단축을 고려한 모형이다. 모형의 정확성과 효율성은 미호천 유역에서 발생하는 강우유출현상에 GPU 가속 운동파모형을 적용하여 분석하였다. 수치모의 결과값은 대상유역에 속한 수위관측소의 관측값과 비교하여 정확성을 검증하였고, 수치모의 소요시간은 CPU(Central Processing Unit) 기반 운동파모형의 수치모의 소요시간과 비교하여 효율성을 검증하였다. GPU 가속 운동파모형의 수치모의 결과는 관측값과 유사한 결과를 나타냈으며, 수치모의 소요시간은 본 연구에 사용된 장비를 기준으로 최대 100배 정도 단축되었다.

  • PDF

Parallel Processing of Satellite Images using CUDA Library: Focused on NDVI Calculation (CUDA 라이브러리를 이용한 위성영상 병렬처리 : NDVI 연산을 중심으로)

  • LEE, Kang-Hun;JO, Myung-Hee;LEE, Won-Hee
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.19 no.3
    • /
    • pp.29-42
    • /
    • 2016
  • Remote sensing allows acquisition of information across a large area without contacting objects, and has thus been rapidly developed by application to different areas. Thus, with the development of remote sensing, satellites are able to rapidly advance in terms of their image resolution. As a result, satellites that use remote sensing have been applied to conduct research across many areas of the world. However, while research on remote sensing is being implemented across various areas, research on data processing is presently insufficient; that is, as satellite resources are further developed, data processing continues to lag behind. Accordingly, this paper discusses plans to maximize the performance of satellite image processing by utilizing the CUDA(Compute Unified Device Architecture) Library of NVIDIA, a parallel processing technique. The discussion in this paper proceeds as follows. First, standard KOMPSAT(Korea Multi-Purpose Satellite) images of various sizes are subdivided into five types. NDVI(Normalized Difference Vegetation Index) is implemented to the subdivided images. Next, ArcMap and the two techniques, each based on CPU or GPU, are used to implement NDVI. The histograms of each image are then compared after each implementation to analyze the different processing speeds when using CPU and GPU. The results indicate that both the CPU version and GPU version images are equal with the ArcMap images, and after the histogram comparison, the NDVI code was correctly implemented. In terms of the processing speed, GPU showed 5 times faster results than CPU. Accordingly, this research shows that a parallel processing technique using CUDA Library can enhance the data processing speed of satellites images, and that this data processing benefits from multiple advanced remote sensing techniques as compared to a simple pixel computation like NDVI.

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

  • Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.4
    • /
    • pp.2749-2756
    • /
    • 2015
  • Recently, Depending on the quality of data increases, the problem of time-consuming to process the image is raised by being required to accelerate the image processing algorithms, in a traditional CPU and CUDA(Compute Unified Device Architecture) based recognition system for computing speed and performance gains compared to OpenMP When character recognition has been learned by the system to measure the input by the character data matching is implemented in an environment that recognizes the region of the well, so that the font of the characters image learning English alphabet are each constant and standardized in size and character an image matching method for calculating the matching has also been implemented. GPGPU (General Purpose GPU) programming platform technology when using the CUDA computing techniques to recognize and use the four cores of Intel i5 2500 with OpenMP to deal quickly and efficiently an algorithm, than the performance of existing CPU does not produce the rate of four times due to the delay of the data of the partition and merge operation proposed a method of improving the rate of speed of about 3.2 times, and the parallel processing of the video card that processes a result, the sequential operation of the process compared to CPU-based who performed the performance gain is about 21 tiems improvement in was confirmed.

Multiple Camera Based Imaging System with Wide-view and High Resolution and Real-time Image Registration Algorithm (다중 카메라 기반 대영역 고해상도 영상획득 시스템과 실시간 영상 정합 알고리즘)

  • Lee, Seung-Hyun;Kim, Min-Young
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.49 no.4
    • /
    • pp.10-16
    • /
    • 2012
  • For high speed visual inspection in semiconductor industries, it is essential to acquire two-dimensional images on regions of interests with a large field of view (FOV) and a high resolution simultaneously. In this paper, an imaging system is newly proposed to achieve high quality image in terms of precision and FOV, which is composed of single lens, a beam splitter, two camera sensors, and stereo image grabbing board. For simultaneously acquired object images from two camera sensors, Zhang's camera calibration method is applied to calibrate each camera first of all. Secondly, to find a mathematical mapping function between two images acquired from different view cameras, the matching matrix from multiview camera geometry is calculated based on their image homography. Through the image homography, two images are finally registered to secure a large inspection FOV. Here the inspection system of using multiple images from multiple cameras need very fast processing unit for real-time image matching. For this purpose, parallel processing hardware and software are utilized, such as Compute Unified Device Architecture (CUDA). As a result, we can obtain a matched image from two separated images in real-time. Finally, the acquired homography is evaluated in term of accuracy through a series of experiments, and the obtained results shows the effectiveness of the proposed system and method.

Fast Generation of Intermediate View Image Using GPGPU-Based Disparity Increment Method (GPGPU 기반의 변위증분 방법을 이용한 중간시점 고속 생성)

  • Koo, Ja-Myung;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.8
    • /
    • pp.1908-1918
    • /
    • 2013
  • Free-view, auto-stereoscopic video service is a next generation broadcasting system which offers a three-dimensional video, images of the various point are needed. This paper proposes a method that parallelizes the algorithm for arbitrary intermediate view-point image fast generation and make it faster using General Propose Graphic Processing Unit(GPGPU) with help of the Compute Unified Device Architecture(CUDA). It uses a parallelized stereo-matching method between the leftmost and the rightmost depth images to obtain disparity information and It use data calculated disparity increment per depth value. The disparity increment is used to find the location in the intermediate view-point image for each depth in the given images. Then, It is eliminate to disocclusions complement each other and remaining holes are filled image using hole-filling method and to get the final intermediate view-point image. The proposed method was implemented and applied to several test sequences. The results revealed that the quality of the generated intermediate view-point image corresponds to 30.47dB of PSNR in average and it takes about 38 frames per second to generate a Full HD intermediate view-point image.

GPU-accelerated Lattice Boltzmann Simulation for the Prediction of Oil Slick Movement in Ocean Environment (GPU 가속 기술을 이용한 격자 볼츠만법 기반 원유 확산 과정 시뮬레이션)

  • Ha, Sol;Ku, Namkug;Roh, Myung-Il
    • Korean Journal of Computational Design and Engineering
    • /
    • v.18 no.6
    • /
    • pp.399-406
    • /
    • 2013
  • This paper describes a new simulation technique for advection-diffusion phenomena over the sea surface using the lattice Boltzmann method (LBM), capable of predicting oil dispersion from tankers. The LBM is used to solve the pollutant transport problem within the framework of the ocean environment. The sea space is represented by the lattices, where each lattice has the information on oil transportation. Since dispersed oils (i.e., oil droplets) at sea are transported by convection due to waves, buoyancy, and turbulent diffusion, the conservation of mass and many physical oil transport rules were used in the prediction model. Since the LBM is modeled using the uniform lattices and simple rules, it can be easily accelerated by the parallel mechanism, for example, GPU-accelerated method. The proposed model using the LBM is used to simulate a simple pollution event with the oil pollutants of 10,000 kL. The simulation results indicate that the LBM method accelerated with the GPU is 6 times faster than that without the GPU.