• Title/Summary/Keyword: GPGPU computing

Search results: 87

GPU-accelerated Reliability Analysis Method using Dynamic Reliability Block Diagram based on DEVS Formalism (DEVS 형식론 기반의 Dynamic Reliability Block Diagram과 GPU 가속 기술을 이용한 신뢰도 분석 방법)

  • Ha, Sol; Ku, Namkug; Roh, Myung-Il
    • Journal of the Korea Society for Simulation, v.22 no.4, pp.109-118, 2013
  • Instead of the fault tree (FT), the traditional method for analyzing the reliability of a system, this paper assesses reliability directly from the system configuration using the reliability block diagram (RBD) method. The RBD is a graphical representation of a system diagram that connects subsystems or components according to their functions or reliability relationships. The equipment model for the reliability simulation is built on the discrete event system specification (DEVS) formalism. To generate various alternatives of the target system, the paper also adopts the system entity structure (SES), an ontological framework that hierarchically represents the elements of a system and their relationships. To reduce the calculation time of the reliability analysis, GPU-based acceleration is applied to the reliability simulation.
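
The abstract does not detail the GPU kernels, and the paper's simulator is DEVS-based rather than a plain sampler, so the following is only a generic illustration of how reliability evaluation of an RBD maps onto the GPU: a minimal CUDA sketch of a Monte Carlo reliability estimate for an assumed series-parallel block structure. The block structure, failure probabilities, and all names are hypothetical, not the authors' code.

```cuda
// Hypothetical sketch (not the paper's DEVS simulator): Monte Carlo estimate
// of system reliability for an assumed RBD with component 0 in series with
// the parallel pair (1, 2). One thread evaluates one random trial.
#include <cstdio>
#include <curand_kernel.h>

__global__ void rbdMonteCarlo(const float* failProb, unsigned long long seed,
                              int trials, unsigned long long* systemUpCount) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= trials) return;

    curandState rng;
    curand_init(seed, tid, 0, &rng);

    // Sample each component's state for this trial (true = operating).
    bool up[3];
    for (int c = 0; c < 3; ++c)
        up[c] = curand_uniform(&rng) > failProb[c];

    // Evaluate the assumed RBD: series(comp0, parallel(comp1, comp2)).
    if (up[0] && (up[1] || up[2]))
        atomicAdd(systemUpCount, 1ULL);
}

int main() {
    const int trials = 1 << 20;
    const float hostFailProb[3] = {0.01f, 0.05f, 0.05f};   // illustrative values

    float* dFailProb;  unsigned long long* dCount;
    cudaMalloc(&dFailProb, sizeof(hostFailProb));
    cudaMalloc(&dCount, sizeof(unsigned long long));
    cudaMemcpy(dFailProb, hostFailProb, sizeof(hostFailProb), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(unsigned long long));

    rbdMonteCarlo<<<(trials + 255) / 256, 256>>>(dFailProb, 1234ULL, trials, dCount);

    unsigned long long upCount = 0;
    cudaMemcpy(&upCount, dCount, sizeof(upCount), cudaMemcpyDeviceToHost);
    printf("Estimated system reliability: %f\n", (double)upCount / trials);

    cudaFree(dFailProb);  cudaFree(dCount);
    return 0;
}
```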

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun; Han, Hwan-Soo; Lee, Sang-Won
    • Journal of KIISE: Databases, v.37 no.5, pp.275-284, 2010
  • GPUs are multi-core stream processors that can process large volumes of data at high speed with large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, the use of GPUs for general-purpose computing has become widespread. The CUDA architecture from Nvidia is one of the efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested-loop structure. To employ the CUDA programming model, we apply optimization techniques that make our skyline algorithm fit within the performance constraints of the CUDA architecture. According to our experimental results, our optimization techniques improve the original skyline algorithm by 80%.
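
The paper's CUDA-specific optimizations are not reproduced here, but the baseline it parallelizes, a nested-loop skyline test, maps naturally to one thread per candidate point. Below is a minimal CUDA sketch of that baseline for 2-D points where smaller is better in both dimensions; the kernel and names are illustrative, not the authors' implementation.

```cuda
// Hypothetical sketch of a parallel nested-loop skyline test for 2-D points
// (smaller is better in both dimensions). One thread checks one candidate
// point against all others; surviving points form the skyline.
__global__ void skylineKernel(const float2* pts, int n, int* inSkyline) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float2 p = pts[i];
    bool dominated = false;
    for (int j = 0; j < n && !dominated; ++j) {
        float2 q = pts[j];
        // q dominates p if q is no worse in both dimensions and strictly
        // better in at least one.
        dominated = (q.x <= p.x && q.y <= p.y) && (q.x < p.x || q.y < p.y);
    }
    inSkyline[i] = dominated ? 0 : 1;
}
```

The optimizations described in the paper would be layered on top of this structure; tiling the inner loop through shared memory is the usual first step for reducing global-memory traffic.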

Automatic Optimization Methods for Image Processing Programs Using OpenCL (OpenCL을 이용한 이미지 처리 프로그램의 자동 최적화 방법)

  • Shin, Jaeho; Jo, Gangwon; Lee, Ilkoo; Lee, Jaejin
    • KIISE Transactions on Computing Practices, v.23 no.3, pp.188-193, 2017
  • In this paper, we propose automatic OpenCL optimization techniques that offer the best performance for image processing programs on any hardware system. To achieve the best performance, developers must find a suitable parallelization strategy and an appropriate work-group size for the architecture of the target compute device. However, testing candidate devices to find them is both time-consuming and costly. Our techniques automatically set up hardware-optimized parallelization and find a suitable work-group size for the target device. Furthermore, using OpenCL does not always provide better performance in image processing. Hence, we also propose a way to automatically search for a threshold image size that lets image processing programs decide whether or not to use OpenCL. Our findings demonstrate that our techniques improve image processing performance significantly.
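
The abstract describes two automatic decisions: picking a device-appropriate work-group size and deciding, per image size, whether offloading pays off at all. The paper itself uses OpenCL; the sketch below illustrates only the second decision, in CUDA for consistency with the other examples here, with a trivial brightness filter standing in for the real workload and every name and constant assumed.

```cuda
// Hypothetical sketch: decide at runtime whether an image is large enough to
// benefit from GPU offload. A trivial brightness filter stands in for the real
// image-processing kernel; the threshold search itself is the point.
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void brightenGpu(const unsigned char* src, unsigned char* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = min(255, src[i] + 10);
}

static void brightenCpu(const unsigned char* src, unsigned char* dst, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = (unsigned char)((src[i] + 10 > 255) ? 255 : src[i] + 10);
}

// Time both paths over growing sizes; return the first pixel count where the
// GPU (including transfers) beats the CPU, or -1 if it never does.
long findGpuThreshold() {
    using clk = std::chrono::steady_clock;
    for (int side = 64; side <= 4096; side *= 2) {
        int n = side * side;
        std::vector<unsigned char> src(n, 100), dst(n);

        auto c0 = clk::now(); brightenCpu(src.data(), dst.data(), n); auto c1 = clk::now();

        unsigned char *dSrc, *dDst;
        cudaMalloc(&dSrc, n); cudaMalloc(&dDst, n);
        auto g0 = clk::now();
        cudaMemcpy(dSrc, src.data(), n, cudaMemcpyHostToDevice);
        brightenGpu<<<(n + 255) / 256, 256>>>(dSrc, dDst, n);
        cudaMemcpy(dst.data(), dDst, n, cudaMemcpyDeviceToHost);
        auto g1 = clk::now();
        cudaFree(dSrc); cudaFree(dDst);

        if ((g1 - g0) < (c1 - c0)) return (long)n;  // GPU wins from this size on
    }
    return -1;
}

int main() {
    long threshold = findGpuThreshold();
    printf("Offload to GPU for images of at least %ld pixels\n", threshold);
    return 0;
}
```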

Mobile Advanced Driver Assistance System using OpenCL : Pedestrian Detection (OpenCL을 이용한 모바일 ADAS : 보행자 검출)

  • Kim, Jong-Hee; Lee, Chung-Su; Kim, Hakil
    • Journal of the Institute of Electronics and Information Engineers, v.51 no.10, pp.190-196, 2014
  • This paper proposes a mobile-optimized pedestrian detection method using a cascade of HOG (Histograms of Oriented Gradients) for ADAS (Advanced Driver Assistance Systems) on smartphones. To use the limited resources of mobile platforms efficiently, the method is implemented with the OpenCL (Open Computing Language) library, and its processing time is reduced in two ways. First, the host code sets program build options explicitly and adjusts the work-group size for each kernel. Second, the kernel code uses local memory and a LUT (Look-Up Table) to accelerate the program. For performance evaluation, the developed algorithm is compared with the mobile CPU-based OpenCV (Open Source Computer Vision) for Android implementation. The experimental results show that the processing speed is 25% faster than the OpenCV hogcascade.
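
The paper's kernels are OpenCL; the following CUDA analogue is only a hedged illustration of the two kernel-side ideas the abstract names: a look-up table that replaces per-pixel trigonometry, and on-chip (shared/local) memory for histogram accumulation. The bin count, cell size, and quantization scheme are assumptions, not the authors' kernel.

```cuda
// Hypothetical CUDA analogue of the kernel-side optimizations: a constant-
// memory look-up table maps quantized gradients to orientation bins (no
// per-pixel atan2), and each cell's HOG histogram is accumulated in shared
// memory. Launch with dim3 block(CELL, CELL), grid(width/CELL, height/CELL).
#include <cmath>
#include <cuda_runtime.h>

#define NBINS   9
#define CELL    8
#define QLEVELS 32                        // dx, dy each quantized to 32 levels

__constant__ unsigned char gradBinLut[QLEVELS * QLEVELS];

__device__ __forceinline__ int quantize(float g) {
    // Map a gradient value in [-255, 255] to a LUT index in [0, QLEVELS-1].
    return (int)((g + 255.0f) * (QLEVELS - 1) / 510.0f);
}

__global__ void hogCellHistogram(const unsigned char* img, int width, int height,
                                 float* cellHist /* [cells][NBINS] */) {
    __shared__ float hist[NBINS];
    if (threadIdx.y == 0 && threadIdx.x < NBINS) hist[threadIdx.x] = 0.0f;
    __syncthreads();

    int x = blockIdx.x * CELL + threadIdx.x;
    int y = blockIdx.y * CELL + threadIdx.y;
    if (x > 0 && x < width - 1 && y > 0 && y < height - 1) {
        float dx = (float)img[y * width + x + 1] - img[y * width + x - 1];
        float dy = (float)img[(y + 1) * width + x] - img[(y - 1) * width + x];
        float mag = sqrtf(dx * dx + dy * dy);
        int bin = gradBinLut[quantize(dy) * QLEVELS + quantize(dx)];
        atomicAdd(&hist[bin], mag);       // local (shared) memory accumulation
    }
    __syncthreads();

    if (threadIdx.y == 0 && threadIdx.x < NBINS) {
        int cell = blockIdx.y * gridDim.x + blockIdx.x;
        cellHist[cell * NBINS + threadIdx.x] = hist[threadIdx.x];
    }
}

// Host side: the expensive trigonometry runs once here, then the LUT is uploaded.
void initGradBinLut() {
    const float PI = 3.14159265f;
    unsigned char lut[QLEVELS * QLEVELS];
    for (int qy = 0; qy < QLEVELS; ++qy)
        for (int qx = 0; qx < QLEVELS; ++qx) {
            float dx = qx * 510.0f / (QLEVELS - 1) - 255.0f;
            float dy = qy * 510.0f / (QLEVELS - 1) - 255.0f;
            float ang = atan2f(dy, dx);
            if (ang < 0.0f) ang += PI;    // unsigned orientation, 0..pi
            int bin = (int)(ang * NBINS / PI);
            lut[qy * QLEVELS + qx] = (unsigned char)(bin >= NBINS ? NBINS - 1 : bin);
        }
    cudaMemcpyToSymbol(gradBinLut, lut, sizeof(lut));
}
```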

Optimization of Color Format Conversion of WebCam Images Using the CUDA (CUDA를 이용한 웹캠 영상의 색상 형식 변환 최적화)

  • Kim, Jin-Woo; Jung, Yun-Hye; Park, Jin-Hong; Park, Yong-Jin; Han, Tack-Don
    • Journal of Korea Game Society, v.11 no.1, pp.147-157, 2011
  • Webcams do not memory-align image data, in order to reduce transmission time. Memory-unaligned image data is unsuitable for processing on the GPU, so we convert it to a color format suitable for high-speed image processing. In this paper, we propose a technique that accelerates the color format conversion of webcam images using NVIDIA CUDA. We propose optimizations for memory access and thread composition, and evaluate memory and computing performance to verify the performance of the proposed approach and the degree of optimization achievable on a low-performance GPU. With the proposed optimization technique, we show performance improvements of up to 68 percent.
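
The abstract does not name the input format, but webcams commonly deliver packed YUYV, and converting it to an RGBA layout that the GPU handles well is a natural per-pixel-pair kernel. A minimal CUDA sketch under that assumption follows; the format choice and coefficients are assumptions, not the paper's implementation.

```cuda
// Hypothetical sketch: convert packed YUYV (two pixels per 4 bytes) into RGBA.
// Each thread handles one pixel pair, so reads of the packed input and writes
// of the output stay coalesced. BT.601-style integer coefficients are assumed;
// the paper's thread-composition and memory-access tuning is not reproduced.
#include <cuda_runtime.h>

__device__ __forceinline__ uchar4 yuvToRgba(int y, int u, int v) {
    int c = y - 16, d = u - 128, e = v - 128;
    int r = (298 * c + 409 * e + 128) >> 8;
    int g = (298 * c - 100 * d - 208 * e + 128) >> 8;
    int b = (298 * c + 516 * d + 128) >> 8;
    return make_uchar4((unsigned char)min(max(r, 0), 255),
                       (unsigned char)min(max(g, 0), 255),
                       (unsigned char)min(max(b, 0), 255), 255);
}

__global__ void yuyvToRgba(const uchar4* yuyv, uchar4* rgba, int pixelPairs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= pixelPairs) return;

    uchar4 p = yuyv[i];                 // p = {Y0, U, Y1, V} for two adjacent pixels
    rgba[2 * i]     = yuvToRgba(p.x, p.y, p.w);
    rgba[2 * i + 1] = yuvToRgba(p.z, p.y, p.w);
}

// Typical launch: yuyvToRgba<<<(pairs + 255) / 256, 256>>>(dIn, dOut, pairs);
```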

Performance Evaluation of the GPU Architecture Executing Parallel Applications (병렬 응용프로그램 실행 시 GPU 구조에 따른 성능 분석)

  • Choi, Hong-Jun; Kim, Cheol-Hong
    • The Journal of the Korea Contents Association, v.12 no.5, pp.10-21, 2012
  • The role of the GPU has evolved from graphics-specific processing to general-purpose processing with the development of the unified shader core architecture. Execution methods for general-purpose parallel applications on the GPU have been researched intensively, since the parallel hardware architecture can be utilized efficiently when such applications are executed. However, the current GPU architecture has limitations in executing general-purpose parallel applications, since the GPU is not yet specialized for general-purpose computing. To improve GPU performance for general-purpose parallel applications, the GPU architecture must evolve. In this work, we analyze GPU performance as the architecture varies in the number of cores and the clock frequency. Our simulation results show that GPU performance improves by up to 125.8% and 16.2% as the number of cores and the clock frequency increase, respectively. However, the performance improvement saturates even as the number of cores and the clock frequency continue to increase, because data cannot be supplied to the GPU fast enough under the memory bandwidth limit. Consequently, to achieve high performance on the GPU, computational resources must be balanced against the available memory bandwidth.
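
The saturation the authors observe is the usual bandwidth-bound (roofline) argument: attainable throughput is capped by the smaller of peak compute and memory bandwidth times arithmetic intensity, so extra cores or clock speed stop helping once the bandwidth term is the binding one. As a generic illustration (the numbers below are not from the paper):

```latex
\text{Attainable throughput} \;=\; \min\bigl(P_{\text{peak}},\; B_{\text{mem}} \times I\bigr)
```

where P_peak is the peak compute rate, B_mem the memory bandwidth, and I the arithmetic intensity in operations per byte. For example, a kernel with I = 1 FLOP/byte on a device with B_mem = 100 GB/s cannot exceed 100 GFLOPS, regardless of how many cores are added.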

Real-time Color Recognition Based on Graphic Hardware Acceleration (그래픽 하드웨어 가속을 이용한 실시간 색상 인식)

  • Kim, Ku-Jin; Yoon, Ji-Young; Choi, Yoo-Joo
    • Journal of KIISE: Computing Practices and Letters, v.14 no.1, pp.1-12, 2008
  • In this paper, we present a real-time algorithm for recognizing vehicle color from indoor and outdoor vehicle images based on GPU (Graphics Processing Unit) acceleration. In the preprocessing step, we construct feature vectors from sample vehicle images of different colors. We then combine the feature vectors for each color and store them as a reference texture to be used on the GPU. Given an input vehicle image, the CPU constructs its feature vector, and the GPU compares it with the sample feature vectors in the reference texture. The similarities between the input feature vector and the sample feature vectors for each color are measured, and the result is transferred back to the CPU to recognize the vehicle color. The output is categorized into seven colors: three achromatic colors (black, silver, and white) and four chromatic colors (red, yellow, blue, and green). We construct feature vectors from histograms of hue-saturation pairs and hue-intensity pairs, with a weight factor applied to the saturation values. Our algorithm achieves a successful color recognition rate of 94.67% by using a large number of sample images captured in various environments, by generating feature vectors that distinguish different colors, and by utilizing an appropriate likelihood function. We also accelerate color recognition by exploiting the parallel computation capability of the GPU. In the experiments, we constructed a reference texture from 7,168 sample images, 1,024 for each color. The average time to generate a feature vector is 0.509 ms for a 150×113 resolution image. After the feature vector is constructed, the execution time for GPU-based color recognition is 2.316 ms on average, which is 5.47 times faster than executing the algorithm on the CPU. Our experiments were limited to vehicle images, but the algorithm can be extended to input images of general objects.
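
The 2008 paper implemented this comparison with graphics-era GPU programming (a reference texture read on the GPU); the sketch below re-expresses only the comparison step in CUDA, to stay consistent with the other examples here, using plain global memory and a histogram-intersection similarity. The seven-color layout, bin count, and similarity measure are assumptions, not the authors' shader.

```cuda
// Hypothetical sketch: compare an input feature vector (histogram) against one
// reference vector per color class in parallel, using histogram intersection
// as the similarity. One block per color; a shared-memory reduction sums the
// per-bin minima. Layout and similarity measure are illustrative assumptions.
#include <cuda_runtime.h>

#define NCOLORS 7      // black, silver, white, red, yellow, blue, green
#define NBINS   256    // length of each feature vector (assumed)

__global__ void colorSimilarity(const float* input,        // [NBINS]
                                const float* refs,         // [NCOLORS][NBINS]
                                float* similarity) {       // [NCOLORS]
    __shared__ float partial[256];
    int color = blockIdx.x;
    int tid = threadIdx.x;

    // Each thread accumulates the intersection over a strided slice of bins.
    float s = 0.0f;
    for (int b = tid; b < NBINS; b += blockDim.x)
        s += fminf(input[b], refs[color * NBINS + b]);
    partial[tid] = s;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) similarity[color] = partial[0];
}

// Host side then copies similarity[NCOLORS] back and takes the argmax,
// e.g. colorSimilarity<<<NCOLORS, 256>>>(dInput, dRefs, dSim);
```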