• Title/Summary/Keyword: GPU process


GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.1-8 / 2022
  • In this paper, we design and implement a GPU-based parallel algorithm to effectively solve the traveling salesman problem (TSP) with an ant colony system. The iterative process of generating hundreds or thousands of tours simultaneously exploits the GPU's task-level parallelism, while the update of the pheromone trail data exploits data parallelism with 32x32 thread blocks. In particular, simultaneous memory accesses by multiple threads are organized as coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory. The experiments used 127- to 1002-city instances from TSPLIB and compared the sequential and parallel algorithms on an Intel Core i9-9900K CPU and an Nvidia Titan RTX system. GPU parallelization yields a speedup of about 10.13 to 11.37 times.
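
The pheromone-update step described above maps naturally onto a 2D kernel. The following CUDA sketch, with illustrative names (tau, deposit, rho) not taken from the paper, shows how 32x32 thread blocks can evaporate and reinforce a row-major pheromone matrix so that threads in a warp touch consecutive addresses (coalesced access).

    // Evaporate and reinforce the pheromone matrix; one thread per (i, j) entry.
    __global__ void updatePheromone(float *tau, const float *deposit,
                                    int n, float rho)
    {
        int col = blockIdx.x * blockDim.x + threadIdx.x;  // city j
        int row = blockIdx.y * blockDim.y + threadIdx.y;  // city i
        if (row < n && col < n) {
            int idx = row * n + col;        // row-major: coalesced along threadIdx.x
            tau[idx] = (1.0f - rho) * tau[idx] + deposit[idx];
        }
    }

    // Host-side launch with 32x32 blocks (illustrative):
    //   dim3 block(32, 32);
    //   dim3 grid((n + 31) / 32, (n + 31) / 32);
    //   updatePheromone<<<grid, block>>>(d_tau, d_deposit, n, 0.1f);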

An Efficient Method to Track GPS L1 C/A and Galileo E1B CBOC(6,1,1/11) Signal Simultaneously using a Low Cost GPU in SDR

  • Park, Jong-Il;Park, Chansik
    • Journal of Positioning, Navigation, and Timing / v.9 no.4 / pp.337-345 / 2020
  • In this paper, an efficient signal tracking method that simultaneously tracks both GPS L1 C/A and Galileo E1B CBOC(6,1,1/11) signals using a low-cost GPU is proposed. In the existing approach, in which each GNSS signal must be processed within 1 ms, the GPU requires more than 2 ms to process the 4 ms CBOC signal. This means real-time operation is possible when only the Galileo E1B CBOC signal is concerned, but when both GPS C/A and Galileo CBOC are required, the GPS C/A signal cannot be processed in real time. To process the 1 ms GPS C/A and 4 ms Galileo CBOC signals in real time, the proposed method divides the 4 ms Galileo CBOC signal into four 1 ms signal blocks. In particular, a buffer that manages the 1 ms and 4 ms signals simultaneously is designed, and a module that accumulates the 1 ms correlation values of the Galileo CBOC over 4 ms and passes them to the PLL and DLL is implemented. The operation and performance are evaluated with real measurements on the GPU-based SDR. The experimental results show that more than 16 GPS C/A and Galileo E1B satellites can be tracked with the proposed method.
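
The block-accumulation idea can be illustrated with a small host-side sketch (CUDA C++ host code): the 4 ms Galileo E1B coherent integration is built from four 1 ms correlation results before being handed to the tracking loops, while GPS C/A results would be consumed every 1 ms. The struct and function names below are hypothetical, not the paper's API.

    #include <cstdio>

    // Hypothetical stand-in for the PLL/DLL update fed by the 4 ms accumulation.
    static void runPllDll(float I, float Q)
    {
        std::printf("PLL/DLL update: I=%f Q=%f\n", I, Q);
    }

    struct E1bChannel {
        float accI = 0.0f, accQ = 0.0f;  // running 4 ms coherent accumulation
        int   blocks = 0;                // 1 ms blocks accumulated so far
    };

    // Called once per 1 ms block with this channel's correlator outputs.
    void accumulateE1b(E1bChannel &ch, float corrI, float corrQ)
    {
        ch.accI += corrI;
        ch.accQ += corrQ;
        if (++ch.blocks == 4) {          // full 4 ms CBOC integration complete
            runPllDll(ch.accI, ch.accQ); // hand the coherent sum to PLL/DLL
            ch.accI = ch.accQ = 0.0f;
            ch.blocks = 0;
        }
    }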

A Parallel Processing Technique for Large Spatial Data (대용량 공간 데이터를 위한 병렬 처리 기법)

  • Park, Seunghyun;Oh, Byoung-Woo
    • Spatial Information Research / v.23 no.2 / pp.1-9 / 2015
  • A graphics processing unit (GPU) contains many arithmetic logic units (ALUs). Because these ALUs can be exploited for parallel processing, the GPU enables efficient data processing. Spatial data require many geographic coordinates to represent shapes on a map, and the coordinates are usually stored as geodetic longitude and latitude. To display a map in a 2-dimensional Cartesian coordinate system, the geodetic longitude and latitude must be converted to the Universal Transverse Mercator (UTM) coordinate system. Both this coordinate conversion and the rendering process that maps the converted coordinates to the screen involve heavy floating-point computation. In this paper, we propose a parallel processing technique that performs the conversion and the rendering on the GPU to improve performance. Because large spatial data are stored on disk in many files, we also propose merging the spatial data files into one large file and accessing it as a memory-mapped file so that the large amount of spatial data can be processed efficiently. We implemented the proposed technique and experimented with the 747,302,971 points of the TIGER/Line spatial data. In the experiment, the coordinate-system conversion on the GPU is 30.16 times faster than the CPU-only method, and the rendering is 80.40 times faster than the CPU.
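
A hedged sketch of the per-point parallelism: one CUDA thread converts one geodetic (lon, lat) pair to planar coordinates. The full UTM forward series is omitted; the identity assignments only mark where it would go, and the array names are assumptions. On the host, the merged file would be opened as a memory-mapped file and streamed to the device in chunks.

    // One thread per coordinate pair; arrays hold degrees in, map units out.
    __global__ void convertPoints(const double *lon, const double *lat,
                                  double *x, double *y, size_t n)
    {
        const double DEG2RAD = 0.017453292519943295;
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) {
            double lam = lon[i] * DEG2RAD;   // longitude in radians
            double phi = lat[i] * DEG2RAD;   // latitude in radians
            // A real implementation evaluates the UTM forward projection here;
            // the assignments below are only placeholders for that computation.
            x[i] = lam;                      // -> UTM easting in practice
            y[i] = phi;                      // -> UTM northing in practice
        }
    }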

Performance Improvement of Web Service Based on GPGPU and Task Queue

  • Kim, Changsu;Kim, Kyunghwan;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.19 no.4 / pp.257-262 / 2021
  • Providing web services to users has become expensive in recent times, and web servers are equipped with high-performance technology to deliver better services. Tools such as general-purpose graphics processing units (GPGPUs), artificial intelligence, high-performance computing, and three-dimensional simulation are widely used to improve the web service experience. However, graphics processing units (GPUs) are designed for high-speed computation and have limited general-purpose applicability. In this study, we developed a task queue on a GPU to improve the performance of a multiprocessor-based web service and studied how to receive and process user requests in bulk. The proposed GPGPU-based task queue processes about 136% to 233% more user requests than a CPU-thread-based approach by running more GPU threads on the task queue, demonstrating that the proposed method is effective for web services.
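
One plausible reading of the task-queue design is sketched below in CUDA C++: incoming requests are collected in a host-side queue and flushed to the GPU in bulk, with one GPU thread handling one request. The Request type and the per-request work are placeholders, not the paper's implementation.

    #include <vector>
    #include <cuda_runtime.h>

    struct Request { int id; float payload; };

    // One thread per queued request.
    __global__ void handleRequests(const Request *req, float *result, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            result[i] = req[i].payload * 2.0f;   // placeholder per-request work
    }

    // Flush the accumulated queue to the GPU in one batch.
    void flushQueue(std::vector<Request> &queue)
    {
        int n = (int)queue.size();
        if (n == 0) return;
        Request *d_req; float *d_res;
        cudaMalloc(&d_req, n * sizeof(Request));
        cudaMalloc(&d_res, n * sizeof(float));
        cudaMemcpy(d_req, queue.data(), n * sizeof(Request), cudaMemcpyHostToDevice);
        handleRequests<<<(n + 255) / 256, 256>>>(d_req, d_res, n);
        std::vector<float> results(n);           // responses returned to callers
        cudaMemcpy(results.data(), d_res, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d_req); cudaFree(d_res);
        queue.clear();
    }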

A Study on High Speed Face Tracking using the GPGPU-based Depth Information (GPGPU 기반의 깊이 정보를 이용한 고속 얼굴 추적에 대한 연구)

  • Kim, Woo-Youl;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1119-1128 / 2013
  • In this paper, we propose an algorithm that detects and tracks the human face at high speed using a GPU. The detection stage uses the existing Adaboost algorithm, but the search area is dramatically reduced by detecting motion and skin-color regions. Unlike the detection stage, the tracking stage uses only depth information and is based on template matching, searching for the block that best matches the template. To track the face quickly, the template matching is computed in parallel on the GPU. Experimental results show a GPU speedup of up to 49 times compared with the CPU.
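
The depth-based template matching can be parallelized by giving each GPU thread one candidate position, as in the hedged CUDA sketch below; the host then picks the position with the smallest score. Image sizes and array names are illustrative, and the caller is assumed to keep the search region plus template inside the image.

    // Sum of absolute depth differences (SAD) for every candidate top-left position.
    __global__ void sadTemplateMatch(const unsigned short *depth, int width,
                                     const unsigned short *tmpl, int tw, int th,
                                     float *score, int searchW, int searchH)
    {
        int cx = blockIdx.x * blockDim.x + threadIdx.x;  // candidate top-left x
        int cy = blockIdx.y * blockDim.y + threadIdx.y;  // candidate top-left y
        if (cx < searchW && cy < searchH) {
            float sad = 0.0f;
            for (int y = 0; y < th; ++y)
                for (int x = 0; x < tw; ++x)
                    sad += fabsf((float)depth[(cy + y) * width + (cx + x)]
                                 - (float)tmpl[y * tw + x]);
            score[cy * searchW + cx] = sad;  // smallest score = best match
        }
    }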

Design of Virtual Machine for Vertex Shader (정점 셰이더의 가상 기계 구현)

  • Ha, Chang-Soo;Kim, Ju-Hong;Choi, Byeong-Yoon
    • Proceedings of the IEEK Conference / 2005.11a / pp.1003-1006 / 2005
  • The vertex shader of a GPU in a personal computer has advanced to the point of covering about half of the traditional fixed T&L functions, and the memory available for storing the resources needed to process its instructions is effectively unlimited. A programmable GPU is needed for mobile systems as well as personal computers. In this paper, we implement a software virtual machine for the vertex shader in C++. Our goal is to design a hardware GPU that can be applied to mobile systems. The virtual machine implements nVidia GPU instructions, and its input is generated by the Microsoft fxc compiler; that is, the input is a compiled shader program written in HLSL, Cg, or ASM. The virtual machine can serve as a reference model for designing a hardware GPU and as a testbed for testing added or modified instructions.
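
The core of such a virtual machine is an instruction-decode loop over 4-component registers. The sketch below (host-side C++, compilable in the same CUDA C++ toolchain as the other examples) is purely illustrative: the opcodes and the Instr layout are hypothetical, not the nVidia encoding the paper targets.

    #include <array>
    #include <vector>

    using Vec4 = std::array<float, 4>;
    enum class Op { MOV, ADD, MUL, DP4 };
    struct Instr { Op op; int dst, src0, src1; };   // hypothetical encoding

    // Interpret a compiled shader program over a register file of Vec4s.
    void run(const std::vector<Instr> &prog, std::vector<Vec4> &reg)
    {
        for (const Instr &in : prog) {
            Vec4 &d = reg[in.dst];
            const Vec4 &a = reg[in.src0], &b = reg[in.src1];
            switch (in.op) {
            case Op::MOV: d = a; break;
            case Op::ADD: for (int i = 0; i < 4; ++i) d[i] = a[i] + b[i]; break;
            case Op::MUL: for (int i = 0; i < 4; ++i) d[i] = a[i] * b[i]; break;
            case Op::DP4: {                         // 4-component dot product
                float s = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
                d[0] = d[1] = d[2] = d[3] = s;
                break; }
            }
        }
    }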


Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System (CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현)

  • Park, Hyun-moon;Kwon, Jinsan;Hwang, Tae-ho;Kim, Dong-Sun
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.57-65 / 2016
  • In this paper, we propose a system for the efficient use of shared memory between the CPU and GPU. The system, called the Fusion Architecture, ensures consistency of the shared memory and minimizes the cache misses that frequently occur on Heterogeneous System Architecture or Unified Virtual Memory based systems. It also maximizes performance for memory-intensive jobs by efficiently allocating GPU cores. To test the architectures under various scenarios, we introduce the Fusion Architecture Analyzer, which compares OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. As a result, the proposed Fusion Architecture runs the benchmarks 55% faster and reduces memory overhead by 220% on average.
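
For comparison, here is a minimal CUDA sketch of the kind of CPU-GPU sharing this work targets, using managed (unified) memory so that one allocation is visible to both processors without explicit copies. It illustrates the baseline the Fusion Architecture aims to improve, not the paper's own design.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n, float k)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= k;
    }

    int main()
    {
        const int n = 1 << 20;
        float *data;
        cudaMallocManaged(&data, n * sizeof(float)); // one allocation, CPU + GPU
        for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes
        scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
        cudaDeviceSynchronize();                     // ensure GPU work finished
        std::printf("data[0] = %f\n", data[0]);      // CPU reads the GPU result
        cudaFree(data);
        return 0;
    }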

Accelerating the Sweep3D for a Graphic Processor Unit

  • Gong, Chunye;Liu, Jie;Chen, Haitao;Xie, Jing;Gong, Zhenghu
    • Journal of Information Processing Systems / v.7 no.1 / pp.63-74 / 2011
  • As a powerful and flexible processor, the graphics processing unit (GPU) can offer great capability for solving many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (Sn) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process used for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be efficiently implemented on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved by up to 4.38 times compared to the CPU-based implementation.
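
The wavefront dependency that limits concurrency can be sketched as follows: within a sweep from one corner, all cells on the same diagonal plane i + j + k = w are independent, so one kernel launch processes one wavefront and successive launches advance w. The flux update below is a placeholder, not the Sn transport equations.

    // Process every cell on wavefront w; upstream neighbors lie on earlier wavefronts.
    __global__ void sweepWavefront(float *phi, int nx, int ny, int nz, int w)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        int k = w - i - j;                              // cells with i + j + k == w
        if (i < nx && j < ny && k >= 0 && k < nz) {
            int idx = (k * ny + j) * nx + i;
            float up_i = (i > 0) ? phi[idx - 1]       : 0.0f;  // upstream in x
            float up_j = (j > 0) ? phi[idx - nx]      : 0.0f;  // upstream in y
            float up_k = (k > 0) ? phi[idx - nx * ny] : 0.0f;  // upstream in z
            phi[idx] = 0.25f * (phi[idx] + up_i + up_j + up_k); // placeholder update
        }
    }

    // Host side (illustrative):
    //   for (int w = 0; w < nx + ny + nz - 2; ++w)
    //       sweepWavefront<<<grid, block>>>(d_phi, nx, ny, nz, w);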

Spectral Modeling Synthesis of Haegeum using GPU (GPU를 이용한 해금의 스펙트럼 모델링)

  • Islam, Md Shohidul;Islam, Md Rashedul;Farid, Fahmid Al;Kim, Jong-Myon
    • Proceedings of the Korean Society of Computer Information Conference / 2014.01a / pp.5-8 / 2014
  • This paper presents a parallel approach to a formant synthesis method for the haegeum on graphics processing units (GPUs) using spectral modeling. Spectral modeling synthesis (SMS) is a technique that models time-varying spectra as a combination of sinusoids and a time-varying filtered noise component. A second-order digital resonator obtained by the impulse-invariant transform (IIT) is applied to generate the deterministic components, and the results are band-pass filtered to adjust their magnitude. The noise component is calculated by first generating the sinusoids with formant synthesis, subtracting them from the original sound, and then removing the remaining harmonics. The sounds synthesized by adding the sinusoids are shown to be similar to the original haegeum sounds. Furthermore, the GPU accelerates the synthesis process, enabling the development of a real-time music synthesis system that supports more sound effects and multiple musical sound compositions.
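
The deterministic (sinusoidal) part of SMS parallelizes naturally on the GPU. In the hedged CUDA sketch below, each thread computes one output sample as a sum of the current partials, assuming frequencies and amplitudes are constant over the frame; the resonator and noise stages are omitted.

    // One thread per output sample: additive synthesis over the current partials.
    __global__ void synthFrame(float *out, int numSamples, float sampleRate,
                               const float *freq, const float *amp, int numPartials)
    {
        int n = blockIdx.x * blockDim.x + threadIdx.x;   // output sample index
        if (n < numSamples) {
            float t = n / sampleRate;
            float s = 0.0f;
            for (int p = 0; p < numPartials; ++p)        // add each partial
                s += amp[p] * sinf(2.0f * 3.14159265f * freq[p] * t);
            out[n] = s;
        }
    }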


Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

  • Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.4 no.2 / pp.369-378 / 2014
  • The Reed-Muller code is one of the efficient algorithms for multiple-bit error correction; however, the high computational requirement inherent in its decoding process prohibits its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU) based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. The GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting the massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with an equivalent sequential approach that runs on a traditional CPU. The experimental results indicate that the proposed GPU-based approach greatly outperforms the sequential approach in terms of execution time, yielding over a 70× speedup.
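
A common way to exploit this parallelism is exhaustive soft-decision correlation: each GPU thread correlates the received soft values with one candidate codeword, and the host selects the largest correlation. The CUDA sketch below uses an illustrative codebook layout and does not reproduce the paper's R(r, m) decoder.

    // One thread per candidate codeword; codebook stores +1/-1 symbols row by row.
    __global__ void correlate(const float *soft, const signed char *codebook,
                              int codeLen, int numCodewords, float *corr)
    {
        int c = blockIdx.x * blockDim.x + threadIdx.x;   // candidate codeword index
        if (c < numCodewords) {
            const signed char *cw = codebook + (size_t)c * codeLen;
            float sum = 0.0f;
            for (int i = 0; i < codeLen; ++i)
                sum += soft[i] * cw[i];
            corr[c] = sum;   // argmax over corr[] gives the decoded codeword
        }
    }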