• Title/Summary/Keyword: GPU Memory


Performance Analysis and Enhancing Techniques of Kd-Tree Traversal Methods on GPU (GPU용 Kd-트리 탐색 방법의 성능 분석 및 향상 기법)

  • Chang, Byung-Joon; Ihm, In-Sung
    • Journal of KIISE: Computing Practices and Letters / v.16 no.2 / pp.177-185 / 2010
  • Ray-object intersection is an important element of ray tracing that takes up a substantial amount of computing time. In general, spatial data structures such as the kd-tree have frequently been used for static scenes to accelerate the intersection computation. Recently, a few variants of kd-tree traversal have been proposed that suit the GPU, whose computing architecture is relatively restricted compared with the CPU's. In this article, we propose two further implementation techniques that improve on those previous ones. First, we present a cached stack method aimed at reducing the costly global memory access time incurred when the stack is allocated in global memory. Second, we present a rope-with-short-stack method that eases the substantial memory requirement of the previous rope method. To show the effectiveness of our techniques, we compare their performance with that of the previous GPU traversal methods. The experimental results provide prospective GPU ray tracer developers with valuable information, helping them choose a proper kd-tree traversal method.
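The cached-stack technique is not detailed in the abstract, but its core idea, keeping each thread's traversal stack in fast on-chip shared memory so that pushes and pops avoid global memory, can be sketched as follows. The KdNode layout, the stack depth, and the bank-friendly striding are illustrative assumptions, not the authors' implementation.

```cuda
#include <cuda_runtime.h>

// Hypothetical kd-tree node layout (illustrative only): interior nodes
// store a split plane and child indices; axis == 3 marks a leaf.
struct KdNode {
    float split;       // split-plane position
    int   axis;        // 0/1/2 = x/y/z split axis, 3 = leaf
    int   left, right; // child indices into the node array
};

#define STACK_DEPTH 16
#define BLOCK_SIZE  128

__global__ void kdTraverse(const KdNode* nodes, const float4* rayOrg,
                           const float4* rayDir, int* lastLeaf, int nRays)
{
    // Cached stack: one STACK_DEPTH slice per thread, kept entirely in
    // shared memory so pushes and pops never touch global DRAM. Entries
    // are strided by BLOCK_SIZE to avoid shared-memory bank conflicts.
    __shared__ int   sNode[STACK_DEPTH * BLOCK_SIZE];
    __shared__ float sTMax[STACK_DEPTH * BLOCK_SIZE];

    int ray = blockIdx.x * blockDim.x + threadIdx.x;
    if (ray >= nRays) return;

    float4 o = rayOrg[ray], d = rayDir[ray];
    float tMin = 0.0f, tMax = 1e30f;
    int top = 0, node = 0;                       // start at the root

    while (true) {
        KdNode n = nodes[node];
        if (n.axis == 3) {                       // leaf: real code would
            lastLeaf[ray] = node;                // intersect its primitives
            if (top == 0) break;                 // stack empty: done
            --top;                               // pop the deferred far child
            node = sNode[top * BLOCK_SIZE + threadIdx.x];
            tMin = tMax;                         // entry distance of far child
            tMax = sTMax[top * BLOCK_SIZE + threadIdx.x];
            continue;
        }
        float oA = (n.axis == 0) ? o.x : (n.axis == 1) ? o.y : o.z;
        float dA = (n.axis == 0) ? d.x : (n.axis == 1) ? d.y : d.z;
        float t  = (n.split - oA) / dA;          // distance to split plane
        int nearC = (oA < n.split) ? n.left  : n.right;
        int farC  = (oA < n.split) ? n.right : n.left;
        if (t >= tMax || t < 0.0f) node = nearC; // far child never entered
        else if (t <= tMin)        node = farC;  // near child lies behind
        else {                                   // both hit: defer far child
            if (top < STACK_DEPTH) {
                sNode[top * BLOCK_SIZE + threadIdx.x] = farC;
                sTMax[top * BLOCK_SIZE + threadIdx.x] = tMax;
                ++top;
            }
            node = nearC;
            tMax = t;
        }
    }
}
```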

GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.1-8 / 2022
  • In this paper, we design and implement a GPU-based parallel algorithm that effectively solves the traveling salesman problem (TSP) with an ant colony system. The iterative process of generating hundreds or thousands of tours simultaneously exploits the GPU's task-level parallelism, while the update of the pheromone-trail data exploits data parallelism with 32x32 thread blocks. In particular, the simultaneous memory accesses of multiple threads are organized as coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory. The experiments used 127- to 1002-city instances from TSPLIB and compared the sequential and parallel algorithms on an Intel Core i9-9900K CPU and an Nvidia Titan RTX system. GPU parallelization yielded speedups of about 10.13 to 11.37 times.
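As a rough illustration of the data-parallel pheromone update described above, the following sketch evaporates and deposits pheromone over an n x n trail matrix with 32x32 thread blocks, so the 32 threads of a warp touch consecutive addresses of a row (coalesced accesses). The matrix layout, names, and evaporation rate are assumptions for illustration, not the paper's code.

```cuda
#include <cuda_runtime.h>

// Sketch of a data-parallel pheromone update (illustrative, not the
// paper's code). tau is a row-major n x n trail matrix; delta holds the
// pheromone deposited by the current iteration's tours. With a 32x32
// block, consecutive threadIdx.x values touch consecutive addresses of
// one row, so each warp's global accesses are coalesced.
__global__ void updatePheromone(float* tau, const float* delta,
                                float rho, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        int idx = row * n + col;
        // Evaporate, then deposit: tau = (1 - rho) * tau + delta.
        tau[idx] = (1.0f - rho) * tau[idx] + delta[idx];
    }
}

// Host-side launch for an n-city instance (n up to 1002 in TSPLIB here):
// dim3 block(32, 32);
// dim3 grid((n + 31) / 32, (n + 31) / 32);
// updatePheromone<<<grid, block>>>(dTau, dDelta, 0.1f, n);
```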

Domain decomposition for GPU-Based continuous energy Monte Carlo power reactor calculation

  • Choi, Namjae; Joo, Han Gyu
    • Nuclear Engineering and Technology / v.52 no.11 / pp.2667-2677 / 2020
  • A domain decomposition (DD) scheme for GPU-based Monte Carlo (MC) calculation, which is essential for whole-core depletion, is introduced within the framework of the modified history-based tracking algorithm. Since GPU-offloaded MC calculations suffer from limited memory capacity, employing DDMC is inevitable for the simulation of depleted cores, which require large storage to save hundreds of newly generated isotopes. First, an automated domain decomposition algorithm named wheel clustering is devised such that each subdomain contains nearly the same number of fuel assemblies. Second, an inner-outer iteration algorithm allowing overlapped computation and communication is introduced, which enables boundary neutron transactions during the tracking of interior neutrons. Third, a bank update scheme is presented that incorporates the boundary sources in a way suited to the particular data structures of the GPU-based neutron tracking algorithm. The verification and demonstration of the DDMC method are done for 3D full-core problems: an APR1400 fresh core and a mock-up depleted core. It is confirmed that the DDMC method performs comparably with the standard MC method, and that the domain decomposition scheme is essential for carrying out full 3D MC depletion calculations with limited GPU memory capacities.
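In CUDA terms, the inner-outer iteration with overlapped computation and communication amounts to tracking interior neutrons on one stream while boundary source banks are transferred on another. The sketch below shows only this stream structure; the particle record, kernel names, and two-stream layout are assumptions, and the actual physics and bank update are omitted.

```cuda
#include <cuda_runtime.h>

struct Neutron { float x, y, z, u, v, w, energy, weight; };  // assumed record

__global__ void trackInterior(Neutron* bank, int n) { /* physics omitted */ }
__global__ void trackBoundary(Neutron* bank, int n) { /* physics omitted */ }

// One outer iteration of an overlapped inner-outer scheme (sketch):
// interior neutrons are tracked on computeStream while boundary neutrons
// received from neighbor subdomains are copied in on copyStream.
void outerIteration(Neutron* dInterior, int nInterior,
                    Neutron* dBoundary, const Neutron* hRecvBuf, /* pinned */
                    int nBoundary,
                    cudaStream_t computeStream, cudaStream_t copyStream)
{
    // Launch the long-running interior tracking first ...
    trackInterior<<<(nInterior + 255) / 256, 256, 0, computeStream>>>(
        dInterior, nInterior);

    // ... and transfer the boundary source bank meanwhile, so the copy
    // is hidden behind interior computation.
    cudaMemcpyAsync(dBoundary, hRecvBuf, nBoundary * sizeof(Neutron),
                    cudaMemcpyHostToDevice, copyStream);
    cudaStreamSynchronize(copyStream);

    // Boundary neutrons are tracked once their data is resident; a bank
    // update step would then merge newly produced boundary neutrons back
    // into the send buffers (omitted here).
    trackBoundary<<<(nBoundary + 255) / 256, 256, 0, computeStream>>>(
        dBoundary, nBoundary);
    cudaStreamSynchronize(computeStream);
}
```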

Accurate and efficient GPU ray-casting algorithm for volume rendering of unstructured grid data

  • Gu, Gibeom; Kim, Duksu
    • ETRI Journal / v.42 no.4 / pp.608-618 / 2020
  • We present a novel GPU-based ray-casting algorithm for volume rendering of unstructured grid data. Our volume rendering system uses a ray-casting method that guarantees accurate rendering results, and we employ the per-pixel intersection list concept of the Bunyk algorithm to guarantee accurate results for non-convex meshes. For efficient access to the lists on the GPU, we represent the intersection lists for all faces as an array built with our novel construction algorithm. With the intersection lists, we perform ray-casting on the GPU, with one GPU thread handling each ray. To increase ray-coherency within a thread block and improve memory access efficiency, we extend a prior image-tile-based work distribution method to fit modern GPU architectures. We also show that a prior approach that uses a per-thread local buffer to reduce redundant computation is not appropriate for modern GPU architectures; instead, we adopt an on-demand calculation strategy that achieves better performance even though it allows duplicate computations. We applied our method to three unstructured grid datasets with different characteristics. On a GPU, our method achieved up to 36.5 times higher performance for the ray-casting process and 19.7 times higher performance for the whole volume rendering process compared with the Bunyk algorithm on a CPU core. Our approach also showed up to 8.2 times higher performance than a GPU-based cell projection method while generating more accurate rendering results. These results demonstrate the efficiency and accuracy of our method.
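The image-tile-based work distribution mentioned in the abstract maps each thread block to a small rectangular tile of pixels, so the rays of one warp are spatially adjacent and tend to traverse the same cells and intersection lists. A minimal sketch of such a mapping follows; the tile size and all names are illustrative assumptions, not the paper's code.

```cuda
#include <cuda_runtime.h>

#define TILE_W 8
#define TILE_H 8

// Sketch of image-tile-based work distribution (illustrative names and
// tile size; not the paper's code). Each 8x8 thread block covers an 8x8
// pixel tile, so the rays of one warp are spatially adjacent and tend
// to walk the same cells and per-pixel intersection lists.
__global__ void rayCastTiles(float* image, int width, int height
                             /*, unstructured grid data ... */)
{
    int px = blockIdx.x * TILE_W + threadIdx.x;   // pixel coordinates of
    int py = blockIdx.y * TILE_H + threadIdx.y;   // this thread's ray
    if (px >= width || py >= height) return;

    // One thread handles one ray: generate it from (px, py), march it
    // through the per-pixel intersection list, and composite. With the
    // on-demand strategy, per-cell values are recomputed where needed
    // instead of being staged in a per-thread local buffer.
    float color = 0.0f;
    // ... ray generation, traversal, and compositing omitted ...
    image[py * width + px] = color;
}

// Launch: dim3 block(TILE_W, TILE_H);
//         dim3 grid((width + TILE_W - 1) / TILE_W,
//                   (height + TILE_H - 1) / TILE_H);
```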

An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU (Coloring이 적용된 Gauss-Seidel 해법을 통한 CPU와 GPU의 연산 효율에 관한 연구)

  • Yoon, Jong Seon; Jeon, Byoung Jin; Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.41 no.2 / pp.117-124 / 2017
  • The performance of the colored Gauss-Seidel solver on the CPU and GPU was investigated for two- and three-dimensional heat conduction problems using various mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU yielded good performance for small problems but deteriorated for large problems, in which the total memory required for computation exceeded the cache memory. In contrast, the GPU performed better as the mesh size increased because of its latency-hiding capability. Further, GPU computation with the colored Gauss-Seidel solver was approximately 7 times faster than computation on a single CPU. Furthermore, on the GPU, the colored Gauss-Seidel solver was approximately twice as fast as the Jacobi solver.
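Coloring makes Gauss-Seidel parallelizable because the unknowns of one color depend only on unknowns of the other color, so all points of a color can be updated simultaneously. Below is a minimal red-black sketch for the five-point finite-difference heat conduction stencil; the grid layout and names are assumptions, not the paper's code.

```cuda
#include <cuda_runtime.h>

// Red-black colored Gauss-Seidel sweep for the 2D heat conduction
// (Laplace) stencil on an n x n grid (illustrative sketch, not the
// paper's code). Points with (i + j) % 2 == color depend only on the
// opposite color, so all points of one color update in parallel.
__global__ void gsColorSweep(float* T, int n, int color)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // interior indices
    int j = blockIdx.y * blockDim.y + threadIdx.y + 1;  // 1 .. n-2
    if (i >= n - 1 || j >= n - 1) return;
    if (((i + j) & 1) != color) return;                 // skip other color

    // 5-point stencil update; all four neighbors are of the opposite
    // color, so none of them is being written during this sweep.
    T[j * n + i] = 0.25f * (T[j * n + i - 1] + T[j * n + i + 1] +
                            T[(j - 1) * n + i] + T[(j + 1) * n + i]);
}

// One Gauss-Seidel iteration = red sweep, then black sweep:
// gsColorSweep<<<grid, block>>>(dT, n, 0);
// gsColorSweep<<<grid, block>>>(dT, n, 1);
```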

Parallel Computation for Extended Edit Distances Using the Shared Memory on GPU (GPU의 공유메모리를 활용한 확장편집거리 병렬계산)

  • Kim, Youngho; Na, Joong Chae; Sim, Jeong Seop
    • KIPS Transactions on Computer and Communication Systems / v.4 no.7 / pp.213-218 / 2015
  • Given two strings X and Y (|X| = m, |Y| = n) over an alphabet Σ, the extended edit distance between X and Y can be computed using dynamic programming in O(mn) time and space. Recently, a parallel algorithm was presented that computes the extended edit distance between X and Y in O(m+n) time and O(mn) space using m threads. In this paper, we present an improved parallel algorithm that uses the shared memory on the GPU. The experimental results show that our parallel algorithm runs about 19~25 times faster than the previous parallel algorithm.
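The anti-diagonal (wavefront) parallelization behind such algorithms exploits the fact that every cell on diagonal d = i + j of the dynamic-programming table depends only on diagonals d-1 and d-2, which can be kept in shared memory. The sketch below computes the standard edit distance this way for simplicity (the extended edit distance adds further operations but parallelizes identically); the single-block restriction and all names are assumptions, not the paper's code.

```cuda
#include <cuda_runtime.h>

// Wavefront edit-distance kernel sketch using shared memory. Cells on
// diagonal d = i + j depend only on diagonals d-1 and d-2, so one
// thread per row fills a whole diagonal in parallel. Launch with one
// block of m + 1 threads (assumes m <= 1023 here).
__global__ void editDistanceWavefront(const char* X, int m,
                                      const char* Y, int n, int* result)
{
    // Three rolling diagonals kept on-chip, indexed by row i.
    __shared__ int diag[3][1024];
    int i = threadIdx.x;                 // row index, 0..m

    for (int d = 0; d <= m + n; ++d) {
        int* d0 = diag[d % 3];           // diagonal being written
        int* d1 = diag[(d + 2) % 3];     // diagonal d-1
        int* d2 = diag[(d + 1) % 3];     // diagonal d-2
        int j = d - i;                   // column this thread owns on d
        if (i <= m && j >= 0 && j <= n) {
            if (i == 0)      d0[i] = j;  // first row: j insertions
            else if (j == 0) d0[i] = i;  // first column: i deletions
            else {
                int sub = d2[i - 1] + (X[i - 1] != Y[j - 1] ? 1 : 0);
                int del = d1[i - 1] + 1; // from D[i-1][j] on diagonal d-1
                int ins = d1[i] + 1;     // from D[i][j-1] on diagonal d-1
                d0[i] = min(sub, min(del, ins));
            }
        }
        __syncthreads();                 // diagonal d complete before d+1
    }
    if (i == m) *result = diag[(m + n) % 3][m];   // D[m][n]
}

// Launch: editDistanceWavefront<<<1, m + 1>>>(dX, m, dY, n, dResult);
```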

Analyzing delay of Kernel function owing to GPU memory input from multiple VMs in RPC-based GPU virtualization environments (RPC 기반 GPU 가상화 환경에서 다중 가상머신의 GPU 메모리 입력으로 인한 커널 함수의 지연 문제 분석)

  • Kang, Jihun; Kim, Soo Kyun
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.541-542 / 2021
  • In cloud computing environments, to support high-performance computing, users are provided with virtual machines to which a GPU (Graphics Processing Unit) is allocated, enabling them to run high-performance applications. In an ordinary computing environment, a single user uses the GPU exclusively, so problems caused by resource contention are relatively rare; in a cloud environment where multiple independent users share computing resources, however, resource contention causes users to affect one another's performance. This paper analyzes the kernel-function execution delays caused by contention on GPU memory input when multiple virtual machines perform GPGPU (General-Purpose computing on Graphics Processing Units) workloads in an RPC (Remote Procedure Call)-based GPU virtualization environment in which several virtual machines share a single GPU.
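The contention pattern analyzed here can be reproduced at the CUDA level by emulating each virtual machine with a stream that uploads its input and then launches a kernel: because host-to-device copies share the GPU's few copy engines, simultaneous inputs queue up and the dependent kernels start late even while the compute units sit idle. The following microbenchmark-style sketch uses assumed sizes and names and is not the paper's experimental setup.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void busyKernel(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * 2.0f + 1.0f;
}

// Emulate several "VMs" with streams that each upload input data and
// then run a kernel on it. The host-to-device copies contend for the
// copy engines, delaying the kernels queued behind them.
int main() {
    const int nClients = 4, n = 1 << 24;          // ~64 MB per client
    cudaStream_t s[nClients];
    float *h[nClients], *d[nClients];

    for (int c = 0; c < nClients; ++c) {
        cudaStreamCreate(&s[c]);
        cudaMallocHost(&h[c], n * sizeof(float)); // pinned: enables async copy
        cudaMalloc(&d[c], n * sizeof(float));
    }
    for (int c = 0; c < nClients; ++c) {          // all clients submit at once
        cudaMemcpyAsync(d[c], h[c], n * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        busyKernel<<<(n + 255) / 256, 256, 0, s[c]>>>(d[c], n);
    }
    cudaDeviceSynchronize();  // timing each stream (e.g., with cudaEvent_t)
                              // would expose the input-induced kernel delay
    for (int c = 0; c < nClients; ++c) {
        cudaStreamDestroy(s[c]); cudaFreeHost(h[c]); cudaFree(d[c]);
    }
    return 0;
}
```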


Analyzing the performance of training tasks based on GPU memory use manner of TensorFlow in Container environments (컨테이너 환경에서 텐서플로의 GPU 메모리 사용방식에 따른 학습 작업의 성능 분석)

  • Jihun Kang; Joon-Min Gil
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.60-62 / 2023
  • Artificial-intelligence training tasks are computationally intensive and require a GPU (Graphics Processing Unit), a high-performance computing device, and GPU performance is one of the factors that directly affects the execution performance of training tasks. TensorFlow, widely used for AI workloads, by default manages GPU memory so that a single training task occupies almost the entire GPU memory when computations run on a GPU. This approach prevents fragmentation of GPU memory, the least scalable of the computing resources, but once one training task occupies the GPU, no other process can use it regardless of the actual GPU memory usage. In particular, for relatively small tasks such as transfer learning or small-scale training, most of the total GPU memory capacity is wasted. This paper confirms that TensorFlow's default GPU memory usage makes it impossible to run multiple training tasks concurrently in a container environment, and verifies whether preventing GPU memory fragmentation is a meaningful performance factor by comparing the actual GPU memory usage and the training execution time with and without a limit on GPU memory usage.
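The default behavior described above, reserving nearly all free device memory up front and sub-allocating from that pool, can be illustrated at the CUDA level with a simple pool allocator. This is a generic sketch under assumed names and a 95% fraction, not TensorFlow's actual (BFC) allocator.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Generic sketch of the "occupy nearly all GPU memory up front" policy:
// query free memory, reserve most of it in one allocation, and serve
// requests from that pool with a bump pointer. This avoids device-level
// fragmentation, but leaves other processes almost no memory no matter
// how little of the pool is actually used, which is the concurrency
// problem analyzed in the paper.
struct GpuPool {
    char*  base = nullptr;
    size_t size = 0, used = 0;
};

bool poolInit(GpuPool& p, double fraction) {
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);          // memory still available
    p.size = (size_t)(freeB * fraction);      // e.g., ~95% of free memory
    return cudaMalloc((void**)&p.base, p.size) == cudaSuccess;
}

void* poolAlloc(GpuPool& p, size_t bytes) {   // bump-pointer sub-allocation
    size_t aligned = (bytes + 255) & ~size_t(255);
    if (p.used + aligned > p.size) return nullptr;
    void* ptr = p.base + p.used;
    p.used += aligned;
    return ptr;
}

int main() {
    GpuPool pool;
    if (!poolInit(pool, 0.95)) { printf("pool allocation failed\n"); return 1; }
    void* t = poolAlloc(pool, 1 << 20);       // tensors come from the pool
    printf("pool: %zu bytes reserved, %zu used (t=%p)\n",
           pool.size, pool.used, t);
    cudaFree(pool.base);
    return 0;
}
```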

GPU Resource Contention Management Technique for Simultaneous GPU Tasks in the Container Environments with Share the GPU (GPU를 공유하는 컨테이너 환경에서 GPU 작업의 동시 실행을 위한 GPU 자원 경쟁 관리기법)

  • Kang, Jihun
    • KIPS Transactions on Computer and Communication Systems / v.11 no.10 / pp.333-344 / 2022
  • In a container-based cloud environment, multiple containers can share a graphics processing unit (GPU), and GPU sharing can minimize idle time of GPU resources and improve resource utilization. However, unlike CPUs or memory, GPUs cannot logically multiplex their computing resources so as to give each user an isolated share. In addition, containers occupy GPU resources only while performing GPU operations, and resource usage cannot be predicted because the timing and size of each container's GPU operations are not known in advance. Since containers can use GPU resources without restriction at any point in time, and GPU tasks are handled as a black box inside the GPU, resource contention is very difficult to manage when multiple containers run GPU tasks simultaneously. In this paper, we propose a container management technique that prevents the performance degradation caused by resource contention when multiple containers execute GPU tasks simultaneously. We analyze this degradation problem and demonstrate the efficiency of the proposed management technique through experiments.

Proposal of 3D Graphic Processor Using Multi-Access Memory System (Multi-Access Memory System을 이용한 3D 그래픽 프로세서 제안)

  • Lee, S-Ra-El; Kim, Jae-Hee; Ko, Kyung-Sik; Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.4 / pp.119-128 / 2019
  • Owing to the nature of 3D graphics processing, a large number of mathematical calculations are required, and parallel processing research using the GPU (Graphics Processing Unit) has been conducted for high-speed processing. In this paper, we propose a 3D graphics processor using MAMS, a parallel processor that does not use cache memory, to address two GPU problems: the bandwidth increase caused by cache misses and the non-constant 3D shader processing speed. The proposed processor implements vertex shader, pixel shader, tiling, and rasterizing structures based on DirectX command analysis; the MAMS design was written in Verilog and realized on an FPGA (Xilinx Virtex6 @ 100 MHz) board. Comparing the processing time of the developed FPGA (100 MHz) with that of an Nvidia GeForce GTX 660 (980 MHz), the processing time on the GTX 660 was not constant, whereas that with MAMS was constant.