• Title/Summary/Keyword: GPU implementation


Implementation of LTE uplink System for SDR Platform using CUDA and UHD (CUDA와 UHD를 이용한 SDR 플랫폼 용 LTE 상향링크 시스템 구현)

  • Ahn, Chi Young;Kim, Yong;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management, v.9 no.2, pp.81-87, 2013
  • In this paper, we present an implementation of a Long Term Evolution (LTE) Uplink (UL) system on a Software Defined Radio (SDR) platform built from a conventional Personal Computer (PC), which adopts a Graphics Processing Unit (GPU) for the SDR software modem and a Universal Software Radio Peripheral 2 (USRP2) with the USRP Hardware Driver (UHD) for the Radio Frequency (RF) transceiver. We adopted UHD because it provides flexibility in the design of the transceiver chain, and a Cognitive Radio (CR) engine has also been implemented using UHD libraries. Meanwhile, the software modem of our system is implemented on the GPU, which is well suited to parallel computing thanks to its many Arithmetic and Logic Units (ALUs). From our experimental tests, we measured the total processing time for a single frame of LTE UL data to be about 5.00 ms for transmission and 6.78 ms for reception, which means that the implemented system is capable of real-time processing of all the baseband signal-processing algorithms required for the LTE UL system.
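
The abstract does not spell out individual kernels, so the following is only a rough sketch of the kind of per-subcarrier parallelism an LTE UL receiver can exploit on a GPU: a hypothetical zero-forcing equalizer with one CUDA thread per subcarrier. The kernel name, data layout, and parameter values are illustrative assumptions, not the authors' code.

```cuda
// Hypothetical sketch: per-subcarrier zero-forcing equalization, one thread per
// resource element. Illustrates the data parallelism a GPU baseband modem exploits;
// not the authors' actual kernel.
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cstdio>

__global__ void zf_equalize(const cuFloatComplex* rx,   // received frequency-domain symbols
                            const cuFloatComplex* h,    // channel estimate per subcarrier
                            cuFloatComplex* eq,         // equalized output
                            int num_subcarriers)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= num_subcarriers) return;
    // eq[k] = rx[k] / h[k]  (zero forcing)
    eq[k] = cuCdivf(rx[k], h[k]);
}

int main()
{
    const int n = 1200;                                  // subcarriers of a 20 MHz LTE UL symbol
    cuFloatComplex *rx, *h, *eq;
    cudaMallocManaged(&rx, n * sizeof(cuFloatComplex));
    cudaMallocManaged(&h,  n * sizeof(cuFloatComplex));
    cudaMallocManaged(&eq, n * sizeof(cuFloatComplex));
    for (int k = 0; k < n; ++k) {
        rx[k] = make_cuFloatComplex(1.f, 0.f);           // toy received symbols
        h[k]  = make_cuFloatComplex(0.5f, 0.f);          // toy channel estimates
    }

    zf_equalize<<<(n + 255) / 256, 256>>>(rx, h, eq, n);
    cudaDeviceSynchronize();
    printf("equalized symbol 0: %f + %fi\n", cuCrealf(eq[0]), cuCimagf(eq[0]));

    cudaFree(rx); cudaFree(h); cudaFree(eq);
    return 0;
}
```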

Implementation of Retransmission in TDD LTE MU-MIMO system using GPU (GPU를 이용한 TDD LTE MU-MIMO 시스템에서의 재전송 구현)

  • Park, Jonggeun;Choi, Seungwon
    • Journal of Korea Society of Digital Industry and Information Management, v.13 no.2, pp.35-42, 2017
  • A TDD LTE MU-MIMO HARQ system is designed and implemented on a GPU based on the 3GPP Rel.10 standard. The Digital Unit (DU) part of the base station and the terminal run on general-purpose computers equipped with NVIDIA GeForce GTX TITAN graphics cards, and the Radio Unit (RU) part is built with an Ettus USRP N210. Because the implementation follows the SDR approach, various communication standards can be supported in software. Retransmission is implemented with Chase Combining, one of the HARQ methods, which combines the previously received data with the retransmitted data. To confirm that retransmission succeeds, the performance evaluation uses LLR constellations: when the data contain an error, the LLR values are not distributed at the correct positions; in this case a retransmission is performed and the previously stored erroneous data are chase-combined with the retransmitted data. As a result, the LLR values were distributed at the correct position for each bit, confirming that error-free data are obtained by using Chase Combining after retransmission.
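
Chase Combining works on soft values: the log-likelihood ratios (LLRs) stored from the failed transmission are added to the LLRs of the retransmission before decoding, since both carry the same coded bits. A minimal CUDA sketch of that combining step follows; the kernel and buffer names are illustrative and not taken from the paper.

```cuda
// Minimal sketch of Chase Combining on the GPU: LLRs from the erroneous
// transmission are added element-wise to the LLRs of the retransmission
// before the result is passed to the channel decoder.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void chase_combine(const float* stored_llr,   // LLRs kept from the failed transmission
                              const float* retx_llr,     // LLRs of the retransmitted data
                              float* combined_llr,       // input to the channel decoder
                              int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        combined_llr[i] = stored_llr[i] + retx_llr[i];   // identical coded bits, so LLRs add
}

int main()
{
    const int n = 1 << 16;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = -0.3f; b[i] = 0.9f; }  // toy soft values

    chase_combine<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();
    printf("combined LLR[0] = %f\n", c[0]);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Adding LLRs of identical retransmissions acts like maximal-ratio combining, which is why Chase Combining improves the effective SNR rather than lowering the code rate.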

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems, v.7 no.9, pp.219-226, 2018
  • One systematic way to generate all possible cases of a combination is to construct a combination tree, whose time complexity is O($2^n$). A combination tree is used for various purposes, such as the graph homogeneity problem and building the initial model for calculating frequent item sets. However, algorithms that must search all cases of a combination are difficult to use in practice because of this high time complexity. Nevertheless, as data volumes grow and more studies attempt to utilize such data, the need to search all cases keeps increasing. Recently, as GPU environments have become common and easy to access, various attempts have been made to reduce execution time by parallelizing algorithms whose serial versions have high time complexity. Because generating all cases of a combination is inherently sequential and the sizes of its sub-tasks are biased, it is not directly suitable for parallel implementation; the efficiency of a parallel algorithm is maximized when all threads receive tasks of similar size. In this paper, we propose a method in which the CPU and GPU collaborate efficiently to parallelize the problem of generating all cases. To evaluate the performance of the proposed algorithm, we analyze its time complexity theoretically and compare its execution time with that of other algorithms in CPU and GPU environments. Experimental results show that the proposed CPU-GPU collaboration algorithm keeps the execution times of the CPU and GPU balanced compared with previous algorithms, and that the execution time improves remarkably as the number of elements increases.
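
The paper's own partitioning scheme is not reproduced here. As one common way to hand every GPU thread an equally sized task, the sketch below unranks combination indices through the combinatorial number system: the CPU precomputes a binomial-coefficient table and each thread reconstructs exactly one k-subset. The constants, names, and the unranking approach itself are assumptions for illustration.

```cuda
// Illustrative sketch (not the paper's exact scheme): the CPU precomputes C(n, k)
// values, and each GPU thread "unranks" one lexicographic index into its k-subset,
// so every thread performs the same amount of work.
#include <cuda_runtime.h>
#include <cstdio>

#define N 24   // number of elements
#define K 12   // combination size

__constant__ unsigned long long d_binom[N + 1][K + 1];

__global__ void enumerate_combinations(unsigned long long total, int* out_first_elem)
{
    unsigned long long rank = blockIdx.x * (unsigned long long)blockDim.x + threadIdx.x;
    if (rank >= total) return;

    int comb[K];
    unsigned long long r = rank;
    int x = 0;
    // Unrank: choose the i-th element so the remaining ranks fit in C(N-x-1, K-i-1).
    for (int i = 0; i < K; ++i) {
        while (d_binom[N - x - 1][K - i - 1] <= r) {
            r -= d_binom[N - x - 1][K - i - 1];
            ++x;
        }
        comb[i] = x++;
    }
    if (rank == total - 1) *out_first_elem = comb[0];    // tiny sanity output
}

int main()
{
    // CPU side: binomial-coefficient table C(n, k) via Pascal's rule.
    unsigned long long binom[N + 1][K + 1] = {};
    for (int n = 0; n <= N; ++n)
        for (int k = 0; k <= K && k <= n; ++k)
            binom[n][k] = (k == 0 || k == n) ? 1ULL : binom[n - 1][k - 1] + binom[n - 1][k];
    cudaMemcpyToSymbol(d_binom, binom, sizeof(binom));

    unsigned long long total = binom[N][K];              // C(24,12) = 2,704,156 combinations
    int* last_first; cudaMallocManaged(&last_first, sizeof(int));

    enumerate_combinations<<<(unsigned)((total + 255) / 256), 256>>>(total, last_first);
    cudaDeviceSynchronize();
    printf("C(%d,%d) = %llu, first element of last combination = %d\n", N, K, total, *last_first);

    cudaFree(last_first);
    return 0;
}
```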

Digital Image based Real-time Sea Fog Removal Technique using GPU (GPU를 이용한 영상기반 고속 해무제거 기술)

  • Choi, Woon-sik;Lee, Yoon-hyuk;Seo, Young-ho;Choi, Hyun-jun
    • Journal of the Korea Institute of Information and Communication Engineering, v.20 no.12, pp.2355-2362, 2016
  • Sea fog removal is an important issue in both computer vision and image processing. Sea fog or haze removal is widely used in many fields, such as automatic control systems, CCTV, and image recognition. Color image dehazing techniques have been studied extensively, and the dark channel prior (DCP) technique in particular has been widely used. This paper proposes a fast and efficient GPU-based implementation of the dark channel prior to remove sea fog from a single digital image. We implement a basic parallel program and then optimize it to obtain a speedup of more than 250 times. While parallelizing and optimizing the algorithm, we improve several parts of the original serial program and of the basic parallel program according to the characteristics of the individual steps. The proposed GPU algorithm and implementation results can be used advantageously as pre-processing in many systems, such as safe ship navigation, topographical survey, and intelligent vehicles.
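
The core of the dark channel prior is a per-pixel minimum over the color channels within a local patch, which maps naturally onto one GPU thread per pixel. The sketch below shows that step only, without the optimizations the paper reports; patch size, memory layout, and kernel name are assumptions.

```cuda
// Minimal sketch of the dark-channel step: each thread computes the minimum of
// R, G, B over a local patch around its pixel. The paper's optimized version
// (and the later transmission-estimation stages) are not reproduced here.
#include <cuda_runtime.h>

__global__ void dark_channel(const unsigned char* rgb,  // interleaved RGB image
                             unsigned char* dark,       // per-pixel dark channel
                             int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned char m = 255;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int px = min(max(x + dx, 0), width - 1);     // clamp at the image border
            int py = min(max(y + dy, 0), height - 1);
            const unsigned char* p = rgb + 3 * (py * width + px);
            unsigned char c = min(p[0], min(p[1], p[2]));
            if (c < m) m = c;
        }
    }
    dark[y * width + x] = m;
}

int main()
{
    const int w = 640, h = 480;
    unsigned char *rgb, *dark;
    cudaMallocManaged(&rgb,  3 * w * h);
    cudaMallocManaged(&dark, w * h);
    cudaMemset(rgb, 128, 3 * w * h);                     // stand-in for a loaded hazy image
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    dark_channel<<<grid, block>>>(rgb, dark, w, h, 7);   // 15x15 patch
    cudaDeviceSynchronize();
    cudaFree(rgb); cudaFree(dark);
    return 0;
}
```

In a full DCP pipeline this patch minimum is followed by atmospheric-light and transmission estimation plus refinement; only the first, most parallel step is shown.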

Design of Line Scratch Detection and Restoration Algorithm using GPU (GPU를 이용한 선형 스크래치 탐지와 복원 알고리즘의 설계)

  • Lee, Joon-Goo;Shim, She-Yong;You, Byoung-Moon;Hwang, Doo-Sung
    • Journal of the Korea Society of Computer and Information, v.19 no.4, pp.9-16, 2014
  • This paper proposes a linear scratch detection and restoration algorithm that compares pixel data within a single frame or across consecutive frames. Scratch detection and restoration exhibit a high degree of parallelism because they require a large number of comparison operations, so the proposed algorithm is designed for a GPU to achieve fast computation. We test the proposed algorithm in both sequential and parallel form with a set of digital videos from the National Archives of Korea. In the experiments, scratch detection over consecutive frames is about 20% faster than detection within a single frame. The detection and restoration rates of the GPU-based algorithm are similar to those of the CPU-based algorithm, while the parallel implementation is up to about 50 times faster.
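
The paper's exact detection rule is not given in the abstract. As a hedged illustration of the single-frame idea, the kernel below flags pixels that differ strongly from their horizontal neighbours but little from the pixel above, which is the typical signature of a thin, near-vertical line scratch; the offsets, thresholds, and criterion are assumptions.

```cuda
// Rough sketch of single-frame line-scratch detection: a pixel is flagged when
// it stands out against both horizontal neighbours yet stays continuous along
// the vertical direction. Illustrative only; not the paper's rule.
#include <cuda_runtime.h>

__global__ void detect_scratch(const unsigned char* gray, unsigned char* mask,
                               int width, int height, int offset, int threshold)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < offset || x >= width - offset || y < 1 || y >= height - 1) return;

    int c     = gray[y * width + x];
    int left  = gray[y * width + (x - offset)];
    int right = gray[y * width + (x + offset)];
    int up    = gray[(y - 1) * width + x];

    int horiz_diff = min(abs(c - left), abs(c - right)); // bright or dark against both sides
    int vert_diff  = abs(c - up);                        // but continuous along the line

    mask[y * width + x] = (horiz_diff > threshold && vert_diff < threshold) ? 255 : 0;
}

int main()
{
    const int w = 720, h = 480;
    unsigned char *gray, *mask;
    cudaMallocManaged(&gray, w * h);
    cudaMallocManaged(&mask, w * h);
    cudaMemset(gray, 90, w * h);                         // stand-in for a decoded video frame
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    detect_scratch<<<grid, block>>>(gray, mask, w, h, 2, 20);
    cudaDeviceSynchronize();
    cudaFree(gray); cudaFree(mask);
    return 0;
}
```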

Implementation and Performance Evaluation of Vector based Rasterization Algorithm using a Many-Core Processor (매니코어 프로세서를 이용한 벡터 기반 래스터화 알고리즘 구현 및 성능평가)

  • Shon, Dong-Koo;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications, v.8 no.2, pp.87-93, 2013
  • In this paper, we implemented a vector-based rasterization algorithm for 3D graphics on a SIMD-based many-core processor consisting of 4,096 processing elements and evaluated its performance. In addition, we compared the performance and efficiency of the rasterization algorithm on the many-core processor with a commercial GPU (Graphics Processing Unit) system consisting of 7 GPUs, each of which has 512 cores. Experimental results showed that the SIMD-based many-core processor outperforms the commercial GPU system in execution time (3.13x speedup), energy efficiency (17.5x better), and area efficiency (13.3x better). These results demonstrate that the SIMD-based many-core processor has potential as an embedded mobile processor.
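
The vector-based algorithm evaluated on the 4,096-PE array is not described in the abstract, so the sketch below only shows the generic data-parallel formulation it is compared against on the GPU side: an edge-function triangle fill with one thread per pixel. The struct and kernel names are illustrative.

```cuda
// Generic data-parallel rasterization sketch: each thread tests whether its
// pixel centre lies inside a triangle using three edge functions.
#include <cuda_runtime.h>

struct Vec2 { float x, y; };

__device__ float edge(Vec2 a, Vec2 b, Vec2 p)
{
    // Signed area of the parallelogram spanned by (a->b, a->p).
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

__global__ void rasterize_triangle(unsigned char* fb, int width, int height,
                                   Vec2 v0, Vec2 v1, Vec2 v2)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    Vec2 p = { x + 0.5f, y + 0.5f };                     // pixel centre
    bool inside = edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0;
    if (inside) fb[y * width + x] = 255;                 // flat white fill
}

int main()
{
    const int w = 256, h = 256;
    unsigned char* fb; cudaMallocManaged(&fb, w * h);
    cudaMemset(fb, 0, w * h);
    Vec2 v0{10, 10}, v1{200, 40}, v2{60, 220};           // consistently wound test triangle
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    rasterize_triangle<<<grid, block>>>(fb, w, h, v0, v1, v2);
    cudaDeviceSynchronize();
    cudaFree(fb);
    return 0;
}
```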

Implementation of Stereo Matching Algorithm Using GPU (GPU를 이용한 스테레오 정합 알고리즘 구현)

  • Choi, Hyun-Jun;Choi, Ji-Youn;Seo, Young-Ho;Kim, Dong-Wook
    • Proceedings of the Korean Society of Broadcast Engineers Conference, 2010.11a, pp.206-208, 2010
  • In this paper, to increase the accuracy of the final disparity image, we propose an adaptive variable matching window method that uses image feature points, together with a method that improves the reliability of the cross-consistency check. The proposed adaptive variable matching window method segments the image using color information, finds feature points in each segmented region, and adaptively varies the size of the matching window according to the presence or absence of those feature points. In addition, by implementing the proposed algorithm on a GPU, the computation speed became about 128 times faster on average.
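
The adaptive variable matching window and the improved cross-consistency check are the paper's contributions and are not reproduced here. For orientation, the sketch below is only the fixed-window SAD baseline such methods start from, with one CUDA thread per pixel; the window radius and disparity range are arbitrary assumptions.

```cuda
// Baseline fixed-window SAD stereo matcher: each thread searches the disparity
// that minimizes the sum of absolute differences for its pixel's window.
#include <cuda_runtime.h>
#include <climits>

__global__ void sad_stereo(const unsigned char* left, const unsigned char* right,
                           unsigned char* disparity,
                           int width, int height, int radius, int max_disp)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < radius + max_disp || x >= width - radius || y < radius || y >= height - radius) return;

    int best_d = 0, best_cost = INT_MAX;
    for (int d = 0; d < max_disp; ++d) {
        int cost = 0;
        for (int wy = -radius; wy <= radius; ++wy)
            for (int wx = -radius; wx <= radius; ++wx) {
                int l = left [(y + wy) * width + (x + wx)];
                int r = right[(y + wy) * width + (x + wx - d)];
                cost += abs(l - r);
            }
        if (cost < best_cost) { best_cost = cost; best_d = d; }
    }
    disparity[y * width + x] = (unsigned char)best_d;
}

int main()
{
    const int w = 320, h = 240;
    unsigned char *left, *right, *disp;
    cudaMallocManaged(&left,  w * h);
    cudaMallocManaged(&right, w * h);
    cudaMallocManaged(&disp,  w * h);
    cudaMemset(left, 100, w * h); cudaMemset(right, 100, w * h);  // stand-in for a rectified pair
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    sad_stereo<<<grid, block>>>(left, right, disp, w, h, 4, 32);  // 9x9 window, 32 disparities
    cudaDeviceSynchronize();
    cudaFree(left); cudaFree(right); cudaFree(disp);
    return 0;
}
```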


Implementation of PSO(Particle Swarm Optimization) Algorithm using Parallel Processing of GPU (GPU의 병렬 처리 기능을 이용한 PSO(Particle Swarm Optimization) 알고리듬 구현)

  • Kim, Eun-Su;Kim, Jo-Hwan;Kim, Jong-Wook
    • Proceedings of the KIEE Conference, 2008.10b, pp.181-182, 2008
  • In this paper, the Particle Swarm Optimization (PSO) algorithm, one of the computational optimization algorithms, is newly implemented using CUDA (Compute Unified Device Architecture) provided by NVIDIA. CUDA is a technology that enables the development of software that solves complex computing problems using the parallel processing capability of the GPU (Graphics Processing Unit) rather than the CPU. By applying this technology to PSO, the execution speed of the algorithm is improved. To verify the CUDA-based PSO algorithm, it was programmed and simulated with various test functions, and the results were compared and analyzed against the conventional PSO algorithm. The improved performance also shows that the algorithm can be applied to various optimization problems.


Accelerating Molecular Dynamics Simulation Using Graphics Processing Unit

  • Myung, Hun-Joo;Sakamaki, Ryuji;Oh, Kwang-Jin;Narumi, Tetsu;Yasuoka, Kenji;Lee, Sik
    • Bulletin of the Korean Chemical Society, v.31 no.12, pp.3639-3643, 2010
  • We have developed a CUDA-enabled version of a general-purpose molecular dynamics simulation code for the GPU. Implementation details, including the parallelization scheme and performance optimization, are described. Here we have focused on the non-bonded force calculation because it is the most time-consuming part of a molecular dynamics simulation. Timing results for the CUDA-enabled and CPU versions were obtained and compared for a biomolecular system containing 23,558 atoms. The CUDA-enabled version was found to be faster than the CPU version, suggesting that the GPU is useful hardware for molecular dynamics simulation.
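
The paper's parallelization scheme and optimizations are not reproduced; the sketch below is just the textbook all-pairs Lennard-Jones force kernel with one thread per atom, which illustrates why the non-bonded part maps well onto the GPU. The parameter values are placeholders.

```cuda
// Naive all-pairs Lennard-Jones force kernel (one thread per atom). Real codes
// add cut-offs, neighbour lists, and periodic boundaries, which are omitted here.
#include <cuda_runtime.h>

struct Atom { float x, y, z; };

__global__ void lj_forces(const Atom* pos, float3* force, int n,
                          float epsilon, float sigma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float fx = 0.f, fy = 0.f, fz = 0.f;
    float s6 = powf(sigma, 6.f);
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        float dz = pos[i].z - pos[j].z;
        float r2 = dx * dx + dy * dy + dz * dz;
        float inv_r2 = 1.f / r2;
        float inv_r6 = inv_r2 * inv_r2 * inv_r2;
        // Force magnitude over r, derived from U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6).
        float f = 24.f * epsilon * s6 * inv_r6 * inv_r2 * (2.f * s6 * inv_r6 - 1.f);
        fx += f * dx; fy += f * dy; fz += f * dz;
    }
    force[i] = make_float3(fx, fy, fz);
}

int main()
{
    const int n = 4096;
    Atom* pos; float3* force;
    cudaMallocManaged(&pos,   n * sizeof(Atom));
    cudaMallocManaged(&force, n * sizeof(float3));
    // Place atoms on a simple lattice so no two coincide.
    for (int i = 0; i < n; ++i) pos[i] = { (float)(i % 16), (float)((i / 16) % 16), (float)(i / 256) };
    lj_forces<<<(n + 255) / 256, 256>>>(pos, force, n, 0.2f, 1.0f);
    cudaDeviceSynchronize();
    cudaFree(pos); cudaFree(force);
    return 0;
}
```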

Implementation of Particle Swarm Optimization Method Using CUDA (CUDA를 이용한 Particle Swarm Optimization 구현)

  • Kim, Jo-Hwan;Kim, Eun-Su;Kim, Jong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers, v.58 no.5, pp.1019-1024, 2009
  • In this paper, particle swarm optimization (PSO) is newly implemented with CUDA (Compute Unified Device Architecture) and applied to function optimization with several benchmark functions. CUDA is a framework that solves complex computing problems by exploiting the parallel processing capability of the GPU (Graphics Processing Unit) rather than the CPU, and it allows GPU software to be developed conveniently. Compared with the optimization results of PSO executed on a general CPU, the CUDA implementation saves about 38% of the PSO running time on average, which implies that CUDA is a promising framework for real-time optimization and control.
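
For reference, a minimal CUDA formulation of one PSO step is sketched below, assuming one thread per particle, the standard velocity and position update rule, and a simple sphere function as the benchmark objective. The constants, kernel structure, and host-side global-best update are assumptions, not the authors' implementation.

```cuda
// Illustrative PSO step: each thread updates one particle's velocity, position,
// and personal best; the host picks the global best between iterations.
#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <cstdio>

#define DIM 2            // dimensionality of the search space
#define PARTICLES 1024

__device__ float sphere(const float* x)                  // simple benchmark objective
{
    float s = 0.f;
    for (int d = 0; d < DIM; ++d) s += x[d] * x[d];
    return s;
}

__global__ void init_rng(curandState* rng, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < PARTICLES) curand_init(seed, i, 0, &rng[i]);
}

__global__ void pso_step(float* pos, float* vel, float* pbest_pos, float* pbest_val,
                         const float* gbest_pos, curandState* rng,
                         float w, float c1, float c2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= PARTICLES) return;

    curandState local = rng[i];
    for (int d = 0; d < DIM; ++d) {
        float r1 = curand_uniform(&local), r2 = curand_uniform(&local);
        int k = i * DIM + d;
        vel[k] = w * vel[k]
               + c1 * r1 * (pbest_pos[k] - pos[k])        // cognitive term
               + c2 * r2 * (gbest_pos[d] - pos[k]);       // social term
        pos[k] += vel[k];
    }
    rng[i] = local;

    float f = sphere(&pos[i * DIM]);                      // evaluate and update the personal best
    if (f < pbest_val[i]) {
        pbest_val[i] = f;
        for (int d = 0; d < DIM; ++d) pbest_pos[i * DIM + d] = pos[i * DIM + d];
    }
}

int main()
{
    float *pos, *vel, *pb_pos, *pb_val, *gb_pos;
    curandState* rng;
    cudaMallocManaged(&pos,    PARTICLES * DIM * sizeof(float));
    cudaMallocManaged(&vel,    PARTICLES * DIM * sizeof(float));
    cudaMallocManaged(&pb_pos, PARTICLES * DIM * sizeof(float));
    cudaMallocManaged(&pb_val, PARTICLES * sizeof(float));
    cudaMallocManaged(&gb_pos, DIM * sizeof(float));
    cudaMalloc(&rng, PARTICLES * sizeof(curandState));

    for (int i = 0; i < PARTICLES * DIM; ++i) { pos[i] = (i % 7) - 3.f; vel[i] = 0.f; pb_pos[i] = pos[i]; }
    for (int i = 0; i < PARTICLES; ++i) pb_val[i] = 1e30f;
    for (int d = 0; d < DIM; ++d) gb_pos[d] = pos[d];     // seed global best with particle 0

    init_rng<<<PARTICLES / 256, 256>>>(rng, 1234ULL);
    int best = 0;
    for (int iter = 0; iter < 100; ++iter) {
        pso_step<<<PARTICLES / 256, 256>>>(pos, vel, pb_pos, pb_val, gb_pos, rng, 0.7f, 1.5f, 1.5f);
        cudaDeviceSynchronize();
        best = 0;                                         // host-side global-best update
        for (int i = 1; i < PARTICLES; ++i) if (pb_val[i] < pb_val[best]) best = i;
        for (int d = 0; d < DIM; ++d) gb_pos[d] = pb_pos[best * DIM + d];
    }
    printf("best value after 100 iterations: %g\n", pb_val[best]);
    return 0;
}
```

The per-particle independence of the update rule is what makes PSO a good fit for the GPU; only the global-best selection requires coordination, which is done on the host in this sketch.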