• Title/Summary/Keyword: NVIDIA


GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.2
    • /
    • pp.1-8
    • /
    • 2022
  • In this paper, we design and implement a GPU-based parallel algorithm to effectively solve the traveling salesman problem (TSP) with an ant colony system. The iterative process of generating hundreds or thousands of tours simultaneously exploits the GPU's task-level parallelism, while the pheromone-trail update step exploits data parallelism with 32x32 thread blocks. In particular, simultaneous memory accesses by multiple threads are organized as coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory. The experiments used TSPLIB instances with 127 to 1,002 cities and compared the sequential and parallel algorithms on an Intel Core i9-9900K CPU and an Nvidia Titan RTX system. GPU parallelization yields a speedup of about 10.13 to 11.37 times.
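
The pheromone-update step described in this abstract maps naturally onto a 2D grid of 32x32 thread blocks. Below is a minimal CUDA sketch of such an update, not the paper's actual code; the kernel name, the evaporation rate rho, and the row-major layout of the pheromone matrix are assumptions for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical evaporation kernel: each thread updates one entry of the
// n x n pheromone matrix. Adjacent threads in a warp touch consecutive
// columns, so accesses to the row-major matrix coalesce.
__global__ void evaporatePheromone(float* pheromone, int n, float rho)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        pheromone[row * n + col] *= (1.0f - rho);
}

int main()
{
    const int n = 1002;                      // largest TSPLIB size cited
    float* d_pheromone;
    cudaMalloc(&d_pheromone, n * n * sizeof(float));
    cudaMemset(d_pheromone, 0, n * n * sizeof(float));

    dim3 block(32, 32);                      // the 32x32 blocks from the abstract
    dim3 grid((n + 31) / 32, (n + 31) / 32);
    evaporatePheromone<<<grid, block>>>(d_pheromone, n, 0.1f);
    cudaDeviceSynchronize();
    cudaFree(d_pheromone);
    return 0;
}
```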

2-Stage Detection and Classification Network for Kiosk User Analysis (디스플레이형 자판기 사용자 분석을 위한 이중 단계 검출 및 분류 망)

  • Seo, Ji-Won;Kim, Mi-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.5
    • /
    • pp.668-674
    • /
    • 2022
  • Machine learning techniques using visual data are highly applicable in industrial and service fields such as scene recognition, fault detection, security, and user analysis. Among these, user analysis from CCTV video is one of the most practical uses of vision data. Many studies on lightweight artificial neural networks have also been published to improve usability in mobile and embedded environments. In this study, we propose a network combining object detection and classification for mobile graphics processing units. The network detects pedestrians and faces, and classifies age and gender from the detected faces. The proposed network is built on MobileNet, YOLOv2, and skip connections. The detection and classification models are trained individually and combined into a 2-stage structure. An attention mechanism is also used to improve detection and classification performance. An Nvidia Jetson Nano is used to run and evaluate the proposed system.

GPU Acceleration of Range Doppler Algorithm for Real-Time SAR Image Generation (실시간 SAR 영상 생성을 위한 Range Doppler Algorithm의 GPU 가속)

  • Dong-Min Jeong;Woo-Kyung Lee;Myeong-Jin Lee;Yun-Ho Jung
    • Journal of IKEEE
    • /
    • v.27 no.3
    • /
    • pp.265-272
    • /
    • 2023
  • In this paper, a GPU-accelerated kernel of the range Doppler algorithm (RDA) was developed for real-time image formation based on frequency modulated continuous wave (FMCW) synthetic aperture radar (SAR). Pinned memory was used to minimize the data transfer time between the host and the GPU device, and the kernel was configured to perform all RDA operations on the GPU to minimize the number of data transfers. The dataset was obtained from an FMCW drone SAR experiment, and the GPU acceleration was measured in an Intel i7-9700K CPU, 32 GB RAM, and Nvidia RTX 3090 GPU environment. Including the host-device data transfer time, a speedup of up to 3.41 times over the CPU was measured; excluding the transfer time, the computation alone was accelerated by up to 156 times.
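
Pinned (page-locked) host memory is a standard CUDA technique for reducing host-device transfer time, as the abstract describes. The following sketch shows the general pattern under assumed buffer names and sizes; the RDA kernels themselves are only indicated by a comment.

```cuda
#include <cuda_runtime.h>

int main()
{
    const size_t n = 1 << 20;            // sample count (illustrative)
    float *h_data = nullptr, *d_data = nullptr;

    // Page-locked host buffer: enables faster, asynchronous DMA transfers.
    cudaMallocHost(&h_data, n * sizeof(float));
    cudaMalloc(&d_data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) h_data[i] = 0.0f;  // fill with raw samples

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Asynchronous copy from pinned memory can overlap with host work.
    cudaMemcpyAsync(d_data, h_data, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    // ... launch RDA kernels (range FFT, RCMC, azimuth FFT) on `stream` ...
    cudaStreamSynchronize(stream);

    cudaFree(d_data);
    cudaFreeHost(h_data);
    cudaStreamDestroy(stream);
    return 0;
}
```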

An Execution Performance Analysis of Applications using Multi-Process Service over GPU (다중 프로세스 서비스를 이용한 GPU 응용 동시 실행 성능 분석)

  • Kim, Se-Jin;Oh, Ji-Sun;Kim, Yoonhee
    • KNOM Review
    • /
    • v.22 no.1
    • /
    • pp.60-67
    • /
    • 2019
  • Graphics Processing Units (GPUs) achieve high performance by executing relatively uniform computations in parallel. General-purpose GPU (GPGPU) technology has advanced to provide concurrent kernel execution for multiple, diverse applications, but support for resource sharing and scheduling remains limited. NVIDIA recently introduced the Multi-Process Service (MPS), which allows kernels from different applications to execute concurrently. However, the benefit of MPS depends on the characteristics of the applications and the order of their execution. This paper presents a performance analysis of diverse real-world scientific applications. Based on the analysis, we show that identifying the characteristics of co-running applications and scheduling them via profiling are important for maximizing the benefit of MPS.
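
MPS is enabled outside the application by starting the MPS control daemon; unmodified CUDA programs then act as MPS clients and share one GPU. The sketch below pairs the standard `nvidia-cuda-mps-control` workflow with a placeholder SAXPY workload; the application names in the comments are hypothetical.

```cuda
#include <cuda_runtime.h>

// Any ordinary CUDA program can act as an MPS client; no code changes
// are needed. With the MPS control daemon running, kernels from two
// such processes can be co-scheduled on the same GPU:
//
//   export CUDA_VISIBLE_DEVICES=0
//   nvidia-cuda-mps-control -d      # start the MPS control daemon
//   ./app_a & ./app_b &             # co-run two CUDA applications
//
// The SAXPY kernel below is a placeholder workload, not from the paper.
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```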

Development of a Flooding Detection Learning Model Using CNN Technology (CNN 기술을 적용한 침수탐지 학습모델 개발)

  • Dong Jun Kim;YU Jin Choi;Kyung Min Park;Sang Jun Park;Jae-Moon Lee;Kitae Hwang;Inhwan Jung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.6
    • /
    • pp.1-7
    • /
    • 2023
  • This paper develops a learning model that classifies normal roads and flooded roads using artificial intelligence technology. We expanded the diversity of the training data using various data augmentation techniques and implemented a model that performs well in diverse environments. Transfer learning was performed using the CNN-based ResNet152V2 model as the pre-trained model, and the performance of the final model was improved through various parameter tuning and optimization steps. Training was implemented in Python on Google Colab with an NVIDIA Tesla T4 GPU, and the test results show that flooding situations were detected with very high accuracy on the test dataset.

Research Trends in Domestic and International AI Chips (국내외 인공지능 반도체에 대한 연구 동향)

  • Hyun Ji Kim;Se Young Yoon;Hwa Jeong Seo
    • Smart Media Journal
    • /
    • v.13 no.3
    • /
    • pp.36-44
    • /
    • 2024
  • Recently, large-scale artificial intelligence (AI) models such as ChatGPT have been developed, and as AI is used across various industrial fields, attention is focused on AI chips (semiconductors). AI chips are chips designed for the computations required by AI algorithms, and many companies and institutes at home and abroad, such as NVIDIA, Tesla, and ETRI, are developing them. In this paper, we survey research trends for nine types of AI chips. Currently, most AI chip efforts aim to improve computational performance, and semiconductors for specific purposes are also being designed. To compare the various AI semiconductors, each chip is analyzed in terms of operation unit, speed, power, and energy efficiency. We also introduce existing optimization methodologies for AI computation and, based on this, present future research directions for AI semiconductors.

Acceleration of computation speed for elastic wave simulation using a Graphic Processing Unit (그래픽 프로세서를 이용한 탄성파 수치모사의 계산속도 향상)

  • Nakata, Norimitsu;Tsuji, Takeshi;Matsuoka, Toshifumi
    • Geophysics and Geophysical Exploration
    • /
    • v.14 no.1
    • /
    • pp.98-104
    • /
    • 2011
  • Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields including shear components. Therefore, it is suitable for exploration of the responses of elastic bodies. To overcome the long duration of the calculations, we use a Graphic Processing Unit (GPU) to accelerate the elastic wave simulation. Because a GPU has many processors and a wide memory bandwidth, we can use it in a parallelised computing architecture. The GPU board used in this study is an NVIDIA Tesla C1060, which has 240 processors and a 102 GB/s memory bandwidth. Despite the availability of a parallel computing architecture (CUDA), developed by NVIDIA, we must optimise the usage of the different types of memory on the GPU device, and the sequence of calculations, to obtain a significant speedup of the computation. In this study, we simulate two- (2D) and three-dimensional (3D) elastic wave propagation using the Finite-Difference Time-Domain (FDTD) method on GPUs. In the wave propagation simulation, we adopt the staggered-grid method, which is one of the conventional FD schemes, since this method can achieve sufficient accuracy for use in numerical modelling in geophysics. Our simulator optimises the usage of memory on the GPU device to reduce data access times, and uses faster memory as much as possible. This is a key factor in GPU computing. By using one GPU device and optimising its memory usage, we improved the computation time by more than 14 times in the 2D simulation, and over six times in the 3D simulation, compared with one CPU. Furthermore, by using three GPUs, we succeeded in accelerating the 3D simulation 10 times.
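
A staggered-grid FDTD update is memory-bound, which is why the authors stress memory optimisation. A minimal CUDA sketch of one velocity-component update with shared-memory tiling follows, assuming a simplified 2D layout; it illustrates the idea, not the authors' simulator.

```cuda
#include <cuda_runtime.h>

#define TILE 16

// Hypothetical 2D staggered-grid update of the horizontal particle
// velocity vx. The field layout, boundary handling, and the single
// combined coefficient dtRho are simplifications for illustration.
__global__ void updateVx(float* vx, const float* sxx, const float* sxz,
                         int nx, int nz, float dtRho)
{
    __shared__ float s_sxx[TILE][TILE + 1];    // +1 pad avoids bank conflicts

    int ix = blockIdx.x * TILE + threadIdx.x;
    int iz = blockIdx.y * TILE + threadIdx.y;
    bool inGrid = (ix < nx && iz < nz);
    int idx = iz * nx + ix;

    // Stage the stress tile in fast shared memory once per block.
    s_sxx[threadIdx.y][threadIdx.x] = inGrid ? sxx[idx] : 0.0f;
    __syncthreads();

    if (!inGrid || ix == 0 || iz == 0) return;  // skip grid boundary

    // The left neighbour comes from shared memory when it lies in the tile.
    float sxxL = (threadIdx.x > 0) ? s_sxx[threadIdx.y][threadIdx.x - 1]
                                   : sxx[idx - 1];
    float dsxx = s_sxx[threadIdx.y][threadIdx.x] - sxxL;
    float dsxz = sxz[idx] - sxz[idx - nx];
    vx[idx] += dtRho * (dsxx + dsxz);
}
```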

Weather Radar Image Generation Method Using Interpolation based on CUDA

  • Yang, Liu;Jang, Bong-Joo;Lim, Sanghun;Kwon, Ki-Chang;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.4
    • /
    • pp.473-482
    • /
    • 2015
  • Doppler weather radar is an important tool for meteorological research. Over several decades of development, Doppler weather radar has made enormous progress in the understanding, detection, and warning of meso- and micro-scale weather systems, and it contributes significantly to weather forecasting and weather disaster warning. However, the large amount of data to be processed limits its application. This paper proposes fast weather radar data processing based on CUDA, a platform for highly parallel programming developed by NVIDIA. By running many threads, radar data can be processed concurrently. In experiments, the CUDA parallel program significantly reduces weather data processing time.
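
The interpolation step the abstract mentions parallelizes naturally: one thread per output pixel. A hedged CUDA sketch of polar-to-Cartesian resampling with bilinear interpolation follows; the grid geometry and parameter names are assumptions, since the paper's code is not shown.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical resampling kernel: each thread maps one output pixel back
// to (range, azimuth) coordinates and bilinearly interpolates the four
// surrounding radar samples.
__global__ void polarToCartesian(const float* polar, float* image,
                                 int nRange, int nAzimuth,
                                 int width, int height,
                                 float metersPerPixel, float rangeBinMeters)
{
    const float PI = 3.14159265f;
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    float x = (px - width / 2) * metersPerPixel;
    float y = (py - height / 2) * metersPerPixel;
    float r = sqrtf(x * x + y * y) / rangeBinMeters;          // range bin
    float a = (atan2f(y, x) + PI) / (2.0f * PI) * nAzimuth;   // azimuth bin

    int r0 = (int)r;
    int a0 = ((int)a) % nAzimuth;
    int a1 = (a0 + 1) % nAzimuth;
    if (r0 + 1 >= nRange) { image[py * width + px] = 0.0f; return; }
    float fr = r - r0, fa = a - (int)a;

    // Bilinear blend of the four surrounding polar samples.
    float v00 = polar[a0 * nRange + r0], v01 = polar[a0 * nRange + r0 + 1];
    float v10 = polar[a1 * nRange + r0], v11 = polar[a1 * nRange + r0 + 1];
    image[py * width + px] = (1 - fa) * ((1 - fr) * v00 + fr * v01)
                           + fa * ((1 - fr) * v10 + fr * v11);
}
```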

WRF Physics Models Using GP-GPUs with CUDA Fortran (WRF 물리 과정의 GP-GPU 계산을 위한 CUDA Fortran 프로그램 구현)

  • Kim, Youngtae;Lee, Yong Hee;Chung, Kwan-Young
    • Atmosphere
    • /
    • v.23 no.2
    • /
    • pp.231-235
    • /
    • 2013
  • We parallelized the major WRF physics routines for Nvidia GP-GPUs with CUDA Fortran. GP-GPUs were originally designed for graphics processing, but they show high performance with low power consumption when computing numerical models. In the CUDA environment, the data domain is decomposed into thread blocks, and the threads in each block compute in parallel. We parallelized the WRF code to use thread blocks efficiently, validated the GP-GPU program against the original CPU program, and found that the WRF model using GP-GPUs shows efficient speedup.
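
The paper itself uses CUDA Fortran; for consistency with the other sketches in this listing, the same thread-block decomposition idea is shown below in CUDA C++. One thread owns one horizontal (i, j) grid column and loops over the vertical levels, a common pattern for physics routines; the field and tendency names are illustrative, not from WRF.

```cuda
#include <cuda_runtime.h>

// Hypothetical column-physics kernel: the horizontal domain is split into
// 2D thread blocks, each thread handles one (i, j) column, and the loop
// runs over the k vertical levels.
__global__ void physicsColumn(float* t, const float* tend,
                              int ni, int nj, int nk, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= ni || j >= nj) return;

    for (int k = 0; k < nk; ++k) {
        int idx = (k * nj + j) * ni + i;      // i fastest: coalesced access
        t[idx] += dt * tend[idx];             // apply the physics tendency
    }
}
```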

Fast Generating of Digital Hologram Using GPGPU (GPGPU를 이용한 고속 디지털 홀로그램 생성 기법)

  • Song, Joong-Seok;Choi, Ji-Yoon;Seo, Young-Ho;Park, Jong-Il
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.11a
    • /
    • pp.34-35
    • /
    • 2010
  • This paper proposes a technique for rapidly generating a digital hologram from a 3D object constructed from a depth-map image. Because digital hologram generation decomposes into many independent computations, it can be accelerated by parallel processing on a GPU. To raise the efficiency of this parallelization, we used CUDA, recently released by NVIDIA. By maximizing the use of fast on-GPU memory in the intermediate stages of hologram generation and optimizing the algorithm implementation, we improved the acceleration efficiency. As a result, we achieved a speedup of about 64 times over processing on a conventional CPU.
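
Point-source computer-generated holography is, as the abstract notes, a set of independent per-pixel computations. A hedged CUDA sketch follows: each thread accumulates the contribution of every object point derived from the depth map; the structure layout and the wave number k are assumptions, not the authors' implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical object point from the depth map: position and amplitude.
struct Point3 { float x, y, z, amp; };

// Each thread computes one hologram pixel by summing the real part of
// the spherical wave from every object point.
__global__ void cghKernel(float* hologram, const Point3* points, int nPoints,
                          int width, int height, float pitch, float k)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    float hx = (px - width / 2) * pitch;      // hologram-plane coordinates
    float hy = (py - height / 2) * pitch;

    float acc = 0.0f;
    for (int p = 0; p < nPoints; ++p) {
        float dx = hx - points[p].x;
        float dy = hy - points[p].y;
        float r = sqrtf(dx * dx + dy * dy + points[p].z * points[p].z);
        acc += points[p].amp * cosf(k * r);   // real part of spherical wave
    }
    hologram[py * width + px] = acc;
}
```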
