• Title/Summary/Keyword: graphic processing unit(GPU)

Real-Time GPU Task Monitoring and Node List Management Techniques for Container Deployment in a Cluster-Based Container Environment (클러스터 기반 컨테이너 환경에서 실시간 GPU 작업 모니터링 및 컨테이너 배치를 위한 노드 리스트 관리기법)

  • Jihun, Kang;Joon-Min, Gil
    • KIPS Transactions on Computer and Communication Systems / v.11 no.11 / pp.381-394 / 2022
  • Recently, with the personalization and customization of data, Internet-based services face growing requirements for real-time processing, such as real-time AI inference and data analysis, which must be handled immediately according to the user's situation or requirements. Real-time tasks have a fixed deadline from the start of each task to the return of its results, and meeting that deadline is directly linked to service quality. However, traditional container systems are limited in running real-time tasks because they provide no way to assign and manage deadlines for tasks executed in containers. In addition, tasks such as AI inference and data analysis generally use graphics processing units (GPUs), and because performance isolation is not provided between containers, such tasks typically affect one another's performance. Moreover, the resource usage of a node alone cannot determine the deadline guarantee rate of each container or whether a new real-time container should be deployed. In this paper, we propose a monitoring technique that tracks and manages deadlines and the execution status of real-time GPU tasks in containers, to support real-time processing of GPU tasks running in containers, and a node list management technique that places containers on appropriate nodes so that deadlines are met. Furthermore, we demonstrate through experiments that the proposed technique has a very small impact on the system.
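
The abstract does not say how the node list is built, so the following is only a minimal sketch of one plausible reading: each node reports how many real-time GPU tasks met or missed their deadlines, and a node stays on the placement list while its deadline guarantee rate is above a threshold. The names `NodeStatus`, `guarantee_rate`, `eligible_nodes`, and the 0.95 threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical per-node record assembled from monitoring data."""
    name: str
    deadlines_met: int     # real-time GPU tasks that finished before their deadline
    deadlines_missed: int  # real-time GPU tasks that finished late

    def guarantee_rate(self) -> float:
        total = self.deadlines_met + self.deadlines_missed
        # Assumption: a node with no history yet is treated as fully available.
        return 1.0 if total == 0 else self.deadlines_met / total


def eligible_nodes(nodes, min_rate=0.95):
    """Return names of nodes whose guarantee rate is at least min_rate, best first."""
    ok = [n for n in nodes if n.guarantee_rate() >= min_rate]
    ok.sort(key=lambda n: n.guarantee_rate(), reverse=True)
    return [n.name for n in ok]


nodes = [
    NodeStatus("node-a", deadlines_met=98, deadlines_missed=2),
    NodeStatus("node-b", deadlines_met=80, deadlines_missed=20),
    NodeStatus("node-c", deadlines_met=0, deadlines_missed=0),
]
print(eligible_nodes(nodes))
```

A scheduler could consult such a list before deploying a new real-time container, instead of looking at raw resource usage alone.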

An Algorithm for Finding Surface Atoms of a Protein Molecule Based on Voxel Map Representation (복셀 맵을 이용한 단백질 표면 원자의 발견 알고리즘)

  • Kim, Byung-Joo;Kim, Ku-Jin;Seong, Joon-Kyung
    • The KIPS Transactions:PartA / v.19A no.2 / pp.73-76 / 2012
  • In this paper, we propose an efficient method to extract surface atoms from a protein molecule. Surface atoms are defined as the set of atoms that can be in contact with a given probe solvent $P$, where $P$ does not collide with the molecule. The atoms contained in the molecule are represented as a set of spheres with van der Waals radii. The probe solvent is also represented as a sphere. We propose a method to extract the surface atoms by computing the offset surface of the molecule with respect to the radius of $P$. For efficient computation of the offset surface of a molecule, a voxel map is constructed for the offset surfaces of the spheres. Based on GPU (graphic processor unit) acceleration, a data-parallel algorithm is used to extract the surface atoms in 42.87 milliseconds for molecules containing up to 6,412 atoms.

High Throughput Parallel KMP Algorithm Considering CPU-GPU Memory Hierarchy (CPU-GPU 메모리 계층을 고려한 고처리율 병렬 KMP 알고리즘)

  • Park, Soeun;Kim, Daehee;Lee, Myungho;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.5 / pp.656-662 / 2018
  • Pattern matching algorithms are widely used in many application fields such as bioinformatics and intrusion detection. Among the many string matching algorithms, the KMP (Knuth-Morris-Pratt) algorithm is commonly used because of its fast execution time on large texts. However, the processing speed of the KMP algorithm is also limited when the text size increases significantly. In this paper, we propose a high-throughput parallel KMP algorithm that considers the CPU-GPU memory hierarchy, based on OpenCL in GPGPU (General Purpose computing on Graphic Processing Unit). We focus on optimizing the allocation of work-items and work-groups, the copying of the pattern data and the failure table into local memory, and the overlapping of data transfer with the string matching operations. The experimental results show that the optimized parallel KMP algorithm runs about 3.6 times faster than the non-optimized parallel KMP algorithm.
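
To illustrate the two ingredients the abstract names, the failure table and the division of the text among parallel workers, here is a minimal sequential sketch in Python. It is not the paper's OpenCL implementation: the function names and the `chunk_size` parameter are hypothetical, each loop iteration in `chunked_kmp` merely stands in for one work-item, and a non-empty pattern is assumed.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern, fail, base=0):
    """Return start positions of pattern in text, shifted by base."""
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(base + i - k + 1)
            k = fail[k - 1]
    return matches


def chunked_kmp(text, pattern, chunk_size):
    """Search independent chunks; each chunk is extended by len(pattern) - 1
    characters so that matches straddling a chunk boundary are not lost."""
    fail = failure_table(pattern)  # shared, read-only table
    overlap = len(pattern) - 1
    found = []
    for start in range(0, len(text), chunk_size):  # one iteration ~ one work-item
        piece = text[start:start + chunk_size + overlap]
        found.extend(kmp_search(piece, pattern, fail, base=start))
    return found


print(chunked_kmp("abababcabab", "abab", chunk_size=3))
```

Because each chunk only reads the shared pattern and failure table, the chunks can be processed independently, which is what makes copying those two small arrays into fast local memory attractive on a GPU.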

A Case Study of the Base Technology for the Smart Grid Security: Focusing on a Performance Improvement of the Basic Algorithm for the DDoS Attacks Detection Using CUDA

  • Huh, Jun-Ho;Seo, Kyungryong
    • Journal of Korea Multimedia Society / v.19 no.2 / pp.411-417 / 2016
  • Since the development of the Graphic Processing Unit (GPU) in 1999, GPUs have advanced much faster than CPUs, and the computational power of GPUs now exceeds that of CPUs by tens to hundreds of times in terms of decimal calculations while costing much less. Owing to recent advances in hardware, general-purpose computing on GPUs is on the rise. In this paper, we identify the elements to be considered for Smart Grid security, focusing on improving the performance of the basic stateful-inspection algorithm for detecting DDoS attacks using CUDA. In the program, we compared the search speeds of the GPU and the CPU when searching suffix trees. The system constraints and specifications were kept identical during the experiment. The experimental results show that problem-solving capability improves when the GPU is used. The other finding was that, as the volume of data grew, system performance improved when shared memory was used explicitly instead of global memory.

Multi-Scale Contact Analysis Between Net and Numerous Particles (그물망과 대량입자의 멀티 스케일 접촉해석)

  • Jun, Chul Woong;Sohn, Jeong Hyun
    • Transactions of the Korean Society of Mechanical Engineers A / v.38 no.1 / pp.17-23 / 2014
  • Graphics processing units (GPUs) are ideal for solving problems involving parallel data computations. In this study, the GPU is used to carry out a multi-body dynamic simulation with particle dynamics effectively. The Hilber-Hughes-Taylor (HHT) implicit integration algorithm is used to solve the integral equations. For detecting collisions among particles, the spatial subdivision algorithm and the discrete element method (DEM) are employed. The developed program is verified by comparing its results with those of ADAMS. The numerical efficiencies of the serial program using the CPU and the parallel program using the GPU are compared in terms of the number of particles, and it is observed that the more particles there are, the more computing time is saved by using the GPU. In the present example, when the number of particles is 1,300, the parallel analysis program is about 5 times faster than the serial analysis program.
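
Spatial subdivision for collision detection can be sketched as a uniform grid: particles are binned into cells, and only particles in the same or adjacent cells are tested against each other. The Python sketch below assumes equal-radius spheres and a cell edge equal to one particle diameter; the function name `colliding_pairs` is hypothetical, and the code is a sequential illustration rather than the GPU program described in the abstract.

```python
import math
from collections import defaultdict
from itertools import product


def colliding_pairs(centers, radius):
    """Return sorted index pairs (i, j), i < j, of overlapping equal-radius spheres."""
    cell = 2.0 * radius  # assumption: cell edge equals one particle diameter
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(centers):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(idx)

    pairs = set()
    for (cx, cy, cz), members in grid.items():
        # Two overlapping spheres are closer than one cell edge, so their
        # cells differ by at most 1 along each axis: check the 27 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = grid.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbours:
                    if i < j and math.dist(centers[i], centers[j]) < cell:
                        pairs.add((i, j))
    return sorted(pairs)


centers = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
print(colliding_pairs(centers, radius=0.5))
```

The benefit over testing all pairs grows with the particle count, which is consistent with the abstract's observation that larger particle numbers save more computing time.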

Development of a 3D Shape Reconstruction System for Defects on a Hot Steel Surface (고온 금속 표면 결함에 대한 3차원 형상 추출 시스템 개발)

  • Jang, Yu Jin;Lee, Joo Seob
    • Journal of Institute of Control, Robotics and Systems / v.21 no.5 / pp.459-464 / 2015
  • On-line quality control of hot steel products is an important issue in the steel industry because it minimizes cost. In recent years, relative depth information about surface defects has been increasingly required for strict quality control. In this paper, a 3D shape reconstruction scheme for defects on a hot steel surface, based on a multi-spectral photometric stereo method, is proposed. A hot steel surface is illuminated simultaneously with vertically and horizontally linearly polarized light from green and blue light sources, and the corresponding 4 images are obtained. The photometric stereo method is then applied with the aid of a GPU (Graphic Processing Unit) to reconstruct the shape of the target surface from these images. The proposed scheme was validated through experiments.

3D Tile Application Method for Improvement of Performance of V-world 3D Map Service (브이월드 3D 지도 서비스 성능 향상을 위한 3D 타일 적용 방안 연구)

  • Kim, Tae Hoon;Jang, Han Sol;Yoo, Sung Hwan;Go, Jun Hee
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.55-61 / 2017
  • V-world, a Korean spatial information open platform, provides various services that make it easy to use the country's 2D and 3D maps and administrative information. Among them, the V-world 3D map service, which is modeled in units of individual buildings, requires a request for each building model file and a draw call to render each requested model on the screen. This generates a large number of model requests and draw calls, which increases the latency of transmission and conversion between the central processing unit (CPU) and the graphics processing unit (GPU) and degrades the performance of the 3D map service. In this paper, we propose a plan to reduce this performance degradation caused by multiple model requests and draw calls. We reduce the number of model file requests and draw calls by applying a 3D tile model that combines multiple building models into a single tile. In addition, we apply the quadtree algorithm to reduce the time required to load a model file by shortening the retrieval time of the model. This is expected to contribute to improving the performance of the V-world 3D map service.

GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.1-8 / 2022
  • In this paper, we design and implement a GPU-based parallel algorithm to effectively solve the traveling salesman problem (TSP) through an ant colony system. The iterative process of generating hundreds or thousands of tours simultaneously exploits the GPU's task-level parallelism, and the update of the pheromone trail data actively exploits data parallelism with 32x32 thread blocks. In particular, through simultaneous memory access by multiple threads, coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory are supported. The experiment used TSPLIB instances with 127 to 1002 cities and compared the performance of the sequential and parallel algorithms on a system with an Intel Core i9-9900K CPU and an Nvidia Titan RTX. GPU parallelization yields a speedup of about 10.13 to 11.37 times.
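
The pheromone update mentioned in the abstract can be illustrated with one common form of the ant colony system global update, in which only the edges on the best tour are changed: tau <- (1 - rho) * tau + rho / L_best. Whether the paper uses exactly this variant is an assumption, and the names `tour_length`, `global_update`, and the value `rho=0.1` are hypothetical. The sketch is sequential Python; on a GPU, each matrix entry would typically be handled by its own thread.

```python
import math


def tour_length(tour, coords):
    """Length of the closed tour visiting the cities in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def global_update(pheromone, best_tour, best_length, rho=0.1):
    """Evaporate and reinforce pheromone on the edges of the best tour (symmetric TSP)."""
    n = len(best_tour)
    for i in range(n):
        a, b = best_tour[i], best_tour[(i + 1) % n]
        new = (1.0 - rho) * pheromone[a][b] + rho / best_length
        pheromone[a][b] = pheromone[b][a] = new


coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # unit square
tour = [0, 1, 2, 3]
pheromone = [[1.0] * 4 for _ in range(4)]
length = tour_length(tour, coords)
global_update(pheromone, tour, length)
print(length, pheromone[0][1], pheromone[0][2])
```

Edges on the best tour drift toward 1 / L_best, while edges off the tour (such as the diagonal 0-2 here) are left unchanged by this step.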

Molecular Interaction Interface Computing Based on Voxel Map (복셀맵을 기반으로 한 분자 간 상호작용 인터페이스의 계산)

  • Choi, Jihoon;Kim, Byungjoo;Kim, Ku-jin
    • Journal of the Korea Computer Graphics Society / v.18 no.3 / pp.1-7 / 2012
  • In this paper, we propose a method to compute the interface between protein molecules. When a molecule is represented as a set of spheres with van der Waals radii, the distance from a spatial point p to the molecule corresponds to the distance from p to the closest sphere. The molecular interface is composed of the points equidistant from the two molecules. Our algorithm decomposes the space into a set of voxels and then constructs a voxel map by storing, for each voxel, information about the spheres intersecting it. Using the voxel map, we compute the distance between a point and the molecule. We also use the GPU for parallel processing and efficiently approximate the interface of a pair of molecules.
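
A voxel map of this kind can be sketched as a dictionary from voxel indices to the spheres that reach that voxel. The Python sketch below is an assumption-laden illustration, not the paper's algorithm: each sphere is registered in every voxel touched by its bounding box inflated by a hypothetical `margin`, so a query at point p only inspects the spheres listed in p's own voxel and is exact for distances up to `margin`. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_voxel_map(spheres, voxel, margin):
    """Map voxel index -> indices of spheres (cx, cy, cz, r) whose bounding box,
    inflated by margin, overlaps that voxel."""
    vmap = defaultdict(list)
    for idx, (cx, cy, cz, r) in enumerate(spheres):
        reach = r + margin
        ranges = [
            range(math.floor((c - reach) / voxel), math.floor((c + reach) / voxel) + 1)
            for c in (cx, cy, cz)
        ]
        for key in product(*ranges):
            vmap[key].append(idx)
    return vmap


def distance_to_molecule(p, spheres, vmap, voxel, margin):
    """Distance from p to the closest sphere surface, or None if it exceeds margin."""
    key = tuple(math.floor(c / voxel) for c in p)
    best = None
    for idx in vmap.get(key, []):
        cx, cy, cz, r = spheres[idx]
        d = math.dist(p, (cx, cy, cz)) - r  # negative when p is inside the sphere
        if best is None or d < best:
            best = d
    return best if best is not None and best <= margin else None


spheres = [(0.0, 0.0, 0.0, 1.0), (4.0, 0.0, 0.0, 1.5)]
vmap = build_voxel_map(spheres, voxel=1.0, margin=2.0)
print(distance_to_molecule((2.0, 0.0, 0.0), spheres, vmap, voxel=1.0, margin=2.0))
```

With one such map per molecule, a point could be classified as lying on the interface when its distances to the two molecules agree within a tolerance; every point query is independent, which suits GPU parallel processing.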

Development of a Reliable Real-time 3D Reconstruction System for Tiny Defects on Steel Surfaces (금속 표면 미세 결함에 대한 신뢰성 있는 실시간 3차원 형상 추출 시스템 개발)

  • Jang, Yu Jin;Lee, Joo Seob
    • Journal of Institute of Control, Robotics and Systems / v.19 no.12 / pp.1061-1066 / 2013
  • In the steel industry, the detection of tiny defects on steel surfaces, including their 3D characteristics, is very important from the point of view of quality control. A multi-spectral photometric stereo method is an attractive scheme because the shape of a defect can be obtained from images acquired at the same time by a multi-channel camera. Moreover, the calculation time required for this scheme can be greatly reduced for real-time application with the aid of a GPU (Graphic Processing Unit). Although more reliable shape reconstruction of defects is possible when the number of available images is increased, it is not easy to construct a camera system with more than 3 channels in the visible light range. In this paper, a new 6-channel camera system, which can distinguish the vertically and horizontally linearly polarized light of RGB light sources, was developed by adopting two 3-CCD cameras and two polarized lenses, based on the fact that polarized light is preserved on the steel surface. The photometric stereo scheme with 6 images was accelerated by using a GPU, and the performance of the proposed system was validated through experiments.