• Title/Summary/Keyword: code parallelization

Search Result 96, Processing Time 0.028 seconds

Implementation of GPU Based Polymorphic Worm Detection Method and Its Performance Analysis on Different GPU Platforms (GPU를 이용한 Polymorphic worm 탐지 기법 구현 및 GPU 플랫폼에 따른 성능비교)

  • Lee, Sunwon;Song, Chihwan;Lee, Injoon;Joh, Taewon;Kang, Jaewoo
    • Annual Conference of KIPS / 2010.11a / pp.1458-1461 / 2010
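  • The scale of damage caused by malicious code grows every year, as the DDoS attack of July 7 last year showed. Polymorphic worms are especially dangerous because existing methods have difficulty detecting them during the first attack. This study introduces Local Alignment, one of the methods used in bioinformatics to find similarities and features among genes, and applies it to polymorphic worm detection. It also overcomes the existing algorithm's drawback of an $O(n^4)$ running time by parallelizing execution and modifying the algorithm. Parallelization was carried out with CUDA programming on NVIDIA GPUs and OpenCL programming on AMD GPUs. The performance of the Local Alignment based polymorphic worm detection algorithm was then compared across these GPGPU platforms.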
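The abstract above does not spell out its Local Alignment variant. As a rough illustration only, the following sketch computes the textbook Smith-Waterman local alignment score between two payload strings. It is not the authors' $O(n^4)$ algorithm or their GPU version, and the function name and scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best Smith-Waterman local alignment score of a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical payloads that share the invariant fragment "ABCD".
score = local_alignment_score("xxABCDyy", "zzABCDww")
```

A high score between two otherwise different payloads suggests a shared invariant region, which is the kind of similarity a signature generator could look for.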

NTGST-Based Parallel Computer Vision Inspection for High Resolution BLU (NTGST 병렬화를 이용한 고해상도 BLU 검사의 고속화)

  • 김복만;서경석;최흥문
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.19-24 / 2004
  • A novel fast parallel NTGST is proposed for high-resolution computer vision inspection of BLUs in an LCD production line. The conventional computation-intensive NTGST algorithm is modified and its C code is optimized into a fast NTGST adapted to the SIMD parallel architecture. The input inspection image is then partitioned and allocated to each of the P processors in a multi-threaded implementation, and within each thread the NTGST is executed on N data items simultaneously on the SIMD architecture. Thus, the proposed inspection system can achieve a speedup of O(NP). Experiments using a dual Pentium III processor with its MMX and extended MMX SIMD technology show that the proposed parallel NTGST is about Sp=8 times faster than the conventional NTGST, which demonstrates the scalability of the proposed implementation for fast, high-resolution computer vision inspection of variously sized BLUs in LCD production lines.

Design Considerations on Large-scale Parallel Finite Element Code in Shared Memory Architecture with Multi-Core CPU (멀티코어 CPU를 갖는 공유 메모리 구조의 대규모 병렬 유한요소 코드에 대한 설계 고려 사항)

  • Cho, Jeong-Rae;Cho, Keunhee
    • Journal of the Computational Structural Engineering Institute of Korea / v.30 no.2 / pp.127-135 / 2017
  • The computing environment has changed rapidly, through multi-core CPUs, optimized math kernel libraries implementing BLAS and LAPACK, and the popularization of direct sparse solvers, so that large-scale finite element models can now be analyzed at the PC or workstation level. In this paper, design considerations for a parallel finite element code on shared-memory multi-core CPU systems are proposed: (1) the use of optimized numerical libraries, (2) the use of the latest direct sparse solvers, (3) parallelism using OpenMP for computing element stiffness matrices, and (4) assembly techniques using triplets, a type of sparse matrix storage. In addition, the effect of parallelization on the time-consuming tasks is examined using a large-scale finite element model.
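Item (4) above, assembly using triplets, can be sketched as follows. This is a minimal serial illustration that assumes "triplets" means (row, column, value) coordinate entries whose duplicates are summed. The function name and the two-element bar example are hypothetical, and the element loop is only where an OpenMP-style parallel loop would go in a compiled code.

```python
from collections import defaultdict


def assemble_from_triplets(element_matrices, element_dofs):
    """Assemble a global matrix from element matrices via (row, col, value) triplets."""
    triplets = []
    # Each element contributes independently, so this loop is the natural
    # candidate for parallel execution (done serially in this sketch).
    for ke, dofs in zip(element_matrices, element_dofs):
        for a, row in enumerate(dofs):
            for b, col in enumerate(dofs):
                triplets.append((row, col, ke[a][b]))
    # Entries with the same (row, col) come from elements sharing a node
    # and are summed when the triplets are converted to a matrix.
    global_k = defaultdict(float)
    for row, col, value in triplets:
        global_k[(row, col)] += value
    return triplets, dict(global_k)


# Hypothetical example: two 1D bar elements of unit stiffness sharing node 1.
ke = [[1.0, -1.0], [-1.0, 1.0]]
triplets, K = assemble_from_triplets([ke, ke], [(0, 1), (1, 2)])
```

Collecting triplets first avoids inserting into a compressed sparse structure element by element. The summation step is typically delegated to a sparse matrix library.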

Efficient Exploring Multiple Execution Path for Dynamic Malware Analysis (악성코드 동적 분석을 위한 효율적인 다중실행경로 탐색방법)

  • Hwang, Ho;Moon, Daesung;Kim, Ikkun
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.2 / pp.377-386 / 2016
  • As the volume of malware has increased, malware must be analyzed rapidly to counter cyber attacks. Dynamic malware analysis has been widely studied to overcome limitations of static analysis such as packing and obfuscation, but it still faces the problem of exploring multiple execution paths. Previous approaches to exploring multiple execution paths require considerable analysis time and resources for preparing the analysis environment. In this paper, we propose an efficient approach for exploring multiple execution paths in a single analysis environment by pipelining processes, and show through experiments a speed improvement of 29% on 2 cores and 70% on 4 cores.

Study on MPI-based parallel sequence similarity search in the LINUX cluster (클러스터 환경에서의 MPI 기반 병렬 서열 유사성 검색에 관한 연구)

  • Hong, Chang-Bum;Cha, Jeoung-Ho;Lee, Sung-Hoon;Shin, Seung-Woo;Park, Keun-Joon;Park, Keun-Young
    • Journal of the Korea Society of Computer and Information / v.11 no.6 s.44 / pp.69-78 / 2006
  • In bioinformatics, searching biological databases for similar sequences plays an important role in predicting functional or structural information. The number of biological sequences has increased dramatically since the Human Genome Project. Because the speed of similar-sequence search is regarded as an important factor in predicting function or structure, SMP (Symmetric Multi-Processor) computers or clusters are used to reduce search time. To improve the search time of BLAST (Basic Local Alignment Search Tool), which is used for sequence similarity search, this paper proposes the nBLAST algorithm, which runs in a cluster environment. nBLAST uses the MPI (Message Passing Interface) parallel library, without modifying the existing BLAST source code, to distribute queries to each node and run them in parallel, so BLAST can be parallelized easily without complicated procedures such as configuration. In addition, experiments running nBLAST on a 28-node Linux cluster confirmed that performance improves as the number of nodes increases.


Implementation of Efficient Power Method on CUDA GPU (CUDA 기반 GPU에서 효율적인 Power Method의 구현)

  • Kim, Jung-Hwan;Kim, Jin-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.9-16 / 2011
  • GPU computing is emerging in high-performance application areas because it can exploit massive parallelism cost-effectively. The power method, which finds the eigenvector of a given matrix, is widely used in applications such as PageRank for calculating the importance of web pages. In this research we parallelized the power method efficiently on a GPU and suggested how its performance can be improved further. The power method mainly consists of a matrix-vector product, which is easily parallelized. However, it must also test the eigenvector for convergence and subsequently scale the vector. Such operations incur several GPU kernel calls and data movement between host and GPU memory. We improved the performance of the power method by reducing the number of GPU kernel calls, optimizing thread allocation, and enhancing the convergence test.
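The three steps named in this abstract (matrix-vector product, scaling, convergence test) can be illustrated with a plain CPU sketch. This is the textbook power method and not the authors' CUDA implementation. The function name, tolerance, and iteration limit are assumptions for the example.

```python
def power_method(matrix, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(matrix)
    x = [1.0] * n  # initial guess
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Step 1: matrix-vector product (the part that parallelizes easily).
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Step 2: scale by the entry of largest magnitude.
        scale = max(y, key=abs)
        if scale == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        y = [v / scale for v in y]
        # Step 3: convergence test on the change between iterates.
        converged = max(abs(y[i] - x[i]) for i in range(n)) < tol
        x, eigenvalue = y, scale
        if converged:
            break
    return eigenvalue, x


# Diagonal test matrix with eigenvalues 2 and 1; the dominant one is 2.
value, vector = power_method([[2.0, 0.0], [0.0, 1.0]])
```

On a GPU, steps 2 and 3 are reductions that would each need a kernel launch and a transfer back to the host if done naively, which is the overhead the abstract says the authors reduce.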

Development of an Unstructured Parallel Overset Mesh Technique for Unsteady Flow Simulations around bodies with Relative Motion (상대운동이 있는 물체주위의 비정상 유동해석을 위한 병렬화된 비정렬 중첩격자기법 개발)

  • Jung, Mun-Seung;Kwon, Oh-Joon
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.33 no.2 / pp.1-10 / 2005
  • An unstructured parallel overset mesh method has been developed for the simulation of unsteady flows around multiple bodies in relative motion. For this purpose, an efficient and robust search method is proposed for the unstructured grid system. A new data structure is also proposed to handle the variable amount of data on parallel sub-domain boundaries. The interpolation boundary is defined for data communication between grid systems. An interpolation method that retains second-order spatial accuracy and treats points inside neighboring solid bodies is also suggested. A single store separating from the Eglin/Pylon configuration is calculated and the result is compared with experimental data for validation. Simulation of unsteady flows around multiple bodies in relative motion is also performed.

Application of MPI Technique for Distributed Rainfall-Runoff Model (분포형 강우유출모형 병렬화 처리기법 적용)

  • Chung, Sung-Young;Park, Jin-Hyeog;Hur, Young-Teck;Jung, Kwan-Sue
    • Journal of Korea Water Resources Association / v.43 no.8 / pp.747-755 / 2010
  • Compared with the conceptual models used so far, distributed models are at a relative disadvantage because of the computer memory and calculation time required to compute water flow numerically based on kinematic wave theory. Distributed models have therefore been applied mainly to small basins. Applying them to large-scale watersheds required decreasing the grid resolution, because calculation at a higher resolution would take too much time. This has been one of the main obstacles to using the model in practice. In this paper, the MPI (Message Passing Interface) technique was applied to address the calculation time, one of the drawbacks of the distributed model when performing physically based, complicated numerical calculations for large-scale watersheds. Comparison studies were performed on a single domain and on divided sub-domains of the Yongdam Dam watershed for typhoon 'Ewiniar' in 2006, in order to analyze the effect of the parallelization technique. As a result, the parallelized code reduced calculation time by up to a factor of 10 relative to a single processor while maintaining the quality of the discharge results.

High Speed and Robust Processor based on Parallelized Error Correcting Code Module (병렬화된 에러 보정 코드 모듈 기반 프로세서 속도 및 신뢰도 향상)

  • Kang, Myeong-jin;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.9 / pp.1180-1186 / 2020
  • The Tiny Processing Unit (TPU), a type of embedded system, usually operates in harsh environments involving external shock or insufficient power. In these cases, data can be corrupted and cause critical problems. To counter data corruption, many embedded systems use Error Correcting Code (ECC) to protect and restore data. However, ECC processing in a TPU increases the overall processing time by lengthening instruction fetch, which is the bottleneck. In this paper, we propose a parallelized ECC block architecture to reduce the TPU bottleneck. The proposed architecture reduces processing time by 10% compared to the original model, although memory usage increases slightly. The test uses a matrix product that exercises various instructions. The TPU with the proposed parallelized ECC block is 7% faster than the original TPU with ECC and performed the test accurately.

A Study on Parallel Performance Optimization Method for Acceleration of High Resolution SAR Image Processing (고해상도 SAR 영상처리 고속화를 위한 병렬 성능 최적화 기법 연구)

  • Lee, Kyu Beom;Kim, Gyu Bin;An, Sol Bo Reum;Cho, Jin Yeon;Lim, Byoung-Gyun;Kim, Dong-Hyun;Kim, Jeong Ho
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.46 no.6 / pp.503-512 / 2018
  • SAR (Synthetic Aperture Radar) is a technology that acquires images by processing signals obtained from radar, and demand for high-resolution SAR images is increasing. This paper studies SAR image processing algorithms that achieve optimal performance on multi-core computer architectures, for high-speed processing of high-resolution SAR image data. The performance deterioration caused by the large amount of input/output data for high-resolution images is reduced by maximizing memory utilization, and the parallelization ratio of the code is increased by using the dynamic scheduling and nested parallelism of OpenMP. As a result, not only is the total computation time reduced, but the upper bound of parallel performance is also raised, and the actual parallel performance on a 10-core system is improved by more than 8 times. The results of this study are expected to be useful in developing high-resolution SAR image processing software for multi-core systems with large memory.