• Title/Summary/Keyword: GPU optimization

Search Result 109, Processing Time 0.04 seconds

PDF 1.4-1.6 Password Cracking Optimal Implementation on CUDA GPU (CUDA GPU 상의 PDF 1.4-1.6 해독 최적 구현)

  • Kim, Hyun-Jun;Eum, Si-Uoo;Seo, Hwa-Jeong
    • Annual Conference of KIPS
    • /
    • 2022.05a
    • /
    • pp.187-190
    • /
    • 2022
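  • PDF (Portable Document Format) is a file format developed by Adobe in 1992 and standardized as ISO 32000, and it is used worldwide. Widely used file types such as PDF can become targets of password cracking. In this paper, we present an optimized implementation of PDF 1.4-1.6 password cracking on CUDA GPUs. We optimized the MD5 and RC4 algorithms used in password cracking and exploited features of the CUDA GPU, achieving a 22.5% performance improvement over the cracking tool hashcat on an RTX 3060.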

Efficient Representation of Pore Flow, Absorption, Emission and Diffusion using GPU-Accelerated Cloth-Liquid Interaction

  • Jong-Hyun Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.6
    • /
    • pp.23-29
    • /
    • 2024
  • In this paper, we propose a fast GPU-based method for representing the pore flow, absorption, emission, and diffusion effects that arise in cloth-liquid interactions, using smoothed particle hydrodynamics (SPH), a particle-based fluid solver. The method comprises: 1) a unified GPU-based framework for representing the various physical effects of cloth-liquid interaction; 2) a method for efficiently calculating the saturation of a node based on SPH and transferring it to the surrounding porous particles; 3) a method for improving stability based on Darcy's law to reliably calculate the direction of fluid absorption and release; 4) a method for controlling the amount of fluid absorbed by the porous particles according to the direction of flow; and 5) a method for releasing the SPH particles without exceeding their maximum mass. The main advantage of the proposed method is that all computations run on the GPU, allowing porous materials, porous flows, absorption, reflection, diffusion, and other effects of cloth-fluid interaction to be modeled quickly.

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

  • Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.4
    • /
    • pp.82-87
    • /
    • 2010
  • Modern GPUs (Graphics Processing Units) provide higher computing capability than general-purpose CPUs (Central Processing Units). With support for a programmable graphics pipeline, GP-GPU (General Purpose computation on GPU) has gained much attention and expanded its application area. This paper compares sequential and massively parallel implementations of the FDTD (Finite Difference Time Domain) algorithm using CUDA (Compute Unified Device Architecture). Experimental results show up to a 45x speedup over conventional CPU execution.
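
The abstract above does not give the update equations, so the following is only a minimal sketch, assuming a 1D Yee-style FDTD scheme in normalized units. The function name `fdtd_1d`, the grid size, the Courant number, and the Gaussian source are illustrative assumptions, not taken from the paper. It is written in plain Python to show the data dependencies: within one time step, every cell update reads only values from the previous half step, which is why each cell could in principle be mapped to one GPU thread.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, courant=0.5):
    """Sequential 1D FDTD sketch (normalized units); returns the electric field."""
    ez = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples (staggered half a cell)
    src = n_cells // 2    # hypothetical source position

    for t in range(n_steps):
        # Magnetic-field update: each hy[i] depends only on the current ez,
        # so all iterations of this loop are independent of each other.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])

        # Electric-field update: each ez[i] depends only on the updated hy.
        # ez[0] is never updated, so it acts as a fixed (reflecting) boundary.
        for i in range(1, n_cells):
            ez[i] += courant * (hy[i] - hy[i - 1])

        # Soft Gaussian source added at the centre of the grid.
        ez[src] += math.exp(-((t - 30) / 10.0) ** 2)

    return ez
```

In a CUDA version, each of the two inner loops would typically become its own kernel launch, with the launch boundary providing the synchronization between the magnetic and electric updates.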

The Performance Analysis of GPU-based Cloth simulation according to the Change of Work Group Configuration (워크 그룹 구성 변화에 따른 GPU 기반 천 시뮬레이션의 성능 분석)

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services
    • /
    • v.18 no.3
    • /
    • pp.29-36
    • /
    • 2017
  • 3D dynamic simulation is now closely related to many industries. In the past, physically based 3D simulation was used mainly in car crash or construction-related fields, but today it also plays an important role in movies and games. Representing a 3D object realistically requires many mathematical computations, and it is difficult to process such a large amount of computation on a CPU in real time. With recent advances in graphics hardware and architecture, GPUs can be used for general-purpose computation as well as graphics, and GPU-based approaches have been applied in various research fields. In this paper, we analyze how the performance of two GPU-based cloth simulation algorithms varies with the execution properties of GPU shaders, in order to optimize the performance of GPU-based cloth simulation. The cloth simulation is implemented with a spring-centric algorithm and a node-centric algorithm, parallelized on the GPU using GLSL 4.3 compute shaders. We compare the performance of these algorithms as the size and dimension of the work group change. Each test is repeated 10 times over 5,000 frames, and the results are reported as the average FPS. The experimental results show that the node-centric algorithm runs faster than the spring-centric algorithm.
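
The abstract names the two algorithms but does not define them. The sketch below assumes the usual reading: a spring-centric pass iterates over springs and scatters each force to both endpoints, while a node-centric pass iterates over nodes and gathers forces from incident springs. The 1D mass-spring chain, the helper `hooke`, and all parameter values are hypothetical and chosen only to show that the two formulations produce the same forces.

```python
def hooke(x_a, x_b, rest, k):
    """1D Hooke's-law force on node a from the spring (a, b).

    Assumes the two nodes are at distinct positions.
    """
    d = x_b - x_a
    direction = 1.0 if d >= 0 else -1.0
    return k * (abs(d) - rest) * direction


def spring_centric(x, springs, rest, k):
    """One unit of work per spring: compute the force, scatter to both ends."""
    forces = [0.0] * len(x)
    for a, b in springs:
        f = hooke(x[a], x[b], rest, k)
        # On a GPU, two springs sharing a node would write to the same
        # entry here, so this scatter would need atomics or a second pass.
        forces[a] += f
        forces[b] -= f
    return forces


def node_centric(x, springs, rest, k):
    """One unit of work per node: gather forces from all incident springs."""
    neighbors = [[] for _ in x]
    for a, b in springs:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Each node writes only its own entry, so no write conflicts occur.
    return [sum(hooke(x[i], x[j], rest, k) for j in neighbors[i])
            for i in range(len(x))]
```

Avoiding conflicting writes is one common reason a gather formulation can run faster on a GPU; the abstract does not say whether that is the cause of the reported result.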

A Study on the Performance Improvement of Software Digital Filter using GPU (GPU를 이용한 소프트웨어 디지털 필터의 성능개선에 관한 연구)

  • Yeom, Jae-Hwan;Oh, Se-Jin;Roh, Duk-Gyoo;Jung, Dong-Kyu;Hwang, Ju-Yeon;Oh, Chungsik;Kim, Hyo-Ryoung
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.19 no.4
    • /
    • pp.153-161
    • /
    • 2018
  • This paper describes the performance improvement of a software (SW) digital filter using a GPU (Graphics Processing Unit). The previously developed SW digital filter runs on a CPU (Central Processing Unit) and is therefore slow. The GPU was introduced to filter EAVN (East Asian VLBI Network) observation data, both to improve processing speed and to allow the data to be processed together with data from other stations. To increase the computational speed of the SW digital filter, an NVIDIA Titan V GPU board with built-in Tensor Cores is used. Filtering 95 seconds of 2 Gbps (512 MHz BW, 1-IF) observation data achieved processing speeds of about 0.78 times (1 Gbps, 16 MHz BW, 16-IF) and 1.1 times (2 Gbps, 32 MHz BW, 16-IF) the observing time, respectively. In addition, for data observed simultaneously at 1 and 2 Gbps with the KVN (Korean VLBI Network), the 2 Gbps data was digitally filtered and compared with the 1 Gbps data, yielding similar values for the cross power spectrum, phase, and SNR (Signal to Noise Ratio). These results confirm the effectiveness of the GPU-based SW digital filter developed in this research for data processing and analysis. In the future, it is expected that observation data can be filtered in real time once the source code is optimized for distributed processing across multiple GPU boards.

An efficient acceleration algorithm of GPU ray tracing using CUDA (CUDA를 이용한 효과적인 GPU 광선추적 가속 알고리즘)

  • Ji, Joong-Hyun;Yun, Dong-Ho;Ko, Kwang-Hee
    • Proceedings of the HCI Society of Korea Conference
    • /
    • 2009.02a
    • /
    • pp.469-474
    • /
    • 2009
  • This paper proposes a real-time ray tracing system using an optimized kd-tree traversal and a ray/triangle intersection algorithm. Previous kd-tree traversal algorithms search for upper nodes in a bottom-up manner. As a result, after failing to find an intersected primitive in a leaf node, they must either revisit already visited parent nodes or use redundant memory, which makes ray tracing of relatively complex scenes more difficult. The new algorithm uses stacks implemented in the GPU's local memory under the CUDA framework, which eliminates these problems. After traversing a node, we perform the latest CPU-based ray/triangle intersection algorithm, the Plücker coordinate test, which is further accelerated through massive parallelism with CUDA. The Plücker test can drastically reduce computational cost because it does not use barycentric coordinates, only a simple test based on the relations between a ray and the triangle edges. The entire system consists of a single ray kernel and is implemented without complicated synchronization or ray packets. Our experiments show that the new algorithm is roughly twice as fast as the previous one.

PDF Version 1.4-1.6 Password Cracking in CUDA GPU Environment (PDF 버전 1.4-1.6의 CUDA GPU 환경에서 암호 해독 최적 구현)

  • Hyun Jun, Kim;Si Woo, Eum;Hwa Jeong, Seo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.12 no.2
    • /
    • pp.69-76
    • /
    • 2023
  • Hundreds of thousands of passwords are lost or forgotten every year, making the necessary information unavailable to legitimate owners or authorized law enforcement personnel. Recovering such a password requires a password cracking tool. Using GPUs instead of CPUs for password cracking makes it possible to process the large amount of computation required during recovery quickly. This paper presents a CUDA-based GPU optimization focused on decryption for PDF versions 1.4-1.6, currently the most widely used. The techniques used include eliminating unnecessary operations in the MD5 algorithm, implementing 32-bit word integration in the RC4 algorithm, and using shared memory. In addition, autotuning was used to search for the number of blocks and threads, which affect performance. As a result, we achieved throughputs of 31,460 kp/s (kilo passwords per second) and 66,351 kp/s on the RTX 3060 and RTX 3090, respectively, with 65,536 blocks and 96 threads, improving throughput by 22.5% and 15.2%, respectively, over hashcat, the cracking tool with the highest throughput.
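
The abstract mentions "32-bit word integration" for RC4 without further detail. One plausible reading, assumed here and not confirmed by the source, is that the 256-byte RC4 state is packed four bytes per 32-bit word so that it occupies 64 words instead of 256 separate bytes. The sketch below implements standard RC4 on such a packed state; the helper names `_get`, `_set`, and `rc4_packed` are hypothetical. Python is used only for illustration, so the sketch shows the indexing scheme rather than any speedup.

```python
def _get(words, idx):
    """Read state byte idx from word idx // 4 at bit offset 8 * (idx % 4)."""
    return (words[idx >> 2] >> ((idx & 3) * 8)) & 0xFF


def _set(words, idx, value):
    """Write state byte idx, leaving the other three bytes of the word intact."""
    shift = (idx & 3) * 8
    cleared = words[idx >> 2] & ~(0xFF << shift) & 0xFFFFFFFF
    words[idx >> 2] = cleared | (value << shift)


def rc4_packed(key: bytes, data: bytes) -> bytes:
    """Standard RC4 with the 256-byte state stored in 64 32-bit words."""
    words = [0] * 64
    for idx in range(256):
        _set(words, idx, idx)

    # Key-scheduling algorithm (KSA).
    j = 0
    for i in range(256):
        si = _get(words, i)
        j = (j + si + key[i % len(key)]) & 0xFF
        sj = _get(words, j)
        _set(words, i, sj)
        _set(words, j, si)

    # Pseudo-random generation algorithm (PRGA), XORed with the input.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        si = _get(words, i)
        j = (j + si) & 0xFF
        sj = _get(words, j)
        _set(words, i, sj)
        _set(words, j, si)
        out.append(byte ^ _get(words, (si + sj) & 0xFF))
    return bytes(out)
```

Because RC4 is symmetric, calling `rc4_packed` again with the same key on the ciphertext returns the original plaintext.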

Development of GPU Based High-speed Contents Quality Check System (GPU 기반 콘텐츠 품질검사 실시간 고속화 시스템 개발)

  • Lee, Moonsik;Choi, Sungwoo;An, Kiok;Kim, Mingi;Jung, Byunghee
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2014.06a
    • /
    • pp.55-58
    • /
    • 2014
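  • Broadcast production environments have been transitioning to IT-based systems in order to deliver high-quality content quickly and efficiently, and this transition is nearing completion; most broadcast content is now produced and archived as files. With the shift from tape-based to file-based content, a new approach was needed to replace traditional quality control performed at the signal level, and content quality check systems optimized for file-based content have been developed for this purpose. Because of the complexity of error detection algorithms based on image processing, these systems could not support real-time checking, which made them difficult to apply to HD real-time systems, and large-scale archive systems continue to demand shorter quality check times. This paper therefore describes the implementation of a real-time, high-speed quality check system that uses GPU-based parallel processing, which has recently come to prominence, to rapidly detect block errors and various other A/V errors that occur in broadcast environments.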

Geographic information 3D Synthetic Model based on Regular Mesh (Regular Mesh 기반 지리정보 3D 합성모델)

  • Jung, Ji-Hwan;Hwang, Sun-Myung;Kim, Sung-Ho
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.4
    • /
    • pp.616-625
    • /
    • 2011
  • There are two representative geometry rendering methods: Geometry Clipmaps and ROAM 2.0. We propose an extended Geometry Clipmaps algorithm that relies on the GPU rather than the CPU to achieve faster rendering and a wider visible area. The extended algorithm presents a per-level mesh configuration method based on LOD, a method for configuring the mesh network between levels, a mesh block method for rendering optimization using VFC, and an image mapping method that supports resolutions as high as 1 m.

Fast 3D Spatial Information Acquisition Technology Based on Mobile GPUs (모바일 GPU 기반의 고속 3차원 공간 정보 취득 기술)

  • Jeong, Tae-Hyeon;Park, Jun-Hyeong;Park, In-Gyu
    • Broadcasting and Media Magazine
    • /
    • v.26 no.4
    • /
    • pp.48-60
    • /
    • 2021
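  • 3D spatial information acquisition techniques, which require complex algorithms, mostly need high-performance hardware. However, as the performance of mobile platforms such as smartphones has advanced rapidly, research on accelerating existing algorithms and porting them to run on-device has been increasing. Following this trend, this article introduces methods for accelerating 3D spatial information acquisition using OpenCL, a platform-independent GPU parallel processing framework. The article is organized as follows. First, we review OpenCL optimization methods for mobile GPU environments. We then introduce a method for accelerating a classical geometry-based stereo matching algorithm. Finally, we introduce an on-device-friendly hybrid algorithm that combines a deep neural network with the accelerated classical stereo algorithm.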