• Title/Summary/Keyword: GPU Memory


Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il; Kim, Yeon-Il; Lee, Sang-Gil; Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.9 / pp.21-28 / 2015
  • In the semiconductor process, simulation is performed to detect defects by analyzing the behavior of impurities through physical-quantity calculations of the inner elements. The Finite-Difference Time-Domain (FDTD) algorithm is used to perform this simulation. As semiconductor devices advance toward nanoscale elements, the size of the simulation keeps growing, so a single processor such as a CPU or GPU cannot perform the simulation because the matrices are too large, and even a machine with multiple processors may be unable to handle a massive FDTD problem. Parallel and distributed computing has been studied to address these problems, but past work used only a single type of processor: GPUs are fast but have limited memory, while CPUs are slower than GPUs. To solve this problem, we implemented a computing model that can handle FDTD simulations of any size on a cluster composed of heterogeneous processors. We tested the simulation using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed performance by measuring the total execution time and per-stage times for each test.
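
The abstract does not include code; as a rough illustration of the point-to-point communication pattern such a distributed FDTD model relies on, the sketch below shows a 1D-decomposed update with halo exchange via MPI_Sendrecv. Field names, sizes, and coefficients are assumptions, not the authors' implementation.

```cpp
// Minimal sketch (assumed, not the paper's code): 1D-decomposed FDTD time
// stepping with point-to-point halo exchange between neighboring MPI ranks.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nLocal = 1024;                  // cells owned by this rank (assumed)
    std::vector<double> ez(nLocal + 2, 0.0);  // +2 ghost cells
    std::vector<double> hy(nLocal + 2, 0.0);
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int step = 0; step < 100; ++step) {
        // Send my first owned E value to the left neighbor; receive the right
        // neighbor's first value into my right ghost cell.
        MPI_Sendrecv(&ez[1],          1, MPI_DOUBLE, left,  0,
                     &ez[nLocal + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 1; i <= nLocal; ++i)     // H update needs ez[i+1]
            hy[i] += 0.5 * (ez[i + 1] - ez[i]);

        // Send my last owned H value to the right neighbor; receive the left
        // neighbor's last value into my left ghost cell.
        MPI_Sendrecv(&hy[nLocal], 1, MPI_DOUBLE, right, 1,
                     &hy[0],      1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 1; i <= nLocal; ++i)     // E update needs hy[i-1]
            ez[i] += 0.5 * (hy[i] - hy[i - 1]);
    }
    MPI_Finalize();
    return 0;
}
```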

Real-Time GPU Technique for Extracting Mesh Isosurfaces from BCC Volume Datasets (BCC 볼륨 데이터로부터 실시간으로 메시 형태의 등가면을 추출하는 GPU 기법)

  • Kim, Hyunjun; Kim, Minho
    • Journal of the Korea Computer Graphics Society / v.26 no.4 / pp.17-26 / 2020
  • We present a real-time GPU (Graphics Processing Unit) marching tetrahedra technique that extracts isosurfaces in the indexed mesh format from BCC (Body-Centered Cubic) volume datasets. Compared to classical marching tetrahedra, our method shows better performance with little memory overhead. Our technique is composed of five stages. In the first stage, which needs to be done only once, we build min/max blocks that are used for empty-space skipping to boost performance. Next, we extract the active blocks that contain the current isovalue. In the next two stages, we extract the edges and cells that contain the isosurface, and the final triangular mesh is generated in the last stage. When applied to volume datasets of resolution 512³ or higher, our technique shows up to 5 times speedup compared to the classical marching tetrahedra algorithm.
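
As a rough sketch of the empty-space-skipping part of such a pipeline (the first two stages only), the CUDA kernels below compute per-block min/max values and mark blocks whose value range contains the isovalue as active. The block size, linear volume layout, and names are assumptions; the paper's BCC-specific indexing and the remaining mesh-extraction stages are not reproduced.

```cuda
// Sketch only (assumed layout, not the paper's BCC indexing): stage 1 builds
// per-block min/max values once; stage 2 marks "active" blocks whose value
// range contains the current isovalue, enabling empty-space skipping.
#include <cuda_runtime.h>
#include <cfloat>

__global__ void buildMinMaxBlocks(const float* vol, int nx, int ny, int nz,
                                  int bsize, int bx, int by, int bz,
                                  float* bmin, float* bmax) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per block
    if (b >= bx * by * bz) return;
    int x0 = (b % bx) * bsize;
    int y0 = ((b / bx) % by) * bsize;
    int z0 = (b / (bx * by)) * bsize;
    float lo = FLT_MAX, hi = -FLT_MAX;
    for (int z = z0; z < min(z0 + bsize, nz); ++z)
        for (int y = y0; y < min(y0 + bsize, ny); ++y)
            for (int x = x0; x < min(x0 + bsize, nx); ++x) {
                float v = vol[((size_t)z * ny + y) * nx + x];
                lo = fminf(lo, v);
                hi = fmaxf(hi, v);
            }
    bmin[b] = lo;
    bmax[b] = hi;
}

__global__ void markActiveBlocks(const float* bmin, const float* bmax,
                                 int nBlocks, float iso, int* active) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < nBlocks)
        active[b] = (bmin[b] <= iso && iso <= bmax[b]) ? 1 : 0;
}
```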

An Efficient Real-time Rendering Method for Compressed Terrain Dataset with Wavelet Transform (웨이블릿 변환으로 압축된 지형 데이터의 효율적인 실시간 렌더링 기법)

  • Kim, Tae-Gwon; Lee, Eun-Seok; Shin, Byeong-Seok
    • Journal of Korea Game Society / v.14 no.4 / pp.45-52 / 2014
  • We cannot load the entire dataset of a high-resolution terrain model into GPU memory since it is too large, so out-of-core approaches are commonly used to solve the problem. However, due to the limited bandwidth of secondary storage, it is difficult to render the terrain in real time. A method has been suggested that compresses the DEM data with a wavelet transform on the GPU and renders the decoded data, but it is inefficient since it has to sample values from textures, convert them to vertices, and generate a mesh periodically. We propose a method that stores the approximation coefficients of the wavelet compression as vertex attributes and renders the terrain by decoding the data in the geometry shader. It reduces the amount of terrain texture transferred, since the approximation coefficients are given as vertex attributes, and it generates meshes without additional uploads of terrain textures.
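
A minimal host-side sketch of the key idea, storing wavelet approximation coefficients as a vertex attribute so they can be decoded on the GPU, is shown below. The attribute layout (four coefficients per vertex), the use of GLEW, and the function name are assumptions; the geometry-shader decoding itself is omitted.

```cpp
// Sketch only: upload per-vertex wavelet approximation coefficients as a
// vertex attribute so the geometry shader can decode terrain heights on the
// GPU instead of sampling a terrain texture. Layout is assumed.
#include <GL/glew.h>
#include <vector>

GLuint uploadCoefficients(const std::vector<float>& coeffs /* 4 per vertex */) {
    GLuint vao, vbo;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, coeffs.size() * sizeof(float),
                 coeffs.data(), GL_STATIC_DRAW);
    // Attribute 0: vec4 of approximation coefficients, decoded in the shader.
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 4 * sizeof(float), nullptr);
    glBindVertexArray(0);
    return vao;
}
```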

PDF Version 1.4-1.6 Password Cracking in CUDA GPU Environment (PDF 버전 1.4-1.6의 CUDA GPU 환경에서 암호 해독 최적 구현)

  • Kim, Hyun Jun; Eum, Si Woo; Seo, Hwa Jeong
    • KIPS Transactions on Computer and Communication Systems / v.12 no.2 / pp.69-76 / 2023
  • Hundreds of thousands of passwords are lost or forgotten every year, making the necessary information unavailable to legitimate owners or authorized law enforcement personnel. In order to recover such a password, a password-cracking tool is required. Using GPUs instead of CPUs for password cracking can quickly process the large amount of computation required during the recovery process. This paper presents a GPU optimization using CUDA, focusing on decryption of the currently most widely used PDF versions 1.4-1.6. Techniques such as eliminating unnecessary operations of the MD5 algorithm, a 32-bit word-integrated implementation of the RC4 algorithm, and the use of shared memory were applied. In addition, autotuning was used to search for the numbers of blocks and threads that affect performance. As a result, we achieved throughputs of 31,460 kp/s (kilo passwords per second) and 66,351 kp/s with a block size of 65,536 and a thread size of 96 on RTX 3060 and RTX 3090 environments, respectively, improving throughput by 22.5% and 15.2% over hashcat, the cracking tool with the previously highest throughput.
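
The paper's kernels are not reproduced here; the sketch below only illustrates the autotuning step, timing a hypothetical `crackKernel` over a grid of block and thread counts with CUDA events. The candidate configurations and the kernel body are placeholders.

```cuda
// Sketch only: autotune the block/thread configuration for a hypothetical
// password-search kernel by timing each configuration with CUDA events.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void crackKernel(const char* candidates, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    // Placeholder for the real work: derive the RC4 key via MD5 from
    // candidates[i] and test it against the PDF encryption dictionary.
}

void autotune(const char* dCandidates, int count) {
    const int blockCounts[]  = {16384, 32768, 65536};   // assumed search space
    const int threadCounts[] = {64, 96, 128, 256};
    float best = 1e30f; int bestB = 0, bestT = 0;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    for (int b : blockCounts)
        for (int t : threadCounts) {
            cudaEventRecord(start);
            crackKernel<<<b, t>>>(dCandidates, count);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            if (ms < best) { best = ms; bestB = b; bestT = t; }
        }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    printf("best config: %d blocks x %d threads (%.3f ms)\n", bestB, bestT, best);
}
```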

A Real-Time Rendering Algorithm of Large-Scale Point Clouds or Polygon Meshes Using GLSL (대규모 점군 및 폴리곤 모델의 GLSL 기반 실시간 렌더링 알고리즘)

  • Park, Sangkun
    • Korean Journal of Computational Design and Engineering / v.19 no.3 / pp.294-304 / 2014
  • This paper presents a real-time rendering algorithm for large-scale geometric data using GLSL (OpenGL Shading Language). It details the VAO (vertex array object) and VBO (vertex buffer object) used for uploading large-scale point clouds and polygon meshes to graphics video memory, and describes the shader program, composed of a vertex shader and a fragment shader, which manipulates those large-scale data to be rendered by the GPU. In addition, we explain the overall rendering procedure that creates and runs the shader program with the VAO and VBO. Finally, rendering performance is measured with application examples, demonstrating that the proposed algorithm enables real-time rendering of amounts of geometric data that were almost impossible to handle with previous techniques.
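
A minimal sketch of the shader-program setup and draw call described above follows, with deliberately trivial placeholder GLSL sources (the paper's shaders do more with the large-scale data). The use of GLEW and the function names are assumptions.

```cpp
// Sketch only: compile a minimal vertex/fragment shader pair and draw a large
// point cloud from a previously filled VAO/VBO residing in video memory.
#include <GL/glew.h>

static const char* kVert =
    "#version 330 core\n"
    "layout(location = 0) in vec3 position;\n"
    "uniform mat4 mvp;\n"
    "void main() { gl_Position = mvp * vec4(position, 1.0); }\n";

static const char* kFrag =
    "#version 330 core\n"
    "out vec4 color;\n"
    "void main() { color = vec4(0.8, 0.8, 0.8, 1.0); }\n";

GLuint buildProgram() {
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &kVert, nullptr);
    glCompileShader(vs);
    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &kFrag, nullptr);
    glCompileShader(fs);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);
    return prog;
}

void drawPointCloud(GLuint prog, GLuint vao, GLsizei numPoints) {
    glUseProgram(prog);
    glBindVertexArray(vao);     // VAO/VBO already hold the uploaded points
    glDrawArrays(GL_POINTS, 0, numPoints);
}
```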

Hybrid Model Representation for Progressive Indoor Scene Reconstruction (실내공간의 점진적 복원을 위한 하이브리드 모델 표현)

  • Jung, Jinwoong; Jeon, Junho; Yoo, Daehoon; Lee, Seungyong
    • Journal of the Korea Computer Graphics Society / v.21 no.5 / pp.37-44 / 2015
  • This paper presents a novel 3D model representation, called the hybrid model representation, to overcome the limitations of existing volume-based indoor scene reconstruction. In indoor 3D scene reconstruction, a volume-based representation can reconstruct a detailed 3D model of a narrow scene, but it cannot reconstruct a large-scale indoor scene due to its memory consumption. This paper presents a memory-efficient plane-hash representation to enlarge the scalability of indoor scene reconstruction. The proposed method uses the plane-hash representation to reconstruct large, structural planar objects while at the same time using the volume-based representation to recover small detailed regions. The proposed method can be implemented on the GPU to accelerate the computation and reconstruct the indoor scene in real time.
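
The abstract does not specify the data structures; the sketch below is only an illustration of what such a hybrid container might look like, pairing a hash keyed by quantized plane parameters (for large planar structures) with a voxel-block hash (for small detailed regions). All keys, payloads, and quantization steps are assumptions, not the paper's design.

```cpp
// Sketch only: illustrative hybrid container for the idea described above.
#include <unordered_map>
#include <vector>
#include <cstdint>
#include <cmath>

struct PlaneKey {                 // quantized plane normal + offset
    int nx, ny, nz, d;
    bool operator==(const PlaneKey& o) const {
        return nx == o.nx && ny == o.ny && nz == o.nz && d == o.d;
    }
};
struct PlaneKeyHash {
    size_t operator()(const PlaneKey& k) const {
        return ((size_t)k.nx * 73856093u) ^ ((size_t)k.ny * 19349663u) ^
               ((size_t)k.nz * 83492791u) ^ ((size_t)k.d * 2654435761u);
    }
};

struct PlanePatch { std::vector<uint8_t> occupancy; int width = 0, height = 0; };
struct VoxelBlock { float tsdf[8 * 8 * 8]; float weight[8 * 8 * 8]; };

struct HybridModel {
    // Large structural planes: cheap 2D patches keyed by quantized plane params.
    std::unordered_map<PlaneKey, PlanePatch, PlaneKeyHash> planes;
    // Small detailed regions: dense TSDF blocks keyed by block coordinates.
    std::unordered_map<int64_t, VoxelBlock> volumeBlocks;
};

PlaneKey quantizePlane(float nx, float ny, float nz, float d) {
    const float qn = 0.05f, qd = 0.02f;   // quantization steps (assumed)
    return { (int)std::round(nx / qn), (int)std::round(ny / qn),
             (int)std::round(nz / qn), (int)std::round(d / qd) };
}
```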

Scalable Ontology Reasoning Using GPU Cluster Approach (GPU 클러스터 기반 대용량 온톨로지 추론)

  • Hong, JinYung; Jeon, MyungJoong; Park, YoungTack
    • Journal of KIISE / v.43 no.1 / pp.61-70 / 2016
  • In recent years, there has been a need for large-scale ontology inference techniques that can infer new knowledge from existing knowledge at high speed and support a diversity of semantic services. With the recent advances in distributed computing, ontology inference engines have mostly been developed on the Hadoop or Spark frameworks running on large clusters. Parallel programming using GPGPU, which provides many more cores than a CPU, has also been used for ontology inference. In this paper, by combining the advantages of both techniques, we propose a new method for reasoning over large RDFS ontology data using the Spark in-memory framework and inferring over the distributed data at high speed using GPGPU. With GPGPU, ontology reasoning over high-capacity data can be performed at low cost and with higher efficiency than conventional inference methods. In addition, we show that the Spark cluster can reduce the data workload on each node. To evaluate our approach, we used LUBM datasets ranging from 10 to 120. Our experimental results show that the proposed reasoning engine performs 7 times faster than a conventional approach that uses a Spark in-memory inference engine.
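
As an illustration of GPU-side RDFS inference, the CUDA kernel below applies a single rule (rdfs9: if x rdf:type C and C rdfs:subClassOf D, then x rdf:type D) over integer-encoded triples. The encoding, the flattened subclass table, and the output scheme are assumptions; the paper's Spark integration and full rule set are not shown.

```cuda
// Sketch only: one RDFS rule applied on the GPU over integer-encoded triples.
// `derived` is assumed to be pre-allocated large enough for all inferences.
#include <cuda_runtime.h>

struct Triple { int s, p, o; };

__global__ void rdfs9(const Triple* typeTriples, int nTypes,
                      const int2* subClassPairs, int nPairs,
                      Triple* derived, int* derivedCount, int rdfType) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nTypes) return;
    Triple t = typeTriples[i];               // (x, rdf:type, C)
    for (int j = 0; j < nPairs; ++j) {
        if (subClassPairs[j].x == t.o) {     // (C, rdfs:subClassOf, D)
            int k = atomicAdd(derivedCount, 1);
            derived[k] = { t.s, rdfType, subClassPairs[j].y };  // (x, rdf:type, D)
        }
    }
}
```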

Parallel String Matching and Optimization Using OpenCL on FPGA (FPGA 상에서 OpenCL을 이용한 병렬 문자열 매칭 구현과 최적화 방향)

  • Yoon, Jin Myung; Choi, Kang-Il; Kim, Hyun Jin
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.1 / pp.100-106 / 2017
  • In this paper, we propose a parallel optimization of the Aho-Corasick (AC) and Parallel Failureless Aho-Corasick (PFAC) algorithms using the Open Computing Language (OpenCL) on a Field-Programmable Gate Array (FPGA). The low throughput of a string-matching engine degrades the performance of network processing, so many researchers have recently studied string-matching engines based on parallel computing, and FPGA vendors now offer parallel computing platforms using OpenCL. We implement the AC and PFAC algorithms on a DE1-SoC board with a Cyclone V FPGA, applying optimizations that take the FPGA architecture into account. Experiments with the PFAC algorithm consider global-ID, local-ID, local-memory, and loop-unrolling optimizations. The performance improvement from loop unrolling is 129 times over the AC algorithm without loop unrolling, and 1.1, 0.2, and 1.5 times over the global-ID, local-ID, and local-memory optimizations, respectively.
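
The paper's kernels are written in OpenCL for an FPGA; purely to illustrate the PFAC idea (one work item per starting position, a trie with no failure transitions) together with loop unrolling, a CUDA-style sketch could look like the following. The transition-table layout is an assumption.

```cuda
// Sketch only: Parallel Failureless Aho-Corasick. Each thread starts matching
// at one input position and walks a trie with no failure transitions, stopping
// at the first missing edge. Assumed layout: 256 transition entries per state,
// negative value = no edge, match[state] = pattern id or -1.
__global__ void pfacKernel(const unsigned char* text, int textLen,
                           const int* transitions,   // [numStates][256]
                           const int* match,         // pattern id per state
                           int* results) {           // match id per position
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos >= textLen) return;
    int state = 0;                 // root
    int found = -1;
    #pragma unroll 4               // the loop-unrolling optimization studied above
    for (int i = pos; i < textLen; ++i) {
        state = transitions[state * 256 + text[i]];
        if (state < 0) break;      // failureless: stop at the first missing edge
        if (match[state] >= 0) { found = match[state]; break; }
    }
    results[pos] = found;
}
```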

Image Space Occlusion Shading Model for Iso-surface Volume Rendering (등위면 볼륨렌더링을 위한 이미지 공간 폐색 쉐이딩 모델)

  • Kim, Seokyeon; You, Sangbong; Jang, Yun
    • Journal of the Korea Computer Graphics Society / v.20 no.4 / pp.1-7 / 2014
  • Along with hardware development, volume rendering has become an important technique in many applications. Understanding and perception of volume visualization benefit from the visual cues provided by shading. Better visual cues can be obtained from global illumination models, but their huge computational cost and extra GPU memory requirements cause a lack of interactivity. In this paper, in order to improve the visual cues in volume rendering, we propose an image-space occlusion shading model that requires no additional resources.
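
The abstract does not give the shading formula; the kernel below is only a generic image-space occlusion sketch that samples neighboring depths and darkens pixels whose neighborhood lies closer to the camera, which may differ from the paper's actual model. The neighborhood radius and falloff are assumptions.

```cuda
// Sketch only: generic image-space occlusion over the depth buffer of a
// rendered isosurface; radius and weighting are assumed, not the paper's model.
__global__ void imageSpaceOcclusion(const float* depth, float* occlusion,
                                    int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    float d0 = depth[y * width + x];
    float occ = 0.0f;
    int   n   = 0;
    const int radius = 4;                     // assumed neighborhood size
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int px = x + dx, py = y + dy;
            if (px < 0 || py < 0 || px >= width || py >= height) continue;
            float dn = depth[py * width + px];
            // Neighbors closer to the camera than the center contribute occlusion.
            occ += fmaxf(0.0f, d0 - dn);
            ++n;
        }
    occlusion[y * width + x] = 1.0f - fminf(1.0f, occ / n);
}
```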

iSSD-Based Collaborative Processing for Big Data Mining (효율적인 빅 데이터 마이닝을 위한 iSSD 기반 협업 처리 방안)

  • Jo, Yong-Yoen; Kim, Sang-Wook; Bae, Duck-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.460-470 / 2017
  • We address how to handle big-data mining effectively using the intelligent SSD (iSSD). An iSSD is a storage device equipped with computing power inside the SSD, which reduces data-transfer cost by processing data near where it is stored. We first introduce the structural characteristics of the iSSD for efficient data processing. Then, we present how to run data mining algorithms on the iSSD. Finally, we discuss how to improve the performance of data mining algorithms significantly by exploiting a heterogeneous computing environment in which host CPUs and GPUs work together to maximize performance.