• Title/Summary/Keyword: parallel computer processing

Search Result 648, Processing Time 0.028 seconds

Parallel Processing of Multi-Core Processor and GPUs in Projection Step for Efficient Fluid Simulation (효율적인 유체 시뮬레이션을 위한 투영 단계에서의 멀티 코어 프로세서와 그래픽 프로세서의 병렬처리)

  • Kim, Sun-Tae;Jung, Hwi-Ryong;Hong, Jeong-Mo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.6
    • /
    • pp.48-54
    • /
    • 2013
  • In these days, the state-of-art technologies employ the heterogeneous parallelization of CPU and GPU for fluid simulations in the field of computer graphics. In this paper, we present a novel CPU-GPU parallel algorithm that solves projection step of fluid simulation more efficiently than existing sequential CPU-GPU processing. Fluid simulation that requires high computational resources can be carried out efficiently by the proposed method.

Parallel Processing based Image Identifier Generation (병렬처리 기반 정지영상 인식자 생성)

  • Ko, Mieun;Park, Je-Ho;Park, Young B.;Seo, Wontaek
    • Journal of the Semiconductor & Display Technology
    • /
    • v.16 no.1
    • /
    • pp.6-10
    • /
    • 2017
  • Recent enhancement in the still image acquisition devices has been widely perpetrated into the daily life of the common people. Due to this trend, the voluminous still images, that are produced and shared in the personal or the massive storage, need to controlled with effective and efficient management. The human-devised or system-generated still image identifiers used for the identification of the images are at risk in the situation of unexpected changing or eliminating of the identifiers. In this paper, we propose a parallel processing based method for still image identifier generation by utilizing the still image internal features.

  • PDF

EPR : Enhanced Parallel R-tree Indexing Method for Geographic Information System (EPR : 지리 정보 시스템을 위한 향상된 병렬 R-tree 색인 기법)

  • Lee, Chun-Geun;Kim, Jeong-Won;Kim, Yeong-Ju;Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.9
    • /
    • pp.2294-2304
    • /
    • 1999
  • Our research purpose in this paper is to improve the performance of query processing in GIS(Geographic Information System) by enhancing the I/O performance exploiting parallel I/O and efficient disk access. By packing adjacent spatial data, which are very likely to be referenced concurrently, into one block or continuous disk blocks, the number of disk accesses and the disk access overhead for query processing can be decreased, and this eventually leads to the I/O time decrease. So, in this paper, we proposes EPR(Enhanced Parallel R-tree) indexing method which integrates the parallel I/O method of the previous Parallel R-tree method and a packing-based clustering method. The major characteristics of EPR method are as follows. First, EPR method arranges spatial data in the increasing order of proximity by using Hilbert space filling curve, and builds a packed R-tree by bottom-up manner. Second, with packing-based clustering in which arranged spatial data are clustered into continuous disk blocks, EPR method generates spatial data clusters. Third, EPR method distributes EPR index nodes and spatial data clusters on multiple disks through round-robin striping. Experimental results show that EPR method achieves up to 30% or more gains over PR method in query processing speed. In particular, the larger the size of disk blocks is and the smaller the size of spatial data objects is, the better the performance of query processing by EPR method is.

  • PDF

Accelerating the Sweep3D for a Graphic Processor Unit

  • Gong, Chunye;Liu, Jie;Chen, Haitao;Xie, Jing;Gong, Zhenghu
    • Journal of Information Processing Systems
    • /
    • v.7 no.1
    • /
    • pp.63-74
    • /
    • 2011
  • As a powerful and flexible processor, the Graphic Processing Unit (GPU) can offer a great faculty in solving many high-performance computing applications. Sweep3D, which simulates a single group time-independent discrete ordinates (Sn) neutron transport deterministically on 3D Cartesian geometry space, represents the key part of a real ASCI application. The wavefront process for parallel computation in Sweep3D limits the concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D, which can be efficiently implemented on the finegrained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved up to 4.38 times as compared to the CPU-based implementation.

A Implementation of Loop Interchange Parallel Compiler (루프인터체인지 병렬컴파일러 구현)

  • Song, Worl-Bong
    • Journal of the Korea Computer Industry Society
    • /
    • v.8 no.3
    • /
    • pp.167-172
    • /
    • 2007
  • Generally, In a application program the core part for parallel processing is a loop. therefore in this paper, loop interchange parallel compiler is proposed. this is a procedure for the automatic conversion of a loop interchange. According to execution to the outside CDOALL statements of cedar fortran, loop interchange is more effectively method the extracting parallelism in order to parallel processing in iterations. This method will be expected to effectively execution result with mixed into linear conversion and go far toward solving the effectively implementation of the non-unimodular nested loop.

  • PDF

Parallelization of Cell Contour Line Extraction Algorithm (세포 외곽선 추출 알고리즘의 병렬화)

  • Lee, Ho Seok;Yu, Suk Hyun;Kwon, Hee Yong
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.10
    • /
    • pp.1180-1188
    • /
    • 2015
  • In this paper, a parallel cell contour line extraction algorithm using CUDA, which has no inner contour lines, is proposed. The contour of a cell is very important in a cell image analysis. It could be obtained by a conventional serial contour tracing algorithm or parallel morphology operation. However, the cell image has various damages in acquisition or dyeing process. They could be turn into several inner contours, which make a cell image analysis difficult. The proposed algorithm introduces a min-max coordinates table into each CUDA thread block, and removes the inner contour in parallel. It is 4.1 to 7.6 times faster than a conventional serial contour tracing algorithm.

An Optimal Parallel Sort Algorithm for Minimum Data Movement (최소 자료 이동을 위한 최적 병렬 정렬 알고리즘)

  • Hong, Seong-Su;Sim, Jae-Hong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.1 no.3
    • /
    • pp.290-298
    • /
    • 1994
  • In this paper we propose parallel sorting algorithm, taking 0( $n^{n}$ log n) time complexity, 0( $n^{x}$ log n) cost (parallel running time * number of processors) and 0( $n^{1-}$x+ $n^{x}$ )data movement complexity under the ERWW- PRAM model. The methods for solving these problems similar. Parallel algorithm finds pivot for partitioning the data into ordered subsets of approximately equal size by using encording pointers..

  • PDF

An Optimized Iterative Semantic Compression Algorithm And Parallel Processing for Large Scale Data

  • Jin, Ran;Chen, Gang;Tung, Anthony K.H.;Shou, Lidan;Ooi, Beng Chin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.6
    • /
    • pp.2761-2781
    • /
    • 2018
  • With the continuous growth of data size and the use of compression technology, data reduction has great research value and practical significance. Aiming at the shortcomings of the existing semantic compression algorithm, this paper is based on the analysis of ItCompress algorithm, and designs a method of bidirectional order selection based on interval partitioning, which named An Optimized Iterative Semantic Compression Algorithm (Optimized ItCompress Algorithm). In order to further improve the speed of the algorithm, we propose a parallel optimization iterative semantic compression algorithm using GPU (POICAG) and an optimized iterative semantic compression algorithm using Spark (DOICAS). A lot of valid experiments are carried out on four kinds of datasets, which fully verified the efficiency of the proposed algorithm.

High Throughput Parallel KMP Algorithm Considering CPU-GPU Memory Hierarchy (CPU-GPU 메모리 계층을 고려한 고처리율 병렬 KMP 알고리즘)

  • Park, Soeun;Kim, Daehee;Lee, Myungho;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.5
    • /
    • pp.656-662
    • /
    • 2018
  • Pattern matching algorithm is widely used in many application fields such as bio-informatics, intrusion detection, etc. Among many string matching algorithms, KMP (Knuth-Morris-Pratt) algorithm is commonly used because of its fast execution time when using large texts. However, the processing speed of KMP algorithm is also limited when the text size increases significantly. In this paper, we propose a high throughput parallel KMP algorithm considering CPU-GPU memory hierarchy based on OpenCL in GPGPU (General Purpose computing on Graphic Processing Unit). We focus on the optimization for the allocation of work-times and work-groups, the local memory copy of the pattern data and the failure table, and the overlapping of the data transfer with the string matching operations. The experimental results show that the execution time of the optimized parallel KMP algorithm is about 3.6 times faster than that of the non-optimized parallel KMP algorithm.

A Fast Transmission of Mobile Agents Using Binomial Trees (바이노미얼 트리를 이용한 이동 에이전트의 빠른 전송)

  • Cho, Soo-Hyun;Kim, Young-Hak
    • The KIPS Transactions:PartA
    • /
    • v.9A no.3
    • /
    • pp.341-350
    • /
    • 2002
  • As network environments have been improved and the use of internet has been increased, mobile agent technologies are widely used in the fields of information retrieval, network management, electronic commerce, and parallel/distributed processing. Recently, a lot of researchers have studied the concepts of parallel/distributed processing based on mobile agents. SPMD is the parallel processing method which transmits a program to all the computers participated in parallel environment, and performs a work with different data. Therefore, to transmit fast a program to all the computers is one of important factors to reduce total execution time. In this paper, we consider the parallel environment consisting of mobile agents system, and propose a new method which transmits fast a mobile agent code to all the computers using binomial trees in order to efficiently perform the SPMD parallel processing. The proposed method is compared with another ones through experimental evaluation on the IBM's Aglets, and gets greatly better performance. Also this paper deals with fault tolerances which can be occurred in transmitting a mobile agent using binomial trees.