• Title/Abstract/Keyword: Parallel computation

Search results: 592

A Study on the Convergency Property of the Auxiliary Problem Principle

  • Kim, Balho-H.
    • Journal of Electrical Engineering and Technology / Vol.1 No.4 / pp.455-460 / 2006
  • This paper presents the convergence property of the Auxiliary Problem Principle (APP) when it is applied to large-scale Optimal Power Flow (OPF) problems with distributed or parallel computation features. The key features and factors affecting the convergence ratio and solution stability of the APP are also analyzed.
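
  As context for that analysis, the following is a schematic of one common APP iteration for region-decomposed OPF; it is a hedged sketch, not necessarily the paper's exact formulation. Each region r keeps a copy $x_r$ of the boundary variables it shares with a neighboring region s and, at iteration k, solves an auxiliary subproblem before the coupling multipliers are updated. Here $\beta$, $\gamma$ and $\alpha$ are the APP tuning parameters whose choice governs the convergence ratio and stability discussed above:

  $$x_r^{k+1} = \arg\min_{x_r \in X_r} \; f_r(x_r) + \tfrac{\beta}{2}\,\|x_r - x_r^{k}\|^2 + \gamma\, x_r^{\top}\!\left(x_r^{k} - x_s^{k}\right) + (\lambda^{k})^{\top} x_r, \qquad \lambda^{k+1} = \lambda^{k} + \alpha\left(x_r^{k+1} - x_s^{k+1}\right)$$

  Region s solves the symmetric subproblem with the sign of $\lambda^{k}$ reversed, since the coupling constraint is $x_r - x_s = 0$.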

Efficient Implementation of Optimal Extension Fields Using Parallel Computation

  • 이문규;박근수
    • 한국정보과학회:학술대회논문집 / 한국정보과학회 2003년도 봄 학술발표논문집 Vol.30 No.1 (A) / pp.269-271 / 2003
  • In this paper, we propose efficient Optimal Extension Field (OEF) arithmetic algorithms to improve the performance of elliptic curve cryptography. The proposed algorithms speed up multiplication, squaring, and inversion in an OEF by performing two subfield operations in parallel within a single execution of the integer multiplication instruction provided by the CPU (a packing sketch is given below).

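  To make the packing idea concrete, here is a minimal host-side sketch (not the paper's code) under assumed parameters: a subfield prime below 2^16, so every coefficient and every coefficient product fits in 32 bits. Packing two coefficients of one operand into a single 64-bit word then lets one integer multiplication return both subfield products. The paper's actual field parameters, packing layout, and target multiply instruction may differ, and the modular reductions that follow are omitted.

```cuda
#include <cstdint>
#include <cstdio>

// Multiply one coefficient 'a' by two coefficients 'b0', 'b1' of another OEF
// element using a single 64-bit integer multiplication. Assumes a, b0, b1 < 2^16,
// so a*b0 and a*b1 each fit in 32 bits and the partial products cannot overlap.
static void packed_subfield_mul(uint32_t a, uint32_t b0, uint32_t b1,
                                uint32_t* ab0, uint32_t* ab1)
{
    uint64_t packed = (uint64_t)b0 | ((uint64_t)b1 << 32);  // b0 and b1 in one word
    uint64_t prod   = (uint64_t)a * packed;                 // one multiplication
    *ab0 = (uint32_t)(prod & 0xFFFFFFFFu);                  // = a * b0
    *ab1 = (uint32_t)(prod >> 32);                          // = a * b1
}

int main()
{
    // Example coefficients modulo a hypothetical 16-bit subfield prime p = 65521.
    uint32_t ab0, ab1;
    packed_subfield_mul(1234u, 40503u, 12345u, &ab0, &ab1);
    printf("%u %u\n", ab0, ab1);  // prints 1234*40503 and 1234*12345
    return 0;
}
```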

Integrated GUI Environment of Parallel Fuzzy Inference System for Pattern Classification of Remote Sensing Images

  • Lee, Seong-Hoon;Lee, Sang-Gu;Son, Ki-Sung;Kim, Jong-Hyuk;Lee, Byung-Kwon
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol.2 No.2 / pp.133-138 / 2002
  • In this paper, we propose an integrated GUI environment of a parallel fuzzy inference system for pattern classification of remote sensing data. Since 4 fuzzy variables in the condition part and 104 fuzzy rules are used, a real-time, parallel approach is required. For fast fuzzy computation, we use the scan-line conversion algorithm to convert the lines of each fuzzy linguistic term to the closest integer pixels. We design 4 fuzzy processor units operated in parallel using an FPGA. In the GUI environment, PCI transmission, image data pre-processing, integer pixel mapping, and fuzzy membership tuning are considered. This system can be used in a pattern classification system requiring a rapid inference time in real time.
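
  As a rough software illustration of the scan-line conversion step (the system itself implements it toward FPGA hardware), one edge of a triangular or trapezoidal membership function, given as a line from (x0, grade0) to (x1, grade1) in (input value, membership grade) coordinates, can be rasterized to the closest integer grade at each integer input position. The function name and integer grade convention below are illustrative assumptions, not the authors' design.

```cuda
#include <cmath>
#include <utility>
#include <vector>

// Rasterize one edge of a fuzzy membership function: linearly interpolate the
// grade between (x0, grade0) and (x1, grade1) and round each sample to the
// closest integer, as scan-line conversion would.
std::vector<int> scanConvertEdge(int x0, int grade0, int x1, int grade1)
{
    if (x1 < x0) { std::swap(x0, x1); std::swap(grade0, grade1); }
    std::vector<int> grades;
    for (int x = x0; x <= x1; ++x) {
        double t = (x1 == x0) ? 0.0 : double(x - x0) / double(x1 - x0);
        grades.push_back(static_cast<int>(std::lround(grade0 + t * (grade1 - grade0))));
    }
    return grades;  // one integer grade per integer input position in [x0, x1]
}
```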

Efficient Parallel TLD on CPU-GPU Platform for Real-Time Tracking

  • Chen, Zhaoyun;Huang, Dafei;Luo, Lei;Wen, Mei;Zhang, Chunyuan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.14 No.1 / pp.201-220 / 2020
  • Trackers, especially long-term (LT) trackers, now have more complex structures and more intensive computation owing to the endless pursuit of high accuracy and robustness. However, the computing efficiency of LT trackers cannot meet the real-time requirement in various real application scenarios. As heterogeneous CPU-GPU platforms are more popular than ever, it is a challenge to exploit the computing capacity of such platforms to improve the efficiency of LT trackers for real-time operation. This paper focuses on TLD, the first LT tracking framework, and proposes an efficient parallel implementation based on OpenCL. We first analyze the TLD tracker and then optimize the compute-intensive kernels, including Fern Feature Extraction, Fern Classification, NCC Calculation, Overlaps Calculation, and Positive and Negative Sample Extraction. Experimental results demonstrate that our efficient parallel TLD tracker outperforms the original TLD, achieving a 3.92x speedup on CPU and GPU. Moreover, the parallel TLD tracker runs at 52.9 frames per second and meets the real-time requirement.
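
  As an illustration of one of the kernels listed above, the sketch below scores candidate patches against a template with normalized cross-correlation, one thread per patch. It is a CUDA analogue written for readability; the paper's implementation is in OpenCL, and the 15x15 patch size (a common TLD convention), array layout, and kernel name are assumptions rather than the authors' code.

```cuda
#include <cuda_runtime.h>

#define PATCH_SZ (15 * 15)   // assumed 15x15 normalized patches

// scores[p] = NCC between candidate patch p and the template, one thread per patch.
__global__ void nccKernel(const float* patches,  // numPatches x PATCH_SZ, row-major
                          const float* templ,    // PATCH_SZ template pixels
                          int numPatches, float* scores)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPatches) return;
    const float* a = patches + p * PATCH_SZ;

    float meanA = 0.f, meanB = 0.f;
    for (int i = 0; i < PATCH_SZ; ++i) { meanA += a[i]; meanB += templ[i]; }
    meanA /= PATCH_SZ; meanB /= PATCH_SZ;

    float num = 0.f, varA = 0.f, varB = 0.f;
    for (int i = 0; i < PATCH_SZ; ++i) {
        float da = a[i] - meanA, db = templ[i] - meanB;
        num += da * db; varA += da * da; varB += db * db;
    }
    scores[p] = num * rsqrtf(varA * varB + 1e-12f);  // NCC in [-1, 1]
}

// Launch example (hypothetical sizes):
//   nccKernel<<<(numPatches + 255) / 256, 256>>>(dPatches, dTempl, numPatches, dScores);
```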

Design of a Single-chip Multiprocessor with On-chip Learning for Large-Scale Neural Network Simulation

  • 김종문;송윤선;김명원
    • 전자공학회논문지B / Vol.33B No.2 / pp.149-158 / 1996
  • In this paper, we describe the design and implementation of a digital neural chip and a parallel neural machine for simulating large-scale neural networks. The chip is a single-chip multiprocessor with four digital neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates as an MIMD (multiple-instruction, multiple-data) parallel processor. The DNP-II has an instruction set tailored to neural computation, which can be used to effectively simulate various neural network models including on-chip learning. The DNP-II provides four-way data-driven communication, supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board, and an interface board. Each processor board consists of an 8*8 array of DNP-IIs (equivalently 2*2 neural chips). The processor boards can be interconnected in various topologies, including a linear array, a 2-D mesh, and a 2-D torus. This flexibility supports efficient mapping of neural network models onto the parallel structure. The neural system achieves a maximum performance of 40 GCPS (giga connections per second) with 16 processor boards.


Parallel Computation for Extended Edit Distances Using the Shared Memory on GPU

  • 김영호;나중채;심정섭
    • 정보처리학회논문지:컴퓨터 및 통신 시스템 / Vol.4 No.7 / pp.213-218 / 2015
  • Given two strings X and Y of lengths m and n, respectively, over an alphabet $\Sigma$, the extended edit distance between X and Y can be computed in O(mn) time and space using dynamic programming. Recently, a parallel algorithm was presented that computes the extended edit distance of X and Y in O(m+n) time and O(mn) space using m threads. In this paper, we present a parallel algorithm that improves the running time by utilizing the shared memory of the GPU. Experimental results show that the improved parallel algorithm runs about 19~25 times faster than the existing parallel algorithm.
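
  The sketch below illustrates the kind of shared-memory, anti-diagonal computation the abstract describes, using the standard edit-distance recurrence as a stand-in: the paper's extended edit distance uses a richer recurrence but parallelizes along anti-diagonals in the same way. One thread handles one row, the three active anti-diagonals live in shared memory, and a single block of m+1 threads is assumed (so m <= 1023). This is not the authors' code.

```cuda
#include <cuda_runtime.h>

// Anti-diagonal DP with the three active diagonals kept in shared memory.
// Launch: editDistDiag<<<1, m + 1, 3 * (m + 1) * sizeof(int)>>>(dX, m, dY, n, dOut);
__global__ void editDistDiag(const char* X, int m, const char* Y, int n, int* out)
{
    extern __shared__ int smem[];
    int* prev2 = smem;                 // diagonal d-2
    int* prev  = smem + (m + 1);       // diagonal d-1
    int* curr  = smem + 2 * (m + 1);   // diagonal d
    int i = threadIdx.x;               // this thread's row, 0..m

    for (int d = 0; d <= m + n; ++d) {
        int j = d - i;                 // column of cell (i, j) on diagonal d
        if (j >= 0 && j <= n) {
            if (i == 0)      curr[i] = j;
            else if (j == 0) curr[i] = i;
            else {
                int del = prev[i - 1] + 1;                        // from (i-1, j)
                int ins = prev[i] + 1;                            // from (i, j-1)
                int sub = prev2[i - 1] + (X[i - 1] != Y[j - 1]);  // from (i-1, j-1)
                curr[i] = min(del, min(ins, sub));
            }
        }
        __syncthreads();
        int* t = prev2; prev2 = prev; prev = curr; curr = t;  // rotate diagonals
    }
    if (i == m) *out = prev[m];        // D(m, n), written on the last diagonal
}
```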

CUDA based parallel design of a shot change detection algorithm using frame segmentation and object movement

  • Kim, Seung-Hyun;Lee, Joon-Goo;Hwang, Doo-Sung
    • 한국컴퓨터정보학회논문지 / Vol.20 No.7 / pp.9-16 / 2015
  • This paper proposes a parallel design of a shot change detection algorithm using frame segmentation and moving blocks. In the proposed approach, the highly parallel processing components, such as frame histogram calculation, block histogram calculation, the Otsu threshold setting function, the frame moving operation, and block histogram comparison, are designed in parallel for an NVIDIA GPU. In order to minimize memory access delay and guarantee fast computation, the output of one GPU kernel becomes the input data of another kernel in a pipelined way using the shared memory of the GPU. In addition, the optimal sizes of CUDA processing blocks and threads are estimated through prior experiments. In the experimental test of the proposed shot change detection algorithm, the detection rate of the GPU-based parallel algorithm is the same as that of the CPU-based algorithm, while the average processing time is about 6~8 times faster.
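
  As a concrete example of the first pipeline stage described above, the sketch below computes a 256-bin grayscale frame histogram with per-block shared-memory bins, the usual CUDA pattern for this kernel; block histograms over sub-regions would follow the same pattern. It is illustrative rather than the authors' code, and the 8-bit grayscale input and kernel name are assumptions.

```cuda
#include <cuda_runtime.h>

// 256-bin histogram of an 8-bit grayscale frame. Each block accumulates into a
// shared-memory histogram and then merges it into the global one, which keeps
// most atomic traffic in fast shared memory.
__global__ void frameHistogram(const unsigned char* pixels, int numPixels,
                               unsigned int* hist /* 256 bins, zero-initialized */)
{
    __shared__ unsigned int local[256];
    for (int b = threadIdx.x; b < 256; b += blockDim.x) local[b] = 0;
    __syncthreads();

    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < numPixels;
         idx += gridDim.x * blockDim.x)
        atomicAdd(&local[pixels[idx]], 1u);
    __syncthreads();

    for (int b = threadIdx.x; b < 256; b += blockDim.x)
        atomicAdd(&hist[b], local[b]);
}

// Launch example (hypothetical configuration):
//   frameHistogram<<<64, 256>>>(dFrame, width * height, dHist);
```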

Parallel Computing Simulation of Large-Scale Polymer Electrolyte Fuel Cells

  • 곽건희;푸루소타마;강경문;주현철
    • 한국수소및신에너지학회논문집 / Vol.22 No.6 / pp.868-877 / 2011
  • This paper presents a parallel computing methodology for polymer electrolyte fuel cells (PEFCs) and detailed simulation contours of a real-scale fuel cell. In this work, a three-dimensional, two-phase PEFC model is applied to a large-scale 200 $cm^2$ fuel cell geometry that requires roughly 13.5 million grid points based on a grid-independence study. For parallel computing, the large-scale computational domain is decomposed into 12 sub-domains, and parallel simulations are carried out using 12 processors (2.53 GHz Intel Core i7) with 48 GB of RECC DDR3-1333 memory. The work represents the first attempt to parallelize a two-phase PEFC code and to illustrate two-phase contours in a representative industrial cell.
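
  For a sense of the per-processor load implied by these figures, the decomposition works out to roughly $13.5\times10^{6} / 12 \approx 1.1\times10^{6}$ grid points per sub-domain, i.e. each of the 12 processors handles on the order of a million cells of the two-phase model.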