• Title/Summary/Keyword: Parallel Computer


MAXIMUM TOLERABLE ERROR BOUND IN DISTRIBUTED SIMULATED ANNEALING

  • Hong, Chul-Eui;McMillin, Bruce M.;Ahn, Hee-Il
    • ETRI Journal / v.15 no.3 / pp.1-26 / 1994
  • Simulated annealing is an attractive but expensive heuristic for approximating the solution to combinatorial optimization problems. Attempts to parallelize simulated annealing, particularly on distributed-memory multicomputers, are hampered by the algorithm's requirement of a globally consistent system state. In a multicomputer, maintaining the global state S involves explicit message traffic and is a critical performance bottleneck. To mitigate this bottleneck, the overhead of these state updates must be amortized over as many parallel state changes as possible. This technique introduces errors in the actual cost C(S) of a particular state S into the annealing process. This paper places analytically derived bounds on this error in order to assure convergence to the correct optimal result. The resulting parallel simulated annealing algorithm dynamically changes the frequency of global updates as a function of the annealing control parameter, i.e., temperature. Implementation results on an Intel iPSC/2 are reported.

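
The link between temperature and tolerable cost error can be illustrated with the standard Metropolis acceptance rule. The sketch below is not the paper's derived bound. It only assumes the usual exp(-ΔC/T) acceptance probability (the abstract does not state which rule the paper uses) and shows that a fixed error in ΔC distorts the acceptance decision more as the temperature falls. The function names and sample values are hypothetical.

```python
import math


def acceptance_probability(delta_cost, temperature):
    """Metropolis rule (assumed): accept improvements, else exp(-delta/T)."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temperature)


def distortion(true_delta, cost_error, temperature):
    """Ratio of the acceptance probability computed from a stale cost
    (true_delta + cost_error) to the one computed from the exact cost."""
    exact = acceptance_probability(true_delta, temperature)
    stale = acceptance_probability(true_delta + cost_error, temperature)
    return stale / exact


if __name__ == "__main__":
    # Same cost error, decreasing temperature: the ratio moves away from 1.
    for temperature in (100.0, 10.0, 1.0):
        print(temperature, distortion(5.0, 1.0, temperature))
```

For an uphill move with a positive error, the ratio equals exp(-error/T), so it stays near 1 at high temperature and shrinks at low temperature. This is consistent with tying the update frequency to temperature, as the abstract describes.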

A Study on the Automatic Parallelization Method and Tool Development

  • Shin, Woochang
    • International Journal of Internet, Broadcasting and Communication / v.12 no.3 / pp.87-94 / 2020
  • Computer hardware has recently been evolving toward more computing cores rather than higher clock speeds. To exploit the full performance of parallel hardware, the running program must also be parallelized. However, software developers are accustomed to sequential programming and, in most cases, write programs that run sequentially; they also find it difficult to design and develop parallel software. We propose a method for automatically converting a sequential C/C++ program into a parallel program, and we develop a parallelization tool that supports it. The tool supports Open Multi-Processing (OpenMP) and the Parallel Patterns Library (PPL) as parallel frameworks. Fully automatic parallelization is difficult because of dynamic features of C/C++ such as pointer operations and polymorphism. This study therefore focuses on verifying the conditions for parallelization rather than on fully automatic parallelization, and on giving developers detailed advice when parallelization is not possible.

Comparison of Go and C++ TBB on Parallel Processing (Go와 C++ TBB의 병렬처리 비교)

  • Park, Dong-Ha;Moon, Bong-Kyo
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.64-67 / 2017
  • Applying concurrent structures and parallel processing is a common concern in today's programs. In this research, dynamic programming is used to compare the parallel performance of the Go language and Intel C++ Threading Building Blocks (TBB). The experiment was performed on a 4-core machine, and the results include execution times under a simultaneous multithreading environment. The static optimal binary search tree problem was used as the example. The speed-up of Go was higher than the number of cores, and that of TBB was close to it. TBB performed better in general, but at larger scales Go was faster in some cases.
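
For reference, the static optimal binary search tree benchmark can be sketched with the textbook dynamic program. This is a sequential Python version, not the authors' Go or TBB code, and the function name is hypothetical. Cells on the same diagonal of the table (same `length`) depend only on shorter intervals, which is the usual source of parallelism in this problem. Whether the paper partitions the work this way is an assumption.

```python
def optimal_bst_cost(freq):
    """Minimum weighted search cost of a BST over keys with access counts freq.

    cost[i][j] is the optimal cost for keys i..j (inclusive), where a key at
    depth d (root depth 1) contributes d * freq[key].
    """
    n = len(freq)
    if n == 0:
        return 0

    # Prefix sums give the total frequency of any interval in O(1).
    prefix = [0]
    for f in freq:
        prefix.append(prefix[-1] + f)

    cost = [[0] * n for _ in range(n)]
    for i in range(n):
        cost[i][i] = freq[i]

    for length in range(2, n + 1):
        # Every i in this loop is independent of the others for a fixed
        # length, so this loop is the natural candidate for parallel execution.
        for i in range(n - length + 1):
            j = i + length - 1
            best = None
            for root in range(i, j + 1):
                left = cost[i][root - 1] if root > i else 0
                right = cost[root + 1][j] if root < j else 0
                if best is None or left + right < best:
                    best = left + right
            cost[i][j] = best + (prefix[j + 1] - prefix[i])

    return cost[0][n - 1]
```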

Parallel Processing of Multi-Core Processor and GPUs in Projection Step for Efficient Fluid Simulation (효율적인 유체 시뮬레이션을 위한 투영 단계에서의 멀티 코어 프로세서와 그래픽 프로세서의 병렬처리)

  • Kim, Sun-Tae;Jung, Hwi-Ryong;Hong, Jeong-Mo
    • The Journal of the Korea Contents Association / v.13 no.6 / pp.48-54 / 2013
  • State-of-the-art fluid simulation techniques in computer graphics employ heterogeneous parallelization across the CPU and GPU. In this paper, we present a novel CPU-GPU parallel algorithm that solves the projection step of fluid simulation more efficiently than existing sequential CPU-GPU processing. With the proposed method, fluid simulations that require high computational resources can be carried out efficiently.

Design and Performance Analysis of the H/V-bus Parallel Computer (H/V-버스 병렬컴퓨터의 설계 및 성능 분석)

  • 김종현
    • Journal of the Korea Society for Simulation / v.3 no.1 / pp.29-42 / 1994
  • The architecture of a MIMD-type parallel computer system is specified; a simulator is developed to support the design and evaluation of systems based on the architecture; and experiments are conducted with the simulator to evaluate system performance. The horizontal/vertical-bus (H/V-bus) system architecture provides an N×N array of processing elements (PEs) that communicate with each other through a network of N horizontal buses and N vertical buses. The simulator, written in SLAM II and FORTRAN, is designed to provide high resolution in simulating the IPC mechanism. Parameters give the user independent control of system size, PE speed, and IPC mechanism speed. Results generated by the simulator include execution times, PE utilizations, queue lengths, and other data. The simulator is used to study system performance when a partial differential equation is solved by the parallel Gauss-Seidel method. For comparison, the benchmark is also executed on a single-bus system simulator derived from the H/V-bus system simulator. The benchmark is also solved on a single PE to obtain data for computing speedups. An extensive analysis of the results is presented.

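
The benchmark kernel can be sketched as a sequential Gauss-Seidel sweep. The abstract names neither the equation nor the discretization, so the version below assumes Laplace's equation on a rectangular grid with the five-point stencil and fixed boundary values. The function names, tolerance, and sweep limit are hypothetical. The `speedup` helper uses the conventional definition (single-PE time divided by parallel time).

```python
def gauss_seidel_laplace(grid, tol=1e-8, max_sweeps=10_000):
    """Relax the interior of grid in place and return the number of sweeps used.

    grid is a list of lists of floats; its outermost rows and columns hold the
    fixed boundary values. Each interior point is replaced by the average of
    its four neighbours, using values already updated in the current sweep
    (this in-place reuse is what distinguishes Gauss-Seidel from Jacobi).
    """
    rows, cols = len(grid), len(grid[0])
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new_value = 0.25 * (
                    grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
                )
                max_change = max(max_change, abs(new_value - grid[i][j]))
                grid[i][j] = new_value
        if max_change < tol:
            return sweep
    return max_sweeps


def speedup(single_pe_time, parallel_time):
    """Conventional speedup: time on one PE divided by time on the parallel system."""
    return single_pe_time / parallel_time
```

In a parallel version, each PE would typically own a block of the grid and exchange block edges with its neighbours after each sweep. That exchange is the kind of interprocessor traffic a bus simulator would model, although the abstract does not describe the partitioning used.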

Sensorless Drive for Mono Inverter Dual Parallel Surface Mounted Permanent Magnet Synchronous Motor Drive System (단일 인버터를 이용한 표면 부착형 영구자석 동기 전동기 병렬 구동 시스템의 센서리스 구동 방법)

  • Lee, Yongjae;Ha, Jung-Ik
    • The Transactions of the Korean Institute of Power Electronics / v.20 no.1 / pp.38-44 / 2015
  • This paper presents a sensorless drive method for a mono inverter dual parallel (MIDP) surface-mounted permanent magnet synchronous motor (SPMSM) drive system. The MIDP motor drive system is a technique that can reduce the cost of a multi-motor drive system. To maximize this merit, a sensorless technique is essential for eliminating the position sensors. This paper adopts a sensorless method suited to the MIDP SPMSM drive system, which uses a reduced-order observer and a phase-locked loop (PLL) to reduce the computational burden. The I-F control method is implemented for start-up and low-speed operation. The validity and performance of the proposed algorithm are shown via experiments with 600-W SPMSMs.

Accelerating Fingerprint Enhancement Algorithm on GPGPU using OpenCL (OpenCL을 이용한 GPGPU 기반 지문개선 알고리즘 가속화)

  • Kim, Daehee;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.4 / pp.666-672 / 2016
  • Fingerprints have recently come into wide use as a biometric for improving the security of financial mobile applications, because of their user convenience and high recognition rate. However, to apply fingerprint algorithms to finance and security applications, their recognition rate and processing speed have to be improved further. In this paper, we propose a parallel fingerprint enhancement algorithm for general-purpose computing on graphics processing units (GPGPU) using OpenCL. We analyze the parallelism in the fingerprint algorithm and explore the optimization parameters of the parallel algorithm to improve performance. The experimental results showed that the parallel fingerprint enhancement algorithm on GPGPUs ran 29.4 to 69.2 times faster than the original algorithm on the host CPUs.

Vertex disjoint covering cycle set in hypercubes (하이퍼큐브에서의 정점을 공유하지 않는 커버링사이클 집합)

  • Park, Won;Lim, Hyeong-Seok
    • Proceedings of the IEEK Conference / 2003.11b / pp.11-14 / 2003
  • In interconnection networks for parallel processing, the cycle partitioning problem for parallel transmission in the presence of faulty vertices or edges is very important. In this paper, we assume that $k$ ($\leq m-1$) edges of the $m$-dimensional hypercube $Q_m$ do not share any vertices, and show that it is possible to construct a cycle set which consists of $k$ cycles covering all the vertices of the hypercube and one cycle including one of the given edges. This cycle set can be used for parallel transmission between two vertices joined by faulty edges.

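
The idea of a vertex-disjoint covering cycle set can be illustrated with a much simpler construction than the one the paper proves. The sketch below splits $Q_m$ on its top bit into two copies of $Q_{m-1}$ and walks each copy in reflected Gray code order, which gives two vertex-disjoint cycles that together cover every vertex. It does not handle prescribed or faulty edges, and all function names are hypothetical.

```python
def gray_cycle(m):
    """Vertices of Q_m in reflected Gray code order.

    Consecutive entries (and the last and first) differ in exactly one bit,
    so for m >= 2 the list is a Hamiltonian cycle of Q_m.
    """
    return [i ^ (i >> 1) for i in range(1 << m)]


def is_hypercube_cycle(vertices):
    """True if vertices are distinct and consecutive ones are hypercube neighbours."""
    n = len(vertices)
    if n < 4 or len(set(vertices)) != n:
        return False
    return all(
        bin(vertices[i] ^ vertices[(i + 1) % n]).count("1") == 1 for i in range(n)
    )


def two_disjoint_covering_cycles(m):
    """Two vertex-disjoint cycles that together cover all vertices of Q_m (m >= 3)."""
    if m < 3:
        raise ValueError("each half must be at least Q_2 to contain a cycle")
    lower_half = gray_cycle(m - 1)
    top_bit = 1 << (m - 1)
    upper_half = [v | top_bit for v in lower_half]
    return [lower_half, upper_half]
```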

Parallel Reduced-Order Square-Root Unscented Kalman Filter for State Estimation of Sensorless Permanent-Magnet Synchronous Motor (센서리스 영구자석 동기전동기의 상태 추정을 위한 병렬 축소 차수 제곱근 무향 칼만 필터)

  • Moon, Cheol;Kwon, Young-Ahn
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.6 / pp.1019-1025 / 2016
  • This paper proposes a parallel reduced-order square-root unscented Kalman filter for state estimation of a sensorless permanent-magnet synchronous motor. The unscented Kalman filter was motivated by the linearization error between a real system and the classical Kalman model. The unscented transformation yields a more accurate Kalman model, but its complexity is the main drawback. This paper investigates the design and implementation of the proposed filter with Potter and Carlson square-root form. The proposed parallel reduced-order square-root unscented Kalman filter reduces memory and code size and improves numerical computation, while its performance is not significantly different from that of the unscented Kalman filter. Experiments are performed to verify the proposed filter.

Time Complexity Measurement on CUDA-based GPU Parallel Architecture of Morphology Operation

  • Izmantoko, Yonny S.;Choi, Heung-Kook
    • Journal of Korea Multimedia Society / v.16 no.4 / pp.444-452 / 2013
  • The operation time of a function or procedure always needs to be optimized, and parallelizing the operation is the general method for reducing it. One of the most powerful parallelization methods is to use a GPU. In image processing, one of the most commonly used operations is the morphology operation. Three types of morphology operation kernels (naïve, global, and shared) are presented in this paper. All kernels are written in CUDA and run in parallel on the GPU. Four morphology operations (erosion, dilation, opening, and closing) using a square structuring element are tested on MRI images of different sizes to measure the speedup of the GPU implementation over the CPU implementation. The results show that the speedup of dilation is similar for all kernels. However, for erosion, opening, and closing, the shared kernel runs faster than the other kernels.
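
The four operations can be written down as a plain CPU reference in Python. This is a sketch of the standard definitions, not the paper's CUDA kernels. The function names are hypothetical, and clipping the window at the image border is an assumption, since the abstract does not say how borders are handled.

```python
def _window_filter(image, radius, reduce_fn):
    """Apply reduce_fn over a (2*radius+1)-wide square window around each pixel.

    The window is clipped at the image border (an assumed border policy).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(rows, y + radius + 1))
                for xx in range(max(0, x - radius), min(cols, x + radius + 1))
            ]
            out[y][x] = reduce_fn(window)
    return out


def erode(image, radius=1):
    """Erosion: each pixel becomes the minimum of its window."""
    return _window_filter(image, radius, min)


def dilate(image, radius=1):
    """Dilation: each pixel becomes the maximum of its window."""
    return _window_filter(image, radius, max)


def opening(image, radius=1):
    """Opening: erosion followed by dilation."""
    return dilate(erode(image, radius), radius)


def closing(image, radius=1):
    """Closing: dilation followed by erosion."""
    return erode(dilate(image, radius), radius)
```

Each output pixel depends only on the input image, so the two outer loops carry no dependencies between iterations. That independence is what allows a GPU implementation to assign one thread per pixel.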