• Title/Abstract/Keywords: Parallel Overhead

Backend of a Parallelizing Compiler for a Heterogeneous Parallel System

  • 권대석;김흥환;한상영
    • 한국정보과학회논문지:시스템및이론
    • /
    • Vol. 27, No. 8
    • /
    • pp.710-718
    • /
    • 2000
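  • Many parallel processing systems have been proposed to improve on the performance of classical systems. However, these systems often failed to deliver the expected performance because they underestimated the cost of communication and synchronization. This paper explains why such results occur and presents the performance requirements that a parallelizing compiler must satisfy. To avoid performance degradation, parallelization decisions must be based on an analysis of communication and synchronization overhead. We applied this idea to the automatic parallelizing compiler SUIF: we replaced SUIF's backend with a new backend that uses MPI functions and gave it the ability to evaluate the validity of each parallelization decision based on overhead information. The new compiler backend translates SUIF intermediate code, in which the parallelizable regions are marked, into a distributed-memory parallel program containing MPI function calls, without causing performance degradation.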
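
The overhead-based decision described in this abstract can be illustrated with a minimal sketch. It is not the SUIF backend's actual cost model, which the abstract does not give; the function name `should_parallelize` and its parameters are hypothetical, and the model simply assumes that work divides evenly across processes and that communication and synchronization costs add on top.

```python
def should_parallelize(serial_time, num_procs, comm_time, sync_time):
    """Return True if the estimated parallel time beats the serial time.

    Assumed model (not taken from the paper): the work divides evenly
    over num_procs processes, and communication and synchronization
    costs are added to the divided compute time.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    parallel_time = serial_time / num_procs + comm_time + sync_time
    return parallel_time < serial_time


# A long loop amortizes the overhead; a short one does not.
long_loop = should_parallelize(100.0, 4, comm_time=10.0, sync_time=5.0)
short_loop = should_parallelize(1.0, 4, comm_time=2.0, sync_time=0.5)
```

Under this model a region is left serial whenever the overhead terms exceed the time saved by dividing the work, which is the kind of check the abstract says the backend must make.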

MAXIMUM TOLERABLE ERROR BOUND IN DISTRIBUTED SIMULATED ANNEALING

  • Hong, Chul-Eui;McMillin, Bruce M.;Ahn, Hee-Il
    • ETRI Journal
    • /
    • Vol. 15, No. 3-4
    • /
    • pp.1-26
    • /
    • 1994
  • Simulated annealing is an attractive, but expensive, heuristic method for approximating the solution to combinatorial optimization problems. Attempts to parallelize simulated annealing, particularly on distributed memory multicomputers, are hampered by the algorithm's requirement of a globally consistent system state. In a multicomputer, maintaining the global state S involves explicit message traffic and is a critical performance bottleneck. To mitigate this bottleneck, it becomes necessary to amortize the overhead of these state updates over as many parallel state changes as possible. This technique introduces errors in the actual cost C(S) of a particular state S into the annealing process. This paper places analytically derived bounds on this error in order to assure convergence to the correct optimal result. The resulting parallel simulated annealing algorithm dynamically changes the frequency of global updates as a function of the annealing control parameter, i.e., temperature. Implementation results on an Intel iPSC/2 are reported.

A FASTER LU DECOMPOSITION FOR PARALLEL C PROGRAMS

  • Lee, Sang-Moon;Lee, Chin-Young
    • Journal of applied mathematics & informatics
    • /
    • Vol. 3, No. 2
    • /
    • pp.217-234
    • /
    • 1996
  • This report introduces a faster parallel LU decomposition algorithm that gives a speedup almost equal to the number of nodes used. The new algorithm takes advantage of an important C feature, the row-major layout of matrices, and is based on the currently widely used LU decomposition algorithm with one major modification that eliminates most of the communication overhead. Empirical results are included in this report. For example, solving a dense matrix that contains 100,000,000 elements gives a speedup of 50 when executed in parallel on 50 nodes of an Intel Paragon.
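
As a point of reference for the row-major layout mentioned in this abstract, the following is a minimal serial sketch of LU decomposition (Doolittle form, no pivoting) on a flat row-major array. It is not the paper's parallel algorithm, whose communication-saving modification the abstract does not describe; the function names `lu_row_major` and `multiply_lu` are hypothetical.

```python
def lu_row_major(a, n):
    """In-place Doolittle LU (no pivoting) on a flat row-major list.

    Element (i, j) of the n x n matrix lives at a[i * n + j], as in C.
    Afterwards the strict lower triangle holds the multipliers of L
    (unit diagonal implied) and the upper triangle holds U.
    """
    for k in range(n):
        row_k = k * n
        pivot = a[row_k + k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            row_i = i * n
            factor = a[row_i + k] / pivot
            a[row_i + k] = factor  # store the L multiplier in place
            # The inner loop walks row i and row k contiguously in memory.
            for j in range(k + 1, n):
                a[row_i + j] -= factor * a[row_k + j]
    return a


def multiply_lu(lu, n):
    """Rebuild A = L * U from the packed factors, to check the result."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i * n + k]
                total += l_ik * lu[k * n + j]
            out[i * n + j] = total
    return out


original = [4.0, 3.0, 6.0, 3.0]  # the 2 x 2 matrix [[4, 3], [6, 3]]
packed = lu_row_major(list(original), 2)
rebuilt = multiply_lu(packed, 2)
```

Because each row is contiguous in a row-major layout, the update of row `i` touches adjacent memory, which is one plausible reason a row-oriented formulation suits C programs.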

Efficient Parallel Scan Test Technique for Cores on AMBA-based SoC

  • Song, Jaehoon;Jung, Jihun;Kim, Dooyoung;Park, Sungju
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • Vol. 14, No. 3
    • /
    • pp.345-355
    • /
    • 2014
  • Today's System-on-a-Chip (SoC) is designed with reusable IP cores to meet short time-to-market requirements. However, the increasing cost of testing becomes a big burden in manufacturing a highly integrated SoC. In this paper, an efficient parallel scan test technique is introduced to minimize the test application time. Multiple scan enable signals are adopted to implement scan architecture to achieve optimal test application time for the test patterns scheduled for concurrent scan test. Experimental results show that testing times are considerably reduced with little area overhead.

Fully Homomorphic Encryption Based On the Parallel Computing

  • Tan, Delin;Wang, Huajun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 12, No. 1
    • /
    • pp.497-522
    • /
    • 2018
  • A fully homomorphic encryption (FHE) scheme may be the best way to solve the privacy leakage problem on untrusted servers, because it allows computation on ciphertexts. However, existing FHE schemes have not yet been put into practical use because of their low efficiency. It is therefore imperative to find a more efficient FHE scheme or to optimize the existing ones so that they become practical. In this paper, we optimize the GSW scheme using parallel computing and obtain a high-performance FHE scheme, the PGSW scheme. Experimental results show that the time overhead of homomorphic operations in the new scheme is reduced manyfold as the number of processing units increases. Our scheme can therefore greatly reduce the running time of homomorphic operations and improve the performance of FHE at the cost of additional hardware resources, and it can help advance the development of FHE.

Built-in self test for high density SRAMs using parallel test methodology

  • 강용석;이종철;강성호
    • 전자공학회논문지C
    • /
    • Vol. 35C, No. 8
    • /
    • pp.10-22
    • /
    • 1998
  • To handle the density increase of SRAMs, a new parallel testing methodology based on built-in self test (BIST) is developed, which allows multiple cells to be accessed simultaneously. The main idea is that a march algorithm is performed concurrently in each of the basic marching blocks that make up the whole memory cell array. The new parallel access method is very efficient in speed and requires very little hardware overhead for the BIST circuitry. Results show that the fault coverage of the applied march algorithm can be achieved with a lower order of complexity. This new parallel testing algorithm tests a √n × √n SRAM, which consists of √k × √k basic marching blocks, in a test sequence of O(5·√k·(√k+√k)).

Influence on Parallel Pipelines of Distributed ICCP Systems for Mitigation of DC Traction Interference

  • 이현구;하윤철;하태현;배정효;김대경
    • 대한전기학회:학술대회논문집
    • /
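    • Proceedings of the 대한전기학회 (Korean Institute of Electrical Engineers) 2005 Autumn Conference, Electrical Machinery and Energy Conversion Systems Division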
    • /
    • pp.285-287
    • /
    • 2005
  • When an underground pipeline runs parallel to a DC traction system, it suffers from DC traction interference, because the train is fed by the substation through the overhead wire and the return current flows back to the substation via the rails. If these return rails are poorly insulated from earth, DC current leaks into the earth and can be picked up by a nearby pipeline. This may bring about large-scale accidents even in cathodically protected systems. In this paper, we use the simulation software CatPro to analyze the influence on parallel pipelines of distributed ICCP (impressed current cathodic protection) systems installed to mitigate DC traction interference.

Design and Implementation of a Massively Parallel Multithreaded Architecture: DAVRID

  • Sangho Ha;Kim, Junghwan;Park, Eunha;Yoonhee Hah;Sangyong Han;Daejoon Hwang;Kim, Heunghwan;Seungho Cho
    • Journal of Electrical Engineering and Information Science
    • /
    • Vol. 1, No. 2
    • /
    • pp.15-26
    • /
    • 1996
  • MPAs (Massively Parallel Architectures) should address two fundamental issues for scalability: synchronization and communication latency. Dataflow architectures face problems of excessive synchronization overhead and inefficient execution of sequential programs, although they offer the ability to exploit the massive parallelism inherent in programs. In contrast, MPAs based on the von Neumann computational model may suffer from inefficient synchronization mechanisms and communication latency. DAVRID (DAtaflow/Von Neumann RISC hybrID) is a massively parallel multithreaded architecture that takes advantage of both the von Neumann and dataflow models. It has good single-thread performance and tolerates synchronization and communication latency. In this paper, we describe the DAVRID architecture in detail and evaluate its performance through simulation runs over several benchmarks.

Speedup Analysis Model for High Speed Network based Distributed Parallel Systems

  • 김화성
    • 한국통신학회논문지
    • /
    • Vol. 26, No. 12C
    • /
    • pp.218-224
    • /
    • 2001
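  • The goal of distributed parallel processing is to solve computation-intensive problems that exhibit various forms of inherent parallelism by making the most of the differing capabilities of multiple high-performance and parallel computers connected by a high-speed network. This paper proposes a computation model, including a general graph representation, for analyzing speedup on distributed parallel systems, and analyzes which factors contribute to speedup when a program is scheduled for execution. The proposed representation applies to both homogeneous and heterogeneous systems. To obtain greater speedup through scheduling on a distributed parallel system, the match between the parallelism characteristics of the tasks and those of the parallel computers must be handled carefully, and the communication overhead caused by moving tasks must be minimized.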

PERFORMANCE ANALYSIS OF THE PARALLEL CUPID CODE IN DISTRIBUTED MEMORY SYSTEM BASED ETHERNET AND INFINIBAND NETWORK

  • 전병진;최형권
    • 한국전산유체공학회지
    • /
    • Vol. 19, No. 2
    • /
    • pp.24-29
    • /
    • 2014
  • In this study, the parallel performance of the CUPID code was investigated on both Ethernet and InfiniBand network systems to examine the effect of cache memory and network speed. The bi-conjugate gradient solver of the CUPID code was parallelised using the domain decomposition method and the message passing interface (MPI). The parallel performance of the Ethernet system is shown to be worse than that of the InfiniBand system because of its slower network and smaller cache memory. The parallel performance of each system deteriorates for a small problem because of communication overhead, but the InfiniBand system still performs better than the Ethernet system thanks to its much faster network. For a large problem, the parallel performance depends less on the network system.
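
The trend reported in this abstract can be reproduced qualitatively with a simple latency-plus-bandwidth cost model. This is a sketch under stated assumptions, not the study's measurement method: the function `parallel_efficiency`, the two-message halo exchange per step, and every numeric parameter below are hypothetical placeholders rather than measured Ethernet or InfiniBand figures.

```python
def parallel_efficiency(cells, procs, sec_per_cell,
                        latency, sec_per_byte, halo_bytes):
    """Efficiency of one solver step under an assumed cost model.

    Compute time scales with cells / procs; each process is assumed to
    exchange two halo messages costing latency + bytes * sec_per_byte.
    """
    serial = cells * sec_per_cell
    comm = 2 * (latency + halo_bytes * sec_per_byte) if procs > 1 else 0.0
    parallel = serial / procs + comm
    return serial / (procs * parallel)


# Hypothetical network parameters, chosen only to contrast slow and fast.
SLOW_NET = dict(latency=50e-6, sec_per_byte=8e-9)
FAST_NET = dict(latency=2e-6, sec_per_byte=0.2e-9)

common = dict(procs=8, sec_per_cell=1e-8, halo_bytes=800)
small_slow = parallel_efficiency(10_000, **common, **SLOW_NET)
small_fast = parallel_efficiency(10_000, **common, **FAST_NET)
large_slow = parallel_efficiency(100_000_000, **common, **SLOW_NET)
large_fast = parallel_efficiency(100_000_000, **common, **FAST_NET)
```

In this model the fixed message cost dominates when each process holds few cells, so the network choice matters most for small problems and the gap narrows as the problem grows.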