• Title/Summary/Keyword: shared parallel systems

Search results: 68

Performance Optimization of Parallel Algorithms

  • Hudik, Martin; Hodon, Michal
    • Journal of Communications and Networks / v.16 no.4 / pp.436-446 / 2014
  • The high intensity of research and modeling in mathematics, physics, biology, and chemistry requires new computing resources. Because of the large computational complexity of such tasks, computing time is long and costly, and the most effective way to increase efficiency is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing, with emphasis on the analysis of parallel systems and the impact of communication delays on their efficiency and overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation (Gaussian elimination method, GEM). Algorithms are designed for shared-memory architectures (open multiprocessing, OpenMP), distributed-memory architectures (message passing interface, MPI), and their combination (MPI + OpenMP). The properties of the algorithms were determined analytically and verified experimentally, and conclusions are drawn for theory and practice.
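
As a rough illustration of the shared-memory (OpenMP) variant discussed in this abstract, the sketch below parallelizes the forward-elimination loop of GEM across rows. The dense matrix layout and the absence of pivoting are simplifying assumptions for illustration only; this is not the authors' implementation.

    /* Minimal OpenMP sketch of GEM forward elimination on a dense augmented
     * matrix a[n][n+1], stored row-major.  Pivoting is omitted; this is an
     * illustration of the shared-memory approach, not the paper's code. */
    #include <omp.h>

    void gem_forward(double *a, int n)
    {
        for (int k = 0; k < n; k++) {
            /* Rows below the pivot row are independent, so they can be
             * eliminated in parallel across the shared-memory threads. */
            #pragma omp parallel for schedule(static)
            for (int i = k + 1; i < n; i++) {
                double factor = a[i * (n + 1) + k] / a[k * (n + 1) + k];
                for (int j = k; j <= n; j++)
                    a[i * (n + 1) + j] -= factor * a[k * (n + 1) + j];
            }
        }
    }

In a hybrid (MPI + OpenMP) setting such as the one the abstract mentions, the same row updates would typically also be distributed across MPI ranks, with the pivot row broadcast at each step.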

Fuzzy-based Processor Allocation Strategy for Multiprogrammed Shared-Memory Multiprocessors (다중프로그래밍 공유메모리 다중프로세서 시스템을 위한 퍼지 기반 프로세서 할당 기법)

  • 김진일; 이상구
    • Journal of the Korean Institute of Intelligent Systems / v.10 no.5 / pp.409-416 / 2000
  • In shared-memory multiprocessor systems, shared processing techniques such as time-sharing, space-sharing, and gang scheduling are used to improve overall system utilization for parallel operations. Recently, the LLPC (loop-level process control) allocation technique was proposed; it dynamically adjusts the number of processors needed to execute the parallel code portions of a given job based on the current system load. This method allocates as many available processors as possible and does not reserve any processors for the parallel sections of other, later-arriving applications. To solve this problem, in this paper we propose a new processor allocation technique called FPA (fuzzy processor allocation), which dynamically adjusts the number of processors by fuzzifying the needed number of processors, the system load, and the estimated execution time of each job. The proposed method exploits as much of each job's parallelism as possible without overloading the system. We compare the performance of our approach with conventional results, and the experiments show that the proposed method provides better performance.
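
As a toy illustration of the fuzzification step described above, the sketch below maps the current system load onto a "high load" membership degree and blends it with a job's requested processor count to produce a grant. The membership function and the blending rule are invented for illustration; the abstract does not specify FPA's actual rule base.

    /* Toy sketch of fuzzy processor allocation: fuzzify the system load and
     * blend it with the job's requested processor count to decide how many
     * processors to grant.  The membership function and rule are invented;
     * this is not the paper's FPA rule base. */

    /* Membership degree of "system is heavily loaded", for load in [0, 1]. */
    static double load_is_high(double load)
    {
        if (load <= 0.3) return 0.0;
        if (load >= 0.9) return 1.0;
        return (load - 0.3) / 0.6;
    }

    int fuzzy_allocate(int requested, int total_procs, double load)
    {
        double high = load_is_high(load);      /* degree of overload risk */
        /* Grant most of the request when load is low, only a little when high. */
        double grant = (1.0 - high) * requested + high * 1.0;
        int n = (int)(grant + 0.5);
        if (n < 1) n = 1;
        if (n > total_procs) n = total_procs;
        return n;
    }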

Efficient m-step Generalization of Iterative Methods

  • Kim, Sun-Kyung
    • Journal of Korea Society of Industrial Information Systems / v.11 no.5 / pp.163-169 / 2006
  • In order to use parallel computers in specific applications, algorithms need to be developed and mapped onto parallel computer architectures. Main memory access in shared-memory systems and global communication in message-passing systems degrade computation speed. In this paper, it is shown that the m-step generalization of the block Lanczos method enhances parallel properties by forming blocks of simultaneous search direction vectors. QR factorization, which lowers speed on parallel computers, is not necessary in the m-step block Lanczos method. The m-step method minimizes the number of synchronization points, which results in reduced global communication and main memory access compared with the standard method.
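
The claim about minimized synchronization points can be illustrated generically: an m-step formulation lets several inner products be accumulated locally and combined in a single global reduction, instead of one reduction per step. The MPI sketch below shows only that batching idea; it is not the paper's block Lanczos algorithm, and all names are invented.

    /* Illustration of reducing synchronization points: instead of one
     * MPI_Allreduce per inner product (one per iteration step), an m-step
     * formulation allows m partial dot products to be combined into a
     * single global reduction.  Generic sketch, not the paper's code. */
    #include <mpi.h>

    void batched_dot_products(const double *v, const double *w[], int m,
                              int local_n, double *results, MPI_Comm comm)
    {
        double local[m];                 /* C99 variable-length array */
        for (int k = 0; k < m; k++) {
            local[k] = 0.0;
            for (int i = 0; i < local_n; i++)
                local[k] += v[i] * w[k][i];
        }
        /* One global synchronization covers all m inner products. */
        MPI_Allreduce(local, results, m, MPI_DOUBLE, MPI_SUM, comm);
    }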

Reducing False Sharing based on Memory Reference Patterns in Distributed Shared Memory Systems (분산 공유 메모리 시스템에서 메모리 참조 패턴에 근거한 거짓 공유 감속 기법)

  • Jo, Seong-Je
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1082-1091 / 2000
  • In distributed shared memory systems, false sharing occurs when two different data items that are not shared, but are accessed by two different processors, are allocated to a single block; it is an important factor in degrading system performance. This paper first analyzes shared memory allocation and reference patterns in parallel applications that allocate memory for shared data objects using a dynamic memory allocator. The shared objects are allocated sequentially and generally show distinct reference patterns. If objects of the same size are requested successively about as many times as there are processors, each object is referenced by only one particular processor. If objects of the same size are requested successively far more times than there are processors, two or more successive objects are referenced by only particular processors. On the basis of these analyses, we propose a memory allocation scheme that places objects requested by different processors on different pages, and we evaluate existing memory allocation techniques for reducing false sharing faults. Our allocation scheme removes a considerable number of false sharing faults for some applications with little additional memory space.
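
A minimal sketch of the page-granularity placement described above: objects expected to be private to different processors are given their own page-aligned blocks, so they can never fall into the same coherency unit. The interface below is hypothetical and is not the paper's allocator.

    /* Hypothetical illustration of page-granularity placement: each object
     * requested by a different processor gets its own page-aligned block,
     * so two processors' private objects never share a coherency unit. */
    #include <stdlib.h>
    #include <unistd.h>

    void *alloc_private_object(size_t size)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        void *p = NULL;
        /* Round the request up to whole pages and align it on a page
         * boundary; the wasted space is the cost of avoiding false sharing. */
        size_t rounded = ((size + page - 1) / page) * page;
        if (posix_memalign(&p, page, rounded) != 0)
            return NULL;
        return p;
    }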

Memory Allocation Scheme for Reducing False Sharing on Multiprocessor Systems (다중처리기 시스템에서 거짓 공유 완화를 위한 메모리 할당 기법)

  • Han, Boo-Hyung; Cho, Seong-Je
    • Journal of KIISE: Computer Systems and Theory / v.27 no.4 / pp.383-393 / 2000
  • In shared memory multiprocessor systems, false sharing occurs when several independent data objects that are not shared, but are accessed by different processors, are allocated to the same coherency unit of memory. False sharing is one of the major factors that may degrade the performance of memory coherency protocols. This paper presents a new shared memory allocation scheme to reduce false sharing in parallel applications where a master processor controls the allocation of all shared objects. Our scheme initially allocates objects to a temporary address space and later places each object in the address space of the processor that first accesses it. Its goal is to allocate independent objects that may have different access patterns to different pages. We use execution-driven simulation of real parallel applications to evaluate the effectiveness of our scheme. Experimental results show that our scheme reduces a considerable number of false sharing faults with low overhead.
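
The deferred placement idea above (allocate temporarily, then fix the object's home when a processor first accesses it) can be sketched roughly as follows. The two-stage interface and the relocation-by-copy are assumptions made for illustration, and synchronization on the first touch is omitted.

    /* Toy sketch of deferred placement: the master allocates objects into a
     * temporary location, and each object is later copied into a page-aligned
     * block owned by the processor that touches it first.  Interfaces are
     * invented for illustration; this is not the paper's scheme. */
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    typedef struct {
        void  *tmp;      /* temporary location chosen by the master   */
        void  *final;    /* page-aligned location fixed on first touch */
        size_t size;
    } shared_obj;

    void *first_touch(shared_obj *o)
    {
        if (o->final == NULL) {          /* synchronization omitted for brevity */
            size_t page = (size_t)sysconf(_SC_PAGESIZE);
            if (posix_memalign(&o->final, page, o->size) == 0)
                memcpy(o->final, o->tmp, o->size);   /* relocate near the toucher */
        }
        return o->final;
    }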

A Processor Allocation Policy using Program Characteristics on Shared Bus (공유 버스상에서 프로그램 특성을 사용한 프로세서 할당 정책)

  • Jeong, In-Beom; Lee, Jun-Won
    • Journal of KIISE: Computer Systems and Theory / v.26 no.9 / pp.1073-1082 / 1999
  • In this paper, an adaptive processor allocation policy is proposed to make effective use of the processors in a system. To enhance parallelism, the number of processors used in parallel computing is generally increased. However, increasing the number of processors changes the grain size of the parallel program and therefore affects cache performance. In particular, when a bandwidth-limited shared bus is employed, increasing the number of processors significantly increases contention for the bus, and the added computing power is offset by the time processors spend waiting for it. The proposed adaptive processor allocation policy gathers information about the distribution of processors waiting on the shared bus over an interval of the program's execution and, based on this information, changes the number of processors working in parallel during the run. Simulation results show that the adaptive policy finds a suitable, near-optimal number of processors according to the bus traffic characteristics of each program. Although it performs slightly worse than the best fixed number of processors, it raises processor utilization and thus contributes to more effective use of the system.
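
A rough sketch of the feedback idea in this policy: observe how much of the last phase the workers spent waiting, then shrink or grow the processor count for the next phase. The thresholds and the stall metric below are invented; the paper's policy specifically samples the distribution of processors waiting on the shared bus.

    /* Toy feedback loop for adaptive processor allocation: if too many
     * workers were observed waiting during the last phase, run the next
     * phase with fewer threads; if almost none were waiting, add one.
     * Thresholds and the stall metric are invented for illustration. */
    #include <omp.h>

    int adapt_num_threads(int current, double waiting_fraction)
    {
        if (waiting_fraction > 0.30 && current > 1)
            return current - 1;          /* bus contention dominates: shrink */
        if (waiting_fraction < 0.05 && current < omp_get_max_threads())
            return current + 1;          /* little contention: try one more  */
        return current;                  /* keep the current allocation      */
    }

    /* Usage sketch, before each parallel phase:
     *   nthreads = adapt_num_threads(nthreads, measured_waiting_fraction);
     *   omp_set_num_threads(nthreads);
     */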

On-the-fly Detection of the First Races for Shared-Memory Parallel Programs with Ordered Synchronization (순서적 동기화를 포함하는 공유 메모리 병렬프로그램에서의 수행중 최초경합 탐지 기법)

  • Park, Hui-Dong; Jeon, Yong-Gi
    • Journal of KIISE: Computer Systems and Theory / v.26 no.8 / pp.884-894 / 1999
  • Detecting races is important in debugging shared-memory parallel programs that contain ordered synchronization and nested parallelism, because races result in unintended nondeterministic executions of the programs. Detecting the first races is especially important, because removing them may make other races disappear; it is even possible that all reported races would disappear once the first races are removed. This paper presents a new two-pass on-the-fly algorithm that detects the first races, i.e., the races that occur first in a particular execution of a shared-memory parallel program. Experiments were carried out with two certified benchmark programs executed in a High Performance Fortran environment to study the parameters that should be considered in parallel program debugging, and they show the efficiency of the technique.

A Feasibility Design of PEMFC Parallel Operation for a Fuel Cell Generation System

  • Kang, Hyun-Soo; Choe, Gyu-Yeong; Lee, Byoung-Kuk; Hur, Jin
    • Journal of Electrical Engineering and Technology / v.3 no.3 / pp.408-421 / 2008
  • In this paper, parallel operation of a fuel cell (FC) generation system is introduced and designed in order to increase the capacity of distributed generation based on a proton exchange membrane fuel cell (PEMFC) system. The parallel-operated PEMFC generation system consists of two PEMFC systems, two dc/dc boost converters with a shared dc link, and a grid-connected dc/ac inverter for embedded generation. The system requirements for parallel-operated generation using the PEMFC system are also described. Aspects related to the mechanical (MBOP) and electrical (EBOP) components, size, and system complexity of the distributed generation system are explained in order to design an optimal distributed generation system using a PEMFC. An optimal controller design for the parallel operation of the converters is suggested, and informative simulation and experimental results are provided.

Memory Behavior in Scientific vs. Commercial Applications

  • Kim, Taegyoun; Heejung Wang; Lee, Kangwoo
    • Proceedings of the IEEK Conference / 1999.11a / pp.421-425 / 1999
  • As the market for multiprocessor systems running commercial applications grows, parallel systems, especially cache-coherent shared-memory multiprocessors that have conventionally been designed for scientific applications, need to be tuned differently to achieve the best performance in this new application area. In this paper, an in-depth investigation of memory behavior, the primary cause of the performance differences, is made. We chose representative benchmarks from the scientific and commercial application areas. After running execution-driven simulations of bus-based cache-coherent shared-memory multiprocessors, we observed significant differences and conclude that such systems must be designed carefully, and differently, to achieve the best performance for each class of applications.

An Efficient Processor Synchronization Scheme on Shared Memory Multiprocessor (공유메모리 다중처리기에서 효율적인 프로세서 동기화 기법)

  • 윤석한; 원철호; 김덕진
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.5 / pp.683-692 / 1995
  • Many kinds of large-scale multiprocessing and parallel-processing systems have recently been developed. Contention for shared data among multiple processors may degrade system performance, so processor synchronization has become one of the important issues in these systems. To address it, many software and hardware schemes based on spin locks have been proposed. Although software schemes are easy to implement, hardware schemes are preferred in many systems for optimized performance. This paper proposes an efficient processor synchronization scheme, called QCX, and describes its design considerations, hardware, algorithm, and protocol. The performance of QCX was evaluated against QOLB [5] and LBP [7] using simulation, which measured the average execution time of a workload while varying the number of processors and the contention on shared variables. The results show that the performance of QCX is the best when practicability is considered. QCX is more efficient than QOLB and LBP in two respects: first, its hardware is simpler and more cost-effective because the cache structure need not be changed; second, it is more general because it uses a generic atomic instruction.
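
For context on the spin-lock-based schemes this paper builds on and compares against, below is a minimal test-and-test-and-set lock written with a generic atomic exchange. It illustrates the baseline software technique only; QCX's hardware queueing is not represented here.

    /* Minimal test-and-test-and-set spin lock built on a generic atomic
     * exchange instruction.  This is a baseline illustration of spin-lock
     * synchronization, not QCX itself. */
    #include <stdatomic.h>

    typedef struct { atomic_int locked; } spinlock_t;

    static void spin_lock(spinlock_t *l)
    {
        for (;;) {
            /* Spin on a plain read first to avoid hammering the shared bus
             * with atomic operations while the lock is held. */
            while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ;
            if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
                return;                   /* acquired the lock */
        }
    }

    static void spin_unlock(spinlock_t *l)
    {
        atomic_store_explicit(&l->locked, 0, memory_order_release);
    }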
