• Title/Summary/Keyword: task parallelism

A Representation for Multithreaded Data-parallel Programs : PCFG(Parallel Control Flow Graph) (다중스레드 데이타 병렬 프로그램의 표현 : PCFG(Parallel Control Flow Graph))

  • 김정환
    • Journal of KIISE:Computer Systems and Theory / v.29 no.12 / pp.655-664 / 2002
  • In many data-parallel applications, massive parallelism can be easily extracted through data distribution, but this often causes very long communication latency. This paper shows that task parallelism extracted from data-parallel programs can be exploited to hide such communication latency. Unlike most previous research, which exploited task parallelism without considering data parallelism, this paper describes the exploitation of task parallelism in the context of data parallelism. A PCFG (Parallel Control Flow Graph) is proposed to represent a multithreaded program consisting of a few task threads, each of which can include a few data-parallel loops. The paper also describes how a PCFG is constructed from a source data-parallel program through an HDG (Hierarchical Dependence Graph) and how the multithreaded program can be constructed from the PCFG.

Effect of Representation Methods on Time Complexity of Genetic Algorithm based Task Scheduling for Heterogeneous Network Systems

  • Kim, Hwa-Sung
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.1 no.1 / pp.35-53 / 1997
  • This paper analyzes the time complexity of Genetic Algorithm based Task Scheduling (GATS), which is designed for scheduling parallel programs with diverse embedded parallelism types in heterogeneous network systems. The time complexity analysis is based on two representation methods (REIA, REIS), proposed in this paper to encode the scheduling information. The heterogeneous network systems consist of a set of loosely coupled parallel and vector machines connected via a high-speed network. The objective of heterogeneous network computing is to solve computationally intensive problems that have several types of parallelism on a suite of high-performance and parallel machines in a manner that best utilizes the capabilities of each machine. Therefore, when scheduling in heterogeneous network systems, the matching of parallelism characteristics between tasks and parallel machines should be carefully handled in order to obtain more speedup. This paper shows how parallelism type matching affects the time complexity of GATS.

Generic Scheduling Method for Distributed Parallel Systems (분산병렬 시스템에서 유전자 알고리즘을 이용한 스케쥴링 방법)

  • Kim, Hwa-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.1B / pp.27-32 / 2003
  • This paper presents the Genetic Algorithm based Task Scheduling (GATS) method for scheduling programs with diverse embedded parallelism types in Distributed Parallel Systems, which consist of a set of loosely coupled parallel and vector machines connected via high-speed networks. Distributed parallel processing tries to solve computationally intensive problems that have several types of parallelism on a suite of high-performance and parallel machines in a manner that best utilizes the capabilities of each machine. When scheduling in distributed parallel systems, the matching of parallelism characteristics between tasks and parallel machines, rather than load balancing, should be carefully handled, together with the minimization of communication cost, in order to obtain more speedup. This paper proposes the based initialization methods for an initial population and the knowledge-based mutation methods to accommodate parallelism type matching in genetic algorithms.

Scheduling Scheme for Compound Nodes of Hierarchical Task Graph using Thread (스레드를 이용한 계층적 태스크 그래프(HTG)의 복합 노드 스케쥴링 기법)

  • Kim, Hyun-Chul;Kim, Hyo-Cheol
    • Journal of KIISE:Computer Systems and Theory / v.29 no.8 / pp.445-455 / 2002
  • In this paper, we present a new task scheduling scheme for the efficient execution of the tasks of compound nodes of a hierarchical task graph (HTG) on shared-memory systems. The proposed scheme for exploiting functional parallelism is autoscheduling, in which the processors themselves perform the scheduling without any dedicated global scheduler. To adapt the proposed scheduling scheme to various platforms, including uni-processor systems, Java threads were used for the implementation, and the performance was analyzed in comparison with a conventional bit vector method. The experimental results showed that the proposed method was more efficient in execution time and exhibited good load balancing for the experimental parameter values used. Furthermore, the memory size could be reduced with the proposed algorithm compared with a conventional scheme.

GPU-Based Acceleration of Quantum-Inspired Evolutionary Algorithm (GPU를 이용한 Quantum-Inspired Evolutionary Algorithm 가속)

  • Ryoo, Ji-Hyun;Park, Han-Min;Choi, Ki-Young
    • Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.8 / pp.1-9 / 2012
  • The Quantum-Inspired Evolutionary Algorithm (QEA) contains sufficient data-level parallelism to be naturally accelerated on GPUs. For an efficient reduction of execution time, however, careful task mapping is needed to properly reflect the characteristics of the CPU and the GPU. Furthermore, when deciding which part of the application should run on the GPU, we need to consider the data transfer between the CPU and GPU memory spaces as well as the data-level parallelism. In addition, the use of zero-copy host memory, a proper choice of execution configuration, and a thread organization that takes memory coalescing into account are important to further reduce the execution time. With all these techniques, we could run QEA 3.69 times faster on average than on a multi-threaded CPU for the 0-1 knapsack problem with 30,000 items.

Task Parallelism System of Application for Multicore-Based Mobile Platform (멀티코어 기반 모바일 플랫폼을 위한 애플리케이션의 태스크 병렬화 시스템)

  • Lim, Geunsik;Lee, Seho;Eom, Young Ik
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.6 / pp.521-530 / 2013
  • This paper proposes a task parallelism system (BioMP) to improve the execution time of applications on multicore-based mobile devices. When application developers add parallel specification functions to existing software, the proposed system supports parallel processing of threads while maintaining compatibility. BioMP adapts the software so that an existing large-scale source base can take advantage of the multicore architecture. In our experiment, the approach improved application execution time by up to about 64% compared with the existing system in a quad-core multicore environment. In addition, BioMP does not require any additional modification of the mobile platform because BioMP is an independent component. Consequently, when application developers release multicore-aware applications to the application store, users can run them immediately without any modification of the mobile device.

Proposition and Evaluation of Parallelism-Independent Scheduling Algorithms for DAGs of Tasks with Non-Uniform Execution Time

  • Kirilka Nikolova;Atusi Maeda;Sowa, Masa-Hiro
    • Proceedings of the IEEK Conference / 2000.07a / pp.289-293 / 2000
  • We propose two new algorithms for parallelism-independent scheduling. The machine code generated by a compiler that uses these algorithms in its scheduling phase is parallelism-independent code, executable in minimum time regardless of the number of processors in the parallel computer. Our new algorithms have the following phases: finding the minimum number of processors on which the program can be executed in minimal time, scheduling by a heuristic algorithm for this predefined number of processors, and serialization of the parallel schedule according to the earliest start time of the tasks. At run time, tasks are taken from the serialized schedule and assigned to the processor that allows the earliest start time of the task. The order of the tasks decided at compile time is not changed at run time regardless of the number of available processors, which means there is no out-of-order issue or execution. The scheduling is done predominantly at compile time, and dynamic scheduling is reduced to the allocation of tasks to processors. We evaluate the proposed algorithms by comparing them in terms of schedule length to the CP/MISF algorithm. For performance evaluation we use both randomly generated DAGs (directed acyclic graphs) and DAGs representing real applications. From a practical point of view, the algorithms we propose can be successfully used for scheduling programs for in-order superscalar processors and shared-memory multiprocessor systems. Superscalar processors with any number of functional units can execute the parallelism-independent code in minimum time without the need for dynamic scheduling and out-of-order issue hardware. This means that the use of our algorithms will reduce both the hardware complexity of the processors and the run-time overhead related to dynamic scheduling.
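
The two-stage idea in the abstract above (a fixed task order decided at compile time, then run-time assignment of each task to the processor that allows the earliest start) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the heuristic scheduling phase is omitted, earliest start times are computed as if processors were unlimited, and the names `serialize`, `run`, `tasks`, and `deps` are hypothetical.

```python
def serialize(tasks, deps):
    """Compile-time step (simplified assumption): order tasks by their
    earliest start time on unlimited processors, ties broken by name.
    tasks maps name -> positive duration; deps maps name -> predecessors."""
    est = {}

    def earliest(t):
        if t not in est:
            est[t] = max((earliest(p) + tasks[p] for p in deps.get(t, ())),
                         default=0)
        return est[t]

    # With positive durations a predecessor always has a smaller earliest
    # start time than its successor, so this order respects dependencies.
    return sorted(tasks, key=lambda t: (earliest(t), t))


def run(order, tasks, deps, num_procs):
    """Run-time step: take tasks in the fixed order and place each one on
    the processor that allows the earliest start. Returns the makespan."""
    free_at = [0] * num_procs   # time at which each processor becomes idle
    finish = {}                 # finish time of every task placed so far
    for t in order:
        ready = max((finish[p] for p in deps.get(t, ())), default=0)
        proc = min(range(num_procs), key=lambda i: max(free_at[i], ready))
        start = max(free_at[proc], ready)
        finish[t] = start + tasks[t]
        free_at[proc] = finish[t]
    return max(finish.values())


# Hypothetical four-task DAG: c depends on a; d depends on a and b.
tasks = {"a": 2, "b": 3, "c": 1, "d": 2}
deps = {"c": ["a"], "d": ["a", "b"]}
order = serialize(tasks, deps)
makespans = {p: run(order, tasks, deps, p) for p in (1, 2, 3)}
```

The same serialized order is reused for every processor count; only the placement step depends on how many processors are available at run time.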

Parallelism point selection in nested parallelism situations with focus on the bandwidth selection problem (평활량 선택문제 측면에서 본 중첩병렬화 상황에서 병렬처리 포인트선택)

  • Cho, Gayoung;Noh, Hohsuk
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.383-396 / 2018
  • Various parallel processing R packages are used for fast processing and the analysis of big data. Parallel processing is used when the work can be decomposed into tasks that are non-interdependent. In some cases, each task decomposed for parallel processing can also be decomposed into non-interdependent subtasks. We have to choose whether to parallelize the decomposed tasks in the first step or to parallelize the subtasks in the second step when facing nested parallelism situations. This choice has a significant impact on the speed of computation; consequently, it is important to understand the nature of the work and decide where to do the parallel processing. In this paper, we provide an idea of how to apply parallel computing effectively to problems by illustrating how to select a parallelism point for the bandwidth selection of nonparametric regression.
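
The choice between the two parallelism points can be sketched as follows. The paper works with R packages; this Python sketch is only an illustration, and it assumes that bandwidth selection means leave-one-out cross-validation of a Nadaraya-Watson estimator over a grid of candidate bandwidths. All function names and data are hypothetical, and thread pools are used only to keep the example self-contained (a real speedup for CPU-bound work would normally need process-based workers).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def loo_error(h, i, xs, ys):
    """Squared leave-one-out error at point i for bandwidth h
    (Nadaraya-Watson estimate with a Gaussian kernel)."""
    num = den = 0.0
    for j, (x, y) in enumerate(zip(xs, ys)):
        if j != i:
            w = math.exp(-0.5 * ((xs[i] - x) / h) ** 2)
            num += w * y
            den += w
    return (ys[i] - num / den) ** 2


def cv_score(h, xs, ys, pool=None):
    """Mean leave-one-out error; the subtasks run on pool if one is given."""
    mapper = pool.map if pool is not None else map
    errors = mapper(lambda i: loo_error(h, i, xs, ys), range(len(xs)))
    return sum(errors) / len(xs)


def select_outer(bandwidths, xs, ys, workers=4):
    """First-level parallelism: one parallel task per candidate bandwidth."""
    with ThreadPoolExecutor(workers) as pool:
        scores = list(pool.map(lambda h: cv_score(h, xs, ys), bandwidths))
    return bandwidths[scores.index(min(scores))]


def select_inner(bandwidths, xs, ys, workers=4):
    """Second-level parallelism: bandwidths are handled one after another,
    and the left-out observations of each are processed in parallel."""
    with ThreadPoolExecutor(workers) as pool:
        scores = [cv_score(h, xs, ys, pool) for h in bandwidths]
    return bandwidths[scores.index(min(scores))]


# Hypothetical data: a sine curve with a small deterministic perturbation.
xs = [i / 20 for i in range(40)]
ys = [math.sin(2 * math.pi * x) + 0.1 * math.cos(37 * x) for x in xs]
grid = [0.05, 0.1, 0.2, 0.5]
best_outer = select_outer(grid, xs, ys)
best_inner = select_inner(grid, xs, ys)
```

Both variants select the same bandwidth; they differ only in where the work is split. The outer variant creates few large tasks, the inner variant many small ones, and that granularity is what drives the speed difference the abstract refers to.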

Comparison of Performance in Classification, Seriation, and Grouping of Kin Terms in Korean Children (한국아동의 친척명 분류, 서열, 군집 수행의 비교)

  • YI, Soon Hyung
    • Korean Journal of Child Studies / v.9 no.2 / pp.133-156 / 1988
  • This study investigated developmental change with reference to continuity theory in the acquisition of concepts of kin relation, task difficulty with reference to cognitive complexity, and interrelationships in the performance of cognitive tasks of kinship concepts with reference to cognitive parallelism. The subjects were randomly selected 6-, 8-, 10-, and 12-year-old children attending kindergartens or elementary schools in Seoul. The schools were located in various residential areas regarded as either middle or lower class. The 81 boys and 80 girls participated in 3 experiments on classification, seriation, and grouping. The instrument for the classification, seriation, and grouping tasks was composed of 10 10cm black on white line drawings of the head and upper torso area of persons in kin relationship. The data were analyzed with MANOVA. A significant age effect was found in the 3 quasi-experiments. There were significant effects on task difficulty. The biosocial power distribution indirectly influenced children's acquisition of kin relational concepts; that is, children performed better in male-kin than in female-kin tasks. There was a high correlation in performance between the 3 cognitive tasks. These findings support the continuity theory (except for seriation), a model which arranges kin-names in order of cognitive load, the centric status of men in society, and the theory of cognitive developmental parallelism.

Performance Comparisons on Processor Allocation Algorithms by Using Simulation Techniques (시뮬레이션 기법을 이용한 프로세러 할당 알고리즘들의 성능비교)

  • 최준구
    • Journal of the Korea Society for Simulation / v.3 no.1 / pp.43-53 / 1994
  • With the remarkable progress of hardware technologies, multiprocessor systems equipped with thousands of processors will be available in the near future. In order to increase the performance of these systems, many processor allocation algorithms have been proposed. However, few studies have compared the performance of these algorithms. In this paper, simulation techniques are used to compare the performance of processor allocation algorithms that have proved to be useful. These are: an algorithm using equipartition, an algorithm using average parallelism, an algorithm using execution signatures, and an algorithm using the number of tasks in a task precedence graph. Simulation shows that the algorithm using execution signatures performs best, while the algorithm using average parallelism performs worst when a small number of processors is allocated. Surprisingly, the algorithm using equipartition performs well despite the fact that it has the smallest overhead. Overall, it can be recommended that the algorithm using equipartition be used without any execution history and that the algorithm using execution signatures be used with some execution history.
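
As an illustration of the simplest policy in that comparison, the sketch below implements one common reading of equipartition: processors are divided as evenly as possible among the jobs, no job receives more processors than it can use, and processors a job cannot use are redistributed to the others. The abstract does not define the policy in detail, so the cap on each job, the redistribution rule, and the name `equipartition` are assumptions.

```python
def equipartition(num_procs, max_parallelism):
    """Split num_procs as evenly as possible among jobs.
    max_parallelism maps job -> largest number of processors it can use
    (an assumed input; the policy itself needs no execution history)."""
    alloc = {job: 0 for job in max_parallelism}
    active = [job for job, cap in max_parallelism.items() if cap > 0]
    remaining = num_procs
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)  # equal share this round
        for job in list(active):
            if remaining == 0:
                break
            give = min(share, max_parallelism[job] - alloc[job], remaining)
            alloc[job] += give
            remaining -= give
            if alloc[job] == max_parallelism[job]:
                active.remove(job)  # job is saturated; others get the rest
    return alloc


# Hypothetical workload: job A can use at most 2 processors.
allocation = equipartition(8, {"A": 2, "B": 10, "C": 10})
```

Here job A is capped at 2 processors, and the remaining 6 are split evenly between B and C.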