• Title/Summary/Keyword: Parallel computation

Parallelism point selection in nested parallelism situations with focus on the bandwidth selection problem (평활량 선택문제 측면에서 본 중첩병렬화 상황에서 병렬처리 포인트선택)

  • Cho, Gayoung;Noh, Hohsuk
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.383-396 / 2018
  • Various parallel processing R packages are used for fast processing and the analysis of big data. Parallel processing is used when the work can be decomposed into tasks that are non-interdependent. In some cases, each task decomposed for parallel processing can also be decomposed into non-interdependent subtasks. We have to choose whether to parallelize the decomposed tasks in the first step or to parallelize the subtasks in the second step when facing nested parallelism situations. This choice has a significant impact on the speed of computation; consequently, it is important to understand the nature of the work and decide where to do the parallel processing. In this paper, we provide an idea of how to apply parallel computing effectively to problems by illustrating how to select a parallelism point for the bandwidth selection of nonparametric regression.
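
The choice described in this abstract can be illustrated with a minimal Python sketch (the paper itself works with R packages). Everything in it is an assumption made for illustration: leave-one-out cross-validation of a Gaussian-kernel Nadaraya-Watson estimator stands in for the bandwidth selection problem, the function names are hypothetical, and a thread pool is used only to keep the example self-contained; CPU-bound work would normally use a process pool. The outer tasks are the candidate bandwidths, and the inner subtasks are the held-out points.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def loo_residual(i, xs, ys, h):
    """Squared leave-one-out error of a Nadaraya-Watson fit at point i."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == i:
            continue  # hold out the i-th observation
        w = math.exp(-0.5 * ((xs[i] - xj) / h) ** 2)  # Gaussian kernel weight
        num += w * yj
        den += w
    pred = num / den if den > 0 else 0.0
    return (ys[i] - pred) ** 2


def cv_score(h, xs, ys, inner_pool=None):
    """Mean leave-one-out error for bandwidth h; subtasks are data points."""
    idx = range(len(xs))
    if inner_pool is None:
        errs = [loo_residual(i, xs, ys, h) for i in idx]
    else:
        # Inner-level parallelism: one subtask per held-out point.
        errs = list(inner_pool.map(lambda i: loo_residual(i, xs, ys, h), idx))
    return sum(errs) / len(xs)


def select_bandwidth(grid, xs, ys, level="outer", workers=4):
    """Pick the bandwidth with the lowest CV score, parallelizing one level."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        if level == "outer":
            # Outer-level parallelism: one task per candidate bandwidth.
            scores = list(pool.map(lambda h: cv_score(h, xs, ys), grid))
        else:
            scores = [cv_score(h, xs, ys, inner_pool=pool) for h in grid]
    return grid[min(range(len(grid)), key=scores.__getitem__)]
```

Both levels return the same bandwidth; only the running time differs. Which level is faster depends on how many outer tasks there are relative to the number of workers and on the per-task overhead.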

Parallel Genetic Algorithm-Tabu Search Using PC Cluster System for Optimal Reconfiguration of Distribution Systems (배전계통 최적 재구성 문제에 PC 클러스터 시스템을 이용한 병렬 유전 알고리즘-타부 탐색법 구현)

  • Mun Kyeong-Jun;Song Myoung-Kee;Kim Hyung-Su;Kim Chul-Hong;Park June Ho;Lee Hwa-Seok
    • The Transactions of the Korean Institute of Electrical Engineers A / v.53 no.10 / pp.556-564 / 2004
  • This paper presents an application of a parallel Genetic Algorithm-Tabu Search (GA-TS) algorithm to the search for an optimal reconfiguration of a distribution system. The aim of reconfiguring a distribution system is to determine which switches to open so as to minimize losses in a radial distribution system, which is a discrete optimization problem. The problem has many constraints, and the optimal switch positions are very difficult to find because there are many local minima. This paper develops a parallel GA-TS algorithm for the reconfiguration of distribution systems. In parallel GA-TS, GA operators are executed on each processor. To prevent low-fitness solutions from appearing in the next generation, strings below the average fitness are saved in the tabu list. If the best fitness of the GA does not change for several generations, TS operators are executed for the top 10% of the population to enhance the local search capability. With the migration operation, the best string of each node is transferred to the neighboring node after a predetermined number of iterations. For parallel computing, we developed a PC cluster system consisting of 8 PCs. Each PC has a 2 GHz Pentium IV CPU and is connected to the others through switch-based Fast Ethernet. To show the usefulness of the proposed method, the developed algorithm was tested and compared on a distribution system from the reference paper. The simulation results show that the proposed algorithm is efficient and robust for the reconfiguration of distribution systems in terms of solution quality, speedup, efficiency, and computation time.

Parallel Genetic Algorithm-Tabu Search Using PC Cluster System for Optimal Reconfiguration of Distribution Systems

  • Mun Kyeong-Jun;Lee Hwa-Seok;Park June-Ho
    • KIEE International Transactions on Power Engineering / v.5A no.2 / pp.116-124 / 2005
  • This paper presents an application of the parallel Genetic Algorithm-Tabu Search (GA-TS) algorithm to the search for an optimal reconfiguration of distribution systems. The aim of the reconfiguration of distribution systems is to determine the appropriate switch positions to be opened for loss minimization in radial distribution systems, which is a discrete optimization problem. This problem has many constraints, and it is very difficult to find the optimal switch positions because of its numerous local minima. This paper develops a parallel GA-TS algorithm for the reconfiguration of distribution systems. In parallel GA-TS, GA operators are executed on each processor. To prevent low-fitness solutions from appearing in the next generation, strings below the average fitness are saved in the tabu list. If the best fitness of the GA does not change for several generations, TS operators are executed for the top 10% of the population to enhance the local search capability. With the migration operation, the best string of each node is transferred to the neighboring node after a predetermined number of iterations. For parallel computing, we developed a PC cluster system consisting of 8 PCs. Each PC has a 2 GHz Pentium IV CPU and is connected to the others through switch-based Fast Ethernet. To demonstrate the usefulness of the proposed method, the developed algorithm was tested and compared on a distribution system from the reference paper. The simulation results show that the proposed algorithm is efficient and robust for the reconfiguration of distribution systems in terms of solution quality, speedup, efficiency, and computation time.
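
The migration step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the nodes are assumed to form a ring, the migrant is assumed to replace the worst string of the receiving node, and `migrate_ring` and `fitness` are hypothetical names.

```python
def migrate_ring(populations, fitness):
    """Copy each node's best string to the next node in a ring.

    Assumption: the incoming string overwrites the receiver's worst string.
    `populations` is a list of per-node lists of strings (any sequence type).
    """
    # Select all migrants first so one migration does not influence another.
    bests = [max(pop, key=fitness) for pop in populations]
    n = len(populations)
    for k in range(n):
        dest = populations[(k + 1) % n]  # neighboring node in the ring
        worst = min(range(len(dest)), key=lambda i: fitness(dest[i]))
        dest[worst] = bests[k]
    return populations
```

In a cluster setting each sub-population would live on its own PC and the transfer would go over the network; here the nodes are plain lists so that the data flow is easy to follow.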

Analysis of Distributed Computational Loads in Large-scale AC/DC Power System using Real-Time EMT Simulation (대규모 AC/DC 전력 시스템 실시간 EMP 시뮬레이션의 부하 분산 연구)

  • In Kwon, Park;Yi, Zhong Hu;Yi, Zhang;Hyun Keun, Ku;Yong Han, Kwon
    • KEPCO Journal on Electric Power and Energy / v.8 no.2 / pp.159-179 / 2022
  • A network often becomes complex, and multiple entities take charge of managing parts of the whole network. An example is a utility grid. While the entire grid is the responsibility of a single utility company, the network is often split into multiple subsections, and each subsection is assigned to the corresponding sub-organization within the company. The issue of how to form subsystems of adequate size with a minimum number of interconnections between them becomes more critical in real-time simulations, because a single computation unit, whether a high-speed conventional CPU core or an FPGA computational engine, can complete only a limited amount of computation within a given execution time. The issue worsens in real-time simulation, in which the computation must be precisely synchronized with the real-world clock. When the subject of the computation allows a longer execution time, i.e., a larger time step size, a larger portion of the network can be placed on one computation unit. This translates into a larger tolerable margin between the worst and the best case. In other words, even if the worst (largest) computational burden is orders of magnitude larger than the best (smallest) one, all the necessary computation can still be completed within the given time. The real-time requirement, however, makes the margin much smaller: the difference between the worst and the best should be as small as possible so that the computational load is evenly distributed. In addition, data exchange is essential in parallel computation and affects overall performance, and exchanging data takes time. The distribution of the computational load among multiple calculation units must therefore take communication into account. A satisfactory distribution raises the possibility of completing the necessary computation within a given time, which may be on the order of microseconds. This paper presents an effective way to split a given electrical network according to multiple criteria so as to distribute the entire computational load into a set of even (or nearly even) computational loads. With the proposed system splitting method, the heavy computational burden of a large-scale electrical network can be distributed to multiple calculation units, such as an RTDS real-time simulator, achieving more efficient use of the calculation units, a reduction of the necessary simulation time step, or both.

On the Design Technique and VLSI Structure for a Multiplierless Quincuncial Interpolation Filter (무곱셈 대각 보간 필터의 설계 및 VLSI 구현에 관한 연구)

  • 최진우;이상욱
    • Journal of the Korean Institute of Telematics and Electronics B / v.29B no.8 / pp.54-65 / 1992
  • A large number of multiplications is required for 2-D filtering of image data, making it difficult to implement a real-time quincuncial interpolator. In this paper, an efficient design technique and VLSI structures for a 2-D multiplierless filter are presented. In the filter design, by introducing an efficient scheme for discretizing the frequency response of the prototype filter, it is shown that a significant part of the computational burden required by conventional techniques, such as local search and branch-and-bound techniques, can be saved. In the case of a $5\times5$ filter, it is found that the design technique described in this paper saves about 80% of the computation time compared to the conventional methods, while providing comparable performance. For hardware implementation, two different VLSI structures for the 2-D multiplierless filter are also introduced: one for block parallel processing and the other for scan-line parallel processing. In both structures, the AP (area-period) figure improves over Wu's structure [4].

Mixed Lubrication Analysis of Parallel Thrust Bearing by Surface Topography (표면거칠기를 고려한 평행 스러스트 베어링의 혼합윤활 해석)

  • 이동길;임윤철
    • Proceedings of the Korean Society of Tribologists and Lubrication Engineers Conference / 2000.06a / pp.134-141 / 2000
  • The real area of contact, average film thickness, mean real pressure, and mean hydrodynamic pressure are investigated numerically in this study for the parallel thrust bearing. The model surface is generated numerically from a given autocorrelation function and a set of surface profile parameters. The average Reynolds equation, containing flow factors and a contact factor, is then applied to predict the effects of surface roughness in mixed lubrication regimes. In this equation, the flow factors are defined as correction terms that smooth out high-frequency surface roughness, and the contact factor is introduced to avoid having to compute the average film thickness directly. Therefore, the computation time to obtain $\bar{h}$ can be reduced.

An Alternating Implicit Block Overlapped FDTD (AIBO-FDTD) Method and Its Parallel Implementation

  • Pongpaibool, Pornanong;Kamo, Atsushi;Watanabe, Takayuki;Asai, Hideki
    • Proceedings of the IEEK Conference / 2002.07a / pp.137-140 / 2002
  • In this paper, a new algorithm for the two-dimensional (2-D) finite-difference time-domain (FDTD) method is presented. With this new method, the maximum time step size can be increased beyond the limit imposed by the Courant-Friedrichs-Lewy (CFL) condition. The new algorithm is adapted from the Alternating-Direction Implicit FDTD (ADI-FDTD) method. However, unlike the ADI-FDTD algorithm, the alternation is performed with respect to blocks of fields rather than with respect to each coordinate direction. Moreover, this method can be efficiently simulated with parallel computation, and it is more efficient than the conventional FDTD method in terms of CPU time. Numerical formulations are shown, and simulation results are presented to demonstrate the effectiveness and efficiency of the proposed method.

Mixed Lubrication Analysis of Parallel Thrust Bearing Considering Surface Roughness (표면거칠기를 고려한 평행 스러스트 베어링의 혼합윤활 해석)

  • 이동길;임윤철
    • Tribology and Lubricants / v.16 no.6 / pp.455-460 / 2000
  • The real area of contact, average film thickness, mean real pressure, and mean hydrodynamic pressure are investigated numerically in this study for the parallel thrust bearing. The model surface is generated numerically from a given autocorrelation function and a set of surface profile parameters. The average Reynolds equation, containing flow factors and a contact factor, is then applied to predict the effects of surface roughness in mixed lubrication regimes. In this equation, the flow factors are defined as correction terms that smooth out high-frequency surface roughness, and the contact factor is introduced to avoid having to compute the average film thickness directly. Therefore, the computation time to obtain $\bar{h}$ can be reduced.

A Scheduling Method on Parallel Computation Models with Limited Number of Processors Using Genetic Algorithms (프로세서의 수가 한정되어있는 병렬계산모델에서 유전알고리즘을 이용한 스케쥴링해법)

  • 성기석;박지혁
    • Journal of the Korean Operations Research and Management Science Society / v.23 no.2 / pp.15-27 / 1998
  • In parallel processing systems, a compiler partitions a loaded program into tasks, allocates the tasks to multiple processors, and schedules the tasks on each allocated processor. In this paper, we suggest a Genetic Algorithm (GA) based scheduling method to find an optimal allocation and sequence of tasks on each processor. The suggested method uses a chromosome consisting of a task sequence and a binary string, which represent the order and the number of tasks on each processor, respectively. Two correction algorithms are used to maintain the precedence constraints of the tasks in the chromosome. This scheduling method determines the optimal number of processors within the given limit and then finds the optimal schedule for each processor. Results from a computational experiment with the suggested method are given.

Parallel Algorithm for Matrix-Matrix Multiplication on the GPU (GPU 기반 행렬 곱셈 병렬처리 알고리즘)

  • Park, Sangkun
    • Journal of Institute of Convergence Technology / v.9 no.1 / pp.1-6 / 2019
  • Matrix multiplication is a fundamental mathematical operation with numerous applications across most scientific fields. In this paper, we present a parallel GPU algorithm for dense matrix-matrix multiplication using OpenGL compute shaders, which can serve as a fundamental building block for many high-performance computing applications. Experimental results on an NVIDIA Quadro 4000 show that the proposed algorithm runs about 208 times faster than a previous CPU algorithm and achieves 75 GFLOPS in single precision for dense matrices of size 4,096. Such performance shows that the algorithm is practical for real applications.
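
To put the reported 75 GFLOPS in perspective, the short Python sketch below converts a sustained rate into a running time. It relies on the usual convention that multiplying two dense n×n matrices costs about 2n³ floating-point operations (n³ multiplications and roughly n³ additions); this convention and the function names are assumptions for illustration and are not taken from the paper.

```python
def matmul_flops(n):
    """Approximate flop count for a dense n-by-n matrix product (2 * n^3)."""
    return 2 * n ** 3


def seconds_at(gflops, n):
    """Time to multiply two n-by-n matrices at a sustained rate in GFLOPS."""
    return matmul_flops(n) / (gflops * 1e9)
```

Under this convention, `seconds_at(75, 4096)` comes to roughly 1.8 seconds for one multiplication at the reported size.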