• 제목/요약/키워드: Parallel computing

검색결과 812건 처리시간 0.026초

하이브리드 병렬 프로그램을 이용한 타키온 슈퍼컴퓨터의 성능 (Performance Characterization of Tachyon Supercomputer using Hybrid Multi-zone NAS Parallel Benchmarks)

  • 박남규;정윤수;이홍석
    • 한국정보통신학회논문지
    • /
    • 제14권1호
    • /
    • pp.138-144
    • /
    • 2010
  • 최근에 도입되어 운영되고 있는 타키온 1차 시스템은 쿼드코어 AMD 바로셀로나 노드로 구성된 고성능 슈퍼컴퓨터이다. 본 논문에서는 하이브리드 병렬화 기법을 도입한 프로그램 중 하나로 사용되고 있는 멀티존(Multi-zone) NAS 병렬 벤치마크(NPB)를 이용하여 타키온 성능 및 병렬 확장성을 검증하고자 한다. 하이브리드 병렬 성능 시험을 위하여 NPB-3.3 버전 BT-MZ의 B 및 C클래스를 사용하였으며, 실제로 타키온 시스템의 1024개의 프로세스까지 병렬 확장성을 테스트를 하였다. 프로세서 1024개 이상 이용한 하이브리드 병렬컴퓨팅 계산 결과는 국내 최초이다. 이러한 하이브리드 병렬화 기법은 타키온처럼 멀티코어 기술을 적용한 고성능 컴퓨팅 시스템에서 매우 효율적이고 유용한 병렬 성능 벤치마크가 될 수 있음을 기술하였다.

PC 클러스터 시스템 기반 병렬 PSO 알고리즘의 최적조류계산 적용 (Application of Parallel PSO Algorithm based on PC Cluster System for Solving Optimal Power Flow Problem)

  • 김종율;문경준;이화석;박준호
    • 전기학회논문지
    • /
    • 제56권10호
    • /
    • pp.1699-1708
    • /
    • 2007
  • The optimal power flow(OPF) problem was introduced by Carpentier in 1962 as a network constrained economic dispatch problem. Since then, the OPF problem has been intensively studied and widely used in power system operation and planning. In these days, OPF is becoming more and more important in the deregulation environment of power pool and there is an urgent need of faster solution technique for on-line application. To solve OPF problem, many heuristic optimization methods have been developed, such as Genetic Algorithm(GA), Evolutionary Programming(EP), Evolution Strategies(ES), and Particle Swarm Optimization(PSO). Especially, PSO algorithm is a newly proposed population based heuristic optimization algorithm which was inspired by the social behaviors of animals. However, population based heuristic optimization methods require higher computing time to find optimal point. This shortcoming is overcome by a straightforward parallel processing of PSO algorithm. The developed parallel PSO algorithm is implemented on a PC cluster system with 6 Intel Pentium IV 2GHz processors. The proposed approach has been tested on the IEEE 30-bus system. The results showed that computing time of parallelized PSO algorithm can be reduced by parallel processing without losing the quality of solution.

동적 네트워크 환경하의 분산 에이전트를 활용한 병렬 유전자 알고리즘 기법 (Applying Distributed Agents to Parallel Genetic Algorithm on Dynamic Network Environments)

  • 백진욱;방정원
    • 한국컴퓨터정보학회논문지
    • /
    • 제11권4호
    • /
    • pp.119-125
    • /
    • 2006
  • 네트워크를 통하여 서로 연결된 컴퓨팅 자원들의 집합을 분산 시스템이라고 정의할 수 있다. 최적화 문제 영역에서 가장 중요한 해결 기법 중에 하나인 병렬 유전자 알고리즘은 분산 시스템을 기반으로 하고 있다. 인터넷과 이동 컴퓨팅과 같은 동적 네트워크 환경 하에서 네트워크의 상태는 가변적으로 변할 수 있어 기존의 병렬 유전자 알고리즘을 분산 시스템에서 최적화 문제를 해결하기 위하여 그대로 사용하기에는 비효율적이다. 본 논문에서는 동적 네트워크 환경 하에서 분산 에이전트를 사용하여 병렬 유전자 알고리즘을 효율적으로 사용할 수 있는 기법을 제시한다.

  • PDF

Parallel Algorithm of Improved FunkSVD Based on Spark

  • Yue, Xiaochen;Liu, Qicheng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제15권5호
    • /
    • pp.1649-1665
    • /
    • 2021
  • In view of the low accuracy of the traditional FunkSVD algorithm, and in order to improve the computational efficiency of the algorithm, this paper proposes a parallel algorithm of improved FunkSVD based on Spark (SP-FD). Using RMSProp algorithm to improve the traditional FunkSVD algorithm. The improved FunkSVD algorithm can not only solve the problem of decreased accuracy caused by iterative oscillations but also alleviate the impact of data sparseness on the accuracy of the algorithm, thereby achieving the effect of improving the accuracy of the algorithm. And using the Spark big data computing framework to realize the parallelization of the improved algorithm, to use RDD for iterative calculation, and to store calculation data in the iterative process in distributed memory to speed up the iteration. The Cartesian product operation in the improved FunkSVD algorithm is divided into blocks to realize parallel calculation, thereby improving the calculation speed of the algorithm. Experiments on three standard data sets in terms of accuracy, execution time, and speedup show that the SP-FD algorithm not only improves the recommendation accuracy, shortens the calculation interval compared to the traditional FunkSVD and several other algorithms but also shows good parallel performance in a cluster environment with multiple nodes. The analysis of experimental results shows that the SP-FD algorithm improves the accuracy and parallel computing capability of the algorithm, which is better than the traditional FunkSVD algorithm.

병렬처리를 이용한 대규모 동적 시스템의 최적제어 (Optimal Control of Large-Scale Dynamic Systems using Parallel Processing)

  • 박기홍
    • 제어로봇시스템학회논문지
    • /
    • 제5권4호
    • /
    • pp.403-410
    • /
    • 1999
  • In this study, a parallel algorithm has been developed that can quickly solve the optiaml control problem of large-scale dynamic systems. The algorithm adopts the sequential quadratic programming methods and achieves domain decomposition-type parallelism in computing sensitivities for search direction computation. A silicon wafer thermal process problem has been solved using the algorithm, and a parallel efficiency of 45% has been achieved with 16 processors. Practical methods have also been investigated in this study as a way to further speed up the computation time.

  • PDF

컴퓨터 기하학을 위한 병렬계산 (Parallel Computing For Computational Geometry)

  • 오승준
    • 전자통신동향분석
    • /
    • 제4권1호
    • /
    • pp.93-117
    • /
    • 1989
  • Computational Geometry is concerned with the design and analysis of computational algorithms which solve geometry problems. Geometry problems have a large number of applications areas such as pattern recognition, image processing, computer graphics, VLSI design and statistics since they involve inherently geometric problems for which efficient algorithms have to be developed. Several parallel algorithms, based on various parallel computation models, have been proposed for solving geometric problems. We review the current status of the parallel algorithms in computational geometry.

A Parallel Iterative Algorithm for Solving The Eigenvalue Problem of Symmetric matrices

  • Baik, Ran
    • Journal of the Korean Society for Industrial and Applied Mathematics
    • /
    • 제4권2호
    • /
    • pp.99-110
    • /
    • 2000
  • This paper is devoted to the parallelism of a numerical matrix eigenvalue problem. The eigenproblem arises in a variety of applications, including engineering, statistics, and economics. Especially we try to approach the industrial techniques from mathematical modeling. This paper has developed a parallel algorithm to find all eigenvalues. It is contributed to solve a specific practical problem, a vibration problem in the industry. Also we compare the runtime between the serial algorithm and the parallel algorithm for the given problems.

  • PDF

New GPU computing algorithm for wind load uncertainty analysis on high-rise systems

  • Wei, Cui;Luca, Caracoglia
    • Wind and Structures
    • /
    • 제21권5호
    • /
    • pp.461-487
    • /
    • 2015
  • In recent years, the Graphics Processing Unit (GPU) has become a competitive computing technology in comparison with the standard Central Processing Unit (CPU) technology due to reduced unit cost, energy and computing time. This paper describes the derivation and implementation of GPU-based algorithms for the analysis of wind loading uncertainty on high-rise systems, in line with the research field of probability-based wind engineering. The study begins by presenting an application of the GPU technology to basic linear algebra problems to demonstrate advantages and limitations. Subsequently, Monte-Carlo integration and synthetic generation of wind turbulence are examined. Finally, the GPU architecture is used for the dynamic analysis of three high-rise structural systems under uncertain wind loads. In the first example the fragility analysis of a single degree-of-freedom structure is illustrated. Since fragility analysis employs sampling-based Monte Carlo simulation, it is feasible to distribute the evaluation of different random parameters among different GPU threads and to compute the results in parallel. In the second case the fragility analysis is carried out on a continuum structure, i.e., a tall building, in which double integration is required to evaluate the generalized turbulent wind load and the dynamic response in the frequency domain. The third example examines the computation of the generalized coupled wind load and response on a tall building in both along-wind and cross-wind directions. It is concluded that the GPU can perform computational tasks on average 10 times faster than the CPU.

인터넷 기반의 병렬 컴퓨팅을 위한 사용자 라이브러리 설계 및 성능 분석 (Design and Analysis of User's Libraries for Parallel Computing based on the Internet)

  • 신필섭;정준목;맹혜선;홍원기;김신덕
    • 한국정보처리학회논문지
    • /
    • 제6권11호
    • /
    • pp.2932-2945
    • /
    • 1999
  • As the Internet and Java technology have been growing up, parallel processing approach to utilize those idle resources connected to the Internet has become quite attractive. In this paper, JICE(Java Internet Computing Environment) was implemented as a parallel computing platform based on the Internet using multithreading and RMI mechanisms provided by Java. The basic model of JICE is constructed as three components, such as a client, a set of workers, and a broker. A worker communicates with other workers via a globally shared memory system. It provides users with master-slave programming model and a collection of library functions. The basic model of JICE is also extended as a multimanaging system. This multimanaging system is evaluated by analysis to show its effectiveness. According to numerical analysis and experiments with several benchmarks, it is shown that the performance of basic model depends on the shared memory reference ratio and user's library is a quite promising.

  • PDF

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제10권6호
    • /
    • pp.2648-2668
    • /
    • 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.