• Title/Summary/Keyword: parallel/distribute computing

Work Allocation Methods and Performance Comparisons on the Virtual Parallel Computing System based on the IBM Aglets (IBM Aglets를 기반으로 하는 가상 병렬 컴퓨팅 시스템에서 작업 할당 기법과 성능 비교)

  • Kim, Kyong-Ha;Kim, Young-Hak;Oh, Gil-Ho
    • Journal of KIISE:Computing Practices and Letters / v.8 no.4 / pp.411-422 / 2002
  • Recently, there has been active research on the VPCS (Virtual Parallel Computing System) based on multiple agents. The VPCS uses personal computers or workstations dispersed across the internet, rather than a high-cost supercomputer, to solve complex problems that require a huge number of calculations. It can be built from either homogeneous or heterogeneous computers, depending on the resources available on the internet. In this paper, we propose a new method for distributing worker agents and work packages efficiently on a VPCS based on IBM Aglets. Previous methods mainly use the master-slave pattern for distributing worker agents and work packages. However, in these methods the workload at the central master increases dramatically as the number of agents increases. As a solution to this problem, our method appoints worker agents to distribute worker agents and work packages. The proposed method is evaluated in several ways on the VPCS, and its results show a notable improvement over those of the previous methods.
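
The bottleneck described in this abstract can be illustrated with a small sketch that counts how many dispatches the busiest node performs under a flat master-slave scheme and under a delegated scheme in which every node forwards to a few further workers. The function names and the fixed `fanout` parameter are assumptions made for illustration; this is not the paper's algorithm and does not use the Aglets API.

```python
from collections import deque

def flat_dispatch_load(num_workers):
    """Master-slave: the master dispatches every worker agent itself."""
    return num_workers

def tree_dispatch_load(num_workers, fanout=2):
    """Delegated distribution (hypothetical scheme): each node, master or
    worker, dispatches at most `fanout` further workers.
    Returns the largest number of dispatches made by any single node."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    sends = {0: 0}            # node id -> dispatches made; node 0 is the master
    frontier = deque([0])     # nodes that have not dispatched anyone yet
    next_id = 1
    while next_id <= num_workers:
        parent = frontier.popleft()
        for _ in range(fanout):
            if next_id > num_workers:
                break
            sends[parent] += 1
            sends[next_id] = 0
            frontier.append(next_id)
            next_id += 1
    return max(sends.values())
```

With 100 workers, the flat scheme leaves all 100 dispatches to the master, while the delegated scheme caps every node at `fanout` dispatches, at the cost of extra forwarding hops before the last worker starts.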

New GPU computing algorithm for wind load uncertainty analysis on high-rise systems

  • Cui, Wei;Caracoglia, Luca
    • Wind and Structures / v.21 no.5 / pp.461-487 / 2015
  • In recent years, the Graphics Processing Unit (GPU) has become a competitive computing technology in comparison with the standard Central Processing Unit (CPU) technology due to reduced unit cost, energy and computing time. This paper describes the derivation and implementation of GPU-based algorithms for the analysis of wind loading uncertainty on high-rise systems, in line with the research field of probability-based wind engineering. The study begins by presenting an application of the GPU technology to basic linear algebra problems to demonstrate advantages and limitations. Subsequently, Monte-Carlo integration and synthetic generation of wind turbulence are examined. Finally, the GPU architecture is used for the dynamic analysis of three high-rise structural systems under uncertain wind loads. In the first example the fragility analysis of a single degree-of-freedom structure is illustrated. Since fragility analysis employs sampling-based Monte Carlo simulation, it is feasible to distribute the evaluation of different random parameters among different GPU threads and to compute the results in parallel. In the second case the fragility analysis is carried out on a continuum structure, i.e., a tall building, in which double integration is required to evaluate the generalized turbulent wind load and the dynamic response in the frequency domain. The third example examines the computation of the generalized coupled wind load and response on a tall building in both along-wind and cross-wind directions. It is concluded that the GPU can perform computational tasks on average 10 times faster than the CPU.
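
The first example in this abstract relies on the fact that Monte Carlo samples are independent and can therefore be evaluated in separate threads. A minimal sketch of that idea follows, assuming a placeholder response model and hypothetical lognormal parameters (neither comes from the paper). CPU threads stand in for GPU threads here, so the sketch shows the independence of the chunks rather than any speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_exceedances(seed, n_samples, wind_speed, limit):
    """Evaluate one independent chunk of Monte Carlo samples."""
    rng = random.Random(seed)  # each chunk gets its own seeded generator
    exceed = 0
    for _ in range(n_samples):
        # Hypothetical random parameters: load coefficient and stiffness.
        coeff = rng.lognormvariate(0.0, 0.3)
        stiffness = rng.lognormvariate(0.0, 0.2)
        # Placeholder response model, not the paper's structural model.
        response = coeff * wind_speed ** 2 / stiffness
        if response > limit:
            exceed += 1
    return exceed

def fragility(wind_speed, limit, n_chunks=8, samples_per_chunk=5000):
    """Estimate P(response > limit | wind_speed) from independent chunks."""
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        counts = pool.map(
            lambda seed: count_exceedances(seed, samples_per_chunk, wind_speed, limit),
            range(n_chunks),
        )
        total = sum(counts)
    return total / (n_chunks * samples_per_chunk)
```

Because the chunks share no state, the same decomposition maps naturally onto GPU threads, with a final reduction that sums the per-chunk exceedance counts.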

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal / v.43 no.5 / pp.825-834 / 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

Efficient Workload Distribution of Photomosaic Using OpenCL into a Heterogeneous Computing Environment (이기종 컴퓨팅 환경에서 OpenCL을 사용한 포토모자이크 응용의 효율적인 작업부하 분배)

  • Kim, Heegon;Sa, Jaewon;Choi, Dongwhee;Kim, Haelyeon;Lee, Sungju;Chung, Yongwha;Park, Daihee
    • KIPS Transactions on Computer and Communication Systems / v.4 no.8 / pp.245-252 / 2015
  • Recently, parallel processing methods using accelerators have been introduced into high-performance computing and mobile computing. The photomosaic application can be parallelized by exploiting its inherent data parallelism and an accelerator. In this paper, we propose a way to distribute the workload of the photomosaic application across a heterogeneous CPU and GPU computing environment. That is, the photomosaic application is parallelized using both CPU and GPU resources with the asynchronous mode of OpenCL, and then the optimal workload distribution rate is estimated by measuring the execution time with CPU-only and GPU-only distribution rates. The proposed approach is simple but very effective, and can be applied to parallelize other applications in a heterogeneous CPU and GPU computing environment. Based on the experimental results, we confirm that, with the optimal workload distribution, performance in the heterogeneous computing environment improves by 141% compared with the GPU-only method.
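
One plausible way to turn CPU-only and GPU-only timings into a distribution rate is sketched below. It assumes that each device's execution time scales linearly with its share of the workload and that both devices run concurrently, so the best split makes them finish together. The abstract does not state the estimation formula, so this is an assumption, and the function names are hypothetical.

```python
def optimal_gpu_share(t_cpu_only, t_gpu_only):
    """Fraction of the workload to give the GPU so both devices finish together.

    t_cpu_only, t_gpu_only: measured times for the whole workload on one device.
    Assumes time is proportional to workload share (a simplifying assumption).
    """
    return t_cpu_only / (t_cpu_only + t_gpu_only)

def hybrid_time(t_cpu_only, t_gpu_only, gpu_share):
    """Completion time when the devices work concurrently on their shares."""
    cpu_time = (1.0 - gpu_share) * t_cpu_only
    gpu_time = gpu_share * t_gpu_only
    return max(cpu_time, gpu_time)

# Hypothetical timings: 12 s on the CPU alone, 4 s on the GPU alone.
share = optimal_gpu_share(12.0, 4.0)
total = hybrid_time(12.0, 4.0, share)
```

For these made-up timings the GPU receives 75% of the work and both devices finish after 3 s, compared with 4 s for the GPU alone.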

Indivisible load scheduling applied to Linear Programming (선형계획법을 적용한 임의 분할 불가능한 부하 분배계획)

  • Son, Kyung-Ho;Lee, Dal-Ho;Kim, Hyoung-Joog
    • 한국정보통신설비학회:학술대회논문집 / 2005.08a / pp.382-387 / 2005
  • There are many studies on the arbitrarily divisible load scheduling problem in a distributed computing network consisting of processors interconnected through communication links. It is not efficient to distribute arbitrarily the load that arrives at the system. In this paper, we study how to schedule when an arbitrarily indivisible load arrives at the system. We also address the case in which divisible and indivisible loads arrive at the network together, obtaining the optimal load distribution in a parallel processing system through scheduling based on linear programming.

A New Fast Algorithm for Short Range Force Calculation (근거리 힘 계산의 새로운 고속화 방법)

  • Lee, Sang-Hwan;Ahn, Cheol-O
    • 유체기계공업학회:학술대회논문집 / 2006.08a / pp.383-386 / 2006
  • In this study, we propose a new fast algorithm for calculating short-range forces in molecular dynamics. This algorithm uses a new hierarchical tree data structure that adapts well to the particle distribution. It can divide a parent cell into k daughter cells, and the tree structure is independent of the coordinate system and particle distribution. We investigated the characteristics and the performance of the tree structure as a function of k. For parallel computation, we used the orthogonal recursive bisection method for domain decomposition to distribute particles to each processor, and the numerical experiments were performed on a 32-node Linux cluster. We compared the performance of the oct-tree and the newly developed algorithm for different particle distributions, problem sizes, and numbers of processors. The comparison was performed using a tree-independent method, and the results are independent of computing platform, parallelization, and programming language. It was found that the new algorithm can reduce the computing cost for a large problem whose search range is short compared with the computational domain. However, there are only small differences in wall-clock time, because the proposed algorithm requires more time to construct its tree structure than the oct-tree does and the performance gain is small compared with the time for a single time-step calculation.
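
Orthogonal recursive bisection, the domain decomposition method named in this abstract, can be sketched generically as follows. This is a textbook-style version that splits at the median along the axis of largest extent and assumes a power-of-two number of parts; it is not the authors' implementation.

```python
def orb_partition(points, num_parts):
    """Split `points` (tuples of coordinates) into `num_parts` balanced groups.

    Each step cuts the current set at the median along the axis with the
    largest spatial extent, then recurses on both halves.
    """
    if num_parts < 1 or num_parts & (num_parts - 1):
        raise ValueError("num_parts must be a power of two")
    if len(points) < num_parts:
        raise ValueError("need at least one point per part")
    if num_parts == 1:
        return [list(points)]
    dims = len(points[0])
    # Pick the axis along which the points are most spread out.
    axis = max(
        range(dims),
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (orb_partition(ordered[:mid], num_parts // 2)
            + orb_partition(ordered[mid:], num_parts // 2))
```

Each returned group would be assigned to one processor. Cutting at the median keeps the particle counts balanced even when the particle distribution is non-uniform.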

A Hierarchical Server Structure for Parallel Location Information Search of Mobile Hosts (이동 호스트의 병렬적 위치 정보 탐색을 위한 서버의 계층 구조)

  • Jeong, Gwang-Sik;Yu, Heon-Chang;Hwang, Jong-Seon
    • Journal of KIISE:Computer Systems and Theory / v.28 no.1_2 / pp.80-89 / 2001
  • The development of mobile computing systems has given rise to new and previously unforeseen problems, such as information management for mobile hosts, disconnection of mobile hosts, and the low bandwidth of wireless communications. In particular, the location information management strategy for mobile hosts results in increased overhead in mobile computing systems. Because a mobile host moves, its address changes with its location, and this is handled by mapping the physical address to a virtual address. Since previously suggested strategies for mapping between physical and virtual addresses did not address the growing number of mobile hosts or the distribution of location information, they could not support scalability in mobile computing systems. Thus, to distribute the location information, we propose an advanced n-depth LiST (Location information Search Tree) and a parallel location search and update strategy based on it. The advanced n-depth LiST is logically a hierarchical structure that clusters the location information servers in a ring structure and reduces the location information search and update cost through a parallel search and update method. The experiment shows that, even when the distance between two communicating MHs (mobile hosts) is large, the advanced n-depth LiST performs well owing to the structural distribution of location information. Moreover, despite the reduction in the location information search cost, there was no increase in the location information update cost.

Multi-platform Visualization System for Earth Environment Data (지구환경 데이터를 위한 멀티플랫폼 가시화 시스템)

  • Jeong, Seokcheol;Jung, Seowon;Kim, Jongyong;Park, Sanghun
    • Journal of the Korea Computer Graphics Society / v.21 no.3 / pp.36-45 / 2015
  • Creating continuous high-definition images from very large volume data is an important research subject in engineering and the natural sciences. The need for software that helps analyze useful information in data has grown, as visualization techniques can effectively present visual information from high-resolution data. In this paper, we design a client-server-based multi-platform visualization system to effectively analyze and present earth environment data built from observation and prediction. The visualization server, composed of a cluster, transfers data to clients through parallel/distributed computing, and the client is developed to run on various platforms and visualize the data. In addition, we aim for a user-friendly program through multi-touch and sensor input, and produce realistic simulation images with an image-based lighting technique.

Scalable and Dynamically Reconfigurable Internet Service System Based on Clustered System (확장과 동적재구성 가능한 클러스터기반의 인터넷서비스 시스템)

  • Kim Dong Keun;Park Se Myung
    • Journal of Korea Multimedia Society / v.7 no.10 / pp.1400-1411 / 2004
  • Recently, the explosion in the number of internet users has required fundamental changes in the architecture of Web service systems, from single-server systems to clustered server systems, in parallel with efforts to improve the scalability of the single internet server system. However, current cluster-based server systems, for example the One-IP server system, are dedicated to a single application. A One-IP server system has clustered computing nodes with the same function and tries to distribute each request evenly across the clustered nodes based on the IP. In this paper, we implement a more useful application service platform. It works on a shared clustered server (back-end server) with an application server (front-end server) for a particular service. An application server provides its service by itself at low load, but as the load increases, it reconfigures itself with one or more available servers from the shared cluster and distributes the load evenly across the selected servers. We used PVM for effective management of the clustered server. We found that the implemented application service platform provides more stable and scalable operation and a remarkable performance improvement under dynamic load changes.

Parallel Video Processing Using Divisible Load Scheduling Paradigm

  • Suresh S.;Mani V.;Omkar S. N.;Kim H.J.
    • Journal of Broadcast Engineering / v.10 no.1 s.26 / pp.83-102 / 2005
  • The problem of video scheduling is analyzed in the framework of divisible load scheduling. A divisible load can be divided into any number of fractions (parts), which can be processed independently on the processors in a distributed computing system or network, as there are no precedence relationships. In video scheduling, a frame can be split into any number of fractions (tiles), which can be processed independently on the processors in the network; the results are then collected to recompose the single processed frame. The divisible load arrives at one of the processors in the network (the root processor), and the results of the computation are collected and stored in the same processor. In this problem, communication delay plays an important role. Communication delay is the time to send the load fractions to the other processors in the network, plus the time for the root processor to collect the results of the computation from them. The objective in this scheduling problem is to obtain the load fractions assigned to each processor in the network such that the processing time of the entire load is minimized. We derive a closed-form expression for the processing time by taking into consideration the communication delay in the load distribution process and the communication delay in the result collection process. Using this closed-form expression, we also obtain the optimal number of processors required to solve this scheduling problem. This scheduling problem is formulated as a linear programming problem, and its solution using a neural network is also presented. Numerical examples are presented for ease of understanding.
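
The idea of choosing load fractions so that the whole load finishes as early as possible can be sketched under a simplified model. Here the root sends fractions to the workers one after another over links with the same unit communication time `z`, worker `i` needs `w[i]` time per unit of load, and result collection is ignored. The paper's closed-form expression also covers the result-collection delay, so this is only an illustration of the equal-finish-time principle. The symbols `w`, `z`, and both function names are assumptions.

```python
def load_fractions(w, z):
    """Fractions that make all workers finish at the same time.

    Equal finish times for consecutive workers give
    alpha[i] * w[i] = alpha[i + 1] * (z + w[i + 1]).
    """
    raw = [1.0]
    for i in range(1, len(w)):
        raw.append(raw[-1] * w[i - 1] / (z + w[i]))
    total = sum(raw)
    return [r / total for r in raw]  # normalize so the fractions sum to 1

def finish_times(alpha, w, z):
    """Finish time of each worker under sequential distribution by the root."""
    times, sent = [], 0.0
    for a, wi in zip(alpha, w):
        sent += a * z            # root finishes sending this worker's fraction
        times.append(sent + a * wi)
    return times

# Hypothetical processing speeds and link delay.
w = [1.0, 2.0, 1.5]
z = 0.2
alpha = load_fractions(w, z)
times = finish_times(alpha, w, z)
```

In this model, if one worker finished earlier than another, shifting some load to it would shorten the overall time, which is why the fractions are chosen so that every entry of `times` is equal.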