• Title/Summary/Keyword: 병렬시스템 (parallel systems)

Design of Parallel Migration Method of Mobile Agents Using an Object Replication (객체 복제를 통한 이동 에이전트의 병렬 이주 방식 설계)

  • Kim, Kwang-Jong;Lee, Yon-Sik
    • The KIPS Transactions:PartD / v.11D no.2 / pp.351-360 / 2004
  • Most mobile agents migrate among mobile agent systems by sequential node migration. In this case, however, if a problem such as a host fault or an obstacle occurs, the mobile agent falls into an infinite wait or an orphan state. It is therefore difficult to obtain the expected benefit over other distribution technologies, because the time required for networking between nodes increases. Much research has been performed to solve this problem. However, most methods decide node migration based on a passive routing table or on detour hosts, which have problems of their own, and research on reducing the total time required for networking is still insufficient. In this paper, to reduce the networking time of a mobile agent, we design an active routing table based on the information of implemented objects registered in the meta-table of a naming agent. In addition, for a user's keyword, we propose a replication model that replicates agent objects according to the information and number of object references in the meta-table. The replicated objects migrate to the mobile agent systems in parallel, which minimizes the time required for networking.

Study on LLVM application in Parallel Computing System (병렬 컴퓨팅 시스템에서 LLVM 응용 연구)

  • Cho, Jungseok;Cho, Doosan;Kim, Yongyeon
    • The Journal of the Convergence on Culture Technology / v.5 no.1 / pp.395-399 / 2019
  • To support various parallel computing systems, LLVM IR must be extended to support vector and matrix operations more efficiently, and the translation from LLVM IR to machine code must be designed as a new algorithm. As the IR example shows, LLVM IR is basically composed of RISC-style instructions, so RISC instructions are generated naturally, but vector instructions are not supported. New IR structures, instruction generation algorithms, and related extensions are needed to support vector and matrix operations more robustly. To this end, it is important to map each LLVM IR instruction to the appropriate instruction of the target (vector / matrix) architecture, which is the task of the instruction selection algorithm. This requires understanding the meaning of each LLVM IR instruction, comparing it with the meaning and syntax of each target instruction, and selecting the instruction that matches the pattern so that the mapping is efficient.

High Speed and Robust Processor based on Parallelized Error Correcting Code Module (병렬화된 에러 보정 코드 모듈 기반 프로세서 속도 및 신뢰도 향상)

  • Kang, Myeong-jin;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.9 / pp.1180-1186 / 2020
  • The Tiny Processing Unit (TPU), a kind of embedded system, usually operates in harsh environments subject to external shock or insufficient power. In these conditions data can be corrupted, causing critical problems. To counter data corruption, many embedded systems use an Error Correcting Code (ECC) to protect and restore data. However, ECC processing in the TPU increases the overall processing time because it lengthens instruction fetch, which is the bottleneck. In this paper, we propose a parallelized ECC block architecture to reduce this bottleneck. The proposed architecture reduces time by 10% compared to the original model, although memory usage increases slightly. It is evaluated with a matrix product test that uses various instructions. The TPU with the proposed parallelized ECC block is 7% faster than the original TPU with ECC and performed the test accurately.
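As a reminder of what an ECC does before any parallelization, the following is a minimal sketch of single-error correction. The abstract does not say which code the authors use; Hamming(7,4) is an assumption chosen here only because it is small and well known, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Decoding a codeword with any single bit flipped returns the original four data bits; this check-and-correct step is the kind of work that sits on the instruction fetch path described in the abstract.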

Space-Sharing Scheduling Schemes for NOW with Heterogeneous Computing Power (이질적 계산 능력을 가진 NOW를 위한 공간 공유 스케쥴링 기법)

  • Kim, Jin-Sung;Shim, Young-Chul
    • Journal of KIISE:Computer Systems and Theory / v.27 no.7 / pp.650-664 / 2000
  • A NOW (Network of Workstations) is widely considered a platform for running parallel programs. One of the fundamental problems that must be addressed to achieve good performance for parallel programs on a NOW is the determination of efficient job scheduling policies. Currently, most research on NOWs assumes that all the workstations have the same processing power. In this paper we consider a NOW in which workstations may have different computing power. We introduce 10 classes of space-sharing scheduling policies that can be applied to a NOW with heterogeneous computing power. We compare the performance of these policies using a simulator that accepts synthetically generated sequential and parallel workloads and reports the response time and waiting time of parallel jobs as performance indices. The experiments show that partitioning a parallel program heterogeneously, in proportion to the computing power of the workstations, performs better than partitioning it into parallel processes of the same size. When the owner returns to a workstation that is executing a parallel process, the policy that simply lowers the priority of the parallel process performs better than the one that migrates the process to a new idle workstation. Among the policies that use heterogeneous partitioning and priority lowering, the adaptive policy performed best across a wide range of inter-arrival times of parallel programs, but when the load imbalance among parallel processes becomes very high, the modified adaptive policy performed better.
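The heterogeneous partitioning idea above can be sketched as follows. This is only an illustration under stated assumptions: computing power is given as positive integers, work is divisible into equal units, and the function name `partition_by_power` is hypothetical rather than taken from the paper.

```python
def partition_by_power(total_units, powers):
    """Split total_units of work across workstations in proportion to their power."""
    total_power = sum(powers)
    # Integer share for each workstation, rounded down.
    shares = [total_units * p // total_power for p in powers]
    # Give the leftover units to the workstations with the largest remainders.
    leftover = total_units - sum(shares)
    by_remainder = sorted(
        range(len(powers)),
        key=lambda i: (total_units * powers[i]) % total_power,
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares
```

For example, `partition_by_power(100, [1, 2, 2])` returns `[20, 40, 40]`, whereas equal-size partitioning would leave the slowest workstation with a third of the work.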

Comparison of Parallel Preconditioners for Solving Large Sparse Linear Systems on a Massively Parallel Machine (대형이산 행렬 시스템의 초대형병렬컴퓨터에서의 해법을 위한 병렬준비 행렬의 비교)

  • Ma, Sang-Baek
    • The Transactions of the Korea Information Processing Society / v.2 no.4 / pp.535-542 / 1995
  • In this paper we present two preconditioners for solving large sparse linear systems arising from elliptic partial differential equations on massively parallel machines, such as the CM-5. Most massively parallel machines rely heavily on message passing for interprocessor communication, but under current manufacturing standards the cost of communication is very high compared to that of floating-point arithmetic. We therefore need an algorithm that minimizes the amount of interprocessor communication on massively parallel machines. Through experiments on the CM-5, we show that the Block SOR (Successive Over-Relaxation) method coupled with the multi-coloring technique is one such preconditioner. We also implemented on the CM-5 the ADI (Alternating Direction Implicit) method, which has conventionally been one of the most powerful parallel preconditioners. Our experiments show that the Block SOR method coupled with the multi-coloring technique can yield a speedup with 50% efficiency as the number of processors ranges from 16 to 512, for a matrix of dimension 512x512. The ADI method, on the other hand, performs very poorly.
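To illustrate why multi-coloring suits parallel machines, here is a minimal sketch of a two-color (red-black) SOR sweep for a 5-point Poisson stencil. It is a simplification, not the paper's method: it uses point SOR rather than Block SOR, runs as a plain solver rather than as a preconditioner, and executes serially. All names and parameters are assumptions.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """In-place red-black SOR for -laplace(u) = f on a square grid.

    u holds the boundary values on its outer ring and the current guess inside.
    """
    n = len(u)
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = red points, 1 = black points
            # Points of one color depend only on points of the other color,
            # so every update inside this half-sweep could run in parallel.
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != color:
                        continue
                    gauss_seidel = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                        + h * h * f[i][j]
                    )
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u
```

Because each color is updated from the other color only, communication between processors is needed once per half-sweep rather than once per point.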

Efficient Workload Distribution of Photomosaic Using OpenCL into a Heterogeneous Computing Environment (이기종 컴퓨팅 환경에서 OpenCL을 사용한 포토모자이크 응용의 효율적인 작업부하 분배)

  • Kim, Heegon;Sa, Jaewon;Choi, Dongwhee;Kim, Haelyeon;Lee, Sungju;Chung, Yongwha;Park, Daihee
    • KIPS Transactions on Computer and Communication Systems / v.4 no.8 / pp.245-252 / 2015
  • Recently, parallel processing methods using accelerators have been introduced into high performance computing and mobile computing. The photomosaic application can be parallelized by exploiting its inherent data parallelism and an accelerator. In this paper, we propose a way to distribute the workload of the photomosaic application in a CPU and GPU heterogeneous computing environment. That is, the photomosaic application is parallelized using both CPU and GPU resources with the asynchronous mode of OpenCL, and the optimal workload distribution rate is then estimated by measuring the execution time with CPU-only and GPU-only distribution rates. The proposed approach is simple but very effective, and can be applied to parallelize other applications in a CPU and GPU heterogeneous computing environment. Based on the experimental results, we confirm that performance in the heterogeneous computing environment with the optimal workload distribution is improved by 141% compared with the GPU-only method.
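One simple way to turn CPU-only and GPU-only timings into a distribution rate is sketched below. The abstract does not give the authors' formula, so this is an assumption: work is taken to be perfectly divisible, transfer and launch overheads are ignored, and both function names are hypothetical.

```python
def gpu_share(t_cpu_only, t_gpu_only):
    """Fraction of the work to give the GPU so that both devices finish together."""
    # Shares are proportional to throughput, i.e. to 1 / execution time.
    return t_cpu_only / (t_cpu_only + t_gpu_only)


def estimated_time(t_cpu_only, t_gpu_only):
    """Estimated run time when the work is split according to gpu_share."""
    share = gpu_share(t_cpu_only, t_gpu_only)
    # The GPU processes `share` of the work; the CPU needs the same time
    # for the remaining (1 - share).
    return share * t_gpu_only
```

With a CPU-only time of 30 s and a GPU-only time of 10 s, `gpu_share(30.0, 10.0)` gives 0.75 and `estimated_time(30.0, 10.0)` gives 7.5 s. Measured results would differ once overheads are included.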

A 2-Dimension Torus-based Genetic Algorithm for Multi-disk Data Allocation (2차원 토러스 기반 다중 디스크 데이터 배치 병렬 유전자 알고리즘)

  • 안대영;이상화;송해상
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.2 / pp.9-22 / 2004
  • This paper presents a parallel genetic algorithm for the multi-disk data allocation problem, an NP-complete problem. The problem is to find a way to distribute a Binary Cartesian Product File over disk arrays so as to maximize parallel disk I/O accesses. A Sequential Genetic Algorithm (SGA), DAGA, has been proposed and shown to be superior to the other proposed methods, but DAGA has been observed to consume considerable simulation time. In this paper, a parallel version of DAGA (ParaDAGA) is proposed. ParaDAGA is a 2-dimension torus-based Parallel Genetic Algorithm (PGA) built on a distributed population structure. ParaDAGA has been implemented on a parallel computer simulated on a single-processor platform. Through simulation, we study the impact of varying the ParaDAGA parameters and compare the quality of the solutions derived by ParaDAGA and DAGA. ParaDAGA produces better solutions than DAGA in all configurations, in less simulation time.
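In a 2-dimension torus-based population structure, each subpopulation exchanges individuals only with its four wrap-around neighbors. The sketch below shows just that neighbor computation; the grid size and the function name `torus_neighbors` are assumptions for illustration, and the migration policy of ParaDAGA itself is not given in the abstract.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the up, down, left, and right neighbors on a rows x cols torus."""
    return [
        ((row - 1) % rows, col),  # up, wrapping from the top row to the bottom
        ((row + 1) % rows, col),  # down
        (row, (col - 1) % cols),  # left, wrapping from the first column to the last
        (row, (col + 1) % cols),  # right
    ]
```

On a 3 x 4 grid, `torus_neighbors(0, 0, 3, 4)` returns `[(2, 0), (1, 0), (0, 3), (0, 1)]`, so corner subpopulations have the same number of neighbors as interior ones.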

Application of LRBs for Reduction of Wind-Induced Responses of Coupled Shear Wall Structures (전단벽 구조물의 풍응답 저감을 위한 LRB의 적용)

  • Park, Yong-Koo;Kim, Hyun-Su;Ko, Hyun;Kim, Min-Gyun;Lee, Dong-Guen
    • Journal of Korean Association for Spatial Structures / v.11 no.1 / pp.47-56 / 2011
  • In general, shear walls are employed as the lateral resistance system. Most shear wall structures require openings in the shear walls, so the walls are linked by floor slabs or coupling beams, resulting in coupled shear wall structures. In this study, an LRB (lead rubber bearing) was introduced in the middle of the coupling beam of coupled shear wall structures, and the wind-induced response reduction effect of this system was investigated. To evaluate the control performance of the proposed method, 20- and 30-story building structures were used as example structures, and boundary nonlinear time history analyses were performed using artificial wind excitation. Japanese vibration evaluation criteria were employed to evaluate whether the proposed system could improve the serviceability of tall coupled shear wall structures under wind excitation. The analytical results show that the proposed method of connecting shear walls with LRBs can improve wind-induced response control.

A Parallel Algorithm for Large DOF Structural Analysis Problems (대규모 자유도 문제의 구조해석을 위한 병렬 알고리즘)

  • Kim, Min-Seok;Lee, Jee-Ho
    • Journal of the Computational Structural Engineering Institute of Korea / v.23 no.5 / pp.475-482 / 2010
  • In this paper, an efficient two-level parallel domain decomposition algorithm is suggested for solving large-DOF structural problems. Each subdomain is composed of a coarse problem and a local problem. In the coarse problem, displacements at coarse nodes are computed by an iterative method that does not need to assemble a stiffness matrix for the whole coarse problem. Displacements at local nodes are then computed by a Multi-Frontal Sparse Solver. A parallel version of PCG (Preconditioned Conjugate Gradient Method) is developed to solve the coarse problem iteratively; it minimizes the amount of data communicated between processors, which increases the feasible problem DOF size while maintaining computational efficiency. The test results show that the suggested algorithm scales in computing performance and provides an efficient approach to solving large-DOF structural problems.
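The iterative coarse solve can be pictured with a serial, matrix-free PCG sketch such as the one below. It is not the paper's parallel implementation: the Jacobi (diagonal) preconditioner and the `matvec` callable are assumptions, chosen to show that PCG needs only matrix-vector products and never an assembled global matrix.

```python
def pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    matvec(v) returns A v; diag holds the diagonal of A (Jacobi preconditioner).
    """
    def dot(x, y):
        return sum(a * c for a, c in zip(x, y))

    x = [0.0] * len(b)
    r = list(b)                                # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In a parallel setting, the dot products are the points where processors must communicate, which is why reducing their cost matters for scalability.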

Implementation of Parallel Volume Rendering Using the Sequential Shear-Warp Algorithm (순차 Shear-Warp 알고리즘을 이용한 병렬볼륨렌더링의 구현)

  • Kim, Eung-Kon
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1620-1632 / 1998
  • This paper presents a fast parallel algorithm for volume rendering and its implementation using the C language and MPL (MasPar Programming Language) on the 4,096-processor MasPar MP-2 machine. The parallel algorithm is a parallelization based on Lacroute's sequential shear-warp algorithm, currently acknowledged to be the fastest sequential volume rendering algorithm. It reduces communication overhead by using the sheared space partition scheme and a load balancing technique that uses load estimates from the previous iteration, and it reduces the number of voxels to be processed by using a run-length encoded volume data structure. Actual performance is 3 to 4 frames/second on a human brain scan dataset of $128\times128\times128$ voxels. Because of the scalability of this algorithm, a performance of 12-16 frames/second is expected on the 16,384-processor MasPar MP-2 machine. Implementation on more current SIMD or MIMD architectures is expected to provide 30-60 frames/second on large volumes.
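The run-length encoded volume structure lets a renderer skip transparent voxels in bulk. A minimal sketch of encoding one voxel scanline follows; the opacity threshold and the function name are assumptions, and the actual data layout used in the paper is not described in the abstract.

```python
def run_length_encode(scanline, threshold=0):
    """Encode a voxel scanline as (is_opaque, run_length) pairs."""
    runs = []
    for value in scanline:
        opaque = value > threshold
        if runs and runs[-1][0] == opaque:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([opaque, 1])   # start a new run
    return [tuple(run) for run in runs]
```

For example, `run_length_encode([0, 0, 5, 7, 0, 3])` returns `[(False, 2), (True, 2), (False, 1), (True, 1)]`, so a renderer can step over each transparent run without touching its voxels.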
