• Title/Summary/Keyword: Parallel computation


Optimal Redundant Units and Load in Parallel Systems (병렬 시스템에서의 최적 중복부품수와 최적 부하수준)

  • 윤원영;김귀래
    • Journal of the Korean Operations Research and Management Science Society / v.23 no.1 / pp.97-107 / 1998
  • This paper is concerned with a parallel system that sustains a time-independent load and consists of n components with exponential lifetimes. The total load is assumed to be shared by the working components, so each component failure raises the failure rates of the surviving components according to the relationship between load and failure rate. Among several load-failure rate relationships, the power rule model is considered. System efficiency is measured by the expected profit earned by the system per unit time: a high load yields high gain but also causes frequent system failures. The system engineer's goal is to determine the optimal load and number of redundant units that maximize the expected profit per unit time. First, the system reliability function is obtained and the optimization problem of the load-sharing parallel system is formulated. For a given number of redundant units, the existence of an optimal load is proved analytically; for a given load, the optimal number of redundant units can also be obtained analytically. The optimal load and number of redundant units are then obtained simultaneously by numerical computation. Several numerical examples are presented.
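
A minimal sketch of the load-sharing model, assuming the power rule takes the form λ(w) = a·w^b for a per-unit load w (a and b are hypothetical parameters) and that the total load L is split evenly among the working units. With k units up, the system leaves that state at rate k·λ(L/k), so the mean time to failure (MTTF) is the sum of the expected sojourn times in states n, n-1, ..., 1. The profit criterion below is a hypothetical renewal-reward form chosen for illustration, not the paper's own cost model, and the grid search only stands in for the paper's numerical optimization.

```python
def component_rate(load_per_unit, a, b):
    """Assumed power rule: failure rate = a * (load per working unit) ** b."""
    return a * load_per_unit ** b


def mttf(n, total_load, a, b):
    """MTTF of an n-unit load-sharing parallel system with exponential lifetimes.

    With k units working, each carries total_load / k, so the time to the next
    failure is exponential with rate k * component_rate(total_load / k).
    The system fails when the last unit fails.
    """
    return sum(1.0 / (k * component_rate(total_load / k, a, b))
               for k in range(1, n + 1))


def profit_rate(n, total_load, a, b, gain, fixed_cost, unit_cost):
    """Hypothetical criterion: earn gain * load per unit time while up, pay
    fixed_cost + unit_cost * n at each system failure, replace instantly."""
    m = mttf(n, total_load, a, b)
    return (gain * total_load * m - fixed_cost - unit_cost * n) / m


def best_design(max_n, loads, **params):
    """Brute-force search over (number of units, load) for the largest profit rate."""
    return max(((n, L) for n in range(1, max_n + 1) for L in loads),
               key=lambda nl: profit_rate(nl[0], nl[1], **params))


print(best_design(5, [0.1 * i for i in range(1, 51)],
                  a=1.0, b=2.0, gain=1.0, fixed_cost=1.0, unit_cost=0.5))
```

With b > 1 the MTTF shrinks like 1/L^b while the gain grows only linearly in L, which is why an interior optimal load can exist for a fixed n.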


A linear array SliM-II image processor chip (선형 어레이 SliM-II 이미지 프로세서 칩)

  • 장현만;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.2 / pp.29-35 / 1998
  • This paper describes the architecture and design of a SIMD parallel image processing chip called SliM-II. The chip has a linear array of 64 processing elements (PEs), operates at 30 MHz in worst-case simulation, and delivers at least 1.92 GIPS. Unlike existing array processors such as IMAP, MGAP-2, and VIP, each PE has a multiplier, which is very effective for convolution, template matching, and similar operations. The instruction set can execute an ALU operation, data I/O, and inter-PE communication simultaneously in a single instruction cycle. In addition, during an ALU/multiplier operation, SliM-II supports parallel moves between the register file and on-chip memory, as in DSP chips. SliM-II greatly reduces inter-PE communication overhead through sliding, a technique that overlaps inter-PE communication with computation. Moreover, bit-parallel data paths increase the bandwidth of data I/O and inter-PE communication. The chip was designed with the COMPASS$^{TM}$ 3.3 V 0.6 $\mu$m standard cell library (v8r4.10). It contains about 1.5 million transistors, the core size is 13.2 $\times$ 13.0 mm$^{2}$, and the package is a 208-pin PQ2 (Power Quad 2). Performance evaluation shows that, compared with existing array processors, the proposed architecture gives a significant improvement for algorithms requiring multiplications.
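
The 1.92 GIPS figure is simply 64 PEs × 30 MHz, assuming one instruction per PE per cycle. The sketch below also simulates, in a hedged and simplified way, the idea of overlapping inter-PE communication with computation on a linear array: every PE performs one multiply-accumulate and passes its value to the left neighbour in the same step. The function names are hypothetical, and this models only the general idea, not SliM-II's actual instruction set or its sliding implementation.

```python
def peak_gips(num_pes, clock_hz):
    """Peak throughput in GIPS if every PE issues one instruction per cycle."""
    return num_pes * clock_hz / 1e9


def linear_array_fir(pixels, weights):
    """1-D FIR filter on a simulated linear SIMD array, one pixel per PE.

    In each step every PE multiplies its held value by the current weight,
    accumulates it, and shifts the held value to its left neighbour (the
    rightmost PE shifts in 0). MAC and shift happen in the same step.
    Result: out[i] = sum_k weights[k] * pixels[i + k], zero-padded on the right.
    """
    held = list(pixels)          # value currently held by each PE
    acc = [0] * len(pixels)      # per-PE accumulator
    for w in weights:
        acc = [a + w * x for a, x in zip(acc, held)]  # multiply-accumulate
        held = held[1:] + [0]                         # inter-PE shift
    return acc


print(peak_gips(64, 30e6))                     # 64 PEs at 30 MHz
print(linear_array_fir([1, 2, 3, 4], [1, 1]))
```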


A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices (Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계)

  • Kwon Do-All;Chung Tae-Sang
    • The Transactions of the Korean Institute of Electrical Engineers D / v.54 no.12 / pp.723-731 / 2005
  • The $4{\times}4$ homogeneous transformation matrix is a compact representation of the orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished by successively multiplying homogeneous matrices, each representing the orientation and position of the corresponding link. Real-time control in robotics and animation in computer graphics therefore demand fast multiplication of homogeneous matrices. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For computational accuracy in real applications, its operands are floating-point numbers based on IEEE Standard 754. To exploit parallelism and reduce hardware redundancy, the processor takes the column vectors of homogeneous matrices as its unit of multiplication. To further improve throughput, the processor structure and its control are pipelined. Because the processor can serve as a special-purpose coprocessor in robotics and computer graphics, several other instructions useful for various transformation algorithms are included in addition to matrix/matrix and matrix/vector multiplication, broadening its range of application. The authors propose this instruction set as a standard for future processor designs for robotics and computer graphics. The design is verified by FPGA implementation. The performance improvement of the proposed design over a uniprocessor approach is also evaluated to assess its potential for real-time applications.
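
Taking column vectors as the unit of multiplication means C = A·B is built as C[:, j] = A·B[:, j], so the four column products are independent and could be dispatched to parallel units. The following is a plain software sketch of that decomposition (function names are hypothetical); it does not model the pipeline or the floating-point hardware.

```python
import math


def matvec4(m, v):
    """Multiply a 4x4 matrix (list of rows) by a length-4 column vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


def matmul4_by_columns(a, b):
    """C = A @ B computed column by column: C[:, j] = A @ B[:, j].

    The four matvec4 calls share no data dependencies, so hardware could
    run them on parallel multiplication units.
    """
    cols = [matvec4(a, [b[r][j] for r in range(4)]) for j in range(4)]
    return [[cols[j][r] for j in range(4)] for r in range(4)]  # back to rows


def transform_z(angle, tx=0.0, ty=0.0, tz=0.0):
    """Homogeneous transform: rotation about z by `angle`, then translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


# Two links: rotate 90 degrees and shift along x, then shift along x again.
T = matmul4_by_columns(transform_z(math.pi / 2, 1.0), transform_z(0.0, 1.0))
print([round(T[r][3], 6) for r in range(4)])   # position of the second frame
```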

Implementation of augmented reality using parallel structure (병렬구조를 이용한 증강현실 구현)

  • Park, Tae-Ryong;Heo, Hoon;Kwak, Jae-Chang
    • Journal of IKEEE / v.17 no.3 / pp.371-377 / 2013
  • This paper proposes an efficient parallel structure for implementing augmented reality (AR) based on the FAST and BRIEF algorithms. The SURF algorithm, well known among object recognition algorithms, is robust in object recognition; however, it requires a large amount of computation, which makes real-time operation difficult. We therefore used the FAST and BRIEF algorithms for object recognition and improved the conventional parallel structure based on the OpenMP library. As a result, execution time on the embedded system improved by 70% to 100%.
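
A rough sketch of the BRIEF side of this pipeline, assuming a BRIEF-style descriptor built from a fixed random pattern of pixel-pair intensity comparisons and matched by Hamming distance. FAST keypoint detection and the usual pre-smoothing of the patch are omitted, the patch size and bit count are hypothetical, and a Python thread pool stands in for the paper's OpenMP parallel structure only to show that per-keypoint matching is independent work.

```python
import random
from concurrent.futures import ThreadPoolExecutor

PATCH = 9  # hypothetical patch size in pixels per side


def make_pairs(n_bits=64, seed=0):
    """Fixed random pattern of pixel pairs shared by all descriptors."""
    rng = random.Random(seed)
    return [((rng.randrange(PATCH), rng.randrange(PATCH)),
             (rng.randrange(PATCH), rng.randrange(PATCH)))
            for _ in range(n_bits)]


def brief(patch, pairs):
    """Binary descriptor: bit i is set if patch[p] < patch[q] for pair i."""
    bits = 0
    for i, ((py, px), (qy, qx)) in enumerate(pairs):
        if patch[py][px] < patch[qy][qx]:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")


def match_all(queries, train, workers=4):
    """Index of the nearest train descriptor for each query descriptor.

    Queries are independent, so they are spread over a thread pool, in the
    spirit of a parallel-for over keypoints.
    """
    def best(q):
        return min(range(len(train)), key=lambda j: hamming(q, train[j]))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(best, queries))
```

Because of Python's global interpreter lock this shows the work decomposition rather than a real speedup; a native OpenMP loop over keypoints would be the closer analogue.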

Parallelized Topology Design Optimization of the Frame of Human Powered Vessel (인력선 프레임의 병렬화 위상 최적설계)

  • Kim, Hyun-Suk;Lee, Ki-Myung;Kim, Min-Geun;Cho, Seon-Ho
    • Journal of the Society of Naval Architects of Korea / v.47 no.1 / pp.58-66 / 2010
  • Topology design optimization is a method for determining the optimal distribution of material that minimizes the compliance of a structure while satisfying a constraint on the allowable material volume. Because the method is easy to implement, it is widely used and has become a powerful design tool in various disciplines. In this paper, a large-scale topology design optimization method is developed using efficient adjoint sensitivity analysis and the optimality criteria method. Parallel computing is required both for efficient topology optimization and for precise analysis of large-scale problems. The parallelized finite element analysis consists of domain decomposition and communication across subdomain boundaries. The preconditioned conjugate gradient method is employed to analyze the decomposed subdomains. The developed parallel topology optimization method is used to determine the optimal structural layout of the frame of a human-powered vessel.
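
For reference, a serial preconditioned conjugate gradient solver is sketched below. The abstract does not name the preconditioner, so a Jacobi (diagonal) preconditioner is assumed here. In a domain-decomposed solver each subdomain would perform its share of the matrix-vector products and dot products locally and exchange boundary values; that communication is omitted.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A (list of rows)
    with Jacobi-preconditioned conjugate gradients."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(M, v):
        return [dot(row, v) for row in M]

    n = len(b)
    x = [0.0] * n
    if dot(b, b) ** 0.5 < tol:                  # trivial right-hand side
        return x
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))  # exact answer is [1/11, 7/11]
```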

Thickness and clearance visualization based on distance field of 3D objects

  • Inui, Masatomo;Umezun, Nobuyuki;Wakasaki, Kazuma;Sato, Shunsuke
    • Journal of Computational Design and Engineering / v.2 no.3 / pp.183-194 / 2015
  • This paper proposes a novel method for visualizing the thickness and clearance of 3D objects in polyhedral representation. The proposed method uses the distance field of the objects for visualization. A parallel GPU algorithm is developed for constructing the distance field of polyhedral objects. Constructing the distance field requires computing the distance between a voxel and the model's surface polygons many times, and nearby voxels usually select similar sets of polygons as their closest polygons. Exploiting this spatial coherence, the parallel algorithm computes the distances between a cluster of nearby voxels and the polygons selected by a culling operation, so that the fast shared memory of the GPU can be fully utilized. Thickness and clearance are visualized by distributing points on the visible surfaces of the objects and painting each point with the color corresponding to its thickness or clearance value. A modified ray casting method is developed for computing thickness and clearance using the distance field of the objects. A system based on these algorithms can compute the distance field of complex objects within a few minutes in most cases. Once the distance field has been constructed, thickness and clearance can be visualized at near-interactive rates.
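
The cluster-based culling idea can be sketched in 2-D, with line segments standing in for surface polygons. The culling rule here is an assumption, not necessarily the paper's: with cluster centre c and radius r, every voxel's nearest distance is at most U = min over segments of d(c, s) + r, and a segment s can only be nearest to some voxel if d(c, s) - r ≤ U (both bounds follow from the triangle inequality). Only the surviving segments are tested exactly; the GPU shared-memory layout is not modelled.

```python
import math


def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab (2-D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    t = 0.0 if length2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def cluster_distances(voxels, segments):
    """Unsigned distance from each voxel centre in a cluster to the nearest
    segment, testing only segments that survive a conservative cull.
    Returns (distances, number_of_segments_kept)."""
    cx = sum(x for x, _ in voxels) / len(voxels)
    cy = sum(y for _, y in voxels) / len(voxels)
    r = max(math.hypot(x - cx, y - cy) for x, y in voxels)  # cluster radius
    dc = [point_segment_dist((cx, cy), a, b) for a, b in segments]
    upper = min(dc) + r            # no voxel's nearest distance exceeds this
    kept = [s for s, d in zip(segments, dc) if d - r <= upper]
    dists = [min(point_segment_dist(v, a, b) for a, b in kept) for v in voxels]
    return dists, len(kept)
```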

Kinematic of 7 D.O.F. Exoskeleton-Type Master Arm Estimating Human Arm's Motion (사람팔의 운동을 추정하는 7자유도 골격형 마스터암의 기구학 연구)

  • Sin, Wan-Jae;Park, Jong-Hyun;Park, Jahng-Hyeon;Park, Jong-Oh
    • Journal of Institute of Control, Robotics and Systems / v.6 no.9 / pp.796-802 / 2000
  • A master-slave teleoperation system is usually used to control a robot's motion at remote sites such as the deep sea or outer space. When the slave robot is a humanoid, performance improves if the configuration of the master arm is similar to that of the slave arm and of the human arm. The master arm proposed in this paper is of the exoskeleton type, worn on the human arm, and combines serial joints with a parallel mechanism that imitates the muscle-and-bone structure of the human arm; this so-called hybrid mechanism lets it follow the arm's movement effectively. However, the forward kinematics of the parallel structure is not easy to solve because the governing equations are implicit functions. To address this, virtual joint angles corresponding to the human arm's joints are introduced, and a sequential computation procedure is used to calculate the virtual joint angles and the posture of the end effector. The validity of the approach is verified by computer simulation.
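
To illustrate why implicit loop-closure equations make forward kinematics of a parallel structure harder than for a serial chain, here is a hedged sketch on a much simpler closed chain: a planar four-bar linkage, solved by Newton's method. The linkage, its dimensions, and the initial guess are illustrative assumptions; this does not reproduce the paper's exoskeleton mechanism or its virtual-joint procedure.

```python
import math


def fourbar_forward(theta2, a, b, c, d, guess=(0.5, 1.5), tol=1e-12, max_iter=50):
    """Solve the implicit loop-closure equations of a planar four-bar linkage
    (crank a, coupler b, rocker c, ground d) for the coupler and rocker angles
    (theta3, theta4) given the crank angle theta2, using Newton's method."""
    t3, t4 = guess
    for _ in range(max_iter):
        # Loop closure: a e^{i theta2} + b e^{i theta3} - c e^{i theta4} - d = 0
        f1 = a * math.cos(theta2) + b * math.cos(t3) - c * math.cos(t4) - d
        f2 = a * math.sin(theta2) + b * math.sin(t3) - c * math.sin(t4)
        if math.hypot(f1, f2) < tol:
            return t3, t4
        # Jacobian of (f1, f2) with respect to (t3, t4)
        j11, j12 = -b * math.sin(t3), c * math.sin(t4)
        j21, j22 = b * math.cos(t3), -c * math.cos(t4)
        det = j11 * j22 - j12 * j21
        # Newton step: solve J [d3, d4]^T = -[f1, f2]^T by Cramer's rule
        d3 = (-f1 * j22 + f2 * j12) / det
        d4 = (-f2 * j11 + f1 * j21) / det
        t3, t4 = t3 + d3, t4 + d4
    raise RuntimeError("Newton iteration did not converge")


theta3, theta4 = fourbar_forward(math.radians(60), 1.0, 3.0, 2.5, 3.0)
print(math.degrees(theta3), math.degrees(theta4))
```

The initial guess selects the assembly branch; a different guess can converge to the mirrored configuration.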


Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS:Journal of Semiconductor Technology and Science / v.14 no.4 / pp.391-406 / 2014
  • The performance of general-purpose computation on graphics processing units (GPGPU) depends heavily on memory access behavior. This sensitivity stems from the combination of the massively parallel processing (MPP) execution model of GPUs and the lack of architectural support for irregular memory access patterns. Application performance can be improved significantly by applying memory-access-pattern-aware optimizations that exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology that semi-automatically finds the best mapping of the memory accesses in serial loop nests onto underlying data-parallel architectures, based on a comprehensive static analysis of memory access patterns. To that end, we present a simple yet powerful mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search a large design space for an appropriate thread mapping and work-group size. To evaluate the effectiveness of our methodology, we report execution speedups on selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results use the industry-standard heterogeneous programming language OpenCL, targeting the NVIDIA GT200 architecture.
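
The following toy model, which is not the paper's mathematical model, shows why thread mapping matters for a serial loop nest. It counts how many distinct memory segments one warp touches under two candidate mappings of a row-major access a[i * n_cols + j]. The 32-thread warp and uniform 128-byte segments are simplifying assumptions; GT200 hardware actually coalesces per half-warp with 32-, 64-, or 128-byte segments.

```python
def segments_touched(addr_of_thread, warp_size=32, elem_bytes=4, seg_bytes=128):
    """Distinct seg_bytes-sized segments one warp touches when thread t reads
    element addr_of_thread(t). Fewer segments means better coalescing."""
    return len({addr_of_thread(t) * elem_bytes // seg_bytes
                for t in range(warp_size)})


def choose_mapping(n_cols, i=0, j=0):
    """Compare two thread mappings for a[i * n_cols + j] in a loop nest over
    (i, j); return the cheaper mapping name and the cost of each."""
    mappings = {
        # consecutive threads take consecutive j: unit stride
        "threads_over_j": lambda t: i * n_cols + (j + t),
        # consecutive threads take consecutive i: stride of n_cols elements
        "threads_over_i": lambda t: (i + t) * n_cols + j,
    }
    cost = {name: segments_touched(f) for name, f in mappings.items()}
    return min(cost, key=cost.get), cost


print(choose_mapping(1024))
```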

Simulation of 1993 East Sea Tsunami by Parallel FEM Model (병렬 FEM 모형을 이용한 1993년 동해 지진해일 시뮬레이션)

  • Hong, Sung-Jin;Choi, Byung-Ho;Pelinovsky, Efim
    • Journal of the Earthquake Engineering Society of Korea / v.10 no.3 s.49 / pp.35-45 / 2006
  • Simulating tsunamis with detailed bathymetry and topography is necessary for establishing disaster mitigation countermeasures and tsunami hazard maps. In this study, the 1993 tsunami event in the East Sea is simulated with a parallel finite element model, which achieves suitable accuracy using Beowulf-cluster parallel computation, to reproduce detailed features of coastal inundation. The simulation results are compared with measured data. The evolution of the statistical distribution of tsunami heights is studied numerically, and the distribution functions of tsunami heights along the coast tend toward a log-normal curve.
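
Checking whether coastal tsunami heights follow a log-normal curve usually starts with fitting its two parameters. The sketch below uses the standard maximum-likelihood estimates (mean and population standard deviation of ln h). The sample is synthetic, drawn from a known log-normal purely to exercise the code; it is not data from the study.

```python
import math
import random


def fit_lognormal(heights):
    """Maximum-likelihood log-normal fit: mu and sigma are the mean and the
    population standard deviation of ln(h). All heights must be positive."""
    logs = [math.log(h) for h in heights]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def lognormal_cdf(h, mu, sigma):
    """P(H <= h) under a log-normal distribution with parameters mu, sigma."""
    return 0.5 * (1.0 + math.erf((math.log(h) - mu) / (sigma * math.sqrt(2.0))))


# Synthetic heights (metres) drawn from a known log-normal, for illustration only.
rng = random.Random(1)
sample = [rng.lognormvariate(0.3, 0.6) for _ in range(20000)]
mu, sigma = fit_lognormal(sample)
print(mu, sigma, lognormal_cdf(2.0, mu, sigma))
```

Comparing the fitted CDF against the empirical distribution of simulated heights (for example with a Kolmogorov-Smirnov statistic) is one way to quantify "tends toward a log-normal curve".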

Numerical Study on the Mixing Enhancement of Parallel Supersonic-subsonic Wakes Using Wall Cavities (공동을 이용한 초음속-아음속 평행류에서의 혼합증대에 관한 수치적 연구)

  • Moon, Seong-Mok;Chang, Se-Myong;Kim, Chong-Am;Lee, Kyoung-Hoon;Kim, In-Soo;Ahn, Su-Hong;Woo, Kwan-Je
    • Proceedings of the Korean Society of Propulsion Engineers Conference / 2010.05a / pp.353-356 / 2010
  • A computational study of mixing enhancement in parallel supersonic-subsonic wakes is conducted, and the results are compared with available experimental data. The first aim of the present work is to compare numerical predictions directly with equivalent experimental data for the baseline case. The computed Pitot pressure distributions agree well with the experiment, and the results show that Menter's SST model with a compressibility correction performs best. We further investigate the effects of primary parameters, such as the position and arrangement of the cavity, at the given flow condition.
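
When comparing Pitot pressure profiles across a supersonic-subsonic mixing layer, it helps to recall that a Pitot probe reads isentropic stagnation pressure only in subsonic flow. In supersonic flow, a normal shock stands in front of the probe, and the reading follows the Rayleigh Pitot-tube formula. The helper below implements these two textbook relations for a calorically perfect gas (γ = 1.4 assumed). It is background for interpreting the data, not part of the paper's numerical method.

```python
def pitot_ratio(mach, gamma=1.4):
    """Pitot-probe reading divided by local static pressure, p_pitot / p.

    Subsonic (M <= 1): isentropic stagnation pressure ratio.
    Supersonic (M > 1): Rayleigh Pitot-tube formula (normal shock ahead of probe).
    """
    g = gamma
    m2 = mach * mach
    if mach <= 1.0:
        return (1.0 + 0.5 * (g - 1.0) * m2) ** (g / (g - 1.0))
    shock_term = ((g + 1.0) ** 2 * m2 / (4.0 * g * m2 - 2.0 * (g - 1.0))) ** (g / (g - 1.0))
    return shock_term * (1.0 - g + 2.0 * g * m2) / (g + 1.0)


for m in (0.5, 1.0, 2.0, 3.0):
    print(m, round(pitot_ratio(m), 4))
```

The two branches meet continuously at M = 1, so a measured Pitot-to-static ratio can be inverted for the local Mach number on either side of the mixing layer.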
