• Title/Summary/Keyword: Parallel Computer


TBBench: A Micro-Benchmark Suite for Intel Threading Building Blocks

  • Marowka, Ami
    • Journal of Information Processing Systems / v.8 no.2 / pp.331-346 / 2012
  • Task-based programming is becoming the method of choice for extracting the desired performance from multi-core chips. It expresses a program in terms of lightweight logical tasks rather than heavyweight threads. Intel Threading Building Blocks (TBB) is a task-based parallel programming paradigm for multi-core processors. The performance gain of this paradigm depends to a great extent on the efficiency of its parallel constructs. The overheads incurred by these constructs determine whether large-scale parallel programs can be built, especially in the case of fine-grain parallelism. This paper presents a study of TBB parallelization overheads. For this purpose, a TBB micro-benchmark suite called TBBench has been developed. We use TBBench to evaluate the parallelization overheads of TBB on different multi-core machines and with different compilers. We report the relative overheads in detail and analyze the results.

Iterative mesh partitioning strategy for improving the efficiency of parallel substructure finite element computations

  • Hsieh, Shang-Hsien;Yang, Yuan-Sen;Tsai, Po-Liang
    • Structural Engineering and Mechanics / v.14 no.1 / pp.57-70 / 2002
  • This work presents an iterative mesh partitioning approach to improve the efficiency of parallel substructure finite element computations. The approach employs an iterative strategy with a set of empirical rules derived from numerical experiments on a number of different finite element meshes. To partition a mesh into submeshes (or substructures) with balanced computational workloads, it combines three components: state-of-the-art partitioning techniques in its iterative partitioning kernel, a cost function that estimates the computational cost of each submesh, and a mechanism that adjusts element weights to redistribute elements among submeshes during iterative partitioning. In addition, parallel finite element structural analyses of several test examples are presented to demonstrate the effectiveness of the approach. The results show that the proposed approach can effectively improve the efficiency of parallel substructure finite element computations.

Optimal Server Allocation to Parallel Queueing Systems by Computer Simulation (컴퓨터 시뮬레이션을 이용한 병렬 대기행렬 시스템의 최적 서버 배치 방안)

  • Park, Jin-Won
    • Journal of the Korea Society for Simulation / v.24 no.3 / pp.37-44 / 2015
  • Queueing systems with two parallel workstations are common in practice. Typically, the workstations differ in the interarrival times of customers and in the service times for those customers. This paper presents a computer simulation study of optimal server allocation for parallel heterogeneous queueing systems with a fixed number of identical servers. The queueing system is optimized by minimizing the weighted system time of the customers served by the two parallel workstations. The system time formula for M/M/c systems (in Kendall's notation) is known. Thus, we first compute the optimal allocation for parallel M/M/c systems analytically and compare it with the results of the computer simulation experiments; the two agree. The CETI rule, devised by optimizing the M/M/c cases, allocates servers so that the workstations have Close or Equal Traffic Intensities. Traffic intensity is defined as the arrival rate divided by the product of the service rate and the number of servers. Numerous computer simulation experiments show that the CETI rule works for M/G/c and G/M/c queueing systems, even though the rule cannot be proven analytically. However, the CETI rule is shown not to work for some G/G/c systems.
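
    The traffic-intensity definition and the idea behind the CETI rule can be sketched as follows. This is an illustrative sketch, not the paper's code: all function names are hypothetical, the CETI rule is interpreted here as choosing the split with the smallest gap in traffic intensity, and weighting system time by arrival rate is an assumption, since the abstract does not state the weights. The M/M/c mean system time uses the standard Erlang C formula.

```python
from math import factorial


def traffic_intensity(lam, mu, c):
    """Arrival rate divided by (service rate times number of servers)."""
    return lam / (c * mu)


def mmc_system_time(lam, mu, c):
    """Mean time in system for an M/M/c queue (Erlang C); inf if unstable."""
    rho = traffic_intensity(lam, mu, c)
    if rho >= 1:
        return float("inf")
    a = lam / mu  # offered load
    tail = a**c / factorial(c) / (1 - rho)
    p_wait = tail / (sum(a**k / factorial(k) for k in range(c)) + tail)
    return p_wait / (c * mu - lam) + 1 / mu


def splits(total):
    """All ways to give each of two workstations at least one server."""
    return [(c1, total - c1) for c1 in range(1, total)]


def ceti_split(lam1, mu1, lam2, mu2, total):
    """Assumed reading of CETI: minimize the gap between traffic intensities."""
    return min(
        splits(total),
        key=lambda s: abs(
            traffic_intensity(lam1, mu1, s[0]) - traffic_intensity(lam2, mu2, s[1])
        ),
    )


def best_split_by_system_time(lam1, mu1, lam2, mu2, total):
    """Exhaustive search on arrival-rate-weighted mean system time (assumed weights)."""
    def weighted(s):
        w1 = mmc_system_time(lam1, mu1, s[0])
        w2 = mmc_system_time(lam2, mu2, s[1])
        return (lam1 * w1 + lam2 * w2) / (lam1 + lam2)
    return min(splits(total), key=weighted)
```

    For example, `ceti_split(4, 1, 2, 1, 9)` can be compared with `best_split_by_system_time(4, 1, 2, 1, 9)` to see whether the two criteria agree for a given M/M/c configuration.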

EPR : Enhanced Parallel R-tree Indexing Method for Geographic Information System (EPR : 지리 정보 시스템을 위한 향상된 병렬 R-tree 색인 기법)

  • Lee, Chun-Geun;Kim, Jeong-Won;Kim, Yeong-Ju;Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society / v.6 no.9 / pp.2294-2304 / 1999
  • The purpose of this paper is to improve query processing performance in a GIS (Geographic Information System) by enhancing I/O performance through parallel I/O and efficient disk access. Packing adjacent spatial data, which are very likely to be referenced together, into one block or into contiguous disk blocks reduces the number of disk accesses and the disk access overhead for query processing, which in turn reduces I/O time. We therefore propose the EPR (Enhanced Parallel R-tree) indexing method, which integrates the parallel I/O method of the earlier Parallel R-tree (PR) method with a packing-based clustering method. The major characteristics of the EPR method are as follows. First, it arranges spatial data in increasing order of proximity using a Hilbert space-filling curve and builds a packed R-tree in a bottom-up manner. Second, it generates spatial data clusters through packing-based clustering, in which the arranged spatial data are clustered into contiguous disk blocks. Third, it distributes EPR index nodes and spatial data clusters over multiple disks through round-robin striping. Experimental results show that the EPR method achieves gains of up to 30% or more over the PR method in query processing speed. In particular, the larger the disk blocks and the smaller the spatial data objects, the better the query processing performance of the EPR method.
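
    The three steps above (Hilbert ordering, packing into clusters, round-robin striping) can be sketched as follows. This is a minimal sketch under stated assumptions, not the EPR implementation: spatial objects are reduced to integer grid points, the grid side is a power of two, the function names are hypothetical, and the R-tree index itself is not built. The Hilbert index uses the standard iterative coordinate-to-distance conversion.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_and_stripe(points, grid_size, block_capacity, num_disks):
    """Sort points by Hilbert index, pack them into clusters, stripe clusters over disks."""
    ordered = sorted(points, key=lambda p: hilbert_index(grid_size, p[0], p[1]))
    clusters = [
        ordered[i:i + block_capacity]
        for i in range(0, len(ordered), block_capacity)
    ]
    disks = [[] for _ in range(num_disks)]
    for i, cluster in enumerate(clusters):
        disks[i % num_disks].append(cluster)  # round-robin striping
    return disks


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    for disk_id, clusters in enumerate(pack_and_stripe(grid, 4, 4, 2)):
        print(disk_id, clusters)
```

    Because neighbouring points on the curve are also neighbours in space, each cluster holds spatially close objects, while consecutive clusters land on different disks and can be read in parallel.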


New Parallel Mechanism for Biped Robots (병렬형 다리 구조를 가진 2족 보행 로봇의 설계 및 제어)

  • Yoon, Jung-Han;Yeon, Je-Sung;Kwon, O-Hung;Park, Jong-Hyeon
    • Proceedings of the KSME Conference / 2004.04a / pp.810-815 / 2004
  • In this paper, we propose a new parallel mechanism for a three-dimensional biped robot, each leg of which is composed of two 3-DOF parallel platforms linked serially. The proposed mechanism can move freely in man-made environments and can be applied in various fields, such as medical and welfare services. The total weight of each leg is expected to be lower than that of a serially linked leg. Each leg consists of a 3-DOF orientation platform and a 3-DOF asymmetric parallel platform. The former consists of three active linear actuators and seven passive joints, and the latter of two active linear actuators, one active rotational actuator, and eight passive joints. Thus, the two kinds of parallel platforms differ in their chain elements and active joint positions, so that the biped robot can move freely like a serial linkage without kinematic constraints. The effectiveness and performance of the proposed parallel mechanism and locomotion trajectory are demonstrated in computer simulations with a 12-DOF parallel biped robot.


Design of QPSK Demodulator Using CMOS BPSK Receiver and Reflection-Type Phase Shifter (CMOS 기반 BPSK 수신기와 반사형 위상 천이기를 이용한 QPSK 복조기 설계)

  • Moon, Seong-Mo;Park, Dong-Hoon;Yu, Jong-Won;Lee, Moon-Que
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.20 no.8 / pp.770-776 / 2009
  • We propose and demonstrate an I/Q demodulator that uses a four-port BPSK demodulator based on additive mixing and a reflection-type phase shifter built with a hybrid technique. A conventional I/Q demodulator based on multiplicative or additive mixing divides the I/Q signal path from the mixer to the parallel-to-serial converter. In this paper, we propose a new I/Q demodulator that does not divide the I/Q baseband signal path. Compared with the conventional receiver, the proposed circuit requires half the size in implementation and half the power consumption in the baseband path. The proposed receiver also eliminates the parallel-to-serial converter after data decoding. The proposed circuit successfully demodulated a QPSK signal with an L-band carrier frequency and a 20 Mbps data rate.

Development of Mobile Volume Visualization System (모바일 볼륨 가시화 시스템 개발)

  • Park, Sang-Hun;Kim, Won-Tae;Ihm, In-Sung
    • Journal of KIISE:Computing Practices and Letters / v.12 no.5 / pp.286-299 / 2006
  • Owing to continuing technical progress in modeling, simulation, and sensor devices, huge volume data sets with very high resolution have become common. In scientific visualization, various interactive real-time techniques have been proposed for rendering such large-scale volume data sets effectively on high-performance parallel computers. In this paper, we present a mobile volume visualization system that consists of mobile clients, gateways, and parallel rendering servers. The mobile clients let users adaptively explore regions of interest at higher resolution levels and interactively specify rendering and viewing parameters, which are sent to the parallel rendering servers. The gateways manage requests and responses between mobile clients and parallel rendering servers to keep the service stable. The parallel rendering servers visualize the specified sub-volume using the rendering contexts received from clients and then transfer the high-quality final images back. The proposed system lets multiple users with PDAs simultaneously share parts of a huge volume that are of common interest, along with rendering contexts and final images, through a CSCW (Computer Supported Cooperative Work) mode.

A Study on Parallel AES Cipher Algorithm based on Multi Processor (멀티프로세서 기반의 병렬 AES 암호 알고리즘에 관한 연구)

  • Park, Jung-Oh;Oh, Gi-Oug
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.171-181 / 2012
  • This paper describes the AES cipher algorithm, a symmetric-key cipher, and proposes the design of a parallel cipher algorithm that makes maximum use of the resources of a multi-core processor. The proposed parallel cipher algorithm allocates the cipher computation according to the number of cores; parallel execution was confirmed, with a performance increase of about 30% over the AES cipher algorithm. Encryption and decryption were verified with a binary comparison tool, which confirmed that the AES cipher algorithm and the proposed parallel cipher algorithm produce identical encrypted binaries and identical decrypted binaries. The parallel cipher algorithm for multi-core environments proposed in this paper can be applied to authentication and payment in financial services on PCs, laptops, servers, and mobile devices, and can be used in areas that require high-speed encryption of large data.

Construction of a CPU Cluster and Implementation of a 3-D Domain Decomposition Parallel FDTD Algorithm (CPU 클러스터 구축 및 3차원 공간분할 병렬 FDTD 알고리즘 구현)

  • Park, Sungmin;Chu, Kwang-Uk;Ju, Saehoon;Park, Yoon-Mi;Kim, Ki-Baek;Jung, Kyung-Young
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.25 no.3 / pp.357-364 / 2014
  • In this work, we construct a CPU cluster to implement a parallel finite-difference time-domain (FDTD) algorithm for fast electromagnetic analyses. Compared to its single-processor counterpart, the parallel FDTD algorithm can reduce the computational time significantly and can also analyze electrically larger structures. The parallel FDTD algorithm requires communication between neighboring processors, which is performed with the MPI (Message Passing Interface) library. A 3-D domain decomposition is employed to decrease the communication time between neighboring processors. The speed-up factor of the CPU-cluster-based parallel FDTD algorithm relative to a single-processor FDTD is investigated for the normal mode and the hypermode. Finally, an electrically large concrete structure is analyzed with the developed parallel algorithm.
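
    One way to see why a 3-D decomposition can reduce neighbor communication is to count the boundary cells each subdomain must exchange. The sketch below is a rough illustration and not the paper's implementation: it assumes communication volume is proportional to the number of cells on shared faces, that the grid divides evenly among processors, and that boundaries are non-periodic. The function names are hypothetical.

```python
def max_halo_cells(grid, procs):
    """Face cells exchanged per time step by the busiest subdomain (assumed cost model)."""
    nx, ny, nz = grid
    px, py, pz = procs
    sx, sy, sz = nx // px, ny // py, nz // pz  # subdomain size, assuming even division

    def shared_faces(p):
        # Along one axis, an interior subdomain has 2 neighbors;
        # with only 2 partitions each has 1, and with 1 partition none.
        return 0 if p == 1 else (1 if p == 2 else 2)

    return (shared_faces(px) * sy * sz
            + shared_faces(py) * sx * sz
            + shared_faces(pz) * sx * sy)


def best_decomposition(grid, num_procs):
    """Try every (px, py, pz) with px * py * pz == num_procs and keep the cheapest."""
    candidates = [
        (px, py, num_procs // (px * py))
        for px in range(1, num_procs + 1) if num_procs % px == 0
        for py in range(1, num_procs // px + 1) if (num_procs // px) % py == 0
    ]
    return min(candidates, key=lambda p: max_halo_cells(grid, p))


if __name__ == "__main__":
    grid = (120, 120, 120)
    print("1-D slabs :", max_halo_cells(grid, (8, 1, 1)))
    print("3-D blocks:", max_halo_cells(grid, (2, 2, 2)))
    print("best      :", best_decomposition(grid, 8))
```

    Under this cost model, splitting a 120 x 120 x 120 grid over 8 processors as 2 x 2 x 2 blocks exchanges fewer face cells per subdomain than splitting it into 8 slabs along one axis.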

Detecting the First Race in OpenMP Program with Nested Parallelism (내포 병렬성을 가지는 OpenMP 프로그램의 최초 경합 탐지)

  • Chon, Byoung-Gyu;Woo, Jong-Jung;Jun, Yong-Kee
    • The KIPS Transactions:PartA / v.8A no.3 / pp.253-260 / 2001
  • Detecting races is important for debugging shared-memory parallel programs, because races cause unintended nondeterministic program execution. Previous on-the-fly race detection techniques cannot guarantee detection of the first race in nested parallel programs. Detecting the first race is important for debugging parallel programs, since removing the first race may make subsequent races disappear. In this paper, we present an on-the-fly technique that detects all of the first races through reexecution of the program being debugged. We assume that the program being debugged may have one-way nested parallelism. The number of reexecutions is, in the worst case, at the least the nesting depth of the program. The space complexity is O(VT), and the time complexity of detecting a race at each access to the access history is O(T), where V is the number of shared variables and T is the maximum parallelism of the program. The efficiency of our technique in each execution is the same as that of previous on-the-fly detection techniques. Therefore, this technique makes debugging parallel programs more effective and practical.
