• Title/Summary/Keyword: Parallel Performance

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder (JPEG 인코더를 위한 고성능 병렬 프로세서 하드웨어 설계 및 검증)

  • Kim, Yong-Min;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.6 no.2 / pp.100-107 / 2011
  • As the use of mobile multimedia devices has grown in recent years, so has the need for high-performance multimedia processors. In this regard, we propose a SIMD (Single Instruction, Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and uses a 3-stage pipeline. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in both performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified using Verilog HDL and an FPGA prototype system.

Parallel Procedure and Evaluation of Parallel Performance of Impact Simulation Based on Two-Step Eulerian Scheme (Two-Step Eulerian 기법에 기반 한 충돌 해석의 병렬처리 및 병렬효율 평가)

  • Kim Seung-Jo;Lee Min-Hyung;Paik Seung-Hoon
    • Transactions of the Korean Society of Mechanical Engineers A / v.30 no.10 s.253 / pp.1320-1327 / 2006
  • The parallel procedure and performance of two-step Eulerian codes have not yet been reported in sufficient detail, even though such codes have been developed and widely used for impact simulation. In this study, a parallel strategy for a two-step Eulerian code is proposed and described in detail. Its performance was evaluated on a self-built Linux cluster. Compared with a commercial code, relatively good performance is achieved. A performance evaluation of each computation stage shows that the remap stage is the most time-consuming, ahead of other stages such as FE processing, communication, and time marching.

PARALLEL PERFORMANCE OF MULTISPLITTING METHODS WITH PREWEIGHTING

  • Han, Yu-Du;Yun, Jae-Heon
    • Journal of the Korean Mathematical Society / v.49 no.4 / pp.805-827 / 2012
  • In this paper, we first study the convergence of a special type of multisplitting method with preweighting, and then provide some comparison results for those multisplitting methods. Next, we propose both a parallel implementation of an SOR-like multisplitting method with preweighting and an application of that method as a parallel preconditioner for a Krylov subspace method. Lastly, we provide parallel performance results for both the SOR-like multisplitting method with preweighting and the Krylov subspace method with the parallel preconditioner, in order to evaluate the parallel efficiency of the proposed methods.

Magnetic Levitation Control Using The Parallel Fuzzy Controller (병렬 퍼지-PID 제어기를 이용한 자기부상 제어)

  • Kim, Myoung-Gun;Kim, Jong-Moon;Choi, Young-Kiu
    • Proceedings of the KIEE Conference / 2004.11c / pp.352-354 / 2004
  • In this paper, a parallel fuzzy controller for a one-degree-of-freedom magnetic levitation system is designed, and its performance is compared with that of a PID controller. The input and output scaling factors of the fuzzy controller and the gains of the PID controller were tuned using a genetic algorithm (GA). The designed controllers are validated by numerical simulations, which show that the parallel fuzzy controller gives better performance for the plant than the PID controller.

ePRO-OMP: A Tool for Performance/Energy PRofiler and Analyzer for OpenMP Applications (ePRO-OMP: OpenMP 응용 프로그램의 성능 및 에너지 분석 도구)

  • Lee, Young-Ho;Kim, Jihong
    • IEMEK Journal of Embedded Systems and Applications / v.6 no.5 / pp.287-293 / 2011
  • As chip multiprocessors have been widely adopted in embedded systems, achieving both high performance and low power consumption in parallel applications has become challenging. To meet these requirements, it is crucial for developers to analyze the performance and energy consumption of parallel applications. In this paper, we propose ePRO-OMP (energy PROfiler and analyzer for OpenMP), a tool for profiling and optimizing the performance and energy consumption of OpenMP applications. The main advantage of ePRO-OMP is that it can analyze both the performance and energy consumption of each parallel region of an OpenMP application, which helps developers locate the bottlenecks of parallel applications in detail.

Master-Slave type DC-DC Converters Parallel Operation by ZCT method (ZCT방식의 master-slave형 DC-DC컨버터 병렬운전)

  • 박상은;송승찬;진정태;이기홍;성세진
    • Proceedings of the KIPE Conference / 1999.07a / pp.655-658 / 1999
  • Our previous approach to the parallel operation of two DC-DC converters needed two CTs for load current sharing. In this paper, we propose a new method, called the ZCT method, that shares the load current using only one CT when two converters of the same capacity operate in parallel. To confirm the parallel performance of the proposed DC-DC converter parallel operation method, we carried out computer simulations and experiments. The results show that the two converters share current efficiently.

Analysis of Strategies for Installing Parallel Stations in Assembly Systems

  • Leung, John W.K.;Lai, K.K.
    • Industrial Engineering and Management Systems / v.4 no.2 / pp.117-122 / 2005
  • An assembly system (AS), a valuable tool for mass production, is generally composed of a number of workstations and a transport system. While the workstations perform preplanned operations, the transport system moves the assemblies on specially designed pallets from one station to another. One common problem with automatic assembly systems is that some assembly operations may have relatively long cycle times. As a consequence, the productivity, which is determined by the operation with the longest cycle time, can be reduced significantly. Special forms of parallel workstations were therefore developed to improve the performance of an assembly system. In this paper, the three most commonly used types of parallel stations (on-line, off-line and tunnel-gated) in a free-transfer assembly system are studied via discrete event simulation. Our findings reveal that the off-line parallel system has the best performance, because the two independent parallel stations lower the buffer requirement and reduce sensitivity to processing-time variability and line balance. On-line parallel systems were found to perform relatively poorly, because the operations of the two parallel stations block each other and a higher buffer capacity is required to achieve similar capacity. The tunnel-gated system was more efficient than the on-line system, since the first parallel station can operate independently. More importantly, we have quantified the productivity of the three strategies. Engineers can use these results to choose the best strategy for installing parallel stations in their working environment.

The Implementation of Fast Object Recognition Using Parallel Processing on CPU and GPU (CPU와 GPU의 병렬 처리를 이용한 고속 물체 인식 알고리즘 구현)

  • Kim, Jun-Chul;Jung, Young-Han;Park, Eun-Soo;Cui, Xue-Nan;Kim, Hak-Il;Huh, Uk-Youl
    • Journal of Institute of Control, Robotics and Systems / v.15 no.5 / pp.488-495 / 2009
  • This paper presents a fast feature extraction method for autonomous mobile robots that uses parallel processing based on OpenMP, SSE (Streaming SIMD Extensions) and CUDA programming. In the first step, for the CPU version, the algorithms and code are optimized and then parallelized. The parallel algorithms are debugged to maintain the same level of performance, and the process of extracting key points and obtaining the dominant orientation of each key point is parallelized. After extraction, a parallel descriptor is constructed using SSE instructions. The GPU version is also implemented with parallel processing, using CUDA and based on SIFT. The GPU-Parallel descriptor achieves a speedup of up to five times over the CPU-Parallel descriptor, but shows lower performance than the CPU version. The CPU version also achieves a speedup of four and a half times over the original SIFT while maintaining robust performance.

Search scheme for parallel spatial index (병렬 공간 색인을 위한 검색 기법)

  • Seo, Young-Duk
    • Journal of Korea Spatial Information System Society / v.7 no.2 s.14 / pp.81-89 / 2005
  • Declustering and parallel index structures are important research areas for improving database performance. Previous research proposed several distribution schemes for parallel R-trees; however, there are no search schemes suited to such indexes. In this paper, we propose schemes to improve the performance of range queries on distributed parallel indexes. The proposed schemes exploit the fact that a parallel disk system can read multiple nodes from different disks. In contrast to previous schemes, which read a single node from disk, the new schemes can read multiple nodes from multiple disks. The proposed schemes are verified using various implementations and performance evaluations. The experimental evaluation shows that the proposed schemes improve performance by 40% over previous work.

Performance Comparison of Parallel Programming Frameworks in Digital Image Transformation

  • Shin, Woochang
    • International Journal of Internet, Broadcasting and Communication / v.11 no.3 / pp.1-7 / 2019
  • Previously, parallel computing was mainly used in areas requiring high computing performance, but multicore CPUs and GPUs have now become widespread, and the advantages of parallel programming can be obtained even in a PC environment. Various parallel programming frameworks for multicore CPUs, such as OpenMP and PPL, have been announced. Nvidia and AMD have developed parallel programming platforms and APIs that let program developers take advantage of the multicore GPUs on their graphics cards. In this paper, we develop digital image transformation programs that run on each of the major parallel programming frameworks, and measure their execution time. We analyze the characteristics of each framework by comparing execution times. We also present a constant K indicating the ratio of program execution time between different parallel computing environments. Using this constant, it is possible to roughly predict execution time without implementing a parallel program.
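
The idea of an execution-time ratio K in the last abstract can be illustrated with a minimal sketch. The abstract does not say how K is defined or measured, so everything here is an assumption: K is taken as the mean of per-workload ratios between two environments, and the function names `estimate_k` and `predict_time`, as well as the timings, are hypothetical.

```python
from statistics import mean


def estimate_k(times_a, times_b):
    """Estimate K as the mean ratio t_b / t_a over paired workloads.

    Assumption: K is a plain ratio of execution times between
    environment A and environment B; the paper may define it differently.
    """
    if len(times_a) != len(times_b) or not times_a:
        raise ValueError("need two non-empty lists of equal length")
    return mean(tb / ta for ta, tb in zip(times_a, times_b))


def predict_time(time_a, k):
    """Roughly predict the time in environment B from a time measured in A."""
    return k * time_a


# Hypothetical timings in seconds for the same three workloads.
times_env_a = [2.0, 4.0, 8.0]  # e.g. a single-threaded run
times_env_b = [1.0, 2.0, 4.0]  # e.g. a parallel framework run

k = estimate_k(times_env_a, times_env_b)
predicted = predict_time(10.0, k)
print(f"K = {k:.2f}, predicted time = {predicted:.2f} s")
```

A single averaged ratio is only a rough guide: the real ratio usually depends on problem size, memory transfer costs, and how much of the program is parallelizable.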