• Title/Summary/Keyword: Parallel Efficiency

Search results: 1,037 items (processing time: 0.022 seconds)

Dispatching to Minimize Flow Time for Production Efficiency in Non-Identical Parallel Machines Environment with Rework

  • 서정하;고효헌;김성식;백준걸
    • 대한산업공학회지 / Vol. 37, No. 4 / pp. 367-381 / 2011
  • Reducing waste to improve production efficiency is becoming more important because of rapidly changing market conditions and rising material and oil prices. Dispatching must therefore account for the characteristics of the production environment. In real manufacturing, machines differ in capability and degree of deterioration, so the environment consists of non-identical parallel machines with rework rates. This paper proposes a dispatching method, FTLR (Flow Time Loss Index with Rework Rate), whose goal is to minimize flow time in such environments. FTLR predicts the flow time while accounting for the rework rate. After assessing the dominance of the expected flow time for each machine, FTLR dispatches jobs so as to minimize flow time. Experiments compare FTLR with various dispatching methods in terms of mean flow time, mean tardiness, and maximum tardiness in queue.
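The abstract does not give the FTLR index itself, so the following is only a minimal sketch of the general idea of rework-aware dispatching, not the authors' method. It assumes that each pass on a machine fails independently with that machine's rework rate, so the expected number of passes is 1 / (1 - r). The `Machine` fields, the machine names, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Machine:
    name: str
    process_time: float  # time for one pass of the job on this machine
    rework_rate: float   # assumed probability that a pass must be repeated (0 <= r < 1)
    queue_time: float    # work already waiting in front of this machine


def expected_flow_time(m: Machine) -> float:
    """Expected flow time under the independent-rework assumption.

    The number of passes is geometric with mean 1 / (1 - r).
    """
    return m.queue_time + m.process_time / (1.0 - m.rework_rate)


def dispatch(machines: Sequence[Machine]) -> Machine:
    """Pick the machine with the smallest expected flow time."""
    return min(machines, key=expected_flow_time)


# Hypothetical data: a fast machine that often needs rework versus a slower, reliable one.
machines = [
    Machine("fast_but_worn", process_time=4.0, rework_rate=0.5, queue_time=0.0),
    Machine("slow_but_reliable", process_time=6.0, rework_rate=0.0, queue_time=0.0),
]
chosen = dispatch(machines)
```

With these made-up numbers the nominally faster machine has an expected flow time of 8.0 against 6.0 for the reliable one, which illustrates why ignoring the rework rate can lead to a poor assignment.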

Performance Analysis of a Parallel Mesh Smoothing Algorithm using Graph Coloring and OpenMP

  • 신명규;김지범
    • 전자공학회논문지 / Vol. 53, No. 6 / pp. 80-87 / 2016
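  • In this paper, we propose a parallel mesh smoothing algorithm that uses graph coloring and OpenMP, and we analyze its performance on a shared-memory supercomputer. The proposed algorithm uses graph coloring to partition the whole mesh into several independent sets (colors) and then processes the independent sets one after another, performing parallel mesh smoothing on each set with the OpenMP library. Through experiments, we examine how various graph coloring methods and color reordering methods affect the efficiency of parallel mesh smoothing. We also examine how OpenMP loop scheduling methods affect the efficiency of parallel mesh smoothing.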
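The coloring-then-smoothing structure described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it uses a simple greedy coloring and Laplacian smoothing (the abstract names neither), and a Python thread pool stands in for an OpenMP parallel loop. The function names and the tiny test mesh are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def greedy_coloring(adj):
    """Give each vertex the smallest color not used by an already colored neighbor."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def smooth(points, adj, fixed, sweeps=1):
    """Laplacian smoothing, one color class at a time.

    Vertices of the same color share no edge, so each update reads only
    vertices of other colors and the updates within a class are independent.
    """
    colors = greedy_coloring(adj)
    classes = {}
    for v, c in colors.items():
        classes.setdefault(c, []).append(v)

    def relax(v):
        nbrs = adj[v]
        points[v] = (
            sum(points[u][0] for u in nbrs) / len(nbrs),
            sum(points[u][1] for u in nbrs) / len(nbrs),
        )

    with ThreadPoolExecutor() as pool:
        for _ in range(sweeps):
            for c in sorted(classes):      # colors are processed sequentially
                movable = [v for v in classes[c] if v not in fixed]
                list(pool.map(relax, movable))  # vertices of one color in parallel
    return points


# Hypothetical mesh: a square with fixed corners 0-3 and a movable interior vertex 4.
adj = {0: [1, 3, 4], 1: [0, 2, 4], 2: [1, 3, 4], 3: [0, 2, 4], 4: [0, 1, 2, 3]}
points = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (2.0, 2.0), 3: (0.0, 2.0), 4: (0.5, 0.3)}
colors = greedy_coloring(adj)
smooth(points, adj, fixed={0, 1, 2, 3})
```

The coloring method and the order in which colors are visited are exactly the knobs the abstract says were varied, so in this sketch they would correspond to swapping `greedy_coloring` or the `sorted(classes)` order.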

Efficient Parallel TLD on CPU-GPU Platform for Real-Time Tracking

  • Chen, Zhaoyun;Huang, Dafei;Luo, Lei;Wen, Mei;Zhang, Chunyuan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 1 / pp. 201-220 / 2020
  • Trackers, especially long-term (LT) trackers, now have more complex structures and heavier computation in the pursuit of high accuracy and robustness. However, the computing efficiency of LT trackers cannot meet the real-time requirement in many real application scenarios. With heterogeneous CPU-GPU platforms more popular than ever, exploiting their computing capacity to make LT trackers efficient enough for real-time use is a challenge. This paper focuses on TLD, the first LT tracking framework, and proposes an efficient parallel implementation based on OpenCL. We first analyze the TLD tracker and then optimize the compute-intensive kernels, including Fern Feature Extraction, Fern Classification, NCC Calculation, Overlaps Calculation, and Positive and Negative Samples Extraction. Experimental results demonstrate that our parallel TLD tracker outperforms the original TLD, achieving a 3.92x speedup on CPU and GPU. Moreover, the parallel TLD tracker runs at 52.9 frames per second and meets the real-time requirement.

Parallel Deblocking Filter Based on Modified Order of Accessing the Coding Tree Units for HEVC on Multicore Processor

  • Lei, Haiwei;Liu, Wenyi;Wang, Anhong
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 3 / pp. 1684-1699 / 2017
  • The deblocking filter (DF) reduces blocking artifacts in encoded video sequences, and thereby significantly improves the subjective and objective quality of videos. Statistics show that the DF accounts for 5-18% of the total decoding time in high-efficiency video coding. Therefore, speeding up the DF will improve codec performance, especially for the decoder. In view of the rapid development of multicore technology, we propose a parallel DF scheme based on a modified order of accessing the coding tree units (CTUs) by analyzing the data dependencies between adjacent CTUs. This enables the DF to run in parallel, providing accelerated performance and more flexibility in the degree of parallelism, as well as finer parallel granularity. We additionally solve the problems of variable privatization and thread synchronization in the parallelization of the DF. Finally, the DF module is parallelized based on the HM16.1 reference software using OpenMP technology. The acceleration performance is experimentally tested under various numbers of cores, and the results show that the proposed scheme is very effective at speeding up the DF.

A Performance Comparison of Parallel Programming Models on Edge Devices

  • 남덕윤
    • 대한임베디드공학회논문지 / Vol. 18, No. 4 / pp. 165-172 / 2023
  • Heterogeneous computing is a technology that uses different types of processors to perform parallel processing. It maximizes task processing and energy efficiency by leveraging various computing resources such as CPUs, GPUs, and FPGAs. Edge computing, meanwhile, has developed alongside IoT and 5G technologies. It is a form of distributed computing that uses computing resources close to clients, thereby offloading the central server. It has evolved into intelligent edge computing combined with artificial intelligence. Intelligent edge computing enables total data processing, such as context awareness, prediction, control, and simple processing of the data collected at the edge. If heterogeneous computing can be successfully applied at the edge, it is expected to maximize job processing efficiency while minimizing dependence on the central server. In this paper, experiments were conducted with benchmark applications to verify the feasibility of various parallel programming models on high-end and low-end edge devices. We analyzed the performance of five parallel programming models on the Raspberry Pi 4 and the Jetson Orin Nano as the low-end and high-end devices, respectively. In the experiments, OpenACC showed the best performance on the low-end edge device and OpenSYCL on the high-end device, owing to the stability and optimization of system libraries.

PARALLEL PERFORMANCE OF MULTISPLITTING METHODS WITH PREWEIGHTING

  • Han, Yu-Du;Yun, Jae-Heon
    • 대한수학회지 / Vol. 49, No. 4 / pp. 805-827 / 2012
  • In this paper, we first study the convergence of a special type of multisplitting method with preweighting, and then we provide some comparison results for those multisplitting methods. Next, we propose both a parallel implementation of an SOR-like multisplitting method with preweighting and an application of that method as a parallel preconditioner for a Krylov subspace method. Lastly, we provide parallel performance results for both the SOR-like multisplitting method with preweighting and the Krylov subspace method with the parallel preconditioner to evaluate the parallel efficiency of the proposed methods.
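The abstract does not state how parallel efficiency is computed. A common convention, assumed here, is speedup S = T_1 / T_p and efficiency E = S / p for p workers. The sketch below uses that convention; the function names and timings are made up for illustration and are not results from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Efficiency E = S / p; 1.0 means ideal linear scaling."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


# Hypothetical timings: 120 s on one worker, 20 s on eight workers.
s = speedup(120.0, 20.0)
e = parallel_efficiency(120.0, 20.0, workers=8)
```

With these illustrative timings the speedup is 6.0 and the efficiency is 0.75, meaning a quarter of the ideal eight-worker gain is lost to overheads such as communication or load imbalance.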

Efficiency Enhancement of CFDS Code

  • 김재관;이정일;김종암;홍승규;이황섭;안창수
    • 한국전산유체공학회:학술대회논문집 / 한국전산유체공학회 2005년도 춘계 학술대회논문집 / pp. 123-127 / 2005
  • Numerical analyses of complicated flows are widely attempted these days. Because of their enormous memory and computation time demands, parallel processing is used for these problems. To obtain computational efficiency, it is important to choose a proper domain decomposition technique and numerical algorithm. In this research, we enhanced the efficiency of the CFDS code developed by ADD using parallel computation and newly developed numerical algorithms. A non-blocking method is used for the large amount of data transfer between blocks, and a newly developed data transfer algorithm is used for non-aligned block interfaces. The recently developed RoeM scheme is adopted as the spatial difference method, and the AF-ADI and LU-SGS methods are used for time integration to enhance the convergence of the code. Analyses of the flows around the ONERA M6 wing and a high-angle-of-attack missile configuration are performed to show the efficiency improvement.

An Improvement Parallel to the Efficiency of Boost Converter for Power Factor Correction

  • 전내석;장수형;전일영;박영산;안병원;이성근;김윤식
    • 한국마린엔지니어링학회:학술대회논문집 / 한국마린엔지니어링학회 2001년도 추계학술대회 논문집(Proceeding of the KOSME 2001 Autumn Annual Meeting) / pp. 120-124 / 2001
  • A new technique for improving the efficiency of a single-phase high-frequency boost converter is proposed. The converter includes an additional low-frequency boost converter connected in parallel with the main high-frequency switching device. The additional converter is controlled at a lower frequency. Most of the current flows through the low-frequency switch, so the high-frequency switching loss is greatly reduced. Both switching devices are controlled by a simple method; each controller consists of a one-shot multivibrator, a comparator, and an AND gate. The converter operates cooperatively at high efficiency and acts as if it were a conventional high-frequency boost converter with one switching device. The proposed method is verified by simulation. This paper describes the converter configuration and design and discusses the steady-state performance in terms of switching loss reduction and efficiency improvement.

Optimal Design Methodology of Zero-Voltage-Switching Full-Bridge Pulse Width Modulated Converter for Server Power Supplies Based on Self-driven Synchronous Rectifier Performance

  • Cetin, Sevilay
    • Journal of Power Electronics / Vol. 16, No. 1 / pp. 121-132 / 2016
  • In this paper, a high-efficiency design methodology for a zero-voltage-switching full-bridge (ZVS-FB) pulse width modulation (PWM) converter for server-computer power supplies is discussed based on self-driven synchronous rectifier (SR) performance. The design approach focuses on rectifier conduction loss on the secondary side because of the high output current of the application. Various numbers of parallel-connected SRs are evaluated to reduce the high conduction loss. For this approach, the reliability of the gate control signals produced by the self-driver is analyzed in detail to determine whether the converter achieves high efficiency. A laboratory prototype that operates at 80 kHz and is rated at 1 kW/12 V is built with various numbers of parallel-connected SRs to verify the proposed theoretical analysis and evaluations. Measurement results show that the best efficiency of the converter is 95.16%.

Transient Air-fuel Ratio Control of the Cylinder Deactivation Engine during Mode Transition

  • 권민수;이민광;김준수;선우명호
    • 한국자동차공학회논문집 / Vol. 19, No. 2 / pp. 26-34 / 2011
  • Hybrid powertrain systems have been developed to improve the fuel efficiency of internal combustion engines. In a parallel hybrid powertrain system, an engine and a motor are directly coupled. Because of this hardware configuration, the friction and pumping losses of the internal combustion engine always exist. Such losses are the primary factors that degrade fuel efficiency in the parallel-type hybrid powertrain system. In particular, the engine operates as a power-consuming device during the fuel-cut condition. To improve the fuel efficiency of the parallel-type hybrid system, cylinder deactivation (CDA) technology was developed. Cylinder deactivation can improve fuel efficiency by reducing pumping losses during the fuel-cut driving condition. A CDA engine has two operating modes, a CDA mode and an SI mode, selected according to the vehicle operating condition. However, during the mode change from CDA to SI, a serious fluctuation of the air-fuel ratio can occur without adequate control. In this study, an air-fuel ratio control algorithm for the mode transition from CDA to SI is proposed. The control algorithm was developed based on a mean value CDA engine model. Finally, the performance of the control algorithm was validated by various engine experiments.