• Title/Summary/Keyword: Speed scheduling

A Study on High Speed LDPC Decoder Based on HSS (HSS기반의 고속 LDPC 복호기 연구)

  • Jung, Ji Won
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.5 no.3 / pp.164-168 / 2012
  • LDPC decoder architectures are generally classified into serial, parallel, and partially parallel architectures. Conventional LDPC decoding generally incurs a large number of computations, high power consumption, and decoding delay. It is therefore necessary to reduce the number of iterations and computations without degrading performance. This paper studies the Horizontal Shuffle Scheduling (HSS) algorithm. The results show that HSS needs half as many iterations as the conventional algorithm without performance degradation. Finally, this paper presents a design methodology for a high-speed LDPC decoder and confirms that its throughput reaches up to about 600 Mbps.

An Efficient DVS Algorithm for Pinwheel Task Schedules

  • Chen, Da-Ren;Chen, You-Shyang
    • Journal of Information Processing Systems / v.7 no.4 / pp.613-626 / 2011
  • In this paper, we focus on the pinwheel task model on a variable-voltage processor with d discrete voltage/speed levels. We propose an intra-task DVS algorithm that constructs a minimum-energy schedule for k tasks in O(d + k log k) time. We also give an inter-task DVS algorithm that runs in O(d + n log n) time, where n denotes the number of jobs. Previous approaches solve this problem by generating a canonical schedule beforehand and then adjusting the tasks' speeds in O(dn log n) or O($n^3$) time. However, the length of a canonical schedule depends on the hyperperiod of the task periods and is exponential in general. In our approach, tasks with arbitrary periods are first transformed into tasks with harmonic periods, and their key features are then profiled. An optimal discrete voltage schedule can then be computed directly from those features.
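
The period transformation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the periods are harmonized, so the sketch assumes one common specialization: each period is rounded down to the smallest period times a power of two. The function names `harmonize` and `hyperperiod` are hypothetical and are not taken from the paper.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all periods (length of a canonical schedule)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def harmonize(periods):
    """Round each period down to base * 2**k, where base is the smallest period.

    Assumption: shortening a period is acceptable (the task is simply
    served more often). This is an illustrative rule, not the paper's.
    """
    base = min(periods)
    result = []
    for p in periods:
        h = base
        # Double h while it still fits within the original period.
        while h * 2 <= p:
            h *= 2
        result.append(h)
    return result


periods = [4, 9, 17, 35]
harmonic = harmonize(periods)
print(harmonic, hyperperiod(periods), hyperperiod(harmonic))
```

For periods 4, 9, 17, and 35, the hyperperiod drops from 21420 to 32 after this transformation, which suggests why a canonical schedule built over the original periods can become impractically long.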

Heterogeneous Parallel Architecture for Face Detection Enhancement

  • Albssami, Aishah;Sharaf, Sanaa
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.193-198 / 2022
  • Face detection is one of the most important aspects of image processing, and it is a time-consuming problem in real-time applications such as surveillance systems, face recognition systems, attendance systems, and many others. At present, commodity hardware is becoming increasingly heterogeneous in architecture, with co-processors such as GPUs and MICs. Using those co-processors alongside traditional CPUs gives an algorithm a better chance to exploit both architectures and achieve faster implementations. This paper presents a hybrid implementation of face detection based on the local binary pattern (LBP) algorithm that is deployed on both a traditional CPU and a MIC co-processor to speed up the LBP algorithm. The experimental results show that the proposed implementation achieves a 3x speedup compared with either architecture used alone.

Study on Accelerating Distributed ML Training in Orchestration

  • Su-Yeon Kim;Seok-Jae Moon
    • International Journal of Advanced Smart Convergence / v.13 no.3 / pp.143-149 / 2024
  • As the size of data and models in machine learning continues to grow, training on a single server is becoming increasingly challenging. Consequently, distributed machine learning, which spreads the computational load across multiple machines, is becoming more important. However, several issues in improving the performance of distributed machine learning remain unresolved, including communication overhead, inter-node synchronization, data imbalance and bias, and resource management and scheduling. In this paper, we propose ParamHub, which uses orchestration to accelerate training. The system monitors the performance of each node after the first iteration and reallocates resources to slow nodes, thereby speeding up the training process. This approach ensures that resources are allocated to the nodes that need them, maximizing the overall efficiency of resource utilization and enabling all nodes to perform their tasks uniformly, which results in faster training overall. Furthermore, the method enhances the system's scalability and flexibility, allowing it to be applied effectively to clusters of various sizes.
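
The monitor-and-reallocate idea in this abstract can be sketched as follows. This is not ParamHub's implementation: the abstract gives no algorithm, so the sketch assumes a simple hypothetical model in which a node's iteration time is inversely proportional to its CPU share and the total CPU budget is fixed. The function name `reallocate` and the node names are invented for illustration.

```python
def reallocate(iter_times, cpu_shares):
    """Redistribute a fixed CPU budget so that all nodes finish together.

    Assumed model (hypothetical): work_i = time_i * share_i, and a node's
    time scales as work_i / share_i after reallocation.
    """
    budget = sum(cpu_shares.values())
    # Estimate each node's work from the first measured iteration.
    work = {node: iter_times[node] * cpu_shares[node] for node in iter_times}
    total_work = sum(work.values())
    # Give each node a share proportional to its estimated work.
    return {node: budget * w / total_work for node, w in work.items()}


times = {"node-a": 10.0, "node-b": 20.0}   # seconds for the first iteration
shares = {"node-a": 4.0, "node-b": 4.0}    # CPU cores per node
new_shares = reallocate(times, shares)
print(new_shares)
```

Under this model, the slower node receives the larger share and both nodes are predicted to take the same time per iteration, which corresponds to the uniform progress described in the abstract.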

Effective Scheduling Algorithm using Queue Separation and Packet Segmentation for Jumbo Packets (큐 분리 및 패킷 분할을 이용한 효율적인 점보패킷 스케쥴링 방법)

  • 윤빈영;고남석;김환우
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.9A / pp.663-668 / 2003
  • With the advent of high-speed networking technology, computers connected to high-speed networks tend to spend more of their CPU cycles processing data. One way to improve the performance of these computers is therefore to reduce the CPU cycles spent on data processing. Because CPU cycle consumption grows in proportion to the number of packets processed per second, one solution is to reduce the packet rate by increasing the packet length. To meet this requirement, two types of jumbo packets, jumbograms and jumbo frames, have already been standardized or are under discussion. When jumbograms and general packets are interleaved and scheduled together in a router, the jumbograms may degrade the QoS of the general packets because of the transfer delay they cause. They also frequently exhaust memory because such long packets must be stored. This easily leads to congestion in the router, which results in packet loss. In this paper, we analyze the problems in processing jumbo packets and suggest a novel solution to overcome them.
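
The title names queue separation and packet segmentation, but the abstract does not describe the algorithm. The following is only a rough sketch of those two ideas under stated assumptions: packets longer than a hypothetical `max_len` of 1500 bytes are split into segments and placed in their own queue, and the two queues are served alternately. The names `segment` and `schedule` are illustrative and do not come from the paper.

```python
from collections import deque


def segment(length, max_len):
    """Split a packet of `length` bytes into segment lengths of at most max_len."""
    full, rest = divmod(length, max_len)
    return [max_len] * full + ([rest] if rest else [])


def schedule(packets, max_len=1500):
    """Return the transmission order of (packet_id, bytes) units.

    Assumption: jumbo packets go to a separate queue after segmentation,
    and the scheduler alternates between the two queues.
    """
    normal, jumbo = deque(), deque()
    for pid, length in packets:
        if length > max_len:
            jumbo.extend((pid, s) for s in segment(length, max_len))
        else:
            normal.append((pid, length))
    order = []
    while normal or jumbo:
        if normal:
            order.append(normal.popleft())
        if jumbo:
            order.append(jumbo.popleft())
    return order


print(schedule([("J", 4000), ("a", 500), ("b", 700)]))
```

In this example the general packets `a` and `b` are sent between segments of the jumbo packet `J` instead of waiting for all 4000 bytes, which is the kind of transfer delay the abstract is concerned with.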

Development of an Analysis Program for Small Horizontal Wind Turbines Considering Side Furling and Optimal Torque Scheduling (사이드 펄링과 최적 토크스케줄을 고려한 소형 풍력터빈 해석 프로그램 개발)

  • Jang, Hyeon-Mu;Kim, Dong-Myeong;Paek, In-Su
    • Journal of the Korean Solar Energy Society / v.38 no.2 / pp.15-31 / 2018
  • This study proposes a program for designing small-capacity wind turbine blades. The program runs in a MATLAB GUI environment and performs blade design based on blade element momentum theory. It differs from other available simulation tools in that it can analyze the side-furling power regulation mechanism and includes an algorithm for finding the optimal torque schedule above the rated wind speed. Side-furling power regulation is used for small-capacity horizontal-axis wind turbines because active pitch control, which is common in large-capacity wind turbines, is too costly for them. Because active pitching is not used, the torque schedule above the rated wind speed must also differ from that of large-capacity wind turbines. The program was validated against results from FAST, which is the only program that can analyze the performance of side-furled wind turbines. The validation used data for a commercial 10 kW wind turbine available in the literature. It showed that the performance predictions of the proposed simple program are close to those of FAST, and that the optimal torque schedule found by the program increases turbine power substantially. Further experimental validation is planned as future work.

Minimum-Power Scheduling of Real-Time Parallel Tasks based on Load Balancing for Frequency-Sharing Multicore Processors (주파수 공유형 멀티코어 프로세서를 위한 부하균등화에 기반한 실시간 병렬 작업들의 최소 전력 스케줄링)

  • Lee, Wan Yeon
    • KIPS Transactions on Computer and Communication Systems / v.4 no.6 / pp.177-184 / 2015
  • This paper proposes a minimum-power scheduling scheme for real-time parallel tasks that meets the tasks' deadlines on DVFS-enabled multicore processors. The proposed scheme first finds a real-valued (floating-point) number of processing cores for each task so that the computation load of all processing cores is equalized. Next, the scheme converts each real-valued number of cores into a natural number of cores while keeping the computation load of all cores unchanged, and allocates that natural number of cores to the execution of each task. The scheme is designed to minimize the power consumption of a frequency-sharing multicore processor, whose cores all operate at the same processing speed at any given instant. Evaluation shows that the scheme reduces power consumption by up to 38% compared with the previous method.
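
The two-step allocation described in this abstract can be illustrated with a small sketch. The first step follows the abstract: cores are assigned in proportion to each task's work, so the load per core is equal. The second step is an assumption, since the abstract does not explain its conversion: the sketch uses largest-remainder rounding, which preserves the total core count but, unlike the paper's scheme, does not keep the per-core load exactly unchanged. All names are hypothetical.

```python
import math


def fractional_cores(work, cores):
    """Real-valued cores per task so that work / cores is equal for all tasks."""
    total = sum(work)
    return [cores * w / total for w in work]


def round_cores(fractions):
    """Largest-remainder rounding to natural numbers (illustrative assumption)."""
    floors = [math.floor(f) for f in fractions]
    if min(floors) < 1:
        raise ValueError("every task needs at least one core in this sketch")
    leftover = round(sum(fractions)) - sum(floors)
    # Hand the remaining cores to the tasks with the largest fractional parts.
    by_remainder = sorted(range(len(fractions)),
                          key=lambda i: fractions[i] - floors[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        floors[i] += 1
    return floors


work = [30, 50, 20]          # computation amount of each task
frac = fractional_cores(work, 8)
print(frac, round_cores(frac))
```

With 8 cores, the real-valued allocation gives every core a load of 12.5 work units; after rounding, the loads differ slightly between tasks, which is the gap the paper's conversion step is designed to avoid.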

Computational Investigations of Adverse Effects of Deploying Spoilers on Airfoil Aerodynamic Characteristics (스포일러 동적 작동에 따른 에어포일 공력특성 역전현상 연구)

  • Chung, Hyoung-Seog
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.48 no.5 / pp.335-342 / 2020
  • Tailless aircraft designed for stealth efficiency use spoilers instead of rudders for directional control. When a spoiler is deployed rapidly, highly nonlinear and unsteady aerodynamic characteristics can arise, with adverse effects on flight performance. This paper investigates the aerodynamic characteristics of an airfoil with a moving spoiler using a dynamic-mesh CFD technique. The effects of spoiler operation speed, mounting location, and deployment scheduling are analyzed in order to reduce the adverse effects of the spoiler's dynamic operation. The results show that these adverse effects can be reduced by appropriate selection of the spoiler mounting location and deployment scheduling.

Performance of LDPC Decoder of HSS based on Non-Uniform Quantization (비균일 양자화 방식 기반 HSS 방식의 LDPC 복호기 성능)

  • Kim, Tae-Hun;Kwon, Hae-Chan;Jung, Ji-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.9 / pp.2029-2035 / 2013
  • In this paper, we present a non-uniform quantization method for the LDPC decoder specified in the DVB-S2 standard. Implementing an LDPC decoder raises problems in terms of both algorithm and implementation. On the algorithm side, because of the large number of iterations, LDPC decoding generally incurs a large number of computations, high power consumption, and decoding delay. This paper therefore studies the Horizontal Shuffle Scheduling (HSS) algorithm, which reduces the number of iterations without performance loss. On the implementation side, there are several ways to improve decoding speed; this paper focuses on non-uniform quantization, which reduces the number of quantization bits in the LDPC decoder. In simulation, the decoding throughput of the HSS LDPC decoder based on non-uniform quantization is 816 Mbps, a 12% improvement over the conventional decoder.

Communication Schedule for GEN_BLOCK Redistribution (GEN_BLOCK간 재분산을 위한 통신 스케줄)

  • Yook, Hyun-Gyoo;Park, Myong-Soon
    • Journal of KIISE: Computer Systems and Theory / v.27 no.5 / pp.450-463 / 2000
  • Array redistribution is often required to improve algorithm performance in parallel programs on distributed-memory multicomputers. GEN_BLOCK redistribution, which is redistribution between different GEN_BLOCK distributions, is essential for load balancing. However, prior research on redistribution has focused on regular redistribution, such as redistribution between different CYCLIC(N) distributions. GEN_BLOCK redistribution is very different from regular redistribution: message passing in regular redistribution repeats basic message-passing patterns, whereas message passing in GEN_BLOCK redistribution shows locality. This paper proves that two optimality conditions, reducing the number of communication steps and minimizing the redistribution size, are essential in GEN_BLOCK redistribution. In addition, by adding a relocation phase to list scheduling, we obtain an optimal scheduling algorithm for GEN_BLOCK redistribution. To evaluate the algorithm, we performed experiments on a CRAY T3E. The experiments show that the scheduling algorithm performs better and that the two conditions are critical for improving the communication speed of GEN_BLOCK redistribution.
