• Title/Summary/Keyword: multicore

Search Result 143, Processing Time 0.023 seconds

Stochastic Power-efficient DVFS Scheduling of Real-time Tasks on Multicore Processors with Leakage Power Awareness (멀티코어 프로세서의 누수 전력을 고려한 실시간 작업들의 확률적 저전력 DVFS 스케쥴링)

  • Lee, Kwanwoo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.4
    • /
    • pp.25-33
    • /
    • 2014
  • This paper proposes a power-efficient scheduling scheme that stochastically minimizes the power consumption of real-time tasks while meeting their deadlines on multicore processors. In the proposed scheme, uncertain computation amounts of given tasks are translated into probabilistic computation amounts based on their past completion amounts, and the mean power consumption of the translated probabilistic computation amounts is minimized with a finite set of discrete clock frequencies. Also, when system load is low, the proposed scheme activates a part of all available cores with unused cores powered off, considering the leakage power consumption of cores. Evaluation shows that the scheme saves up to 69% power consumption of the previous method.

Parallel Deblocking Filter Based on Modified Order of Accessing the Coding Tree Units for HEVC on Multicore Processor

  • Lei, Haiwei;Liu, Wenyi;Wang, Anhong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.3
    • /
    • pp.1684-1699
    • /
    • 2017
  • The deblocking filter (DF) reduces blocking artifacts in encoded video sequences, and thereby significantly improves the subjective and objective quality of videos. Statistics show that the DF accounts for 5-18% of the total decoding time in high-efficiency video coding. Therefore, speeding up the DF will improve codec performance, especially for the decoder. In view of the rapid development of multicore technology, we propose a parallel DF scheme based on a modified order of accessing the coding tree units (CTUs) by analyzing the data dependencies between adjacent CTUs. This enables the DF to run in parallel, providing accelerated performance and more flexibility in the degree of parallelism, as well as finer parallel granularity. We additionally solve the problems of variable privatization and thread synchronization in the parallelization of the DF. Finally, the DF module is parallelized based on the HM16.1 reference software using OpenMP technology. The acceleration performance is experimentally tested under various numbers of cores, and the results show that the proposed scheme is very effective at speeding up the DF.

A Study on Statistical Simulation of Multicore Processor Architectures (멀티코어 프로세서의 통계적 모의실험에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.6
    • /
    • pp.259-265
    • /
    • 2014
  • When the trace-driven simulation is used for the performance analysis of widely used multicore processors in the initial design stage, much time and disk space is necessary. In this paper, statistical simulations are performed for a high performance multicore processor with various hardware configurations. For the experiment, SPEC2000 benchmarks programs are used for profiling and synthesizing new instruction traces. As a result, the performance obtained by our statistical simulation is comparable to that of the trace-driven simulation with the benefit of tremendous reduction in the simulation time.

Bounding Worst-Case DRAM Performance on Multicore Processors

  • Ding, Yiqiang;Wu, Lan;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.1
    • /
    • pp.53-66
    • /
    • 2013
  • Bounding the worst-case DRAM performance for a real-time application is a challenging problem that is critical for computing worst-case execution time (WCET), especially for multicore processors, where the DRAM memory is usually shared by all of the cores. Typically, DRAM commands from consecutive DRAM accesses can be pipelined on DRAM devices according to the spatial locality of the data fetched by them. By considering the effect of DRAM command pipelining, we propose a basic approach to bounding the worst-case DRAM performance. An enhanced approach is proposed to reduce the overestimation from the invalid DRAM access sequences by checking the timing order of the co-running applications on a dual-core processor. Compared with the conservative approach, which assumes that no DRAM command pipelining exists, our experimental results show that the basic approach can bound the WCET more tightly, by 15.73% on average. The experimental results also indicate that the enhanced approach can further improve the tightness of WCET by 4.23% on average as compared to the basic approach.

Implementation of SDR-based LTE-A PDSCH Decoder for Supporting Multi-Antenna Using Multi-Core DSP (멀티코어 DSP를 이용한 다중 안테나를 지원하는 SDR 기반 LTE-A PDSCH 디코더 구현)

  • Na, Yong;Ahn, Heungseop;Choi, Seungwon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.15 no.4
    • /
    • pp.85-92
    • /
    • 2019
  • This paper presents a SDR-based Long Term Evolution Advanced (LTE-A) Physical Downlink Shared Channel (PDSCH) decoder using a multicore Digital Signal Processor (DSP). For decoder implementation, multicore DSP TMS320C6670 is used, which provides various hardware accelerators such as turbo decoder, fast Fourier transformer and Bit Rate Coprocessors. The TMS320C6670 is a DSP specialized in implementing base station platforms and is not an optimized platform for implementing mobile terminal platform. Accordingly, in this paper, the hardware accelerator was changed to the terminal implementation to implement the LTE-A PDSCH decoder supporting the multi-antenna and the functions not provided by the hardware accelerator were implemented through core programming. Also pipeline using multicore was implemented to meet the transmission time interval. To confirm the feasibility of the proposed implementation, we verified the real-time decoding capability of the PDSCH decoder implemented using the LTE-A Reference Measurement Channel (RMC) waveform about transmission mode 2 and 3.

Performance Study of Multicore Digital Signal Processor Architectures (멀티코어 디지털 신호처리 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.4
    • /
    • pp.171-177
    • /
    • 2013
  • Due to the demand for high speed 3D graphic rendering, video file format conversion, compression, encryption and decryption technologies, the importance of digital signal processor system is growing rapidly. In order to satisfy the real-time constraints, high performance digital signal processor is required. Therefore, as in general purpose computer systems, digital signal processor should be designed as multicore architecture as well. Using UTDSP benchmarks as input, the trace-driven simulation has been performed and analyzed for the 2 to 16-core digital signal processor architectures with the cores from simple RISC to in-order and out-of-order superscalar processors for the various window sizes, extensively.

Energy-Efficient Fault-Tolerant Scheduling based on Duplicated Executions for Real-Time Tasks on Multicore Processors (멀티코어 프로세서상의 실시간 태스크들을 위한 중복 실행에 기반한 저전력 결함포용 스케줄링)

  • Lee, Kwan-Woo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.5
    • /
    • pp.1-10
    • /
    • 2014
  • The proposed scheme schedules given real-time tasks so that energy consumption of multicore processors would be minimized while meeting tasks' deadline and tolerating a permanent fault based on the primary-backup task model. Whereas the previous methods minimize the overlapped time of a primary task and its backup task, the proposed scheme maximizes the overlapped time so as to decrease the core speed as much as possible. It is analytically verified that the proposed scheme minimizes the energy consumption. Also, the proposed scheme saves up to 77% energy consumption of the previous method through experimental performance evaluation.

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal
    • /
    • v.43 no.5
    • /
    • pp.825-834
    • /
    • 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

A Study on the Localization of 1500lb High-Pressure Drop Control Valve for Boiler Feedwater Pump (보일러 급수펌프용 1500lb 고차압 제어밸브 국산화 개발에 관한 연구)

  • Lee, Kwon-Il;Jang, Hoon;Lee, Chi-Woo
    • Journal of the Korean Society of Manufacturing Process Engineers
    • /
    • v.21 no.8
    • /
    • pp.19-24
    • /
    • 2022
  • We developed a prototype from the design of a trim, which is the most important in the localization development of a 1500 Ib high-differential pressure-control valve used for boiler feedwater, and measured the flow coefficient, the most basic design data for valves. The following conclusions were drawn. The comparison of the design values of the flow coefficients for the existing X-trim and the multicore trim designed for localization development showed that they were almost identical, and the X-trim value was slightly lower. The comparison of the X-trim and multicore trim based on the valve flow coefficient test showed that they were generally similar, indicating no problem with the design. In the future, we plan to compare and analyze the flow paths for the X-trim and multicore trim via flow analysis.

Hardware-Software Cosynthesis of Multitask Multicore SoC with Real-Time Constraints (실시간 제약조건을 갖는 다중태스크 다중코어 SoC의 하드웨어-소프트웨어 통합합성)

  • Lee Choon-Seung;Ha Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.592-607
    • /
    • 2006
  • This paper proposes a technique to select processors and hardware IPs and to map the tasks into the selected processing elements, aming to achieve high performance with minimal system cost when multitask applications with real-time constraints are run on a multicore SoC. Such technique is called to 'Hardware-Software Cosynthesis Technique'. A cosynthesis technique was already presented in our early work [1] where we divide the complex cosynthesis problem into three subproblems and conquer each subproblem separately: selection of appropriate processing components, mapping and scheduling of function blocks to the selected processing component, and schedulability analysis. Despite good features, our previous technique has a serious limitation that a task monopolizes the entire system resource to get the minimum schedule length. But in general we may obtain higher performance in multitask multicore system if independent multiple tasks are running concurrently on different processor cores. In this paper, we present two mapping techniques, task mapping avoidance technique(TMA) and task mapping pinning technique(TMP), which are applicable for general cases with diverse operating policies in a multicore environment. We could obtain significant performance improvement for a multimedia real-time application, multi-channel Digital Video Recorder system and for randomly generated multitask graphs obtained from the related works.