• Title/Summary/Keyword: Multicore computing

Search Result 45, Processing Time 0.02 seconds

Static Timing Analysis of Shared Caches for Multicore Processors

  • Zhang, Wei;Yan, Jun
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.4
    • /
    • pp.267-278
    • /
    • 2012
  • The state-of-the-art techniques in multicore timing analysis are limited to analyze multicores with shared instruction caches only. This paper proposes a uniform framework to analyze the worst-case performance for both shared instruction caches and data caches in a multicore platform. Our approach is based on a new concept called address flow graph, which can be used to model both instruction and data accesses for timing analysis. Our experiments, as a proof-of-concept study, indicate that the proposed approach can accurately compute the worst-case performance for real-time threads running on a dual-core processor with a shared L2 cache (either to store instructions or data).

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.1
    • /
    • pp.12-25
    • /
    • 2012
  • For real-time systems it is important to obtain the accurate worst-case execution time (WCET). Furthermore, how to improve the WCET of applications that run on multicore processors is both significant and challenging as the WCET can be largely affected by the possible inter-core interferences in shared resources such as the shared L2 cache. In order to solve this problem, we propose an innovative approach that adopts a code positioning method to reduce the inter-core L2 cache interferences between the different real-time threads that adaptively run in a multi-core processor by using different strategies. The worst-case-oriented strategy is designed to decrease the worst-case WCET among these threads to as low as possible. The other two strategies aim at reducing the WCET of each thread to almost equal percentage or amount. Our experiments indicate that the proposed multicore-aware code positioning approaches, not only improve the worst-case performance of the real-time threads but also make good tradeoffs between efficiency and fairness for threads that run on multicore platforms.

Bounding Worst-Case DRAM Performance on Multicore Processors

  • Ding, Yiqiang;Wu, Lan;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.1
    • /
    • pp.53-66
    • /
    • 2013
  • Bounding the worst-case DRAM performance for a real-time application is a challenging problem that is critical for computing worst-case execution time (WCET), especially for multicore processors, where the DRAM memory is usually shared by all of the cores. Typically, DRAM commands from consecutive DRAM accesses can be pipelined on DRAM devices according to the spatial locality of the data fetched by them. By considering the effect of DRAM command pipelining, we propose a basic approach to bounding the worst-case DRAM performance. An enhanced approach is proposed to reduce the overestimation from the invalid DRAM access sequences by checking the timing order of the co-running applications on a dual-core processor. Compared with the conservative approach, which assumes that no DRAM command pipelining exists, our experimental results show that the basic approach can bound the WCET more tightly, by 15.73% on average. The experimental results also indicate that the enhanced approach can further improve the tightness of WCET by 4.23% on average as compared to the basic approach.

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal
    • /
    • v.43 no.5
    • /
    • pp.825-834
    • /
    • 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

A Study on Effect of Code Distribution and Data Replication for Multicore Computing Architectures

  • Cho, Doosan
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.282-287
    • /
    • 2021
  • A multicore system must be able to take full advantage of the program's instruction and data parallelism. This study introduces the data replication technique as a support technique to maximize the program's instruction and data parallelism. Instruction level parallelism can be limited by data dependency. In this case, if data is replicated to each processor core and used, instruction level parallelism can be used to the maximum. The technique proposed in this study can maximize the performance improvement effect when applied to scientific applications such as matrix multiplication operation.

Counter-Based Approaches for Efficient WCET Analysis of Multicore Processors with Shared Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.4
    • /
    • pp.285-299
    • /
    • 2013
  • To enable hard real-time systems to take advantage of multicore processors, it is crucial to obtain the worst-case execution time (WCET) for programs running on multicore processors. However, this is challenging and complicated due to the inter-thread interferences from the shared resources in a multicore processor. Recent research used the combined cache conflict graph (CCCG) to model and compute the worst-case inter-thread interferences on a shared L2 cache in a multicore processor, which is called the CCCG-based approach in this paper. Although it can compute the WCET safely and accurately, its computational complexity is exponential and prohibitive for a large number of cores. In this paper, we propose three counter-based approaches to significantly reduce the complexity of the multicore WCET analysis, while achieving absolute safety with tightness close to the CCCG-based approach. The basic counter-based approach simply counts the worst-case number of cache line blocks mapped to a cache set of a shared L2 cache from all the concurrent threads, and compares it with the associativity of the cache set to compute the worst-case cache behavior. The enhanced counter-based approach uses techniques to enhance the accuracy of calculating the counters. The hybrid counter-based approach combines the enhanced counter-based approach and the CCCG-based approach to further improve the tightness of analysis without significantly increasing the complexity. Our experiments on a 4-core processor indicate that the enhanced counter-based approach overestimates the WCET by 14% on average compared to the CCCG-based approach, while its averaged running time is less than 1/380 that of the CCCG-based approach. The hybrid approach reduces the overestimation to only 2.65%, while its running time is less than 1/150 that of the CCCG-based approach on average.

Parallel damage detection through finite frequency changes on multicore processors

  • Messina, Arcangelo;Cafaro, Massimo
    • Structural Engineering and Mechanics
    • /
    • v.63 no.4
    • /
    • pp.457-469
    • /
    • 2017
  • This manuscript deals with a novel approach aimed at identifying multiple damaged sites in structural components through finite frequency changes. Natural frequencies, meant as a privileged set of modal data, are adopted along with a numerical model of the system. The adoption of finite changes efficiently allows challenging characteristic problems encountered in damage detection techniques such as unexpected comparison of possible shifted modes and the significance of modal data changes very often affected by experimental/environmental noise. The new procedure extends MDLAC and exploits parallel computing on modern multicore processors. Smart filters, aimed at reducing the potential damaged sites, are implemented in order to reduce the computational effort. Several use cases are presented in order to illustrate the potentiality of the new damage detection procedure.

Parallel Multithreaded Processing for Data Set Summarization on Multicore CPUs

  • Ordonez, Carlos;Navas, Mario;Garcia-Alvarado, Carlos
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.111-120
    • /
    • 2011
  • Data mining algorithms should exploit new hardware technologies to accelerate computations. Such goal is difficult to achieve in database management system (DBMS) due to its complex internal subsystems and because data mining numeric computations of large data sets are difficult to optimize. This paper explores taking advantage of existing multithreaded capabilities of multicore CPUs as well as caching in RAM memory to efficiently compute summaries of a large data set, a fundamental data mining problem. We introduce parallel algorithms working on multiple threads, which overcome the row aggregation processing bottleneck of accessing secondary storage, while maintaining linear time complexity with respect to data set size. Our proposal is based on a combination of table scans and parallel multithreaded processing among multiple cores in the CPU. We introduce several database-style and hardware-level optimizations: caching row blocks of the input table, managing available RAM memory, interleaving I/O and CPU processing, as well as tuning the number of working threads. We experimentally benchmark our algorithms with large data sets on a DBMS running on a computer with a multicore CPU. We show that our algorithms outperform existing DBMS mechanisms in computing aggregations of multidimensional data summaries, especially as dimensionality grows. Furthermore, we show that local memory allocation (RAM block size) does not have a significant impact when the thread management algorithm distributes the workload among a fixed number of threads. Our proposal is unique in the sense that we do not modify or require access to the DBMS source code, but instead, we extend the DBMS with analytic functionality by developing User-Defined Functions.

Performance Evaluation and Optimization of Journaling File Systems with Multicores and High-Performance Flash SSDs (멀티코어 및 고성능 플래시 SSD 환경에서 저널링 파일 시스템의 성능 평가 및 최적화)

  • Han, Hyuck
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.4
    • /
    • pp.178-185
    • /
    • 2018
  • Recently, demands for computer systems with multicore CPUs and high-performance flash-based storage devices (i.e., flash SSD) have rapidly grown in cloud computing, surer-computing, and enterprise storage/database systems. Journaling file systems running on high-performance systems do not exploit the full I/O bandwidth of high-performance SSDs. In this article, we evaluate and analyze the performance of the Linux EXT4 file system with high-performance SSDs and multicore CPUs. The system used in this study has 72 cores and Intel NVMe SSD, and the flash SSD has performance up to 2800/1900 MB/s for sequential read/write operations. Our experimental results show that checkpointing in the EXT4 file system is a major overhead. Furthermore, we optimize the checkpointing procedure and our optimized EXT4 file system shows up to 92% better performance than the original EXT4 file system.

Tile Partitioning-based HEVC Parallel Decoding Optimization for Asymmetric Multicore Processor (비대칭 멀티코어 시스템 상의 HEVC 병렬 디코딩 최적화를 위한 타일 분할 기법)

  • Ryu, Yeongil;Roh, Hyun-Joon;Ryu, Eun-Seok
    • Journal of KIISE
    • /
    • v.43 no.9
    • /
    • pp.1060-1065
    • /
    • 2016
  • Recently, there is an emerging need for parallel UHD video processing, and the usage of computing systems that have an asymmetric processor such as ARM big.LITTLE is actively increasing. Thus, a new parallel UHD video processing method that is optimized for the asymmetric multicore systems is needed. This paper proposes a novel HEVC tile partitioning method for parallel processing by analyzing the computational power of asymmetric multicores. The proposed method analyzes (1) the computing power of asymmetric multicores and (2) the regression model of computational complexity per video resolution. Finally, the model (3) determines the optimal HEVC tile resolution for each core and partitions/allocates the tiles to suitable cores. The proposed method minimizes the gap in the decoding time between the fastest CPU core and the slowest CPU core. Experimental results with the 4K UHD official test sequences show average 20% improvement in the decoding speedup on the ARM asymmetric multicore system.