• Title/Summary/Keyword: Multicore processors

Search Result 49, Processing Time 0.027 seconds

Overhead Analysis of XtratuM for Space in SMP Envrionment (SMP 환경에서의 위성용 XtratuM 오버헤드 분석)

  • Kim, Sun-Wook;Yoo, Bum-Soo;Jeong, Jae-Yeop;Choi, Jong-Wook
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.15 no.4
    • /
    • pp.177-187
    • /
    • 2020
  • Virtualization with hypervisors is one of emerging topics in multicore processors for space. Hypervisors are software layers to make several independent virtualized environments on one processor. Since all hardware resources are virtualized and distributed only by hypervisors, overall performance of processors can be improved by fully utilizing the resources. However at the same time, there are overheads for virtualizing and distributing hardware resources. Satellites are one of hard real time systems, and performance degradation with overheads should be analyzed thoroughly. Previous research on the overheads focused on single core systems. Even the overheads were analyzed in multicore systems, SMP environment was not fully included. This paper builds SMP environment with XtratuM, one of hypervisors for space missions, and analyzes performance degradation with overheads. Two boards of GR712RC with 2 LEON3FT CPUs and GR740 with 4 LEON4 CPUs are used in experiments. On each board, SMP benchmark functions are executed on SMP environment with XtratuM and on that without XtratuM respectively. Results are analyzed to find timing characteristics including overheads. Finally, applicability of the XtratuM to flight software in SMP is also reviewed.

Power-efficient Scheduling of Periodic Real-time Tasks on Lightly Loaded Multicore Processors (저부하 멀티코어 프로세서에서 주기적 실시간 작업들의 저전력 스케쥴링)

  • Lee, Wan-Yeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.8
    • /
    • pp.11-19
    • /
    • 2012
  • In this paper, we propose a power-efficient scheduling scheme for lightly loaded multicore processors which contain more processing cores than running tasks. The proposed scheme activates a portion of available cores and inactivates the other unused cores in order to save power consumption. The tasks are assigned to the activated cores based on a heuristic mechanism for fast task assignment. Each activated core executes its assigned tasks with the optimal clock frequency which minimizes the power consumption of the tasks while meeting their deadlines. Evaluation shows that the proposed scheme saves up to 78% power consumption of the previous method which activates as many processing cores as possible for the execution of the given tasks.

A Study on Statistical Simulation of Multicore Processor Architectures (멀티코어 프로세서의 통계적 모의실험에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.6
    • /
    • pp.259-265
    • /
    • 2014
  • When the trace-driven simulation is used for the performance analysis of widely used multicore processors in the initial design stage, much time and disk space is necessary. In this paper, statistical simulations are performed for a high performance multicore processor with various hardware configurations. For the experiment, SPEC2000 benchmarks programs are used for profiling and synthesizing new instruction traces. As a result, the performance obtained by our statistical simulation is comparable to that of the trace-driven simulation with the benefit of tremendous reduction in the simulation time.

Performance Study of Multicore Digital Signal Processor Architectures (멀티코어 디지털 신호처리 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.4
    • /
    • pp.171-177
    • /
    • 2013
  • Due to the demand for high speed 3D graphic rendering, video file format conversion, compression, encryption and decryption technologies, the importance of digital signal processor system is growing rapidly. In order to satisfy the real-time constraints, high performance digital signal processor is required. Therefore, as in general purpose computer systems, digital signal processor should be designed as multicore architecture as well. Using UTDSP benchmarks as input, the trace-driven simulation has been performed and analyzed for the 2 to 16-core digital signal processor architectures with the cores from simple RISC to in-order and out-of-order superscalar processors for the various window sizes, extensively.

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal
    • /
    • v.43 no.5
    • /
    • pp.825-834
    • /
    • 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

Improving Multi-DNN Computational Performance of Embedded Multicore Processors through a Global Queue (글로벌 큐를 통한 임베디드 멀티코어 프로세서의 멀티 DNN 연산 성능 향상)

  • Cho, Ho-jin;Kim, Myung-sun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.6
    • /
    • pp.714-721
    • /
    • 2020
  • DNN is expanding its use in embedded systems such as robots and autonomous vehicles. For high recognition accuracy, computational complexity is greatly increased, and multiple DNNs are running aperiodically. Therefore, the ability processing multiple DNNs in embedded environments is a crucial issue. Accordingly, multicore based platforms are being released. However, most DNN models are operated in a batch process, and when multiple DNNs are operated in multicore together, the execution time deviation between each DNN may be large and the end-to-end execution time of the whole DNNs could be long depending on how they are allocated to the cores. In this paper, we solve these problems by providing a framework that decompose each DNN into individual layers and then distribute to multicores through a global queue. As a result of the experiment, the total DNN execution time was reduced by 31%, and when operating multiple identical DNNs, the deviation in execution time was reduced by up to 95.1%.

Analysis on the Thermal Efficiency of Branch Prediction Techniques in 3D Multicore Processors (3차원 구조 멀티코어 프로세서의 분기 예측 기법에 관한 온도 효율성 분석)

  • Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • The KIPS Transactions:PartA
    • /
    • v.19A no.2
    • /
    • pp.77-84
    • /
    • 2012
  • Speculative execution for improving instruction-level parallelism is widely used in high-performance processors. In the speculative execution technique, the most important factor is the accuracy of branch predictor. Unfortunately, complex branch predictors for improving the accuracy can cause serious thermal problems in 3D multicore processors. Thermal problems have negative impact on the processor performance. This paper analyzes two methods to solve the thermal problems in the branch predictor of 3D multi-core processors. First method is dynamic thermal management which turns off the execution of the branch predictor when the temperature of the branch predictor exceeds the threshold. Second method is thermal-aware branch predictor placement policy by considering each layer's temperature in 3D multi-core processors. According to our evaluation, the branch predictor placement policy shows that average temperature is $87.69^{\circ}C$, and average maximum temperature gradient is $11.17^{\circ}C$. And, dynamic thermal management shows that average temperature is $89.64^{\circ}C$ and average maximum temperature gradient is $17.62^{\circ}C$. Proposed branch predictor placement policy has superior thermal efficiency than the dynamic thermal management. In the perspective of performance, the proposed branch predictor placement policy degrades the performance by 3.61%, while the dynamic thermal management degrades the performance by 27.66%.

Multicore Flow Processor with Wire-Speed Flow Admission Control

  • Doo, Kyeong-Hwan;Yoon, Bin-Yeong;Lee, Bhum-Cheol;Lee, Soon-Seok;Han, Man Soo;Kim, Whan-Woo
    • ETRI Journal
    • /
    • v.34 no.6
    • /
    • pp.827-837
    • /
    • 2012
  • We propose a flow admission control (FAC) for setting up a wire-speed connection for new flows based on their negotiated bandwidth. It also terminates a flow that does not have a packet transmitted within a certain period determined by the users. The FAC can be used to provide a reliable transmission of user datagram and transmission control protocol applications. If the period of flows can be set to a short time period, we can monitor active flows that carry a packet over networks during the flow period. Such powerful flow management can also be applied to security systems to detect a denial-of-service attack. We implement a network processor called a flow management network processor (FMNP), which is the second generation of the device that supports FAC. It has forty reduced instruction set computer core processors optimized for packet processing. It is fabricated in 65-nm CMOS technology and has a 40-Gbps process performance. We prove that a flow router equipped with an FMNP is better than legacy systems in terms of throughput and packet loss.

A Performance Study of Embedded Multicore Processor Architectures (임베디드 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.1
    • /
    • pp.163-169
    • /
    • 2013
  • Recently, the importance of embedded system is growing rapidly. In-order to satisfy the real-time constraints of the system, high performance embedded processor is required. Therefore, as in general purpose computer systems, embedded processor should be designed as multicore architecture as well. Using MiBench benchmarks as input, the trace-driven simulation has been performed and analyzed for the 2-core to 16-core embedded processor architectures with different types of cores from simple RISC to in-order and out-of-order superscalar processors, extensively. As a result, the achievable performance is as high as 23 times over the single core embedded RISC processor.

An Efficient Cache Coherence Protocol for Multi-Core Processors with Ring Interconnects (링 연결구조 기반의 멀티코어 프로세서를 위한 캐시 일관성 유지 기법)

  • Park, Jin-Young;Choi, Lynn
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.8
    • /
    • pp.768-772
    • /
    • 2008
  • Today's microprocessor normally includes several processing cores to reduce the energy consumption without losing performance. In this paper, data transfer ordering mechanism can be efficiently used for cache coherence solution in unidirectional ring interconnect. RING-DATA ORDER combines the simplicity of GREEDY-ORDER and the performance of RING-ORDER. RING-DATA ORDER can be easily applicable to multicore processor with unidirectional ring interconnect.