• Title/Summary/Keyword: Processor Trace

Search Result 40, Processing Time 0.025 seconds

A Study on Statistical Simulation of Multicore Processor Architectures (멀티코어 프로세서의 통계적 모의실험에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.6
    • /
    • pp.259-265
    • /
    • 2014
  • When the trace-driven simulation is used for the performance analysis of widely used multicore processors in the initial design stage, much time and disk space is necessary. In this paper, statistical simulations are performed for a high performance multicore processor with various hardware configurations. For the experiment, SPEC2000 benchmarks programs are used for profiling and synthesizing new instruction traces. As a result, the performance obtained by our statistical simulation is comparable to that of the trace-driven simulation with the benefit of tremendous reduction in the simulation time.

A Study On Statistical Simulation for Asymmetric Multi-Core Processor Architectures (비대칭적 멀티코어 프로세서의 통계적 모의실험에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.16 no.2
    • /
    • pp.157-163
    • /
    • 2016
  • If trace-driven or execution-driven simulation is used for the performance analysis of asymmetric multi-core processors, excessive time and much disk space are necessary. In this paper, statistical simulations are performed for asymmetric multi-core processors with various hardware configurations. For the experiment, SPEC 2000 benchmark programs are used for profiling and synthesis, which is supplied as input for the simulation of asymmetric multi-core processors. As a result, the performance of asymmetric multi-core processor obtained by statistical simulation is comparable to that of the trace-driven simulation with a tremendous reduction in the simulation time.

DEVS 형식론을 이용한 다중프로세서 운영체제의 모델링 및 성능평가

  • 홍준성
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 1994.10a
    • /
    • pp.32-32
    • /
    • 1994
  • In this example, a message passing based multicomputer system with general interdonnedtion network is considered. After multicomputer systems are developed with morm-hole routing network, topologies of interconecting network are not major considertion for process management and resource sharing. Tehre is an independeent operating system kernel oneach node. It communicates with other kernels using message passingmechanism. Based on this architecture, the problem is how mech does performance degradation will occur in the case of processor sharing on multicomputer systems. Processor sharing between application programs is veryimprotant decision on system performance. In almost cases, application programs running on massively parallel computer systems are not so much user-interactive. Thus, the main performance index is system throughput. Each application program has various communication patterns. and the sharing of processors causes serious performance degradation in hte worst case such that one processor is shared by two processes and another processes are waiting the messages from those processes. As a result, considering this problem is improtant since it gives the reason whether the system allows processor sharingor not. Input data has many parameters in this simulation . It contains the number of threads per task , communication patterns between threads, data generation and also defects in random inupt data. Many parallel aplication programs has its specific communication patterns, and there are computation and communication phases. Therefore, this phase informatin cannot be obtained random input data. If we get trace data from some real applications. we can simulate the problem more realistic . On the other hand, simualtion results will be waseteful unless sufficient trace data with varisous communication patterns is gathered. In this project , random input data are used for simulation . Only controllable data are the number of threads of each task and mapping strategy. First, each task runs independently. After that , each task shres one and more processors with other tasks. As more processors are shared , there will be performance degradation . Form this degradation rate , we can know the overhead of processor sharing . Process scheduling policy can affects the results of simulation . For process scheduling, priority queue and FIFO queue are implemented to support round-robin scheduling and priority scheduling.

  • PDF

A Performance Study of Embedded Multicore Processor Architectures (임베디드 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.1
    • /
    • pp.163-169
    • /
    • 2013
  • Recently, the importance of embedded system is growing rapidly. In-order to satisfy the real-time constraints of the system, high performance embedded processor is required. Therefore, as in general purpose computer systems, embedded processor should be designed as multicore architecture as well. Using MiBench benchmarks as input, the trace-driven simulation has been performed and analyzed for the 2-core to 16-core embedded processor architectures with different types of cores from simple RISC to in-order and out-of-order superscalar processors, extensively. As a result, the achievable performance is as high as 23 times over the single core embedded RISC processor.

Performance Study of Multicore Digital Signal Processor Architectures (멀티코어 디지털 신호처리 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.4
    • /
    • pp.171-177
    • /
    • 2013
  • Due to the demand for high speed 3D graphic rendering, video file format conversion, compression, encryption and decryption technologies, the importance of digital signal processor system is growing rapidly. In order to satisfy the real-time constraints, high performance digital signal processor is required. Therefore, as in general purpose computer systems, digital signal processor should be designed as multicore architecture as well. Using UTDSP benchmarks as input, the trace-driven simulation has been performed and analyzed for the 2 to 16-core digital signal processor architectures with the cores from simple RISC to in-order and out-of-order superscalar processors for the various window sizes, extensively.

Performance Study of Multi-core In-Order Superscalar Processor Architecture (멀티코어 순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.5
    • /
    • pp.123-128
    • /
    • 2012
  • In order to overcome the hardware complexity and performance limit problems, recently the multi-core architecture has been prevalent. For hardware simplicity, usually RISC processor is adopted as the unit core processor. However, if the performance of unit core processor is enhanced, the overall performance of the multi-core processor architecture can be further enhanced. In this paper, in-order superscalar processor is utilized as the core for the multi-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the number of superscalar cores between 2 and 16 and the window size of 4 to 16 extensively. As a result, the 16-core superscalar processor for the window size of 16 results in 8.4 times speed up over the single core superscalar processor. When compared with the same number of cores, the multi-core superscalar processor performance doubles that of the multi-core RISC processor.

Performance Analysis of Multicore Out-of-Order Superscalar Processor with Multiple Basic Block Execution (다중블럭을 실행하는 멀티코어 비순차 수퍼스칼라 프로세서의 성능 분석)

  • Lee, Jong Bok
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.2
    • /
    • pp.198-205
    • /
    • 2013
  • In this paper, the performance of multicore processor architecture is analyzed which utilizes out-of-order superscalar processor core using multiple basic block execution. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the out-of-order superscalar processor with the window size from 32 to 64 and the number of cores between 1 and 16, exploiting multiple basic block execution from 1 to 4 extensively. As a result, the multicore out-of-order superscalar processor with 4 basic block execution achieves 22.0 % average performance increase over the same architecture with the single basic block execution.

The Analytic Performance Model of the Superscalar Processor Using Multiple Branch Prediction (독립시행의 정리를 이용하는 수퍼스칼라 프로세서의 다중 분기 예측 성능 모델)

  • 이종복
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.1009-1012
    • /
    • 1999
  • An analytical performance model that can predict the performance of a superscalar processor employing multiple branch prediction is introduced. The model is based on the conditional independence probability and the basic block size of instructions, with the degree of multiple branch prediction, the fetch rate, and the window size of a superscalar architecture. Trace driven simulation is performed for the subset of SPEC integer benchmarks, and the measured IPCs are compared with the results derived from the model. As the result, our analytic model could predict the performance of the superscalar processor using multiple branch prediction within 6.6 percent on the average.

  • PDF

A Performance Study of Multi-Core Processors with Perceptrons (퍼셉트론을 이용하는 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.63 no.12
    • /
    • pp.1704-1709
    • /
    • 2014
  • In order to increase the performance of multi-core system processor architectures, the multi-thread branch predictor which speculatively fetches and allocates threads to each core should be highly accurate. In this paper, the perceptron based multi-thread branch predictor is proposed for the multi-core processor architectures. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 2 to 16-core architectures employing perceptron multi-thread branch predictor extensively. Its performance is compared with the architecture which utilizes the two-level adaptive multi-thread branch predictor.

A Study on Implementation of Real-Time Multiprocess Trace Stream Decoder (실시간 다중 프로세스 트레이스 스트림 디코더 구현에 관한 연구)

  • Kim, Hyuncheol;Kim, Youngsoo;Kim, Jonghyun
    • Convergence Security Journal
    • /
    • v.18 no.5_1
    • /
    • pp.67-73
    • /
    • 2018
  • From a software engineering point of view, tracing is a special form of logging that records program execution information. Tracers using dedicated hardware are often used because of the characteristics of tracers that need to generate and decode huge amounts of data in real time. Intel(R) PT uses proprietary hardware to record all information about software execution on each hardware thread. When the software execution is completed, the PT can process the trace data of the software and reconstruct the correct program flow. The hardware trace program can be integrated into the operating system, but in the case of the window system, the integration is not tight due to problems such as the kernel opening. Also, it is possible to trace only a single process and not provide a way to trace multiple process streams. In this paper, we propose a method to extend existing PT trace program to support multi - process stream traceability in Windows environment in order to overcome these disadvantages.

  • PDF