• Title/Summary/Keyword: Execution-Driven Simulation

Search Result 29, Processing Time 0.021 seconds

Construction of a Compiled-code Simulator Generation System for Efficient Design Exploration in Embedded Core Design (임베디드 코어 설계시 효율적인 설계 공간 탐색을 위한 컴파일드 코드 방식 시뮬레이터 생성 시스템 구축)

  • Kim, Sang-Woo;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.71-79
    • /
    • 2011
  • This paper proposes a compiled-code simulator generation system based-on machine description language for efficient design space exploration in designing an embedded system optimized for a specific application. The proposed system generates a compiled-code simulator which maintains the functional accuracy of an event-driven simulator by determining instruction fetch and decoding processes statically. Generated simulator takes instruction-level and cycle-level simulation for estimating performances in embedded core. To show the efficiency of the constructed compiled-code simulator generator, architecture exploration had been performed for the JPEG encoder application. Starting with MIPS R3000 processor for one embedded core, the proposed system can produce the core showing optimized execution time for the application programming. In this process, a huge amount of simulation time has been used. Cycle-level compiled-code simulator has the functional accuracy and shows performance improvement by 21.7% in terms of simulation speed on the average when compared with an event-driven simulator.

Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

  • Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.12
    • /
    • pp.439-446
    • /
    • 2016
  • In this paper, an efficient prediction accuracy enhancement strategy is proposed for improving the performance of the prediction-based parallel event-driven gate-level timing simulation. The proposed new strategy adopts the static double prediction and the dynamic prediction for input and output values of local simulations. The double prediction utilizes another static prediction data for the secondary prediction once the first prediction fails, and the dynamic prediction tries to use the on-going simulation result accumulated dynamically during the actual parallel simulation execution as prediction data. Therefore, the communication overhead and synchronization overhead, which are the main bottleneck of parallel simulation, are maximally reduced. Throughout the proposed two prediction enhancement techniques, we have observed about 5x simulation performance improvement over the commercial parallel multi-core simulation for six test designs.

New Control Scheme for the Wind-Driven Doubly Fed Induction Generator under Normal and Abnormal Grid Voltage Conditions

  • Ebrahim, Osama S.;Jain, Praveen K.;Nishith, Goel
    • Journal of Power Electronics
    • /
    • v.8 no.1
    • /
    • pp.10-22
    • /
    • 2008
  • The wind-driven doubly fed induction generator (DFIG) is currently under pressure to be more grid-compatible. The main concern is the fault ride-through (FRT) requirement to keep the generator connected to the grid during faults. In response to this, the paper introduces a novel model and new control scheme for the DFIG. The model provides a means of direct stator power control and considers the stator transients. On the basis of the derived model, a robust linear quadratic (LQ) controller is synthesized. The control law has proportional and integral actions and takes account of one sample delay in the input owing to the microprocessor's execution time. Further, the influence of the grid voltage imperfection is mitigated using frequency shaped cost functional method. Compensation of the rotor current pulsations is proposed to improve the FRT capability as well as the generator performance under grid voltage unbalance. As a consequence, the control system can achieve i) fast direct power control without instability risk, ii) alleviation of the problems associated with the DFIG operation under unbalanced grid voltage, and iii) high probability of successful grid FRT. The effectiveness of the proposed solution is confirmed through simulation studies on 2MW DFIG.

An On-chip Cache and Main Memory Compression System Optimized by Considering the Compression rate Distribution of Compressed Blocks (압축블록의 압축률 분포를 고려해 설계한 내장캐시 및 주 메모리 압축시스템)

  • Yim, Keun-Soo;Lee, Jang-Soo;Hong, In-Pyo;Kim, Ji-Hong;Kim, Shin-Dug;Lee, Yong-Surk;Koh, Kern
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.1_2
    • /
    • pp.125-134
    • /
    • 2004
  • Recently, an on-chip compressed cache system was presented to alleviate the processor-memory Performance gap by reducing on-chip cache miss rate and expanding memory bandwidth. This research Presents an extended on-chip compressed cache system which also significantly expands main memory capacity. Several techniques are attempted to expand main memory capacity, on-chip cache capacity, and memory bandwidth as well as reduce decompression time and metadata size. To evaluate the performance of our proposed system over existing systems, we use execution-driven simulation method by modifying a superscalar microprocessor simulator. Our experimental methodology has higher accuracy than previous trace-driven simulation method. The simulation results show that our proposed system reduces execution time by 4-23% compared with conventional memory system without considering the benefits obtained from main memory expansion. The expansion rates of data and code areas of main memory are 57-120% and 27-36%, respectively.

Design and Implementation of Real-Time Parallel Engine for Discrete Event Wargame Simulation (이산사건 워게임 시뮬레이션을 위한 실시간 병렬 엔진의 설계 및 구현)

  • Kim, Jin-Soo;Kim, Dae-Seog;Kim, Jung-Guk;Ryu, Keun-Ho
    • The KIPS Transactions:PartA
    • /
    • v.10A no.2
    • /
    • pp.111-122
    • /
    • 2003
  • Military wargame simulation models must support the HLA in order to facilitate interoperability with other simulations, and using parallel simulation engines offer efficiency in reducing system overhead generated by propelling interoperability. However, legacy military simulation model engines process events using sequential event-driven method. This is due to problems generated by parallel processing such as synchronous reference to global data domains. Additionally. using legacy simulation platforms result in insufficient utilization of multiple CPUs even if a multiple CPU system is under use. Therefore, in this paper, we propose conversing the simulation engine to an object model-based parallel simulation engine to ensure military wargame model's improved system processing capability, synchronous reference to global data domains, external simulation time processing, and the sequence of parallel-processed events during a crash recovery. The converted parallel simulation engine is designed and implemented to enable parallel execution on a multiple CPU system (SMP).

Performance Analysis of PC Cluster-based CC-NUMA System using Execution-driven Simulation (실행주도 시뮬레이션에 의한 PC 클러스터 기반 CC-NUMA 시스템 성능분석)

  • Ha, Chi-Jeong;Jeong, Sang-Hwa;O, Su-Cheol
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.4
    • /
    • pp.188-195
    • /
    • 2001
  • 본 논문에서는 PC 클러스터 기반 CC-NUMA 시스템을 제안하고, 시뮬레이션을 통하여 성능을 분석하였다. PC 클러스터 기반 CC-NUMA 시스템은 PC의 PCI slot에 CC-NUMA 카드를 장착함으로써 구현되며 공유메모리, 네트워크 캐쉬, 네트워크 제어 모듈을 포함한다. CC-NUMA 시스템은 PCI 버스상에 존재하는 메모리를 공유대상으로 하며, 공유메모리와 네트워크 캐쉬사이의 일관성은 IEEE SCI 표준에 의해 유지된다. CC-NUMA 시스템을 시뮬레이션 하기 위해 실행주도 시뮬레이터인 Limes를 수정하여 사용하였으며, 캐쉬 일관성 유지 알고리즘으로 SCI의 typical set을 구현하였다. 또한 기존 시스템과의 비교를 위해서 네트워크 캐쉬를 활용하지 않는 Dolphin사의 PCI-SCI 카드에 기반한 NUMA 시스템을 시뮬레이션 하였다. CC-NUMA 시스템의 성능을 측정하기 위하여 다양한 실험을 수행하였으며, 실험결과 CC-NUMA 시스템이 NUMA 시스템에 비해서 성능향상이 우수함을 알 수 있었다. 또한, CC-NUMA 시스템이 최적의 성능을 발휘하는 파라미터의 값을 도출하였으며, 이를 CC-NUMA 시스템의 실제 구현에 반영하였다.

  • PDF

Analysis of Barrier Waiting Times in Data Parallel Programs (데이터 병렬 프로그램에서 배리어 대기시간의 분석)

  • Jung, In-Bum
    • Journal of Industrial Technology
    • /
    • v.21 no.A
    • /
    • pp.73-80
    • /
    • 2001
  • Barrier is widely used for synchronization in parallel programs. Since the process arrived earlier than others should wait at the barrier, the total processor utilization decreases. In this paper, to find the sources of the barrier waiting time, parallel programs are executed on the various grain sizes through execution-driven simulations. In simulation studies, we found that even if approximately equal amounts of work are distributed to each processor, all processes may not arrive at a barrier at the same time. The reasons are that the different numbers of cache misses and instructions within partitioned grains result in the difference in arrival time of processors at the barrier.

  • PDF

Memory Behavior in Scientific vs. Commercial Applications

  • Kim, Taegyoun;Heejung Wang;Lee, Kangwoo
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.421-425
    • /
    • 1999
  • As the market size of multiprocessor systems for commercial applications, parallel systems, especially cache-coherent shared-memory multiprocessors that are conventionally designed for scientific applications need to be tuned in different fashion to achieve the best performance for new application area. In this paper, indepth investigation on the memory behavior which is the primary cause for performance changes were made. We chose representative benchmarks in scientific and commercial application areas. After running execution-driven simulation for bus-based cache-coherent shared-memory multiprocessors, we experienced significant differences and conclude that the systems must be carefully and differently designed to achieve the best performance when they are built for distinct applications.

  • PDF

Reduction of Read Access Latency by Invalid Hint in Directory-Based Cache Coherence Scheme (디렉토리를 이용한 캐쉬 일관성 유지 기법에서 무효화 힌트를 이용한 읽기 접근 시간 감소)

  • Oh, Seung-Taek;Rhee, Yun-Seok;Maeng, Seung-Ryoul;Lee, Joon-Won
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.4
    • /
    • pp.408-415
    • /
    • 2000
  • Large scale shared memory multiprocessors have suffered from large access latency to shared memory. The large latency partially stems from a feature of directory-based cache coherence schemes which require a shared memory access to be serviced at a home node of the memory block. The home visit results in three or more hops traversal for a memory read access. The traversal becomes much longer as a system scales up. In this paper, we propose a new cache coherence scheme that reduces read access latency. The proposed scheme exploits ideas of invalid hint. Invalid hint for a cache block means which node has invalidated the cache block before. Thus a read access request can be directly sent to and serviced by the node (called owner) without help of a home node. Execution-driven simulation is employed to evaluate performance of the proposed scheme. The simulation results show that read access latency and execution time are reduced.

  • PDF

Page replication mechanism using adjustable DELAY counter in NUMA multiprocessors (NUMA 다중처리기에서 조정가능한 지연 카운터를 이용한 페이집 복사 기법)

  • 이종우;조유곤
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.6
    • /
    • pp.23-33
    • /
    • 1996
  • The exploitation of locality of reference in shared memory NUMA multiprocessors is one of the improtant problems in parallel processing today. In this paper, we propose a revised hardeare reference counter to help operating system to manage locality. In contrast to the previous one, the value of counter can abe adjusted dynamically and periodically to adapt the page replication policy to the various memory reference patterns of processors. We use execution-driven simulation of real applications to evaluate the effectiveness of our adjustable DELAY counter. Our main conclusijon is that by using the adjustable DELAY counter the t normalized average memory access costs and the variance of them become smaller for most applications than the previous one and more robust memory management policies can be provided for the operating systems.

  • PDF