• Title/Summary/Keyword: Multiple Execution

Performance Analysis of Multicore Out-of-Order Superscalar Processor with Multiple Basic Block Execution (다중블럭을 실행하는 멀티코어 비순차 수퍼스칼라 프로세서의 성능 분석)

  • Lee, Jong Bok
    • Journal of Korea Multimedia Society, v.16 no.2, pp.198-205, 2013
  • In this paper, we analyze the performance of a multicore processor architecture whose out-of-order superscalar cores execute multiple basic blocks. Using SPEC 2000 benchmarks as input, trace-driven simulation was performed for the out-of-order superscalar processor with window sizes from 32 to 64 and core counts from 1 to 16, while varying the number of basic blocks executed simultaneously from 1 to 4. As a result, the multicore out-of-order superscalar processor executing 4 basic blocks achieves a 22.0% average performance increase over the same architecture executing a single basic block.
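
The kind of parameter sweep this abstract describes can be pictured with a small trace-driven skeleton. This is only an illustrative sketch, not the authors' simulator; `simulate_trace` is a hypothetical stand-in for the cycle-accurate core model and returns a dummy IPC here so the sweep runs end to end.

```python
# Illustrative trace-driven parameter sweep: window size, core count, and
# basic blocks executed per cycle are varied, and IPC is averaged over traces.
from itertools import product

def simulate_trace(trace, window_size, num_cores, basic_blocks):
    # Hypothetical stand-in for the cycle-accurate core model;
    # returns a dummy IPC so the sweep below is runnable.
    return float(num_cores * basic_blocks)

def sweep(traces):
    results = {}
    for window, cores, blocks in product((32, 64), (1, 2, 4, 8, 16), (1, 2, 3, 4)):
        ipcs = [simulate_trace(t, window, cores, blocks) for t in traces]
        results[(window, cores, blocks)] = sum(ipcs) / len(ipcs)
    return results

print(sweep(["gzip.trace", "gcc.trace"])[(64, 16, 4)])   # 64.0 with the dummy model
```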

Efficient Exploring Multiple Execution Path for Dynamic Malware Analysis (악성코드 동적 분석을 위한 효율적인 다중실행경로 탐색방법)

  • Hwang, Ho;Moon, Daesung;Kim, Ikkun
    • Journal of the Korea Institute of Information Security & Cryptology, v.26 no.2, pp.377-386, 2016
  • As the number of malware samples has increased, it has become necessary to analyze malware rapidly to respond to cyber attacks. Dynamic malware analysis has been widely studied to overcome the limitations of static analysis, such as packing and obfuscation, but it still suffers from the problem of exploring multiple execution paths. Previous approaches to exploring multiple execution paths require long analysis times and considerable resources for preparing the analysis environment. In this paper, we propose an efficient approach that explores multiple execution paths within a single analysis environment by pipelining the analysis processes, and we show through experiments that it improves speed by 29% on 2 cores and by 70% on 4 cores.
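
A rough sketch of multiple-path exploration is shown below with hypothetical names; the paper's pipelined, single-environment design is not reproduced, only the idea of keeping a work queue of unexplored branch decisions that worker cores could drain in parallel.

```python
# Sketch of multiple-execution-path exploration: keep a queue of branch
# prefixes still to explore; each step runs the sample up to the next
# conditional branch and enqueues both outcomes. `run_until_branch` is a
# hypothetical stand-in for the sandboxed execution step.
from collections import deque

def run_until_branch(branch_prefix):
    # Toy stand-in: pretend the sample hits another branch until depth 3.
    return len(branch_prefix) < 3

def explore_paths(max_depth=3):
    explored, pending = [], deque([()])         # start from the empty prefix
    while pending:
        prefix = pending.popleft()
        if run_until_branch(prefix) and len(prefix) < max_depth:
            pending.append(prefix + (False,))   # branch not taken
            pending.append(prefix + (True,))    # branch taken
        else:
            explored.append(prefix)
    return explored

print(len(explore_paths()))   # 8 complete paths for 3 branch levels
```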

A Simulation Method For Virtual Situations Through Seamless Integration Of Independent Events Via Autonomous And Independent Agents

  • Park, Jong Hee;Choi, Jun Seong
    • International Journal of Contents, v.14 no.3, pp.7-16, 2018
  • The extent and depth of the event plan determine the scope of pedagogical experience in situations and, consequently, the quality of immersive learning based on our simulated world. In contrast to planning in conventional narrative-based systems, which mainly pursues dramatic interest, planning in virtual-world-based pedagogical systems strives to provide realistic experiences in immersed situations. Instead of a story plot comprising predetermined situations, our inter-event planning method aims at simulating diverse situations, each involving multiple events coupled via their associated agents' conditions and meaningful associations between events occurring in a background world. The specific techniques that realize our planning method include: two-phase planning based on inter-event search and intra-event decomposition (down to the animated action level); autonomous and independent agents that behave proactively with their own beliefs and planning capability; a full-blown background world used as the comprehensive stage for all events; coupling of events via realistic association types, including deontic associations as well as conventional causality; separation of agents from event roles; temporal scheduling; and a parallel and concurrent event progression mechanism. Combining all these techniques, diverse exogenous events can be derived and seamlessly (i.e., semantically meaningfully) integrated with the original event to form a wide scope of situations providing abundant pedagogical experiences. For effective plan execution, we devise an execution scheme based on multiple priority queues, particularly to realize the concurrent progression of many simultaneous events, mirroring the corresponding reality. Specific execution mechanisms include modeling an action in terms of its component motions, adjusting an agent's priority across different events, and a concurrent and parallel execution method for multiple actions and its extension to multiple events.
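
The multiple-priority-queue execution scheme can be sketched roughly as follows; the event and action names are invented, and the real system's motion-level modeling and cross-event priority adjustment are omitted.

```python
# Sketch of concurrent event progression with one priority queue per event:
# each scheduler tick pops the most urgent pending action of every active
# event, so simultaneous events advance together.
import heapq

class Event:
    def __init__(self, name):
        self.name, self.queue = name, []          # heap of (priority, action)

    def schedule(self, priority, action):
        heapq.heappush(self.queue, (priority, action))

def tick(events):
    for event in events:
        if event.queue:
            priority, action = heapq.heappop(event.queue)
            print(f"[{event.name}] {action} (priority {priority})")

lecture, rainstorm = Event("lecture"), Event("rainstorm")
lecture.schedule(1, "teacher begins speaking")
rainstorm.schedule(0, "clouds gather")
tick([lecture, rainstorm])    # both events progress in the same simulation step
```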

Improving Multi-DNN Computational Performance of Embedded Multicore Processors through a Global Queue (글로벌 큐를 통한 임베디드 멀티코어 프로세서의 멀티 DNN 연산 성능 향상)

  • Cho, Ho-jin;Kim, Myung-sun
    • Journal of the Korea Institute of Information and Communication Engineering, v.24 no.6, pp.714-721, 2020
  • DNNs are increasingly used in embedded systems such as robots and autonomous vehicles. To achieve high recognition accuracy, their computational complexity has grown greatly, and multiple DNNs often run aperiodically. The ability to process multiple DNNs in embedded environments is therefore a crucial issue, and multicore-based platforms are being released accordingly. However, most DNN models are operated in a batch manner, and when multiple DNNs run together on a multicore processor, the execution-time deviation between the DNNs may be large and the end-to-end execution time of all the DNNs can become long depending on how they are allocated to the cores. In this paper, we solve these problems with a framework that decomposes each DNN into individual layers and then distributes them to the cores through a global queue. In our experiments, the total DNN execution time was reduced by 31%, and when running multiple identical DNNs, the deviation in execution time was reduced by up to 95.1%.
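
A minimal sketch of the global-queue idea is given below with made-up layer tasks; the real framework's handling of inter-layer dependencies and its scheduling details are not reproduced.

```python
# Sketch of layer-level distribution through one global queue: every core runs
# a worker that pulls the next available layer task, keeping the cores balanced.
# Inter-layer ordering constraints are omitted for brevity.
import queue
import threading

def worker(global_q, results):
    while True:
        item = global_q.get()
        if item is None:                      # sentinel: no more layer tasks
            break
        dnn_id, layer_idx, layer_fn = item
        results.append((dnn_id, layer_idx, layer_fn()))

global_q, results = queue.Queue(), []
for dnn_id in range(2):                       # two toy "DNNs"
    for layer_idx in range(3):                # three "layers" each
        global_q.put((dnn_id, layer_idx, lambda i=layer_idx: f"layer-{i} done"))

threads = [threading.Thread(target=worker, args=(global_q, results)) for _ in range(4)]
for t in threads:
    t.start()
for _ in threads:
    global_q.put(None)
for t in threads:
    t.join()
print(len(results))                           # 6 layer tasks completed
```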

KAWS: Coordinate Kernel-Aware Warp Scheduling and Warp Sharing Mechanism for Advanced GPUs

  • Vo, Viet Tan;Kim, Cheol Hong
    • Journal of Information Processing Systems, v.17 no.6, pp.1157-1169, 2021
  • Modern graphics processing unit (GPU) architectures offer significant hardware resource enhancements for parallel computing. However, without software optimization, GPUs continuously exhibit hardware resource underutilization. In this paper, we show the need to alter the warp scheduling scheme during different kernel execution periods to improve resource utilization. Existing warp schedulers are unaware of kernel progress and therefore cannot provide an effective scheduling policy. In addition, we identify the potential for improving resource utilization in multiple-warp-scheduler GPUs by sharing stalling warps with selected warp schedulers. To address these efficiency issues, we propose coordinated kernel-aware warp scheduling and a warp sharing mechanism (KAWS). The proposed warp scheduler tracks the execution progress of the running kernel and switches to a more effective scheduling policy when the kernel progress reaches a point of resource underutilization. Meanwhile, the warp sharing mechanism distributes stalling warps to other warp schedulers whose execution pipeline units are ready. Our design achieves performance that is, on average, 7.97% higher than that of the traditional warp scheduler while incurring only marginal additional hardware overhead.
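
The two mechanisms can be pictured with the toy sketch below; the policy names, the 80% progress threshold, and the data structures are assumptions for illustration, not KAWS itself.

```python
# Toy illustration of (1) switching the warp scheduling policy once kernel
# progress passes a threshold, and (2) handing a stalling warp to another
# scheduler whose execution pipeline is ready.
def pick_policy(issued_blocks, total_blocks, threshold=0.8):
    # Greedy-then-oldest is just an illustrative pair of policies.
    return "greedy" if issued_blocks / total_blocks < threshold else "oldest-first"

def share_stalled_warp(warp, schedulers):
    for sched in schedulers:
        if sched["pipeline_ready"]:
            sched["warps"].append(warp)      # the stalled warp migrates here
            return sched["id"]
    return None                              # nobody can take it; warp keeps waiting

schedulers = [{"id": 0, "pipeline_ready": False, "warps": []},
              {"id": 1, "pipeline_ready": True, "warps": []}]
print(pick_policy(issued_blocks=90, total_blocks=100))    # -> "oldest-first"
print(share_stalled_warp("warp-17", schedulers))          # -> 1
```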

A New Minimizing Algorithm for Design the PLA of Multiple Output Combinational Circuits (다출력조합회로의 PLA설계를 위한 간소화 알고리즘)

  • Lee, Sung Woo;Hwang, Ho Jung
    • Journal of the Korean Institute of Telematics and Electronics, v.23 no.3, pp.357-363, 1986
  • In the design of PLAs for VLSI, as the number of subsets of output functions from which common prime implicants must be determined increases, the execution time grows on the order of O(2^N). When the number of functions N is large, this poses a serious problem in the minimization of multiple-output logic functions. In this paper, a new algorithm that minimizes multiple-output logic functions is proposed. The algorithm requires fewer Fortran statements, less execution time, and less memory than existing methods. The basis of the algorithm is explained and verified, and the sequential procedure for preparing the program is discussed.
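
The exponential growth mentioned above is easy to see with a small count of the output-function subsets that a classical multiple-output minimization has to consider:

```python
# The number of non-empty subsets of N output functions is 2**N - 1, which is
# why methods that examine every subset for shared prime implicants blow up.
from itertools import combinations

def output_subsets(n_outputs):
    funcs = range(n_outputs)
    return [s for r in range(1, n_outputs + 1) for s in combinations(funcs, r)]

print(len(output_subsets(4)))                         # 15, matching 2**4 - 1
for n in (3, 10, 20):
    print(f"N={n}: {2 ** n - 1} candidate subsets")   # 7, 1023, 1048575
```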

GAGPC : An Algorithm to Optimize Multiple Continuous Queries on Data Streams (GAGPC : 데이타 스트림에 대한 다중 연속 질의의 최적화 알고리즘)

  • Suh Young-Kyoon;Son Jin-Hyun;Kim Myoung-Ho
    • Journal of KIISE:Databases, v.33 no.4, pp.409-422, 2006
  • In general, there can be many reusable intermediate results due to the overlapped windows and periodic execution intervals among multiple continuous queries (MCQ) on data streams. In this regard, we propose an efficient greedy algorithm for global query plan construction, called GAGPC. GAGPC first decides an execution cycle and finds the maximal set(s) of related execution points (SRP). Next, GAGPC constructs a global execution plan that makes MCQ share the common join fragments with the highest benefit in each SRP. The algorithm suggests that the best plan for the same continuous queries may differ depending not only on the existence of common expressions but also on the size of the overlapped windows related to them. Unlike previous work, it also reuses partial as well as whole intermediate results. Finally, we show experimental results that validate GAGPC.
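
The window-overlap sharing that GAGPC exploits can be illustrated with a toy computation; the window values are invented, and the real algorithm's benefit model and SRP construction are not shown.

```python
# Two continuous queries over the same join with overlapping windows can share
# the join fragment computed on the overlapping interval; each query then only
# adds its non-overlapping remainder.
def window_overlap(w1, w2):
    """Windows are (start, end) offsets in seconds relative to 'now'."""
    start, end = max(w1[0], w2[0]), min(w1[1], w2[1])
    return (start, end) if start < end else None

q1_window, q2_window = (0, 60), (30, 120)   # e.g. a 60 s window vs. a 90 s window
shared = window_overlap(q1_window, q2_window)
print("shared join fragment covers", shared)            # (30, 60)
print("q1 computes only", (q1_window[0], shared[0]))    # (0, 30)
print("q2 computes only", (shared[1], q2_window[1]))    # (60, 120)
```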

A Data-Driven Query Processing Method for Stream Data (스트림 데이터를 위한 데이터 구동형 질의처리 기법)

  • Min, Mee-Kyung
    • Journal of Digital Contents Society, v.8 no.4, pp.541-546, 2007
  • The traditional query processing method is not efficient for continuous queries over large, continuously arriving stream data. This paper proposes a data-driven query processing method for stream data. The structure of the query plan and the query execution method are presented. With the proposed method, multiple queries can be processed and results can be shared among queries. In addition, query execution time can be reduced by storing partial results of query execution. The paper presents an example of query processing with XML data and an XQuery query.
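
A rough sketch of the data-driven (push-based) style is given below; the operator classes are illustrative and not taken from the paper, but they show arriving items driving execution and a stored partial result (a running count) being reused across items and shared between two queries.

```python
# Each arriving stream item is pushed through the operators of the plan;
# a shared operator does its work only once per item for all queries.
class FilterOp:
    def __init__(self, predicate, downstream):
        self.predicate, self.downstream = predicate, downstream

    def push(self, item):
        if self.predicate(item):
            for op in self.downstream:
                op.push(item)

class CountOp:
    def __init__(self):
        self.count = 0            # stored partial result, reused between items

    def push(self, item):
        self.count += 1

# Two queries share the same filter operator; its output drives both counters.
q1_count, q2_count = CountOp(), CountOp()
shared_filter = FilterOp(lambda x: x > 10, [q1_count, q2_count])
for value in [5, 12, 42, 7, 30]:
    shared_filter.push(value)            # data arrival drives execution
print(q1_count.count, q2_count.count)    # 3 3
```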

Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

  • Oh, Jaeg-Eun;Hwang, Seok-Joong;Nguyen, Huong Giang;Kim, A-Reum;Kim, Seon-Wook;Kim, Chul-Woo;Kim, Jong-Kook
    • ETRI Journal, v.30 no.4, pp.576-586, 2008
  • In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. For architecture verification, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks that are automatically parallelized by the Intel compiler.
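
A toy software analogy of lockstep execution is sketched below (this is not the MLEP hardware): one shared instruction stream is fetched and decoded once per step, and each "thread" applies the decoded instruction to its own data, like iterations of a parallel loop.

```python
# One shared instruction stream; per-thread data; every step applies the same
# decoded instruction to all threads' data in lockstep.
program = [("mul", 2), ("add", 3)]          # shared instruction stream

def execute(op, operand, value):
    return value * operand if op == "mul" else value + operand

thread_data = [1, 2, 3, 4]                  # four hardware threads, different data
for op, operand in program:                 # fetched and decoded once per step
    thread_data = [execute(op, operand, v) for v in thread_data]
print(thread_data)                          # [5, 7, 9, 11]
```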

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal, v.43 no.5, pp.825-834, 2021
  • Owing to its simplicity and user-friendliness, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that uses the discontinuous incremental checkpointing technique (DICKPT) to automatically translate and execute OpenMP programs on distributed-memory architectures. Currently, CAPE implements the OpenMP execution model by using DICKPT to distribute parallel jobs and their data to slave machines and then collecting the results after these distributed jobs have executed. Although this model has proven effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that uses two levels of parallelism: we add another level of parallelism in the form of multithreaded processes on the slave machines, with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model significantly enhances CAPE performance.
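
The two-level structure can be sketched with standard Python workers, purely as an analogy and with invented names; CAPE's actual checkpoint-based distribution is not reproduced.

```python
# Outer level mimics one worker process per slave machine; inner level splits
# that machine's chunk across threads so all of its cores are used.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def node_worker(chunk, threads_per_node=4):
    """Level 2: inside one 'slave machine', split the chunk across threads."""
    def cell(x):
        return x * x                       # toy loop body
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        return list(pool.map(cell, chunk))

def run(data, num_nodes=2):
    """Level 1: distribute chunks to worker processes, one per node."""
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    with ProcessPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(node_worker, chunks))

if __name__ == "__main__":                 # required for ProcessPoolExecutor
    print(run(list(range(8))))             # [[0, 4, 16, 36], [1, 9, 25, 49]]
```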