• Title/Summary/Keyword: ILP

Search Result 99, Processing Time 0.023 seconds

Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

  • Oh, Jaeg-Eun;Hwang, Seok-Joong;Nguyen, Huong Giang;Kim, A-Reum;Kim, Seon-Wook;Kim, Chul-Woo;Kim, Jong-Kook
    • ETRI Journal
    • /
    • v.30 no.4
    • /
    • pp.576-586
    • /
    • 2008
  • In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and can save more power and chip area than the SMT/CMP approach without significant performance degradation. For the architecture verification, we extend a commercial 32-bit embedded core AE32000C and synthesize it on Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP in EEMBC benchmarks which are automatically parallelized by the Intel compiler.

  • PDF

Implementing Swing Modulo Scheduler for VLIW Processor (VLIW 프로세서를 위한 Swing Modulo Scheduler 구현)

  • Shin, Jangseop;Han, Sangjun;Jung, Hyungyun;Ahn, Minwook;Youn, Jonghee M.;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.04a
    • /
    • pp.12-14
    • /
    • 2014
  • 하드웨어가 해저드(hazard) 검출을 지원하지 않는 멀티이슈 VLIW 프로세서의 성능을 높이기 위해서는 컴파일러가 명령어 의존성과 하드웨어 자원의 제약을 지키는 범위 안에서 최대한 명령어수준 병렬성(ILP)을 활용하는 것이 중요하다. 기본 블록(basic block) 스케쥴링은 Branch 등 제어 흐름(control flow)의 경계를 넘어선 스케쥴링을 행하지 않아 그 효과가 제한적이다. 소프트웨어 파이프라이닝(software pipelining)은 루프(loop)의 경계를 허물어 여러반본(iteration)의 명령어가 동시에 수행되도록 하는 것으로 모듈로 스케쥴링(modulo scheduling)은 그 중에 한 범주의 스케쥴링 기법들을 일컫는다. 본 연구에서는 그 중 한가지인 스윙 모듈로 스케쥴러(swing modulo scheduler)[1]를 구현하여 그 효과를 알아보고자 한다.

IRSML: An intelligent routing algorithm based on machine learning in software defined wireless networking

  • Duong, Thuy-Van T.;Binh, Le Huu
    • ETRI Journal
    • /
    • v.44 no.5
    • /
    • pp.733-745
    • /
    • 2022
  • In software-defined wireless networking (SDWN), the optimal routing technique is one of the effective solutions to improve its performance. This routing technique is done by many different methods, with the most common using integer linear programming problem (ILP), building optimal routing metrics. These methods often only focus on one routing objective, such as minimizing the packet blocking probability, minimizing end-to-end delay (EED), and maximizing network throughput. It is difficult to consider multiple objectives concurrently in a routing algorithm. In this paper, we investigate the application of machine learning to control routing in the SDWN. An intelligent routing algorithm is then proposed based on the machine learning to improve the network performance. The proposed algorithm can optimize multiple routing objectives. Our idea is to combine supervised learning (SL) and reinforcement learning (RL) methods to discover new routes. The SL is used to predict the performance metrics of the links, including EED quality of transmission (QoT), and packet blocking probability (PBP). The routing is done by the RL method. We use the Q-value in the fundamental equation of the RL to store the PBP, which is used for the aim of route selection. Concurrently, the learning rate coefficient is flexibly changed to determine the constraints of routing during learning. These constraints include QoT and EED. Our performance evaluations based on OMNeT++ have shown that the proposed algorithm has significantly improved the network performance in terms of the QoT, EED, packet delivery ratio, and network throughput compared with other well-known routing algorithms.

Bus and Registor Optimization in Datapath Synthesis (데이터패스 합성에서의 버스와 레지스터의 최적화 기법)

  • Sin, Gwan-Ho;Lee, Geun-Man
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.8
    • /
    • pp.2196-2203
    • /
    • 1999
  • This paper describes the bus scheduling problem and register optimization method in datapath synthesis. Scheduling is process of operation allocation to control steps in order to minimize the cost function under the given circumstances. For that purpose, we propose some formulations to minimize the cost function for bus assignment to get an optimal and minimal cost function in hardware allocations. Especially, bus and register minimization technique are fully considered which are the essential topics in hardware allocation. Register scheduling is done after the operation and bus scheduling. Experiments are done with the DFG model of fifth-order digital ware filter to show its effectiveness. Structural integer programming formulations are used to solve the scheduling problems in order to get the optimal scheduling results in the integer linear programming environment.

  • PDF

Measurement and Analysis of Power Dissipation of Value Speculation in Superscalar Processors (슈퍼스칼라 프로세서에서 값 예측을 이용한 모험적 실행의 전력소모 측정 및 분석)

  • 이상정;이명근;신화정
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.12
    • /
    • pp.724-735
    • /
    • 2003
  • In recent high-performance superscalar processors, the result value of an instruction is predicted to improve instruction-level parallelism by breaking data dependencies. Using those predicted values, instructions are speculatively executed and substantial performance can be gained. It, however, requires additional power consumption due to the frequent access and update of the value prediction table. In this paper, first, the trade-off between the performance improvement and the increased power consumption for value prediction is measured and analyzed. And, in order to reduce additional power consumption without performance loss, the technique of controlling speculative execution with confidence counter and predicting useful instructions is developed. Also, in order to prove the validity, a tool is developed that can simulate processor behavior at cycle-level and measure total energy consumption and power consumption per cycle.

A Novel Shared Segment Protection Algorithm for Multicast Sessions in Mesh WDM Networks

  • Lu, Cai;Luo, Hongbin;Wang, Sheng;Li, Lemin
    • ETRI Journal
    • /
    • v.28 no.3
    • /
    • pp.329-336
    • /
    • 2006
  • This paper investigates the problem of protecting multicast sessions in mesh wavelength-division multiplexing (WDM) networks against single link failures, for example, a fiber cut in optical networks. First, we study the two characteristics of multicast sessions in mesh WDM networks with sparse light splitter configuration. Traditionally, a multicast tree does not contain any circles, and the first characteristic is that a multicast tree has better performance if it contains some circles. Note that a multicast tree has several branches. If a path is added between the leave nodes on different branches, the segment between them on the multicast tree is protected. Based the two characteristics, the survivable multicast sessions routing problem is formulated into an Integer Linear Programming (ILP). Then, a heuristic algorithm, named the adaptive shared segment protection (ASSP) algorithm, is proposed for multicast sessions. The ASSP algorithm need not previously identify the segments for a multicast tree. The segments are determined during the algorithm process. Comparisons are made between the ASSP and two other reported schemes, link disjoint trees (LDT) and shared disjoint paths (SDP), in terms of blocking probability and resource cost on CERNET and USNET topologies. Simulations show that the ASSP algorithm has better performance than other existing schemes.

  • PDF

A Study on the Preliminary 3-D Structure Model around East Sea and Its Vicinity

  • 조봉곤;이우동;황의홍
    • Proceedings of the International Union of Geodesy And Geophysics Korea Journal of Geophysical Research Conference
    • /
    • 2003.05a
    • /
    • pp.16-16
    • /
    • 2003
  • 본 연구는 ILP(International Lithosphere Project) Task Group II-4가 진행하고 있는 상부맨틀에 대한 3차원 구조도 작성 연구의 일환으로 수행되어졌으며 구조도 작성을 위한 데이터 베이스의 구조는 task group의 표준안을 따랐다. 기존 문헌과 기존의 데이터 베이스를 통해서 획득된 자료를 이용해 동해와 그 주변을 대상으로 하는 지역의 ($32-45^{\circ}$E, $122-148^{\circ}$N) 상부 670km까지의 3차원 구조도 작성을 위한 초기 모델을 구축하였으며, 이 절차를 최대한 자동화하는 프로그램을 포트란을 이용해 만들어보았다. 연구 지역에 대한 곡율을 계산하기 위해 표준타원체 모델인 WGS84과 geoid undulation 모델인 EGM96을 사용했으며 지형 고도 자료는 GTOPO30, GLOBE 1.0, 그리고 Smith and Sandwell 데이터베이스를 사용하여 지구 중심으로부터 지표까지의 거리를 구하였다. 연구지역은 $0.25^{\circ}$간격으로 나누었으며 총 5777개의 격자점을 정의하였으며 각각의 격자점에 1차원 수직구조를 부여함으로써 3차원 모델을 구축하였다. 그리고 지형적으로나 지질학적으로 유사한 지역을 하나의 구역으로 정의하고 동일한 수직구조를 부여함으로써 모든 격자점에 1차원 수직구조를 정의하지 않도록 하였다. 본 연구에서는 지표 지질은 모델에 고려하지 않았지만 지형학적으로 의미가 있는 분지나 수평적으로 불균질성이 뚜렷한 지역을 중심으로 연구 지역의 리젼을 정의하였다. 중요 리젼에 대한 지각구조에 대해서는 기존의 문헌을 통해 모델치를 정의하였으며 지각 하부부터 상부 670km에 대한 부분은 Task Group에서 제시한 표준 모델을 이용했다. 모델을 정의하기 위해 주어진 격자점에 대한 지구 중심으로부터 지오이드까지의 거리, 지오이드로부터 지표까지의 거리를 정의해주었으며, 각 격자점의 수직구조를 정의하기 위해 깊이에 따른 각 매질의 밀도, P파의 속도, S파의 속도, P파에 대한 Q값, S파에 대한 Q값을 정의 해주었다. S파의 속도를 구하기 위해서 지구 내부 물질을 포아송 매질이라는 가정 하에, 관계식을 $Vp{\;}={\;}SQRT(3){\;}{\times}{\;}Vs$ 이용하였다. 획득한 모델치들을 이용해 동해와 동해 인근 지역에 대한 초기모델을 구축하였다.

  • PDF

A Hybrid Value Predictor using Speculative Update of the Predictor Table and Static Classification for the Pattern of Executed Instructions in Superscalar Processors (슈퍼스칼라 프로세서에서 예상 테이블의 모험적 갱신과 명령어 실행 유형의 정적 분류를 이용한 혼합형 결과값 예측기)

  • Park, Hong-Jun;Jo, Young-Il
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.1
    • /
    • pp.107-115
    • /
    • 2002
  • We propose a new hybrid value predictor which achieves high performance by combining several predictors. Because the proposed hybrid value predictor can update the prediction table speculatively, it efficiently reduces the number of mispredicted instructions due to stale data. Also, the proposed predictor can enhance the prediction accuracy and efficiently decrease the hardware cost of predictor, because it allocates instructions into the best-suited predictor during instruction fetch stage by using the information of static classification which is obtained from the profile-based compiler implementation. For the 16-issue superscalar processors, simulation results based on the SimpleScalar/PISA tool set show that we achieve the average prediction rates of 73% by using speculative update and the average prediction rates of 88% by adding static classification for the SPECint95 benchmark programs.

A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 값 예측기)

  • 김주익;박홍준;조영일
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.10
    • /
    • pp.569-578
    • /
    • 2003
  • Data dependencies are one of major hurdles to limit ILP(Instruction Level Parallelism), so several related works have suggested that the limit imposed by data dependencies can be overcome to some extent with use of the data value prediction. Hybrid value predictor can obtain the high prediction accuracy using advantages of various predictors, but it has a defect that same instruction has overlapping entries in all predictor. In this paper, we propose a new hybrid value predictor which achieves high performance by using the information of static and dynamic classification. The proposed predictor can enhance the prediction accuracy and efficiently decrease the prediction table size of predictor, because it allocates each instruction into single best-suited predictor during the fetch stage by using the information of static classification. Also, it can enhance the prediction accuracy because it selects a best- suited prediction method for the “Unknown”pattern instructions by using the dynamic classification mechanism. Simulation results based on the SimpleScalar/PISA tool set and the SPECint95 benchmarks show the average correct prediction rate of 85.1% by using the static classification mechanism. Also, we achieve the average correction prediction rate of 87.6% by using static and dynamic classification mechanism.