• Title/Summary/Keyword: Interconnection Architecture

Cycle Extendability of Torus Sub-Graphs in the Enhanced Pyramid Network (개선된 피라미드 네트워크에서 토러스 부그래프의 사이클 확장성)

  • Chang, Jung-Hwan
    • Journal of Korea Multimedia Society / v.13 no.8 / pp.1183-1193 / 2010
  • The pyramid graph is well known in parallel processing as an interconnection network topology based on regular square mesh and tree architectures. The enhanced pyramid graph is an alternative architecture that replaces the mesh with the corresponding torus to improve performance over the pyramid. In this paper, we classify the edges of the regular square torus, the basic sub-graph constituting each layer of the enhanced pyramid graph, into two disjoint groups. The edge set of the torus is split into two disjoint subsets, NPC (candidate edges for neighbor-parent) and SPC (candidate edges for shared-parent), according to whether the parent vertices adjacent to the two end vertices of an edge are neighbors or are one shared vertex in the upper layer of the enhanced pyramid graph. We also introduce the notion of a shrink graph, which hides the SPC-edges inside shrunk super-vertices so that attention can focus on the NPC-edges alone. We show that the lower and upper bounds on the number of NPC-edges in a Hamiltonian cycle constructed on a $2^n \times 2^n$ torus are $2^{2n-2}$ and $3 \cdot 2^{2n-2}$, respectively. Extending this result to the enhanced pyramid graph, we also prove that the maximum number of NPC-edges that a Hamiltonian cycle can contain is $4^{n-1}-2n+1$ in the n-dimensional enhanced pyramid.
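
The NPC/SPC classification can be illustrated with a minimal Python sketch. It assumes the usual pyramid convention that vertex (r, c) of a layer has parent (r // 2, c // 2) in the layer above; the abstract does not state this, so treat it as an assumption. All function names are hypothetical. The sketch classifies the edges of a $2^n \times 2^n$ torus (for n ≥ 2), evaluates the stated bounds, and counts the NPC-edges in one simple row-by-row "snake" Hamiltonian cycle, which is not the paper's construction.

```python
def parent(v):
    # Assumed convention: each 2x2 block of a layer shares one parent above it.
    r, c = v
    return (r // 2, c // 2)

def is_npc(u, v):
    # NPC edge: the end vertices have different (neighboring) parents.
    # SPC edge: both end vertices share the same parent.
    return parent(u) != parent(v)

def torus_edges(n):
    # All edges of the 2^n x 2^n torus (n >= 2 so no edge is duplicated).
    size = 2 ** n
    edges = []
    for r in range(size):
        for c in range(size):
            edges.append(((r, c), (r, (c + 1) % size)))   # horizontal, with wrap
            edges.append(((r, c), ((r + 1) % size, c)))   # vertical, with wrap
    return edges

def npc_bounds(n):
    # Lower and upper bounds quoted in the abstract for a Hamiltonian cycle.
    return 2 ** (2 * n - 2), 3 * 2 ** (2 * n - 2)

def snake_cycle(n):
    # Illustrative Hamiltonian cycle: sweep rows alternately left-to-right and
    # right-to-left; the last vertex closes to the first via a wrap-around edge.
    size = 2 ** n
    order = []
    for r in range(size):
        cols = range(size) if r % 2 == 0 else range(size - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order

def count_npc_in_cycle(order):
    k = len(order)
    return sum(is_npc(order[i], order[(i + 1) % k]) for i in range(k))

if __name__ == "__main__":
    n = 2
    edges = torus_edges(n)
    npc = [e for e in edges if is_npc(*e)]
    print(len(edges), len(npc), len(edges) - len(npc))
    print(npc_bounds(n), count_npc_in_cycle(snake_cycle(n)))
```

Under the assumed parent rule, the edge set splits evenly between NPC and SPC, and the snake cycle's NPC count falls between the two bounds without attaining either.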

An Emulation System for Efficient Verification of ASIC Design (ASIC 설계의 효과적인 검증을 위한 에뮬레이션 시스템)

  • 유광기;정정화
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.10 / pp.17-28 / 1999
  • In this paper, an ASIC emulation system called ACE (ASIC Emulator) is proposed. It can produce a prototype of the target ASIC in a short time and verify the function of the ASIC circuit immediately. ACE consists of emulation software, which comprises an EDIF reader, library translator, technology mapper, circuit partitioner, and LDF generator, and emulation hardware, which includes an emulation board and a logic analyzer. Technology mapping consists of three steps: circuit partitioning and extraction of logic functions, minimization of logic functions, and grouping of logic functions. During these procedures, the number of basic logic blocks and the maximum number of levels are minimized by assigning outputs that share product terms and input variables to the same block as much as possible. The circuit partitioner obtains chip-level netlists that satisfy constraints on the routing structure of the emulation board as well as on the architecture of the FPGA chips. A new partitioning algorithm is proposed whose objective function is the minimization of the number of interconnections among FPGA chips and among groups of FPGA chips. The routing structure of the emulation board combines the advantages of the complete graph and partial crossbar structures in order to minimize the interconnection delay between FPGA chips regardless of circuit size. The logic analyzer displays the waveforms of user-designated probing signals on a PC monitor. To evaluate the performance of the proposed emulation system, a video quad-splitter, a commercial ASIC, is implemented on the emulation board. Experimental results show that it operates in real time at 14.3 MHz and functions correctly.
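
The partitioning objective described above (few interconnections among chips and among groups of chips) can be sketched as a cut-size count. This is only an illustration of that kind of objective, not the paper's algorithm: the netlist representation, the function name `count_cut_nets`, and the example data are all hypothetical.

```python
def count_cut_nets(nets, location_of):
    # A net is "cut" when its cells are placed in more than one location
    # (a location can be an FPGA chip or a group of chips).
    return sum(
        1 for cells in nets.values()
        if len({location_of[cell] for cell in cells}) > 1
    )

# Hypothetical netlist: net name -> cells connected by that net.
nets = {
    "n1": ["a", "b"],
    "n2": ["b", "c"],
    "n3": ["c", "d"],
    "n4": ["a", "d"],
}

# Hypothetical placement: cell -> chip, and chip -> group of chips.
chip_of = {"a": 0, "b": 0, "c": 1, "d": 2}
group_of_chip = {0: 0, 1: 0, 2: 1}
group_of = {cell: group_of_chip[chip] for cell, chip in chip_of.items()}

chip_cut = count_cut_nets(nets, chip_of)     # interconnections among chips
group_cut = count_cut_nets(nets, group_of)   # interconnections among groups

if __name__ == "__main__":
    print(chip_cut, group_cut)
```

A partitioner of this kind would search over placements to reduce both counts while respecting the board's routing constraints; the search itself is not shown here.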

Mechanical Performance Evaluation of Cement Paste with Foaming Agent using FEM Analysis Based on Picture Image (화상 이미지 기반 FEM 해석을 이용한 기포제 혼입 시멘트 페이스트의 역학 성능 평가)

  • Kim, Bo-Seok;Shin, Jun-Ho;Lee, Han-Seung
    • Journal of the Korea Institute of Building Construction / v.16 no.1 / pp.35-43 / 2016
  • Concrete is a representative heterogeneous material, and its mechanical properties are influenced by various factors. Because pores in concrete affect its compressive strength, studies of the distribution and size of pores are very important, and studies using picture imaging have emerged accordingly. However, studies on the mechanical performance evaluation of structural lightweight foamed concrete and on FEM analysis based on picture images are inadequate, because lightweight foamed concrete has been researched only for non-structural use. Therefore, in this study, cement paste with a foaming agent is made to evaluate mechanical performance, FEM analysis based on picture images is conducted, and the Young's moduli obtained from experiment and from analysis are compared. The foaming agent dosage is set at 7 levels to check pore distribution, and the water-binder ratio is set at 20% to advance research on structural lightweight foamed concrete. The unit volume weight is lowest at a foaming agent dosage of 0.8%; above 0.8%, it increases because independent pores interconnect. For FEM analysis, the cement paste is photographed for use with an image analyzer (HF-MA C01). Consequently, using OOF (Object Oriented Finite elements), the Young's modulus from experiment and that from FEM analysis are found to be the same.

A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization (GPGPU 자원 활용 개선을 위한 블록 지연시간 기반 워프 스케줄링 기법)

  • Thuan, Do Cong;Choi, Yong;Kim, Jong Myon;Kim, Cheol Hong
    • KIPS Transactions on Computer and Communication Systems / v.6 no.5 / pp.219-230 / 2017
  • General-Purpose Graphics Processing Units (GPGPUs) employ a massively parallel architecture and multithreading technology to exploit parallelism. With programming models such as CUDA and OpenCL, GPGPUs are becoming the best platform for exploiting the plentiful thread-level parallelism of parallel applications. Unfortunately, modern GPGPUs cannot efficiently utilize their available hardware resources for numerous general-purpose applications. One of the primary reasons is the inefficiency of existing warp/thread block schedulers in hiding long-latency instructions, which results in lost opportunities to improve performance. This paper studies the effects of the hardware thread scheduling policy on GPGPU performance. We propose a novel warp scheduling policy that can alleviate the drawbacks of the traditional round-robin policy. The proposed warp scheduler first classifies the warps of a thread block into two groups, warps with long latency and warps with short latency, and then schedules the warps with long latency before the warps with short latency. Furthermore, to support the proposed warp scheduler, we also propose a supplemental technique that dynamically reduces the number of streaming multiprocessors to which thread blocks are assigned when contention at the memory and interconnection network is high. In our experiments on a GPGPU platform with 15 streaming multiprocessors, the proposed warp scheduling policy provides an average IPC improvement of 7.5% over the baseline round-robin warp scheduling policy. This paper also shows that GPGPU performance can be improved by approximately 8.9% on average when the two proposed techniques are combined.
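
The two-group warp ordering described above can be sketched in a few lines of Python. The abstract does not say how latency is estimated or where the long/short boundary lies, so the `est_latency` field, the `threshold` parameter, and the function name are assumptions made for illustration only; a real scheduler is implemented in hardware and is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Warp:
    warp_id: int
    est_latency: int  # hypothetical estimate, in cycles, of the pending latency

def schedule_order(warps, threshold):
    # Classify warps into long-latency and short-latency groups, then issue
    # the long-latency group first. Within each group the original
    # (round-robin) order is preserved.
    long_group = [w for w in warps if w.est_latency >= threshold]
    short_group = [w for w in warps if w.est_latency < threshold]
    return long_group + short_group

if __name__ == "__main__":
    warps = [Warp(0, 4), Warp(1, 400), Warp(2, 8), Warp(3, 350)]
    print([w.warp_id for w in schedule_order(warps, threshold=100)])
```

Issuing long-latency warps first starts their slow operations early, so the short-latency warps can run while those operations are outstanding.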