• Title/Summary/Keyword: manycore system


Research Status and Plan for Manycore Operating System (매니코어 운영체제 연구현황 및 계획)

  • Jung, Sungin;Kim, Taesoo;Min, Changwoo;Park, Sungyong;Byun, Sugwoo;Seo, Euiseong;Woo, Gyun;Lee, Kyoungwoo;Lee, Jaewook;Rim, Sung-Soo;Im, Eun-Jin;Jo, Heeseung;Jin, Hyun-Wook
    • Electronics and Telecommunications Trends / v.32 no.6 / pp.83-95 / 2017
  • The trend of manycore hardware has recently evolved more quickly than expected. However, the operating system, the software that manages computer resources, is still optimized for multicore systems. To address this issue, we started a research project called 'Research on High Performance and Scalable Manycore Operating Systems' in 2014. This article briefly examines the technology trends of manycore hardware and operating systems, and introduces the research areas and outcomes of the first stage of the project (2014-2017). The core technologies that improve the performance scalability of manycore systems are publicly available, so anyone can use the source code or apply the ideas of the core techniques to other research activities. In addition, the research plans for the second stage of the project (2018-2021) are also included.

Trends in Unikernel and Its Application to Manycore Systems (유니커널의 동향과 매니코어 시스템에 적용)

  • Cha, S.J.;Jeon, S.H.;Ramneek, Ramneek;Kim, J.M.;Jeong, Y.J.;Jung, S.I.
    • Electronics and Telecommunications Trends / v.33 no.6 / pp.129-138 / 2018
  • As recent applications require more CPUs for their performance, manycore systems have evolved. Since existing operating systems do not provide performance scalability on manycore systems, Azalea, a multi-kernel-based system, has been developed to support performance scalability. Unikernel is a new operating system technology that starts from the concept of a library OS, and applying it to Azalea enables further performance improvement. In this paper, we first analyze the current technology trends of unikernel, and then discuss its application to Azalea and the resulting effects. Azalea-unikernel is built as a single image consisting of a libOS, runtime libraries, and an application, and it executes on bare metal with the desired number of cores and amount of memory. In particular, it supports source and binary compatibility, so existing Linux binaries can be rebuilt and executed on Azalea-unikernel, and already-built binaries can run immediately, without modification, with better performance. It not only achieves a performance enhancement but is also a more secure OS for manycore systems.

Area-constrained NTC Manycore Architecture Design Methodology (면적 제약 조건을 고려한 NTC 매니코어 설계 방법론)

  • Chang, Jin Kyu;Han, Tae Hee
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.866-869 / 2015
  • With advances in semiconductor technology, the number of elements that can be integrated in a system-on-chip (SoC) increases exponentially, so voltage scaling is indispensable for improving energy efficiency. Near-threshold voltage computing (NTC) improves energy efficiency by an order of magnitude and can therefore overcome the limitations of conventional super-threshold voltage computing (STC). Although an NTC-based manycore system built from low-performance cores can maximize energy efficiency, it requires more cores to sustain performance, which results in a considerable increase in area. In this paper, we analyze NTC manycore architectures considering the trade-offs among performance, power, and area. We then propose an algorithmic methodology that optimizes power consumption and area while satisfying the required performance by determining the number of cores and the sizes of caches and clusters under an area constraint in an NTC environment. Experimental results show that the proposed NTC architecture can reduce power consumption by approximately 16.5% while maintaining the performance of an STC core under the area constraint.
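The constrained design-space exploration this abstract describes can be sketched as a simple exhaustive search. The sketch below is only an illustration of the shape of the problem, not the paper's methodology: the analytical models, coefficients, and constraint values are hypothetical placeholders; the point is minimizing power subject to performance and area constraints over core count, cache size, and cluster size.

```c
/* Hypothetical sketch of an area-constrained NTC design-space search:
 * enumerate core count, cache size, and cluster size, estimate performance,
 * power, and area with toy analytical models, and keep the lowest-power
 * configuration that meets the performance target within the area budget.
 * All models and constants are placeholders, not the paper's. */
#include <stdio.h>

struct cfg { int cores, cache_kb, cluster; double perf, power, area; };

/* Toy models (placeholders): NTC cores are slow but very energy-efficient. */
static double perf_model(int cores, int cache_kb)  { return cores * 0.35 * (1.0 + cache_kb / 1024.0); }
static double power_model(int cores, int cache_kb) { return cores * 0.08 + cache_kb * 0.002; }
static double area_model(int cores, int cache_kb, int cluster)
{
    return cores * 1.0 + cache_kb * 0.01 + (cores / cluster) * 0.5; /* per-cluster overhead */
}

int main(void)
{
    const double perf_target = 16.0, area_budget = 120.0;
    struct cfg best = { .power = 1e9 };

    for (int cores = 8; cores <= 256; cores += 8)
        for (int cache_kb = 128; cache_kb <= 2048; cache_kb *= 2)
            for (int cluster = 2; cluster <= 16; cluster *= 2) {
                double perf  = perf_model(cores, cache_kb);
                double power = power_model(cores, cache_kb);
                double area  = area_model(cores, cache_kb, cluster);
                if (perf >= perf_target && area <= area_budget && power < best.power)
                    best = (struct cfg){ cores, cache_kb, cluster, perf, power, area };
            }

    printf("cores=%d cache=%dKB cluster=%d power=%.2f area=%.2f\n",
           best.cores, best.cache_kb, best.cluster, best.power, best.area);
    return 0;
}
```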


System-Call-Level Core Affinity for Improving Network Performance (네트워크 성능향상을 위한 시스템 호출 수준 코어 친화도)

  • Uhm, Junyong;Cho, Joong-Yeon;Jin, Hyun-Wook
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.80-84 / 2017
  • Existing operating systems suffer scalability issues as the number of cores increases. Network I/O performance on manycore systems is limited mainly by cache-coherence costs and locking overheads. Legacy approaches to this issue include new microkernel-like operating systems or modifications of existing kernels; however, these solutions are not fully transparent to applications. In this study, we propose a library that improves network performance by separating the system-call context from the user context and by applying core affinity, without any kernel or application modifications. Experimental results show that our implementation can improve the network throughput of Apache by up to 30%.
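The underlying core-affinity mechanism can be shown with standard Linux APIs. The sketch below is only a minimal illustration, not the authors' library: it pins a worker thread that issues the blocking network system calls to one core while the user-context thread stays on another, using pthread_setaffinity_np; the placeholder file descriptor stands in for a connected socket.

```c
/* Minimal sketch of system-call-level core affinity (not the authors' library).
 * A dedicated "syscall" thread is pinned to core 1 and performs the blocking
 * I/O, while the main (user-context) thread stays on core 0. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void pin_to_core(pthread_t t, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t, sizeof(set), &set);
}

static void *syscall_worker(void *arg)
{
    int fd = *(int *)arg;
    char buf[4096];
    ssize_t n;

    /* All read() system calls for this descriptor are issued from this thread,
     * so their kernel-side work stays on the core chosen below. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ; /* hand data over to the user-context thread here */
    return NULL;
}

int main(void)
{
    int fd = 0; /* placeholder: a connected socket in a real server */
    pthread_t worker;

    pin_to_core(pthread_self(), 0);          /* user context on core 0 */
    pthread_create(&worker, NULL, syscall_worker, &fd);
    pin_to_core(worker, 1);                  /* system-call context on core 1 */

    pthread_join(worker, NULL);
    return 0;
}
```

The paper's contribution is applying this separation transparently, without touching the application; the sketch only shows the affinity primitive it builds on.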

Energy-efficient Custom Topology Generation for Link-failure-aware Network-on-chip in Voltage-frequency Island Regime

  • Li, Chang-Lin;Yoo, Jae-Chern;Han, Tae Hee
    • JSTS: Journal of Semiconductor Technology and Science / v.16 no.6 / pp.832-841 / 2016
  • The voltage-frequency island (VFI) design paradigm has strong potential for achieving high energy efficiency in the communication-centric manycore system-on-chip (SoC) design style known as network-on-chip (NoC). However, because of the diminished scaling of wire dimensions, supply voltage, and threshold voltage in modern CMOS technology, vulnerability to link failures is becoming a crucial challenge for VFI NoCs. In this paper, we propose an energy-optimized topology generation technique for VFI NoCs that copes with permanent link failures. Based on an energy consumption model, we exploit on-chip communication traffic patterns and the characteristics of link failures in the early design stage to accommodate diverse applications and architectures. Experimental results on a number of multimedia application benchmarks show the effectiveness of the proposed three-step custom topology generation method in terms of energy consumption and latency, without any degradation of fault coverage.
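For context, topology generation of this kind is commonly driven by a bit-energy model that charges each flow for the routers and links on its path. The form below is an assumption (a widely used model), not necessarily the exact model used in the paper.

```latex
% Assumed bit-energy model for NoC topology evaluation.
% E_{S,bit}: energy per bit in a router/switch, E_{L,bit}: energy per bit on a link,
% h_{ij}: hop count of the path from core i to core j, v_{ij}: traffic volume in bits.
E_{\mathrm{comm}} \;=\; \sum_{(i,j)} v_{ij}\,\bigl[\,(h_{ij}+1)\,E_{S,\mathrm{bit}} \;+\; h_{ij}\,E_{L,\mathrm{bit}}\,\bigr]
```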

The Effect of Mesh Interconnection Network on the Performance of Manycore System (다중코어 시스템의 메쉬구조 상호연결망이 성능에 미치는 영향)

  • Kim, Han-Yee;Kim, Young-Hwan;Suh, Taeweon
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.116-119 / 2011
  • A manycore system connects a large number of cores through an interconnection network and provides far more parallel computing resources than single-core or multicore systems. According to Amdahl's law, the parallelized portion of a workload can in theory be accelerated in proportion to the number of processors, but the speedup is limited by many factors, including transfer latency in the interconnection network. In particular, in most manycore systems that support a cache coherence protocol, the data transfer latency caused by cache misses can significantly affect parallel performance. Effective parallel programming therefore requires studying the interconnection network based on an understanding of the cache architecture. In this paper, we use the TilePro64, a 64-core manycore system with a mesh interconnect, to measure the sensitivity of program performance to the data transfer latency of the interconnection network. The results show that execution time increases linearly with the distance (hops) between cores, by an average of 4.27% per hop.
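Taking the reported figure at face value, the sensitivity reduces to a simple linear model. The sketch below assumes the 4.27% increase is measured relative to the local (zero-hop) execution time, which is an interpretation, not something stated in the abstract.

```c
/* Linear hop-sensitivity model derived from the reported 4.27%-per-hop figure.
 * Treating the zero-hop (local) runtime as the baseline is an assumption. */
#include <stdio.h>

static double predicted_runtime(double t_local, int hops)
{
    return t_local * (1.0 + 0.0427 * hops);
}

int main(void)
{
    double t_local = 10.0;                  /* seconds, example baseline */
    for (int h = 0; h <= 8; h++)            /* sample hop distances on an 8x8 mesh */
        printf("hops=%d  t=%.2f s\n", h, predicted_runtime(t_local, h));
    return 0;
}
```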

Study on LLVM application in Parallel Computing System (병렬 컴퓨팅 시스템에서 LLVM 응용 연구)

  • Cho, Jungseok;Cho, Doosan;Kim, Yongyeon
    • The Journal of the Convergence on Culture Technology / v.5 no.1 / pp.395-399 / 2019
  • In order to support various parallel computing systems, LLVM IR needs to be extended to handle vectors and matrices more efficiently, and a new algorithm is needed for lowering LLVM IR to machine code. Because LLVM IR essentially consists of RISC-style instructions, generating RISC instructions from it is natural, but vector instructions are not supported. New IR structures, instruction generation algorithms, and related extensions are therefore needed to support vectors and matrices more robustly. To do this, it is important to map each instruction in the LLVM IR to the appropriate instruction of the target (vector/matrix) architecture, that is, to design an instruction selection algorithm. Efficient mapping requires understanding the semantics of each LLVM IR instruction, comparing it with the semantics and syntax of the target-architecture instructions, and selecting the instruction whose pattern matches.
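Pattern-matching instruction selection of the kind described can be sketched as follows. This is a deliberately simplified, hypothetical selector, not LLVM's actual SelectionDAG/GlobalISel machinery: it scans a linear IR and replaces a run of identical scalar multiplies over consecutive virtual registers with a single target vector multiply, lowering everything else one-to-one.

```c
/* Toy pattern-matching instruction selector (hypothetical; not LLVM's).
 * A run of VLEN consecutive scalar FMUL ops whose operands use consecutive
 * registers is matched and lowered to one vector VMUL. */
#include <stdio.h>

#define VLEN 4

enum opcode { IR_FMUL, IR_FADD };

struct ir_op { enum opcode op; int dst, src1, src2; };  /* virtual register numbers */

/* Do ops[i..i+VLEN-1] form a vectorizable run of FMULs on consecutive regs? */
static int matches_vector_fmul(const struct ir_op *ops, int n, int i)
{
    if (i + VLEN > n) return 0;
    for (int k = 0; k < VLEN; k++)
        if (ops[i + k].op   != IR_FMUL         ||
            ops[i + k].dst  != ops[i].dst  + k ||
            ops[i + k].src1 != ops[i].src1 + k ||
            ops[i + k].src2 != ops[i].src2 + k)
            return 0;
    return 1;
}

static void select_instructions(const struct ir_op *ops, int n)
{
    for (int i = 0; i < n; ) {
        if (matches_vector_fmul(ops, n, i)) {
            printf("VMUL  v%d, v%d, v%d   ; covers %d scalar fmuls\n",
                   ops[i].dst, ops[i].src1, ops[i].src2, VLEN);
            i += VLEN;
        } else {
            printf("%s  r%d, r%d, r%d\n",
                   ops[i].op == IR_FMUL ? "FMUL" : "FADD",
                   ops[i].dst, ops[i].src1, ops[i].src2);
            i++;
        }
    }
}

int main(void)
{
    struct ir_op ops[] = {
        { IR_FMUL, 10, 20, 30 }, { IR_FMUL, 11, 21, 31 },
        { IR_FMUL, 12, 22, 32 }, { IR_FMUL, 13, 23, 33 },
        { IR_FADD, 14, 10, 11 },
    };
    select_instructions(ops, (int)(sizeof(ops) / sizeof(ops[0])));
    return 0;
}
```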

Optimizing LRU Lock Management in the Linux Kernel for Improving Parallel Write Throughput in Many-Core CPU Systems (매니코어 CPU 시스템의 병렬 쓰기 성능 향상을 위한 리눅스 커널의 LRU 관리 최적화 기법)

  • Eun-Kyu Byun;Gibeom Gu;Kwang-Jin Oh;Jiwoo Bang
    • KIPS Transactions on Computer and Communication Systems / v.12 no.7 / pp.209-216 / 2023
  • Modern HPC systems are equipped with many-core CPUs with dozens of cores. When performing parallel I/O on such systems, scalability is limited by the LRU lock management policy of the Linux kernel. This study proposes FinerLRU to solve this problem. FinerLRU improves the parallel write performance of file systems that use the buffer cache through fine-grained lock management, increasing the number of LRU locks up to the maximum number of cores. The proposed method was implemented in Linux 5.18.11, and its performance was measured on two CPUs with different characteristics, Intel Ice Lake Xeon and Intel Knights Landing; a roughly twofold performance improvement was obtained on both types of systems.
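The idea of splitting a single global LRU lock into many finer-grained locks can be sketched in user space. The code below is only an analogy with hypothetical names; the actual FinerLRU work modifies the Linux kernel's page-cache LRU code rather than a user-space list. One lock protects one LRU shard, and the shard is chosen from the CPU the caller is running on, so writers on different cores rarely contend.

```c
/* User-space analogy of fine-grained LRU locking (hypothetical names; the
 * real FinerLRU change lives inside the Linux kernel). One spinlock per LRU
 * shard, selected by the caller's CPU id, instead of one global LRU lock. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

#define MAX_SHARDS 256                /* upper bound on shards (== max cores) */

struct lru_node { struct lru_node *next; };

struct lru_shard {
    pthread_spinlock_t lock;
    struct lru_node *head;            /* per-shard LRU list */
};

static struct lru_shard shards[MAX_SHARDS];
static int nr_shards;

void lru_init(int cores)
{
    nr_shards = cores > MAX_SHARDS ? MAX_SHARDS : cores;
    for (int i = 0; i < nr_shards; i++) {
        pthread_spin_init(&shards[i].lock, PTHREAD_PROCESS_PRIVATE);
        shards[i].head = NULL;
    }
}

/* Pick the shard (and thus the lock) from the current CPU id. */
static struct lru_shard *my_shard(void)
{
    int cpu = sched_getcpu();
    return &shards[(cpu < 0 ? 0 : cpu) % nr_shards];
}

void lru_add(struct lru_node *n)
{
    struct lru_shard *s = my_shard();
    pthread_spin_lock(&s->lock);
    n->next = s->head;                /* push onto this shard's list only */
    s->head = n;
    pthread_spin_unlock(&s->lock);
}

int main(void)
{
    struct lru_node a = { 0 };
    lru_init((int)sysconf(_SC_NPROCESSORS_ONLN));
    lru_add(&a);
    return 0;
}
```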

Enhancing the Performance of Multiple Parallel Applications using Heterogeneous Memory on the Intel's Next-Generation Many-core Processor (인텔 차세대 매니코어 프로세서에서의 다중 병렬 프로그램 성능 향상기법 연구)

  • Rho, Seungwoo;Kim, Seoyoung;Nam, Dukyun;Park, Geunchul;Kim, Jik-Soo
    • Journal of KIISE / v.44 no.9 / pp.878-886 / 2017
  • This paper discusses performance bottlenecks that can occur when executing high-performance computing MPI applications on Intel's next-generation many-core processor, Knights Landing (KNL), as well as effective resource allocation techniques to solve this problem. Unlike the earlier generation, which was offered only as a many-core accelerator, KNL can also operate as a self-booting host processor, and it was released with a new type of on-package memory that provides higher bandwidth on top of the existing DDR4-based memory. By studying a resource allocation method optimized for this new many-core processor architecture, we empirically verified an improvement in the execution performance of multiple MPI applications and in the overall system utilization.
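KNL's on-package memory (MCDRAM) is typically exposed either as a separate NUMA node or through explicit allocation with the memkind library. The snippet below is a generic illustration of steering one bandwidth-critical buffer to high-bandwidth memory; it is not the resource-allocation method studied in the paper.

```c
/* Generic illustration of allocating from KNL's on-package high-bandwidth
 * memory (MCDRAM) via the memkind library; build with -lmemkind.
 * MEMKIND_HBW_PREFERRED tries MCDRAM first and falls back to DDR4 if the
 * high-bandwidth pool is exhausted. */
#include <memkind.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t bytes = 64UL << 20;   /* 64 MiB bandwidth-critical buffer */

    double *buf = memkind_malloc(MEMKIND_HBW_PREFERRED, bytes);
    if (!buf) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (size_t i = 0; i < bytes / sizeof(double); i++)
        buf[i] = (double)i;      /* stream through the buffer */

    memkind_free(MEMKIND_HBW_PREFERRED, buf);
    return 0;
}
```

When KNL is configured in flat mode, a similar effect can often be obtained without code changes by binding a job's memory to the MCDRAM NUMA node with numactl, which is one way resource allocation across multiple applications can be controlled externally.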

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)

  • Kim, Sung-Soo;Kim, Dong-Heon;Woo, Sang-Kyu;Ihm, In-Sung
    • Journal of KIISE: Computing Practices and Letters / v.16 no.7 / pp.775-787 / 2010
  • Unlike general-purpose CPUs, GPUs have been specialized as many-core streaming processors, and they are replacing CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. To respond to this trend, NVIDIA has recently issued a new parallel computing architecture called CUDA (Compute Unified Device Architecture), offering a flexible GPU programming environment for GPGPU (general-purpose GPU) computing. In general, when programmers use the CUDA API, they should clearly understand many aspects of the GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through extensive experiments and trial and error, and review how those techniques affect code execution performance. In particular, we use a specific problem as an example to analyze several elements that affect performance, such as effective access to the hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several directions that can be utilized effectively in CUDA-based parallel programming.