• Title/Summary/Keyword: 매니코어 시스템

Search Result 35, Processing Time 0.024 seconds

Optimizing LRU Lock Management in the Linux Kernel for Improving Parallel Write Throughout in Many-Core CPU Systems (매니코어 CPU 시스템의 병렬 쓰기 성능 향상을 위한 리눅스 커널의 LRU 관리 최적화 기법)

  • Eun-Kyu Byun;Gibeom Gu;Kwang-Jin Oh;Jiwoo Bang
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.12 no.7
    • /
    • pp.209-216
    • /
    • 2023
  • Modern HPC systems are equipped with many-core CPUs with dozens of cores. When performing parallel I/O in such a system, there is a limit to scalability due to the problem of the LRU lock management policy of the Linux system. The study proposes an improved FinerLRU to solve this problem. Our new FinerLRU improves the parallel write performance of file systems using the buffer cache through granular lock management by increasing the number of LRU locks upto the maximum number of cores. The proposed method was implemented in Linux 5.18.11, and the performance was measured on two types of CPUs, Intel Icelake Xeon and Intel Knights landing, with different characteristics, and it was found that a performance improvement of about two times can be obtained in both types of systems.

Enhancing the Performance of Multiple Parallel Applications using Heterogeneous Memory on the Intel's Next-Generation Many-core Processor (인텔 차세대 매니코어 프로세서에서의 다중 병렬 프로그램 성능 향상기법 연구)

  • Rho, Seungwoo;Kim, Seoyoung;Nam, Dukyun;Park, Geunchul;Kim, Jik-Soo
    • Journal of KIISE
    • /
    • v.44 no.9
    • /
    • pp.878-886
    • /
    • 2017
  • This paper discusses performance bottlenecks that may occur when executing high-performance computing MPI applications in the Intel's next generation many-core processor called Knights Landing(KNL), as well as effective resource allocation techniques to solve this problem. KNL is composed of a host processor to enable self-booting in addition to an existing accelerator consisting of a many-core processor, and it was released with a new type of on-package memory with improved bandwidth on top of existing DDR4 based memory. We empirically verified an improvement of the execution performance of multiple MPI applications and the overall system utilization ratio by studying a resource allocation method optimized for such new many-core processor architectures.

Implementation of an Optimal SIMD-based Many-core Processor for Sound Synthesis of Guitar (기타 음 합성을 위한 최적의 SIMD기반 매니코어 프로세서 구현)

  • Choi, Ji-Won;Kang, Myeong-Su;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.1
    • /
    • pp.1-10
    • /
    • 2012
  • Improving operating frequency of processors is no longer today's issues; a multiprocessor technique which integrates many processors has received increasing attention. Currently, high-performance processors that integrate 64 or 128 cores are developing for large data processing over 2, 4, or 8 processor cores. This paper proposes an optimal many-core processor for synthesizing guitar sounds. Unlike the previous research in which a processing element (PE) was assigned to support one of guitar strings, this paper evaluates the impacts of mapping different numbers of PEs to one guitar string in terms of performance and both area and energy efficiencies using architectural and workload simulations. Experimental results show that the maximum area energy efficiencies were achieved at PEs=24 and 96, respectively, for synthesizing guitar sounds with sampling rate of 44.1kHz and 16-bit quantization. The synthesized sounds were very similar to original guitar sounds in their spectra. In addition, the proposed many-core processor was 1,235 and 22 times better than TI TMS320C6416 in area and energy efficiencies, respectively.

Area-constrained NTC Manycore Architecture Design Methodology (면적 제약 조건을 고려한 NTC 매니코어 설계 방법론)

  • Chang, Jin Kyu;Han, Tae Hee
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2015.10a
    • /
    • pp.866-869
    • /
    • 2015
  • With the advance in semiconductor technology, the number of elements that can be integrated in system-on-chip(SoC) increases exponentially, and thus voltage scaling is indispensable to enhance energy efficiency. Near-threshold voltage computing(NTC) improves the energy efficiency by an order of degree, hence it is able to overcome the limitation of conventional super-threshold voltage computing(STC). Although NTC-based low performance manycore system can be used to maximize energy efficiency, it demands more number of cores to sustain the performance, which results in considerable increase of area. In this paper, we analyze NTC manycore architecture considering the trade-offs between performance, power, and area. Therefore, we propose an algorithmic methodology that can optimize power consumption and area while satisfying the required performance by determining the constrained number of cores and size of caches and clusters in NTC environment. Experimental results show that proposed NTC architecture can reduce power consumption by approximately 16.5 % while maintaining the performance of STC core under area constraint.

  • PDF

Technology Trends of Runtime Systems to Realize High Performance Computing (고성능 컴퓨팅을 실현하는 런타임 시스템 기술 동향)

  • Kim, J.M.;Lee, J.J.;Choi, W.
    • Electronics and Telecommunications Trends
    • /
    • v.27 no.6
    • /
    • pp.124-133
    • /
    • 2012
  • 최근 산업의 발전으로 대규모 문제 해결의 요구가 커지고 사용자가 원하는 서비스를 신속하게 받고자 고성능 컴퓨팅에 대한 요구가 계속해서 증가하고 있다. 이에 따라 멀티코어 및 매니코어와 이종 하드웨어의 혼용 등으로 지속해서 발전하는 새로운 고성능 컴퓨팅을 위한 시스템은 컴퓨팅 패러다임을 바꿀 시스템 소프트웨어의 혁신 요소로 등장하였다. 하드웨어를 활용하여 시스템의 성능을 높이기 위해서는 컴퓨팅 요소 간의 통신을 최소화하여 전력 소모를 줄이고, 메모리 계층 구조 및 지역성을 고려하여 성능을 높이는 것이 필요하다. 특히, 응용의 실행 시에 시스템 자원을 최고로 활용할 수 있게 하여 성능을 높이는 런타임 시스템은 하드웨어 및 운영체제를 변경하지 않고 시스템 자원을 최대한 활용하여 성능 최적화를 이룰 수 있는 기술이다. 따라서 본고에서는 런타임 시스템의 기능과 기술 방향을 파악하여 차세대 런타임 시스템에 필요한 기술 및 연구 분야를 전망하고자 한다.

  • PDF

Design Space Exploration of Many-Core Processors for Ultrasonic Image Processing at Different Resolutions (다양한 해상도의 초음파 영상처리를 위한 매니코어 프로세서의 디자인 공간 탐색)

  • Kang, Sung-Mo;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.19A no.3
    • /
    • pp.121-128
    • /
    • 2012
  • This paper explores the optimal processing element (PE) configuration for ultrasonic image processing at different resolutions ($256{\times}256$, $768{\times}1,024$, and $1,024{\times}1,280$). To determine the optimal PE configuration, this paper evaluates the impacts of a data-per-processing element (DPE) ratio that is defined as the amount of image data directly mapped to each PE on system performance and both energy and area efficiencies using architectural and workload simulations. This paper illustrates the correlation between DPE ratio and PE architecture for a target implementation in 130nm technology. To identify the most efficient PE structure, seven different PE configurations were simulated for ultrasonic image processing. Experimental results indicate that the highest energy efficiencies were achieved at PEs=1,024, 4,096, and 16,384 for ultrasonic images at $256{\times}256$, $768{\times}1,024$, $1,024{\times}1,280$ resolutions, respectively. Furthermore, the maximum area efficiencies were yielded at PEs=256 ($256{\times}256$ image) and 4,096 ($768{\times}1,024$ and $1,024{\times}1,280$ images), respectively.