• Title/Summary/Keyword: 매니코어 프로세서

Search Result 30, Processing Time 0.026 seconds

High Performance Message Scattering Algorithm in Multicore Processor (멀티코어 프로세서에서의 효율적인 메시지 스캐터링 지원 기법)

  • Park, Jongsu
    • Journal of Platform Technology
    • /
    • v.10 no.2
    • /
    • pp.3-9
    • /
    • 2022
  • In this paper, to maximize the performance of the scatter communication in multi-core and many-core processors, a technique that considers the communication situation of the processing node is applied to a multi-core processor composed of 32 processing nodes. Since the existing scatter algorithm cannot recognize the communication conditions of the processing nodes, communication is generally performed according to an initially set transmission order. In this case, scatter communication starts only after the communication currently being performed by all processing nodes inside the processor is finished. The scatter communication performance was improved by this technique, and it was confirmed that there was a performance improvement of up to 78.93% compared to the existing algorithm through BFM simulation.

Design Space Exploration of Many-Core Processor for High-Speed Cluster Estimation (고속의 클러스터 추정을 위한 매니코어 프로세서의 디자인 공간 탐색)

  • Seo, Jun-Sang;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.10
    • /
    • pp.1-12
    • /
    • 2014
  • This paper implements and improves the performance of high computational subtractive clustering algorithm using a single instruction, multiple data (SIMD) based many-core processor. In addition, this paper implements five different processing element (PE) architectures (PEs=16, 64, 256, 1,024, 4,096) to select an optimal PE architecture for the subtractive clustering algorithm by estimating execution time and energy efficiency. Experimental results using two different medical images and three different resolutions ($128{\times}128$, $256{\times}256$, $512{\times}512$) show that PEs=4,096 achieves the highest performance and energy efficiency for all the cases.

Design Space Exploration of Many-Core Processors for Ultrasonic Image Processing at Different Resolutions (다양한 해상도의 초음파 영상처리를 위한 매니코어 프로세서의 디자인 공간 탐색)

  • Kang, Sung-Mo;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.19A no.3
    • /
    • pp.121-128
    • /
    • 2012
  • This paper explores the optimal processing element (PE) configuration for ultrasonic image processing at different resolutions ($256{\times}256$, $768{\times}1,024$, and $1,024{\times}1,280$). To determine the optimal PE configuration, this paper evaluates the impacts of a data-per-processing element (DPE) ratio that is defined as the amount of image data directly mapped to each PE on system performance and both energy and area efficiencies using architectural and workload simulations. This paper illustrates the correlation between DPE ratio and PE architecture for a target implementation in 130nm technology. To identify the most efficient PE structure, seven different PE configurations were simulated for ultrasonic image processing. Experimental results indicate that the highest energy efficiencies were achieved at PEs=1,024, 4,096, and 16,384 for ultrasonic images at $256{\times}256$, $768{\times}1,024$, $1,024{\times}1,280$ resolutions, respectively. Furthermore, the maximum area efficiencies were yielded at PEs=256 ($256{\times}256$ image) and 4,096 ($768{\times}1,024$ and $1,024{\times}1,280$ images), respectively.

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.

Performance Comparison between Haskell Eval Monad and Cloud Haskell (Haskell Eval 모나드와 Cloud Haskell 간의 성능 비교)

  • Kim, Yeoneo;An, Hyungjun;Byun, Sugwoo;Woo, Gyun
    • Journal of KIISE
    • /
    • v.44 no.8
    • /
    • pp.791-802
    • /
    • 2017
  • Competition in the modern CPU market has shifted from speeding up the clock speed of a single core to increasing the number of cores. As such, there is a growing interest in parallel programming to maximize the use of resources of many core processors. In this paper, we propose parallel programming models in Haskell to find an advisable parallel programming model for many-core environments. Specifically, we used Eval monad and Cloud Haskell to develop two versions of parallel programs: plagiarism detection and K-means. Then, we evaluated the performance of the developed programs in 32-core and 120-core environments. The results of our experiment show that the Eval monad is highly efficient in an environment with a small number of cores. On the other hand, the Cloud Haskell runtime shows 37% improvement over Eval monad and the scalability shows a 134% improvement over Eval monad as the number of cores increases.

Using the On-Package Memory of Manycore Processor for Improving Performance of MPI Intra-Node Communication (MPI 노드 내 통신 성능 향상을 위한 매니코어 프로세서의 온-패키지 메모리 활용)

  • Cho, Joong-Yeon;Jin, Hyun-Wook;Nam, Dukyun
    • Journal of KIISE
    • /
    • v.44 no.2
    • /
    • pp.124-131
    • /
    • 2017
  • The emerging next-generation manycore processors for high-performance computing are equipped with a high-bandwidth on-package memory along with the traditional host memory. The Multi-Channel DRAM (MCDRAM), for example, is the on-package memory of the Intel Xeon Phi Knights Landing (KNL) processor, and theoretically provides a four-times-higher bandwidth than the conventional DDR4 memory. In this paper, we suggest a mechanism to exploit MCDRAM for improving the performance of MPI intra-node communication. The experiment results show that the MPI intra-node communication performance can be improved by up to 272 % compared with the case where the DDR4 is utilized. Moreover, we analyze not only the performance impact of different MCDRAM-utilization mechanisms, but also that of core affinity for processes.

Multimedia Extension Instructions and Optimal Many-core Processor Architecture Exploration for Portable Ultrasonic Image Processing (휴대용 초음파 영상처리를 위한 멀티미디어 확장 명령어 및 최적의 매니코어 프로세서 구조 탐색)

  • Kang, Sung-Mo;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.8
    • /
    • pp.1-10
    • /
    • 2012
  • This paper proposes design space exploration methodology of many-core processors including multimedia specific instructions to support high-performance and low power ultrasound imaging for portable devices. To explore the impact of multimedia instructions, we compare programs using multimedia instructions and baseline programs with a same many-core processor in terms of execution time, energy efficiency, and area efficiency. Experimental results using a $256{\times}256$ ultrasound image indicate that programs using multimedia instructions achieve 3.16 times of execution time, 8.13 times of energy efficiency, and 3.16 times of area efficiency over the baseline programs, respectively. Likewise, programs using multimedia instructions outperform the baseline programs using a $240{\times}320$ image (2.16 times of execution time, 4.04 times of energy efficiency, 2.16 times of area efficiency) as well as using a $240{\times}400$ image (2.25 times of execution time, 4.34 times of energy efficiency, 2.25 times of area efficiency). In addition, we explore optimal PE architecture of many-core processors including multimedia instructions by varying the number of PEs and memory size.

A Study of Distribute Computing Performance Using a Convergence of Xeon-Phi Processor and Quantum ESPRESSO (퀀텀 에스프레소와 제온 파이 프로세서의 융합을 이용한 분산컴퓨팅 성능에 대한 연구)

  • Park, Young-Soo;Park, Koo-Rack;Kim, Dong-Hyun
    • Journal of the Korea Convergence Society
    • /
    • v.7 no.5
    • /
    • pp.15-21
    • /
    • 2016
  • Recently the degree of integration of processor and developed rapidly. However, clock speed is not increased, a situation that increases the number of cores in the processor. In this paper, we analyze the performance of a typical Intel Xeon Phi of many core process used for the current operation accelerate. Utilizing the Quantum ESPRESSO, which was calculated using the FFTW library. By varying the number of ranks in MPI when running the benchmarks the performance Xeon Phi. The result shows a good performance in the handling of four job on one physical core. However, four or more to expand the number of MPI Rank is degraded. Through this convergence it was found to improve the performance of Quantum ESPRESSO. It is possible to check the hardware characteristics of the Xeon Phi.