• Title/Summary/Keyword: Parallel processor core

A Study on Machine Learning Compiler and Modulo Scheduler (머신러닝 컴파일러와 모듈로 스케쥴러에 관한 연구)

  • Doosan Cho
    • Journal of the Korean Society of Industry Convergence / v.27 no.1 / pp.87-95 / 2024
  • This study examines modulo scheduling algorithms for multicore processors in machine learning applications. Machine learning algorithms perform large numbers of vector and matrix operations in order to process large data streams quickly. To support this volume of computation, processor architectures for applications such as artificial intelligence, neural networks, and machine learning are designed for parallel processing, typically as multicore designs. Various compiler techniques are being used and studied to utilize these multicore hardware resources effectively. Among these techniques, this study analyzes the modulo scheduler, which is especially important for the computation pipeline of a single core. The paper compares the iterative modulo scheduler and the swing modulo scheduler, the two most widely used and studied variants. Both schedulers delivered similar performance, and when register pressure was used as the indicator, the swing modulo scheduler performed slightly better. The study also proposes a technique that divides recurrence edges to improve the minimum initiation interval of the modulo schedulers.
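
The minimum initiation interval mentioned in the abstract above is conventionally bounded from below by resource usage (ResMII) and by loop-carried recurrences (RecMII). The following is a minimal sketch of that textbook bound only, not the recurrence-edge division technique the paper proposes. The function names, the edge tuple layout `(src, dst, latency, distance)`, and the example loop are assumptions made for illustration.

```python
from math import ceil


def res_mii(op_counts, unit_counts):
    """Resource bound: the busiest resource class limits how often a new iteration can start."""
    return max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)


def has_positive_cycle(n, edges, ii):
    """Longest-path relaxation with edge weight latency - ii * distance.

    A positive-weight cycle means some recurrence cannot be satisfied at this ii.
    """
    dist = [0] * n  # every node starts at 0, as if reached from a virtual source
    for _ in range(n):
        changed = False
        for src, dst, latency, distance in edges:
            weight = latency - ii * distance
            if dist[src] + weight > dist[dst]:
                dist[dst] = dist[src] + weight
                changed = True
        if not changed:
            return False
    return True  # still relaxing after n passes, so a positive cycle exists


def rec_mii(n, edges):
    """Recurrence bound: smallest ii at which no dependence cycle has positive weight."""
    limit = max(1, sum(latency for _, _, latency, _ in edges))
    for ii in range(1, limit + 1):
        if not has_positive_cycle(n, edges, ii):
            return ii
    raise ValueError("a dependence cycle has zero total iteration distance")


def min_initiation_interval(n, edges, op_counts, unit_counts):
    return max(res_mii(op_counts, unit_counts), rec_mii(n, edges))


# Hypothetical loop body: load (0) -> mul (1) -> add (2), with add feeding mul
# in the next iteration (iteration distance 1).
example_edges = [(0, 1, 2, 0), (1, 2, 3, 0), (2, 1, 1, 1)]
example_ops = {"mem": 1, "alu": 2}
example_units = {"mem": 1, "alu": 1}
print(min_initiation_interval(3, example_edges, example_ops, example_units))
```

In this hypothetical loop the recurrence mul -> add -> mul carries a total latency of 4 over an iteration distance of 1, so the recurrence bound exceeds the resource bound. That is the situation in which shortening or splitting a recurrence would lower the overall bound.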

Speedup Analysis Model for High Speed Network based Distributed Parallel Systems (고속 네트웍 기반의 분산병렬시스템에서의 성능 향상 분석 모델)

  • 김화성
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.12C / pp.218-224 / 2001
  • The objective of distributed parallel computing is to solve computationally intensive problems that contain several types of parallelism on a suite of high-performance and parallel machines, in a manner that best utilizes the capabilities of each machine. In this paper, we propose a computational model for speedup analysis that includes a generalized graph representation of distributed parallel systems, and we analyze how super-linear speedup is achieved when programs with diverse embedded parallelism modes are scheduled onto a distributed heterogeneous supercomputing network. The proposed representation can also be applied to simple homogeneous systems or to heterogeneous systems whose components differ only in processor speed. To obtain the core speedup, the parallelism characteristics of tasks and parallel machines must be matched carefully while the communication overhead is minimized.

Implementation of IQ/IDCT in H.264/AVC Decoder Using Mobile Multi-Core GPGPU (모바일 멀티 코어 GP-GPU를 이용한 H.264/AVC 디코더 구현)

  • Kim, Dong-Han;Lee, Kwang-Yeob;Jeong, Jun-Mo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.321-324 / 2010
  • There has been extensive research on multi-core processors, with performance improved through parallelization. Multi-core architectures have also emerged in the mobile environment, but the performance of a mobile CPU is limited. GP-GPU (General-Purpose computing on Graphics Processing Units) can improve performance without adding other dedicated hardware. This paper presents an implementation of the Inverse Quantization, Inverse DCT, and Color Space Conversion modules of an H.264/AVC decoder using a multi-core GP-GPU for a mobile environment. The proposed architecture improves performance by approximately 50% when all of its features are used.

Real-Time Object Segmentation in Image Sequences (연속 영상 기반 실시간 객체 분할)

  • Kang, Eui-Seon;Yoo, Seung-Hun
    • The KIPS Transactions:PartB / v.18B no.4 / pp.173-180 / 2011
  • This paper presents an approach to real-time object segmentation on a GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture). Many recent applications, such as monitoring systems, motion analysis, and object tracking, require real-time processing, but a CPU is not well suited to performing object segmentation in real time. NVIDIA provides the CUDA platform for general-purpose parallel processing, which extends graphics hardware beyond its original limits. In this paper, we use adaptive Gaussian mixture background modeling for object extraction and CCL (Connected Component Labeling) for classification. The GPU and CPU implementations are compared on a 2.4 GHz Core2 Quad processor. The GPU version achieved a speedup of 3x-4x over the CPU version.

Development of a flux emergence simulation using parallel computing

  • Lee, Hwanhee;Magara, Tetsuya
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.71.1-71.1 / 2019
  • The solar magnetic field comes from the solar interior and is related to various phenomena on the Sun. To understand this process, many studies have been conducted to produce its evolution using a single flux rope. In this study, we are interested in the emergence of two flux ropes and their evolution, which takes longer than the emergence of a single flux rope. To construct it, we develop a flux emergence simulation that applies parallel computing to reduce the computation time over a wider domain. The original simulation code was written in Fortran 77; we convert it to Fortran 90 with the Message Passing Interface (MPI). The results of the original and new simulations are compared on the NEC SX-Aurora TSUBASA, which is a vector engine processor. The parallelized version runs faster than the single-core version, which suggests that it can handle large amounts of calculation. Based on this model, we can construct a complex flux emergence system, such as the evolution of two magnetic flux ropes.

A Performance Evaluation on Classic Mutual Exclusion Algorithms for Exploring Feasibility of Practical Application (실제 적용 타당성 탐색을 위한 고전적 상호배제 알고리즘 성능 평가)

  • Lee, Hyung-Bong;Kwon, Ki-Hyeon
    • KIPS Transactions on Computer and Communication Systems / v.6 no.12 / pp.469-478 / 2017
  • Mutual exclusion originated in the theory of race condition prevention in symmetric multi-processor operating systems. Recently, however, with the spread of multi-core processors, its application range has shifted rapidly to the parallel processing application domain. POSIX threads, WIN32 threads, and Java threads, which are typical development environments for parallel processing applications, each provide their own mutual exclusion mechanism. Applications that are very sensitive to performance in these environments may want to reduce the burden of mutual exclusion, even at some cost, such as less convenient coding. In this study, we implement Dekker's and Peterson's algorithms in busy-wait and processor-yield forms on various platforms, and compare their performance with the built-in mutual exclusion mechanisms to evaluate the usability of the classic algorithms. The analysis shows that the processor-yield form of Dekker's algorithm outperforms the built-in mechanisms in the POSIX and WIN32 thread environments by a factor of at least 2 and up to 70, which confirms that the algorithm is practical.

Energy-Efficient Multi-Core Scheduling for Real-Time Video Processing (실시간 비디오 처리에 적합한 에너지 효율적인 멀티코어 스케쥴링)

  • Paek, Hyung-Goo;Yeo, Jeong-Mo;Lee, Wan-Yeon
    • Journal of the Korea Society of Computer and Information / v.16 no.6 / pp.11-20 / 2011
  • In this paper, we propose an optimal scheduling scheme that minimizes the energy consumption of a real-time video task on a multi-core platform supporting dynamic voltage and frequency scaling. Exploiting parallel execution on multiple cores to reduce energy consumption, the proposed scheme allocates an appropriate number of cores to the task, turns off the power of unused cores, and assigns the lowest clock frequency that meets the deadline. Our experiments show that the proposed scheme saves a significant amount of energy, up to 67% and 89% of the energy consumed by two previous methods that execute the task on a single core and on all cores, respectively.
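
The trade-off described in the abstract above (more cores permit a lower clock, but each powered core costs energy) can be illustrated with a brute-force search. This is a rough sketch under stated assumptions, not the paper's scheduling scheme. It assumes an Amdahl-style parallel fraction, dynamic power per core proportional to the cube of the frequency, a fixed static power per powered-on core, and zero power for cores that are switched off. All parameter names and default values are hypothetical.

```python
def pick_cores_and_frequency(work_cycles, deadline_s, max_cores, freqs_hz,
                             parallel_fraction=0.9, dyn_coeff=1e-27, static_w=0.1):
    """Return (energy_joules, cores, freq_hz) with the lowest energy that meets the deadline.

    Returns None if no combination of core count and frequency is fast enough.
    """
    best = None
    for cores in range(1, max_cores + 1):
        # Assumed Amdahl-style scaling: only the parallel fraction speeds up with more cores.
        scaled_work = work_cycles * ((1 - parallel_fraction) + parallel_fraction / cores)
        for freq in freqs_hz:
            runtime = scaled_work / freq
            if runtime > deadline_s:
                continue  # this setting would miss the deadline
            # Assumed power model: every allocated core stays powered for the whole run.
            power_per_core = dyn_coeff * freq ** 3 + static_w
            energy = cores * power_per_core * runtime
            if best is None or energy < best[0]:
                best = (energy, cores, freq)
    return best


# Hypothetical task: 2.4e9 cycles of work, a 1.2 s deadline, up to 4 cores,
# and three available clock frequencies.
choice = pick_cores_and_frequency(2.4e9, 1.2, 4, [0.6e9, 1.2e9, 2.4e9])
print(choice)
```

Under these assumptions the search does not always prefer the lowest feasible frequency, because static power favors finishing sooner. It evaluates every feasible pair instead of stopping at the first one.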

Real-Time Power-Saving Scheduling Based on Genetic Algorithms in Multi-core Hybrid Memory Environments (멀티코어 이기종메모리 환경에서의 유전 알고리즘 기반 실시간 전력 절감 스케줄링)

  • Yoo, Suhyeon;Jo, Yewon;Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.1 / pp.135-140 / 2020
  • Recently, due to the rapid diffusion of intelligent systems and IoT technologies, power-saving techniques in real-time embedded systems have become important. In this paper, we propose P-GA (Parallel Genetic Algorithm), a scheduling algorithm that aims to reduce the power consumption of real-time systems in multi-core hybrid memory environments. P-GA improves the Proportional-Fairness (PF) algorithm devised for multi-core environments by combining dynamic voltage/frequency scaling of the processor with nonvolatile memory technologies. Specifically, P-GA applies genetic algorithms to optimize the voltage and frequency modes of the processors and the memory types, thereby minimizing the power consumption of the task set. Simulation experiments show that P-GA reduces power consumption by a factor of 2.85 compared to the conventional schemes.

Implementation of MPEG/Audio Decoder based on RISC Processor With Minimized DSP Accelerator (DSP 가속기가 내장된 RISC 프로세서 기반 MPEG/Audio 복호화기의 구현)

  • Bang Kyoung Ho;Lee Ken Sup;Park Young Cheol;Youn Dae Hee
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.12C / pp.1617-1622 / 2004
  • An MPEG/Audio decoder for mobile multimedia systems requires low power consumption. Implementations of AV decoders on a single RISC processor often consume a lot of power because of cache misses when cache memory is insufficient. In this paper, we present an MPEG/Audio decoder for mobile handset applications and implement it on a RISC processor that embeds a minimized DSP accelerator. The audio decoding algorithm is split into two parts: a computation-intensive part and a control-intensive part. These parts are allocated to the DSP and the RISC core, respectively, which are designed to run in parallel to increase processing efficiency. The proposed system implements MP3 and AAC decoders at clock rates of 17 MHz and 24 MHz, respectively, reducing complexity by 48% and 40% compared with implementations on a single RISC processor. The proposed method is suitable for mobile multimedia applications with insufficient cache memory.

Design Space Exploration of Many-Core Processor for High-Speed Cluster Estimation (고속의 클러스터 추정을 위한 매니코어 프로세서의 디자인 공간 탐색)

  • Seo, Jun-Sang;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.1-12 / 2014
  • This paper implements a computationally intensive subtractive clustering algorithm on a single instruction, multiple data (SIMD) based many-core processor and improves its performance. In addition, this paper implements five different processing element (PE) architectures (PEs=16, 64, 256, 1,024, 4,096) and estimates their execution time and energy efficiency to select an optimal PE architecture for the subtractive clustering algorithm. Experimental results using two different medical images at three different resolutions (128×128, 256×256, 512×512) show that PEs=4,096 achieves the highest performance and energy efficiency in all cases.