• Title/Summary/Keyword: multicore

Search Result 143, Processing Time 0.031 seconds

Design Space Exploration for NoC-Style Bus Networks

  • Kim, Jin-Sung;Lee, Jaesung
    • ETRI Journal
    • /
    • v.38 no.6
    • /
    • pp.1240-1249
    • /
    • 2016
  • With the number of IP cores in a multicore system-on-chip increasing to up to tens or hundreds, the role of on-chip interconnection networks is vital. We propose a networks-on-chip-style bus network as a compromise and redefine the exploration problem to find the best IP tiling patterns and communication path combinations. Before solving the problem, we estimate the time complexity and validate the infeasibility of the solution. To reduce the time complexity, we propose two fast exploration algorithms and develop a program to implement these algorithms. The program is executed for several experiments, and the exploration time is reduced to approximately 1/22 and 7/1,200 at the first and second steps of the exploration process, respectively. However, as a trade-off for the time saving, the time cost (TC) of the searched architecture is increased to up to 4.7% and 11.2%, respectively, at each step compared with that of the architecture obtained through full-case exploration. The reduction ratio can be decreased to 1/4,000 by simultaneously applying both the algorithms even though the resulting TC is increased to up to 13.1% when compared with that obtained through full-case exploration.

Comparing Energy Efficiency of MPI and MapReduce on ARM based Cluster (ARM 클러스터에서 에너지 효율 향상을 위한 MPI와 MapReduce 모델 비교)

  • Maqbool, Jahanzeb;Rizki, Permata Nur;Oh, Sangyoon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2014.01a
    • /
    • pp.9-13
    • /
    • 2014
  • The performance of large scale software applications has been automatically increasing for last few decades under the influence of Moore's law - the number of transistors on a microprocessor roughly doubled every eighteen months. However, on-chip transistors limitations and heating issues led to the emergence of multicore processors. The energy efficient ARM based System-on-Chip (SoC) processors are being considered for future high performance computing systems. In this paper, we present a case study of two widely used parallel programming models i.e. MPI and MapReduce on distributed memory cluster of ARM SoC development boards. The case study application, Black-Scholes option pricing equation, was parallelized and evaluated in terms of power consumption and throughput. The results show that the Hadoop implementation has low instantaneous power consumption that of MPI, but MPI outperforms Hadoop implementation by a factor of 1.46 in terms of total power consumption to execution time ratio.

  • PDF

Implementation of Parallel Processing Interpolation Algorithm for Multicore GPU (다중코어 GPU를 위한 병렬처리 보간 알고리즘 구현)

  • Lee, Kwang-Yeob;Kim, Chi-Yong
    • Journal of IKEEE
    • /
    • v.16 no.4
    • /
    • pp.304-309
    • /
    • 2012
  • As resolution for displays is recently more and more increasing, the amount of data abd calculation that graphic hardware needs to process are also increasing. Especially the amount of data processing by rasterizer is rapidly increasing. This paper used an algorism using coordinates in center of gravity and area for triangle instead of using bilinear algorism[1] used by conventional interpolation, which is to make it easier for parallel processing by rasterizer. This paper implemented designed rasterizer under FPGA environment, and compared it with conventional rasterizer and verified it. This rasterizer is proved to have approximately 50% higher performance compared to conventional one.

Implementation of LTE-A PDSCH Decoder using TMS320C6670 (TMS320C6670 기반 LTE-A PDSCH 디코더 구현)

  • Lee, Gwangmin;Ahn, Heungseop;Choi, Seungwon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.14 no.4
    • /
    • pp.79-85
    • /
    • 2018
  • This paper presents an implementation method of Long Term Evolution-Advanced (LTE-A) Physical Downlink Shared Channel (PDSCH) decoder using a general-purpose multicore Digital Signal Processor (DSP), TMS320C6670. Although the DSP provides some useful coprocessors such as turbo decoder, fast Fourier transformer, Viterbi Coprocessor, Bit Rate Coprocessor etc., it is specific to the base station platform implementation not the mobile terminal platform implementation. This paper shows an implementation method of the LTE-A PDSCH decoder using programmable DSP cores as well as the coprocessors of Fast Fourier Transformer and turbo decoder. First, it uses the coprocessor supported by the TMS320C6670, which can be used for PDSCH implementation. Second, we propose a core programming method using DSP optimization method for block diagram of PDSCH that can not use coprocessor. Through the implementation, we have verified a real-time decoding feasibility for the LTE-A downlink physical channel using test vectors which have been generated from LTE-A Reference Measurement Channel (RMC) Waveform R.6.

Hybrid AI Based Process Scheduler for Asymmetric Multicore Processor to Improve Power Efficiency (전력 효율 향상을 위한 하이브리드 인공지능 기반의 비대칭 멀티코어 프로세서용 프로세스 스케줄러)

  • Jeong, Won Seob;Kim, Seung Hun;Lee, Sang-Min;Ro, Won Woo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.11a
    • /
    • pp.180-183
    • /
    • 2013
  • 근래의 프로세서는 하나의 다이 위에 여러 개의 코어를 배치한 멀티코어 형태를 띠고 있다. 최근에는 프로세서의 에너지 소비량을 줄이기 위해 비대칭 멀티코어를 활용하여 동일한 성능을 유지하며 소비전력을 낮추는 방법에 대한 연구가 활발히 진행되고 있다. 비대칭 멀티코어의 장점을 최대한 활용하기 위해서는 대칭형 멀티코어와는 달리 실행해야 할 프로세스와 상이한 코어간의 작동 특성을 고려해야 한다. 본 논문에서는 전력 소비 효율 향상을 위해 프로세스 스케줄링 알고리즘에 하이브리드 인공지능 기술인 Adaptive Neuro Fuzzy Inference System (ANFIS)를 적용하여 각 프로세스에 적합한 코어를 찾아 할당하는 방법을 제안한다. 시뮬레이션 결과 제안하는 프로세스 스케줄러는 리눅스의 CFS 대비 평균 35.4% 낮은 Energy Delay Product (EDP)를 보였으며 이를 통해 하이브리드 인공지능을 적용한 프로세스 스케줄링 알고리즘의 유효성을 입증하였다.

Multicore Processor based Parallel SVM for Video Surveillance System (비디오 감시 시스템을 위한 멀티코어 프로세서 기반의 병렬 SVM)

  • Kim, Hee-Gon;Lee, Sung-Ju;Chung, Yong-Wha;Park, Dai-Hee;Lee, Han-Sung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.6
    • /
    • pp.161-169
    • /
    • 2011
  • Recent intelligent video surveillance system asks for development of more advanced technology for analysis and recognition of video data. Especially, machine learning algorithm such as Support Vector Machine (SVM) is used in order to recognize objects in video. Because SVM training demands massive amount of computation, parallel processing technique is necessary to reduce the execution time effectively. In this paper, we propose a parallel processing method of SVM training with a multi-core processor. The results of parallel SVM on a 4-core processor show that our proposed method can reduce the execution time of the sequential training by a factor of 2.5.

Lock-free unique identifier allocation for parallel macro expansion

  • Son, Bum-Jun;Ahn, Ki Yung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.4
    • /
    • pp.1-8
    • /
    • 2022
  • In this paper, we propose a more effective unique identifier allocation method for macro expansion in a single-process multicore parallel computing environment that does not require locks. Our key idea for such an allocation method is to remove sequential dependencies using the remainder operation. We confirmed that our lock-free method is suitable for improving the performance of parallel macro expansion through the following benchmark: we patched an existing library, which is based on a sequential unique identifier allocation, with our proposed method, and compared the performances of the same program but using two different versions of the library, before and after the patch.

Debugging of Parallel Programs using Distributed Cooperating Components

  • Mrayyan, Reema Mohammad;Al Rababah, Ahmad AbdulQadir
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.12spc
    • /
    • pp.570-578
    • /
    • 2021
  • Recently, in the field of engineering and scientific and technical calculations, problems of mathematical modeling, real-time problems, there has been a tendency towards rejection of sequential solutions for single-processor computers. Almost all modern application packages created in the above areas are focused on a parallel or distributed computing environment. This is primarily due to the ever-increasing requirements for the reliability of the results obtained and the accuracy of calculations, and hence the multiply increasing volumes of processed data [2,17,41]. In addition, new methods and algorithms for solving problems appear, the implementation of which on single-processor systems would be simply impossible due to increased requirements for the performance of the computing system. The ubiquity of various types of parallel systems also plays a positive role in this process. Simultaneously with the growing demand for parallel programs and the proliferation of multiprocessor, multicore and cluster technologies, the development of parallel programs is becoming more and more urgent, since program users want to make the most of the capabilities of their modern computing equipment[14,39]. The high complexity of the development of parallel programs, which often does not allow the efficient use of the capabilities of high-performance computers, is a generally accepted fact[23,31].

Para-virtualized Library for Bare-metal Network Performance in Virtualized Environment (가상화 환경의 고성능 I/O를 위한 반가상화 라이브러리)

  • Lee, Dongwoo;Cho, Youngjoong;Eom, Young Ik
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.605-610
    • /
    • 2014
  • Now, virtualization is no more emerging research area, and we can easily find its application in our circumstance. Nevertheless, I/O workloads are reluctant to be applied in virtual environment since they still suffer from unacceptable performance degradation due to virtualization latency. Many previous papers identified that virtual I/O overhead is mainly caused by exits and redundant I/O stack, and proposed several techniques to reduce them. However, they still have some limitations. In this paper, we introduce a novel I/O virtualization framework which improves I/O performance by exploiting multicore architecture. We applied our framework to the virtual network, and it improves TCP throughput up to 169%, and decreases UDP latency up to 38% on the network with the 10Gbps NIC.

Dynamic Power Management for Webpage Loading on Mobile Devices (모바일 웹 페이지 로딩에서 동적 관리 기법)

  • Park, Hyunjae;Choi, Youngjune
    • Journal of KIISE
    • /
    • v.42 no.12
    • /
    • pp.1623-1628
    • /
    • 2015
  • As the performance of mobile devices has increased, high-end multicore CPUs have become the norm in smartphones. However, such high performance devices are exposed to the problem of battery depletion due to the energy consumption caused by software performance, and despite increases in battery capacity. The required resources are dynamic and varied, and further user interaction is highly random; thus, Linux-based power management such as DVFS is needed to fulfill requirements. In order to reduce power consumption, we propose a method to restrict the CPU frequency of data download while maintaining user reactivity. This can supplement the weakness of existing Linux-based power management techniques like DVFS in relation to webpage loading. Through the implementation of our method at the application level, we confirm that energy consumption from webpage loading is reduced.