Search | Korea Science

Analysis on the Performance Impact of Partitioned LLC for Heterogeneous Multicore Processors (이종 멀티코어 프로세서에서 분할된 공유 LLC가 성능에 미치는 영향 분석)

Moon, Min Goo;Kim, Cheol Hong
- The Journal of Korean Institute of Next Generation Computing
- /
- v.15 no.2
- /
- pp.39-49
- /
- 2019
Recently, CPU-GPU integrated heterogeneous multicore processors have been widely used for improving the performance of computing systems. Heterogeneous multicore processors integrate CPUs and GPUs on a single chip where CPUs and GPUs share the LLC(Last Level Cache). This causes a serious cache contention problem inside the processor, resulting in significant performance degradation. In this paper, we propose the partitioned LLC architecture to solve the cache contention problem in heterogeneous multicore processors. We analyze the performance impact varying the LLC size of CPUs and GPUs, respectively. According to our simulation results, the bigger the LLC size of the CPU, the CPU performance improves by up to 21%. However, the GPU shows negligible performance difference when the assigned LLC size increases. In other words, the GPU is less likely to lose the performance when the LLC size decreases. Because the performance degradation due to the LLC size reduction in GPU is much smaller than the performance improvement due to the increase of the LLC size of the CPU, the overall performance of heterogeneous multicore processors is expected to be improved by applying partitioned LLC to CPUs and GPUs. In addition, if we develop a memory management technique that can maximize the performance of each core in the future, we can greatly improve the performance of heterogeneous multicore processors.

Performance Analysis of Multicore Out-of-Order Superscalar Processor with Multiple Basic Block Execution (다중블럭을 실행하는 멀티코어 비순차 수퍼스칼라 프로세서의 성능 분석)

Lee, Jong Bok
- Journal of Korea Multimedia Society
- /
- v.16 no.2
- /
- pp.198-205
- /
- 2013
In this paper, the performance of multicore processor architecture is analyzed which utilizes out-of-order superscalar processor core using multiple basic block execution. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the out-of-order superscalar processor with the window size from 32 to 64 and the number of cores between 1 and 16, exploiting multiple basic block execution from 1 to 4 extensively. As a result, the multicore out-of-order superscalar processor with 4 basic block execution achieves 22.0 % average performance increase over the same architecture with the single basic block execution.
https://doi.org/10.9717/kmms.2013.16.2.198 인용 PDF KSCI

A Multithreaded Implementation of HEVC Intra Prediction Algorithm for a Photovoltaic Monitoring System

Choi, Yung-Ho;Ahn, Hyung-Keun
- Transactions on Electrical and Electronic Materials
- /
- v.13 no.5
- /
- pp.256-261
- /
- 2012
Recently, many photovoltaic systems (PV systems) including solar parks and PV farms have been built to prepare for the post fossil fuel era. To investigate the degradation process of the PV systems and thus, efficiently operate PV systems, there is a need to visually monitor PV systems in the range of infrared ray through the Internet. For efficient visual monitoring, this paper explores a multithreaded implementation of a recently developed HEVC standard whose compression efficiency is almost two times higher than H.264. For an efficient parallel implementation under a meshbased 64 multicore system, this work takes into account various design choices which can solve potential problems of a two-dimensional interconnects-based 64 multicore system. These problems may have not occurred in a small-scale multicore system based on a simple bus network. Through extensive evaluation, this paper shows that, for an efficient multithreaded implementation of HEVC intra prediction in a mesh-based multicore system, much effort needs to be made to optimize communications among processing cores. Thus, this work provides three design choices regarding communications, i.e., main thread core location, cache home policy, and maximum coding unit size. These design choices are shown to improve the overall parallel performance of the HEVC intra prediction algorithm by up to 42%, achieving a 7 times higher speed-up.
https://doi.org/10.4313/TEEM.2012.13.5.256 인용 PDF KSCI

Performance Analysis of Multicore Processor Architectures Based On Cache Size Effects (캐쉬 용량 효과에 대한 멀티코어 프로세서의 성능 연구)

Lee, Jongbok
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.12 no.6
- /
- pp.175-180
- /
- 2012
In order to overcome the complexity and performance limit problems of superscalar processors, the multicore architecture has been prevalent recently. The configuration and the size of instruction and data caches greatly gives effect on the performance of multicore processors. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 2-core to 16-core architectures with different sizes of caches extensively. As a result, the 2-way set associative instruction and data cache with the size of 64KB brought the best cost-effective performance.
https://doi.org/10.7236/JIWIT.2012.12.6.175 인용 PDF KSCI

The Performance Study of a Virtualized Multicore Web System

Lu, Chien-Te;Yeh, C.S. Eugene;Wang, Yung-Chung;Yang, Chu-Sing
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.10 no.11
- /
- pp.5419-5436
- /
- 2016
Enhancing the performance of computing systems has been an important topic since the invention of computers. The leading-edge technologies of multicore and virtualization dramatically influence the development of current IT systems. We study performance attributes of response time (RT), throughput, efficiency, and scalability of a virtualized Web system running on a multicore server. We build virtual machines (VMs) for a Web application, and use distributed stress tests to measure RTs and throughputs under varied combinations of virtual cores (VCs) and VM instances. Their gains, efficiencies and scalabilities are also computed and compared. Our experimental and analytic results indicate: 1) A system can perform and scale much better by adopting multiple single-VC VMs than by single multiple-VC VM. 2) The system capacity gain is proportional to the number of VM instances run, but not proportional to the number of VCs allocated in a VM. 3) A system with more VMs or VCs has higher physical CPU utilization, but lower vCPU utilization. 4) The maximum throughput gain is less than VM or VC gain. 5) Per-core computing efficiency does not correlate to the quality of VCs or VMs employed. The outcomes can provide valuable guidelines for selecting instance types provided by public Cloud providers and load balancing planning for Web systems.
https://doi.org/10.3837/tiis.2016.11.012 인용 PDF KSCI

MC-MIPOG: A Parallel t-Way Test Generation Strategy for Multicore Systems

Younis, Mohammed I.;Zamli, Kamal Z.
- ETRI Journal
- /
- v.32 no.1
- /
- pp.73-83
- /
- 2010
Combinatorial testing has been an active research area in recent years. One challenge in this area is dealing with the combinatorial explosion problem, which typically requires a very expensive computational process to find a good test set that covers all the combinations for a given interaction strength (t). Parallelization can be an effective approach to manage this computational cost, that is, by taking advantage of the recent advancement of multicore architectures. In line with such alluring prospects, this paper presents a new deterministic strategy, called multicore modified input parameter order (MC-MIPOG) based on an earlier strategy, input parameter order generalized (IPOG). Unlike its predecessor strategy, MC-MIPOG adopts a novel approach by removing control and data dependency to permit the harnessing of multicore systems. Experiments are undertaken to demonstrate speedup gain and to compare the proposed strategy with other strategies, including IPOG. The overall results demonstrate that MC-MIPOG outperforms most existing strategies (IPOG, IPOF, IPOF2, IPOG-D, ITCH, TConfig, Jenny, and TVG) in terms of test size within acceptable execution time. Unlike most strategies, MC-MIPOG is also capable of supporting high interaction strengths of t > 6.
https://doi.org/10.4218/etrij.10.0109.0266 인용 PDF KSCI

An Efficient Load Balancing Technique in a Multicore Mobile System (멀티코어 모바일 시스템에서 효과적인 부하 균등화 기법)

Cho, Jungseok;Cho, Doosan
- KIPS Transactions on Computer and Communication Systems
- /
- v.4 no.5
- /
- pp.153-160
- /
- 2015
The effectiveness of multicores depends on how well a scheduler can assign tasks onto the cores efficiently. In a heterogeneous multicore platform, the execution time of an application depends on which core it executes on. That is to say, the effectiveness of task assignment is one of the important components for a multicore systems' performance. This work proposes a load scheduling technique that analyzes execution time of each task by profiling. The profiling result provides a basic information to predict which task-to-core mapping is likely to provide the best performance. By using such information, the proposed technique is about 26% performance gain.
https://doi.org/10.3745/KTCCS.2015.4.5.153 인용 PDF KSCI

Fault-tolerant Scheduling of Real-time Parallel Tasks with Energy Efficiency on Multicore Processors (멀티코어 프로세서 상에서 에너지 효율을 고려한 실시간 병렬 작업들의 결함 포용 스케쥴링)

Lee, Kwanwoo
- KIPS Transactions on Computer and Communication Systems
- /
- v.3 no.6
- /
- pp.173-178
- /
- 2014
By exploiting parallel processing, the proposed scheduling scheme enhances energy saving capability of multicore processors for real-time tasks while satisfying deadline and fault tolerance constraints. The scheme searches for a near minimum-energy schedule within a polynomial time, because finding the minimum-energy schedule on multicore processors is a NP-hard problem. The scheme consumes manifestly less energy than the state-of-the-arts method even with low parallel processing speedup as well as with high parallel processing speedup, and saves the energy consumption up to 86%.
https://doi.org/10.3745/KTCCS.2014.3.6.173 인용 PDF KSCI

Concentric Core Fiber Design for Optical Fiber Communication

Nadeem, Iram;Choi, Dong-You
- Journal of information and communication convergence engineering
- /
- v.14 no.3
- /
- pp.163-170
- /
- 2016
Because of rapid technological advancements, increased data rate support has become the key criterion for future communication medium selection. Multimode optical fibers and multicore optical fibers are well matched to high data rate throughput requirements because of their tendency to support multiple modes through one core at a time, which results in higher data rates. Using the numerical mode solver OptiFiber, we have designed a concentric core fiber by investigating certain design parameters, namely core diameter (µm), wavelength (nm), and refractive index profile, and as a result, the number of channels, material losses, bending losses, polarization mode dispersion, and the effective nonlinear refractive index have been determined. Space division multiplexing is a promising future technology that uses few-mode fibers in parallel to form a multicore fiber. The experimental tests are conducted using the standard second window wavelength of 1,550 nm and simulated results are presented.
https://doi.org/10.6109/jicce.2016.14.3.163 인용 KSCI KPUBS HTML

Time-Predictable Java Dynamic Compilation on Multicore Processors

Sun, Yu;Zhang, Wei
- Journal of Computing Science and Engineering
- /
- v.6 no.1
- /
- pp.26-38
- /
- 2012
Java has been increasingly used in programming for real-time systems. However, some of Java's features such as automatic memory management and dynamic compilation are harmful to time predictability. If these problems are not solved properly then it can fundamentally limit the usage of Java for real-time systems, especially for hard real-time systems that require very high time predictability. In this paper, we propose to exploit multicore computing in order to reduce the timing unpredictability that is caused by dynamic compilation and adaptive optimization. Our goal is to retain high performance comparable to that of traditional dynamic compilation, while at the same time, obtain better time predictability for Java virtual machine (JVM). We have studied pre-compilation techniques to utilize another core more efficiently, preoptimization on another core (PoAC) scheme to replace the adaptive optimization system (AOS) in Jikes JVM and the counter based optimization (CBO). Our evaluation reveals that the proposed approaches are able to attain high performance while greatly reducing the variation of the execution time for Java applications.
https://doi.org/10.5626/JCSE.2012.6.1.26 인용 PDF KSCI KPUBS

Search Result 143, Processing Time 0.023 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)