• Title/Summary/Keyword: L2-Cache


Compact Field Remapping for Dynamically Allocated Structures (동적으로 할당된 구조체를 위한 압축된 필드 재배치)

  • Kim, Jeong-Eun;Han, Hwan-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.10
    • /
    • pp.1003-1012
    • /
    • 2005
  • The most significant difference between embedded systems and general-purpose systems is that embedded systems may use only limited resources, including battery and memory. In particular, the number of applications that deal with multimedia data is increasing. In such computation-heavy systems, memory access latency is one of the major bottlenecks hurting system performance, so many researchers have investigated techniques to reduce the memory access cost. Most programs exhibit locality in their memory references. Temporal locality means that a resource accessed at one point will be used again in the near future; spatial locality means that a resource is more likely to be used if resources near it were just accessed. Recent embedded processors usually adopt cache memory to exploit these two types of locality: the processor accesses cache memory faster than off-chip memory, reducing the latency. In this paper we propose an enhanced dynamic allocation technique for structure-type data that eliminates unused memory space and reduces both the cache miss rate and the application execution time. The proposed approach aggregates fields from multiple dynamically allocated records and remaps them consecutively in memory. Experiments on the Olden benchmarks show a 13.9% drop in L1 cache miss rate and a 15.9% drop in L2 cache miss rate on average, compared to previously proposed techniques. Execution time is also reduced by 10.9% on average, compared to the previous work.
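The field-aggregation idea can be pictured with a small allocator sketch. The code below is a minimal illustration, assuming a hypothetical linked-list node type and a single fixed-size pool; it is not the paper's compiler-directed transformation, only the layout it aims for: the same field of many records stored contiguously so that traversals touch dense arrays.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXNODES 1024        /* illustrative pool capacity */

/* Conventional layout: each malloc'd node mixes hot and cold fields:
 *   struct node { int key; double payload; struct node *next; };
 * Remapped layout: each field of all records is stored contiguously, so a
 * traversal that reads only 'key' and 'next' stays inside two dense arrays
 * and wastes no cache space on 'payload'. */
struct node_pool {
    int    key[MAXNODES];
    int    next[MAXNODES];    /* index of the successor record, -1 if none */
    double payload[MAXNODES];
    int    used;
};

static struct node_pool pool;

/* Allocate one logical record and return its slot index. */
static int node_alloc(void) {
    if (pool.used == MAXNODES) { fprintf(stderr, "pool full\n"); exit(1); }
    pool.next[pool.used] = -1;
    return pool.used++;
}

int main(void) {
    int a = node_alloc(), b = node_alloc();
    pool.key[a] = 1; pool.key[b] = 2;
    pool.next[a] = b;                               /* a -> b */
    printf("key of a's successor: %d\n", pool.key[pool.next[a]]);
    return 0;
}
```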

Analysis on the Temperature of 3D Multi-core Processors according to Vertical Placement of Core and L2 Cache (코어와 L2 캐쉬의 수직적 배치 관계에 따른 3차원 멀티코어 프로세서의 온도 분석)

  • Son, Dong-Oh;Ahn, Jin-Woo;Park, Jae-Hyung;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.6
    • /
    • pp.1-10
    • /
    • 2011
  • In designing multi-core processors, interconnection delay is one of the major constraints on performance improvement. To address this problem, 3-dimensional integration technology has been adopted in multi-core processor design. A 3D multi-core architecture can reduce the physical wire length by stacking cores vertically, leading to reduced interconnection delay and reduced power consumption. However, the power density of a 3D multi-core architecture increases significantly compared to the traditional 2D multi-core architecture, raising the temperature of the processor. In this paper, floorplan methods that change the vertical placement of the cores and the level-2 cache are analyzed to mitigate the thermal problems in 3D multi-core processors. According to the experimental results, stacking a core adjacent to the level-2 cache is an effective way to reduce the processor temperature. Compared to the floorplan where cores are stacked adjacent to each other, the floorplan where a core is stacked adjacent to the level-2 cache reduces the temperature by 22% in the 4-layer case and by 13% in the 2-layer case.

A Case Study on Hardware Trojan: Cache Coherence-Exploiting DoS Attack (하드웨어 Trojan 사례 연구: 캐시 일관성 규약을 악용한 DoS 공격)

  • Kong, Sunhee;Hong, Bo-Uye;Suh, Taeweon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.10a
    • /
    • pp.740-743
    • /
    • 2015
  • Increasingly complex integrated circuits and IP-based hardware designs have created the risk of hardware Trojans. This paper introduces a new type of threat, the coherence-exploiting hardware Trojan. This Trojan can be maliciously implanted in a master component of a system, where it continuously injects memory read transactions onto the bus or main interconnect. The injected traffic forces the eviction of cache lines by taking advantage of the cache coherence protocol. This type of Trojan insidiously slows down system performance, effectively mounting a Denial-of-Service (DoS) attack. We used a Xilinx Zynq-7000 device to implement and evaluate the coherence-exploiting Trojan: the malicious traffic was injected through the AXI ACP interface of the Zynq-7000, and L2 cache eviction statistics were collected with performance counters. The experimental results reveal the severe threat the Trojan poses to system performance.
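For intuition, here is a minimal software analog of the eviction pressure described above, assuming a 512 KB L2 with 32-byte lines (the Zynq-7000's PL310); it is not the AXI ACP hardware Trojan itself, only an illustration of how a second master streaming reads over a large buffer keeps evicting another thread's cache lines.

```c
/* compile with: cc trojan_analog.c -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define L2_SIZE (512 * 1024)
#define LINE    32                      /* PL310 cache line size in bytes */
#define SWEEP   (4 * L2_SIZE)           /* footprint larger than the L2 */

static volatile int stop = 0;

/* "Trojan" thread: repeatedly reads one byte per cache line over a buffer
 * several times the L2 size, so every pass forces evictions in the shared L2. */
static void *evictor(void *arg) {
    volatile char *buf = arg;
    while (!stop)
        for (size_t i = 0; i < SWEEP; i += LINE)
            (void)buf[i];
    return NULL;
}

int main(void) {
    char *buf = malloc(SWEEP);
    if (!buf) return 1;
    pthread_t t;
    pthread_create(&t, NULL, evictor, buf);

    /* A victim workload running here would observe extra L2 misses and a
     * corresponding slowdown while the evictor thread is active. */

    stop = 1;
    pthread_join(t, NULL);
    free(buf);
    return 0;
}
```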

Efficient Cache Management Scheme in Database based on Block Classification (블록 분류에 기반한 데이타베이스의 효율적 캐쉬 관리 기법)

  • Sin, Il-Hoon;Koh, Kern
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.7
    • /
    • pp.369-376
    • /
    • 2002
  • Although LRU is not adequate for databases, which have non-uniform reference patterns, it has been adopted in most database systems in the absence of a proper alternative. We analyze database block reference patterns with a realistic database trace and, based on this analysis, propose a new cache replacement policy. The trace analysis shows that extremely non-popular blocks account for about 70% of all blocks. The influence of recency on a block's re-reference likelihood is initially strong due to temporal locality, but it rapidly decreases and eventually becomes negligible as the stack distance increases. Based on this observation, the RCB (Reference Characteristic Based) cache replacement policy proposed in this paper classifies blocks into four groups by their recency and re-reference likelihood, and applies a different priority-evaluation method to each group. The RCB policy evicts non-popular blocks more quickly than the others and evaluates the priority of blocks that have not been referenced for a long time by their reference frequency. In a trace-driven simulation, RCB delivers better performance than the existing policies (LRU, 2Q, LRU-K, LRFU); in particular, compared to LRU it reduces the miss count by 5~12.7%. The time complexity of RCB is O(1), the same as LRU and 2Q and superior to LRU-K (O(log₂N)) and LRFU (O(1)~O(log₂N)).
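The abstract gives the grouping criteria but not the full mechanics, so the sketch below is only a simplified rendering of the four-group selection idea under assumed thresholds; the real RCB policy's exact rules and its O(1) structures are described in the paper, and this O(N) scan is for illustration only.

```c
#include <limits.h>
#include <stdio.h>

#define NBLOCKS 1024
#define RECENT  256     /* assumed recency threshold (stack distance) */
#define POPULAR 3       /* assumed reference-count threshold */

struct block {
    int           valid;
    unsigned long last_ref;   /* logical time of last reference */
    unsigned long ref_cnt;    /* how often the block was referenced */
};

/* Group 0: old & non-popular    -> evicted first                      */
/* Group 1: old & popular        -> evicted by lowest reference count  */
/* Group 2: recent & non-popular                                       */
/* Group 3: recent & popular     -> evicted last, LRU-style            */
static int group_of(const struct block *b, unsigned long now) {
    int recent  = (now - b->last_ref) < RECENT;
    int popular = b->ref_cnt >= POPULAR;
    return (recent << 1) | popular;
}

int pick_victim(struct block cache[NBLOCKS], unsigned long now) {
    int victim = -1, best_group = 4;
    unsigned long best_key = ULONG_MAX;
    for (int i = 0; i < NBLOCKS; i++) {
        if (!cache[i].valid) return i;            /* free slot, no eviction */
        int g = group_of(&cache[i], now);
        /* old groups are ranked by frequency, recent groups by recency */
        unsigned long key = (g <= 1) ? cache[i].ref_cnt : cache[i].last_ref;
        if (g < best_group || (g == best_group && key < best_key)) {
            best_group = g; best_key = key; victim = i;
        }
    }
    return victim;
}

int main(void) {
    struct block cache[NBLOCKS] = { 0 };
    for (int i = 0; i < NBLOCKS; i++)
        cache[i] = (struct block){ 1, (unsigned long)i, (unsigned long)(i % 5) };
    printf("victim index: %d\n", pick_victim(cache, NBLOCKS + 100));
    return 0;
}
```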

A Study on an Error Correction Code Circuit for a Level-2 Cache of an Embedded Processor (임베디드 프로세서의 L2 캐쉬를 위한 오류 정정 회로에 관한 연구)

  • Kim, Pan-Ki;Jun, Ho-Yoon;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.1
    • /
    • pp.15-23
    • /
    • 2009
  • Microprocessors, which require correct arithmetic operations, have been the subject of in-depth research on soft errors. Among microprocessor components, the memory cells are the most vulnerable to soft errors. Moreover, when soft errors occur in a memory cell, processes and operations are greatly affected, because the memory cell holds important information and instructions for the entire process or operation. If soft errors go undetected, arithmetic operations and processes produce unexpected outcomes without the user realizing it. In architectural design, the common tool for detecting and correcting soft errors is an error check and correction (ECC) code. The Itanium and IBM PowerPC G5 microprocessors employ Hamming and Hsiao codes in their level-2 caches; such designs, however, target large server devices and do not consider power consumption. As operating and threshold voltages keep shrinking with the emergence of high-density, low-power embedded microprocessors, there is an urgent need to develop ECC circuits suited to them. In this study, the input/output data of the level-2 cache were analyzed using SimpleScalar-ARM, and a 32-bit H-matrix for the level-2 cache of an embedded microprocessor is proposed. The proposed H-matrix was implemented with the Cadence schematic editor and compared, in terms of power consumption, with a modified Hamming code using HSPICE. The MiBench programs and a TSMC 0.18um process were used for verification.
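As background for the H-matrix discussion, here is a minimal sketch of single-error-correcting Hamming encoding and decoding for a 32-bit word with 6 check bits, using textbook bit positions; the paper's 32-bit H-matrix and SEC-DED variants such as Hsiao codes differ in detail, so this only illustrates how a syndrome locates a flipped bit.

```c
#include <stdint.h>
#include <stdio.h>

#define NBITS 38            /* 32 data bits + 6 check bits, positions 1..38 */

/* Spread data bits over non-power-of-two positions, then set each check bit
 * (positions 1,2,4,...,32) so total parity over its covered positions is even. */
static uint64_t hamming_encode(uint32_t data) {
    uint64_t cw = 0;
    for (int pos = 1, d = 0; pos <= NBITS; pos++) {
        if ((pos & (pos - 1)) == 0) continue;        /* skip parity slots */
        if ((data >> d++) & 1) cw |= 1ULL << pos;
    }
    for (int p = 1; p <= 32; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= NBITS; pos++)
            if ((pos & p) && ((cw >> pos) & 1)) parity ^= 1;
        if (parity) cw |= 1ULL << p;
    }
    return cw;
}

/* Returns the corrected data word; a nonzero syndrome names the flipped bit. */
static uint32_t hamming_decode(uint64_t cw) {
    int syndrome = 0;
    for (int p = 1; p <= 32; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= NBITS; pos++)
            if ((pos & p) && ((cw >> pos) & 1)) parity ^= 1;
        if (parity) syndrome |= p;
    }
    if (syndrome) cw ^= 1ULL << syndrome;            /* correct single-bit error */
    uint32_t data = 0;
    for (int pos = 1, d = 0; pos <= NBITS; pos++) {
        if ((pos & (pos - 1)) == 0) continue;
        if ((cw >> pos) & 1) data |= 1u << d;
        d++;
    }
    return data;
}

int main(void) {
    uint64_t cw = hamming_encode(0xCAFEBABE) ^ (1ULL << 7);   /* inject one error */
    printf("recovered: 0x%08X\n", (unsigned)hamming_decode(cw)); /* 0xCAFEBABE */
    return 0;
}
```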

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.1
    • /
    • pp.12-25
    • /
    • 2012
  • For real-time systems it is important to obtain an accurate worst-case execution time (WCET). Furthermore, improving the WCET of applications that run on multicore processors is both significant and challenging, as the WCET can be largely affected by possible inter-core interferences in shared resources such as the shared L2 cache. To address this problem, we propose an approach that adopts code positioning to reduce the inter-core L2 cache interferences between the real-time threads running on a multi-core processor, using three different strategies. The worst-case-oriented strategy is designed to decrease the largest WCET among the threads as much as possible; the other two strategies aim at reducing the WCET of each thread by an almost equal percentage or amount. Our experiments indicate that the proposed multicore-aware code positioning approaches not only improve the worst-case performance of the real-time threads but also make good tradeoffs between efficiency and fairness for threads that run on multicore platforms.
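The following toy sketch shows the flavor of code co-positioning: the hot functions of two threads (the names, sizes, and L2 geometry are all assumptions) are placed so that their code maps to disjoint L2 set ranges. The paper's worst-case-oriented and fairness-oriented strategies rely on WCET analysis that this illustration omits.

```c
#include <stdio.h>

#define LINE  64
#define NSETS 1024                 /* assumed shared L2: 64 B lines, 1024 sets */

struct func { const char *name; int thread; unsigned size; unsigned addr; };

int main(void) {
    struct func f[] = {
        { "t0_filter", 0, 8 * 1024, 0 },
        { "t0_pack",   0, 4 * 1024, 0 },
        { "t1_decode", 1, 8 * 1024, 0 },
        { "t1_mix",    1, 4 * 1024, 0 },
    };
    unsigned nf = sizeof f / sizeof f[0];

    /* Greedy placement: thread 0 grows from set 0 upward, thread 1 from the
     * top downward, so the two threads' code never shares an L2 set. */
    unsigned lo = 0, hi = NSETS * LINE;
    for (unsigned i = 0; i < nf; i++) {
        if (f[i].thread == 0) { f[i].addr = lo; lo += f[i].size; }
        else                  { hi -= f[i].size; f[i].addr = hi; }
    }
    for (unsigned i = 0; i < nf; i++)
        printf("%-10s thread %d  addr 0x%05X  sets %4u..%4u\n",
               f[i].name, f[i].thread, f[i].addr,
               (f[i].addr / LINE) % NSETS,
               ((f[i].addr + f[i].size - 1) / LINE) % NSETS);
    return 0;
}
```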

A Study on the Performance of Prefetching Web Cache Proxy (Prefetch하는 웹 캐쉬 프록시의 성능에 대한 연구)

  • 백윤철
    • Journal of the Korea Computer Industry Society
    • /
    • v.2 no.11
    • /
    • pp.1453-1464
    • /
    • 2001
  • The explosive growth of the Internet population has degraded the performance of web services. Popular web sites cannot provide an adequate level of service for the volume of requests they receive, and such poor service leaves users unsatisfied. Web caching, the remedy for this problem, reduces the amount of network traffic and gives users fast responses. In this paper, we analyze the characteristics of web cache traffic using traces from the NLANR (National Laboratory for Applied Network Research) root caches and the education network cache at Seoul National University. Based on this analysis, we suggest a prefetching method and discuss its gains. We find that the proposed prefetching improves the hit rate by up to 3% and the response time by up to 5%.
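The abstract does not describe the prediction rule derived from the NLANR and SNU traces, so the sketch below shows only the common "prefetch the most frequent successor" idea on a toy trace with objects already mapped to small integer ids; it is not the paper's method.

```c
#include <stdio.h>

#define NOBJ 64

static unsigned follows[NOBJ][NOBJ];  /* follows[a][b]: b requested right after a */
static int last = -1;

/* Record one request and return the id worth prefetching next (or -1). */
int on_request(int obj) {
    if (last >= 0) follows[last][obj]++;
    last = obj;

    int best = -1;
    unsigned best_cnt = 0;
    for (int b = 0; b < NOBJ; b++)
        if (follows[obj][b] > best_cnt) { best_cnt = follows[obj][b]; best = b; }
    return best;                       /* candidate to fetch into the cache early */
}

int main(void) {
    int trace[] = { 1, 2, 1, 2, 1, 3, 1, 2 };
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        int p = on_request(trace[i]);
        if (p >= 0) printf("after %d prefetch %d\n", trace[i], p);
    }
    return 0;
}
```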


Implementation of DCT using Bit Slice Signal Processor (BIT SLICE SIGNAL PROCESSOR를 이용한 DCT의 구현)

  • Kim, Dong-L.;Go, Seok-B.;Paek, Seung-K.;Lee, Tae-S.;Min, Byong-G.
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.1449-1453
    • /
    • 1987
  • A microprogrammable bit-slice signal processor for image processing is implemented. Processing speed is increased by the parallelism of a horizontal microprogram using 120-bit microcode, a pipelined architecture, two-bank memory switching that interfaces with the host through DMA, variable clock control, overflow-checking hardware, a look-up-table method, and cache memory. With this processor, a DCT algorithm based on the 2-D FFT is performed. The execution time for a 512×512×8 image is 12 seconds using 16-bit operations, and the recovered image has acceptable quality with an MSE of 0.276%.
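For reference, the transform being computed can be written as the straightforward separable 2-D DCT below (shown for an 8×8 block with orthonormal scaling); the paper instead evaluates it through a 2-D FFT in microcode, and the block size and normalization here are assumptions.

```c
/* compile with: cc dct_ref.c -lm */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 8

/* 1-D DCT-II of one length-N vector, orthonormal scaling. */
static void dct1d(const double *x, double *X) {
    for (int k = 0; k < N; k++) {
        double s = 0.0;
        for (int n = 0; n < N; n++)
            s += x[n] * cos(M_PI * (2 * n + 1) * k / (2.0 * N));
        X[k] = s * (k == 0 ? sqrt(1.0 / N) : sqrt(2.0 / N));
    }
}

/* Separable 2-D DCT: transform rows first, then columns. */
static void dct2d(double img[N][N], double out[N][N]) {
    double tmp[N][N], col[N], res[N];
    for (int r = 0; r < N; r++) dct1d(img[r], tmp[r]);
    for (int c = 0; c < N; c++) {
        for (int r = 0; r < N; r++) col[r] = tmp[r][c];
        dct1d(col, res);
        for (int r = 0; r < N; r++) out[r][c] = res[r];
    }
}

int main(void) {
    double img[N][N], out[N][N];
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) img[r][c] = (r + c) % 255;
    dct2d(img, out);
    printf("DC coefficient: %.2f\n", out[0][0]);
    return 0;
}
```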


L2A Cache Replacement Scheme for Label Switching Network (레이블 스위칭 네트웍 상에서 L2A 캐쉬 대체기법)

  • 김남기;황인철;윤현수
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2000.04a
    • /
    • pp.386-389
    • /
    • 2000
  • As the Internet has grown rapidly, traffic has increased explosively, placing a heavy burden on today's routers. Switching technology, on the other hand, can forward data faster than routing. As a result, label switching networks, which graft switching technology onto IP routing, have emerged to resolve the router bottleneck. Among label switching techniques, cache table management is critical in data-driven label switching. The cache table stores information for flow classification and for label switching, but its size is constrained by router resources, so a cache replacement scheme is required. Efficient cache table management therefore calls for research on cache replacement schemes that consider the characteristics of Internet traffic. In this paper, we propose the L2A cache replacement scheme, which takes Internet traffic characteristics into account and compensates for the shortcomings of the LFC and LRU schemes. The L2A scheme performs better than the basic FIFO, LFC, and LRU schemes, and in particular maintains superior performance over the other schemes even when the cache size is small.
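The abstract does not spell out the L2A algorithm itself, so the sketch below is only a generic recency-plus-frequency hybrid for a flow cache table, illustrating the design space between the LRU and LFC extremes it aims to balance; it is not the paper's L2A scheme.

```c
#include <limits.h>
#include <stdio.h>

#define NFLOWS 256

struct flow_entry {
    int           valid;
    unsigned long last_ref;   /* logical time of the last packet in this flow */
    unsigned long pkt_cnt;    /* packets seen, a simple frequency measure */
};

/* Pick a victim by the smallest combined score: an entry survives either by
 * being referenced recently (LRU-like) or by carrying many packets (LFC-like). */
int pick_flow_victim(struct flow_entry tbl[NFLOWS], unsigned long now) {
    int victim = -1;
    unsigned long best = ULONG_MAX;
    for (int i = 0; i < NFLOWS; i++) {
        if (!tbl[i].valid) return i;              /* free slot, no eviction */
        unsigned long age   = now - tbl[i].last_ref;
        unsigned long score = tbl[i].pkt_cnt + (age ? NFLOWS / age : NFLOWS);
        if (score < best) { best = score; victim = i; }
    }
    return victim;
}

int main(void) {
    struct flow_entry tbl[NFLOWS] = { 0 };
    for (int i = 0; i < NFLOWS; i++)
        tbl[i] = (struct flow_entry){ 1, (unsigned long)i, (unsigned long)(i % 7) };
    printf("victim slot: %d\n", pick_flow_victim(tbl, NFLOWS + 10));
    return 0;
}
```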


Bounding Worst-Case Performance for Multi-Core Processors with Shared L2 Instruction Caches

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.1
    • /
    • pp.1-18
    • /
    • 2011
  • As a first step toward real-time multi-core computing, this paper presents a novel approach to bounding the worst-case performance of threads running on multi-core processors with shared L2 instruction caches. The idea of our approach is to compute the worst-case instruction access interferences between different threads based on the program control flow information of each thread, which can be statically analyzed. Our experiments indicate that the proposed approach can reasonably estimate the worst-case shared L2 instruction cache misses by considering inter-thread instruction conflicts. Moreover, the worst-case execution time (WCET) of applications running on multi-core processors estimated by our approach is much tighter than the estimate obtained by simply assuming that all L2 instruction accesses are misses.
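A much-simplified sketch of the interference bound is shown below: given the L2 line addresses each thread may fetch (which the paper derives from static control-flow analysis), it counts, per cache set, the extra misses the co-running thread could cause in the worst case. The cache geometry and the two address lists are assumptions.

```c
#include <stdio.h>

#define LINE  64
#define NSETS 512
#define ASSOC 2

/* Lines that static analysis says thread A / thread B may fetch. */
static const unsigned threadA[] = { 0x10000, 0x10040, 0x30000, 0x30040 };
static const unsigned threadB[] = { 0x90000, 0x90040, 0x10080 };

static unsigned set_of(unsigned addr) { return (addr / LINE) % NSETS; }

int main(void) {
    unsigned a_lines[NSETS] = { 0 }, b_lines[NSETS] = { 0 }, extra = 0;
    for (unsigned i = 0; i < sizeof threadA / sizeof *threadA; i++)
        a_lines[set_of(threadA[i])]++;
    for (unsigned i = 0; i < sizeof threadB / sizeof *threadB; i++)
        b_lines[set_of(threadB[i])]++;

    /* Worst case: in every set used by both threads that overflows the
     * associativity, each of thread A's lines may be evicted once by thread B. */
    for (unsigned s = 0; s < NSETS; s++)
        if (a_lines[s] && b_lines[s] && a_lines[s] + b_lines[s] > ASSOC)
            extra += a_lines[s];
    printf("worst-case extra L2 misses for thread A: %u\n", extra);
    return 0;
}
```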