• Title/Summary/Keyword: Private cache

Search Result 12, Processing Time 0.764 seconds

Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache (재구성 가능한 라스트 레벨 캐쉬 구조를 위한 코어 인지 캐쉬 교체 기법)

  • Son, Dong-Oh;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.11
    • /
    • pp.1-12
    • /
    • 2013
  • In multi-core processors, Last Level Cache(LLC) can reduce the speed gap between the memory and the core. For this reason, LLC has big impact on the performance of processors. LLC is composed of shared cache and private cache. In computer architecture community, most researchers have mainly focused on the management techniques for shared cache, while management techniques for private cache have not been widely researched. In conventional private LLC, memory is statically assigned to each core, resulting in serious performance degradation when the workloads are not fairly distributed. To overcome this problem, this paper proposes the replacement policy for managing private cache of LLC efficiently. As proposed core-aware cache replacement policy can reconfigure LLC dynamically, hit rate of LLC is increases drastically. Moreover, proposed policy uses 2-bit saturating counters to improve the performance. According to our simulation results, the proposed method can improve hit rates by 9.23% and reduce the access time by 12.85% compared to the conventional method.

Efficient Cache Architecture for Transactional Memory (트랜잭셔널 메모리를 위한 효율적인 캐시 구조)

  • Choi, Dong-Min;Kim, Seung-Hun;Ro, Won-Woo
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.4
    • /
    • pp.1-8
    • /
    • 2011
  • Traditional transactional memory systems are no longer able to guarantee the performance of diverse applications with overflowed transactions since there is the drawback that tracking the data for logging is difficult. Especially, this mechanism has a disadvantage of increasing communication delay for sustaining the state which is required to detect the conflict on the overflowed transactions from the first level cache in the transactional memory systems. To address this point, we have focused on the cache architecture of the systems to reduce the overhead caused by overflows and cache misses. In this paper, we present Supportive Cache which reduces additional overhead during transactions. Supportive Cache performs a parallel look-up with L1 private cache and uses the same replacement policy as L1 private cache. We evaluate the performance of the proposed design by comparing LogTM-SE with and without Supportive Cache. The simulation results show that our system improves the performance by 37% on average, compared to the original LogTM-SE which uses the same hardware resource.

Study of Cache Performance on GPGPU

  • Choi, Kyu Hyun;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.2
    • /
    • pp.78-82
    • /
    • 2015
  • General-purpose graphics processing units (GPGPUs) provide tremendous computational and processing power. Despite the latency hiding mechanism, a GPU architecture requires high memory bandwidth and lower latency between computational units and the memory system. For this reason, the current GPU architecture has private L1 caches in each core and a shared L2 cache to increase performance by reducing memory latency. But in some cases, this CPU-like cache design is not suitable for GPGPUs. In this paper, we analyze detailed cache performance related to GPGPU application characteristics, and suggest technical alternatives for the GPGPU architecture as future work.

Bus Splitting Techniques for MPSoC to Reduce Bus Energy (MPSoC 플랫폼의 버스 에너지 절감을 위한 버스 분할 기법)

  • Chung Chun-Mok;Kim Jin-Hyo;Kim Ji-Hong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.699-708
    • /
    • 2006
  • Bus splitting technique reduces bus energy by placing modules with frequent communications closely and using necessary bus segments in communications. But, previous bus splitting techniques can not be used in MPSoC platform, because it uses cache coherency protocol and all processors should be able to see the bus transactions. In this paper, we propose a bus splitting technique for MPSoC platform to reduce bus energy. The proposed technique divides a bus into several bus segments, some for private memory and others for shared memory. So, it minimizes the bus energy consumed in private memory accesses without producing cache coherency problem. We also propose a task allocation technique considering cache coherency protocol. It allocates tasks into processors according to the numbers of bus transactions and cache coherence protocol, and reduces the bus energy consumption during shared memory references. The experimental results from simulations say the bus splitting technique reduces maximal 83% of the bus energy consumption by private memory accesses. Also they show the task allocation technique reduces maximal 30% of bus energy consumed in shared memory references. We can expect the bus splitting technique and the task allocation technique can be used in multiprocessor platforms to reduce bus energy without interference with cache coherency protocol.

Efficient On-Chip Idle Cache Utilization Technique in Chip Multi-Processor Architecture (칩 멀티 프로세서 구조에서 온칩 유휴 캐시의 효과적인 활용 방안)

  • Kwak, Jong Wook
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.10
    • /
    • pp.13-21
    • /
    • 2013
  • Recently, although the number of cores on a chip multi-processor increases, multi-programming or multi-threaded programming techniques to utilize the whole cores are still insufficient. Therefore, there inevitably exist some idle cores which are not working. This results in a waste of the caches, so-called idle caches which are dedicated to those idle cores. In this research, we propose amethodology to exploit idle caches effectively as victimcaches of on-chip memory resource. In simulation results, we have achieved 19.4%and 10.2%IPC improvement in 4-core and 16-core respectively, compared to previous technique.

Dynamic Directory Table: On-Demand Allocation of Directory Entries for Active Shared Cache Blocks (동적 디렉터리 테이블 : 공유 캐시 블록의 디렉터리 엔트리 동적 할당)

  • Bae, Han Jun;Choi, Lynn
    • Journal of KIISE
    • /
    • v.44 no.12
    • /
    • pp.1245-1251
    • /
    • 2017
  • In this study we present a novel directory architecture that can dynamically allocate a directory entry for a cache block on demand at runtime only when the block is shared by more than one core. Thus, we do not maintain coherence for private blocks, substantially reducing the number of directory entries. Even for shared blocks, we allocate directory entry dynamically only when the block is actively shared, further reducing the number of directory entries at runtime. For this, we propose a new directory architecture called dynamic directory table (DDT), which is implemented as a cache of active directory entries. Through our detailed simulation on PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 17.84% of directory area across a variety of different workloads. This is achieved by its faster access and high hit rates in the small directory. In addition, we demonstrate that even smaller DDTs can give comparable or higher performance compared to recent directory optimization schemes such as SPACE and DGD with considerably less area.

Design of Pipeline Bus and the Performance Evaluation in Multiprocessor System (다중프로세서 시스템에서 파이프라인 전송 버스의 설계 및 성능 평가)

  • 윤용호;임인칠
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.2
    • /
    • pp.288-299
    • /
    • 1993
  • This paper proposes the new bus protocol in the tightly coupled multiprocessor system. The bus protocol uses the pipelined data transfer and block transfer scheme to increase the bus bandwidth, The bus also has the independent transfer lines for the address and data respectively, and it can transfer the data up to maximum 264 Mbytes /sec. This paper also models the multiprocessor system where each processor boards have the private cache. Simulation evaluates the bus and system performance according to hit ratio of the reference data in cache memory, In the case of using this bus, the bus is evaluated not to be saturated when up to 10 processor boards are connected to the bus. As for up to 4 memory interleavng, the performance increases linearly.

  • PDF

Enhancing Location Privacy through P2P Network and Caching in Anonymizer

  • Liu, Peiqian;Xie, Shangchen;Shen, Zihao;Wang, Hui
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.5
    • /
    • pp.1653-1670
    • /
    • 2022
  • The fear that location privacy may be compromised greatly hinders the development of location-based service. Accordingly, some schemes based on the distributed architecture in peer-to-peer network for location privacy protection are proposed. Most of them assume that mobile terminals are mutually trusted, but this does not conform to realistic scenes, and they cannot make requirements for the level of location privacy protection. Therefore, this paper proposes a scheme for location attribute-based security authentication and private sharing data group, so that they trust each other in peer-to-peer network and the trusted but curious mobile terminal cannot access the initiator's query request. A new identifier is designed to allow mobile terminals to customize the protection strength. In addition, the caching mechanism is introduced considering the cache capacity, and a cache replacement policy based on deep reinforcement learning is proposed to reduce communications with location-based service server for achieving location privacy protection. Experiments show the effectiveness and efficiency of the proposed scheme.

An Optimized Cache Coherence Protocol in Multiprocessor System Connected by Slotted Ring (슬롯링으로 연결된 다중처리기 시스템에서 최적화된 캐쉬일관성 프로토콜)

  • Min, Jun-Sik;Chang, Tae-Mu
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.12
    • /
    • pp.3964-3975
    • /
    • 2000
  • There are two policies for maintaining consistency among the multiple processor caches in a multiprocessor system: Write invalidate and Write update. In the write invalidate policy, whenever a processor attempt to write its cached block, it has to invalidate all the same copies of the updated block in the system. As a results of this frequent invalidations, this policy results in high cache miss ratio. On the other hand, the write update policy renew them, instead of invalidating all the same copies. This policy has to transfer the updated contents through interconnection network, whether the updated block is ptivate or not. Therefore the network suffer from heavy transaction traffic. In this paper we present an efficient cache coherence protocol for shared memory multiprocessor system connected by slotted ring. This protocol is based on the write update policy, but the updated contents are transferred only in case of updating the shared block. Otherwise, if the updated block is private, the updated contents are not transferred. We analyze the proposed protocol and enforce simulation to compare it with previous version.

  • PDF

공유 메모리를 갖는 다중 프로세서 컴퓨터 시스팀의 설계 및 성능분석

  • Choe, Chang-Yeol;Park, Byeong-Gwan;Park, Seong-Gyu;O, Gil-Rok
    • ETRI Journal
    • /
    • v.10 no.3
    • /
    • pp.83-91
    • /
    • 1988
  • This paper describes the architecture and the performance analysis of a multiprocessor system, which is based on the shared memory and single system bus. The system bus provides the pended protocol for the multiprocessor environment. Analyzing the processor utilization, address/data bus utilization and memory conflicts, we use a simulation model. The hit ratio of private cache memory is a major factor on the linear increase of the performance of a shared memory based multiprocessor system.

  • PDF