• Title/Summary/Keyword: write latency

Search Result 47, Processing Time 0.024 seconds

An Efficient Latency Hiding method using accumulation buffer (누적 버퍼를 활용한 효율적인 Latency Hiding기법)

  • Lee, Min-Woo;Han, Tack-Don
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2012.07a
    • /
    • pp.297-300
    • /
    • 2012
  • 현재 cache의 성능 향상을 위한 많은 기법들이 제안되고 있으며, Latency Hiding 기법 역시 cache의 효율적인 사용을 위해 많은 연구가 진행 되어 왔다. write buffer를 사용한 write Latency hiding기법이나 multi threading을 사용한 Latency Hiding 방법 등 여러 기법들이 연구되어 왔으며, 지금도 Latency hiding을 위한 많은 연구들이 지속적으로 진행되고 있다. 본 논문 역시 효율적인 Latency Hiding을 위한 누적 버퍼를 제안한다. 본 논문은 누적 버퍼의 활용도를 조사하여 얼마나 효율적으로 Latency를 은폐했는지, 또 버퍼를 사용함으로써 얻는 다른 이점에 대해 집중적으로 연구하였다.

  • PDF

A Locality-Aware Write Filter Cache for Energy Reduction of STTRAM-Based L1 Data Cache

  • Kong, Joonho
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.1
    • /
    • pp.80-90
    • /
    • 2016
  • Thanks to superior leakage energy efficiency compared to SRAM cells, STTRAM cells are considered as a promising alternative for a memory element in on-chip caches. However, the main disadvantage of STTRAM cells is high write energy and latency. In this paper, we propose a low-cost write filter (WF) cache which resides between the load/store queue and STTRAM-based L1 data cache. To maximize efficiency of the WF cache, the line allocation and access policies are optimized for reducing energy consumption of STTRAM-based L1 data cache. By efficiently filtering the write operations in the STTRAM-based L1 data cache, our proposed WF cache reduces energy consumption of the STTRAM-based L1 data cache by up to 43.0% compared to the case without the WF cache. In addition, thanks to the fast hit latency of the WF cache, it slightly improves performance by 0.2%.

Forgetting based File Cache Management Scheme for Non-Volatile Memory (데이터 망각을 활용한 비휘발성 메모리 기반 파일 캐시 관리 기법)

  • Kang, Dongwoo;Choi, Jongmoo
    • Journal of KIISE
    • /
    • v.42 no.8
    • /
    • pp.972-978
    • /
    • 2015
  • Non-volatile memory (NVM) supports both byte addressability and non-volatility. These characteristics make it feasible for NVM to be employed at any layer of the memory hierarchy such as cache, memory and disk. An interesting characteristic of NVM is that, even though it supports non-volatility, its retention capability is limited. Furthermore NVM has tradeoff between its retention capability and write latency. In this paper, we propose a novel NVM-based file cache management scheme that makes use of the limited retention capability to improve the cache performance. Experimental results with real-workloads show that our scheme can reduce access latency by up to 31% (24.4% average) compared with the conventional LRU based cache management scheme.

Demand-based FTL Cache Partitioning for Large Capacity SSDs (대용량 SSD를 위한 요구 기반 FTL 캐시 분리 기법)

  • Bae, Jinwook;Kim, Hanbyeol;Im, Junsu;Lee, Sungjin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.2
    • /
    • pp.71-78
    • /
    • 2019
  • As the capacity of SSDs rapidly increases, the amount of DRAM to keep a mapping table size in SSDs becomes very huge. To address a Demand-based FTL (DFTL) scheme that caches part of mapping entries in DRAM is considered to be a feasible alternative. However, owing to its unpredictable behaviors, DFTL fails to provide consistent I/O response times. In this paper, we a) analyze a root cause that results in fluctuation on read latency and b) propose a new demand-based FTL scheme that ensures guaranteed read response time with low write amplification. By preventing mapping evictions while serving reads, the proposed technique guarantees every host read requests to be done in 2 NAND read operations. Moreover, only with 25% of a cache ratio, the proposed scheme improves random write performance and random mixed performance by 1.65x and 1.15x, respectively, over the traditional DFTL.

Buffer Policy based on High-capacity Hybrid Memories for Latency Reduction of Read/Write Operations in High-performance SSD Systems

  • Kim, Sungho;Hwang, Sang-Ho;Lee, Myungsub;Kwak, Jong Wook;Park, Chang-Hyeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.7
    • /
    • pp.1-8
    • /
    • 2019
  • Recently, an SSD with hybrid buffer memories is actively researching to reduce the overall latency in server computing systems. However, existing hybrid buffer policies caused many swapping operations in pages because it did not consider the overall latency such as read/write operations of flash chips in the SSD. This paper proposes the clock with hybrid buffer memories (CLOCK-HBM) for a new hybrid buffer policy in the SSD with server computing systems. The CLOCK-HBM constructs new policies based on unique characteristics in both DRAM buffer and NVMs buffer for reducing the number of swapping operations in the SSD. In experimental results, the CLOCK-HBM reduced the number of swapping operations in the SSD by 43.5% on average, compared with LRU, CLOCK, and CLOCK-DNV.

Technology of the next generation low power memory system

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.10 no.4
    • /
    • pp.6-11
    • /
    • 2018
  • As embedded memory technology evolves, the traditional Static Random Access Memory (SRAM) technology has reached the end of development. For deepening the manufacturing process technology, the next generation memory technology is highly required because of the exponentially increasing leakage current of SRAM. Non-volatile memories such as STT-MRAM (Spin Torque Transfer Magnetic Random Access Memory), PCM (Phase Change Memory) are good candidates for replacing SRAM technology in embedded memory systems. They have many advanced characteristics in the perspective of power consumption, leakage power, size (density) and latency. Nonetheless, nonvolatile memories have two major problems that hinder their use it the next-generation memory. First, the lifetime of the nonvolatile memory cell is limited by the number of write operations. Next, the write operation consumes more latency and power than the same size of the read operation.These disadvantages can be solved using the compiler. The disadvantage of non-volatile memory is in write operations. Therefore, when the compiler decides the layout of the data, it is solved by optimizing the write operation to allocate a lot of data to the SRAM. This study provides insights into how these compiler and architectural designs can be developed.

Memory Latency Hiding Techniques (메모리 지연을 감추는 기법들)

  • Ki, An-Do
    • Electronics and Telecommunications Trends
    • /
    • v.13 no.3 s.51
    • /
    • pp.61-70
    • /
    • 1998
  • The obvious way to make a computer system more powerful is to make the processor as fast as possible. Furthermore, adopting a large number of such fast processors would be the next step. This multiprocessor system could be useful only if it distributes workload uniformly and if its processors are fully utilized. To achieve a higher processor utilization, memory access latency must be reduced as much as possible and even more the remaining latency must be hidden. The actual latency can be reduced by using fast logic and the effective latency can be reduced by using cache. This article discusses what the memory latency problem is, how serious it is by presenting analytical and simulation results, and existing techniques for coping with it; such as write-buffer, relaxed consistency model, multi-threading, data locality optimization, data forwarding, and data prefetching.

A Hierarchical Binary-search Tree for the High-Capacity and Asymmetric Performance of NVM (비대칭적 성능의 고용량 비휘발성 메모리를 위한 계층적 구조의 이진 탐색 트리)

  • Jeong, Minseong;Lee, Mijeong;Lee, Eunji
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.2
    • /
    • pp.79-86
    • /
    • 2019
  • For decades, in-memory data structures have been designed for DRAM-based main memory that provides symmetric read/write performances and has no limited write endurance. However, such data structures provide sub-optimal performance for NVM as it has different characteristics to DRAM. With this motivation, we rethink a conventional red-black tree in terms of its efficacy under NVM settings. The original red-black tree constantly rebalances sub-trees so as to export fast access time over dataset, but it inevitably increases the write traffic, adversely affecting the performance for NVM with a long write latency and limited endurance. To resolve this problem, we present a variant of the red-black tree called a hierarchical balanced binary search tree. The proposed structure maintains multiple keys in a single node so as to amortize the rebalancing cost. The performance study reveals that the proposed hierarchical binary search tree effectively reduces the write traffic by effectively reaping the high capacity of NVM.

A Novel Memory Hierarchy for Flash Memory Based Storage Systems

  • Yim, Keno-Soo
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.5 no.4
    • /
    • pp.262-269
    • /
    • 2005
  • Semiconductor scientists and engineers ideally desire the faster but the cheaper non-volatile memory devices. In practice, no single device satisfies this desire because a faster device is expensive and a cheaper is slow. Therefore, in this paper, we use heterogeneous non-volatile memories and construct an efficient hierarchy for them. First, a small RAM device (e.g., MRAM, FRAM, and PRAM) is used as a write buffer of flash memory devices. Since the buffer is faster and does not have an erase operation, write can be done quickly in the buffer, making the write latency short. Also, if a write is requested to a data stored in the buffer, the write is directly processed in the buffer, reducing one write operation to flash storages. Second, we use many types of flash memories (e.g., SLC and MLC flash memories) in order to reduce the overall storage cost. Specifically, write requests are classified into two types, hot and cold, where hot data is vulnerable to be modified in the near future. Only hot data is stored in the faster SLC flash, while the cold is kept in slower MLC flash or NOR flash. The evaluation results show that the proposed hierarchy is effective at improving the access time of flash memory storages in a cost-effective manner thanks to the locality in memory accesses.

A Study on Design and Cache Replacement Policy for Cascaded Cache Based on Non-Volatile Memories (비휘발성 메모리 시스템을 위한 저전력 연쇄 캐시 구조 및 최적화된 캐시 교체 정책에 대한 연구)

  • Juhee Choi
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.3
    • /
    • pp.106-111
    • /
    • 2023
  • The importance of load-to-use latency has been highlighted as state-of-the-art computing cores adopt deep pipelines and high clock frequencies. The cascaded cache was recently proposed to reduce the access cycle of the L1 cache by utilizing differences in latencies among banks of the cache structure. However, this study assumes the cache is comprised of SRAM, making it unsuitable for direct application to non-volatile memory-based systems. This paper proposes a novel mechanism and structure for lowering dynamic energy consumption. It inserts monitoring logic to keep track of swap operations and write counts. If the ratio of swap operations to total write counts surpasses a set threshold, the cache controller skips the swap of cache blocks, which leads to reducing write operations. To validate this approach, experiments are conducted on the non-volatile memory-based cascaded cache. The results show a reduction in write operations by an average of 16.7% with a negligible increase in latencies.

  • PDF