• Title/Summary/Keyword: 캐시메모리

Search Result 242, Processing Time 0.024 seconds

Hybrid Main Memory based Buffer Cache Scheme by Using Characteristics of Mobile Applications (모바일 애플리케이션의 특성을 이용한 하이브리드 메모리 기반 버퍼 캐시 정책)

  • Oh, Chansoo;Kang, Dong Hyun;Lee, Minho;Eom, Young Ik
    • Journal of KIISE
    • /
    • v.42 no.11
    • /
    • pp.1314-1321
    • /
    • 2015
  • Mobile devices employ buffer cache mechanisms, just as in computer systems such as desktops or servers, to mitigate the performance gap between main memory and secondary storage. However, DRAM has a problem in that it accelerates battery consumption by performing refresh operations periodically to maintain the stored data. In this paper, we propose a novel buffer cache scheme to increase the battery lifecycle in mobile devices based on a hybrid main memory architecture consisting of DRAM and non-volatile PCM. We also suggest a new buffer cache policy that allocates buffers based on process states to optimize the performance and endurance of PCM. In particular, our algorithm allocates each page to the appropriate position corresponding to the state of the application that owns the page, and tries to ensure a rapid response of foreground applications even with a small amount of DRAM memory. The experimental results indicate that the proposed scheme reduces the elapsed time of foreground applications by 58% on average and power consumption by 23% on average without negatively impacting the performance of background applications.

An Energy Efficient and High Performance Data Cache Structure Utilizing Tag History of Cache Addresses (캐시 주소의 태그 이력을 활용한 에너지 효율적 고성능 데이터 캐시 구조)

  • Moon, Hyun-Ju;Jee, Sung-Hyun
    • The KIPS Transactions:PartA
    • /
    • v.14A no.1 s.105
    • /
    • pp.55-62
    • /
    • 2007
  • Uptime of embedded processors for mobile devices are dependent on battery consumption. Especially the large portion of power consumption is known to be due to cache management in embedded processors. This paper proposes an energy efficient data cache structure for high performance embedded processors. High performance prefetching data cache issues prefetching instructions before issuing demand-fetch instructions based on reference predictions. These prefetching instruction bring reduction on memory delay by improving cache hit ratio, but on the other hand those increase energy consumption in proportion to the number of prefetching instructions. In this paper, we adopt tag history table on prefetching data cache for reducing energy consumption by minimizing parallel tag comparison. Experimental results show the proposed data cache improves performance on energy consumption as well as memory delay.

Multi-Strided Prefetching Using Adjacent Region Table (인접 영역 테이블을 이용한 다중 간격 프리페치 기법)

  • Shim, Jae-Seong;Jun, Ho-Yoon;Lee, Yong-Surk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.11a
    • /
    • pp.37-40
    • /
    • 2014
  • 프로세서와 메모리 간의 속도 차이로 인해 메모리 시스템의 성능 향상이 프로세서의 성능을 높이기 위한 중요한 요인이 되었고, 이를 위해 캐시 미스율을 감소시키는 방법이 연구되고 있다. 데이터 프리페치는 캐시의 미스율을 감소시키는 기법 중 하나이며 실제로 최근 프로세서에서 메모리 시스템의 성능을 향상시키기 위해 사용된다. 데이터 프리페치를 효과적으로 수행하기 위해서 메모리 주소의 접근 패턴을 파악하는 것이 중요하며, 이를 위해 순차적으로 접근하는 경우, 한 종류의 1 보다 크거나 같은 간격(stride)으로 뛰면서 접근하는 경우, 다수의 간격이 규칙적으로 반복되며 접근하는 경우 등의 다양한 패턴을 찾는 프리페치 기법들이 등장했다. 본 논문에서 소개하는 다중 간격 프리페치의 경우, 메모리 공간을 메모리 주소의 일부 상위 비트를 통해 여러 개의 영역으로 나누고, 하나의 패턴을 하나의 영역 안에서만 학습하여, 다른 영역에 속한 메모리 주소 접근 시 현재 학습하는 패턴에 어긋나는 주소라고 여기기 때문에 학습을 방해하지 않도록 하였다. 그러나 이 방법은 영역의 크기보다 같은 패턴을 갖는 메모리 주소 스트림의 크기가 더 클 때, 접근 주소의 영역이 바뀜으로 인해 불필요한 학습을 추가적으로 해야 하는 문제점이 있다. 이에 본 논문에서 인접 영역 테이블(ART: Adjacent Region Table)을 이용하여 같은 패턴을 갖는 메모리 접근 스트림의 크기가 영역의 크기보다 클 경우, 기존의 학습된 패턴대로 프리페치를 수행할 수 있도록 하였다. 본 논문에서 제안한 알고리즘으로 실험한 결과, 기존의 다중 간격 프리페치보다 캐시 미스율을 약 6.7% 낮췄고, 시스템 전체의 성능의 지표인 IPC의 경우, 약 5.78% 높아지는 성능 향상의 결과를 얻었다.

A Policy of Page Management Using Double Cache for NAND Flash Memory File System (NAND 플래시 메모리 파일 시스템을 위한 더블 캐시를 활용한 페이지 관리 정책)

  • Park, Myung-Kyu;Kim, Sung-Jo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.36 no.5
    • /
    • pp.412-421
    • /
    • 2009
  • Due to the physical characteristics of NAND flash memory, overwrite operations are not permitted at the same location, and therefore erase operations are required prior to rewriting. These extra operations cause performance degradation of NAND flash memory file system. Since it also has an upper limit to the number of erase operations for a specific location, frequent erases should reduce the lifetime of NAND flash memory. These problems can be resolved by delaying write operations in order to improve I/O performance: however, it will lower the cache hit ratio. This paper proposes a policy of page management using double cache for NAND flash memory file system. Double cache consists of Real cache and Ghost cache to analyze page reference patterns. This policy attempts to delay write operations in Ghost cache to maintain the hit ratio in Real cache. It can also improve write performance by reducing the search time for dirty pages, since Ghost cache consists of Dirty and Clean list. We find that the hit ratio and I/O performance of our policy are improved by 20.57% and 20.59% in average, respectively, when comparing them with the existing policies. The number of write operations is also reduced by 30.75% in average, compared with of the existing policies.

Performance Analysis is of Clean Block First Replacement Scheme for Non-volatile Memory Based Storage Devices (비휘발성 메모리 기반 저장장치를 위한 클린 블록 우선 교체 기법의 성능 분석)

  • Yang, Soo-Hyun;Ryu, Yeonseung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.151-154
    • /
    • 2012
  • 최근 차세대 저장장치로서 비휘발생 플래시 메모리 기반 저장장치의 사용이 층가하고 있다. 본 논문에서는 플래시 메모리 기반 저장장치의 특생인 삭제 연산의 문제점을 고려하는 새로운 버퍼 캐시 교체 기법을 연구하였다. 제안한 클린 블록 우선 (Clean Block First) 기법은 버퍼를 플래시 메모리의 삭제 블록의 리스트로 관리하고 클린 페이지를 가진 블록을 우선적으로 교체하여 플래시 메모리의 삭제 연산 횟수를 줄인다. 트레이스 기반의 시뮬레이션을 수행하여 교체를 위해 검색하는 클린 블록 개수의 변화에 대한 캐시 적중률과 삭제 연산 횟수를 분석하였다.

Peducing the Overhead of Virtual Address Translation Process (가상주소 변환 과정에 대한 부담의 줄임)

  • U, Jong-Jeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.1
    • /
    • pp.118-126
    • /
    • 1996
  • Memory hierarchy is a useful mechanism for improving the memory access speed and making the program space larger by layering the memories and separating program spaces from memory spaces. However, it needs at least two memory accesses for each data reference : a TLB(Translation Lookaside Buffer) access for the address translation and a data cache access for the desired data. If the cache size increases to the multiplication of page size and the cache associativity, it is difficult to access the TLB with the cache in parallel, thereby making longer the critical timing path in the processor. To achieve such parallel accesses, we present the hybrid mapped TLB which combines a direct mapped TLB with a very small fully-associative mapped TLB. The former can reduce the TLB access time. while the latter removes the conflict misses from the former. The trace-driven simulation shows that under given workloads the proposed TLB is effective even when a fully-associative mapped TLB with only four entries is added because the effects of its increased misses are offset by its speed benefits.

  • PDF

High-Performance FFT Using Data Reorganization (데이터 재구성 기법을 이용한 고성능 FFT)

  • Park Neungsoo;Choi Yungho
    • The KIPS Transactions:PartA
    • /
    • v.12A no.3 s.93
    • /
    • pp.215-222
    • /
    • 2005
  • The efficient utilization of cache memories is a key factor in achieving high performance for computing large signal transforms. Nonunit stride access in computation of large DFTs causes cache conflict misses, thereby resulting in poor cache performance. It leads to a severe degradation in overall performance. In this paper, we propose a dynamic data layout approach considering the memory hierarchy system. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. Also, we develop an efficient search algorithm to determine the optimal tree with the minimum execution time among possible factorization trees considering the size of DFTs and the data access stride. Our approach is applied to compute the fast Fourier Transform (FFT). Experiments were performed on Pentium 4, $Athlon^{TM}$ 64, Alpha 21264, UtraSPARC III. Experiment results show that our FFT achieve performance improvement of up to 3.37 times better than the previous FFT packages.

Radiation-Induced Soft Error Detection Method for High Speed SRAM Instruction Cache (고속 정적 RAM 명령어 캐시를 위한 방사선 소프트오류 검출 기법)

  • Kwon, Soon-Gyu;Choi, Hyun-Suk;Park, Jong-Kang;Kim, Jong-Tae
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.6B
    • /
    • pp.948-953
    • /
    • 2010
  • In this paper, we propose multi-bit soft error detection method which can use an instruction cache of superscalar CPU architecture. Proposed method is applied to high-speed static RAM for instruction cache. Using 1D parity and interleaving, it has less memory overhead and detects more multi-bit errors comparing with other methods. It only detects occurrence of soft errors in static RAM. Error correction is treated like a cache miss situation. When soft errors are occurred, it is detected by 1D parity. Instruction cache just fetch the words from lower-level memory to correct errors. This method can detect multi-bit errors in maximum 4$\times$4 window.

Cache Memory and Replacement Algorithm Implementation and Performance Comparison

  • Park, Na Eun;Kim, Jongwan;Jeong, Tae Seog
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.3
    • /
    • pp.11-17
    • /
    • 2020
  • In this paper, we propose practical results for cache replacement policy by measuring cache hit and search time for each replacement algorithm through cache simulation. Thus, the structure of each cache memory and the four types of alternative policies of FIFO, LFU, LRU and Random were implemented in software to analyze the characteristics of each technique. The paper experiment showed that the LRU algorithm showed hit rate and search time of 36.044% and 577.936ns in uniform distribution, 45.636% and 504.692ns in deflection distribution, while the FIFO algorithm showed similar performance to the LRU algorithm at 36.078% and 554.772ns in even distribution and 45.662% and 489.574ns in bias distribution. Then LFU followed, Random algorithm was measured at 30.042% and 622.866ns at even distribution, 36.36% at deflection distribution and 553.878ns at lowest performance. The LRU replacement method commonly used in cache memory has the complexity of implementation, but it is the most efficient alternative to conventional alternative algorithms, indicating that it is a reasonable alternative method considering the reference information of data.

Performance Analysis of Flash Memory SSD with Non-volatile Cache for Log Storage (비휘발성 캐시를 사용하는 플래시 메모리 SSD의 데이터베이스 로깅 성능 분석)

  • Hong, Dae-Yong;Oh, Gi-Hwan;Kang, Woon-Hak;Lee, Sang-Won
    • Journal of KIISE
    • /
    • v.42 no.1
    • /
    • pp.107-113
    • /
    • 2015
  • In a database system, updates on pages that are made by a transaction should be stored in a secondary storage before the commit is complete. Generic secondary storages have volatile DRAM caches to hide long latency for non-volatile media. However, as logs that are only written to the volatile DRAM cache don't ensure durability, logging latency cannot be hidden. Recently, a flash SSD with capacitor-backed DRAM cache was developed to overcome the shortcoming. Storage devices, like those with a non-volatile cache, will increase transaction throughput because transactions can commit as soon as the logs reach the cache. In this paper, we analyzed performance in terms of transaction throughput when the SSD with capacitor-backed DRAM cache was used as log storage. The transaction throughput can be improved over three times, by committing right after storing the logs to the DRAM cache, rather than to a secondary storage device. Also, we showed that it could acquire over 73% of the ideal logging performance with proper tuning.