• Title/Summary/Keyword: 캐시메모리

Search Result 242, Processing Time 0.025 seconds

Enhancing LRU Buffer Replacement Policy with Delayed Write of Not-cold-dirty-pages for Flash Memory (플래시 메모리를 위한 Not-cold-Page 쓰기지연을 통한 LRU 버퍼교체 정책 개선)

  • Jung Ho-Young;Park Sung-Min;Cha Jae-Hyuk;Kang Soo-Yong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.634-641
    • /
    • 2006
  • Flash memory has many advantages like non-volatility and fast I/O speed, but it has also disadvantages such as not-in-place-update data and asymmetric read/write/erase speed. For the performance of flash memory storage, it is essential for the buffer replacement algorithms to reduce the number of write operations that also affects the number of erase operations. A new buffer replacement algorithm is proposed in this paper, that delays the writes of not-cold-dirty pages in the buffer cache of flash storage. We show that this algorithm effectively decreases the number of write operations and erase operations without much degradation of hit ratio. As a result overall performance of flash I/O speed is improved.

Improving Flash Translation Layer for Hybrid Flash-Disk Storage through Sequential Pattern Mining based 2-Level Prefetching Technique (하이브리드 플래시-디스크 저장장치용 Flash Translation Layer의 성능 개선을 위한 순차패턴 마이닝 기반 2단계 프리패칭 기법)

  • Chang, Jae-Young;Yoon, Un-Keum;Kim, Han-Joon
    • The Journal of Society for e-Business Studies
    • /
    • v.15 no.4
    • /
    • pp.101-121
    • /
    • 2010
  • This paper presents an intelligent prefetching technique that significantly improves performance of hybrid fash-disk storage, a combination of flash memory and hard disk. Since flash memory embedded in a hybrid device is much faster than hard disk in terms of I/O operations, it can be utilized as a 'cache' space to improve system performance. The basic strategy for prefetching is to utilize sequential pattern mining, with which we can extract the access patterns of objects from historical access sequences. We use two techniques for enhancing the performance of hybrid storage with prefetching. One of them is to modify a FAST algorithm for mapping the flash memory. The other is to extend the unit of prefetching to a block level as well as a file level for effectively utilizing flash memory space. For evaluating the proposed technique, we perform the experiments using the synthetic data and real UCC data, and prove the usability of our technique.

Instructions and Data Prefetch Mechanism using Displacement History Buffer (변위 히스토리 버퍼를 이용한 명령어 및 데이터 프리페치 기법)

  • Jeong, Yong Su;Kim, JinHyuk;Cho, Tae Hwan;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.10
    • /
    • pp.82-94
    • /
    • 2015
  • In this paper, we propose hardware prefetch mechanism with an efficient cache replacement policy by giving priority to the trigger block in which a spatial region and producing a spatial region by using the displacement field. It could be taken into account the sequence of the program since a history is based on the trigger block of history record, and it could be quickly prefetching the instructions or data address by adding a stored value to the trigger address and displacement field since a history is stored as a displacement value. Also, we proposed a method of replacing at random by the cache replacement policy from the low priority block when the cache area is full after giving priority to the trigger block. We analyzed using the memory simulator program gem5 and PARSEC benchmark to assess the performance of the hardware prefetcher. As a result, compared to the existing hardware prefecture to generate the spatial region using a bit vector, L1 data cache miss rate was reduced about 44.5% on average and an average of 26.1% of L1 instruction misses occur. In addition, IPC (Instruction Per Cycle) showed an improvement of about 23.7% on average.

Multi-layer Caching Scheme Considering Sub-graph Usage Patterns (서브 그래프의 사용 패턴을 고려한 다중 계층 캐싱 기법)

  • Yoo, Seunghun;Jeong, Jaeyun;Choi, Dojin;Park, Jaeyeol;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.3
    • /
    • pp.70-80
    • /
    • 2018
  • Due to the recent development of social media and mobile devices, graph data have been using in various fields. In addition, caching techniques for reducing I/O costs in the process of large capacity graph data have been studied. In this paper, we propose a multi-layer caching scheme considering the connectivity of the graph, which is the characteristics of the graph topology, and the history of the past subgraph usage. The proposed scheme divides a cache into Used Data Cache and Prefetched Cache. The Used Data Cache maintains data by weights according to the frequently used sub-graph patterns. The Prefetched Cache maintains the neighbor data of the recently used data that are not used. In order to extract the graph patterns, their past history information is used. Since the frequently used sub-graphs have high probabilities to be reused, they are cached. It uses a strategy to replace new data with less likely data to be used if the memory is full. Through the performance evaluation, we prove that the proposed caching scheme is superior to the existing cache management scheme.

Scalar First Replacement Strategy for Reference Prediction Table Used in Prefetching Streaming Data (스트리밍 데이터의 선인출에 사용되는 참조예측표의 스칼라 우선 교체 전략)

  • Lim, Chul-hoo;Chon, Young-Suk;Kim, Suk-il;Jeon, Joong-nam
    • The KIPS Transactions:PartA
    • /
    • v.11A no.3
    • /
    • pp.163-172
    • /
    • 2004
  • Multimedia applications tend to access their data as a streaming pattern with regular intervals. This characteristic can be utilized in prefetching the multimedia data into cache memory so as to reduce their execution speeds. The reference-prediction prefetch algorithm predicts the memory address that seems to be used in the next time based on the previous history of memory references stored in the prediction reference table. This paper proposes a strategy to manipulate the reference prediction table which contains all of the data reference instructions to scalar and streaming data. We have recognized that the scalar reference instructions do not contribute to the data prefetching algorithm. Therefore, when replacing an element in the reference prediction table, the proposed algorithm preferentially selects the scalar reference instruction before the stream reference instruction. It makes the stream reference instruction to stay for a long time compared to the FIFO replacement policy, and eventually improves the performance of data prefetching.

A Development of Fusion Processor Architecture for Efficient Main Memory Access in CPU-GPU Environment (CPU-GPU환경에서 효율적인 메인메모리 접근을 위한 융합 프로세서 구조 개발)

  • Park, Hyun-Moon;Kwon, Jin-San;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.11 no.2
    • /
    • pp.151-158
    • /
    • 2016
  • The HSA resolves an old problem with existing CPU and GPU architectures by allowing both units to directly access each other's memory pools via unified virtual memory. In a physically realized system, however, frequent data exchanges between CPU and GPU for a virtual memory block result bottlenecks and coherence request overheads. In this paper, we propose Fusion Processor Architecture for efficient access of main memory from both CPU and GPU. It consists of Job Manager, Re-mapper, and Pre-fetcher to control, organize, and distribute work loads and working areas for GPU cores. These components help on reducing memory exchanges between the two processors and improving overall efficiency by eliminating faulty page table requests. To verify proposed algorithm architectures, we develop an emulator based on QEMU, and compare several architectures such as CUDA(Compute Unified Device Architecture), OpenMP, OpenCL. As a result, Proposed fusion processor architectures show 198% faster than others by removing unnecessary memory copies and cache-miss overheads.

Performance Evaluation of HMB-Supported DRAM-Less NVMe SSDs (HMB를 지원하는 DRAM-Less NVMe SSD의 성능 평가)

  • Kim, Kyu Sik;Kim, Tae Seok
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.8 no.7
    • /
    • pp.159-166
    • /
    • 2019
  • Unlike modern Solid-State Drives with DRAM, DRAM-less SSDs do not have DRAM because they are cheap and consume less power. Obviously, they have performance degradation problem due to lack of DRAM in the controller and this problem can be alleviated by utilizing host memory buffer(HMB) feature of NVMe, which allows SSDs to utilize the DRAM of host. In this paper, we show that commercial DRAM-less SSDs surely exhibit lower I/O performance than other SSDs with DRAM, but they can be improved by utilizing the HMB feature. Through various experiments and analysis, we also show that DRAM-less SSDs mainly exploit the DRAM of host as mapping table cache rather than read cache or write buffer to improve I/O performance.

Performance Optimization of Numerical Ocean Modeling on Cloud Systems (클라우드 시스템에서 해양수치모델 성능 최적화)

  • JUNG, KWANGWOOG;CHO, YANG-KI;TAK, YONG-JIN
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.27 no.3
    • /
    • pp.127-143
    • /
    • 2022
  • Recently, many attempts to run numerical ocean models in cloud computing environments have been tried actively. A cloud computing environment can be an effective means to implement numerical ocean models requiring a large-scale resource or quickly preparing modeling environment for global or large-scale grids. Many commercial and private cloud computing systems provide technologies such as virtualization, high-performance CPUs and instances, ether-net based high-performance-networking, and remote direct memory access for High Performance Computing (HPC). These new features facilitate ocean modeling experimentation on commercial cloud computing systems. Many scientists and engineers expect cloud computing to become mainstream in the near future. Analysis of the performance and features of commercial cloud services for numerical modeling is essential in order to select appropriate systems as this can help to minimize execution time and the amount of resources utilized. The effect of cache memory is large in the processing structure of the ocean numerical model, which processes input/output of data in a multidimensional array structure, and the speed of the network is important due to the communication characteristics through which a large amount of data moves. In this study, the performance of the Regional Ocean Modeling System (ROMS), the High Performance Linpack (HPL) benchmarking software package, and STREAM, the memory benchmark were evaluated and compared on commercial cloud systems to provide information for the transition of other ocean models into cloud computing. Through analysis of actual performance data and configuration settings obtained from virtualization-based commercial clouds, we evaluated the efficiency of the computer resources for the various model grid sizes in the virtualization-based cloud systems. We found that cache hierarchy and capacity are crucial in the performance of ROMS using huge memory. The memory latency time is also important in the performance. Increasing the number of cores to reduce the running time for numerical modeling is more effective with large grid sizes than with small grid sizes. Our analysis results will be helpful as a reference for constructing the best computing system in the cloud to minimize time and cost for numerical ocean modeling.

Wall Cuckoo: A Method for Reducing Memory Access Using Hash Function Categorization (월 쿠쿠: 해시 함수 분류를 이용한 메모리 접근 감소 방법)

  • Moon, Seong-kwang;Min, Dae-hong;Jang, Rhong-ho;Jung, Chang-hun;NYang, Dae-hun;Lee, Kyung-hee
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.8 no.6
    • /
    • pp.127-138
    • /
    • 2019
  • The data response speed is a critical issue of cloud services because it directly related to the user experience. As such, the in-memory database is widely adopted in many cloud-based applications for achieving fast data response. However, the current implementation of the in-memory database is mostly based on the linked list-based hash table which cannot guarantee the constant data response time. Thus, cuckoo hashing was introduced as an alternative solution, however, there is a disadvantage that only half of the allocated memory can be used for storing data. Subsequently, bucketized cuckoo hashing (BCH) improved the performance of cuckoo hashing in terms of memory efficiency but still cannot overcome the limitation that the insert overhead. In this paper, we propose a data management solution called Wall Cuckoo which aims to improve not only the insert performance but also lookup performance of BCH. The key idea of Wall Cuckoo is that separates the data among a bucket according to the different hash function be used. By doing so, the searching range among the bucket is narrowed down, thereby the amount of slot accesses required for the data lookup can be reduced. At the same time, the insert performance will be improved because the insert is following up the operation of the lookup. According to analysis, the expected value of slot access required for our Wall Cuckoo is less than that of BCH. We conducted experiments to show that Wall Cuckoo outperforms the BCH and Sorting Cuckoo in terms of the amount of slot access in lookup and insert operations and in different load factor (i.e., 10%-95%).

Research on Event Mechanism for Reducing Power Overheads in Cache Memory Synchronization (캐시 메모리 동기화 전력 감소를 위한 이벤트 메커니즘에 대한 연구)

  • Pak, Young-Jin;Jeong, Ha-Young;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.3
    • /
    • pp.69-75
    • /
    • 2011
  • In this paper, we propose an anycast event driven synchronization mechanism to reduce power overheads. Our proposed mechanism can reduce unnecessary polling operations on SHI(Snoop Hit Invalidate) or SHR(Snoop Hit Read) states. It prevents waisting bandwidth and reduces power overheads on polling operation. Also it decreases transition power of state change compared to broadcast model. Simulation results indicated that the proposed architecture had about 15.3% of power decrease compared to spin-lock model and about 4.7% of power decrease compared to broadcast model. Overall results indicated that proposed synchronization mechanism could increase power efficiency of multi-core system by reducing power overheads.