• Title/Summary/Keyword: cache memory (캐시메모리)

242 search results (processing time: 0.03 seconds)

An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU (Coloring이 적용된 Gauss-Seidel 해법을 통한 CPU와 GPU의 연산 효율에 관한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.41 no.2 / pp.117-124 / 2017
  • The performance of the colored Gauss-Seidel solver on CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU performed well for small problems, but its performance deteriorated for large problems, where the total memory required for computation exceeded the cache size. In contrast, the GPU performed better as the mesh size increased because of its latency-hiding technique. GPU computation with the colored Gauss-Seidel solver was approximately 7 times faster than that with a single CPU, and in parallel computing on the GPU the colored Gauss-Seidel solver was approximately twice as fast as the Jacobi solver.
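
The "coloring" this abstract refers to is commonly the red-black ordering: grid cells are split into two colors so that all cells of one color can be updated simultaneously, which is what makes the method GPU-friendly. Below is a minimal sketch of that idea for a 2-D Laplace (steady heat conduction) problem; the grid size, boundary values, and plain-Python loops are illustrative, not the paper's FDM/FEM setup or GPU kernels.

```python
import numpy as np

def colored_gauss_seidel(T, n_iters=200):
    """Red-black Gauss-Seidel sweeps for the 2-D Laplace equation.

    Cells of one color depend only on cells of the other color, so
    each half-sweep could update all same-colored cells in parallel --
    the property a GPU implementation exploits.
    """
    for _ in range(n_iters):
        for color in (0, 1):  # 0 = "red" cells, 1 = "black" cells
            for i in range(1, T.shape[0] - 1):
                for j in range(1, T.shape[1] - 1):
                    if (i + j) % 2 == color:
                        T[i, j] = 0.25 * (T[i - 1, j] + T[i + 1, j] +
                                          T[i, j - 1] + T[i, j + 1])
    return T

# Example: unit square, top edge held at 1.0, other edges at 0.0
grid = np.zeros((32, 32))
grid[0, :] = 1.0
colored_gauss_seidel(grid)
```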

Limiting CPU Frequency Scaling Considering Main Memory Accesses (주메모리 접근을 고려한 CPU 주파수 조정 제한)

  • Park, Moonju
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.483-491 / 2014
  • Contemporary computer systems exploit DVFS (Dynamic Voltage/Frequency Scaling) technology to balance performance and power consumption. The efficiency of DVFS depends on how much performance is gained for the additional power consumed at an elevated CPU frequency. For memory-bound applications in particular, a higher CPU frequency often does not yield higher performance. In this paper, we present an upper bound for CPU frequency scaling based on memory accesses. Experiments show that the performance gain from a higher CPU frequency is limited by the number of memory accesses (last-level cache misses) per instruction. Using these results, we derive the CPU frequency upper bound beyond which there is little performance gain. Experimental results show that, for a memory-bound application, applying the frequency upper bound improves energy efficiency by more than 30%.
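
As a rough illustration of the abstract's idea, the sketch below caps the CPU frequency based on a measured misses-per-instruction (MPI) ratio. The threshold, scaling rule, and counter values are hypothetical placeholders, not the paper's model.

```python
def frequency_upper_bound(llc_misses, instructions, freq_levels,
                          mpi_threshold=0.005):
    """Pick the highest CPU frequency still worth using.

    llc_misses / instructions approximates how memory-bound the
    workload is; above the (hypothetical) threshold, higher
    frequencies mostly stall on memory, so the frequency is capped.
    """
    mpi = llc_misses / max(instructions, 1)
    if mpi < mpi_threshold:
        return max(freq_levels)          # CPU-bound: run at full speed
    # Memory-bound: scale the cap down as MPI grows
    scale = mpi_threshold / mpi
    capped = [f for f in freq_levels if f <= scale * max(freq_levels)]
    return max(capped) if capped else min(freq_levels)

# Hypothetical counters sampled from performance-monitoring hardware
print(frequency_upper_bound(llc_misses=2_000_000,
                            instructions=100_000_000,
                            freq_levels=[1.2e9, 1.8e9, 2.4e9, 3.0e9]))
```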

Design and Implementation of an In-Memory File System Cache with Selective Compression (대용량 파일시스템을 위한 선택적 압축을 지원하는 인-메모리 캐시의 설계와 구현)

  • Choe, Hyeongwon;Seo, Euiseong
    • Journal of KIISE / v.44 no.7 / pp.658-667 / 2017
  • The demand for large-scale storage systems has continued to grow due to the emergence of multimedia, social-network, and big-data services. To improve the response time and reduce the load of such large-scale storage systems, DRAM-based in-memory cache systems are becoming popular. However, the high cost of DRAM severely restricts their capacity. While compressing cache entries has been proposed to deal with this capacity limitation, compression and decompression are technically difficult to parallelize, induce significant processing overhead, and in turn retard the response time. This paper proposes a selective compression scheme for in-memory file system caches that rapidly estimates the compression ratio of incoming cache entries from their Shannon entropies and compresses only those entries with low estimated compression ratios. In addition, the design and implementation of an in-kernel in-memory file system cache with the proposed selective compression scheme are described. The evaluation showed that the proposed scheme reduced benchmark execution time by approximately 18% compared to a conventional non-compressing in-memory cache. It also provided a cache hit ratio similar to that of an all-compressing counterpart while cutting execution time by 7.5% thanks to the lower compression overhead, and it reduced the CPU time used for compression by 28% compared to the all-compressing scheme.
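
The entropy-based screening described in the abstract can be sketched as follows: estimate the Shannon entropy of an incoming entry and compress it only when the entropy suggests it will shrink. The zlib codec, the 6.0 bits/byte threshold, and the dict standing in for the kernel cache are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 .. 8.0) of a buffer."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def store_entry(cache: dict, key: str, data: bytes,
                entropy_threshold: float = 6.0):
    """Compress an entry only if its entropy suggests it will shrink.

    High-entropy data (e.g., already-compressed media) is stored raw,
    avoiding wasted CPU time on compression that gains nothing.
    """
    if shannon_entropy(data) < entropy_threshold:
        cache[key] = ("zlib", zlib.compress(data))
    else:
        cache[key] = ("raw", data)

cache = {}
store_entry(cache, "a.txt", b"aaaaaaaaaabbbbbbbbbb" * 100)  # compressible
store_entry(cache, "b.bin", bytes(range(256)) * 10)          # high entropy
```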

A Study on Design and Cache Replacement Policy for Cascaded Cache Based on Non-Volatile Memories (비휘발성 메모리 시스템을 위한 저전력 연쇄 캐시 구조 및 최적화된 캐시 교체 정책에 대한 연구)

  • Juhee Choi
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.106-111 / 2023
  • The importance of load-to-use latency has been highlighted as state-of-the-art computing cores adopt deep pipelines and high clock frequencies. The cascaded cache was recently proposed to reduce the access cycle of the L1 cache by exploiting the latency differences among banks of the cache structure. However, that study assumed a cache composed of SRAM, making it unsuitable for direct application to non-volatile-memory-based systems. This paper proposes a novel mechanism and structure for lowering dynamic energy consumption. Monitoring logic is inserted to track swap operations and write counts; if the ratio of swap operations to total writes surpasses a set threshold, the cache controller skips swapping cache blocks, thereby reducing write operations. To validate this approach, experiments were conducted on a non-volatile-memory-based cascaded cache. The results show a reduction in write operations by an average of 16.7% with a negligible increase in latency.
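
A minimal sketch of the monitoring logic the abstract describes: count swaps and writes, and skip block swaps once the swap share of total writes crosses a threshold. The 0.3 threshold and the cost accounting are illustrative assumptions, not the paper's tuned values.

```python
class SwapMonitor:
    """Tracks swap and write counts for a cascaded cache controller.

    If swaps make up too large a share of total writes, moving blocks
    between fast and slow banks costs more (in NVM write energy) than
    the latency it saves, so the controller skips the swap.
    """
    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.swaps = 0
        self.writes = 0

    def record_write(self):
        self.writes += 1

    def should_swap(self) -> bool:
        if self.writes and self.swaps / self.writes > self.threshold:
            return False      # too swap-heavy: skip to save NVM writes
        self.swaps += 1
        self.writes += 2      # a swap rewrites both cache blocks
        return True
```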

A Prefetching and Memory Management Policy for Personal Solid State Drives (개인용 SSD를 위한 선반입 및 메모리 관리 정책)

  • Baek, Sung-Hoon
    • The KIPS Transactions:PartA / v.19A no.1 / pp.35-44 / 2012
  • Traditional techniques used to improve the performance of hard disk drives often perform poorly when applied to solid state drives (SSDs). In hard disk drives, which consist of mechanical components, access time and block-address sequence are critical performance factors. SSDs, by contrast, provide superior random-read performance that is unaffected by block-address sequence, owing to the characteristics of flash memory. In practice, it is recommended to disable prefetching when an SSD is installed in a personal computer. This paper nevertheless presents a combined prefetching and memory management method that considers the internal structure of SSDs and the characteristics of NAND flash memory. An SSD must operate multiple flash memory chips concurrently, and the I/O unit size of NAND flash memory has grown to exceed the block size of operating systems; hence, the proposed prefetching scheme works in units of the SSD's internal operation size. To complement a weak point of the prefetching scheme, the proposed memory management scheme adaptively evicts uselessly prefetched data to maximize the sum of the cache hit rate and the prefetch hit rate. We implemented the proposed schemes as a Linux kernel module and evaluated them using a commercial SSD. The schemes improved I/O performance by up to 26% in the given experiments.
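
The key point, prefetching in the SSD's internal operation unit rather than the operating system's block size, can be sketched as a simple request-rounding function. The 4 KiB OS block and 16-block SSD unit below are hypothetical values, not measurements from the paper.

```python
OS_BLOCK = 4096            # operating-system block size (assumed)
SSD_UNIT = 16 * OS_BLOCK   # hypothetical internal I/O unit of the SSD

def prefetch_range(offset: int, length: int):
    """Round a read request up to whole SSD internal units.

    Reading in the SSD's own unit keeps all of its flash chips busy
    at once, which is where a unit-aware prefetcher gets its benefit.
    """
    start = (offset // SSD_UNIT) * SSD_UNIT
    end = -(-(offset + length) // SSD_UNIT) * SSD_UNIT  # ceiling
    return start, end - start

print(prefetch_range(offset=5000, length=10000))  # -> (0, 65536)
```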

An Efficient Data Block Replacement and Rearrangement Technique for Hybrid Hard Disk Drive (하이브리드 하드디스크를 위한 효율적인 데이터 블록 교체 및 재배치 기법)

  • Park, Kwang-Hee;Lee, Geun-Hyung;Kim, Deok-Hwan
    • Journal of KIISE:Computing Practices and Letters / v.16 no.1 / pp.1-10 / 2010
  • Heterogeneous storage systems such as the hybrid hard disk drive (H-HDD), which combines flash memory with a magnetic disk, have recently been launched, as the read performance of NAND flash memory has approached that of the hard disk drive (HDD) while consuming less power. However, the read and write operations of NAND flash memory are slower than those of a rotating disk, and serious CPU and main-memory overheads are incurred when intensive write requests to flash memory occur repeatedly. In this paper, we propose a Least Frequently Used-Hot scheme, which replaces data blocks whose read reference frequency is low and whose write (update) frequency is high, together with a data flushing scheme that rearranges data blocks across the multiple zones of the rotating disk. Experimental results show that the proposed method is 38% faster in I/O performance than the conventional LRU and LFU block replacement schemes, and that it extends the lifespan of the non-volatile cache by 40% over the conventional LRU, LFU, and FIFO block replacement schemes.
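
Below is a sketch of the LFU-Hot eviction idea, under the assumption that a simple reads-minus-writes score captures "low read frequency, high update frequency"; the paper's actual scoring formula is not reproduced here.

```python
class LFUHotCache:
    """Evict blocks that are rarely read but frequently updated.

    Write-hot blocks wear a flash-based cache while contributing
    few read hits, so they are the preferred eviction victims.
    """
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = {}   # block_id -> [read_count, write_count]

    def access(self, block_id, is_write: bool):
        if block_id not in self.blocks:
            if len(self.blocks) >= self.capacity:
                # Victim = lowest (reads - writes): low read frequency
                # combined with high update frequency.
                victim = min(self.blocks, key=lambda b:
                             self.blocks[b][0] - self.blocks[b][1])
                del self.blocks[victim]
            self.blocks[block_id] = [0, 0]
        self.blocks[block_id][1 if is_write else 0] += 1
```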

Distributed In-Memory Caching Method for ML Workload in Kubernetes (쿠버네티스에서 ML 워크로드를 위한 분산 인-메모리 캐싱 방법)

  • Dong-Hyeon Youn;Seokil Song
    • Journal of Platform Technology / v.11 no.4 / pp.71-79 / 2023
  • In this paper, we analyze the characteristics of machine learning workloads and, based on them, propose a distributed in-memory caching technique to improve their performance. The core of a machine learning workload is model training, which is computationally intensive. Performing machine learning workloads in a Kubernetes-based cloud environment, in which the computing framework and storage are separated, allows resources to be allocated effectively, but delays can occur because I/O must be performed over the network. We propose a distributed in-memory caching technique that improves the performance of machine learning workloads in such an environment. In particular, we propose a new method that precaches the data required by machine learning workloads into the distributed in-memory cache by inspecting Kubeflow pipelines, a Kubernetes-based machine learning pipeline management tool.
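
A toy sketch of the precaching idea: walk a pipeline description and warm a cache with each step's inputs before the step runs, so training does not stall on network I/O. The pipeline format, the read_from_storage helper, and the dict standing in for a distributed in-memory cache are all hypothetical; the paper's actual Kubeflow integration is not reproduced.

```python
def read_from_storage(path: str) -> bytes:
    """Placeholder for a remote-storage fetch over the network."""
    return b"..."

def precache_pipeline_inputs(pipeline_steps, cache: dict):
    """Warm the cache with each step's input datasets ahead of time."""
    for step in pipeline_steps:
        for path in step.get("inputs", []):
            if path not in cache:
                cache[path] = read_from_storage(path)

cache = {}
pipeline = [{"name": "train", "inputs": ["s3://bucket/train.tfrecord"]}]
precache_pipeline_inputs(pipeline, cache)
```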

Advanced Disk Block Caching Algorithm for Disk I/O sub-system (디스크 입출력 서브시스템을 위한 개선된 디스크 블록 캐싱 알고리즘)

  • Jung, Soo-Mok;Rho, Kyung-Taeg
    • Journal of the Korea Society of Computer and Information / v.12 no.6 / pp.139-146 / 2007
  • A hard disk, which can be classified as external storage, is usually capacious and economical. Despite these attractive characteristics and continued efforts to improve its performance, the hard disk remains far slower than the processor, and its advancement has been slow because it depends on mechanical processes, whereas processors have advanced rapidly along with semiconductor technology. The disk I/O subsystem has therefore become a bottleneck for computer system performance, and research on it continues. In this paper, we propose a multi-level LRU scheme and apply it to computer systems that have both a buffer cache and a disk cache. Applying the proposed scheme decreases the average access time to disk blocks. The efficiency of the proposed algorithm was verified by simulation results.
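
A minimal two-level LRU sketch in the spirit of the abstract: blocks evicted from the buffer cache demote into the disk cache rather than being dropped, and a disk-cache hit promotes the block back. The sizes and the demotion rule are illustrative assumptions.

```python
from collections import OrderedDict

class MultiLevelLRU:
    """A small fast buffer cache (L1) backed by a larger disk cache (L2)."""

    def __init__(self, l1_size=4, l2_size=16):
        self.l1 = OrderedDict()  # buffer cache (most recent last)
        self.l2 = OrderedDict()  # disk cache
        self.l1_size, self.l2_size = l1_size, l2_size

    def access(self, block):
        if block in self.l1:
            self.l1.move_to_end(block)      # L1 hit: refresh recency
            return
        if block in self.l2:                # L2 hit: promote to L1
            del self.l2[block]
        self.l1[block] = True
        if len(self.l1) > self.l1_size:
            demoted, _ = self.l1.popitem(last=False)
            self.l2[demoted] = True         # demote instead of drop
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)
```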

Improving Search Performance of Tries Data Structures for Network Filtering by Using Cache (네트워크 필터링에서 캐시를 적용한 트라이 구조의 탐색 성능 개선)

  • Kim, Hoyeon;Chung, Kyusik
    • KIPS Transactions on Computer and Communication Systems / v.3 no.6 / pp.179-188 / 2014
  • Due to the tremendous amount of network traffic and its rapid increase, the performance of network equipment has become an important issue. Network filtering is one of the primary functions affecting the packet-processing performance of network equipment such as firewalls and load balancers. In this paper, we propose a cache-based trie method to improve on the search performance of the existing trie method for network filtering. When several packets are exchanged at a time between a server and a client, the trie method repeats the same search procedure for each packet; the proposed method avoids this unnecessary repetition by exploiting a cache, thereby improving filtering performance. We ran network filtering experiments for the existing and proposed methods. Experimental results showed that the proposed method could process up to 790,000 more packets per second than the existing method. With a cache list of size 11, the proposed method showed its best performance improvement (18.08%) relative to the increase in memory usage (7.75%).
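
Below is a sketch of a trie lookup fronted by a small LRU cache, as the abstract describes: repeated lookups for the same address skip the trie walk entirely. The bit-string keys and single-field matching are simplifications of real 5-tuple packet filtering, and the cache size of 11 simply echoes the abstract's best-performing configuration.

```python
from collections import OrderedDict

class FilterTrie:
    """Bit-trie filter lookup with a small LRU cache in front.

    Packets of one flow repeat the same source address, so the cache
    answers repeats without walking the trie.
    """
    def __init__(self, cache_size=11):
        self.root = {}                 # bit trie: {'0': {...}, '1': {...}}
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def add_rule(self, addr_bits: str, verdict: str):
        node = self.root
        for b in addr_bits:
            node = node.setdefault(b, {})
        node["verdict"] = verdict

    def lookup(self, addr_bits: str) -> str:
        if addr_bits in self.cache:            # cache hit: skip trie walk
            self.cache.move_to_end(addr_bits)
            return self.cache[addr_bits]
        node, verdict = self.root, "accept"    # default verdict
        for b in addr_bits:
            if b not in node:
                break
            node = node[b]
            verdict = node.get("verdict", verdict)
        self.cache[addr_bits] = verdict
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
        return verdict

f = FilterTrie()
f.add_rule("1100", "drop")        # e.g., drop a short address prefix
print(f.lookup("11001010"))       # -> "drop" (longest-prefix verdict)
```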

Page Replacement Algorithm for Improving Performance of Hybrid Main Memory (하이브리드 메인 메모리의 성능 향상을 위한 페이지 교체 기법)

  • Lee, Minhoe;Kang, Dong Hyun;Kim, Junghoon;Eom, Young Ik
    • KIISE Transactions on Computing Practices / v.21 no.1 / pp.88-93 / 2015
  • In modern computer systems, DRAM is commonly used as main memory due to its low read/write latency and high endurance. However, DRAM is volatile and requires a periodic power supply (i.e., memory refresh) to retain the data stored in it. PCM, on the other hand, is a promising candidate to replace DRAM because it is non-volatile and retains stored data without refresh; it also supports byte-addressable access and in-place updates. However, PCM alone is unsuitable as the main memory of a computer system because of two limitations: high read/write latency and low endurance. To combine the advantages of DRAM and PCM, a hybrid main memory consisting of both has been suggested and actively studied. In this paper, we propose a novel page replacement algorithm for hybrid main memory. To cope with the weaknesses of PCM, our scheme focuses on reducing the number of PCM writes in the hybrid main memory. Experimental results show that the proposed page replacement algorithm reduces the number of PCM writes by up to 80.5% compared with other page replacement algorithms.
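
As a closing illustration, here is a sketch of one way to cut PCM writes in a hybrid memory: migrate write-hot pages into DRAM so subsequent writes are absorbed there. The FIFO victim selection and the write accounting are assumptions for the sketch, not the paper's policy.

```python
class HybridMemory:
    """Write-aware page placement for a DRAM+PCM main memory.

    A write to a page resident in PCM migrates the page to DRAM, so
    later writes hit DRAM and PCM write counts stay low.
    """
    def __init__(self, dram_frames=4):
        self.dram = []      # page ids resident in DRAM (FIFO order)
        self.pcm = set()    # page ids resident in PCM
        self.dram_frames = dram_frames
        self.pcm_writes = 0

    def write(self, page):
        if page in self.dram:
            return                      # DRAM absorbs the write
        if page in self.pcm:
            self.pcm.remove(page)       # migrate write-hot page up
        self._admit_to_dram(page)

    def _admit_to_dram(self, page):
        if len(self.dram) >= self.dram_frames:
            victim = self.dram.pop(0)   # FIFO victim, demoted to PCM
            self.pcm.add(victim)
            self.pcm_writes += 1        # demotion costs one PCM write
        self.dram.append(page)
```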