Search | Korea Science

CPC: A File I/O Cache Management Policy for Compute-Bound Workloads

Bahn, Hyokyung
- International journal of advanced smart convergence
- /
- v.11 no.2
- /
- pp.1-6
- /
- 2022
With the emergence of the new era of the 4th industrial revolution, compute-bound workloads with large memory footprint like big data processing increase dramatically. Even in such compute-bound workloads, however, we observe bulky I/Os while loading big data from storage to memory. Although file I/O cache plays a role of accelerating the performance of storage I/O, we found out that the cache hit rate in such environments is not improved even though we increase the file I/O cache capacity because of some special I/O references generated by compute-bound workloads. To cope with this situation, we propose a new file I/O cache management policy that improves the cache hit rate for compute-bound workloads significantly. Trace-driven simulations by replaying file I/O reference logs of compute-bound workloads show that the proposed cache management policy improves the cache hit rate compared to the well-acknowledged CLOCK algorithm by a large margin.
https://doi.org/10.7236/IJASC.2022.11.2.1 인용 PDF KSCI

Extended Pairing Heap Algorithms Considering Cache Effect (캐쉬 효과를 고려한 확장된 Pairing Heap 알고리즘)

정균락;김경훈
- Journal of KIISE:Computer Systems and Theory
- /
- v.30 no.5_6
- /
- pp.250-257
- /
- 2003
As the memory access time becomes slower relative to the fast processor speed, most systems use cache memory to reduce the gap. The cache performance has an increasingly large impact on the performance of algorithms. Blocking is the well known method to utilize cache and has shown good results in multiplying matrices and search trees like d-heap. But if we use blocking in the data structures which require rotation during insertion or deletion, the execution time increases as the data movements between blocks are necessary. In this paper, we have proposed the extended pairing heap algorithms using block node and shown by experiments that our structure is superior Also in case of using block node, we use less memory space as the number of pointers decreases.
PDF KSCI

A Study on Direct Cache-to-Cache Transfer for Hybrid Cache Architecture to Reduce Write Operations (쓰기 횟수 감소를 위한 하이브리드 캐시 구조에서의 캐시간 직접 전송 기법에 대한 연구)

Juhee Choi
- Journal of the Semiconductor & Display Technology
- /
- v.23 no.1
- /
- pp.65-70
- /
- 2024
Direct cache-to-cache transfer has been studied to reduce the latency and bandwidth consumption related to the shared data in multiprocessor system. Even though these studies lead to meaningful results, they assume that caches consist of SRAM. For example, if the system employs the non-volatile memory, the one of the most important parts to consider is to decrease the number of write operations. This paper proposes a hybrid write avoidance cache coherence protocol that considers the hybrid cache architecture. A new state is added to finely control what is stored in the non-volatile memory area, and experimental results showed that the number of writes was reduced by about 36% compared to the existing schemes.
PDF

Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing

Huangfu, Yijie;Zhang, Wei
- Journal of Computing Science and Engineering
- /
- v.11 no.2
- /
- pp.69-77
- /
- 2017
Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically have considerably less temporal/spatial locality. Moreover, the L1 data cache is used by many threads that access a data size typically considerably larger than the L1 cache, making it critical to bypass L1 data cache intelligently to enhance GPU cache performance. In this paper, we examine GPU cache access behavior and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling programs. Moreover, we introduce a hybrid method that integrates static profiling information and hardware-based bypassing to further enhance performance. Our experimental results reveal that hardware-based cache bypassing can boost performance for most benchmarks, and the hybrid method can achieve performance comparable to state-of-the-art compiler-based bypassing with considerably less profiling cost.
https://doi.org/10.5626/JCSE.2017.11.2.69 인용 PDF KSCI

Performance Analysis of Cache and Internal Memory of a High Performance DSP for an Optimal Implementation of Motion Picture Encoder (고성능 DSP에서 동영상 인코더의 최적화 구현을 위한 캐쉬 및 내부 메모리 성능 분석)

Lim, Se-Hun;Chung, Sun-Tae
- The Journal of the Korea Contents Association
- /
- v.8 no.5
- /
- pp.72-81
- /
- 2008
High Performance DSP usually supports cache and internal memory. For an optimal implementation of a multimedia stream application on such a high performance DSP, one needs to utilize the cache and internal memory efficiently. In this paper, we investigate performance analysis of cache, and internal memory configuration and placement necessary to achieve an optimal implementation of multimedia stream applications like motion picture encoder on high performance DSP, TMS320C6000 series, and propose strategies to improve performance for cache and internal memory placement. From the results of analysis and experiments, it is verified that 2-way L2 cache configuration with the remaining memory configured as internal memory shows relatively good performance. Also, it is shown that L1P cache hit rate is enhanced when frequently called routines and routines having caller-callee relationships with them are continuously placed in the internal memory and that L1D cache hit rate is enhanced by the simple change of the data size. The results in the paper are expected to contribute to the optimal implementation of multimedia stream applications on high performance DSPs.
https://doi.org/10.5392/JKCA.2008.8.5.072 인용 PDF

Performance Analysis of Adaptive Partition Cache Replacement using Various Monitoring Ratios for Non-volatile Memory Systems

Hwang, Sang-Ho;Kwak, Jong Wook
- Journal of the Korea Society of Computer and Information
- /
- v.23 no.4
- /
- pp.1-8
- /
- 2018
In this paper, we propose an adaptive partition cache replacement policy and evaluate the performance of our scheme using various monitoring ratios to help lifetime extension of non-volatile main memory systems without performance degradation. The proposal combines conventional LRU (Least Recently Used) replacement policy and Early Eviction Zone (E2Z), which considers a dirty bit as well as LRU bits to select a candidate block. In particular, this paper shows the performance of non-volatile memory using various monitoring ratios and determines optimized monitoring ratio and partition size of E2Z for reducing the number of writebacks using cache hit counter logic and hit predictor. In the experiment evaluation, we showed that 1:128 combination provided the best results of writebacks and runtime, in terms of performance and complexity trade-off relation, and our proposal yielded up to 42% reduction of writebacks, compared with others.
https://doi.org/10.9708/jksci.2018.23.04.001 인용 PDF KSCI

Study of Cache Performance on GPGPU

Choi, Kyu Hyun;Kim, Seon Wook
- IEIE Transactions on Smart Processing and Computing
- /
- v.4 no.2
- /
- pp.78-82
- /
- 2015
General-purpose graphics processing units (GPGPUs) provide tremendous computational and processing power. Despite the latency hiding mechanism, a GPU architecture requires high memory bandwidth and lower latency between computational units and the memory system. For this reason, the current GPU architecture has private L1 caches in each core and a shared L2 cache to increase performance by reducing memory latency. But in some cases, this CPU-like cache design is not suitable for GPGPUs. In this paper, we analyze detailed cache performance related to GPGPU application characteristics, and suggest technical alternatives for the GPGPU architecture as future work.
https://doi.org/10.5573/IEIESPC.2015.4.2.078 인용 PDF KSCI

Design of an Asynchronous Data Cache with FIFO Buffer for Write Back Mode (Write Back 모드용 FIFO 버퍼 기능을 갖는 비동기식 데이터 캐시)

Park, Jong-Min;Kim, Seok-Man;Oh, Myeong-Hoon;Cho, Kyoung-Rok
- The Journal of the Korea Contents Association
- /
- v.10 no.6
- /
- pp.72-79
- /
- 2010
In this paper, we propose the data cache architecture with a write buffer for a 32bit asynchronous embedded processor. The data cache consists of CAM and data memory. It accelerates data up lood cycle between the processor and the main memory that improves processor performance. The proposed data cache has 8 KB cache memory. The cache uses the 4-way set associative mapping with line size of 4 words (16 bytes) and pseudo LRU replacement algorithm for data replacement in the memory. Dirty register and write buffer is used for write policy of the cache. The designed data cache is synthesized to a gate level design using $0.13-{\mu}m$ process. Its average hit rate is 94%. And the system performance has been improved by 46.53%. The proposed data cache with write buffer is very suitable for a 32-bit asynchronous processor.
https://doi.org/10.5392/JKCA.2010.10.6.072 인용 PDF KSCI

High Performance Data Cache Memory Architecture (고성능 데이터 캐시 메모리 구조)

Kim, Hong-Sik;Kim, Cheong-Ghil
- Journal of the Korea Academia-Industrial cooperation Society
- /
- v.9 no.4
- /
- pp.945-951
- /
- 2008
In this paper, a new high performance data cache scheme that improves exploitation of both the spatial and temporal locality is proposed. The proposed data cache consists of a hardware prefetch unit and two sub-caches such as a direct-mapped (DM) cache with a large block size and a fully associative buffer with a small block size. Spatial locality is exploited by fetching and storing large blocks into a direct mapped cache, and is enhanced by prefetching a neighboring block when a DM cache hit occurs. Temporal locality is exploited by storing small blocks from the DM cache in the fully associative buffer according to their activity in the DM cache when they are replaced. Experimental results on Spec2000 programs show that the proposed scheme can reduce the average miss ratio by $12.53%\sim23.62%$ and the AMAT by $14.67%\sim18.60%$ compared to the previous schemes such as direct mapped cache, 4-way set associative cache and SMI(selective mode intelligent) cache[8].
https://doi.org/10.5762/KAIS.2008.9.4.945 인용 PDF

Cache Sensitive T-tree Main Memory Index for Range Query Search (범위질의 검색을 위한 캐시적응 T-트리 주기억장치 색인구조)

Choi, Sang-Jun;Lee, Jong-Hak
- Journal of Korea Multimedia Society
- /
- v.12 no.10
- /
- pp.1374-1385
- /
- 2009
Recently, advances in speed of the CPU have for out-paced advances in memory speed. Main-memory access is increasingly a performance bottleneck for main-memory database systems. To reduce memory access speed, cache memory have incorporated in the memory subsystem. However cache memories can reduce the memory speed only when the requested data is found in the cache. We propose a new cache sensitive T-tree index structure called as $CST^*$-tree for range query search. The $CST^*$-tree reduces the number of cache miss occurrences by loading the reduced internal nodes that do not have index entries. And it supports the sequential access of index entries for range query by connecting adjacent terminal nodes and internal index nodes. For performance evaluation, we have developed a cost model, and compared our $CST^*$-tree with existing CST-tree, that is the conventional cache sensitive T-tree, and $T^*$-tree, that is conventional the range query search T -tree, by using the cost model. The results indicate that cache miss occurrence of $CST^*$-tree is decreased by 20~30% over that of CST-tree in a single value search, and it is decreased by 10~20% over that of $T^*$-tree in a range query search.
PDF

Search Result 372, Processing Time 0.032 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)