Search | Korea Science

Program Cache Busy Time Control Method for Reducing Peak Current Consumption of NAND Flash Memory in SSD Applications

Park, Se-Chun;Kim, You-Sung;Cho, Ho-Youb;Choi, Sung-Dae;Yoon, Mi-Sun;Kim, Tae-Yun;Park, Kun-Woo;Park, Jongsun;Kim, Soo-Won
- ETRI Journal
- /
- v.36 no.5
- /
- pp.876-879
- /
- 2014
In current NAND flash design, one of the most challenging issues is reducing peak current consumption (peak ICC), as it leads to peak power drop, which can cause malfunctions in NAND flash memory. This paper presents an efficient approach for reducing the peak ICC of the cache program in NAND flash memory - namely, a program Cache Busy Time (tPCBSY) control method. The proposed tPCBSY control method is based on the interesting observation that the array program current (ICC2) is mainly decided by the bit-line bias condition. In the proposed approach, when peak ICC2 becomes larger than a threshold value, which is determined by a cache loop number, cache data cannot be loaded to the cache buffer (CB). On the other hand, when peak ICC2 is smaller than the threshold level, cache data can be loaded to the CB. As a result, the peak ICC of the cache program is reduced by 32% at the least significant bit page and by 15% at the most significant bit page. In addition, the program throughput reaches 20 MB/s in multiplane cache program operation, without restrictions caused by a drop in peak power due to cache program operations in a solid-state drive.
https://doi.org/10.4218/etrij.14.0213.0537 인용 PDF KSCI KPUBS

Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing

Huangfu, Yijie;Zhang, Wei
- Journal of Computing Science and Engineering
- /
- v.11 no.2
- /
- pp.69-77
- /
- 2017
Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically have considerably less temporal/spatial locality. Moreover, the L1 data cache is used by many threads that access a data size typically considerably larger than the L1 cache, making it critical to bypass L1 data cache intelligently to enhance GPU cache performance. In this paper, we examine GPU cache access behavior and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling programs. Moreover, we introduce a hybrid method that integrates static profiling information and hardware-based bypassing to further enhance performance. Our experimental results reveal that hardware-based cache bypassing can boost performance for most benchmarks, and the hybrid method can achieve performance comparable to state-of-the-art compiler-based bypassing with considerably less profiling cost.
https://doi.org/10.5626/JCSE.2017.11.2.69 인용 PDF KSCI

CPC: A File I/O Cache Management Policy for Compute-Bound Workloads

Bahn, Hyokyung
- International journal of advanced smart convergence
- /
- v.11 no.2
- /
- pp.1-6
- /
- 2022
With the emergence of the new era of the 4th industrial revolution, compute-bound workloads with large memory footprint like big data processing increase dramatically. Even in such compute-bound workloads, however, we observe bulky I/Os while loading big data from storage to memory. Although file I/O cache plays a role of accelerating the performance of storage I/O, we found out that the cache hit rate in such environments is not improved even though we increase the file I/O cache capacity because of some special I/O references generated by compute-bound workloads. To cope with this situation, we propose a new file I/O cache management policy that improves the cache hit rate for compute-bound workloads significantly. Trace-driven simulations by replaying file I/O reference logs of compute-bound workloads show that the proposed cache management policy improves the cache hit rate compared to the well-acknowledged CLOCK algorithm by a large margin.
https://doi.org/10.7236/IJASC.2022.11.2.1 인용 PDF KSCI

Performance Analysis of Cache and Internal Memory of a High Performance DSP for an Optimal Implementation of Motion Picture Encoder (고성능 DSP에서 동영상 인코더의 최적화 구현을 위한 캐쉬 및 내부 메모리 성능 분석)

Lim, Se-Hun;Chung, Sun-Tae
- The Journal of the Korea Contents Association
- /
- v.8 no.5
- /
- pp.72-81
- /
- 2008
High Performance DSP usually supports cache and internal memory. For an optimal implementation of a multimedia stream application on such a high performance DSP, one needs to utilize the cache and internal memory efficiently. In this paper, we investigate performance analysis of cache, and internal memory configuration and placement necessary to achieve an optimal implementation of multimedia stream applications like motion picture encoder on high performance DSP, TMS320C6000 series, and propose strategies to improve performance for cache and internal memory placement. From the results of analysis and experiments, it is verified that 2-way L2 cache configuration with the remaining memory configured as internal memory shows relatively good performance. Also, it is shown that L1P cache hit rate is enhanced when frequently called routines and routines having caller-callee relationships with them are continuously placed in the internal memory and that L1D cache hit rate is enhanced by the simple change of the data size. The results in the paper are expected to contribute to the optimal implementation of multimedia stream applications on high performance DSPs.
https://doi.org/10.5392/JKCA.2008.8.5.072 인용 PDF

A Study on Direct Cache-to-Cache Transfer for Hybrid Cache Architecture to Reduce Write Operations (쓰기 횟수 감소를 위한 하이브리드 캐시 구조에서의 캐시간 직접 전송 기법에 대한 연구)

Juhee Choi
- Journal of the Semiconductor & Display Technology
- /
- v.23 no.1
- /
- pp.65-70
- /
- 2024
Direct cache-to-cache transfer has been studied to reduce the latency and bandwidth consumption related to the shared data in multiprocessor system. Even though these studies lead to meaningful results, they assume that caches consist of SRAM. For example, if the system employs the non-volatile memory, the one of the most important parts to consider is to decrease the number of write operations. This paper proposes a hybrid write avoidance cache coherence protocol that considers the hybrid cache architecture. A new state is added to finely control what is stored in the non-volatile memory area, and experimental results showed that the number of writes was reduced by about 36% compared to the existing schemes.
PDF

Design of an Asynchronous Data Cache with FIFO Buffer for Write Back Mode (Write Back 모드용 FIFO 버퍼 기능을 갖는 비동기식 데이터 캐시)

Park, Jong-Min;Kim, Seok-Man;Oh, Myeong-Hoon;Cho, Kyoung-Rok
- The Journal of the Korea Contents Association
- /
- v.10 no.6
- /
- pp.72-79
- /
- 2010
In this paper, we propose the data cache architecture with a write buffer for a 32bit asynchronous embedded processor. The data cache consists of CAM and data memory. It accelerates data up lood cycle between the processor and the main memory that improves processor performance. The proposed data cache has 8 KB cache memory. The cache uses the 4-way set associative mapping with line size of 4 words (16 bytes) and pseudo LRU replacement algorithm for data replacement in the memory. Dirty register and write buffer is used for write policy of the cache. The designed data cache is synthesized to a gate level design using $0.13-{\mu}m$ process. Its average hit rate is 94%. And the system performance has been improved by 46.53%. The proposed data cache with write buffer is very suitable for a 32-bit asynchronous processor.
https://doi.org/10.5392/JKCA.2010.10.6.072 인용 PDF KSCI

Extended Pairing Heap Algorithms Considering Cache Effect (캐쉬 효과를 고려한 확장된 Pairing Heap 알고리즘)

정균락;김경훈
- Journal of KIISE:Computer Systems and Theory
- /
- v.30 no.5_6
- /
- pp.250-257
- /
- 2003
As the memory access time becomes slower relative to the fast processor speed, most systems use cache memory to reduce the gap. The cache performance has an increasingly large impact on the performance of algorithms. Blocking is the well known method to utilize cache and has shown good results in multiplying matrices and search trees like d-heap. But if we use blocking in the data structures which require rotation during insertion or deletion, the execution time increases as the data movements between blocks are necessary. In this paper, we have proposed the extended pairing heap algorithms using block node and shown by experiments that our structure is superior Also in case of using block node, we use less memory space as the number of pointers decreases.
PDF KSCI

A Cache Controller to Maximize Effectiveness of Hierarchical Memory Architecture (계층적 메모리 구조의 효과를 극대화하는 캐시 제어기)

Uh Bong Yong;Ju Young Kwan;Cheon Joong Nam;Kim Suk Il
- Journal of KIISE:Computer Systems and Theory
- /
- v.32 no.11_12
- /
- pp.608-616
- /
- 2005
A cache architecture is proposed here which evokes prefetch at level 1 cache miss. Existing structures only prefetch at level 2 cache miss. In the proposed cache architecture, level 1 cache miss would select demand fetch block and prefetch block from the level 2 cache and store to level 1 cache and prefetch cache, respectively. According to an experimental analysis using 11 benchmark programs, the hierarchical cache architecture that employs both a level 1 cache prefetcher and a level 2 cache prefetcher obtained a maximum $19\%$ increased performance when compared to the cache architecture that employs only a level 2 cache prefetcher.
PDF KSCI

High Performance Data Cache Memory Architecture (고성능 데이터 캐시 메모리 구조)

Kim, Hong-Sik;Kim, Cheong-Ghil
- Journal of the Korea Academia-Industrial cooperation Society
- /
- v.9 no.4
- /
- pp.945-951
- /
- 2008
In this paper, a new high performance data cache scheme that improves exploitation of both the spatial and temporal locality is proposed. The proposed data cache consists of a hardware prefetch unit and two sub-caches such as a direct-mapped (DM) cache with a large block size and a fully associative buffer with a small block size. Spatial locality is exploited by fetching and storing large blocks into a direct mapped cache, and is enhanced by prefetching a neighboring block when a DM cache hit occurs. Temporal locality is exploited by storing small blocks from the DM cache in the fully associative buffer according to their activity in the DM cache when they are replaced. Experimental results on Spec2000 programs show that the proposed scheme can reduce the average miss ratio by $12.53%\sim23.62%$ and the AMAT by $14.67%\sim18.60%$ compared to the previous schemes such as direct mapped cache, 4-way set associative cache and SMI(selective mode intelligent) cache[8].
https://doi.org/10.5762/KAIS.2008.9.4.945 인용 PDF

Performance Analysis of Adaptive Partition Cache Replacement using Various Monitoring Ratios for Non-volatile Memory Systems

Hwang, Sang-Ho;Kwak, Jong Wook
- Journal of the Korea Society of Computer and Information
- /
- v.23 no.4
- /
- pp.1-8
- /
- 2018
In this paper, we propose an adaptive partition cache replacement policy and evaluate the performance of our scheme using various monitoring ratios to help lifetime extension of non-volatile main memory systems without performance degradation. The proposal combines conventional LRU (Least Recently Used) replacement policy and Early Eviction Zone (E2Z), which considers a dirty bit as well as LRU bits to select a candidate block. In particular, this paper shows the performance of non-volatile memory using various monitoring ratios and determines optimized monitoring ratio and partition size of E2Z for reducing the number of writebacks using cache hit counter logic and hit predictor. In the experiment evaluation, we showed that 1:128 combination provided the best results of writebacks and runtime, in terms of performance and complexity trade-off relation, and our proposal yielded up to 42% reduction of writebacks, compared with others.
https://doi.org/10.9708/jksci.2018.23.04.001 인용 PDF KSCI

Search Result 412, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)