Search | Korea Science

Multi-Strided Prefetching Using Adjacent Region Table (인접 영역 테이블을 이용한 다중 간격 프리페치 기법)

Shim, Jae-Seong;Jun, Ho-Yoon;Lee, Yong-Surk
- Annual Conference of KIPS / 2014.11a / pp.37-40 / 2014
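Because of the speed gap between processors and memory, improving memory-system performance has become an important factor in raising processor performance, and methods for reducing the cache miss rate are being studied to that end. Data prefetching is one technique for reducing the cache miss rate and is used in recent processors to improve memory-system performance. To prefetch data effectively, it is important to identify the access pattern of memory addresses, and prefetching techniques have appeared that detect various patterns: sequential access, access that jumps by a single stride greater than or equal to 1, and access in which multiple strides repeat regularly. The multi-stride prefetching discussed in this paper divides the memory space into multiple regions using some upper bits of the memory address and learns one pattern only within one region; an access to an address in another region is treated as not belonging to the pattern currently being learned, so it does not disturb learning. However, when a memory address stream with the same pattern is larger than a region, this method has to relearn the pattern unnecessarily because the region of the accessed address changes. This paper therefore uses an Adjacent Region Table (ART) so that, when a memory access stream with the same pattern is larger than a region, prefetching can continue according to the already learned pattern. In experiments with the proposed algorithm, the cache miss rate was about 6.7% lower than with the existing multi-stride prefetching, and IPC, an indicator of overall system performance, improved by about 5.78%.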
https://doi.org/10.3745/PKIPS.y2014m11a.37
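
The abstract does not describe the internal layout of the ART, so the following is only a rough sketch of the idea it states: patterns are learned per region (selected by upper address bits), and a newly entered region may inherit the pattern of an adjacent region. Everything here is an assumption made for illustration, including the 4 KiB region size, the single-stride (not multi-stride) detector, and the names `RegionStridePrefetcher` and `count_prefetches`.

```python
REGION_BITS = 12  # assumed region size of 4 KiB; not taken from the paper


class RegionStridePrefetcher:
    """Toy per-region single-stride detector (hypothetical, for illustration)."""

    def __init__(self, use_adjacent=False):
        # region number -> (last address, last stride, stride confirmed?)
        self.table = {}
        self.use_adjacent = use_adjacent

    def access(self, addr):
        """Record an access and return the address to prefetch, or None."""
        region = addr >> REGION_BITS
        entry = self.table.get(region)
        if entry is None:
            stride, confident = 0, False
            if self.use_adjacent:
                # Assumed ART-like step: reuse a confirmed stride from a neighbour.
                for neighbour in (region - 1, region + 1):
                    prev = self.table.get(neighbour)
                    if prev is not None and prev[2]:
                        stride, confident = prev[1], True
                        break
            self.table[region] = (addr, stride, confident)
            return addr + stride if confident else None
        last, stride, _ = entry
        new_stride = addr - last
        # The stride is confirmed once the same non-zero stride is seen twice.
        confident = new_stride == stride and stride != 0
        self.table[region] = (addr, new_stride, confident)
        return addr + new_stride if confident else None


def count_prefetches(prefetcher, addrs):
    """Number of accesses in addrs that triggered a prefetch."""
    return sum(prefetcher.access(a) is not None for a in addrs)
```

On a 64-byte-stride stream spanning two regions, the plain version relearns the stride after the region boundary, while the adjacent-region variant keeps prefetching from the first access in the new region.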

A Policy of Page Management Using Double Cache for NAND Flash Memory File System (NAND 플래시 메모리 파일 시스템을 위한 더블 캐시를 활용한 페이지 관리 정책)

Park, Myung-Kyu;Kim, Sung-Jo
- Journal of KIISE:Computer Systems and Theory / v.36 no.5 / pp.412-421 / 2009
Due to the physical characteristics of NAND flash memory, overwrite operations are not permitted at the same location, and therefore erase operations are required prior to rewriting. These extra operations degrade the performance of NAND flash memory file systems. Since there is also an upper limit on the number of erase operations for a specific location, frequent erases reduce the lifetime of NAND flash memory. These problems can be resolved by delaying write operations in order to improve I/O performance; however, doing so lowers the cache hit ratio. This paper proposes a page management policy using a double cache for NAND flash memory file systems. The double cache consists of a Real cache and a Ghost cache to analyze page reference patterns. The policy attempts to delay write operations in the Ghost cache to maintain the hit ratio in the Real cache. It can also improve write performance by reducing the search time for dirty pages, since the Ghost cache consists of a Dirty list and a Clean list. We find that the hit ratio and I/O performance of our policy are improved by 20.57% and 20.59% on average, respectively, compared with the existing policies. The number of write operations is also reduced by 30.75% on average compared with the existing policies.

Performance Analysis of Clean Block First Replacement Scheme for Non-volatile Memory Based Storage Devices (비휘발성 메모리 기반 저장장치를 위한 클린 블록 우선 교체 기법의 성능 분석)

Yang, Soo-Hyun;Ryu, Yeonseung
- Annual Conference of KIPS / 2012.04a / pp.151-154 / 2012
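Recently, the use of non-volatile flash-memory-based storage devices as next-generation storage has been increasing. In this paper, we study a new buffer cache replacement scheme that takes into account the problem of erase operations, a characteristic of flash-memory-based storage devices. The proposed Clean Block First scheme manages the buffer as a list of flash memory erase blocks and preferentially replaces blocks that hold clean pages, reducing the number of flash memory erase operations. Through trace-driven simulation, we analyzed the cache hit ratio and the number of erase operations as the number of clean blocks searched for replacement varies.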
https://doi.org/10.3745/PKIPS.y2012m04a.151
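
As a rough illustration of the victim selection described in this abstract, the sketch below scans a bounded number of blocks from the least recently used end and prefers one without dirty pages. The function name `pick_victim`, the `window` parameter, the fallback to the plain LRU block, and the reading of "clean block" as "block with no dirty pages" are all assumptions; the abstract does not give these details.

```python
from collections import OrderedDict


def pick_victim(blocks, window):
    """Choose a block to evict from the buffer.

    blocks: OrderedDict mapping block id -> set of dirty page numbers,
            ordered from least to most recently used (assumed layout).
    window: how many blocks from the LRU end to search for a clean one.
    """
    for i, (block_id, dirty_pages) in enumerate(blocks.items()):
        if i >= window:
            break
        if not dirty_pages:
            # Evicting a block with no dirty pages needs no flash write or erase.
            return block_id
    # No clean block inside the window: fall back to the LRU block.
    return next(iter(blocks))
```

With `blocks = OrderedDict([(7, {1}), (3, set()), (9, {0, 2})])`, a window of 2 selects block 3, while a window of 1 falls back to block 7.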

Reducing the Overhead of Virtual Address Translation Process (가상주소 변환 과정에 대한 부담의 줄임)

U, Jong-Jeong
- The Transactions of the Korea Information Processing Society / v.3 no.1 / pp.118-126 / 1996
Memory hierarchy is a useful mechanism for improving the memory access speed and making the program space larger by layering the memories and separating program spaces from memory spaces. However, it needs at least two memory accesses for each data reference: a TLB (Translation Lookaside Buffer) access for the address translation and a data cache access for the desired data. If the cache size grows beyond the product of the page size and the cache associativity, it is difficult to access the TLB in parallel with the cache, which lengthens the critical timing path in the processor. To achieve such parallel accesses, we present the hybrid mapped TLB, which combines a direct mapped TLB with a very small fully-associative mapped TLB. The former can reduce the TLB access time, while the latter removes the conflict misses from the former. The trace-driven simulation shows that under given workloads the proposed TLB is effective even when a fully-associative mapped TLB with only four entries is added, because the effects of its increased misses are offset by its speed benefits.

High-Performance FFT Using Data Reorganization (데이터 재구성 기법을 이용한 고성능 FFT)

Park Neungsoo;Choi Yungho
- The KIPS Transactions:PartA / v.12A no.3 s.93 / pp.215-222 / 2005
The efficient utilization of cache memories is a key factor in achieving high performance for computing large signal transforms. Non-unit stride access in the computation of large DFTs causes cache conflict misses, resulting in poor cache performance and a severe degradation in overall performance. In this paper, we propose a dynamic data layout approach that considers the memory hierarchy. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. We also develop an efficient search algorithm to determine the optimal tree with the minimum execution time among possible factorization trees, considering the size of the DFTs and the data access stride. Our approach is applied to compute the fast Fourier transform (FFT). Experiments were performed on Pentium 4, Athlon™ 64, Alpha 21264, and UltraSPARC III. Experimental results show that our FFT achieves a performance improvement of up to 3.37 times over previous FFT packages.
https://doi.org/10.3745/KIPSTA.2005.12A.3.215

Radiation-Induced Soft Error Detection Method for High Speed SRAM Instruction Cache (고속 정적 RAM 명령어 캐시를 위한 방사선 소프트오류 검출 기법)

Kwon, Soon-Gyu;Choi, Hyun-Suk;Park, Jong-Kang;Kim, Jong-Tae
- The Journal of Korean Institute of Communications and Information Sciences / v.35 no.6B / pp.948-953 / 2010
In this paper, we propose a multi-bit soft error detection method that can be used in the instruction cache of a superscalar CPU architecture. The proposed method is applied to the high-speed static RAM used for the instruction cache. Using 1D parity and interleaving, it has less memory overhead and detects more multi-bit errors than other methods. It only detects the occurrence of soft errors in static RAM; error correction is handled like a cache miss. When soft errors occur, they are detected by the 1D parity, and the instruction cache simply fetches the words from lower-level memory to correct them. This method can detect multi-bit errors within a window of up to $4 \times 4$.
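
To make the combination of parity and interleaving concrete, here is a minimal sketch of interleaved parity over a single word. It is not the paper's circuit: the word width, the number of interleaved lanes (`ways=4`), and the function name `interleaved_parity` are assumptions, and the sketch covers only the horizontal direction of the $4 \times 4$ window.

```python
def interleaved_parity(word, width=32, ways=4):
    """Return one parity bit per interleaved lane.

    Lane i covers bit positions i, i + ways, i + 2 * ways, ...
    so any burst of up to `ways` adjacent bit flips touches each lane
    at most once and therefore changes at least one parity bit.
    """
    bits = []
    for lane in range(ways):
        p = 0
        for pos in range(lane, width, ways):
            p ^= (word >> pos) & 1
        bits.append(p)
    return tuple(bits)
```

A single parity bit (`ways=1`) cannot see two adjacent flipped bits because they cancel out, whereas four lanes place those two bits in different lanes, so the stored and recomputed parity differ and the line can be refetched as on a miss.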

Cache Memory and Replacement Algorithm Implementation and Performance Comparison

Park, Na Eun;Kim, Jongwan;Jeong, Tae Seog
- Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.11-17 / 2020
In this paper, we present practical results on cache replacement policies by measuring the cache hit rate and search time of each replacement algorithm through cache simulation. To this end, the cache memory structure and four replacement policies (FIFO, LFU, LRU, and Random) were implemented in software to analyze the characteristics of each technique. In the experiment, the LRU algorithm showed a hit rate and search time of 36.044% and 577.936 ns under a uniform distribution and 45.636% and 504.692 ns under a biased distribution, while the FIFO algorithm performed similarly to LRU at 36.078% and 554.772 ns under the uniform distribution and 45.662% and 489.574 ns under the biased distribution. LFU followed, and the Random algorithm showed the lowest performance at 30.042% and 622.866 ns under the uniform distribution and 36.36% and 553.878 ns under the biased distribution. The LRU replacement method commonly used in cache memory is complex to implement, but it is the most efficient of the conventional replacement algorithms, indicating that it is a reasonable replacement method that takes the reference information of data into account.
https://doi.org/10.9708/jksci.2020.25.03.011
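
The kind of hit-rate measurement this abstract describes can be sketched in a few lines for two of the four policies. This is not the authors' simulator: the function name `hit_rate`, the fully associative cache model, and the example trace are assumptions, and search time is not modelled.

```python
from collections import OrderedDict


def hit_rate(trace, capacity, policy):
    """Fraction of accesses in `trace` that hit a cache of `capacity` entries.

    policy: "LRU" refreshes an entry on every hit; "FIFO" never reorders.
    """
    cache = OrderedDict()  # oldest (or least recently used) entry first
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the entry at the front
            cache[key] = True
    return hits / len(trace)
```

For the trace `[1, 2, 3, 1, 4, 1, 2]` with capacity 3, LRU hits twice (2/7) and FIFO once (1/7), because FIFO evicts key 1 even though it was just reused.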

Performance Analysis of Flash Memory SSD with Non-volatile Cache for Log Storage (비휘발성 캐시를 사용하는 플래시 메모리 SSD의 데이터베이스 로깅 성능 분석)

Hong, Dae-Yong;Oh, Gi-Hwan;Kang, Woon-Hak;Lee, Sang-Won
- Journal of KIISE / v.42 no.1 / pp.107-113 / 2015
In a database system, updates that a transaction makes to pages should be stored in secondary storage before the commit completes. Generic secondary storage devices have volatile DRAM caches to hide the long latency of the non-volatile media. However, because logs that are written only to the volatile DRAM cache do not ensure durability, logging latency cannot be hidden. Recently, a flash SSD with a capacitor-backed DRAM cache was developed to overcome this shortcoming. Storage devices with such a non-volatile cache can increase transaction throughput because transactions can commit as soon as the logs reach the cache. In this paper, we analyzed performance in terms of transaction throughput when the SSD with a capacitor-backed DRAM cache was used as log storage. Transaction throughput can be improved more than threefold by committing right after the logs are stored in the DRAM cache rather than in a secondary storage device. We also showed that the device could reach over 73% of the ideal logging performance with proper tuning.
https://doi.org/10.5626/JOK.2015.42.1.107

Performance Improvement of Meta-search Scheme for Comparison Shopping Sites using Memory Cache (메모리 캐시를 이용한 비교 쇼핑 사이트들에 대한 메타검색의 성능 향상)

조강의;조성제;우진운
- Proceedings of the Korean Information Science Society Conference / 2001.10c / pp.718-720 / 2001
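Recently, comparison shopping sites have appeared that apply comparison shopping agent technology to search product information from multiple shopping malls so that consumers can buy the products they want at the best price. Because these comparison shopping sites sometimes fail to present the best price, and because comparing prices for the product a consumer wants is not easy, meta-search using a real-time search agent was proposed. This method raised the reliability of product search but was not efficient in terms of system performance. In this paper, we propose a meta-comparison shopping agent system that uses a meta-search whose performance is improved by using a database and a memory cache space.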

Optimizing Both Cache and Disk Performance of R-Trees (R-Tree를 위한 캐시와 디스크 성능 최적화)

박명선;이석호
- Proceedings of the Korean Information Science Society Conference / 2003.04a / pp.749-751 / 2003
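R-Trees are generally implemented to be optimal for I/O performance by making the tree node size equal to the disk page size. Recently, R-Tree variants that optimize CPU cache performance have been developed. This was made possible by setting the node size to a few multiples of the cache line size and compressing the keys stored in MBRs so that more entries fit in a single node. However, the node size of a CPU cache-optimized R-Tree is tens to hundreds of bytes, whereas that of a disk-optimized R-Tree is several to tens of kilobytes. Because of this large difference, the disk-optimized R-Tree has poor cache performance and the CPU cache-optimized R-Tree has poor disk performance. This paper proposes the TR-Tree, an R-Tree that is optimal for both the CPU cache and disk. First, we present a method for determining the height of the in-page tree that fits in a disk page and the sizes of its leaf and internal nodes. Using this method, we minimized the number of cache misses required for a TR-Tree search operation and optimized TR-Tree search performance. In addition, to optimize disk I/O performance, we laid out the in-memory nodes so that they fit well into disk pages. The TR-Tree implemented here was about 6 to 28 times faster than the disk-optimized R-Tree for insertion and showed a 1.28- to 2-fold performance improvement for search.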

Search Result 243, Processing Time 0.034 seconds
