Search | Korea Science

Improved Cache-hot Page Allocation Technique for Reducing Page Initialization Latency of Linux Based Systems (리눅스 기반 시스템의 페이지 초기화 지연 단축을 위한 향상된 캐시-핫 페이지 할당 기법)

Yang, Seokwoo;Noh, Sunhyeon;Hong, Seongsoo
- Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.415-418 / 2019
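Recent user-interactive applications tend to request large amounts of memory from the OS frequently. Whenever an application requests memory, the OS must initialize the pages it is about to allocate, and this frequent page initialization degrades application performance. To shorten page initialization latency, existing Linux-based systems preferentially allocate cache-hot pages, that is, pages that are mapped in the CPU cache so that their initial values can be written quickly. However, existing Linux identifies and manages cache-hot pages on a per-core basis, and a core cannot access cache-hot pages managed by another core. Because of this policy, even when another core manages a cache-hot page that is mapped in the shared cache, the requesting core may fail to obtain it and receive a cache-cold page instead. This paper proposes an improved cache-hot page allocation technique that separately identifies cache-hot pages presumed to be mapped in the shared cache and makes them available to all cores, raising the probability that an application receives a cache-hot page compared with the existing technique. When a page allocation request occurs, the proposed technique first tries to allocate a cache-hot page presumed to be mapped in the requesting core's private cache; if that fails, it allocates a cache-hot page presumed to be mapped in the shared cache. This raises the probability of receiving a cache-hot page and consequently shortens the average page initialization latency. The technique was implemented and evaluated on Linux kernel version 4.18.10, where the average page initialization latency was about 7% shorter than on the existing Linux system.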
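
The two-step allocation order described in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's kernel implementation: the class `HotPageAllocator`, its method names, and the `in_private_cache` flag (standing in for however the kernel estimates which cache level holds a page) are all hypothetical.

```python
from collections import deque


class HotPageAllocator:
    """Toy model: per-core private hot lists plus one shared hot list."""

    def __init__(self, num_cores):
        self.private_hot = [deque() for _ in range(num_cores)]
        self.shared_hot = deque()   # usable by every core
        self.cold = deque()         # cache-cold fallback pool

    def free_page(self, core, page, in_private_cache):
        # Assumption: the caller supplies the estimate of where the page resides.
        if in_private_cache:
            self.private_hot[core].appendleft(page)
        else:
            self.shared_hot.appendleft(page)

    def alloc_page(self, core):
        # 1) Prefer a page presumed hot in this core's private cache.
        if self.private_hot[core]:
            return self.private_hot[core].popleft(), "private-hot"
        # 2) Otherwise take a page presumed hot in the shared cache.
        if self.shared_hot:
            return self.shared_hot.popleft(), "shared-hot"
        # 3) Only then fall back to a cache-cold page.
        if self.cold:
            return self.cold.popleft(), "cold"
        raise MemoryError("no free pages")
```

In this model a core with an empty private list still receives a hot page as long as any core has returned one to the shared list, which is the case the abstract says the existing per-core policy misses.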

Mapping Cache for High-Performance Memory Mapped File I/O in Memory File Systems (메모리 파일 시스템 기반 고성능 메모리 맵 파일 입출력을 위한 매핑 캐시)

Kim, Jiwon;Choi, Jungsik;Han, Hwansoo
- Journal of KIISE / v.43 no.5 / pp.524-530 / 2016
The demand for faster data access and the growth of next-generation memories such as non-volatile memory are driving research on memory file systems. Memory-mapped file I/O, which has less overhead than read/write I/O, is recommended for high-performance memory file systems. Memory-mapped file I/O, however, incurs page table overhead, which becomes one of the major overheads to be resolved in overall file I/O performance. We find that the same overhead is incurred unnecessarily, because the page table of a file is removed when the file is closed and must be rebuilt whenever it is opened again. To remove this duplicated overhead, we propose the mapping cache, a technique that does not delete a file's page table when the mapping of the file is released but saves it for reuse. We demonstrate that the mapping cache improves the performance of traditional file I/O by 2.8x and web server performance by 12%.
https://doi.org/10.5626/JOK.2016.43.5.524
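
The idea of keeping a page table across unmap and remap can be sketched in a few lines. This is a user-space analogy only, assuming a dictionary stands in for a page table; `MappingCache`, `map_file`, `unmap_file`, and the `builds` counter are hypothetical names, not the paper's kernel interface.

```python
class MappingCache:
    """Toy model: save a file's page table at unmap time and reuse it later."""

    def __init__(self):
        self._saved = {}   # file id -> page table kept after unmapping
        self.builds = 0    # how many times a table had to be built from scratch

    def _build_page_table(self, file_id, num_pages):
        # Stands in for the expensive per-page setup work.
        self.builds += 1
        return {vpn: (file_id, vpn) for vpn in range(num_pages)}

    def map_file(self, file_id, num_pages):
        table = self._saved.pop(file_id, None)
        if table is None or len(table) != num_pages:
            table = self._build_page_table(file_id, num_pages)
        return table

    def unmap_file(self, file_id, table):
        # Instead of discarding the table, keep it for the next mapping.
        self._saved[file_id] = table
```

Mapping the same file twice with an unmap in between leaves `builds` at 1, which is the duplicated work the abstract aims to eliminate.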

Dynamic Cache Management Scheme on Demand-Based FTL Considering Data Access Pattern (데이터 접근 패턴을 고려한 요구 기반 FTL 내 캐시의 동적 관리 기법)

Lee, Bit-Na;Song, Nae-Young;Koh, Kern
- Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.547-550 / 2011
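Flash memory is widely used in portable devices because of its low power consumption and high performance. The FTL is the software layer that manages data in flash, and it affects the performance of the whole flash device. Among FTLs, those that use page-level mapping are flexible and fast but have the drawback of a large address translation table. To address this, Demand-based FTL (DFTL) was proposed, which keeps only the mappings of frequently accessed regions in a mapping table cache. In DFTL, when the hit ratio of the CMT (Cache Mapping Table) drops, frequent flash memory accesses cause overhead. This is a problem even for ordinary sequential access, which is common. This paper therefore proposes a technique that predicts the access pattern of the storage device and prefetches the CMT entries that will be referenced. The proposed technique detects sequentiality in the access pattern, loads consecutive mapping addresses into the CMT in advance, and dynamically manages the number of mapping entries to read. In addition, to detect thrashing in the CMT, it analyzes whether evicted victim entries were later accessed and uses this information. In experiments, the technique showed an average 50% increase in hit ratio over conventional DFTL at the cost of a small space overhead.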
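
The sequential-prefetch part of this idea can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the class `PrefetchingCMT` is hypothetical, the prefetch depth is fixed rather than managed dynamically, the thrashing detection is omitted, and a plain dictionary stands in for the mapping pages stored in flash.

```python
from collections import OrderedDict


class PrefetchingCMT:
    """Toy LRU mapping cache that prefetches on sequential misses."""

    def __init__(self, capacity, flash_table, prefetch=4):
        self.capacity = capacity
        self.flash_table = flash_table   # lpn -> ppn, stands in for flash
        self.prefetch = prefetch         # fixed depth (assumption)
        self.cache = OrderedDict()
        self.last_lpn = None
        self.hits = 0
        self.misses = 0

    def _insert(self, lpn):
        if lpn in self.flash_table:
            self.cache[lpn] = self.flash_table[lpn]
            self.cache.move_to_end(lpn)
            while len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used

    def translate(self, lpn):
        sequential = self.last_lpn is not None and lpn == self.last_lpn + 1
        self.last_lpn = lpn
        if lpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(lpn)
            return self.cache[lpn]
        self.misses += 1
        if sequential:
            # Load the following entries first so the requested one stays newest.
            for nxt in range(lpn + 1, lpn + 1 + self.prefetch):
                self._insert(nxt)
        self._insert(lpn)
        return self.flash_table[lpn]
```

With a capacity of 8 and a prefetch depth of 4, a sequential scan of logical pages 0 to 15 misses 4 times in this model, whereas a cache without prefetching would miss on every access.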

A Multi-Level Flash Translation Layer for Large Capacity Solid State Drives

Kim, Yong-Seok
- Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.11-18 / 2021
The flash translation layer (FTL) of an SSD maps the logical page number requested by the host to the flash memory page number where the data is actually recorded. It is very important to reduce the amount of RAM used to manage the mapping information. Existing demand-based FTLs apply a two-level method in which the mapping information is itself recorded in flash memory pages and only the addresses of those pages are kept as a table in RAM. As SSD capacities grow to tens of terabytes, the RAM needed for this table becomes too large. This paper proposes ML-FTL, a method that manages mapping information in three levels to drastically reduce the amount of RAM required. In an evaluation, the increase in overhead compared with the conventional two-level method was minimal, thanks to proper use of a cache.
https://doi.org/10.9708/jksci.2021.26.02.011
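
A rough calculation shows why an extra level shrinks the RAM-resident table so sharply. The figures below are assumptions chosen for illustration (4 KiB flash pages, 4-byte mapping entries, a 16 TiB drive) and do not come from the paper; the function `ram_table_bytes` is likewise hypothetical.

```python
def ram_table_bytes(capacity_bytes, levels, page_size=4096, entry_size=4):
    """Size of the top-level table that must stay in RAM.

    Each extra level moves the previous table into flash pages and keeps
    only the addresses of those pages one level up.
    """
    entries_per_page = page_size // entry_size
    count = capacity_bytes // page_size          # one entry per data page
    for _ in range(levels - 1):
        count = -(-count // entries_per_page)    # ceiling division
    return count * entry_size


TiB = 2**40
full_table = ram_table_bytes(16 * TiB, levels=1)    # everything in RAM
two_level = ram_table_bytes(16 * TiB, levels=2)
three_level = ram_table_bytes(16 * TiB, levels=3)
```

Under these assumptions the full table is 16 GiB, the two-level RAM table is 16 MiB, and the three-level RAM table is 16 KiB; each level divides the RAM footprint by the number of entries per flash page.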

Analysis and Advice on Cache Algorithms of SSD FTL (SSD FTL 캐시 알고리즘 분석 및 제언)

Hyung Bong, Lee;Tae Yun, Chung
- KIPS Transactions on Computer and Communication Systems / v.12 no.1 / pp.1-8 / 2023
Because an already allocated page in an SSD cannot be overwritten, every write operation requires replacing the page with a clean one. To handle this, SSDs contain an internal flash translation layer (FTL) that maps the logical pages managed by the operating system's file system to the currently allocated physical pages. SSD pages discarded by write operations must be recycled through initialization, but because the number of initializations is limited, the FTL provides, in addition to its core page-mapping function, a caching function that reduces the number of writes. In this study, we focus on FTL cache methods that reduce the number of page writes, analyze the related algorithms, and propose a write-only cache strategy. In simulator experiments, the write-only cache showed an improvement of up to 29%.
https://doi.org/10.3745/KTCCS.2023.12.1.1

An Address Translation Technique Large NAND Flash Memory using Page Level Mapping (페이지 단위 매핑 기반 대용량 NAND플래시를 위한 주소변환기법)

Seo, Hyun-Min;Kwon, Oh-Hoon;Park, Jun-Seok;Koh, Kern
- Journal of KIISE:Computing Practices and Letters / v.16 no.3 / pp.371-375 / 2010
An SSD is a storage medium based on NAND flash memory. Because of its short latency, low power consumption, and resistance to shock, it is used not only in PCs but also in servers. Most SSDs use an FTL to overcome the erase-before-overwrite characteristic of NAND flash. There are several types of FTL, and the page-mapped FTL performs better than the others, but its usefulness is limited by the large memory footprint of its mapping table. For example, a 64GB MLC SSD requires 64MB of memory for the mapping table alone. In this paper, we propose a novel caching scheme for the mapping table. Using the metadata of the mapping table, we construct a fully associative cache and translate addresses in O(1) time. The simulation results show a hit ratio of more than 80% with a 32KB cache and 90% with a 512KB cache. The overall memory footprint was only 1.9% of 64MB. The time overhead of a cache miss was measured to be lower than 2% for most workloads.
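
The 64MB figure in the abstract is consistent with a simple estimate. The page size and entry size below (4 KiB and 4 bytes) are assumptions, since the abstract does not state them, and `page_map_bytes` is a hypothetical helper; binary units are used for the arithmetic.

```python
def page_map_bytes(capacity_bytes, page_size=4096, entry_size=4):
    """Size of a flat page-level mapping table: one entry per flash page."""
    return (capacity_bytes // page_size) * entry_size


GiB, MiB = 2**30, 2**20

table = page_map_bytes(64 * GiB)    # full mapping table for a 64 GiB drive
footprint = 0.019 * table           # the 1.9% footprint quoted in the abstract

table_mib = table / MiB
footprint_mib = round(footprint / MiB, 2)
```

With these assumed sizes the full table comes to 64 MiB, and 1.9% of it is roughly 1.2 MiB.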

Power Aware Buffer Cache (저전력 버퍼 캐시)

Lee, Min;Seo, Eui-Seong;Lee, Joon-Won
- Proceedings of the Korean Information Science Society Conference / 2005.07a / pp.766-768 / 2005
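As computing environments shift toward wireless and portable systems, power efficiency is becoming increasingly important. This is especially true for embedded systems, where memory has become the second largest contributor to total power consumption. To reduce power consumption in the memory system, the low-power mode of DRAM known as nap mode can be used. Nap mode consumes only 28% of the power of active mode. However, the hardware controller cannot exploit this feature efficiently without the cooperation of the operating system. This paper focuses on minimizing the number of active DRAM units. The operating system allocates physical pages so that a program can run with only the minimum number of units in active mode, placing unreferenced memory in nap mode. This can be seen as a generalized, system-wide extension of PAVM (Power Aware Virtual Memory) research. We consider all physical memory, and in particular the buffer cache, which on average uses half of total memory. Because of the size and importance of the buffer cache, the PAVM approach is not a complete solution unless the buffer cache is taken into account. In this paper we analyze how memory is used and propose a low-power page allocation policy. In particular, we consider pages mapped into process address spaces and the buffer cache. We analyze the interaction and relationship between these two kinds of pages and exploit that relationship to save power.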

Dynamic Prefetch Filtering Schemes to enhance Utilization of Data Cache (데이타 캐시의 활용도를 높이는 동적 선인출 필터링 기법)

Chon, Young-Suk;Kim, Suk-Il;Jeon, Joong-Nam
- Journal of KIISE:Computer Systems and Theory / v.35 no.1 / pp.30-43 / 2008
Memory reference instructions such as loads and stores are critical factors that limit processor performance. Prefetching is an effective way to reduce the latency of memory accesses. However, excessively aggressive prefetching pollutes the cache and cancels out the benefit of prefetching. In this study, four filtering schemes that consult a filtering table to decide dynamically whether to issue a prefetch, in order to reduce cache pollution, are compared and evaluated. First, a bi-state scheme is presented to analyze the lock problem of the conventional scheme; like the conventional scheme it uses N:1 mapping, but each entry holds a 1-bit value with two states. A complete state scheme is introduced as a reference for the comparative study. A block address lookup scheme, the main idea of this paper, is proposed and exhibits the most exact filtering performance. Its table has the same length as that of the bi-state scheme and each entry has the same fields as in the complete state scheme; the address of a recent, never-referenced data block is mapped 1:1 to an entry of the filter table. Experimental results on commonly used general benchmarks and multimedia programs show that the average cache miss ratio of the block address lookup scheme (BAL) is 10.5% lower than that of the conventional dynamic filter scheme (2-bitSC).

Design of an Asynchronous Data Cache with FIFO Buffer for Write Back Mode (Write Back 모드용 FIFO 버퍼 기능을 갖는 비동기식 데이터 캐시)

Park, Jong-Min;Kim, Seok-Man;Oh, Myeong-Hoon;Cho, Kyoung-Rok
- The Journal of the Korea Contents Association / v.10 no.6 / pp.72-79 / 2010
In this paper, we propose a data cache architecture with a write buffer for a 32-bit asynchronous embedded processor. The data cache consists of a CAM and data memory. It accelerates the data upload cycle between the processor and main memory, which improves processor performance. The proposed data cache has 8 KB of cache memory. It uses 4-way set-associative mapping with a line size of 4 words (16 bytes) and a pseudo-LRU replacement algorithm. A dirty register and a write buffer are used to implement the write policy. The designed data cache is synthesized to a gate-level design using a 0.13-µm process. Its average hit rate is 94%, and system performance is improved by 46.53%. The proposed data cache with write buffer is well suited to a 32-bit asynchronous processor.
https://doi.org/10.5392/JKCA.2010.10.6.072
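
The stated parameters (8 KB, 4-way set associative, 16-byte lines) fix the cache geometry. The sketch below works it out using the textbook tag/index/offset split; the 32-bit address width is an assumption based on the 32-bit processor, and because the paper uses a CAM its actual tag organization may differ. The function names are hypothetical.

```python
def cache_geometry(size_bytes=8 * 1024, ways=4, line_bytes=16, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


sets, offset_bits, index_bits, tag_bits = cache_geometry()
tag, index, offset = split_address(0x12345678, offset_bits, index_bits)
```

Under these assumptions the cache has 512 lines in 128 sets, so an address splits into a 21-bit tag, a 7-bit set index, and a 4-bit byte offset.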

Texture Cache with Automatical Index Splitting Based on Texture Size (텍스처의 크기에 따라 인덱스를 자동 분할하는 텍스처 캐시)

Kim, Jin-Woo;Park, Young-Jin;Kim, Young-Sik;Han, Tack-Don
- Journal of Korea Game Society / v.8 no.2 / pp.57-68 / 2008
Texture mapping is a technique for adding realism to images in a 3D graphics chip. Its bilinear filtering mode needs to access 4 texels to process one pixel. In this paper, we analyze the texture access pattern and propose a high-performance texture cache that can access 4 texels simultaneously. We evaluate it by simulation with 3D games (Quake 3, Unreal Tournament 2004). The simulation results show that the proposed texture cache performs well when its physical size is less than or equal to 8 KB.
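
The four texel reads per pixel come directly from how bilinear filtering works. The following is a generic software sketch of the filter, not the paper's hardware design; `bilinear_sample` is a hypothetical helper that takes coordinates in texel units and clamps at the texture edges.

```python
import math


def bilinear_sample(texture, u, v):
    """Blend the 4 texels surrounding (u, v); texture is a list of rows."""
    height, width = len(texture), len(texture[0])
    x0 = min(max(math.floor(u), 0), width - 1)
    y0 = min(max(math.floor(v), 0), height - 1)
    x1 = min(x0 + 1, width - 1)      # clamp at the right edge
    y1 = min(y0 + 1, height - 1)     # clamp at the bottom edge
    fx = min(max(u - x0, 0.0), 1.0)  # horizontal blend weight
    fy = min(max(v - y0, 0.0), 1.0)  # vertical blend weight

    # Four texel fetches: this is the access pattern a texture cache must serve.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


tex = [[0, 10],
       [20, 30]]
center = bilinear_sample(tex, 0.5, 0.5)
```

Sampling the centre of this 2x2 texture weights all four texels equally and yields 15.0. The four texels always form a 2x2 neighbourhood, which is why a cache that can deliver them in one access helps this mode.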

Search Result 19, Processing Time 0.023 seconds
