• Title/Summary/Keyword: 캐시 메모리

Search Result 243, Processing Time 0.029 seconds

A Enhanced Set-Associative Page Cache Scheme using Pollute Buffer (오염 버퍼를 적용한 집합 연상 페이지 캐시 기법)

  • An, Deukhyeon;Kim, Jeehong;Eom, Young Ik
    • Annual Conference of KIPS
    • /
    • 2012.11a
    • /
    • pp.241-242
    • /
    • 2012
  • 큰 데이터 트래픽을 일으키는 I/O 작업을 수행할 경우에 많은 디스크 접근과 데이터 처리가 발생하며 이는 컴퓨팅 성능의 하락을 일으킨다. 이를 위해 메모리와 디스크 사이에 버퍼 역할을 하는 페이지 캐시 기법이 사용된다. 그러나 LRU 를 사용하는 페이지 캐시의 특성상, 많은 양의 데이터가 한번만 접근되고 다시 사용되지 않는다면 성능상의 큰 효과가 없다. 본 논문에서는 집합 연상 페이지 캐시에 오염 버퍼를 둠으로써, 재사용되지 못하고 페이지 캐시의 크기만 커지는 현상을 최소화시켜 I/O 성능을 개선시킬 수 있는 방법을 제안한다.

A Multimedia Data Prefetching Based on 2 Dimensional Block Structure (이차원 블록 구조에 근거한 선인출 기법)

  • Kim, Seok-Ju
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.8
    • /
    • pp.1086-1096
    • /
    • 2004
  • In case of a multimedia application which deals with streaming data, in terms of cache management, cache loses its efficiency due to weak temporal locality of the data. This means that when data have been brought into cache, much of the data are supposed to be replaced without being accessed again during its service. However, there is a good chance that such multimedia data has a commanding locality in it. In this paper, to take advantage of the memory reference regularity which typically innates even in the multimedia data showing up its weak temporal locality, a method is suggested. The suggested method with the feature of dynamic regular-stride reference prefetching can identify for 2-dimensional array format(block pattern). The suggested method is named as block-reference-prediction-technique (BRPT) since it identifies a block pattern and place an address to be prefetched by the regulation of the block format. BRPT proved to be reassuring to reduce memory reference time significantly for applications having abundant block patterns although new rule has complicated the prefetching system even further.

  • PDF

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

  • Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
    • The KIPS Transactions:PartA
    • /
    • v.13A no.4 s.101
    • /
    • pp.305-316
    • /
    • 2006
  • Current rendering processors are organized mainly to process a triangle as fast as possible and recently parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, begin to appear. For high performance in processing triangles, it is desirable for each rasterizer have its own local pixel cache. However, the consistency problem may occur in accessing the data at the same address simultaneously by more than one rasterizer. In this paper, we propose a parallel rendering processor architecture resolving such consistency problem effectively. Moreover, the proposed architecture reduces the latency due to a pixel cache miss significantly. For the above two goals, effective memory organizations including a new pixel cache architecture are presented. The experimental results show that the proposed architecture achieves almost linear speedup at best case even in sixteen rasterizers.

Compact Field Remapping for Dynamically Allocated Structures (동적으로 할당된 구조체를 위한 압축된 필드 재배치)

  • Kim, Jeong-Eun;Han, Hwan-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.10
    • /
    • pp.1003-1012
    • /
    • 2005
  • The most significant difference of embedded systems from general purpose systems is that embedded systems are allowed to use only limited resources including battery and memory. Especially, the number of applications increases which deal with multimedia data. In those systems with high data computations, the delay of memory access is one of the major bottlenecks hurting the system performance. As a result, many researchers have investigated various techniques to reduce the memory access cost. Most programs generally have locality in memory references. Temporal locality of references means that a resource accessed at one point will be used again in the near future. Spatial locality of references is that likelihood of using a resource gets higher if resources near it were just accessed. The latest embedded processors usually adapt cache memory to exploit these two types of localities. Processors access faster cache memory than off-chip memory, reducing the latency. In this paper we will propose the enhanced dynamic allocation technique for structure-type data in order to eliminate unused memory space and to reduce both the cache miss rate and the application execution time. The proposed approach aggregates fields from multiple records dynamically allocated and consecutively remaps them on the memory space. Experiments on Olden benchmarks show $13.9\%$ L1 cache miss rate drop and $15.9\%$ L2 cache miss drop on average, compared to the previously proposed techniques. We also find execution time reduced by $10.9\%$ on average, compared to the previous work.

A Performance Improvement Scheme for a Wireless Internet Proxy Server Cluster (무선 인터넷 프록시 서버 클러스터 성능 개선)

  • Kwak, Hu-Keun;Chung, Kyu-Sik
    • Journal of KIISE:Information Networking
    • /
    • v.32 no.3
    • /
    • pp.415-426
    • /
    • 2005
  • Wireless internet, which becomes a hot social issue, has limitations due to the following characteristics, as different from wired internet. It has low bandwidth, frequent disconnection, low computing power, and small screen in user terminal. Also, it has technical issues to Improve in terms of user mobility, network protocol, security, and etc. Wireless internet server should be scalable to handle a large scale traffic due to rapidly growing users. In this paper, wireless internet proxy server clusters are used for the wireless Internet because their caching, distillation, and clustering functions are helpful to overcome the above limitations and needs. TranSend was proposed as a clustering based wireless internet proxy server but it has disadvantages; 1) its scalability is difficult to achieve because there is no systematic way to do it and 2) its structure is complex because of the inefficient communication structure among modules. In our former research, we proposed the All-in-one structure which can be scalable in a systematic way but it also has disadvantages; 1) data sharing among cache servers is not allowed and 2) its communication structure among modules is complex. In this paper, we proposed its improved scheme which has an efficient communication structure among modules and allows data to be shared among cache servers. We performed experiments using 16 PCs and experimental results show 54.86$\%$ and 4.70$\%$ performance improvement of the proposed system compared to TranSend and All-in-one system respectively Due to data sharing amount cache servers, the proposed scheme has an advantage of keeping a fixed size of the total cache memory regardless of cache server numbers. On the contrary, in All-in-one, the total cache memory size increases proportional to the number of cache servers since each cache server should keep all cache data, respectively.

Policy for Selective Flushing of Smartphone Buffer Cache using Persistent Memory (영속 메모리를 이용한 스마트폰 버퍼 캐시의 선별적 플러시 정책)

  • Lim, Soojung;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.1
    • /
    • pp.71-76
    • /
    • 2022
  • Buffer cache bridges the performance gap between memory and storage, but its effectiveness is limited due to periodic flush, performed to prevent data loss in smartphones. This paper shows that selective flushing technique with small persistent memory can reduce the flushing overhead of smartphone buffer cache significantly. This is due to our I/O analysis of smartphone applications in that a certain hot data account for most of file writes, while a large proportion of file data incurs single-writes. The proposed selective flushing policy performs flushing to persistent memory for frequently updated data, and storage flushing is performed only for single-write data. This eliminates storage write traffic and also improves the space efficiency of persistent memory. Simulations with popular smartphone application I/O traces show that the proposed policy reduces write traffic to storage by 24.8% on average and up to 37.8%.

A Buffer Replacement Policy using Hot Page Management Scheme for Improving Performance of Flash Memory (플래시 메모리 성능향상을 위한 핫 페이지 관리 기법을 이용한 버퍼교체 정책)

  • Daeyoung Kim;Junghan Kim;Hyun-jin Cho;Young Ik Eom
    • Annual Conference of KIPS
    • /
    • 2008.11a
    • /
    • pp.860-863
    • /
    • 2008
  • 플래시 메모리는 우리 생활에 널리 사용되고 있는 휴대용 저장장치 중의 하나이다. 빠른 입출력 속도와 저전력, 무소음, 작은 크기 등의 장점을 가지나 덮어쓰기가 불가능하고 읽기/쓰기의 속도에 비해 소거 연산의 속도가 매우 느리다는 단점이 있다. 이를 보완하기 위해, 호스트와 플래시 메모리 사이에 버퍼 캐시를 두어 사용하고 있으며, 버퍼 캐시에 사용되는 교체 정책에 따라 플래시 메모리 장치의 성능이 크게 영향을 받는다. 본 논문에서는 블록 단위의 LRU 기법의 단점을 개선한 HPLRU 기법을 제안한다. HPLRU 기법은 최근에 자주 참조되었던 페이지인 핫 페이지 들을 모아 리스트를 만들어 관리하고, 이를 통해 페이지 적중률을 향상시키고 다른 페이지들로 인해 핫 페이지들이 소거되는 현상을 개선하였다. 이 알고리즘은 임의 데이터 패턴에 좋은 성능을 보이며 쓰기 발생 횟수를 많이 감소시키는 결과를 보였다.

An Optimal Technic to Utilize Resource on Extended Web Cache Server (확장된 웹 캐시 서버에서 자원이용률 최적화 기법)

  • 김원기;김두상;김성락;구용완
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10e
    • /
    • pp.184-186
    • /
    • 2002
  • 대규모 웹 캐시 서버의 자원 이용도는 네트워크와 디스크 I/O 대기 시간에 주로 의존하고 또한 작업 부하 패턴에 있어 네트웍 사용이 폭주하는 시간과 새벽과 같은 한가한 시간간의 변동성이 심하다. 따라서, 한정된 자원범위에서 최상의 서비스를 제공키위해서는 절정기 동안 자원 이용도를 낮추고 이들 작업부하를 비절정기 때에 나누어 수행토록 함으로써 자원 활용도를 최대로 끌어 올리자는데, 연구의 목적이 있다 이를 위해 비절정기 동안 캐시압축 기법을 이용하여 디스크 입출력 작업을 미래예측 기법은 어느 점에서의 실제 작업 세트가 작았다는 것과 페이지 재사용 패턴의 정확한 예측은 물리적 메모리 크기의 캐시에서 높은 히트율을 생산할 것이라는 점을 보여주었다.

  • PDF

I/O Traffic based Task Classification for Shared Last Level Cache Utilization in NUMA Systems (NUMA 시스템의 공유 LLC 활용을 위한 I/O 트래픽에 따른 태스크 분류법)

  • An, Deukhyeon;Kim, Jihong;Eom, Young Ik
    • Annual Conference of KIPS
    • /
    • 2012.04a
    • /
    • pp.199-201
    • /
    • 2012
  • 디스크나 이더넷과 같은 I/O 장치로부터 발생하는 I/O 트래픽은, 여러 개의 노드를 가진 NUMA 시스템의 공유 LLC에 캐시 오염을 일으켜 캐시 라인이 재사용되는 것을 방해한다. 이러한 태스크는 캐시를 효율적으로 이용할 수 있는 메모리 집중적인 태스크들과 따로 분리하여 다룰 필요가 있다. 본 논문에서는 이러한 캐시 오염을 발생시키는 태스크들을 해당 태스크의 I/O 트래픽을 이용하여 실시간으로 감시하고 분류하는 기법을 제안한다. 또한 대량의 I/O 트래픽을 일으키는 태스크의 특성을 알아본다. 이를 통해, NUMA 시스템 환경에서 각 노드의 공유 LLC를 보다 효율적으로 사용할 수 있는 운영체제 스케줄링 기법을 연구하기 위한 토대를 마련하였다.

An L1 Cache Prefetching Scheme using Excessively Aggressive Prefetchering and a Small Direct-mapped Filtering Cache (공격적인 선인출 및 직접 사상 필터링을 이용한 L1 캐시 선인출 기법)

  • Chon, Young-Suk
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.11
    • /
    • pp.836-852
    • /
    • 2006
  • This paper proposes an L1 cache prefetch scheme using an excessively aggressive hardware prefetcher and a hardware prefetch filter having a small direct-mapped filtering cache. A quantitative analysis method has been introduced and applied to analyze nonideal effects of aggressive cache prefetching. From those analysis results, the structure and algorithm of a prefetch filter has been derived and simulated, and the overall system performance has been measured using a cycle-by-cycle cache simulator. Experimental results show that the proposed scheme improves the overall system performance by 18% on the average over several benchmarks