• Title/Summary/Keyword: Buffer cache

Implementation of Memory Copy Reduction Scheme for Networked Multimedia Service in Linux (리눅스 커널에서 네트워크 멀티미디어 서비스를 위한 메모리 복사 감소 기법 구현)

  • Kim, Jeong-Won
    • The Journal of Korean Institute of Communications and Information Sciences, v.28 no.2B, pp.129-137, 2003
  • Multimedia streams such as MPEG retrieve data continuously because playback is uninterrupted. These streams need efficient kernel support, but the buffer cache mechanism of the Linux kernel, like that of other Unix-like operating systems, was designed for small files that are requested aperiodically and are not time-critical. For continuous media, the CPU must copy large amounts of data from kernel address space to user address space, which incurs a large CPU overhead. This overhead degrades system throughput and makes it impossible to guarantee QoS. In this paper, we design and implement two memory copy reduction schemes in the Linux kernel: direct I/O and one copy. Direct I/O bypasses the buffer cache layer of the Linux kernel and dramatically reduces CPU memory copy overhead. One copy provides a fast disk-to-network data path without copying to user address space. Experimental results show a considerable reduction in CPU overhead and improved throughput.
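The difference between the conventional data path and a path that avoids the user-space copy can be illustrated from user space. The sketch below is not the paper's kernel implementation: it uses the standard `sendfile(2)` system call (through Python's `os.sendfile`) as a stand-in for a disk-to-network path that skips user address space, and contrasts it with a plain `read`/`send` loop. The function names, the `demo` helper, and the socket pair used in place of a real network connection are illustrative assumptions.

```python
import os
import socket
import tempfile


def send_with_user_copy(src_fd, sock, count, chunk=4096):
    """Conventional path: read() copies kernel -> user, sendall() copies user -> kernel."""
    sent = 0
    while sent < count:
        data = os.read(src_fd, min(chunk, count - sent))
        if not data:
            break
        sock.sendall(data)
        sent += len(data)
    return sent


def send_without_user_copy(src_fd, sock, count):
    """sendfile(2) path: the kernel moves file data to the socket without a user buffer."""
    sent = 0
    while sent < count:
        n = os.sendfile(sock.fileno(), src_fd, sent, count - sent)
        if n == 0:
            break
        sent += n
    return sent


def demo(payload=b"frame" * 1000):
    """Send the same file over a local socket pair with both paths (hypothetical helper)."""
    results = []
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        for sender in (send_with_user_copy, send_without_user_copy):
            a, b = socket.socketpair()
            with a, b:
                os.lseek(f.fileno(), 0, os.SEEK_SET)  # rewind for the read() path
                n = sender(f.fileno(), a, len(payload))
                a.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
                received = b""
                while True:
                    part = b.recv(65536)
                    if not part:
                        break
                    received += part
                results.append((n, received == payload))
    return results
```

Both paths deliver identical bytes; the second simply never materializes the data in a user-space buffer. Direct I/O (for example `O_DIRECT` on Linux) is a separate mechanism with alignment requirements and is not shown here.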

Design and Performance Analysis of Caching Algorithms for Distributed Non-uniform Objects (분산 이질형 객체 환경에서 캐슁 알고리즘의 설계 및 성능 분석)

  • Bahn, Hyo-Kyung;Noh, Sam-Hyeok;Min, Sang-Lyul;Koh, Kern
    • Journal of KIISE:Computer Systems and Theory, v.27 no.6, pp.583-591, 2000
  • Caching mechanisms have been studied extensively as a way to bridge the speed gap between levels of hierarchical storage, in the context of cache memory, paging systems, and buffer management systems. As wide-area distributed environments such as the WWW expand, caching of remote objects becomes increasingly important. In these environments, the cost and benefit of caching an object are not uniform because they depend on the object's location, and this non-uniformity should be considered in cache replacement algorithms. For online operation, the time complexity of the replacement algorithm should also not be excessive. To date, most replacement algorithms for wide-area distributed environments fail to satisfy both the non-uniformity requirement and the time complexity constraint. This paper proposes a replacement algorithm that properly accounts for the non-uniformity of objects and also allows an efficient implementation. Trace-driven simulations show that the proposed algorithm outperforms existing replacement algorithms.

Segment-based Buffer Management for Multi-level Streaming Service in the Proxy System (프록시 시스템에서 multi-level 스트리밍 서비스를 위한 세그먼트 기반의 버퍼관리)

  • Lee, Chong-Deuk
    • Journal of the Korea Society of Computer and Information, v.15 no.11, pp.135-142, 2010
  • QoS in a proxy system is heavily affected by interference such as congestion, latency, and retransmission. Multi-level streaming services are also affected by temporal synchronization, which degrades service quality. This paper proposes a new segment-based buffer management mechanism that reduces the performance degradation of streaming services and improves streaming throughput despite these drawbacks of the proxy system. The proposed mechanism optimizes streaming services through 1) segment-based buffer management, 2) minimization of overhead due to congestion and interference, and 3) minimization of retransmission due to disconnection and delay. It uses a fuzzy value $\mu$ and a cost weight $\omega$ to process the result. Simulation results show that the proposed mechanism outperforms the existing fixed, pyramid, and skyscraper segmentation methods in buffer cache control rate, average packet loss rate, and delay saving rate under the stream relevance metric.

Enhancing LRU Buffer Replacement Policy with Delayed Write of Not-cold-dirty-pages for Flash Memory (플래시 메모리를 위한 Not-cold-Page 쓰기지연을 통한 LRU 버퍼교체 정책 개선)

  • Jung Ho-Young;Park Sung-Min;Cha Jae-Hyuk;Kang Soo-Yong
    • Journal of KIISE:Computer Systems and Theory, v.33 no.9, pp.634-641, 2006
  • Flash memory has many advantages, such as non-volatility and fast I/O speed, but it also has disadvantages, such as the inability to update data in place and asymmetric read/write/erase speeds. For the performance of flash memory storage, it is essential that the buffer replacement algorithm reduce the number of write operations, which in turn affects the number of erase operations. This paper proposes a new buffer replacement algorithm that delays the writes of not-cold dirty pages in the buffer cache of flash storage. We show that this algorithm effectively decreases the number of write and erase operations without much degradation of the hit ratio. As a result, the overall I/O performance of flash storage improves.
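The idea of delaying writes of not-cold dirty pages can be sketched as a small variation on LRU. The abstract does not specify how coldness is tracked, so the following is only one possible reading, under stated assumptions: a page is treated as cold until it is referenced again after being loaded, and a not-cold dirty page found at the LRU end gets one more pass through the list (and is marked cold) instead of being written back immediately. The class name `DelayedWriteLRU`, the `cold` flag, and the `flash_writes` counter are hypothetical.

```python
from collections import OrderedDict


class DelayedWriteLRU:
    """LRU buffer that defers write-back of dirty pages that are not yet cold (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # page_id -> {"dirty": bool, "cold": bool}; least recently used entry first
        self.pages = OrderedDict()
        self.flash_writes = 0  # write-backs issued to flash on eviction

    def access(self, page_id, write=False):
        """Reference a page; returns True on a buffer hit, False on a miss."""
        if page_id in self.pages:
            state = self.pages[page_id]
            state["dirty"] = state["dirty"] or write
            state["cold"] = False  # re-referenced, so no longer cold
            self.pages.move_to_end(page_id)
            return True
        if len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page_id] = {"dirty": write, "cold": True}
        return False

    def _evict(self):
        """Evict from the LRU end, skipping not-cold dirty pages once."""
        while True:
            victim, state = next(iter(self.pages.items()))
            if state["dirty"] and not state["cold"]:
                # Delay the write: give the page another pass, but mark it cold
                # so the loop is guaranteed to terminate.
                state["cold"] = True
                self.pages.move_to_end(victim)
                continue
            del self.pages[victim]
            if state["dirty"]:
                self.flash_writes += 1
            return
```

With a capacity of two, writing page 1 twice, reading page 2, and then reading page 3 evicts the clean page 2 rather than the dirty page 1, so no flash write occurs at that point; plain LRU would have written page 1 back.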

Hybrid Value Predictor in Wide-Issue Superscalar Processor (슈퍼스칼라 프로세서에서 명령 윈도우 크기에 따른 혼합형 값 예측기)

  • Jeon, Byoung-Chan;Choi, Gyoo-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.9 no.2, pp.97-103, 2009
  • This paper evaluates the performance of a hybrid value predictor as a function of the instruction fetch rate and instruction window size in superscalar processors. In general, data dependencies among instructions increase with the number of fetched instructions. The value predictor is therefore expected to perform better as the instruction fetch rate increases. Performance is studied for a machine with a collapsing buffer and for one with a trace cache as the instruction fetch mechanism. The experiments, which measure IPC and prediction rate across instruction window sizes with and without a trace cache (tc and non-tc), show that the benefit of the value predictor grows as the instruction fetch rate increases.

Implementation and Performance Analysis of Event Processing and Buffer Managing Techniques for DDS (고성능 데이터 발간/구독 미들웨어의 이벤트, 버퍼 처리 기술 및 성능 분석)

  • Yoon, Gunjae;Choi, Hoon
    • Journal of KIISE, v.44 no.5, pp.449-459, 2017
  • Data Distribution Service (DDS) is a communication middleware that supports flexible, scalable, real-time communication. This paper describes several techniques to improve the performance of DDS middleware. Detailed events for the internal behavior of the middleware are defined. To reduce processing complexity, a DDS message is disassembled into several submessages, each an independent, meaningful unit, for event-driven structuring. The proposed technique for history cache management is also described. It exploits the fact that status access and random access to the history cache occur more frequently in DDS. These methods have been implemented in EchoDDS, the DDS implementation developed by our team, where they showed improved performance.

Dual Cache System Based on the Locality Decision Mechanism (지역성 결정 메커니즘을 기반으로 한 이중 캐쉬 시스템)

  • Lee, Jeong-Hun;Lee, Jang-Su;Kim, Sin-Deok
    • Journal of KIISE:Computer Systems and Theory, v.27 no.11, pp.908-918, 2000
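  • The most effective way to improve cache performance is to exploit the temporal and spatial locality inherent in program execution. This paper proposes a new cache system that effectively reflects both types of locality using only the structural characteristics of the cache and a simple mechanism, without additional hardware or compiler support. The proposed cache system consists of two caches with different block sizes and different associativities: a direct-mapped cache that supports a small block size and a fully associative buffer that supports large blocks. A large block consists of several small blocks. When a miss occurs in both caches, the small block that missed in the direct-mapped cache and its neighboring small blocks are stored in the fully associative buffer, which effectively exploits spatial locality, that is, the high probability that the neighbors of a referenced block will be referenced. In addition, referenced blocks are selectively stored as small blocks in the direct-mapped cache using a control bit, which exploits temporal locality more effectively. According to simulation results, the proposed system outperforms a conventional direct-mapped cache four times its size and a victim cache of the same size, and it reduces power consumption by about 5%.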
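The interaction between the two structures can be sketched with a toy address-level model. This is an illustration under assumptions, not the paper's design: the abstract does not say when referenced small blocks are copied into the direct-mapped cache, so the sketch assumes this happens when their large block is evicted from the fully associative buffer, and it assumes LRU replacement in the buffer. The class name `DualCache`, the default sizes, and the per-block set of referenced small blocks (standing in for the control bits) are hypothetical.

```python
from collections import OrderedDict


class DualCache:
    """Toy model: direct-mapped cache of small blocks plus a fully associative
    buffer of large blocks (tracks block numbers only, not data)."""

    def __init__(self, dm_sets=4, buf_entries=2, small=8, large=32):
        self.small, self.large = small, large
        self.dm_sets = dm_sets
        self.dm = [None] * dm_sets  # each slot holds one small-block number
        # large-block number -> set of referenced small-block numbers (control bits)
        self.buf = OrderedDict()
        self.buf_entries = buf_entries

    def access(self, addr):
        small_block = addr // self.small
        large_block = addr // self.large
        if self.dm[small_block % self.dm_sets] == small_block:
            return "dm_hit"  # temporal locality served by the direct-mapped cache
        if large_block in self.buf:
            self.buf[large_block].add(small_block)  # set the control bit
            self.buf.move_to_end(large_block)
            return "buffer_hit"  # spatial locality served by the buffer
        if len(self.buf) >= self.buf_entries:
            # Assumed policy: on eviction of the LRU large block, copy only the
            # small blocks that were actually referenced into the direct-mapped cache.
            _, referenced = self.buf.popitem(last=False)
            for block in referenced:
                self.dm[block % self.dm_sets] = block
        # Miss in both: fetch the missed small block together with its neighbors.
        self.buf[large_block] = {small_block}
        return "miss"
```

With a one-entry buffer, accessing addresses 0, 8, 32, and 0 in turn yields a miss, a buffer hit on the neighboring small block, a miss that evicts the first large block, and then a direct-mapped hit on the promoted small block.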

An Industrial Case Study of the ARM926EJ-S Power Modeling

  • Kim, Hyun-Suk;Kim, Seok-Hoon;Lee, Ik-Hwan;Yoo, Sung-Joo;Chung, Eui-Young;Choi, Kyu-Myung;Kong, Jeong-Taek;Eo, Soo-Kwan
    • JSTS:Journal of Semiconductor Technology and Science, v.5 no.4, pp.221-228, 2005
  • In this work, our goal is to develop a fast and accurate power model of the ARM926EJ-S processor in an industrial design environment. Whereas existing work on processor power modeling focuses on the power states of the processor core, our model focuses mostly on cache power. It achieves more than 93% accuracy and a 1600-fold speedup compared with post-layout gate-level power estimation. We also address two practical issues in applying the processor power model to a real design environment. One is incorporating the power model into an existing commercial instruction set simulator. The other is re-characterizing the power model parameters to cope with different gate-level netlists of the processor obtained from different design teams and different fabrication technologies.

Analytical Models and their Performance Analysis of Superscalar Processors (수퍼스칼라 프로세서의 해석적 모델 및 성능 분석)

  • Kim, Hak-Jun;Kim, Seon-Mo;Choe, Sang-Bang
    • Journal of KIISE:Computer Systems and Theory, v.26 no.7, pp.847-862, 1999
  • This research presents a novel analytic model that predicts the instruction execution rate of superscalar processors using a synchronous queueing model with finite buffers. The proposed model can also analyze the performance relationship between the cache and the pipeline. It takes into account architectural parameters such as instruction-level parallelism, the frequency of branch instructions, the accuracy of branch prediction, and cache misses. To validate the model, we performed extensive simulations and compared the results with the analytic model. In most cases, the model's estimate of the average execution rate agreed with the simulation results to within 10%. The proposed model can identify the causes of performance bottlenecks that simulation alone cannot uncover. It also shows the effect of cache misses on the performance of out-of-order issue superscalar processors, which provides valuable information for designing a balanced system.

Instructions and Data Prefetch Mechanism using Displacement History Buffer (변위 히스토리 버퍼를 이용한 명령어 및 데이터 프리페치 기법)

  • Jeong, Yong Su;Kim, JinHyuk;Cho, Tae Hwan;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers, v.52 no.10, pp.82-94, 2015
  • In this paper, we propose a hardware prefetch mechanism with an efficient cache replacement policy that produces a spatial region using a displacement field and gives priority to the trigger block of that spatial region. Because the history is recorded relative to the trigger block, the mechanism can take program order into account, and because the history is stored as displacement values, it can quickly compute the instruction or data addresses to prefetch by adding the stored displacement to the trigger address. We also propose a cache replacement policy that, after giving priority to the trigger block, replaces a randomly chosen low-priority block when the cache area is full. We evaluated the hardware prefetcher using the gem5 simulator and the PARSEC benchmark. Compared with an existing hardware prefetcher that generates the spatial region using a bit vector, the L1 data cache miss rate was reduced by about 44.5% on average and L1 instruction cache misses by 26.1% on average. In addition, IPC (instructions per cycle) improved by about 23.7% on average.