• Title/Summary/Keyword: Cache utilization

Cache-Friendly Adaptive Video Streaming Framework Exploiting Regular Expression in Content Centric Networks (콘텐트 중심 네트워크에서 정규표현식을 활용한 캐시친화적인 적응형 스트리밍 프레임워크)

  • Son, Donghyun;Choi, Daejin;Choi, Nakjung;Song, Junghwan;Kwon, Ted Taekyoung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.9
    • /
    • pp.1776-1785
    • /
    • 2015
  • Content Centric Network (CCN) has been introduced as a new paradigm because users' perspective on using the Internet has shifted from host-centric to content-centric. At the same time, demand for video streaming has been increasing, so adaptive streaming has been introduced and researched to achieve higher user satisfaction. If the Internet architecture is replaced with the CCN architecture, adaptive video streaming in CCN must be considered to meet user demand. However, if the same rate decision algorithm used in the Internet is deployed in CCN, it makes limited use of the content store (CS) in CCN routers and cannot reflect dynamic requirements. Therefore, this paper presents a framework suited to the CCN protocol and to cache utilization, which applies a content naming method that exploits regular expressions to the rate decision algorithm of existing adaptive streaming. The framework also improves the quality of video streaming through dynamic expression strategies and an algorithm for selecting among them, and its performance is verified.
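The following minimal Python sketch makes the naming idea concrete. It rests on three assumptions: content names follow a hypothetical layout `/video/<title>/seg<k>/<bitrate>`, the content store is a plain dict, and an Interest carries a regular expression instead of one exact name. The `lookup` function and its preference for the highest matching bitrate are illustrative choices. They are not the framework's actual rate decision algorithm.

```python
import re

# Content store (CS) modeled as a dict: content name -> payload.
# Hypothetical naming scheme: /video/<title>/seg<k>/<bitrate in kbps>
content_store = {
    "/video/movie/seg5/1000": b"...",
    "/video/movie/seg5/3000": b"...",
    "/video/movie/seg6/1000": b"...",
}

def bitrate_of(name):
    """Extract the trailing bitrate component of a content name."""
    return int(name.rsplit("/", 1)[1])

def lookup(cs, interest_pattern):
    """Return the cached name that fully matches the regex Interest.

    Among several matches, prefer the highest bitrate (an assumed policy).
    Returns None on a CS miss, where the Interest would be forwarded upstream.
    """
    regex = re.compile(interest_pattern)
    matches = [name for name in cs if regex.fullmatch(name)]
    return max(matches, key=bitrate_of, default=None)

# The client accepts any of three bitrates for segment 5.
print(lookup(content_store, r"/video/movie/seg5/(1000|2000|3000)"))
```

With exact-name Interests, a request for `/video/movie/seg5/2000` would miss in this content store, even though two acceptable renditions of segment 5 are cached. The regex lets the router serve one of them.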

Comprehensive Investigations on QUEST: a Novel QoS-Enhanced Stochastic Packet Scheduler for Intelligent LTE Routers

  • Paul, Suman;Pandit, Malay Kumar
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.2
    • /
    • pp.579-603
    • /
    • 2018
  • In this paper we propose QUEST, a QoS-enhanced, intelligent, stochastic, optimal, fair, real-time packet scheduler for 4G LTE traffic in routers. The objective of this research is to maximize system QoS subject to the constraint that processor utilization is kept near 100 percent. QUEST has the following unique advantages:
    • It solves the challenging problem of starvation of low-priority processes (buffered streaming video and TCP-based traffic).
    • It removes a major bottleneck of the Earliest Deadline First (EDF) scheduler, namely EDF's failure at heavy loads.
    • It allows the process utilization ratio to be pre-programmed arbitrarily.
    Three classes of multimedia 4G LTE QCI traffic (conversational voice, live streaming video, and buffered streaming video) and TCP-based applications have been considered. We analyse the two most important QoS metrics, packet loss rate (PLR) and mean waiting time. All claims are supported by discrete event and Monte Carlo simulations. The simulation results show that QUEST outperforms current state-of-the-art benchmark schedulers. It offers a 37 percent improvement in PLR and a 23 percent improvement in mean waiting time over the best competing current scheduler, Accuracy-aware EDF.
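The abstract does not describe QUEST's internals, so the sketch below shows a generic lottery-style stochastic scheduler instead. It illustrates two properties the abstract names. First, any class with a positive weight has a nonzero chance of service in every slot, so no class starves. Second, the long-run service shares follow pre-programmed weights. The class names, weights, and the `pick_class` and `simulate` functions are hypothetical.

```python
import random
from collections import deque

def pick_class(queues, weights, rng):
    """Lottery-style choice: pick a non-empty queue with probability
    proportional to its (hypothetical) pre-programmed weight.

    Any class with a positive weight has a nonzero chance of service in every
    slot, so low-priority traffic cannot be starved indefinitely.
    """
    ready = [c for c, q in queues.items() if q]
    if not ready:
        return None
    return rng.choices(ready, weights=[weights[c] for c in ready], k=1)[0]

def simulate(weights, slots, seed=0):
    """Serve one packet per slot from saturated queues; count service per class."""
    rng = random.Random(seed)
    # Each class starts with `slots` packets, so no queue ever empties.
    queues = {c: deque(range(slots)) for c in weights}
    served = {c: 0 for c in weights}
    for _ in range(slots):
        c = pick_class(queues, weights, rng)
        queues[c].popleft()
        served[c] += 1
    return served

# Hypothetical target shares for four traffic classes.
weights = {"voice": 0.4, "live_video": 0.3, "buffered_video": 0.2, "tcp": 0.1}
served = simulate(weights, slots=10_000)
print({c: round(n / 10_000, 2) for c, n in served.items()})
```

Under saturation the printed shares land close to the configured weights. This sketch has no notion of deadlines, so it does not model the real-time side of QUEST.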

Optimization of LU-SGS Code for the Acceleration on the Modern Microprocessors

  • Jang, Keun-Jin;Kim, Jong-Kwan;Cho, Deok-Rae;Choi, Jeong-Yeol
    • International Journal of Aeronautical and Space Sciences
    • /
    • v.14 no.2
    • /
    • pp.112-121
    • /
    • 2013
  • An approach for composing performance-optimized computational code is suggested for the latest microprocessors. The optimization concept, termed localization, maximizes the utilization of the second-level cache common to all the latest computer systems and minimizes access to system main memory. In this study, localized optimization of the LU-SGS (Lower-Upper Symmetric Gauss-Seidel) code for solving fluid dynamic equations was carried out at three different levels. It was tested on several microprocessor architectures widely used today. The localized optimization achieved a remarkable performance gain. It computed the solution more than two times faster than the baseline algorithm and produced exactly the same solution on the same computer system.
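Localization resembles loop blocking (tiling) combined with loop fusion. Instead of streaming whole arrays through memory once per step, each step is applied to one cache-sized block before moving on. The Python sketch below has several limits:
- It only shows the restructuring. Any speedup would appear in a compiled language such as C or Fortran.
- It uses a simple element-wise relaxation update, not the actual LU-SGS sweeps.
- Its block size of 4096 elements (32 KiB of 8-byte floats) is hypothetical.

Blocking is legal here because each element is updated independently. The data dependences of real Gauss-Seidel sweeps would constrain the block order.

```python
def two_pass(u, f, omega):
    """Baseline: two full sweeps over the arrays (residual, then update)."""
    n = len(u)
    r = [f[i] - u[i] for i in range(n)]               # pass 1 streams u and f
    return [u[i] + omega * r[i] for i in range(n)]    # pass 2 streams u and r again

def blocked(u, f, omega, block=4096):
    """Localized variant: both steps are applied to one block before the next.

    In a compiled language, a block of this (hypothetical) size would stay in
    cache between the residual and update steps, avoiding a second trip to
    main memory.
    """
    n = len(u)
    out = [0.0] * n
    for start in range(0, n, block):
        stop = min(start + block, n)
        r = [f[i] - u[i] for i in range(start, stop)]  # block-local residual
        for k, i in enumerate(range(start, stop)):
            out[i] = u[i] + omega * r[k]               # update while block is hot
    return out

u = [float(i) for i in range(10_000)]
f = [2.0 * i for i in range(10_000)]
# Same arithmetic per element, so the results are bit-for-bit identical.
assert two_pass(u, f, 0.5) == blocked(u, f, 0.5, block=512)
```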

Localized Composition of LU-SGS Code on Latest Microprocessors (최신 마이크로프로세서상에서 LU-SGS 코드의 국소화 작성)

  • Choi Jeong-Yeol
    • Proceedings of the Korean Society for Computational Fluids Engineering Conference (한국전산유체공학회:학술대회논문집)
    • /
    • 2001.05a
    • /
    • pp.45-50
    • /
    • 2001
  • An approach for composing performance-optimized computational code is suggested for the latest microprocessors. The approach, named localization, minimizes access to the system's main memory and maximizes the utilization of the second-level cache common to all the latest computer systems. Localized compositions of the LU-SGS scheme for fluid dynamics were made at three different levels and tested on three of the most widely used microprocessor architectures. The localization concept showed remarkable performance. It produced solutions up to 4.5 times (450%) faster than the baseline algorithm while producing exactly the same solution.

An Effective Pre-refresh Mechanism for Embedded Web Browser of Mobile Handheld Devices

  • Li Huaqiang;Kim Young-Hak;Kim Tae-Hyung
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.12
    • /
    • pp.1754-1764
    • /
    • 2004
  • Lately, mobile handheld devices such as Personal Digital Assistants (PDAs) and cellular phones have become increasingly popular for personal web surfing. However, most of today's mobile handheld devices have relatively poor web browsing capability because of their low performance. As a result, their users suffer longer communication latency than users of desktop Personal Computers (PCs). In this paper, we propose an effective pre-refresh mechanism for the embedded web browsers of mobile handheld devices to mitigate this problem. The proposed mechanism uses idle time to pre-refresh expired web objects in the embedded web browser's cache memory. This increases the utilization of Central Processing Unit (CPU) power and network bandwidth during idle time and consequently reduces the client's latency and web browsing cost. We evaluated the mechanism with a simulator of our own design. The experimental results show that it makes web surfing faster.
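Here is a minimal sketch of idle-time pre-refresh, assuming a simple expiry-based cache. The abstract does not specify the selection policy, so the following are all hypothetical:
- `BrowserCache`
- the `budget` parameter
- the `fetch` callable, which stands in for an HTTP request
- the oldest-expiry-first refresh order

```python
import time

class BrowserCache:
    """Toy browser cache mapping url -> (body, expires_at).

    Hypothetical design: `fetch` stands in for a network request, and every
    object gets the same time-to-live.
    """

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self.entries = {}

    def get(self, url):
        """Serve from cache if fresh; otherwise fetch now (user-visible latency)."""
        entry = self.entries.get(url)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        body = self.fetch(url)
        self.entries[url] = (body, self.clock() + self.ttl)
        return body

    def pre_refresh(self, budget):
        """Call when idle: re-fetch up to `budget` expired entries,
        earliest expiry first (an assumed policy). Returns how many were refreshed."""
        now = self.clock()
        expired = sorted((exp, url) for url, (_, exp) in self.entries.items() if exp <= now)
        for _, url in expired[:budget]:
            self.entries[url] = (self.fetch(url), now + self.ttl)
        return len(expired[:budget])

# Usage with a fake clock and a fetch function that records calls.
now = [0.0]
fetched = []

def fake_fetch(url):
    fetched.append(url)
    return "body of " + url

cache = BrowserCache(fake_fetch, ttl=10.0, clock=lambda: now[0])
cache.get("/a"); cache.get("/b")   # two cold misses -> two fetches
now[0] = 20.0                      # both entries have now expired
cache.pre_refresh(budget=1)        # idle time: refresh "/a" in the background
cache.get("/a")                    # served from cache, no user-visible fetch
```

A real browser would more likely revalidate with a conditional request (for example `If-Modified-Since`) than refetch the full object.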

Memory Latency Hiding Techniques (메모리 지연을 감추는 기법들)

  • Ki, An-Do
    • Electronics and Telecommunications Trends
    • /
    • v.13 no.3 s.51
    • /
    • pp.61-70
    • /
    • 1998
  • The obvious way to make a computer system more powerful is to make the processor as fast as possible. The next step is to use a large number of such fast processors. Such a multiprocessor system is useful only if it distributes the workload evenly and keeps its processors fully utilized. To achieve higher processor utilization, memory access latency must be reduced as much as possible, and whatever latency remains must be hidden. The actual latency can be reduced with faster logic, and the effective latency can be reduced with caches. This article explains the memory latency problem and uses analytical and simulation results to show how serious it is. It then surveys existing techniques for coping with it, such as write buffers, relaxed consistency models, multithreading, data locality optimization, data forwarding, and data prefetching.
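Two textbook models quantify the cost of latency. The first is average memory access time (AMAT = hit time + miss rate × miss penalty). The second is a simple utilization model in which a single-threaded processor alternates between computing and stalling. The numbers below are hypothetical. They only show how a cache shrinks the effective latency the processor sees.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

def utilization(compute_cycles, stall_cycles):
    """Busy fraction of a processor that computes for `compute_cycles` and then
    stalls for `stall_cycles` on each memory reference, with no latency hiding."""
    return compute_cycles / (compute_cycles + stall_cycles)

# Hypothetical numbers: 4 cycles of work between memory references,
# 100-cycle main-memory latency, 1-cycle cache hit, 5% miss rate.
no_cache = utilization(4, 100)
with_cache = utilization(4, amat(1, 0.05, 100))
print(f"no cache: {no_cache:.3f}, with cache: {with_cache:.3f}")
```

Even with a cache, the processor is idle most of the time in this model. Hiding the remaining latency targets exactly that gap, through techniques such as multithreading and prefetching.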

Dynamic Prefetch Filtering Schemes to Enhance Utilization of Data Cache (데이터 캐시의 활용도를 높이는 동적 선인출 필터링 기법)

  • 전영숙;이병권;김석일;전중남
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.562-564
    • /
    • 2004
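  • Cache prefetching is an effective way to reduce the latency of memory references. However, overly aggressive prefetching causes cache pollution, which offsets the benefits of prefetching. To reduce cache pollution, this study compares and evaluates four filtering methods that dynamically consult a filter table to decide whether to execute a prefetch instruction. We propose an ideal filtering structure for the comparative study. We also propose a binary-state structure to alleviate the lock-up problem of previous work, and a block-address reference scheme for fine-grained filtering. We ran experiments on commonly used general-purpose benchmark programs and multimedia benchmark programs. Compared with the baseline, the cache miss rate decreased by an average of 5.6% with the binary-state structure and 7.9% with the block-address reference structure.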
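The sketch below illustrates one way a binary-state filter table indexed by block address could work. Several parts are assumptions:
- the table size and block size
- the method names
- the unlock rule in `demand_miss`

The abstract does not say how the proposed structure avoids the lock-up of earlier schemes. The unlock rule here is one plausible choice, not the paper's.

```python
class PrefetchFilter:
    """Hypothetical one-bit-per-entry prefetch filter indexed by block address.

    bit 1: allow prefetches for blocks that map to this entry.
    bit 0: the last prefetch here was evicted before use, so suppress prefetches.
    """

    def __init__(self, entries=1024, block_bytes=64):
        self.entries = entries
        self.block_bytes = block_bytes
        self.bits = [1] * entries  # start permissive: every prefetch is allowed

    def _index(self, addr):
        # Block address modulo table size; distinct blocks may alias to one entry.
        return (addr // self.block_bytes) % self.entries

    def should_prefetch(self, addr):
        return self.bits[self._index(addr)] == 1

    def prefetched_block_used(self, addr):
        self.bits[self._index(addr)] = 1

    def prefetched_block_evicted_unused(self, addr):
        self.bits[self._index(addr)] = 0  # useless prefetch polluted the cache

    def demand_miss(self, addr):
        # Assumed unlock rule: a demand miss to a filtered block suggests the
        # prefetch would have helped, so re-enable the entry instead of
        # leaving it locked at 0 forever.
        if not self.should_prefetch(addr):
            self.bits[self._index(addr)] = 1
```

Without some rule like `demand_miss`, an entry cleared to 0 could never return to 1. Its prefetches are suppressed, so it would never again see a useful prefetch.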

I/O Traffic based Task Classification for Shared Last Level Cache Utilization in NUMA Systems (NUMA 시스템의 공유 LLC 활용을 위한 I/O 트래픽에 따른 태스크 분류법)

  • An, Deukhyeon;Kim, Jihong;Eom, Young Ik
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.199-201
    • /
    • 2012
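  • I/O traffic from I/O devices such as disks and Ethernet causes cache pollution in the shared LLC of a multi-node NUMA system and prevents cache lines from being reused. Tasks that generate such traffic need to be handled separately from memory-intensive tasks that can use the cache efficiently. In this paper, we propose a technique that uses each task's I/O traffic to monitor and classify, in real time, the tasks that cause such cache pollution. We also examine the characteristics of tasks that generate heavy I/O traffic. This work lays the groundwork for research on operating-system scheduling techniques that use each node's shared LLC more efficiently in NUMA environments.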
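Here is a minimal sketch of the classification step. It assumes that per-task I/O byte counts are sampled once per interval and leaves the sampling source abstract. The `TaskSample` type, the `classify` function, and the 50 MB/s threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaskSample:
    pid: int
    io_bytes: int      # disk and network I/O bytes observed in the interval
    interval_s: float  # length of the sampling interval in seconds

def classify(samples, threshold_bps):
    """Split tasks into I/O-heavy ones (likely to pollute the shared LLC) and
    the rest, by comparing each task's I/O rate with a hypothetical threshold."""
    io_heavy, others = [], []
    for s in samples:
        rate = s.io_bytes / s.interval_s
        (io_heavy if rate >= threshold_bps else others).append(s.pid)
    return io_heavy, others

samples = [
    TaskSample(pid=1, io_bytes=500_000_000, interval_s=1.0),  # 500 MB/s
    TaskSample(pid=2, io_bytes=1_000_000, interval_s=1.0),    # 1 MB/s
    TaskSample(pid=3, io_bytes=200_000_000, interval_s=2.0),  # 100 MB/s
]
print(classify(samples, threshold_bps=50_000_000))
```

A scheduler could then, for instance, keep I/O-heavy tasks away from the node where cache-friendly, memory-intensive tasks run. The paper presents its classification as groundwork for that kind of scheduling.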

A Compact Representation of Translation Pages for Flash Translation Layers of Solid State Drives

  • Kim, Yong-Seok
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.2
    • /
    • pp.1-7
    • /
    • 2019
  • This paper presents CTP (Compact Translation Page), a compact representation of translation pages for page-mapping-based flash translation layers, which improves RAM utilization and reduces the response time of solid state drives. CTP can store twice as much translation information in a translation page, so the total number of translation pages stored in flash is halved. Therefore, CTP halves the RAM size of the translation-page directory and uses the saved RAM space for the translation cache. CTP shows the best response time compared with existing page-mapping-based flash translation layers.
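The effect on the RAM-resident directory follows from simple arithmetic. The sketch makes three hypothetical assumptions: a 4 KiB flash page, 4-byte mapping entries, and a 4-byte directory entry per translation page. Setting `density=2` models a translation page that holds twice as many mappings, which is the effect the abstract attributes to CTP. How CTP achieves that density is not reproduced here.

```python
def ftl_footprint(capacity_bytes, page_bytes=4096, entry_bytes=4,
                  dir_entry_bytes=4, density=1):
    """Return (number of translation pages, directory size in bytes of RAM)
    for a page-mapping FTL under hypothetical sizes.

    density=2 models a translation page holding twice as many mappings.
    """
    logical_pages = capacity_bytes // page_bytes
    mappings_per_tpage = (page_bytes // entry_bytes) * density
    tpages = -(-logical_pages // mappings_per_tpage)  # ceiling division
    return tpages, tpages * dir_entry_bytes

capacity = 256 * 2**30  # hypothetical 256 GiB drive
print(ftl_footprint(capacity))             # conventional translation pages
print(ftl_footprint(capacity, density=2))  # doubled density
```

For this drive the conventional layout needs 65,536 translation pages and a 256 KiB directory. Doubled density needs 32,768 pages and a 128 KiB directory, freeing 128 KiB for the translation cache.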

Parallel Cell-Connectivity Information Extraction Algorithm for Ray-casting on Unstructured Grid Data (비정렬 격자에 대한 광선 투사를 위한 셀 사이 연결정보 추출 병렬처리 알고리즘)

  • Lee, Jihun;Kim, Duksu
    • Journal of the Korea Computer Graphics Society
    • /
    • v.26 no.1
    • /
    • pp.17-25
    • /
    • 2020
  • We present a novel multi-core CPU-based parallel algorithm for cell-connectivity information extraction, one of the preprocessing steps for volume rendering of unstructured grid data. We first examine the synchronization issues that arise when the prior serial algorithm is parallelized naively. Then, we propose a 3-step parallel algorithm that achieves high parallelization efficiency by removing synchronization in each step. Our 3-step algorithm also improves cache utilization efficiency by increasing spatial locality in the duplicated-triangle test, the core operation of building cell-connectivity information. We further improve the efficiency of our parallel algorithm by employing a per-thread memory pool. To evaluate our approach, we implemented it on a system with two octa-core CPUs and measured its performance. Our method shows continuous performance improvement as threads are added. With thirty-two threads (sixteen physical cores), it achieves up to 82.9 times higher performance than the prior serial algorithm. These results demonstrate the high parallelization efficiency and high cache utilization efficiency of our method. They also validate its suitability for large-scale unstructured data.
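A serial, sort-based version for tetrahedral cells illustrates the duplicated-triangle test. Every cell emits its four triangular faces with vertex ids sorted. Sorting the face list then places the two copies of each interior face next to each other, which also gives the scan good spatial locality. This is a generic technique, not the paper's 3-step parallel algorithm. The tetrahedral-only input and the `cell_connectivity` function are assumptions.

```python
def cell_connectivity(cells):
    """Neighbor table for a tetrahedral mesh via a sort-based duplicate test.

    cells: list of 4-tuples of vertex ids. Returns neighbors[c][f], the cell
    sharing face f of cell c (face f is opposite local vertex f), or -1 if the
    face is on the boundary.
    """
    faces = []
    for c, cell in enumerate(cells):
        for f in range(4):
            tri = tuple(sorted(v for k, v in enumerate(cell) if k != f))
            faces.append((tri, c, f))
    faces.sort()  # duplicated triangles become adjacent in the sorted list
    neighbors = [[-1] * 4 for _ in cells]
    i = 0
    while i < len(faces):
        if i + 1 < len(faces) and faces[i][0] == faces[i + 1][0]:
            (_, c0, f0), (_, c1, f1) = faces[i], faces[i + 1]
            neighbors[c0][f0] = c1  # interior face shared by two cells
            neighbors[c1][f1] = c0
            i += 2
        else:
            i += 1  # boundary face: appears only once
    return neighbors

# Two tetrahedra sharing the triangle (1, 2, 3).
print(cell_connectivity([(0, 1, 2, 3), (1, 2, 3, 4)]))
```

Boundary faces appear once and keep -1. The sketch assumes a conforming mesh in which each triangle is shared by at most two cells.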