• Title/Summary/Keyword: cache performance

A Fast IP Lookups using Dynamic Trie Compression (능동적 트라이 압축을 이용한 고속 IP 검색)

  • Oh, Seung-Hyun
    • The KIPS Transactions:PartA
    • /
    • v.10A no.5
    • /
    • pp.453-462
    • /
    • 2003
  • IP address lookup in a router selects the proper output link using the destination address of each IP packet arriving at the router. Because it is one of the main bottlenecks in router performance, IP address lookup is an essential part of developing the high-speed routers required by high-speed backbone networks. This paper introduces the DTC data structure, which supports gigabit IP address lookup on a conventional Pentium CPU with only a small amount of memory by using a dynamic trie compression technique. When building a forwarding table by trie compression, DTC dynamically selects the size of the data structure by considering the trade-off between table size and search speed. Moreover, when compressing the prefix trie, DTC minimizes the size of the data structure while reflecting the structure of the trie, so that lookups on the forwarding table can be served from the high-speed SRAM cache. In the experiments, the DTC data structure achieved up to $12.5{\times}10^5$ LPS (lookups per second) on a conventional Pentium CPU by dynamically building the most suitable compression for a variety of routing tables. (A simplified trie-lookup sketch follows below.)
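
The DTC structure itself is not specified in this abstract; the following is only a minimal Python sketch of the uncompressed binary prefix trie and longest-prefix-match lookup that such a scheme starts from. All names (`TrieNode`, `insert`, `lookup`) are illustrative, not from the paper.

```python
class TrieNode:
    """One node of a binary prefix trie used for IP longest-prefix match."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]   # 0-bit and 1-bit branches
        self.next_hop = None           # set only if a prefix ends here

def insert(root, prefix_bits, next_hop):
    """Insert a routing prefix given as a string of '0'/'1' bits."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Return the next hop of the longest matching prefix, or None."""
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop
        node = node.children[int(b)]
        if node is None:
            break
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

# Example: 10.0.0.0/8 -> "A", 10.1.0.0/16 -> "B"
root = TrieNode()
insert(root, "00001010", "A")
insert(root, "0000101000000001", "B")
print(lookup(root, "00001010" + "00000001" + "0" * 16))  # -> "B"
```

Compressing this trie (as DTC does) trades a larger per-node fan-out for fewer memory accesses per lookup, which is what lets the working set fit in the SRAM cache.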

Segment-based Buffer Management for Multi-level Streaming Service in the Proxy System (프록시 시스템에서 multi-level 스트리밍 서비스를 위한 세그먼트 기반의 버퍼관리)

  • Lee, Chong-Deuk
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.11
    • /
    • pp.135-142
    • /
    • 2010
  • QoS in a proxy system is heavily affected by interference such as congestion, latency, and retransmission. Multi-level streaming services are also affected by temporal synchronization, which degrades service quality. This paper proposes a new segment-based buffer management mechanism that reduces the performance degradation caused by these drawbacks of the proxy system and improves streaming throughput. The proposed mechanism optimizes streaming services by: 1) using segment-based buffer management, 2) minimizing the overhead caused by congestion and interference, and 3) minimizing retransmissions caused by disconnection and delay. A fuzzy value $\mu$ and a cost weight $\omega$ are used to compute the result. Simulation results show that the proposed mechanism outperforms the existing fixed, pyramid, and skyscraper segmentation methods in buffer cache control rate, average packet loss rate, and delay saving rate under a stream relevance metric. (A hypothetical segment-scoring sketch follows below.)
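
The abstract does not say how $\mu$ and $\omega$ are combined, so the sketch below is only one plausible reading: each cached segment gets a score $\mu \cdot \omega$ and the proxy evicts the lowest-scored segment first. Class and field names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Segment:
    score: float                      # mu (fuzzy popularity) * omega (cost weight)
    seg_id: str = field(compare=False)
    size: int = field(compare=False)

class SegmentBuffer:
    """Hypothetical segment-based proxy buffer: evict the lowest-scored segment first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.heap = []                # min-heap keyed on score
        self.resident = {}            # seg_id -> Segment

    def admit(self, seg_id, size, mu, omega):
        while self.used + size > self.capacity and self.heap:
            victim = heapq.heappop(self.heap)     # lowest-scored segment
            del self.resident[victim.seg_id]
            self.used -= victim.size
        if self.used + size <= self.capacity:
            seg = Segment(mu * omega, seg_id, size)
            heapq.heappush(self.heap, seg)
            self.resident[seg_id] = seg
            self.used += size

buf = SegmentBuffer(capacity=100)
buf.admit("movie1/seg0", 40, mu=0.9, omega=1.2)   # popular, costly to refetch
buf.admit("movie2/seg5", 40, mu=0.2, omega=0.5)   # unpopular
buf.admit("movie3/seg1", 40, mu=0.7, omega=1.0)   # evicts movie2/seg5
print(sorted(buf.resident))   # ['movie1/seg0', 'movie3/seg1']
```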

A Processor Allocation Policy using Program Characteristics on Shared Bus (공유 버스상에서 프로그램 특성을 사용한 프로세서 할당 정책)

  • Jeong, In-Beom;Lee, Jun-Won
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.9
    • /
    • pp.1073-1082
    • /
    • 1999
  • In this paper, an adaptive processor allocation policy is proposed to use the processors in a system effectively. To increase parallelism, the number of processors used for parallel computing is usually increased. However, increasing the number of processors changes the grain size of the parallel program and therefore affects cache performance. In particular, on a shared bus with limited bandwidth, adding processors greatly increases contention for the bus, so the time spent waiting for the bus can offset the computing power gained from the additional processors. The proposed adaptive policy gathers information about the distribution of processors waiting for the shared bus over an interval of program execution and, based on this information, changes the number of processors working in parallel while the program runs. Simulation results show that the adaptive processor allocation policy finds a suitable, near-optimal number of processors for the bus traffic characteristics of each program. Although it performs slightly worse than the best fixed number of processors, it improves processor utilization and thus contributes to effective use of the system. (A hypothetical feedback rule is sketched below.)
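
The paper does not give its exact adaptation rule here; the following is a minimal sketch of the feedback idea stated in the abstract, with thresholds, sample format, and function name all assumed for illustration.

```python
def adapt_processor_count(current_p, waiting_samples,
                          high_frac=0.5, low_frac=0.1,
                          min_p=1, max_p=16):
    """Hypothetical feedback rule: if, on average, more than high_frac of the
    allocated processors were observed waiting for the shared bus during the
    sampling window, release a processor; if almost none were waiting, add one."""
    avg_waiting = sum(waiting_samples) / len(waiting_samples)
    if avg_waiting > high_frac * current_p:
        return max(min_p, current_p - 1)      # bus-bound: fewer processors
    if avg_waiting < low_frac * current_p:
        return min(max_p, current_p + 1)      # bus idle enough: grow
    return current_p

# Example: 8 processors allocated, 5-6 of them typically queued on the bus.
print(adapt_processor_count(8, [5, 6, 6, 5, 5]))   # -> 7
```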

Enhancing LRU Buffer Replacement Policy with Delayed Write of Not-cold-dirty-pages for Flash Memory (플래시 메모리를 위한 Not-cold-Page 쓰기지연을 통한 LRU 버퍼교체 정책 개선)

  • Jung Ho-Young;Park Sung-Min;Cha Jae-Hyuk;Kang Soo-Yong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.634-641
    • /
    • 2006
  • Flash memory has many advantages, such as non-volatility and fast I/O speed, but it also has disadvantages, such as the inability to update data in place and asymmetric read/write/erase speeds. For good flash storage performance, it is essential that the buffer replacement algorithm reduce the number of write operations, which in turn determines the number of erase operations. This paper proposes a new buffer replacement algorithm that delays the writes of not-cold dirty pages in the buffer cache of flash storage. We show that this algorithm effectively decreases the number of write and erase operations without much degradation of the hit ratio, and as a result the overall flash I/O performance is improved. (A simplified sketch of the delayed-write idea follows below.)
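
The exact policy is not given in the abstract; the sketch below only illustrates the general idea of keeping dirty pages in the buffer longer by preferring clean victims near the LRU end (similar in spirit to clean-first LRU variants). The window parameter and class name are assumptions, not the paper's definitions.

```python
from collections import OrderedDict

class DelayedWriteLRU:
    """Sketch of an LRU buffer that delays flushing dirty pages: on eviction,
    prefer the least-recently-used *clean* page inside a window at the LRU end,
    so dirty pages stay buffered longer and flash writes are saved."""

    def __init__(self, capacity, window):
        self.capacity = capacity
        self.window = window              # how far from the LRU end to search
        self.pages = OrderedDict()        # page_id -> dirty flag, LRU order
        self.flash_writes = 0

    def access(self, page_id, write=False):
        if page_id in self.pages:
            dirty = self.pages.pop(page_id) or write
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = write
        self.pages[page_id] = dirty       # most-recently-used position

    def _evict(self):
        lru_order = list(self.pages.items())
        # look for a clean victim within the window at the LRU end
        for page_id, dirty in lru_order[: self.window]:
            if not dirty:
                del self.pages[page_id]
                return
        # no clean page in the window: evict the true LRU page, pay a flash write
        page_id, dirty = lru_order[0]
        del self.pages[page_id]
        if dirty:
            self.flash_writes += 1

buf = DelayedWriteLRU(capacity=3, window=2)
buf.access("p1", write=True)
buf.access("p2")
buf.access("p3", write=True)
buf.access("p4")            # evicts clean p2 instead of dirty p1
print(list(buf.pages), buf.flash_writes)   # ['p1', 'p3', 'p4'] 0
```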

The Study of the Object Replication Management using Adaptive Duplication Object Algorithm (적응적 중복 객체 알고리즘을 이용한 객체 복제본 관리 연구)

  • 박종선;장용철;오수열
    • Journal of the Korea Society of Computer and Information
    • /
    • v.8 no.1
    • /
    • pp.51-59
    • /
    • 2003
  • In distributed object replication systems, it is effective to place an object on multiple nodes so that the nodes share the same contents. Each node stores access information in its local cache as it accesses the system, and later fetches and uses that information when it is needed. Over time, however, coherence problems arise because the data can be updated by other nodes. To keep the system coherent, a mechanism is therefore needed that manages replicas effectively so as to improve the performance and availability of the system. In this paper, the proposed adaptive duplication object (ADO) algorithm keeps objects coherent under shared-memory conditions while obtaining a limited degree of parallel performance with no additional cost other than the coherence cost. Furthermore, to minimize the coherence maintenance cost, which is the biggest overhead of the replication method, the number of replicas and the locations of the object replicas, the most important factors, must be managed effectively, since they determine that cost; an adaptive duplication object management mechanism of this kind can improve the overall run time. (A hypothetical replica-count cost model is sketched below.)
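
The ADO algorithm itself is not described in this abstract. The sketch below only illustrates the trade-off the abstract names, between the benefit of more replicas for reads and the coherence cost they add for writes, using a toy cost model with entirely assumed parameters.

```python
def best_replica_count(read_rate, write_rate, remote_read_cost,
                       local_read_cost, update_cost, num_nodes):
    """Hypothetical cost model: more replicas turn remote reads into local reads,
    but every write must then be propagated to every replica (the coherence
    cost). Return the replica count with the lowest total cost."""
    def total_cost(r):
        local_fraction = r / num_nodes          # reads spread evenly over nodes
        read_cost = read_rate * (local_fraction * local_read_cost +
                                 (1 - local_fraction) * remote_read_cost)
        coherence_cost = write_rate * r * update_cost
        return read_cost + coherence_cost
    return min(range(1, num_nodes + 1), key=total_cost)

# A read-heavy object justifies many replicas; a write-heavy one does not.
print(best_replica_count(1000, 10, remote_read_cost=5, local_read_cost=1,
                         update_cost=3, num_nodes=8))    # -> 8
print(best_replica_count(100, 200, remote_read_cost=5, local_read_cost=1,
                         update_cost=3, num_nodes=8))    # -> 1
```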

Data Replication and Migration Scheme for Load Balancing in Distributed Memory Environments (분산 인-메모리 환경에서 부하 분산을 위한 데이터 복제와 이주 기법)

  • Choi, Kitae;Yoon, Sangwon;Park, Jaeyeol;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.1
    • /
    • pp.44-49
    • /
    • 2016
  • Recently, data has been growing dramatically along with the growth of social media and digital devices, and distributed in-memory processing systems have been used to process such large amounts of data efficiently. However, if load is concentrated on a particular node in a distributed environment, the performance of that node degrades significantly. In this paper, we propose a load balancing scheme that distributes load in a distributed memory environment. The proposed scheme replicates hot data to multiple nodes to manage node load and migrates data in consideration of node load when nodes are added or removed. Clients reduce the number of accesses to the central server by using the metadata of the hot data to access the data nodes directly. To show the superiority of the proposed scheme, we compare it with an existing load balancing scheme through performance evaluation. (A rough sketch of the replication and metadata idea appears below.)
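
A minimal sketch of the two stated ideas, replicating hot keys to several nodes and letting clients cache metadata so they contact data nodes directly. Thresholds, class names, and the random placement are all assumptions for illustration, not the paper's design.

```python
import random

class Cluster:
    """Central metadata server that replicates hot keys to several nodes."""

    def __init__(self, nodes, hot_threshold=100, replicas_for_hot=3):
        self.nodes = nodes
        self.placement = {}           # key -> list of node names
        self.access_count = {}
        self.hot_threshold = hot_threshold
        self.replicas_for_hot = replicas_for_hot

    def locate(self, key):
        """Called by a client once; the result can be cached client-side."""
        self.access_count[key] = self.access_count.get(key, 0) + 1
        if key not in self.placement:
            self.placement[key] = [random.choice(self.nodes)]
        if (self.access_count[key] >= self.hot_threshold and
                len(self.placement[key]) < self.replicas_for_hot):
            extra = [n for n in self.nodes if n not in self.placement[key]]
            needed = self.replicas_for_hot - len(self.placement[key])
            self.placement[key] += extra[:needed]     # replicate the hot key
        return self.placement[key]

class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.metadata_cache = {}      # key -> node list; avoids the central server

    def read(self, key):
        if key not in self.metadata_cache:
            self.metadata_cache[key] = self.cluster.locate(key)
        return random.choice(self.metadata_cache[key])   # direct node access

cluster = Cluster(nodes=["n1", "n2", "n3", "n4"], hot_threshold=2)
client = Client(cluster)
cluster.locate("hot-key"); cluster.locate("hot-key")        # key becomes hot
print(len(cluster.placement["hot-key"]))                    # -> 3 replicas
print(client.read("hot-key") in ["n1", "n2", "n3", "n4"])   # True
```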

An Efficient P2P Service using Distributed Caches in MANETs (모바일 애드-혹 망에서 분산 캐시를 이용한 효율적인 P2P 서비스 방법)

  • Oh, Sun-Jin;Lee, Young-Dae
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.3
    • /
    • pp.165-171
    • /
    • 2009
  • With the rapid growth of mobile ad-hoc network (MANET) and P2P service technologies, many attempts to integrate MANET and P2P services and to develop such applications have recently been introduced. Implementing a stable P2P service, however, is a very difficult challenge because of the high mobility of users in a MANET. In this paper, we propose an efficient mobile P2P service that shares and manages multimedia data files in a mobile environment and stores files in distributed caches according to their popularity in order to achieve high performance. The performance of the proposed P2P service is evaluated with an analytic model and compared with that of an existing DHT-based P2P service. (A hypothetical popularity-proportional placement sketch follows below.)
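
The abstract only states that files are cached according to popularity; the sketch below shows one simple, assumed rule, caching the most requested files on more peers, purely to illustrate the idea.

```python
def plan_cache_placement(request_counts, max_copies, min_copies=1):
    """Hypothetical popularity-proportional placement: the most requested files
    are cached on more peers, so they can be served from a nearby cache instead
    of the single origin peer. Returns {file_id: number_of_peer_caches}."""
    top = max(request_counts.values())
    return {file_id: max(min_copies, round(max_copies * hits / top))
            for file_id, hits in request_counts.items()}

# 'clip-a' is ten times more popular than 'clip-c'.
print(plan_cache_placement({"clip-a": 100, "clip-b": 40, "clip-c": 10},
                           max_copies=5))
# -> {'clip-a': 5, 'clip-b': 2, 'clip-c': 1}
```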

Accelerating Medical Image Processing on Integrated GPU Using OpenCL (OpenCL을 이용한 내장형 GPU에서의 의학영상처리 가속화)

  • Kim, Beom-Jun;Shin, Byeong-seok
    • Journal of the Korea Computer Graphics Society
    • /
    • v.23 no.2
    • /
    • pp.1-10
    • /
    • 2017
  • A variety of filters are applied to improve the quality of noisy, low-resolution medical images. This is necessary to reduce the radiation dose to the patient and to improve the utilization of existing imaging equipment. Conventionally, filtering is performed on the CPU of a PC. However, it is difficult to produce results in real time by applying various computations and filters to high-resolution human images using only the CPU performance of a PC available in a hospital. In this paper, we analyze the structure and performance of the GPU integrated into Intel CPUs and propose a method of performing image filtering with OpenCL parallel processing. By applying computationally complex filters to medical images on this device, high-quality images can be generated in real time. (An illustrative OpenCL dispatch sketch follows below.)
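
The paper's filters are not given here; the sketch below only shows how a simple image filter can be dispatched to an OpenCL device such as an Intel integrated GPU, using the pyopencl bindings (an assumption; the paper presumably uses OpenCL C directly). The box-blur kernel, image size, and buffer names are placeholders.

```python
import numpy as np
import pyopencl as cl

kernel_src = """
__kernel void box_blur(__global const float *src, __global float *dst,
                       const int width, const int height) {
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width || y >= height) return;
    float acc = 0.0f; int n = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int xx = x + dx, yy = y + dy;
            if (xx >= 0 && xx < width && yy >= 0 && yy < height) {
                acc += src[yy * width + xx]; n++;
            }
        }
    dst[y * width + x] = acc / n;
}
"""

ctx = cl.create_some_context()            # select the integrated GPU here
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, kernel_src).build()

img = np.random.rand(512, 512).astype(np.float32)   # stand-in for a medical slice
out = np.empty_like(img)
mf = cl.mem_flags
src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=img)
dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

# one work-item per pixel: global size is (width, height)
prog.box_blur(queue, img.shape[::-1], None, src_buf, dst_buf,
              np.int32(img.shape[1]), np.int32(img.shape[0]))
cl.enqueue_copy(queue, out, dst_buf)
```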

Mobile Ad Hoc Routing Protocol Supporting Unidirectional Links (단방향 링크를 지원하는 이동 Ad Hoc 라우팅 프로토콜)

  • Lee, Kwang-Bae;Kim, Hyun-Ug;Jung, Kun-Won
    • Journal of IKEEE
    • /
    • v.5 no.1 s.8
    • /
    • pp.59-66
    • /
    • 2001
  • In this paper, we propose a dynamic source routing protocol that supports asymmetric paths for mobile ad hoc networks containing unidirectional links. The existing dynamic source routing protocol supports only symmetric paths consisting of bidirectional links. In practice, however, unidirectional links can exist because of asymmetries between mobile terminals or in the wireless environment. We therefore implement a mobile ad hoc routing protocol that supports unidirectional links and is suited to more general wireless environments. In particular, the proposed protocol uses an improved multipath maintenance method to reconfigure routes quickly when a route error caused by mobility is detected. For the performance evaluation we consider average route discovery time and average data reception rate, simulating for 900 seconds in steps of 100 seconds under scenarios that vary the use of route caches at intermediate nodes along the path and the mobility/connection patterns. (A minimal directed-graph route discovery sketch follows below.)
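
This is not the paper's protocol, only a minimal sketch of why asymmetric paths arise: with unidirectional links the topology is a directed graph, so the forward route and the reverse route must be discovered separately and may differ. Names and the example topology are assumed.

```python
from collections import deque

def find_route(links, src, dst):
    """BFS over a *directed* link set, so unidirectional links are honored."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in links.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None                       # no route in this direction

# A->B is unidirectional; the reverse route must detour through C.
links = {"A": ["B"], "B": ["D"], "D": ["C"], "C": ["A"]}
print(find_route(links, "A", "D"))   # forward path  ['A', 'B', 'D']
print(find_route(links, "D", "A"))   # reverse path  ['D', 'C', 'A']
```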

2-Level Adaptive Branch Prediction Based on Set-Associative Cache (세트 연관 캐쉬를 사용한 2단계 적응적 분기 예측)

  • Shim, Won
    • The KIPS Transactions:PartA
    • /
    • v.9A no.4
    • /
    • pp.497-502
    • /
    • 2002
  • Conditional branches can severely limit instruction-level parallelism by causing branch penalties. 2-level adaptive branch predictors were developed to provide accurate branch prediction in high-performance superscalar processors, but although they achieve very high prediction accuracy, they tend to be very costly. In this paper, set-associative cached correlated 2-level branch predictors are proposed to overcome the cost problem of conventional 2-level adaptive branch predictors. Simulation results show that the cached correlated predictors deliver higher prediction accuracy than conventional predictors at a significantly lower cost. The best misprediction rates of the global and local cached correlated predictors using set-associative caches are 5.99% and 6.28%, respectively, improvements of 54% and 17% over the conventional 2-level adaptive branch predictors. (A basic 2-level predictor sketch follows below.)
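
The sketch below shows only the baseline global-history 2-level mechanism (gshare-style indexing with 2-bit saturating counters); the paper's set-associative caching of the second-level table is not reproduced. History length and the example branch address are arbitrary.

```python
class TwoLevelPredictor:
    """Global-history 2-level predictor sketch: the branch history register
    (first level) indexes a pattern history table of 2-bit saturating counters
    (second level)."""

    def __init__(self, history_bits=8):
        self.history_bits = history_bits
        self.mask = (1 << history_bits) - 1
        self.bhr = 0                              # global branch history register
        self.pht = [1] * (1 << history_bits)      # counters start weakly not-taken

    def predict(self, pc):
        index = (pc ^ self.bhr) & self.mask       # gshare-style index
        return self.pht[index] >= 2               # predict taken if counter >= 2

    def update(self, pc, taken):
        index = (pc ^ self.bhr) & self.mask
        if taken:
            self.pht[index] = min(3, self.pht[index] + 1)
        else:
            self.pht[index] = max(0, self.pht[index] - 1)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask

# A loop branch taken 9 times then not taken once is learned after warm-up.
p = TwoLevelPredictor()
hits = 0
for i in range(100):
    taken = (i % 10) != 9
    hits += (p.predict(0x4000) == taken)
    p.update(0x4000, taken)
print(hits)    # most of the 100 predictions are correct after warm-up
```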