• Title/Summary/Keyword: Cache hierarchy


A Block Replacement Scheme using Analytic Hierarchy Process in Hybrid HDD (하이브리드 하드디스크에서 AHP를 적용한 블록 교체 기법)

  • Kim, Jeong-Won
    • Journal of Korea Society of Industrial Information Systems / v.20 no.5 / pp.45-52 / 2015
  • The read performance of a hybrid hard disk is better than that of a legacy hard disk, and its power consumption is also considerably lower. Because blocks with sufficient locality are kept in the non-volatile cache, whose size is generally limited, an effective block replacement scheme is required. Since replacement is inevitably affected by various parameters, we define the issue as a multiple-criteria decision problem. To solve it, this paper proposes a new block replacement algorithm based on the analytic hierarchy process (AHP). Simulations of our model confirm that it can serve as the replacement algorithm of a hybrid hard disk, as it improves boot time as well as the response time of general applications.
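
A minimal sketch of how an AHP-weighted eviction decision could look, for readers unfamiliar with the technique. The criteria (recency, frequency, write cost) and the pairwise-comparison values below are illustrative assumptions, not the paper's actual model:

```python
# Illustrative sketch of an AHP-weighted block replacement decision.
# The criteria and the pairwise-comparison judgments are assumptions
# for demonstration; the paper's actual criteria may differ.
import numpy as np

# Pairwise comparison matrix over three hypothetical criteria:
# recency, frequency, and write cost (Saaty's 1-9 scale).
A = np.array([
    [1.0, 3.0, 5.0],   # recency vs. (recency, frequency, write cost)
    [1/3, 1.0, 3.0],   # frequency
    [1/5, 1/3, 1.0],   # write cost
])

# Approximate the principal eigenvector: normalize columns, average rows.
weights = (A / A.sum(axis=0)).mean(axis=1)

def replacement_score(block):
    """Higher score = more worth keeping; evict the minimum."""
    factors = np.array([block["recency"], block["frequency"], block["write_cost"]])
    return float(weights @ factors)

cache = [
    {"id": 1, "recency": 0.9, "frequency": 0.2, "write_cost": 0.5},
    {"id": 2, "recency": 0.1, "frequency": 0.8, "write_cost": 0.3},
]
victim = min(cache, key=replacement_score)
print("evict block", victim["id"])
```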

Reducing the Overhead of the Virtual Address Translation Process (가상주소 변환 과정에 대한 부담의 줄임)

  • U, Jong-Jeong
    • The Transactions of the Korea Information Processing Society / v.3 no.1 / pp.118-126 / 1996
  • A memory hierarchy is a useful mechanism for improving memory access speed and enlarging the program space by layering memories and separating program spaces from memory spaces. However, it requires at least two memory accesses for each data reference: a TLB (Translation Lookaside Buffer) access for the address translation and a data cache access for the desired data. If the cache size grows beyond the page size multiplied by the cache associativity, it becomes difficult to access the TLB and the cache in parallel, lengthening the critical timing path in the processor. To achieve such parallel accesses, we present a hybrid mapped TLB that combines a direct-mapped TLB with a very small fully-associative TLB. The former reduces the TLB access time, while the latter removes the former's conflict misses. Trace-driven simulation shows that, under the given workloads, the proposed TLB is effective even when the added fully-associative TLB has only four entries, because the effect of its increased misses is offset by its speed benefit.
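
The hybrid TLB organization is easy to picture in code. Below is a small functional sketch, assuming a 64-entry direct-mapped TLB, a four-entry fully-associative TLB (the abstract's smallest configuration), and LRU replacement; the fill policy shown is one plausible choice, not necessarily the authors':

```python
# Minimal functional sketch of the hybrid mapped TLB described above: a
# direct-mapped TLB probed in parallel with a tiny fully-associative TLB
# that absorbs the direct-mapped structure's conflict misses. The sizes
# and the fill policy are assumptions chosen for illustration.

DM_ENTRIES = 64   # direct-mapped TLB entries (assumed)
FA_ENTRIES = 4    # fully-associative entries (four, as in the abstract)

dm_tlb = [None] * DM_ENTRIES   # index -> (vpn, pfn)
fa_tlb = []                    # list of (vpn, pfn); LRU at index 0

def lookup(vpn):
    """Probe both structures 'in parallel'; return (pfn, hit)."""
    entry = dm_tlb[vpn % DM_ENTRIES]
    if entry is not None and entry[0] == vpn:
        return entry[1], True
    for i, (v, p) in enumerate(fa_tlb):
        if v == vpn:
            fa_tlb.append(fa_tlb.pop(i))   # move to MRU position
            return p, True
    return None, False

def fill(vpn, pfn):
    """On a miss, install into the direct-mapped TLB and move whatever
    it displaces into the small fully-associative TLB (LRU eviction)."""
    idx = vpn % DM_ENTRIES
    displaced = dm_tlb[idx]
    dm_tlb[idx] = (vpn, pfn)
    if displaced is not None:
        if len(fa_tlb) >= FA_ENTRIES:
            fa_tlb.pop(0)                  # evict least recently used
        fa_tlb.append(displaced)
```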


Design of Embedded Processor Architecture Applicable to Mobile Multimedia (Mobile Multimedia 지원을 위한 Embedded Processor 구조 설계)

  • 이호석;한진호;배영환;조한진
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.5 / pp.71-80 / 2004
  • This paper describes the design of an embedded processor architecture applicable to multimedia on mobile platforms. The main description covers the basic processor architecture and considerations of energy efficiency for mobile use. To design the processor datapath architecture (pipeline, branch prediction, multiple-issue superscalar execution, number of functional units) and the cache hierarchy best suited to multimedia applications, we ran simulations over architecture variants using an MPEG4 test bench as the multimedia application. We analyzed the energy efficiency of each architecture to check its suitability for mobile platforms and chose the basic processor architecture based on the analysis results. The suggested basic processor architecture can be applied not only to mobile platforms but also as the base architecture of a configurable processor produced through an automatic design environment.
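
The methodology amounts to a design-space sweep ranked by an energy-performance metric. The sketch below illustrates the idea with a placeholder cost model and energy-delay product as the metric; the configuration axes and numbers are hypothetical, not the paper's simulator:

```python
# Hypothetical design-space sweep in the spirit of the methodology above:
# simulate candidate datapath configurations on a multimedia benchmark
# and rank them by energy-delay product (EDP). The configuration axes
# and the cost model are placeholders, not the paper's simulator.
from itertools import product

pipeline_depths = [5, 7]
issue_widths = [1, 2, 4]
predictors = ["static", "bimodal"]

def simulate(depth, width, predictor):
    """Stand-in for a cycle-accurate run returning (cycles, energy)."""
    cycles = 1e6 / (width ** 0.5) * (1.05 if predictor == "static" else 1.0)
    energy = depth * width * (0.9 if predictor == "static" else 1.0)
    return cycles, energy

def edp(cfg):
    cycles, energy = simulate(*cfg)
    return cycles * energy

best = min(product(pipeline_depths, issue_widths, predictors), key=edp)
print("lowest-EDP configuration:", best)
```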

Memory Hierarchy Optimization in Embedded Systems using On-Chip SRAM (On-Chip SRAM을 이용한 임베디드 시스템 메모리 계층 최적화)

  • Kim, Jung-Won;Kim, Seung-Kyun;Lee, Jae-Jin;Jung, Chang-Hee;Woo, Duk-Kyun
    • Journal of KIISE:Computer Systems and Theory / v.36 no.2 / pp.102-110 / 2009
  • The memory wall is the growing disparity in speed between the CPU and the memory outside the CPU chip. An economical solution is a memory hierarchy organized into several levels, such as processor registers, caches, main memory, and disk storage. We introduce a novel memory hierarchy optimization technique for Linux-based embedded systems that, for the first time, uses on-chip SRAM for this purpose. The technique allocates on-chip SRAM to code and data selected by the programmer, using the virtual memory system. Experiments with nine applications indicate runtime improvements of up to 35% (14% on average) and energy reductions of up to 40% (15% on average).
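
On a Linux platform, allocating on-chip SRAM "through the virtual memory system" typically means mapping an SRAM region into the process address space. A conceptual sketch follows, with a hypothetical device node standing in for whatever platform-specific driver exposes the SRAM:

```python
# Conceptual sketch of mapping programmer-selected data into on-chip
# SRAM through the virtual memory system. The device node path below
# is hypothetical; real platforms expose scratchpad SRAM differently.
import mmap
import os

SRAM_DEV = "/dev/sram"      # hypothetical device exposing on-chip SRAM
SRAM_SIZE = 64 * 1024       # assumed 64 KB scratchpad

def allocate_in_sram(size):
    """Map a region of on-chip SRAM into this process's address space."""
    fd = os.open(SRAM_DEV, os.O_RDWR)
    try:
        return mmap.mmap(fd, size)   # hot data placed here avoids DRAM
    finally:
        os.close(fd)

# Usage: place a frequently accessed buffer in SRAM rather than DRAM.
# buf = allocate_in_sram(4096)
# buf[0:4] = b"\x00\x01\x02\x03"
```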

Application Behavior-oriented Adaptive Remote Access Cache in Ring based NUMA System (링 구조 NUMA 시스템에서 적응형 다중 그레인 원격 캐쉬 설계)

  • 곽종욱;장성태;전주식
    • Journal of KIISE:Computer Systems and Theory / v.30 no.9 / pp.461-476 / 2003
  • Owing to its ease of implementation and its alleviation of the memory bottleneck, the NUMA architecture has dominated multiprocessor systems for the past several years. However, because a NUMA system distributes memory across nodes, frequent remote memory accesses are a key factor in performance degradation, so efficient design of the RAC (Remote Access Cache) is critical for performance improvement. In this paper, we propose a multi-grain RAC that adaptively controls the RAC line size according to each application's behavior. We simulate a NUMA system with the multi-grain RAC using MINT, an event-driven memory hierarchy simulator, and analyze the performance results. First, with a profile-based determination method, we identify the optimal RAC line size for each application; we then compare and analyze the performance of NUMA systems with a normal RAC, with an optimal-line-size RAC, and with the multi-grain RAC. The simulation shows that the worst case is always avoided and the results stay very close to the optimal case for any combination of application and RAC format.
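
The profile-based step can be pictured as replaying each application's remote-access trace against candidate line sizes and keeping the cheapest. A simplified sketch, with assumed candidate sizes and a direct-mapped RAC model rather than the paper's MINT simulation:

```python
# Simplified sketch of profile-based line-size selection for the remote
# access cache (RAC). Candidate sizes, capacity, and the direct-mapped
# model are illustrative assumptions, not the paper's simulator.

CANDIDATE_LINE_SIZES = [32, 64, 128, 256]   # bytes (assumed)

def remote_misses(trace, line_size, rac_capacity=64 * 1024):
    """Replay a remote-access address trace against a direct-mapped RAC
    of the given line size and count the remote misses."""
    num_lines = rac_capacity // line_size
    tags = [None] * num_lines
    misses = 0
    for addr in trace:
        line = addr // line_size
        idx = line % num_lines
        if tags[idx] != line:
            tags[idx] = line
            misses += 1
    return misses

def pick_line_size(trace):
    """Choose the grain that minimizes remote misses for this workload."""
    return min(CANDIDATE_LINE_SIZES, key=lambda s: remote_misses(trace, s))

# e.g. a sequential strided workload favors the largest grain:
print(pick_line_size([i * 8 for i in range(10000)]))
```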

A Study on Energy Conservative Hierarchical Clustering for Ad-hoc Network (애드-혹 네트워크에서의 에너지 보존적인 계층 클러스터링에 관한 연구)

  • Mun, Chang-Min;Lee, Kang-Whan
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.12 / pp.2800-2807 / 2012
  • An ad-hoc wireless network provides self-organizing data networking in which the nodes route packets among themselves. Typically, multi-hop operation and control-packet overhead drive changes in the transmission route. Numerous routing protocols have been developed for ad-hoc wireless networks as network sizes scale, so a scalable, energy-efficient routing protocol is needed for a variety of network conditions. This paper analyzes how the number of layers in hierarchical clustering interacts with different clustering structures and topologies. The energy-efficient number of cluster layers and the resulting energy dissipation are estimated based on a distributed homogeneous spatial Poisson process with context-aware node conditions. The simulation results show that CACHE-R can conserve node energy when the optimal number of layers is set for the given parameters.
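
The layer-count trade-off can be illustrated with the common first-order radio-energy model: more layers add per-hop electronics cost but shorten each hop, and the quadratic path loss rewards short hops. The constants below are the textbook values of that model, not the paper's CACHE-R parameters:

```python
# First-order energy sketch for choosing the number of clustering layers.
# The radio-energy constants follow the first-order model common in the
# WSN literature; they are illustrative, not the paper's values.
E_ELEC = 50e-9        # J/bit, electronics energy (assumed)
EPS_AMP = 100e-12     # J/bit/m^2, amplifier energy (assumed)

def hop_energy(bits, distance):
    """Energy to transmit and receive one packet over one hop."""
    tx = bits * (E_ELEC + EPS_AMP * distance ** 2)
    rx = bits * E_ELEC
    return tx + rx

def layered_energy(bits, total_distance, layers):
    """Relaying through 'layers' cluster heads splits the distance; the
    d^2 path loss makes several short hops cheaper than one long one,
    until per-hop electronics cost dominates."""
    per_hop = total_distance / layers
    return layers * hop_energy(bits, per_hop)

best = min(range(1, 8), key=lambda k: layered_energy(4000, 200.0, k))
print("energy-minimizing layer count:", best)
```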

Energy-Efficient Instruction Cache Hierarchy for Embedded Processors (임베디드 프로세서를 위한 에너지 효율의 명령어 캐쉬 계층 구조)

  • Kang, Jin-Ku;Lee, In-Hwan
    • Proceedings of the Korean Information Science Society Conference / 2006.10a / pp.257-260 / 2006
  • A hierarchical memory structure can be used not only to improve performance but also to raise overall power efficiency by reducing accesses to lower-level caches. In this paper, starting from the single-level structure of the StrongARM, a representative embedded processor, we add a new instruction cache close to the processor and measure, through simulation, how power consumption varies with the sizes of the first- and second-level instruction caches, examining the relationship between the two cache sizes. Inserting a direct-mapped L0 instruction cache with 32 B blocks, we search for the most energy-efficient size; at that size, the power consumption of the whole processor, assumed to be an on-chip design, drops by up to about 65%, and we find that each doubling of the L1 instruction cache also doubles the energy-efficient L0 instruction cache size.
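
The energy argument reduces to simple bookkeeping: every fetch pays the small L0's access energy, and only L0 misses pay the L1's. A sketch with assumed per-access energies and a placeholder miss curve:

```python
# Sketch of the energy accounting behind the L0 insertion experiment:
# every fetch probes the small L0; only L0 misses pay the larger L1's
# access energy. The per-access energies and miss model are assumptions.

E_L1 = 1.0                    # relative energy per L1 access (assumed)

def l0_energy_per_access(l0_bytes):
    """Smaller structures cost less per access (rough sqrt scaling)."""
    return 0.05 * (l0_bytes / 256) ** 0.5

def l0_miss_rate(l0_bytes):
    """Placeholder miss curve: misses fall as the L0 grows."""
    return min(1.0, 64.0 / l0_bytes)

def fetch_energy(l0_bytes):
    return l0_energy_per_access(l0_bytes) + l0_miss_rate(l0_bytes) * E_L1

# Sweep L0 sizes to expose the energy-optimal point.
for size in [256, 512, 1024, 2048, 4096]:
    print(f"L0={size:5d}B  energy/fetch={fetch_energy(size):.3f}")
```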


Performance Analysis and Identifying Characteristics of Processing-in-Memory System with Polyhedral Benchmark Suite (프로세싱 인 메모리 시스템에서의 PolyBench 구동에 대한 동작 성능 및 특성 분석과 고찰)

  • Jeonggeun Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.142-148 / 2023
  • In this paper, we identify performance issues in executing compute kernels from PolyBench, a suite of kernels that form the core computational units of data-intensive workloads such as deep learning, on Processing-in-Memory (PIM) devices. Using our in-house simulator, we measured and compared various performance metrics of these workloads on traditional out-of-order and in-order processors and on PIM-based systems. The PIM-based system improves performance over the other computing models thanks to the short-term data reuse characteristic of PolyBench's computational kernels. However, some kernels perform poorly on PIM-based systems, which lack a multi-layer cache hierarchy, because of those kernels' long-term data reuse. Our evaluation and analysis therefore suggest that further research should consider dynamic, workload-pattern-adaptive approaches to overcome the performance degradation caused by computational kernels with long-term data reuse and hidden data locality.
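
The short- versus long-term reuse distinction can be made concrete with reuse distances (the number of distinct addresses touched between two uses of the same address); kernels whose reuse distances exceed the PIM device's modest caching lose the locality a deep hierarchy would capture. A small sketch with toy traces:

```python
# Reuse-distance sketch for the short- vs. long-term reuse distinction
# discussed above. The traces are toy examples, not PolyBench kernels.
from collections import OrderedDict

def reuse_distances(trace):
    """Distance = number of distinct addresses touched since last use."""
    last_seen = OrderedDict()
    distances = []
    for addr in trace:
        if addr in last_seen:
            keys = list(last_seen)
            distances.append(len(keys) - 1 - keys.index(addr))
            last_seen.pop(addr)
        last_seen[addr] = None   # re-insert at most-recent position
    return distances

# A tiled (short-term reuse) vs. a streaming-revisit (long-term) pattern:
tiled = [0, 1, 0, 1, 2, 3, 2, 3]
streaming = list(range(100)) + list(range(100))
print(max(reuse_distances(tiled)), max(reuse_distances(streaming)))
```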


Performance Optimization of Numerical Ocean Modeling on Cloud Systems (클라우드 시스템에서 해양수치모델 성능 최적화)

  • JUNG, KWANGWOOG;CHO, YANG-KI;TAK, YONG-JIN
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.27 no.3 / pp.127-143 / 2022
  • Recently, many attempts have been made to run numerical ocean models in cloud computing environments. A cloud computing environment can be an effective means of implementing numerical ocean models that require large-scale resources, or of quickly preparing a modeling environment for global or large-scale grids. Many commercial and private cloud computing systems provide technologies such as virtualization, high-performance CPUs and instances, Ethernet-based high-performance networking, and remote direct memory access for High Performance Computing (HPC). These features facilitate ocean modeling experimentation on commercial cloud computing systems, and many scientists and engineers expect cloud computing to become mainstream in the near future. Analyzing the performance and features of commercial cloud services for numerical modeling is essential for selecting an appropriate system, as this can help minimize execution time and the amount of resources used. The effect of cache memory is large in the processing structure of an ocean numerical model, which handles input/output of data in multidimensional array structures, and network speed matters because of the communication patterns that move large amounts of data. In this study, the performance of the Regional Ocean Modeling System (ROMS), the High Performance Linpack (HPL) benchmarking package, and the STREAM memory benchmark were evaluated and compared on commercial cloud systems to provide information for migrating other ocean models to cloud computing. Through analysis of actual performance data and configuration settings obtained from virtualization-based commercial clouds, we evaluated the efficiency of the computing resources for various model grid sizes. We found that cache hierarchy and capacity are crucial to the performance of ROMS, which uses large amounts of memory, and that memory latency is also important. Increasing the number of cores to reduce running time is more effective with large grid sizes than with small ones. Our analysis results should serve as a reference for constructing cloud computing systems that minimize the time and cost of numerical ocean modeling.
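
For reference, the STREAM measurement mentioned above boils down to timing a bandwidth-bound kernel such as the triad. The real benchmark is written in C/Fortran; this NumPy approximation, with an array size assumed to exceed last-level cache capacity, only shows the shape of the measurement:

```python
# A minimal STREAM-style triad sketch of the memory-bandwidth check the
# study ran on cloud instances. This NumPy version approximates the real
# C/Fortran benchmark; the array size is an assumption chosen to exceed
# typical last-level cache capacity.
import time
import numpy as np

N = 20_000_000                      # ~160 MB per float64 array
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)
scalar = 3.0

t0 = time.perf_counter()
a[:] = b + scalar * c               # triad: a = b + s*c
elapsed = time.perf_counter() - t0

bytes_moved = 3 * N * 8             # read b, read c, write a
print(f"triad bandwidth ~ {bytes_moved / elapsed / 1e9:.2f} GB/s")
```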

PMS : Prefetching Strategy for Multi-level Storage System (PMS : 다단계 저장장치를 고려한 효율적인 선반입 정책)

  • Lee, Kyu-Hyung;Lee, Hyo-Jeong;Noh, Sam-Hyuk
    • Journal of KIISE:Computer Systems and Theory / v.36 no.1 / pp.26-32 / 2009
  • The multi-level storage architecture has been widely adopted in servers and data centers. However, while prefetching has been shown to be a crucial technique for exploiting the sequentiality common in accesses to such systems and for hiding the increasing relative cost of disk I/O, existing multi-level storage studies have focused mostly on cache replacement strategies. In this paper, we show that prefetching algorithms designed for single-level systems may have their limitations magnified when applied to multi-level systems: overly conservative prefetching fails to use the lower-level cache space effectively, while overly aggressive prefetching is compounded across levels and generates large amounts of wasted prefetching. We design and implement a hierarchy-aware lower-level prefetching strategy called PMS (Prefetching strategy for Multi-level Storage systems) that is applicable to any upper-level prefetching algorithm. PMS requires no application hints, a priori knowledge of the application, or modification of the I/O interface. Instead, it monitors the upper-level access patterns as well as the lower-level cache status, and dynamically adjusts the aggressiveness of the lower-level prefetching activity. We evaluated PMS through extensive simulation studies using a verified multi-level storage simulator, an accurate disk simulator, and access traces with different access patterns. Our results indicate that PMS dynamically controls the aggressiveness of lower-level prefetching in reaction to multiple system and workload parameters, improving overall system performance in all 32 test cases. Working with four well-known prefetching algorithms adopted in real systems, PMS improves the average request response time by up to 35%, with an average improvement of 16.56% across all cases.
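
The feedback loop PMS describes, monitoring upper-level sequentiality and lower-level cache waste and throttling prefetch depth accordingly, can be sketched as below. The thresholds and step sizes are illustrative assumptions, not values from the paper:

```python
# Schematic of the feedback loop that hierarchy-aware lower-level
# prefetching implies: watch upper-level access sequentiality and
# lower-level cache pollution, then raise or lower the prefetch depth.
# Thresholds and step sizes are illustrative assumptions.

class AdaptivePrefetcher:
    def __init__(self, min_depth=0, max_depth=32):
        self.depth = 4                 # current prefetch aggressiveness
        self.min_depth, self.max_depth = min_depth, max_depth

    def adjust(self, seq_ratio, wasted_ratio):
        """seq_ratio: fraction of upper-level requests that are sequential;
        wasted_ratio: fraction of prefetched blocks evicted unused."""
        if wasted_ratio > 0.2:         # prefetches polluting the cache
            self.depth = max(self.min_depth, self.depth // 2)
        elif seq_ratio > 0.7:          # strongly sequential stream
            self.depth = min(self.max_depth, self.depth + 2)

    def blocks_to_prefetch(self, block):
        return [block + i for i in range(1, self.depth + 1)]

pf = AdaptivePrefetcher()
pf.adjust(seq_ratio=0.9, wasted_ratio=0.05)   # sequential, little waste
print(pf.blocks_to_prefetch(100)[:4])
```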