• Title/Summary/Keyword: 캐시메모리 (cache memory)

Search results: 242

Analysis of the Design Factors in NUMA-aware Scheduler (NUMA 기반의 스케줄러 설계를 위한 고려사항 분석)

  • Kim, Junghoon; Min, Changwoo; Eom, Young Ik
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.195-196 / 2012
  • Hardware platforms are increasingly designed around NUMA to satisfy the memory bandwidth demands of many-core architectures. In a NUMA system, accessing memory on a remote node incurs 1.5 to 2 times the latency of a local access, so a scheduler for NUMA systems must take this characteristic into account. In this paper, we analyze the factors that should be considered when designing a NUMA-aware scheduler. The analysis shows that minimizing shared-resource contention and remote accesses is at the core of NUMA scheduler design. In addition, the mix of workloads running on the same node, the management of cache-polluting tasks, and the amount of free memory remaining on each node should also be considered.
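
The abstract stops at design factors, so the sketch below is only our illustration of how those factors (shared-resource contention, remote-access penalty, free memory per node) might be combined into a placement heuristic; the weights, data layout, and pick_node function are all invented for the example.

```python
# Illustrative node-selection heuristic built from the factors the paper
# identifies; the weights and dictionary layout are assumptions.

REMOTE_PENALTY = 1.75  # abstract cites 1.5x-2x latency for remote accesses

def pick_node(task, nodes):
    """Pick the NUMA node with the lowest estimated placement cost."""
    def cost(node):
        contention = node["load"]         # proxy for shared-resource pressure
        local = node["resident_pages"].get(task["id"], 0)
        remote_ratio = 1.0 - local / task["working_set_pages"]
        if node["free_pages"] < task["working_set_pages"] - local:
            return float("inf")           # not enough free memory on the node
        return contention + REMOTE_PENALTY * remote_ratio
    return min(nodes, key=cost)

nodes = [
    {"id": 0, "load": 0.8, "free_pages": 5000, "resident_pages": {"t1": 900}},
    {"id": 1, "load": 0.3, "free_pages": 8000, "resident_pages": {}},
]
task = {"id": "t1", "working_set_pages": 1000}
print(pick_node(task, nodes)["id"])       # -> 0: most of t1's pages are local
```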

Data Replication and Migration Scheme for Load Balancing in Distributed Memory Environments (분산 인-메모리 환경에서 부하 분산을 위한 데이터 복제와 이주 기법)

  • Choi, Kitae; Yoon, Sangwon; Park, Jaeyeol; Lim, Jongtae; Bok, Kyoungsoo; Yoo, Jaesoo
    • KIISE Transactions on Computing Practices / v.22 no.1 / pp.44-49 / 2016
  • Recently, data has been growing dramatically along with the growth of social media and digital devices, and distributed in-memory processing systems are used to process such large volumes of data efficiently. However, if load concentrates on a particular node in a distributed environment, that node's performance degrades significantly. In this paper, we propose a load balancing scheme that distributes load in a distributed in-memory environment. The proposed scheme replicates hot data to multiple nodes to manage per-node load, and migrates data in consideration of node load when nodes are added or removed. Clients reduce the number of accesses to the central server by using the metadata of hot data to access data nodes directly. To show the superiority of the proposed scheme, we compare it with an existing load balancing scheme through performance evaluation.
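
As a rough sketch of the two mechanisms the abstract describes (HOT_THRESHOLD, the class names, and the shared holder list are assumptions, not the paper's design): replicate a key once its access count crosses a threshold, and let clients cache replica metadata so repeat reads bypass the central server.

```python
import random

HOT_THRESHOLD = 100                       # accesses before a key is "hot"

class CentralServer:
    def __init__(self, nodes):
        self.nodes = nodes                # node id -> set of stored keys
        self.replicas = {k: [n] for n in nodes for k in nodes[n]}
        self.access_count = {}

    def locate(self, key):
        """Metadata lookup; replicates the key when it turns hot."""
        self.access_count[key] = self.access_count.get(key, 0) + 1
        holders = self.replicas[key]
        if self.access_count[key] >= HOT_THRESHOLD and len(holders) < len(self.nodes):
            # Copy the hot key onto the least-loaded node that lacks it.
            target = min((n for n in self.nodes if n not in holders),
                         key=lambda n: len(self.nodes[n]))
            self.nodes[target].add(key)
            holders.append(target)        # clients holding this list see it too
        return holders

class Client:
    """Caches hot-key metadata so later reads skip the central server."""
    def __init__(self, server):
        self.server = server
        self.metadata = {}

    def read(self, key):
        if key not in self.metadata:      # only the first read asks the server
            self.metadata[key] = self.server.locate(key)
        return random.choice(self.metadata[key])  # go straight to a data node
```

In a real deployment the server-side access count would be fed by first-time lookups from many clients; migration on node membership changes would reuse the same holder metadata.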

A Cache Managing Strategy for Fast Media Data Access (미디어 데이터의 빠른 참조를 위한 캐시 운영 전략)

  • Moon, Hyun-Ju; Kim, Suk-il
    • The KIPS Transactions: Part A / v.11A no.1 / pp.11-20 / 2004
  • Multimedia data processed in a streaming pattern exhibits high spatial locality but low temporal locality. This paper proposes a dynamic data prefetching scheme that fully exploits the regularity between consecutively referenced memory addresses. Compared with existing data prefetching schemes, the proposed scheme reduces prefetching errors when an application divides an array into smaller blocks and processes them block by block. Experimental results on various media benchmark programs show that the proposed scheme predicts memory addresses more accurately and yields better performance than existing prefetching schemes.
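
The "regularity between consecutively referenced memory addresses" is the classic stride pattern, so a minimal stride predictor with a confidence counter conveys the idea; the table-less design and the threshold here are our simplifications, not the paper's exact scheme.

```python
class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.stride = 0
        self.confidence = 0

    def access(self, addr):
        """Observe one load address; return an address to prefetch, or None."""
        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride and stride != 0:
                self.confidence = min(self.confidence + 1, 3)
            else:                         # stride broke: relearn it
                self.stride = stride
                self.confidence = 0
            if self.confidence >= 2:      # stride confirmed twice in a row
                prefetch = addr + self.stride
        self.last_addr = addr
        return prefetch

p = StridePrefetcher()
for a in [100, 108, 116, 124, 132]:       # a regular 8-byte stride
    print(a, "->", p.access(a))           # prefetching starts at address 124
```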

Dynamic Cache Management Scheme on Demand-Based FTL Considering Data Access Pattern (데이터 접근 패턴을 고려한 요구 기반 FTL 내 캐시의 동적 관리 기법)

  • Lee, Bit-Na; Song, Nae-Young; Koh, Kern
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.547-550 / 2011
  • Flash memory is widely used in portable devices because of its low power consumption and high performance. The FTL is the software layer that manages the data in flash memory and affects the performance of the whole device. Among FTL designs, page-level mapping offers high flexibility and speed, but it suffers from a large address translation table. To address this, Demand-based FTL (DFTL), which caches only the mapping entries of frequently accessed regions in a mapping table cache, was proposed. In DFTL, however, a low hit ratio in the CMT (Cached Mapping Table) causes frequent flash memory accesses, an overhead that arises even for common sequential access patterns. In this paper, we propose a scheme that predicts the access pattern of the storage device and preloads mapping entries into the CMT. The proposed scheme detects sequentiality in the storage access pattern, loads consecutive mapping entries into the CMT in advance, and dynamically adjusts the number of mapping entries fetched. Additionally, to detect thrashing in the CMT, it analyzes whether evicted victim entries are accessed again and exploits this information. Experimental results show that, compared with the original DFTL, the proposed scheme improves the hit ratio by 50% on average with a small space overhead.
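
A rough sketch of the described behavior: detect sequential logical page numbers (LPNs), prefetch consecutive mapping entries into the CMT with a dynamically adjusted degree, and back off when evicted victims are re-referenced (thrashing). The sizes, the doubling/halving policy, and read_mapping_from_flash are assumptions.

```python
from collections import OrderedDict

class CMT:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()      # lpn -> ppn, in LRU order
        self.victims = set()              # evicted lpns (unbounded for brevity)
        self.last_lpn = None
        self.degree = 1                   # how many entries to prefetch

    def _insert(self, lpn):
        if lpn in self.victims:           # a victim came back: thrashing,
            self.degree = max(1, self.degree // 2)    # so halve the degree
        if len(self.entries) >= self.capacity:
            victim, _ = self.entries.popitem(last=False)
            self.victims.add(victim)
        self.entries[lpn] = read_mapping_from_flash(lpn)

    def translate(self, lpn):
        if lpn in self.entries:
            self.entries.move_to_end(lpn)         # hit: refresh LRU position
        else:
            self._insert(lpn)                     # miss: one flash read
            if self.last_lpn is not None and lpn == self.last_lpn + 1:
                self.degree = min(self.degree * 2, self.capacity // 2)
                for n in range(1, self.degree + 1):   # sequential: prefetch
                    if lpn + n not in self.entries:
                        self._insert(lpn + n)
        self.last_lpn = lpn
        return self.entries[lpn]

def read_mapping_from_flash(lpn):         # stand-in for a real flash page read
    return 0x1000 + lpn

cmt = CMT()
for lpn in [10, 11, 12, 13]:
    cmt.translate(lpn)                    # 11 looks sequential: 12, 13 prefetched
```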

A Cache-based Reconfigurable Accelerator in Die-stacked DRAM (3차원 구조 DRAM의 캐시 기반 재구성형 가속기)

  • Kim, Yongjoo
    • KIPS Transactions on Computer and Communication Systems / v.4 no.2 / pp.41-46 / 2015
  • The demand for low-power, high-performance systems is soaring with the growth of the mobile and small electronic device markets. 3D die-stacking is widely studied as a next-generation integration technology because of its high density and low access time. We propose a 3D die-stacked DRAM that includes a reconfigurable accelerator in the logic layer of the DRAM, and we discuss and suggest a cache-based local memory for that accelerator. Because of its location, the reconfigurable accelerator in the logic layer of 3D die-stacked DRAM reduces the overhead of data management and transfer, and can therefore raise performance substantially. The proposed system achieves a maximum speedup of 24.8.

Small Active Command Design for High Density DRAMs

  • Lee, Kwangho; Lee, Jongmin
    • Journal of the Korea Society of Computer and Information / v.24 no.11 / pp.1-9 / 2019
  • In this paper, we propose a Small Active Command scheme that reduces the power consumption of the command bus to DRAM. We target the ACTIVE command, which consists of multiple packets and carries the row address, the largest of the addresses delivered to the DRAM. The proposed scheme first identifies frequently referenced row addresses as hot pages, then delivers index numbers into small caches (tables) located in both the memory controller and the DRAM. I-ACTIVE and I-PRECHARGE commands, which use unused bits of existing DRAM commands, are added for index-number transfer and cache synchronization. Experimental results show that the proposed method reduces command bus power consumption by 20% and 8.1% on average under the close-page and open-page policies, respectively.
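
A toy model of the hot-page indexing described above: rows referenced often enough are installed into a small table mirrored on both sides of the bus, after which a short I-ACTIVE index replaces the full-width row address. The table size, install threshold, and bit width are invented for the example.

```python
INDEX_BITS = 4                            # index width on the bus (assumed)

class CommandEncoder:
    def __init__(self):
        self.table = []                   # index -> hot row address (mirrored
        self.hits = {}                    # in DRAM via I-ACTIVE/I-PRECHARGE)

    def activate(self, row):
        if row in self.table:
            return ("I-ACTIVE", self.table.index(row))  # only INDEX_BITS sent
        self.hits[row] = self.hits.get(row, 0) + 1
        if self.hits[row] >= 2 and len(self.table) < 2 ** INDEX_BITS:
            self.table.append(row)        # install; DRAM-side table follows
        return ("ACTIVE", row)            # full row address on the bus

enc = CommandEncoder()
print([enc.activate(r) for r in [5, 5, 5, 9]])
# [('ACTIVE', 5), ('ACTIVE', 5), ('I-ACTIVE', 0), ('ACTIVE', 9)]
```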

Energy and Performance-Efficient Dynamic Load Distribution for Mobile Heterogeneous Storage Devices (에너지 및 성능 효율적인 이종 모바일 저장 장치용 동적 부하 분산)

  • Kim, Young-Jin; Kim, Ji-Hong
    • Journal of the Korea Society of Computer and Information / v.14 no.4 / pp.9-17 / 2009
  • In this paper, we propose a dynamic load distribution technique at the operating system level for mobile storage systems with a heterogeneous storage pair of a small form-factor hard disk and a flash memory, aiming to save energy as well as to enhance I/O performance. The proposed technique combines file placement and buffer cache management to determine how load can be distributed in an energy- and performance-aware way across such a heterogeneous pair. Through extensive simulations, we demonstrate that the proposed technique yields better results on heterogeneous mobile storage devices than existing techniques.
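
The abstract does not spell out the placement policy, so the sketch below shows only a plausible energy- and performance-aware rule of thumb for a disk/flash pair; every threshold and parameter name is invented.

```python
def place_file(size_kb, seq_ratio, read_ratio, disk_spun_up):
    """Return 'flash' or 'disk' for a new file given its expected workload."""
    if size_kb <= 256 and seq_ratio < 0.5:
        return "flash"                    # small random I/O: avoid disk seeks
    if not disk_spun_up and read_ratio > 0.9:
        return "flash"                    # don't spin the disk up just to read
    return "disk"                         # large sequential data is cheap here

print(place_file(64, 0.2, 0.8, disk_spun_up=False))    # -> flash
print(place_file(4096, 0.9, 0.5, disk_spun_up=True))   # -> disk
```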

An Architecture for a Parallel Rendering Processor with Effective Memory Organization (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 구조)

  • Kim, Kyung-Su; Yoon, Duk-Ki; Kim, Il-San; Park, Woo-Chan
    • Journal of Korea Game Society / v.5 no.3 / pp.39-47 / 2005
  • Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high triangle throughput, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency caused by pixel cache misses. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.
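
The abstract names the consistency problem without detailing the paper's fix, so the sketch below shows only the classic workaround: statically interleave pixel ownership so each framebuffer address is cached and written by exactly one rasterizer. The tile size and mapping function are ours.

```python
N_RASTERIZERS = 4
TILE = 2                                  # 2x2-pixel tiles

def owner(x, y):
    """Map a pixel to the single rasterizer allowed to cache and write it."""
    return ((x // TILE) + (y // TILE)) % N_RASTERIZERS

# Overlapping triangles can proceed in parallel: any pixel they share is
# handled by the same rasterizer, whose pixel cache therefore stays consistent.
for y in range(0, 8, TILE):
    print([owner(x, y) for x in range(8)])
```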


Efficient Implementation of SVM-Based Speech/Music Classifier by Utilizing Temporal Locality (시간적 근접성 향상을 통한 효율적인 SVM 기반 음성/음악 분류기의 구현 방법)

  • Lim, Chung-Soo; Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.2 / pp.149-156 / 2012
  • Support vector machines (SVMs) are well known for their pattern recognition capability, but proper care should be taken to alleviate their inherent implementation cost, which stems from high computational intensity and memory requirements, especially in embedded systems where only limited resources are available. Since the memory requirement, determined by the dimensionality and the number of support vectors, is generally too high for a cache in embedded systems to accommodate, frequent accesses to main memory occur whenever the cache cannot supply the requested data to the processor. These frequent main memory accesses degrade overall performance and increase energy consumption, because a memory access typically takes longer and consumes more energy than a cache or register access. In this paper, we propose a technique that reduces the number of main memory accesses by optimizing the data access pattern of the SVM-based classifier so that the temporal locality of the accesses increases, fully utilizing data already loaded into the processor chip. With experiments, we confirm the improvement made by the proposed technique in terms of the number of memory accesses, overall execution time, and energy consumption.
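
As one concrete way to raise temporal locality in the classifier's inner loops (the paper's exact reordering may differ), the sketch below evaluates a linear-kernel decision value in dimension blocks, reusing each cache-resident chunk of the input vector against every support vector before moving on.

```python
def svm_score_blocked(alphas, svs, x, bias, block=64):
    """Linear-kernel SVM decision value with dimension-blocked accumulation."""
    partial = [0.0] * len(svs)
    for start in range(0, len(x), block):         # one cache-sized chunk of x
        chunk = x[start:start + block]
        for i, sv in enumerate(svs):              # reuse chunk across all SVs
            partial[i] += sum(a * b for a, b in zip(sv[start:start + block], chunk))
    return sum(a * p for a, p in zip(alphas, partial)) + bias

svs = [[1.0] * 256, [0.5] * 256]
print(svm_score_blocked([0.3, -0.2], svs, [2.0] * 256, bias=0.1))  # ~102.5
```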

Cache Replacement Policy Based on Dynamic Counter for High Performance Processor (고성능 프로세서를 위한 카운터 기반의 캐시 교체 알고리즘)

  • Jung, Do Young; Lee, Yong Surk
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.4 / pp.52-58 / 2013
  • The replacement policy is one of the key factors determining the effectiveness of a cache, and LRU has remained the standard policy for many years. However, traditional LRU performs poorly on workloads dominated by zero-reuse lines, even though it performs well on workloads with high temporal locality. To address this problem, we propose a new replacement policy: DCR (Dynamic Counter based Replacement). The temporal locality of a workload changes dynamically over time, and the DCR policy is based on detecting these changes. DCR improves the cache miss rate over traditional LRU by as much as 2.7% at maximum and 0.47% on average.
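
The abstract gives only the outline of DCR, so the following is a guess at the flavor of a dynamic counter-based policy: a saturating counter tracks how often cached lines are reused, and when reuse dries up (a zero-reuse phase) the cache evicts the newest never-reused line instead of the LRU one. The counter width and the eviction rule are assumptions.

```python
from collections import OrderedDict

class DCRCache:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.lines = OrderedDict()        # addr -> "was this line reused?"
        self.reuse_counter = 0            # saturating; >0 means locality is high

    def access(self, addr):
        if addr in self.lines:
            self.lines[addr] = True
            self.lines.move_to_end(addr)
            self.reuse_counter = min(self.reuse_counter + 1, 7)
            return "hit"
        if len(self.lines) >= self.capacity:
            if self.reuse_counter == 0:   # zero-reuse phase: evict newest
                victim = next((a for a in reversed(self.lines)
                               if not self.lines[a]), next(iter(self.lines)))
            else:                         # normal phase: classic LRU victim
                victim = next(iter(self.lines))
            del self.lines[victim]
        self.reuse_counter = max(self.reuse_counter - 1, 0)
        self.lines[addr] = False
        return "miss"

c = DCRCache(capacity=2)
print([c.access(a) for a in [1, 2, 1, 3, 4, 1]])
# ['miss', 'miss', 'hit', 'miss', 'miss', 'hit'] -- the reused line 1 survives
# the 3, 4 scan, which plain LRU would have let evict it.
```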