• Title/Summary/Keyword: 메모리 계층

Search Result 260, Processing Time 0.023 seconds

Dynamic Cache Management Scheme on Demand-Based FTL Considering Data Access Pattern (데이터 접근 패턴을 고려한 요구 기반 FTL 내 캐시의 동적 관리 기법)

  • Lee, Bit-Na;Song, Nae-Young;Koh, Kern
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06a
    • /
    • pp.547-550
    • /
    • 2011
  • 플래시 메모리는 낮은 전력 소비와 높은 성능으로 인해 휴대용 기기에 널리 사용되고 있다. FTL은 플래시 내 자료를 관리하는 소프트웨어 계층으로 플래시 전체의 성능에 영향을 끼친다. 그 중 페이지 레벨 매핑 기법을 적용한 FTL은 유연성이 높고 속도가 빠르나 주소 변환 테이블의 크기가 큰 단점이 있다. 이를 해결하기 위해 자주 접근되는 영역의 매핑 주소만을 매핑 테이블 캐시에 올려놓는 Demand-based FTL(DFTL)이 제안되었다. DFTL 에서는 CMT(Cache Mapping Table)의 참조율이 떨어지는 경우 빈번한 플래시 메모리 접근 오버헤드가 발생하게 된다. 이러한 문제는 흔히 발생하는 일반적인 순차 접근에서조차 문제가 된다. 이에 본 논문에서는 저장 장치의 접근 패턴을 예측하여 CMT의 참조 엔트리를 미리 읽어오는 기법을 제안한다. 제안하는 기법은 저장 장치 접근 패턴의 순차성을 판단하여 연속된 매핑 주소를 미리 CMT에 올려놓고, 읽어오는 매핑 주소 엔트리의 양은 동적으로 관리한다. 추가적으로 CMT에서 발생하는 스래싱(thrashing) 을 파악하기 위해 쫓겨나는 희생 엔트리의 접근 여부를 분석하여 이를 활용하였다. 실험 결과에서 본 기법은 기존의 DFTL에 비해 약간의 공간 오버헤드와 함께 평균 50% 증가한 참조율을 보였다.

A Parallel Speech Recognition Model on Distributed Memory Multiprocessors (분산 메모리 다중프로세서 환경에서의 병렬 음성인식 모델)

  • 정상화;김형순;박민욱;황병한
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.5
    • /
    • pp.44-51
    • /
    • 1999
  • This paper presents a massively parallel computational model for the efficient integration of speech and natural language understanding. The phoneme model is based on continuous Hidden Markov Model with context dependent phonemes, and the language model is based on a knowledge base approach. To construct the knowledge base, we adopt a hierarchically-structured semantic network and a memory-based parsing technique that employs parallel marker-passing as an inference mechanism. Our parallel speech recognition algorithm is implemented in a multi-Transputer system using distributed-memory MIMD multiprocessors. Experimental results show that the parallel speech recognition system performs better in recognition accuracy than a word network-based speech recognition system. The recognition accuracy is further improved by applying code-phoneme statistics. Besides, speedup experiments demonstrate the possibility of constructing a realtime parallel speech recognition system.

  • PDF

Resource Sharing Method to Reduce Duplicate Operation Cost of Multiple Spatial Aggregates in u-GIS Environment (u-GIS 환경에서 다중 공간 집계 질의의 중복연산 비용을 감소시키기 위한 자원공유 기법)

  • Seo, Min-ho;Kim, Sang-Ki;Baek, Sung-Ha;Li, Yan;Lee, Dong-Wook;Bae, Hae-Young
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.04a
    • /
    • pp.344-347
    • /
    • 2009
  • 데이터 스트림을 처리하기 위한 연속집계질의 수행 시 중복연산 및 메모리의 절약을 위하여 큐를 공유하는 자원공유기법이 연구되었다. 기존의 자원공유 기법들은 질의의 프리디킷이 일치할 때만 처리하기 때문에, 질의의 프리디킷이 차이가 나는 경우가 많은 다중공간 집계질의가 자주 요청되는 u-GIS 환경에서 효율적으로 중복영역을 처리할 수 있는 자원공유 기법이 요구된다. 본 논문에서는 공간영역을 효율적으로 그룹화하는 R-tree 의 특징을 이용하여 질의간의 중복영역을 그룹화하고 중복영역의 자원을 패인(Pane)구조를 이용하여 공유한다. 노드 수에 제한이 없고 레벨을 1로 하는 R-tree 로 유사한 위치의 질의들을 그룹화 한 후, 그 질의들의 영역이 겹쳐지는 부분을 패인을 이용해 집계 값을 공유하여 중복계산을 피하는 방법이다. 제안 기법은 공간 집계질의를 처리할 수 있고, 기존의 계층구조의 자원공유 기법을 사용할 때에 비해 자원을 적게 사용하고 질의 처리 시간을 단축시켰다. 성능평가를 통하여 제안기법이 메모리 사용량을 감소시키는 것을 보였으며, 질의 처리 속도가 증가하였다.

An Energy-Delay Efficient System with Adaptive Victim Caches (선택적 희생 캐쉬를 이용한 저전력 고성능 시스템 설계 방안)

  • Kim Cheol Hong;Shim Sunghoon;Jhon Chu Shik;Jhang Seong Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.663-674
    • /
    • 2005
  • We propose a system aimed at achieving high energy-delay efficiency by using adaptive victim caches. Particularly, we investigate methods to improve the hit rates in the first level of memory hierarchy, which reduces the number of accesses to mort power consuming memory structures such as L2 cache. Victim cache is a memory element for reducing conflict misses in a direct-mapped L1 cache. We present two techniques to fill the victim cache with the blocks that have higher probability to be re-reqeusted by processor. Hit-based victim cache ks tilled with the blocks which were referenced frequently by processor. Replacement-based victim cache is filled with the blocks which were evicted from the sets where block replacements had happened frequently According to our simulations, replacement-based victim cache scheme outperforms the conventional victim cache scheme about $2\%$ on average and refutes the power consumption by up to $8\%$.

Low Power TLB System by Using Continuous Accessing Distinction Algorithm (연속적 접근 판별 알고리즘을 이용한 저전력 TLB 구조)

  • Lee, Jung-Hoon
    • The KIPS Transactions:PartA
    • /
    • v.14A no.1 s.105
    • /
    • pp.47-54
    • /
    • 2007
  • In this paper we present a translation lookaside buffer (TLB) system with low power consumption for imbedded processors. The proposed TLB is constructed as multiple banks, each with an associated block buffer and a corresponding comparator. Either the block buffer or the main bank is selectively accessed on the basis of two bits in the block buffer (tag buffer). Dynamic power savings are achieved by reducing the number of entries accessed in parallel, as a result of using the tag buffer as a filtering mechanism. The performance overhead of the proposed TLB is negligible compared with other hierarchical TLB structures. For example, the two-cycle overhead of the proposed TLB is only about 1%, as compared with 5% overhead for a filter (micro)-TLB and 14% overhead for a same structure without continuos accessing distinction algorithm. We show that the average hit ratios of the block buffers and the main banks of the proposed TLB are 95% and 5% respectively. Dynamic power is reduced by about 95% with respect to with a fully associative TLB, 90% with respect to a filter-TLB, and 40% relative to a same structure without continuos accessing distinction algorithm.

An Efficient Flash Memory B-Tree Supporting Very Cheap Node Updates (플래시 메모리 B-트리를 위한 저비용 노드 갱신 기법)

  • Lim, Seong-Chae
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.8
    • /
    • pp.706-716
    • /
    • 2016
  • Because of efficient space utilization and fast key search times, B-trees have been widely accepted for the use of indexes in HDD-based DBMSs. However, when the B-ree is stored in flash memory, its costly operations of node updates may impair the performance of a DBMS. This is because the random updates in B-tree's leaf nodes could tremendously enlarge I/O costs for the garbage collecting actions of flash storage. To solve the problem, we make all the parents of leaf nodes the virtual nodes, which are not stored physically. Rather than, those nodes are dynamically generated and buffered by referring to their child nodes, at their access times during key searching. By performing node updates and tree reconstruction within a single flash block, our proposed B-tree can reduce the I/O costs for garbage collection and update operations in flash. Moreover, our scheme provides the better performance of key searches, compared with earlier flash-based B-trees. Through a mathematical performance model, we verify the performance advantages of the proposed flash B-tree.

Dynamic Monitoring Framework and Debugging System for Embedded Virtualization System (가상화 환경에서 임베디드 시스템을 위한 모니터링 프레임워크와 디버깅 시스템)

  • Han, Inkyu;Lim, Sungsoo
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.12
    • /
    • pp.792-797
    • /
    • 2015
  • Effective profiling diagnoses the failure of the system and informs risk. If a failure in the target system occurs, it is impossible to diagnose more than one of the exiting tools. In this respect, monitoring of the system based on virtualization is useful. We present in this paper a monitoring framework that uses the characteristics of hardware virtualization to prevent side-effects from a target guest, and uses dynamic binary instrumentation with instruction-level trapping based on hardware virtualization to achieve efficiency and flexibility. We also present examples of some applications that use this framework. The framework provides tracing of guest kernel function, memory dump, and debugging that uses GDB stub with GDB remote protocol. The experimental evaluation of our prototype shows that the monitoring framework incurs at most 2% write memory performance overhead for end users.

Detailed-information Browsing Technology based on Level of Detail for 3D Cultural Asset Data (3D 문화재 데이터의 LOD 기반 상세정보 브라우징 기술)

  • Jung, Jung-Il;Cho, Jin-Soo;WhangBo, Tae-Keun
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.10
    • /
    • pp.110-121
    • /
    • 2009
  • In this paper, we propose the new method that offer detailed-information through relax the system memory limitation about 3D model to user. That method based on making LOD(Level of Detail) model from huge 3D data of structure cultural assets. In our method as transformed AOSP algorithm, first of all it create the hierarchical structure space about 3D data, and create the LOD model by surface simplification. Then it extract the ROI(Region of Interest) of user in simplified LOD model, and then do rendering by original model and same surface detailed-information after process the local detailed in extracted region. To evaluate the proposed method, we have some experiment by using the precise 3D scan data of structure cultural assets. Our method can offer the detailed-information same as exist method, and moreover 45% reduced consumption of memory experimentally by forming mesh structure same as ROI of simplified LOD model. So we can check the huge structure cultural assets particularly in general computer environment.

A Tool for On-the-fly Repairing of Atomicity Violation in GPU Program Execution

  • Lee, Keonpyo;Lee, Seongjin;Jun, Yong-Kee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.9
    • /
    • pp.1-12
    • /
    • 2021
  • In this paper, we propose a tool called ARCAV (Atomatic Recovery of CUDA Atomicity violation) to automatically repair atomicity violations in GPU (Graphics Processing Unit) program. ARCAV monitors information of every barrier and memory to make actual memory writes occur at the end of the barrier region or to make the program execute barrier region again. Existing methods do not repair atomicity violations but only detect the atomicity violations in GPU programs because GPU programs generally do not support lock and sleep instructions which are necessary for repairing the atomicity violations. Proposed ARCAV is designed for GPU execution model. ARCAV detects and repairs four patterns of atomicity violations which represent real-world cases. Moreover, ARCAV is independent of memory hierarchy and thread configuration. Our experiments show that the performance of ARCAV is stable regardless of the number of threads or blocks. The overhead of ARCAV is evaluated using four real-world kernels, and its slowdown is 2.1x, in average, of native execution time.

Hierarchical Ring Extension of NUMA Systems using Snooping Protocol (스누핑 프로토콜을 사용하는 NUMA 시스템의 계층적 링 구조로의 확장)

  • Seong, Hyeon-Jung;Kim, Hyeong-Ho;Jang, Seong-Tae;Jeon, Ju-Sik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.11
    • /
    • pp.1305-1317
    • /
    • 1999
  • NUMA 구조는 원격 메모리에 대한 접근이 불가피한 구조적 특성 때문에 상호 연결망이 성능을 좌우하는 큰 변수가 된다. 기존에 대중적으로 사용되던 버스는 물리적 확장성 및 대역폭에서 대규모 시스템을 구성하는 데 한계를 보인다. 이를 대체하는 고속의 지점간 링크를 사용한 링 구조는 버스가 가지는 확장성 및 대역폭의 한계라는 단점을 개선하였으나, 많은 클러스터가 연결되는 경우에는 전송 지연시간이 증가하는 문제점을 가지고 있다. 본 논문에서는 스누핑 프로토콜이 적용된 링 구조에서 클러스터 개수 증가에 따른 지연시간 증가의 문제점을 보완하기 위해 계층적 링 구조로의 확장을 제안하고, 이 구조에 효과적인 캐쉬 일관성 프로토콜을 설계하였다. 전역 링과 지역 링을 연결하는 브리지는 캐쉬 프로토콜을 관리하며 이 프로토콜에 의해 지역 링의 부하를 줄일 수 있도록 트랜잭션을 필터링하는 역할도 담당함으로써 시스템의 성능을 향상시킨다. probability-driven 시뮬레이터를 통해 계층적 링 구조가 시스템의 성능 및 링 이용률에 미치는 영향을 알아본다. Abstract Since NUMA architecture has to access remote memory, interconnection network performance determines performance of NUMA architecture. Bus, which has been used as popular interconnection network of NUMA, has a limit to build a large-scale system because of limited physical scalability and bandwidth. Ring interconnection network, composed of high-speed point-to-point link, made up for bus's defects of scalability and bandwidth. But, it also has problem of increasing delay as the number of clusters is increased. In this paper, we propose a hierarchical expansion of snoop-based ring architecture in order to overcome ring's defects of increasing delay. And we also design an efficient cache coherence protocol adopted to this architecture. Bridge, which connects local ring and global ring, maintains cache coherence protocol and does snoop-filtering which reduces local ring and cluster bus utilization. Therefore bridge can improve performance of this system. We analyze effects of hierarchical architecture on the performance of system and utilization of point-to-point links using probability-driven simulator.