• Title/Summary/Keyword: 캐쉬메모리 (cache memory)

Performance Evaluation of Cache Coherence Scheme for Data Allocation Methods (데이타 배치 방식에 따른 캐쉬 일관성 유지 기법의 성능 평가)

  • Lee, Dong-Kwang;Kweon, Hyek-Seong;Ahn, Byoung-Chul
    • Journal of KIISE: Computer Systems and Theory, v.27 no.6, pp.592-598, 2000
  • The locality of data references in distributed shared memory (DSM) systems significantly affects their performance, and data allocation methods that take this locality into account can improve DSM performance. This paper evaluates the performance of the dynamic limited directory scheme, to which data allocation methods can be applied very effectively. The dynamic limited directory scheme uses data allocation information to set presence bits effectively, and proper use of the presence bits improves performance by reducing memory overhead and using the directory pool efficiently. Simulations are conducted with three application programs that exhibit different data-sharing patterns. The results show that, under the proposed scheme, the optimal data allocation method improves performance by up to 3.6 times.
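As a rough illustration of the directory-pool idea, the following sketch models a generic limited-pointer directory whose entries borrow extra sharer slots from a shared pool once their own slots are full. The abstract does not give the entry layout, so the class name, the slot counts, and the use of sharer lists in place of the paper's presence bits are all assumptions.

```python
from __future__ import annotations


class LimitedDirectory:
    """Toy limited-pointer directory with a shared overflow pool (hypothetical)."""

    def __init__(self, pointers_per_entry: int, pool_size: int) -> None:
        self.k = pointers_per_entry      # sharer slots held inside each entry
        self.pool_free = pool_size       # overflow slots shared by all blocks
        self.entries: dict[int, dict[str, list[int]]] = {}

    def add_sharer(self, block: int, node: int) -> bool:
        """Record `node` as a sharer of `block`; False means no slot was available."""
        e = self.entries.setdefault(block, {"local": [], "pooled": []})
        if node in e["local"] or node in e["pooled"]:
            return True                  # already recorded
        if len(e["local"]) < self.k:
            e["local"].append(node)      # cheap case: fits in the entry itself
            return True
        if self.pool_free > 0:
            self.pool_free -= 1          # borrow one slot from the shared pool
            e["pooled"].append(node)
            return True
        return False                     # caller must evict a sharer or broadcast

    def invalidate_all(self, block: int) -> list[int]:
        """Return the sharers to invalidate on a write and free their pooled slots."""
        e = self.entries.pop(block, None)
        if e is None:
            return []
        self.pool_free += len(e["pooled"])
        return e["local"] + e["pooled"]
```

When `add_sharer` returns False, a real protocol would have to invalidate an existing sharer or fall back to broadcast; data allocation that keeps each block's sharers few would reduce how often the pool is needed at all.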

Research on Event Mechanism for Reducing Power Overheads in Cache Memory Synchronization (캐시 메모리 동기화 전력 감소를 위한 이벤트 메커니즘에 대한 연구)

  • Pak, Young-Jin;Jeong, Ha-Young;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea CI, v.48 no.3, pp.69-75, 2011
  • In this paper, we propose an anycast event-driven synchronization mechanism to reduce power overheads. The proposed mechanism reduces unnecessary polling operations in the SHI (Snoop Hit Invalidate) and SHR (Snoop Hit Read) states, which prevents wasted bandwidth and reduces the power spent on polling. It also reduces state-transition power compared with a broadcast model. Simulation results indicate that the proposed architecture reduces power by about 15.3% compared with a spin-lock model and by about 4.7% compared with a broadcast model. Overall, the results indicate that the proposed mechanism can increase the power efficiency of multi-core systems by reducing synchronization power overheads.
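To make the polling-versus-event distinction concrete, here is a toy event count, not a power model: n cores queue for one lock that is held for a fixed number of cycles per critical section. The one-poll-per-cycle spin loop, the sequential hand-off order, and the function name are assumptions.

```python
from __future__ import annotations


def count_wait_events(n_waiters: int, hold_cycles: int) -> dict[str, int]:
    """Toy count of wait-related events while a lock passes through n_waiters.

    spin_polls:        every waiting core re-reads the lock once per cycle
    broadcast_wakeups: each release wakes every core still waiting
    anycast_wakeups:   each release wakes exactly one waiting core
    """
    spin_polls = broadcast_wakeups = anycast_wakeups = 0
    waiting = n_waiters
    while waiting > 0:
        spin_polls += waiting * hold_cycles   # polling during one critical section
        broadcast_wakeups += waiting          # all remaining waiters are woken
        anycast_wakeups += 1                  # only the next owner is woken
        waiting -= 1                          # one waiter acquires the lock
    return {
        "spin_polls": spin_polls,
        "broadcast_wakeups": broadcast_wakeups,
        "anycast_wakeups": anycast_wakeups,
    }


print(count_wait_events(4, 10))
# {'spin_polls': 100, 'broadcast_wakeups': 10, 'anycast_wakeups': 4}
```

Spin polling grows with both the queue length and the hold time, broadcast wake-ups grow quadratically with the queue length, and anycast wake-ups grow only linearly, which matches the ordering (though not the magnitudes) reported in the abstract.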

Efficient Maximum Intensity Projection using SIMD Instruction and Streaming Memory Transfer (단일 명령 복수 데이터 연산과 순차적 메모리 참조를 이용한 효율적인 최대 휘소 투영 볼륨 가시화)

  • Kye, Hee-Won
    • Journal of Korea Multimedia Society, v.12 no.4, pp.512-520, 2009
  • Maximum intensity projection (MIP) is a volume rendering method that extracts the maximum value along each viewing ray through the volume data. Because it visualizes high-density structures, such as those in angiographic datasets, it is frequently used in medical imaging systems. We propose an efficient two-step MIP acceleration method for recent CPUs. First, we use SIMD instructions to reduce the conditional branch instructions that account for a considerable part of the whole rendering process, which improves rendering speed. Second, we propose a new method that accesses volume and image data sequentially by modifying shear-warp rendering; this improves memory access patterns and thereby reduces cache misses. On current CPUs, our method renders 7 times faster than shear-warp rendering.
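The two optimizations can be contrasted in a small sketch. It assumes an axis-aligned view along z and a volume stored as `volume[z][y][x]`; it is not the paper's modified shear-warp algorithm, and plain Python cannot exhibit SIMD or cache effects, so it only shows the two loop orders. The ray-order version branches on every sample and jumps between slices; the slice-order version streams each slice row by row and replaces the branch with an element-wise max.

```python
from __future__ import annotations


def mip_per_ray(volume: list[list[list[int]]]) -> list[list[int]]:
    """Ray-order MIP along z: for each pixel, walk down z with a compare-and-branch."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    image = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best = volume[0][y][x]
            for z in range(1, depth):
                v = volume[z][y][x]      # strided jump to another slice
                if v > best:             # data-dependent branch per sample
                    best = v
            image[y][x] = best
    return image


def mip_slice_order(volume: list[list[list[int]]]) -> list[list[int]]:
    """Slice-order MIP along z: stream each slice row by row into the image."""
    image = [row[:] for row in volume[0]]    # copy so the volume is not modified
    for slice_ in volume[1:]:
        for img_row, vol_row in zip(image, slice_):
            # Element-wise max over contiguous rows; compiled code can map this
            # to SIMD max instructions instead of one branch per sample.
            img_row[:] = [max(a, b) for a, b in zip(img_row, vol_row)]
    return image
```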

Performance Analysis and Enhancing Techniques of Kd-Tree Traversal Methods on GPU (GPU용 Kd-트리 탐색 방법의 성능 분석 및 향상 기법)

  • Chang, Byung-Joon;Ihm, In-Sung
    • Journal of KIISE: Computing Practices and Letters, v.16 no.2, pp.177-185, 2010
  • Ray-object intersection is an important element of ray tracing and takes up a substantial amount of computing time. Spatial data structures such as the kd-tree have frequently been used for static scenes to accelerate intersection computation. Recently, several variants of kd-tree traversal have been proposed for the GPU, whose computing architecture is relatively restricted compared with the CPU. In this article, we propose two further implementation techniques that improve on these earlier methods. First, we present a cached stack method that aims to reduce the costly global memory accesses incurred when the traversal stack is allocated in global memory. Second, we present a rope-with-short-stack method that eases the substantial memory requirement often imposed by the earlier rope method. To show the effectiveness of our techniques, we compare their performance with that of previous GPU traversal methods. The experimental results provide prospective GPU ray tracer developers with valuable information to help them choose a suitable kd-tree traversal method.
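A cached stack can be sketched as a LIFO whose top few entries live in fast storage and whose older entries spill to slow storage. Here `fast` stands in for registers or shared memory and `slow` for GPU global memory; the class name, the capacity, and the spill-one-entry policy are assumptions, not the paper's design.

```python
from collections import deque


class CachedStack:
    """LIFO stack whose top entries live in a small fast buffer.

    `fast` stands in for registers/shared memory and `slow` for global memory;
    `slow_accesses` counts transfers between the two.
    """

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        self.fast: deque = deque()   # bottom ... top of the cached portion
        self.slow: list = []         # older entries spilled below the cache
        self.slow_accesses = 0

    def push(self, item) -> None:
        if len(self.fast) == self.fast_capacity:
            # Spill the oldest cached entry; it sits directly above slow's top.
            self.slow.append(self.fast.popleft())
            self.slow_accesses += 1
        self.fast.append(item)

    def pop(self):
        if not self.fast:
            if not self.slow:
                raise IndexError("pop from empty CachedStack")
            self.fast.append(self.slow.pop())   # refill one entry from slow memory
            self.slow_accesses += 1
        return self.fast.pop()

    def __len__(self) -> int:
        return len(self.fast) + len(self.slow)


s = CachedStack(4)
for node in range(1, 7):
    s.push(node)
order = [s.pop() for _ in range(6)]
print(order, s.slow_accesses)  # [6, 5, 4, 3, 2, 1] 4
```

In a kd-tree traversal the pushed items would be the deferred far children with their ray-segment bounds; because traversal mostly pushes and pops near the top of the stack, most operations stay in the fast buffer.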

Real-Time Implementation of the EHSX Speech Coder Using a Floating Point DSP (부동 소수점 DSP를 이용한 4kbps EHSX 음성 부호화기의 실시간 구현)

  • 이인성;박동원;김정호
    • The Journal of the Acoustical Society of Korea, v.23 no.5, pp.420-427, 2004
  • This paper presents a real-time implementation of the 4 kbps EHSX (Enhanced Harmonic Stochastic Excitation) speech coder, which combines harmonic vector excitation coding with time-separated transition coding. Harmonic vector excitation coding uses harmonic excitation coding for voiced frames and vector excitation coding with an analysis-by-synthesis structure for unvoiced frames. For transition frames, which mix voiced and unvoiced signals, we use time-separated transition coding. We present optimization methods for implementing the speech coder on the TMS320C6701® DSP. To reduce complexity for real-time operation, we optimize the algorithm by replacing the complex sinusoidal synthesis method with an IFFT, and we apply fully pipelined hand-written assembly code after converting the floating-point source to fixed-point. To generate more efficient code, we also make use of available TMS320C6701® resources such as the Fastest67x library and memory organization.
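The IFFT substitution rests on the fact that a sum of harmonic cosines is the inverse DFT of a spectrum that is non-zero only at the harmonic bins. The sketch below checks this numerically. It assumes every harmonic falls exactly on an integer FFT bin (a real coder with arbitrary pitch needs additional handling), uses a textbook radix-2 FFT rather than any DSP library routine, and is not the EHSX algorithm itself.

```python
from __future__ import annotations

import cmath
import math


def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum: list[complex]) -> list[complex]:
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in spectrum])]


def synth_direct(n: int, harmonics: list[tuple[int, float, float]]) -> list[float]:
    """Sum of cosines (bin, amplitude, phase): one cos() per harmonic per sample."""
    return [
        sum(a * math.cos(2 * math.pi * b * t / n + ph) for b, a, ph in harmonics)
        for t in range(n)
    ]


def synth_ifft(n: int, harmonics: list[tuple[int, float, float]]) -> list[float]:
    """Same frame built by writing each harmonic into two spectrum bins, then one IFFT."""
    spectrum = [0j] * n
    for b, a, ph in harmonics:          # assumes 0 < b < n/2
        spectrum[b] += (n / 2) * a * cmath.exp(1j * ph)
        spectrum[n - b] += (n / 2) * a * cmath.exp(-1j * ph)   # conjugate bin
    return [v.real for v in ifft(spectrum)]
```

Direct synthesis costs one cosine per harmonic per sample, whereas the IFFT path costs O(N log N) for the whole frame regardless of how many harmonics are present.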

Design of an Asynchronous Instruction Cache based on a Mixed Delay Model (혼합 지연 모델에 기반한 비동기 명령어 캐시 설계)

  • Jeon, Kwang-Bae;Kim, Seok-Man;Lee, Je-Hoon;Oh, Myeong-Hoon;Cho, Kyoung-Rok
    • The Journal of the Korea Contents Association, v.10 no.3, pp.64-71, 2010
  • Recently, to achieve high processor performance, the cache has been split physically into two parts, one for instructions and one for data. This paper proposes an asynchronous instruction cache architecture based on a mixed-delay model: a delay-insensitive (DI) model for cache hits and a bundled-delay model for cache misses. We synthesized the instruction cache at gate level and built a test platform with the 32-bit EISC embedded processor to evaluate its performance. The cache communicates with main memory and the CPU using a 4-phase handshake protocol. It is an 8-KB, 4-way set-associative memory that uses a pseudo-LRU replacement algorithm. On the platform running the MiBench benchmark programs, the designed cache shows a 99% hit ratio and reduces latency to 68%.
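The replacement policy can be sketched as a tree pseudo-LRU, in which three bits per 4-way set approximate the LRU order. The abstract gives the 8-KB size and 4-way associativity but not the line size, so the 32-byte lines (and therefore 64 sets) and all names below are assumptions; the model ignores the asynchronous handshake entirely.

```python
from __future__ import annotations


class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set: 3 bits instead of a full LRU order.

    bits[0]: 0 -> victim in ways 0-1, 1 -> victim in ways 2-3
    bits[1]: 0 -> way 0, 1 -> way 1    bits[2]: 0 -> way 2, 1 -> way 3
    """

    def __init__(self) -> None:
        self.bits = [0, 0, 0]

    def touch(self, way: int) -> None:
        """Point every bit on the path away from the way just used."""
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 if way == 0 else 0
        else:
            self.bits[0] = 0
            self.bits[2] = 1 if way == 2 else 0

    def victim(self) -> int:
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3


class InstructionCache:
    """8-KB, 4-way set-associative cache model; the 32-byte line size is assumed."""

    WAYS = 4  # fixed by TreePLRU4

    def __init__(self, size_bytes: int = 8 * 1024, line_bytes: int = 32) -> None:
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (self.WAYS * line_bytes)  # 64 with the defaults
        self.tags: list[list[int | None]] = [[None] * self.WAYS for _ in range(self.num_sets)]
        self.plru = [TreePLRU4() for _ in range(self.num_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill an empty way or the PLRU victim."""
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        ways, plru = self.tags[index], self.plru[index]
        if tag in ways:
            plru.touch(ways.index(tag))
            return True
        way = ways.index(None) if None in ways else plru.victim()
        ways[way] = tag
        plru.touch(way)
        return False
```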

Cache System Design of Compressed Texture for High Performance Texture Mapping (고성능 텍스쳐 매핑을 위한 압축된 텍스쳐의 캐쉬 시스템 설계)

  • 양진기;박우찬;한탁돈
    • Proceedings of the Korean Information Science Society Conference, 1998.10a, pp.39-41, 1998
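  • Texture mapping, used to obtain more realistic 3D images, is employed in most graphics systems. By applying 2D images onto the surfaces of objects generated by a 3D graphics system, texture mapping increases image realism without degrading graphics system performance; however, it requires a large amount of memory to store texture images, and a high-performance texture system needs fast memory access and high bandwidth. In this paper, we address the large memory requirement by efficiently compressing texture images with vector quantization, and we present an architecture that solves the fast-access and high-bandwidth problems through efficient caching of the compressed texture images. The proposed architecture can support a high-performance texture system by hiding memory access time through buffering.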
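Vector-quantized textures replace each small block of texels by an index into a shared codebook, so fetching a texel becomes two table lookups. In the sketch below, the 2x2 block size, the squared-error encoder, and the function names are assumptions; the abstract does not specify the paper's codebook or block size.

```python
from __future__ import annotations


def vq_encode(texture: list[list[int]], codebook: list[tuple[int, ...]]) -> list[list[int]]:
    """Replace each 2x2 texel block by the index of its nearest codeword (squared error).

    Assumes the texture has even width and height.
    """
    indices = []
    for by in range(0, len(texture), 2):
        row = []
        for bx in range(0, len(texture[0]), 2):
            block = (texture[by][bx], texture[by][bx + 1],
                     texture[by + 1][bx], texture[by + 1][bx + 1])
            best = min(range(len(codebook)),
                       key=lambda i: sum((p - q) ** 2 for p, q in zip(block, codebook[i])))
            row.append(best)
        indices.append(row)
    return indices


def fetch_texel(indices: list[list[int]], codebook: list[tuple[int, ...]], x: int, y: int) -> int:
    """Decode one texel: index lookup, then codeword lookup (row-major 2x2 codewords)."""
    codeword = codebook[indices[y // 2][x // 2]]
    return codeword[(y % 2) * 2 + (x % 2)]


def compression_ratio(bits_per_texel: int, index_bits: int) -> float:
    """Ratio for 2x2 blocks, ignoring the storage of the codebook itself."""
    return (4 * bits_per_texel) / index_bits
```

With 16-bit texels and an 8-bit index (a 256-entry codebook), each 2x2 block shrinks from 64 bits to 8 bits, an 8:1 ratio before counting the codebook.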

A Study on Large Data File Management Using Buffer Cache and Virtual Memory File (가상메모리 화일과 버퍼캐쉬를 이용한 대형 데이타 화일의 처리에 관한 연구)

  • Kim, Byeong-Chul;Shin, Byeong-Seok;Hwang, Hee-Yeung
    • Proceedings of the KIEE Conference, 1991.11a, pp.185-188, 1991
  • In this paper, we have designed and implemented a method that uses extended memory and hard disk space as a data buffer for application programs, allowing them to handle large data files in the DOS environment. Part of conventional DOS memory serves as a buffer cache, which lets the application program use extended memory and hard disks transparently. The buffer cache also gives the application program some speed improvement. We have also implemented a number of functions that make pointer operations in application programs easier to handle.
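The buffer-cache idea can be sketched as a small LRU set of page frames in fast memory in front of a larger, slower backing store. The `bytearray` backing store stands in for extended memory or a disk file; the class name, page size, frame count, LRU policy, and write-back behaviour are assumptions for illustration, not the paper's DOS implementation.

```python
from collections import OrderedDict


class BufferCache:
    """LRU page cache in front of a larger, slower backing store (write-back)."""

    def __init__(self, backing: bytearray, page_size: int, frames: int) -> None:
        self.backing = backing          # stand-in for extended memory or a disk file
        self.page_size = page_size
        self.frames = frames            # pages that fit in the conventional-memory buffer
        self.pages: OrderedDict = OrderedDict()   # page number -> [buffer, dirty flag]
        self.misses = 0

    def _write_back(self, page_no: int, buf: bytearray) -> None:
        start = page_no * self.page_size
        self.backing[start:start + self.page_size] = buf

    def _page(self, page_no: int) -> list:
        if page_no in self.pages:
            self.pages.move_to_end(page_no)      # mark as most recently used
            return self.pages[page_no]
        self.misses += 1
        if len(self.pages) >= self.frames:
            old_no, (old_buf, dirty) = self.pages.popitem(last=False)  # evict LRU page
            if dirty:
                self._write_back(old_no, old_buf)
        start = page_no * self.page_size
        entry = [bytearray(self.backing[start:start + self.page_size]), False]
        self.pages[page_no] = entry
        return entry

    def read_byte(self, addr: int) -> int:
        return self._page(addr // self.page_size)[0][addr % self.page_size]

    def write_byte(self, addr: int, value: int) -> None:
        entry = self._page(addr // self.page_size)
        entry[0][addr % self.page_size] = value
        entry[1] = True                          # page must be written back later

    def flush(self) -> None:
        for page_no, entry in self.pages.items():
            if entry[1]:
                self._write_back(page_no, entry[0])
                entry[1] = False
```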

Analysis of Barrier Waiting Times in Data Parallel Programs (데이터 병렬 프로그램에서 배리어 대기시간의 분석)

  • Jung, In-Bum
    • Journal of Industrial Technology, v.21 no.A, pp.73-80, 2001
  • Barriers are widely used for synchronization in parallel programs. Because processes that arrive earlier than others must wait at the barrier, total processor utilization decreases. In this paper, to find the sources of barrier waiting time, parallel programs are executed at various grain sizes in execution-driven simulations. The simulations show that even when approximately equal amounts of work are distributed to each processor, the processes do not necessarily arrive at a barrier at the same time. The reason is that differing numbers of cache misses and instructions within the partitioned grains cause processors to arrive at the barrier at different times.
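The abstract's explanation can be expressed as a simple cost model: a process reaches the barrier after its instruction time plus its cache-miss stall time, and every process waits for the slowest one. The cycles-per-instruction and miss-penalty values and the function name are assumptions, not the paper's simulator parameters.

```python
from __future__ import annotations


def barrier_waits(instructions: list[int], misses: list[int],
                  cpi: float = 1.0, miss_penalty: int = 100) -> tuple[list[float], float]:
    """Per-process waiting time at a barrier and the resulting processor utilization.

    Arrival time is modelled as instructions * cpi + misses * miss_penalty.
    """
    arrivals = [i * cpi + m * miss_penalty for i, m in zip(instructions, misses)]
    release = max(arrivals)                  # the barrier opens at the last arrival
    waits = [release - a for a in arrivals]
    utilization = sum(arrivals) / (len(arrivals) * release)
    return waits, utilization
```

With equal instruction counts of 10,000 and miss counts of 50, 80, 60, and 120, the waits are 7,000, 4,000, 6,000, and 0 cycles and utilization is about 81%, even though the work was split evenly.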

A Trie-Based Model for RDF/XML Storage (RDF/XML 저장을 위한 Trie 기반 모델)

  • Bisai, Sumit;Kim, Ju-Ri;Lee, Hyun-Chang;Han, Sung-Kook
    • 한국IT서비스학회:학술대회논문집 (Korea Society of IT Services: Conference Proceedings), 2009.05a, pp.384-387, 2009
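  • The amount of data on the Internet is growing rapidly, and RDF/XML documents of considerable size are being produced along with it. This calls for an efficient storage model that also supports efficient query processing over the stored data. Tools exist for storing and processing such large amounts of data, but most of them suffer from excessive join operations and from problems in data and schema management. To minimize these problems, this study proposes a model that stores data using a trie and vectors, focusing on the following points. First, it avoids excessive join operations and stores data so that it is localized and cached. It gives upward and downward traversal of the RDF graph the same complexity, stores schema information together with the semantics of the data, and addresses in-memory execution issues. In this way, the time required to extract semantics and to expand results can be optimized.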
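One way to read the trie-and-vector design is sketched below: URIs are interned in a character trie, so common prefixes such as `ex:` are stored once, and each term keeps two edge vectors so that following the graph downward (subject to object) and upward (object to subject) costs the same. The abstract does not describe the actual layout; the class and method names are hypothetical.

```python
from __future__ import annotations


class _TrieNode:
    __slots__ = ("children", "term_id")

    def __init__(self) -> None:
        self.children: dict[str, _TrieNode] = {}
        self.term_id: int | None = None


class TrieTripleStore:
    """URIs interned in a character trie; triples kept as per-term edge vectors."""

    def __init__(self) -> None:
        self.root = _TrieNode()
        self.names: list[str] = []                        # term id -> URI
        self.out_edges: list[list[tuple[int, int]]] = []  # subject id -> (predicate, object)
        self.in_edges: list[list[tuple[int, int]]] = []   # object id -> (predicate, subject)

    def _find(self, uri: str, create: bool) -> int | None:
        """Walk the trie one character at a time; optionally assign a new id."""
        node = self.root
        for ch in uri:
            child = node.children.get(ch)
            if child is None:
                if not create:
                    return None
                child = node.children[ch] = _TrieNode()
            node = child
        if node.term_id is None and create:
            node.term_id = len(self.names)
            self.names.append(uri)
            self.out_edges.append([])
            self.in_edges.append([])
        return node.term_id

    def add(self, s: str, p: str, o: str) -> None:
        s_id, p_id, o_id = (self._find(t, True) for t in (s, p, o))
        self.out_edges[s_id].append((p_id, o_id))   # downward traversal
        self.in_edges[o_id].append((p_id, s_id))    # upward traversal, same cost

    def objects(self, s: str, p: str) -> list[str]:
        s_id, p_id = self._find(s, False), self._find(p, False)
        if s_id is None or p_id is None:
            return []
        return [self.names[o] for q, o in self.out_edges[s_id] if q == p_id]

    def subjects(self, p: str, o: str) -> list[str]:
        p_id, o_id = self._find(p, False), self._find(o, False)
        if p_id is None or o_id is None:
            return []
        return [self.names[s] for q, s in self.in_edges[o_id] if q == p_id]
```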
