• Title/Summary/Keyword: 버퍼 캐쉬

Search Result 70, Processing Time 0.04 seconds

An Improved Dynamic Branch Predictor by Selective Access of a Specific Element in 4-Way Cache (4-Way 캐쉬의 선택된 Element를 이용한 향상된 동적 분기 예측기 구현)

  • Hwang, In-Sung;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.12
    • /
    • pp.1094-1101
    • /
    • 2013
  • This paper proposes an improved branch predictor that reduces the number of execution cycles of applications by selectively accessing a specific element in a 4-way associative cache. When a branch instruction is fetched, the proposed branch predictor acquires the branch target address from the selected element in the cache by referring to an MRU buffer. The branch prediction rate and application execution speed are considerably improved, since the number of BTAC entries can be increased under a restricted power budget compared with a previous branch predictor that accesses all elements. The effectiveness of the proposed dynamic branch predictor is verified by executing benchmark applications on a core simulator. Experimental results show that the number of execution cycles decreases by an average of 10.1%, while power consumption increases by an average of 7.4%, compared to a core without a dynamic branch predictor. Execution cycles are reduced by 4.1% in comparison with a core that employs the previous dynamic branch predictor.
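
A minimal sketch of the lookup described above, under assumed parameters (the set count, tag split, and structure names below are illustrative, not the paper's design): a small MRU buffer records which way of the 4-way BTAC last hit for each set, and a lookup reads only that way instead of all four.

```c
#include <stdint.h>
#include <stdbool.h>

#define BTAC_SETS 256          /* illustrative size, not from the paper */
#define BTAC_WAYS 4

typedef struct {
    bool     valid;
    uint32_t tag;              /* upper PC bits */
    uint32_t target;           /* predicted branch target address */
} btac_entry_t;

static btac_entry_t btac[BTAC_SETS][BTAC_WAYS];
static uint8_t      mru_way[BTAC_SETS];   /* MRU buffer: last hitting way per set */

/* Look up only the MRU-selected way instead of probing all four ways.
 * Returns true and fills *target on a hit in the selected way. */
static bool btac_lookup_mru(uint32_t pc, uint32_t *target)
{
    uint32_t set = (pc >> 2) & (BTAC_SETS - 1);
    uint32_t tag = pc >> 10;              /* illustrative tag split */
    uint8_t  way = mru_way[set];

    const btac_entry_t *e = &btac[set][way];
    if (e->valid && e->tag == tag) {
        *target = e->target;              /* single-way read: lower energy per access */
        return true;
    }
    return false;   /* treated as a miss; an update follows when the branch resolves */
}

/* On an update, record which way was written so the next lookup to this
 * set reads that way first. */
static void btac_update(uint32_t pc, uint32_t target, uint8_t way)
{
    uint32_t set = (pc >> 2) & (BTAC_SETS - 1);
    btac[set][way].valid  = true;
    btac[set][way].tag    = pc >> 10;
    btac[set][way].target = target;
    mru_way[set] = way;
}
```

Reading one way per access is what allows more BTAC entries within the same power budget, at the cost of an occasional miss when the MRU hint is wrong.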

Cache Coherency Schemes for Database Sharing Systems with Primary Copy Authority (주사본 권한을 지원하는 공유 데이터베이스 시스템을 위한 캐쉬 일관성 기법)

  • Kim, Shin-Hee;Cho, Haeng-Rae;Kim, Byeong-Uk
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.6
    • /
    • pp.1390-1403
    • /
    • 1998
  • Database sharing system (DSS) refers to a system for high-performance transaction processing. In a DSS, the processing nodes are locally coupled via a high-speed network and share a common database at the disk level. Each node has a local memory, a separate copy of the operating system, and a DBMS. To reduce the number of disk accesses, each node caches database pages in its local memory buffer. However, since multiple nodes may cache a page simultaneously, cache consistency must be ensured so that every node can always access the latest version of pages. In this paper, we propose efficient cache consistency schemes for DSS, where the database is logically partitioned using primary copy authority to reduce locking overhead. The proposed schemes can improve performance by reducing the disk access overhead and the message overhead required to maintain cache consistency. Furthermore, they show good performance even when database workloads vary dynamically. (An illustrative sketch follows this entry.)

  • PDF
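
The abstract above gives only a high-level description; the following is a hypothetical sketch (the structures, RPC, and version scheme are assumptions, not the paper's protocol) of the kind of validity check a node could perform before reusing a cached page: the node that holds primary copy authority for the page's partition tracks the latest version, and other nodes validate their cached copy against it instead of re-reading the page from disk.

```c
#include <stdint.h>

/* Hypothetical structures -- illustrative only. */
typedef struct {
    uint32_t page_id;
    uint64_t version;      /* version of the copy cached locally */
    char     data[4096];
} cached_page_t;

/* Assumed RPC to the node holding primary copy authority for this page's
 * partition; returns the latest committed version number. */
extern uint64_t primary_latest_version(uint32_t page_id);

/* Assumed page fetch from the primary (or from disk) when the copy is stale. */
extern void fetch_page(uint32_t page_id, cached_page_t *out);

/* Validate-on-access: reuse the cached page only if it is still current,
 * otherwise refresh it.  Avoids a disk read whenever the copy is valid. */
static cached_page_t *access_page(cached_page_t *cached)
{
    uint64_t latest = primary_latest_version(cached->page_id);
    if (cached->version != latest) {
        fetch_page(cached->page_id, cached);   /* stale: refresh the copy */
        cached->version = latest;
    }
    return cached;   /* now guaranteed to be the latest version */
}
```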

Analytical Models and their Performance Analysis of Superscalar Processors (수퍼스칼라 프로세서의 해석적 모델 및 성능 분석)

  • Kim, Hak-Jun;Kim, Seon-Mo;Choe, Sang-Bang
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.7
    • /
    • pp.847-862
    • /
    • 1999
  • This research presents a novel analytic model to predict the instruction execution rate of superscalar processors, using a queueing model with finite buffer size and synchronous operation mode. The proposed model can also analyze the performance relationship between the cache and the pipeline. It takes into account various architectural parameters such as instruction-level parallelism, branch probability, branch prediction accuracy, and cache misses. To validate the model, we performed extensive simulations and compared the results with the analytic model; the model estimates the average execution rate within 10% error of the simulation results in most cases. The proposed model can explain causes of performance bottlenecks that cannot be uncovered by simulation alone, and it can show the effect of cache misses on the performance of out-of-order issue superscalar processors, which provides valuable information for designing a balanced system.
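
The abstract does not state the queueing model itself, so the following is only a generic first-order illustration (not the authors' model) of how the listed parameters (branch probability, prediction accuracy, and cache miss rate) enter an execution-rate estimate:

```latex
\[
  \mathrm{CPI}_{\mathrm{eff}}
    = \mathrm{CPI}_{\mathrm{base}}
    + p_b\,(1-a)\,c_{\mathrm{bp}}
    + m\,c_{\mathrm{miss}},
  \qquad
  \mathrm{IPC} = \frac{1}{\mathrm{CPI}_{\mathrm{eff}}}
\]
```

Here p_b is the branch probability per instruction, a the prediction accuracy, c_bp the misprediction penalty in cycles, m the cache miss rate per instruction, and c_miss the miss penalty. The paper's model additionally captures finite buffering and synchronous pipeline operation, which this simple additive form does not.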

A MAC System Design for High-speed UWB SoC (고속 UWB SoC의 MAC 시스템 설계)

  • Kim, Do-Hoon;Wee, Jeong-Wook;Lee, Chung-Yong
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.48 no.4
    • /
    • pp.1-5
    • /
    • 2011
  • We present the implementation of a MAC system for an MBOA UWB SoC. The implemented MBOA MAC algorithm is not a master-control mechanism but a distributed network mechanism; therefore, a mesh network can be easily constructed because the MAC forms and administers the network in a distributed fashion. An ARM926EJ with cache is adopted for high performance, and the AMBA bus is applied for system design and reuse. In addition, a system operating clock management algorithm is implemented for low power consumption. A dedicated DMA for the MAC is designed between the system memory buffer and the MAC hardware, and a dedicated DMA for USB 2.0 is also implemented between the system memory buffer and the host for high-speed data transfers.

Buffer Cache Management for Low Power Consumption (저전력을 위한 버퍼 캐쉬 관리 기법)

  • Lee, Min;Seo, Eui-Seong;Lee, Joon-Won
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.6
    • /
    • pp.293-303
    • /
    • 2008
  • As the computing environment moves to wireless and handheld systems, power efficiency becomes more important. This is especially the case in embedded handheld systems, where the power consumed by the memory system takes the second largest portion overall. To save energy consumed in the memory system, we can utilize the low-power modes of SDRAM. In the case of RDRAM, nap mode consumes less than 5% of the power consumed in active or standby mode. However, the hardware controller itself cannot use this facility efficiently unless the operating system cooperates. In this paper, we focus on how to minimize the number of active units of SDRAM. The operating system allocates its physical pages so that only a few units of SDRAM need to be activated, and the unnecessary units can be put into nap mode. This work can be considered a generalized, system-wide version of the PAVM (Power-Aware Virtual Memory) research. We take all of physical memory into account, especially the buffer cache, which takes up half of total memory usage on average. Because of the size and importance of the buffer cache, the PAVM approach cannot be robust without taking it into account. In this paper, we analyze RAM usage and propose a power-aware page allocation policy. In particular, the pages mapped into a process' address space and the buffer cache pages are considered; the relationship and interactions of these two kinds of pages are analyzed and exploited for energy saving.
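
A minimal sketch of the allocation idea described above, with invented names and a simplified memory model (the unit count and helper functions are assumptions, not the paper's implementation): the allocator satisfies a page request from an SDRAM unit that is already active whenever possible, so idle units can stay in nap mode.

```c
#include <stddef.h>
#include <stdbool.h>

#define NUM_UNITS 8            /* illustrative number of SDRAM units */

typedef struct page page_t;    /* opaque physical page descriptor */

/* Assumed per-unit state and operations. */
extern bool    unit_is_active(int unit);
extern page_t *unit_alloc_page(int unit);     /* NULL if the unit has no free page */
extern void    unit_power_up(int unit);       /* leave nap mode */

/* Allocate a physical page, preferring SDRAM units that are already active.
 * Only if every active unit is exhausted is an idle unit woken up. */
page_t *alloc_page_power_aware(void)
{
    /* First pass: already-active units, so nap-mode units stay asleep. */
    for (int u = 0; u < NUM_UNITS; u++) {
        if (unit_is_active(u)) {
            page_t *p = unit_alloc_page(u);
            if (p)
                return p;
        }
    }
    /* Second pass: wake the first idle unit that can serve the request. */
    for (int u = 0; u < NUM_UNITS; u++) {
        if (!unit_is_active(u)) {
            unit_power_up(u);
            page_t *p = unit_alloc_page(u);
            if (p)
                return p;
        }
    }
    return NULL;   /* out of memory */
}
```

In the paper's setting, such a policy is applied to buffer cache pages as well as process pages, since the buffer cache accounts for about half of memory usage.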

Flash memory system with spatial smart buffer for the substitution of a hard-disk (하드디스크 대용을 위한 공간적 스마트 버퍼 플래시 메모리 시스템)

  • Jung, Bo-Sung;Jung, Jung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.3
    • /
    • pp.41-49
    • /
    • 2009
  • Flash memory has become increasingly important and in demand as a storage medium due to its low power consumption, low price, and large capacity. This research designs a high-performance flash memory structure intended to substitute for a hard disk by dynamically prefetching data with aggressive spatial locality through a spatial smart buffer system. The proposed buffer system for a NAND flash memory consists of three parts: a fully associative victim buffer for temporal locality, a fully associative spatial buffer for spatial locality, and a dynamic fetching unit. We propose a new dynamic prefetching algorithm for aggressive spatial locality. By using flash memory in place of a hard disk, the proposed flash system achieves better performance and overcomes many drawbacks of flash memory through the new structure and algorithm. According to the simulation results, compared with the smart buffer system, the average miss ratio is reduced by about 26% for Mediabench applications, and the average memory access time is improved by about 35% for Mediabench applications and by over 30% for Spec2000 applications.
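
A simplified sketch of the three-part buffer organization described above (buffer interfaces, the page size, and the prefetch trigger are assumptions for illustration, not the paper's exact algorithm): a request first checks the temporal (victim) buffer, then the spatial buffer; on a miss, the requested page and some spatially adjacent pages are fetched from NAND flash into the spatial buffer.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical interfaces -- illustrative only. */
extern bool victim_buffer_lookup(uint32_t page, void *out);   /* temporal locality */
extern bool spatial_buffer_lookup(uint32_t page, void *out);  /* spatial locality */
extern void nand_read_page(uint32_t page, void *out);
extern void spatial_buffer_insert(uint32_t page, const void *data);

/* Degree of spatial prefetch; the paper adjusts this dynamically,
 * here it is a fixed illustrative value. */
#define PREFETCH_PAGES 4
#define PAGE_SIZE      2048

/* Read one logical page through the two-buffer hierarchy. */
void smart_buffer_read(uint32_t page, void *out)
{
    if (victim_buffer_lookup(page, out))   /* hit in the temporal buffer */
        return;
    if (spatial_buffer_lookup(page, out))  /* hit in the spatial buffer */
        return;

    /* Miss: fetch the requested page plus spatially adjacent pages,
     * exploiting aggressive spatial locality. */
    uint8_t buf[PAGE_SIZE];
    for (uint32_t i = 0; i < PREFETCH_PAGES; i++) {
        nand_read_page(page + i, buf);
        spatial_buffer_insert(page + i, buf);
        if (i == 0)
            memcpy(out, buf, PAGE_SIZE);   /* return the requested page */
    }
}
```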

Cache System Design of Compressed Texture for High Performance Texture Mapping (고성능 텍스쳐 매핑을 위한 압축된 텍스쳐의 캐쉬 시스템 설계)

  • 양진기;박우찬;한탁돈
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1998.10a
    • /
    • pp.39-41
    • /
    • 1998
  • Texture mapping, used to obtain more realistic 3D images, is employed by most graphics systems. By mapping a 2D image onto the surface of an object generated by the 3D graphics system, texture mapping increases image realism without degrading system performance; however, it requires a large amount of memory to store texture images, and a high-performance texture system requires fast memory access and wide bandwidth. This paper presents an architecture that addresses the memory requirement through efficient compression of texture images using vector quantization (VQ), and addresses the memory access speed and bandwidth problems through efficient caching of the compressed texture images. The proposed architecture supports a high-performance texture system by hiding memory access time through buffering. (An illustrative sketch follows this entry.)

  • PDF
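
The abstract above outlines the approach only at a high level; the sketch below (index width, block size, and texel format are assumptions, not the proposed hardware) illustrates the basic idea of caching vector-quantized texture data: the cache holds small codebook indices rather than raw texels, and a texel read decodes the indexed codebook entry, so the same cache capacity and memory bandwidth cover far more texture area.

```c
#include <stdint.h>

/* Illustrative VQ parameters: 2x2 texel blocks, 256-entry codebook,
 * packed RGBA8 texels -- not the configuration used in the paper. */
#define BLOCK_W     2
#define BLOCK_H     2
#define CODEBOOK_SZ 256

typedef uint32_t texel_t;                       /* packed RGBA8 */
static texel_t  codebook[CODEBOOK_SZ][BLOCK_W * BLOCK_H];

/* The compressed texture is a 2D array of 8-bit codebook indices,
 * one per 2x2 block; these indices are what the cache stores. */
static texel_t fetch_texel(const uint8_t *indices, int tex_w,
                           int u, int v)
{
    int bx = u / BLOCK_W, by = v / BLOCK_H;     /* which block */
    int blocks_per_row = tex_w / BLOCK_W;

    uint8_t idx = indices[by * blocks_per_row + bx];   /* 1 byte instead of 16 bytes raw */

    int ox = u % BLOCK_W, oy = v % BLOCK_H;     /* position inside the block */
    return codebook[idx][oy * BLOCK_W + ox];    /* decode via codebook lookup */
}
```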

A Study on Large Data File Management Using Buffer Cache and Virtual Memory File (가상메모리 화일과 버퍼캐쉬를 이용한 대형 데이타 화일의 처리에 관한 연구)

  • Kim, Byeong-Chul;Shin, Byeong-Seok;Hwang, Hee-Yeung
    • Proceedings of the KIEE Conference
    • /
    • 1991.11a
    • /
    • pp.185-188
    • /
    • 1991
  • In this paper, we have designed and implemented a method of using extended memory and hard disk space as a data buffer for application programs, to allow handling of large data files in the DOS environment. We use a part of conventional DOS memory as a buffer cache, which allows the application program to use extended memory and hard disks transparently. Using the buffer cache also provides some speed improvement for the application program. We have also implemented a number of functions to allow easier handling of the pointer operations used by application programs. (An illustrative sketch follows this entry.)

  • PDF
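
A minimal sketch of the kind of transparent buffered access described above (block size, cache size, and helper names are assumptions, not the original DOS implementation): the application reads through a small buffer cache held in conventional memory, and only on a miss is the block brought in from the larger backing store in extended memory or on disk.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define NUM_SLOTS  16            /* small cache held in conventional memory */

/* Assumed backing-store access (extended memory or hard disk). */
extern void backing_read_block(uint32_t block_no, uint8_t *out);

typedef struct {
    uint32_t block_no;
    int      valid;
    uint8_t  data[BLOCK_SIZE];
} slot_t;

static slot_t cache[NUM_SLOTS];

/* Read `len` bytes of a large data file starting at `offset`,
 * going through the buffer cache transparently. */
void cached_read(uint32_t offset, uint32_t len, uint8_t *dst)
{
    while (len > 0) {
        uint32_t block_no = offset / BLOCK_SIZE;
        uint32_t in_block = offset % BLOCK_SIZE;
        uint32_t chunk    = BLOCK_SIZE - in_block;
        if (chunk > len)
            chunk = len;

        slot_t *s = &cache[block_no % NUM_SLOTS];      /* direct-mapped slot */
        if (!s->valid || s->block_no != block_no) {
            backing_read_block(block_no, s->data);     /* miss: fill from backing store */
            s->block_no = block_no;
            s->valid = 1;
        }
        memcpy(dst, s->data + in_block, chunk);

        dst    += chunk;
        offset += chunk;
        len    -= chunk;
    }
}
```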

Design of the Pipelined Scan Conversion Unit based on Tile Traversal Method for High Performance 3D Graphics Accelerator (고성능 3차원 그래픽 가속기를 위한 타일 트래버설 방식의 파이프라인된 스캔 컨버젼 유닛 설계)

  • 전원호;최문희;박우찬;한탁돈;김신덕
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10c
    • /
    • pp.16-18
    • /
    • 2001
  • In 3D image processing, rasterization is the process of determining the pixels to be stored in the frame buffer. To render a polygon composed of many pixels, either the scanline method or a tile traversal method based on half-plane edge functions is used. The tile traversal method on which this paper is based has advantages over the scanline method in memory efficiency and texture cache locality, but its complex traversal process makes it difficult to implement as a pipelined structure. The architecture proposed in this paper applies a branch prediction technique and reduces the pipeline stalls caused by branches during traversal by about 30% compared to the existing traversal structure, yielding a scan conversion unit suitable for a high-performance 3D graphics accelerator. (An illustrative sketch follows this entry.)

  • PDF
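
As a rough illustration of the half-plane edge functions that tile traversal relies on (the triangle setup, tile size, and sign convention are assumptions; the paper's actual contribution, predicting the branch taken during traversal to avoid pipeline stalls, is not reproduced here): a pixel inside a tile is covered by a triangle when it lies on the inner side of all three edge half-planes.

```c
#include <stdbool.h>

/* Half-plane edge function for the directed edge (x0,y0)->(x1,y1):
 * the sign of E(x,y) selects one side of the edge, and for a consistently
 * wound triangle the interior is where all three functions share a sign. */
typedef struct { int a, b, c; } edge_t;

static edge_t edge_setup(int x0, int y0, int x1, int y1)
{
    edge_t e = { y0 - y1, x1 - x0, x0 * y1 - y0 * x1 };
    return e;
}

static int edge_eval(edge_t e, int x, int y)
{
    return e.a * x + e.b * y + e.c;
}

#define TILE 8   /* illustrative 8x8 pixel tile */

/* Rasterize one tile of a triangle.  Traversal between tiles, and the
 * prediction of the traversal direction proposed in the paper, would sit
 * on top of this per-tile coverage test. */
static void rasterize_tile(edge_t e0, edge_t e1, edge_t e2,
                           int tx, int ty, bool cover[TILE][TILE])
{
    for (int y = 0; y < TILE; y++)
        for (int x = 0; x < TILE; x++) {
            int px = tx * TILE + x, py = ty * TILE + y;
            cover[y][x] = edge_eval(e0, px, py) >= 0 &&
                          edge_eval(e1, px, py) >= 0 &&
                          edge_eval(e2, px, py) >= 0;
        }
}
```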

Efficient Techniques for Software RAID of SAN based on shared Storage systems (SAN기반 공유 저장 장치 시스템에서 고성능 소프트웨어 RAID를 위한 기법)

  • 김경호;황주영;안철우;박규호
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10c
    • /
    • pp.55-57
    • /
    • 2001
  • This paper addresses the parity block consistency problem caused by disk sharing when multiple hosts use software RAID level 4 or 5 on a shared disk system built on a network such as Fibre Channel, which has recently become a topic of interest. We describe how XDWRITE is used to resolve the consistency problem in software RAID and to reduce the number of disk I/Os. We also propose a method that reduces the number of disk reads needed for parity computation by using a copy of the original block when a file block write occurs in the buffer cache. The proposed scheme reduces the number of disk I/Os by a factor of 1.5 to 2 compared to the existing scheme. (An illustrative sketch follows this entry.)

  • PDF
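
A minimal sketch of the small-write parity update underlying the description above (buffer layout and names are assumptions): when the buffer cache keeps a copy of the block's original contents at write time, the new parity can be computed as old parity XOR old data XOR new data, removing the disk read that would otherwise be needed to fetch the old data.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 4096

/* Compute the new parity block for a RAID-4/5 small write:
 *   new_parity = old_parity XOR old_data XOR new_data
 * If old_data is the copy the buffer cache kept when the block was
 * first modified, no extra disk read of the old data is required. */
void raid_parity_update(const uint8_t old_data[BLOCK_SIZE],
                        const uint8_t new_data[BLOCK_SIZE],
                        uint8_t       parity[BLOCK_SIZE])
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        parity[i] ^= (uint8_t)(old_data[i] ^ new_data[i]);
}
```

The paper's use of XDWRITE and its handling of multi-host parity consistency are not modeled in this sketch; only the host-side XOR arithmetic is shown.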