Search | Korea Science

Energy-Efficient Instruction Cache Hierarchy for Embedded Processors (임베디드 프로세서를 위한 에너지 효율의 명령어 캐쉬 계층 구조)

Kang, Jin-Ku;Lee, In-Hwan
- Proceedings of the Korean Information Science Society Conference / 2006.10a / pp.257-260 / 2006
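A hierarchical memory structure can be used not only to improve performance but also to raise overall power efficiency by reducing accesses to lower-level caches. Targeting the single-level cache structure of StrongARM, a representative embedded processor, this paper adds a new instruction cache close to the processor, measures by simulation how power consumption changes with the sizes of the first- and second-level instruction caches, and examines how the sizes of the two levels relate to each other. An L0 instruction cache with direct mapping and a 32B block size is inserted, and the most energy-efficient size is identified. Measured at that size, the power consumption of the whole processor, assumed to be an on-chip structure, is reduced to about 65% in the best case, and each doubling of the L1 instruction cache size also doubles the energy-efficient L0 instruction cache size.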
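To make the L0 idea concrete, here is a minimal sketch of a direct-mapped cache with 32B blocks placed in front of L1. It is not the paper's simulator: the class `DirectMappedCache`, the helper `l1_accesses`, and the synthetic loop trace are all assumptions made for illustration, and the sketch counts accesses only, not energy.

```python
class DirectMappedCache:
    """Direct-mapped cache model that tracks tags only, for counting hits and misses."""

    def __init__(self, size_bytes, block_bytes=32):
        self.block_bytes = block_bytes
        self.num_lines = size_bytes // block_bytes
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_bytes   # block number of this address
        index = block % self.num_lines     # the single line this block may occupy
        tag = block // self.num_lines      # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag             # replace whatever was in the line
        self.misses += 1
        return False


def l1_accesses(trace, l0_size_bytes):
    """Count fetches that miss a hypothetical L0 and therefore reach L1."""
    l0 = DirectMappedCache(l0_size_bytes)
    for pc in trace:
        l0.access(pc)
    return l0.misses


# Synthetic trace (an assumption): a 256-byte loop of 4-byte instructions, run 100 times.
trace = [pc for _ in range(100) for pc in range(0, 256, 4)]
print(l1_accesses(trace, 128), l1_accesses(trace, 256))  # prints: 800 8
```

In this toy trace, a 128B L0 thrashes because the loop's eight blocks compete for four lines, while a 256B L0 holds the whole loop and forwards only the eight cold misses to L1.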

A Selective Recovery Mechanism of Control-Flow Independent Instructions (제어 독립적인 명령어의 선택적 복구 메커니즘)

윤성룡;신영호;조영일
- Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.715-717 / 2002
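Modern processors use branch prediction to avoid pipeline stalls caused by branch instructions. When the predictor mispredicts, however, the instructions on the predicted path are squashed and the instructions on the correct path are fetched and executed again, which wastes execution cycles and hardware resources. This paper proposes a mechanism that detects control-independent instructions both statically, through profiling at compile time, and dynamically, through the program's control flow, thereby reducing the number of instructions squashed by branch mispredictions and improving processor performance. An analysis of instructions executed per cycle on the SPECint95 benchmark programs, comparing the existing method with the proposed one, shows performance improvements of 2%-7% on a 4-issue processor, 4%-15% on an 8-issue processor, and 18%-28% on a 16-issue processor.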

Analysis of KOMPSAT-2 Telecommands and Telemetry (KOMPSAT-2 원격명령어와 텔레메트리 분석)

이진호;이나영;이상률;이주진
- Bulletin of the Korean Space Science Society / 2004.04a / pp.73-73 / 2004
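The telecommands and telemetry used on the KOMPSAT-2 satellite follow the CCSDS format, an international standard. They were structured according to the KOMPSAT-1 heritage, but telecommands and telemetry of various formats were added during the test phases, and their implementation and delivery schemes became more complex in order to avoid conflicts between each unit's processor and the on-board computer. This paper analyzes each type of telecommand and telemetry used on KOMPSAT-2 and shows how the implementation and delivery schemes differ from unit to unit.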

Development of a Method Dynamic Invocation Component for Network Program (네트워크 프로그램용 메소드 동적 호출 컴포넌트 개발)

신봉준;정문상;홍순구
- Proceedings of the Korea Association of Information Systems Conference / 2004.11a / pp.29-36 / 2004
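A network program that performs many functions exchanges a correspondingly large number of commands and command arguments. Received commands are handled either with sequential comparison statements such as "IF~ELSE" or with remote method invocation such as Java RMI. However, comparing many commands sequentially every time, and designing remote methods, both make implementation and maintenance difficult. The purpose of this paper is to develop components for the command receiving part and the command execution part, so as to reduce the effort of program development and maintenance and to improve program execution performance.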
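The contrast with a sequential "IF~ELSE" chain can be sketched with a table-driven dispatcher. This is only an illustration of the general idea of looking up a handler by command name; it is not the component described in the paper, and `CommandDispatcher`, the `ECHO` and `ADD` commands, and the space-separated message format are all assumptions.

```python
class CommandDispatcher:
    """Maps command names to handler functions (hypothetical illustration)."""

    def __init__(self):
        self._handlers = {}

    def register(self, name):
        """Decorator that records a handler under the given command name."""
        def decorator(func):
            self._handlers[name] = func
            return func
        return decorator

    def dispatch(self, message):
        """Split 'NAME arg1 arg2 ...' and call the matching handler."""
        name, *args = message.split()
        handler = self._handlers.get(name)   # one dictionary lookup, no chain of comparisons
        if handler is None:
            raise ValueError(f"unknown command: {name}")
        return handler(*args)


dispatcher = CommandDispatcher()


@dispatcher.register("ECHO")
def echo(*words):
    return " ".join(words)


@dispatcher.register("ADD")
def add(a, b):
    return int(a) + int(b)


print(dispatcher.dispatch("ADD 2 3"))        # prints: 5
print(dispatcher.dispatch("ECHO hi there"))  # prints: hi there
```

Adding a command in this style means registering one more function; the receiving code that calls `dispatch` does not change.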

Design of Compiler & Variable-Length Instructions for SIMD Structured Shader (가변길이 SIMD구조 쉐이더 명령어 및 컴파일러 설계)

Kwak, Jae-Chang;Park, Tae-Ryoung
- Journal of the Korea Institute of Information and Communication Engineering / v.14 no.12 / pp.2691-2697 / 2010
Shader instructions and a compiler are designed to support the 3D graphics Shader 3.0 API. Variable-length instructions are proposed to reduce the hardware size of a SIMD-structured graphics processor by shortening instruction length. The designed shader compiler supports variable-length and two-phase structured instructions and is programmable at the ESSL level. The conformance test provided by the Khronos Group is run to verify the designed instructions and compiler. The test results show an overall average performance improvement of 37% across the 16 basic GL shader functions.
https://doi.org/10.6109/jkiice.2010.14.12.2691

Pair Register Allocation Algorithm for 16-bit Instruction Set Architecture (ISA) Processor (16비트 명령어 기반 프로세서를 위한 페어 레지스터 할당 알고리즘)

Lee, Ho-Kyoon;Kim, Seon-Wook;Han, Young-Sun
- The KIPS Transactions:PartA / v.18A no.6 / pp.265-270 / 2011
Even though 32-bit ISA based microprocessors are increasingly widely used, 16-bit ISA based processors are still frequently employed in embedded systems. Intel 8086, 80286, Motorola 68000, and ADChips AE32000 are representative 16-bit ISA based processors. However, because its narrow bit width makes the 16-bit ISA less expressive, more 16-bit instructions than 32-bit instructions must be executed to implement the same functionality. Because the number of executed instructions is a very important factor in performance, the problem has to be resolved by improving the expressiveness of the 16-bit ISA. In this paper, we propose a new pair register allocation algorithm that enhances the original graph-coloring based register allocation algorithm. We also present the performance results and directions for further research.
https://doi.org/10.3745/KIPSTA.2011.18A.6.265

The Instruction Flash memory system with the high performance dual buffer system (명령어 플래시 메모리를 위한 고성능 이중 버퍼 시스템 설계)

Jung, Bo-Sung;Lee, Jung-Hoon
- Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.1-8 / 2011
NAND flash memory has been widely studied as a hard disk replacement because of its low power consumption, low price, and large storage capacity. In particular, NAND flash memory uses the general buffer systems of cache memory to improve overall system performance, but these have tended to focus on data. Our research therefore designs a high-performance instruction NAND flash memory structure using a buffer system. The proposed buffer system consists of two parts: a fully associative temporal buffer for branch instructions and a fully associative spatial buffer for spatial locality. The spatial buffer, with a large fetch size, is effective for sequential instructions, and the temporal buffer, with a small fetch size, is effective for branch instructions. According to the simulation results, the average miss ratio is reduced by around 77%, and the average memory access time is similar to that of the 2-way, victim, and fully associative buffers with two or four sizes.
https://doi.org/10.9708/jksci.2011.16.2.001

Energy-aware Instruction Cache Design using Backward Branch Information for Embedded Processors (임베디드 시스템에서 후방 분기 명령어 정보를 이용한 저전력 명령어 캐쉬 설계 기법)

Yang, Na-Ra;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information / v.13 no.6 / pp.33-39 / 2008
Energy efficiency should be considered together with performance when designing embedded processors. Because instruction caches consume a significant fraction of on-chip energy, this paper proposes a new energy-aware instruction cache design that uses backward branch information to reduce energy consumption in an embedded processor. The proposed instruction cache is composed of two caches: a large main instruction cache and a small loop instruction cache. The proposed technique enables selective access to either the main instruction cache or the loop instruction cache, reducing the number of accesses to the main instruction cache and thereby improving energy efficiency. Analysis results show that the proposed instruction cache reduces energy consumption by 20% on average compared to a traditional instruction cache.
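A rough sketch of how backward branches can steer fetches toward a small loop cache is given below. The abstract does not describe the actual selection policy, so everything here is an assumption: the function `count_main_cache_accesses`, the rule that a loop is served from the loop cache only after the same backward branch is taken a second time, and the synthetic trace.

```python
def count_main_cache_accesses(trace, loop_cache_bytes, instr_bytes=4):
    """Count fetches served by the main instruction cache under an assumed policy.

    Assumed policy: a taken backward branch (next PC lower than the current PC)
    marks a loop [target, branch]. If the body fits in the loop cache, it is
    filled during the next iteration and served from the loop cache once the
    same backward branch is taken again. Leaving the loop resets the state.
    """
    main_accesses = 0
    loop = None        # (start, end) addresses of the loop being tracked
    filled = False     # True once the loop cache holds the whole body
    prev_pc = None
    for pc in trace:
        if prev_pc is not None and pc < prev_pc:           # taken backward branch
            candidate = (pc, prev_pc)
            if candidate == loop:
                filled = True                              # one full iteration was copied
            elif prev_pc - pc + instr_bytes <= loop_cache_bytes:
                loop, filled = candidate, False            # start filling on this iteration
            else:
                loop, filled = None, False                 # body too large for the loop cache
        elif loop is not None and not (loop[0] <= pc <= loop[1]):
            loop, filled = None, False                     # execution left the loop
        if not (loop is not None and filled):
            main_accesses += 1                             # fetch goes to the main cache
        prev_pc = pc
    return main_accesses


# Synthetic trace (an assumption): a 16-instruction loop run 10 times, then two more fetches.
trace = list(range(0x100, 0x140, 4)) * 10 + [0x140, 0x144]
print(count_main_cache_accesses(trace, 64))  # prints: 34
print(count_main_cache_accesses(trace, 32))  # prints: 162
```

With a 64-byte loop cache, only the first two iterations and the two exit fetches reach the main cache; with a 32-byte loop cache the body does not fit, so all 162 fetches do.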

A Study on the Prediction Accuracy Bounds of Instruction Prefetching (명령어 선인출 예측 정확도의 한계에 관한 연구)

Kim, Seong-Baeg;Min, Sang-Lyul;Kim, Chong-Sang
- Journal of KIISE:Computer Systems and Theory / v.27 no.8 / pp.719-729 / 2000
Prefetching aims to reduce memory latency by fetching, in advance, data that are likely to be requested by the processor in the near future. The effectiveness of prefetching is determined by how accurately the needed instructions and data are predicted. Most previous studies on prefetching were limited to proposing a particular prefetch scheme and evaluating its performance, paying little attention to the theoretical aspects of prefetching. This paper focuses on the theoretical aspects of instruction prefetching. For this purpose, we propose a clairvoyant prefetch model that makes use of perfect history information. Based on this theoretical model, we analyze upper limits on the prefetch prediction accuracy for the SPEC benchmarks. The results show that the prefetch prediction accuracy is very high when there is no cache. However, as the size of the instruction cache increases, the prefetch prediction accuracy drops drastically. For example, for the spice benchmark, the prefetch prediction accuracy drops from 53% to 39% when the cache size increases from 2 Kbytes to 16 Kbytes (assuming a 16-byte block size). These results indicate that, as the cache size increases, most locality is captured by the cache, and that instruction prefetching based on information extracted from the references that missed in the cache suffers from prediction inaccuracies.

Fast implementation of HEVC inverse DCT using AVX2 instructions (AVX2 명령어를 이용한 HEVC 역 이산여현변환 고속화)

Kim, Woori;Jo, Hyunho;Ahn, Yong-Jo;Sim, Dong-Gyu
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.06a / pp.206-208 / 2014
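This paper proposes a method for accelerating the IDCT (Inverse Discrete Cosine Transform) module of HEVC (High Efficiency Video Coding) using the AVX2 (Advanced Vector Extensions 2) instruction set. The proposed method loads four $4\times4$ blocks into AVX2 registers and then performs the IDCT on all of them at once using AVX2 instructions. Compared with performing the IDCT sequentially, one $4\times4$ block at a time, with a SIMD (Single Instruction Multiple Data) instruction set, the proposed method maximizes instruction-level parallelism. In the experiments, applying a SIMD instruction set to the $4\times4$ IDCT of the HEVC decoder improved execution speed by 3.35 times on average over the existing HM-12.1, whereas the proposed method improved it by 9.50 times on average over HM-12.1.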

Search Result 927, Processing Time 0.031 seconds
