• Title/Summary/Keyword: Instruction set

Search Result 305, Processing Time 0.026 seconds

Code Size Reduction and Execution performance Improvement with Instruction Set Architecture Design based on Non-homogeneous Register Partition (코드감소와 성능향상을 위한 이질 레지스터 분할 및 명령어 구조 설계)

  • Kwon, Young-Jun;Lee, Hyuk-Jae
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.12
    • /
    • pp.1575-1579
    • /
    • 1999
  • Embedded processors often accommodate two instruction sets, a standard instruction set and a compressed instruction set. With the compressed instruction set, code size can be reduced while instruction count (and consequently execution time) can be increased. To achieve code size reduction without significant increase of execution time, this paper proposes a new compressed instruction set architecture, called TOE (Two Operations Execution). The proposed instruction set format includes the parallel bit that indicates an instruction can be executed simultaneously with the next instruction. To add the parallel bit, TOE instruction format reduces the destination register field. The reduction of the register field limits the number of registers that are accessible by an instruction. To overcome the limited accessibility of registers, TOE adapts non-homogeneous register partition in which registers are divided into multiple subsets, each of which are accessed by different groups of instructions. With non-homogeneous registers, each instruction can access only a limited number of registers, but an entire program can access all available registers. With efficient non-homogeneous register allocator, all registers can be used in a balanced manner. As a result, the increase of code size due to register spills is negligible. Experimental results show that more than 30% of TOE instructions can be executed in parallel without significant increase of code size when compared to existing Thumb instruction set.

  • PDF

Instruction Flow based Early Way Determination Technique for Low-power L1 Instruction Cache

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.9
    • /
    • pp.1-9
    • /
    • 2016
  • Recent embedded processors employ set-associative L1 instruction cache to improve the performance. The energy consumption in the set-associative L1 instruction cache accounts for considerable portion in the embedded processor. When an instruction is required from the processor, all ways in the set-associative instruction cache are accessed in parallel. In this paper, we propose the technique to reduce the energy consumption in the set-associative L1 instruction cache effectively by accessing only one way. Gshare branch predictor is employed to predict the instruction flow and determine the way to fetch the instruction. When the branch prediction is untaken, next instruction in a sequential order can be fetched from the instruction cache by accessing only one way. According to our simulations with SPEC2006 benchmarks, the proposed technique requires negligible hardware overhead and shows 20% energy reduction on average in 4-way L1 instruction cache.

Techniques for special instruction generation for DSP ASIP (DSP영 ASIP을 위한 특수 명령어 생성 기법)

  • 김홍철;황승호
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.7
    • /
    • pp.1-10
    • /
    • 1998
  • The first thing in designing application-specific instruction set processor is having instruction set closely matching hardware characteristics. This instruction set design problem can be more complicated when cobined with implementation method selection problem of each instruction. Our processor model supports two kinds of instructions-primitive or special instructions. Primitive instructions are implemented using common multifunctional hardware such as ALU. Special instructions require a set of dedicated hardware, which actually functions as a coprocessor to the main processor. In this case, special instructions and primitive instructions can be executed independently. In this paper, we present novel algorithm for genrating special instructions for given application. Parallelism between special instructions and primitive instructions is also considered during the performance estimation stage of generated special instructions.

  • PDF

A Novel Instruction Set for Packet Processing of Network ASIP (패킷 프로세싱을 위한 새로운 명령어 셋에 관한 연구)

  • Chung, Won-Young;Lee, Jung-Hee;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.9B
    • /
    • pp.939-946
    • /
    • 2009
  • In this paper, we propose a new network ASIP(Application Specific Instruction-set Processor) which was designed for simulation models by a machine descriptions language LISA(Language for Instruction Set Architecture). This network ASIP is aimed for an exclusive engine undertaking packet processing in a router. To achieve the purpose, we added a new necessary instruction set for processing a general ASIP based on MIPS(Microprocessor without Interlock Pipeline Stages) architecture in high speed. The new instructions can be divided into two groups: a classification instruction group and a modification instruction group, and each group is to be processed by its own functional unit in an execution stage. The functional unit was optimized for area and speed through Verilog HDL, and the result after synthesis was compared with the area and operation delay time. Moreownr, it was allocated to the Macro function ana low-level standardized programming language C using CKF(Compiler Known Function). Consequently, we verified performance improvement achieved by analysis and comparison of execution cycles of application programs.

Parallel Branch Instruction Extension for Thumb-2 Instruction Set Architecture (Thumb-2 명령어 집합 구조의 병렬 분기 명령어 확장)

  • Kim, Dae-Hwan
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.7
    • /
    • pp.1-10
    • /
    • 2013
  • In this paper, the parallel branch instruction is proposed which executes a branch instruction and the frequently used instruction simultaneously to improve the performance of Thumb-2 instruction set architecture. In the proposed approach, new 32-bit parallel branch instructions are introduced which combine 16-bit branch instruction with each of the frequently used 16-bit LOAD, ADD, MOV, STORE, and SUB instructions, respectively. To provide the encoding space of the new instructions, the register field in less frequently executed instructions is reduced, and the new instructions are encoded by using the saved bits. Experiments show that the proposed approach improves performance by an average of 8.0% when compared to the conventional approach.

A study on the Cycle-Accurate Retargetable Micro-Architecture Simulation Framework (사이클 정확도의 재목적화 가능한 마이크로아키텍쳐 시뮬레이션 프레임워크에 관한 연구)

  • Yang, Hoon-Mo;Lee, Moon-Key
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.643-646
    • /
    • 2005
  • This paper presents CARMA (Cycle-Accurate Retargetable Micro-Architecture) as efficient framework for SoC-centric pipelined instruction-set architectures. It is based on ADL (Architecture Description Language) and provides more concise and manifest semantics to describe behavior of instruction set by mixing efficiency of instruction-set simulators and flexibility of RTL simulators. It exploits new timing model method based on process scheduling so it can support general timing model with cycle accuracy for large-scaled architectures usually used in SoC multimedia chip-set. According to experiments, the proposed framework was shown to be 5.5 times faster than HDL and 2.5 times faster than System-C in simulation speed so it is applicable for complex instruction-set pipelined architectures.

  • PDF

A Study of the Automation of Factory through the Development of UniSet (UniSet개발을 통한 공장자동화에 관한 연구)

  • Park, K.H.;Kim, S.C.
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.14 no.2
    • /
    • pp.84-91
    • /
    • 1997
  • This paper reports the effort for developing this new Unified Manufacturing Instruction Set and its environment, called here UniSet, to deal with difficulties in set up and operation of Flexible Manufacturing Cells, UniSet has been developed as a non-exclusive unified manufacturing instruction set based on com- parisons of the prevailing machine tool and programming primitives. UniSet allows programmers to deal with only one instruction set, if they so desire, in a single coherent enviroment, rather than numerous machine programming languges. The software system is coded in an Object-Oriented Programming (OOP) language, Smalltalk, and derives its paradigm from the OO philosophy. Test results are also includ- ed to demonstrate the applicability of the approach employed.

  • PDF

A Design of Dual-Phase Instructions for a effective Logarithm and Exponent Arithmetic (효율적인 로그와 지수 연산을 위한 듀얼 페이즈 명령어 설계)

  • Kim, Chi-Yong;Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.14 no.2
    • /
    • pp.64-68
    • /
    • 2010
  • This paper proposes efficient log and exponent calculation methods using a dual phase instruction set without additional ALU unit for a mobile enviroment. Using the Dual Phase Instruction set, it extracts exponent and mantissa from expression of floating point and calculates 24bit single precision floating point of log approximation using the Taylor series expansion algorithm. And with dual phase instruction set, it reduces instruction excution cycles. The proposed Dual Phase architecture reduces the performance degradation and maintain smaller size.

A Low Cost Instruction Set for Bit Stream Process (비트열 처리를 위한 저비용 명령어 세트)

  • Ham, Dong-Hyeon;Lee, Hyoung-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.2
    • /
    • pp.41-47
    • /
    • 2008
  • Most of media compression CODECs adopts the variable length coding method. This paper proposes special registers and instruction set for bit stream process in order to accelerate the decoding process of the variable length code. The instruction set shares the conventional data path to minimize additional costs. And bit stream is read from the memory instead of the special port. Therefore the instruction set minimizes the change of the processor, and is adopted without any additional input controller and buffer, and accelerate decoding process of variable length code. The data path of the instruction set needs additional 65 bits memory and 344 equivalent gates, 0.19 ns delay under TSMC $0.25{\mu}m$ technology. The instruction set reduced the execution time of the variable length code decoding process in H.264/AVC by about 55%.

Design of an Automatic Generation System for Cycle-accurate Instruction-set Simulators for DSP Processors (DSP 프로세서용 인스트럭션 셋 시뮬레이터 자동생성기의 설계에 관한 연구)

  • Hong, Sung-Min;Park, Chang-Soo;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.9A
    • /
    • pp.931-939
    • /
    • 2007
  • This paper describes the system which automatically generates instruction-set simulators cores using the SMDL. SMDL describes structure and instruction-set information of a target DSP machine. Analyzing behavioral information of each pipeline stage of all instructions on a target ASIPS, the proposed system automatically generates a cycle-accurate instruction set simulator in C++ for a target processor. The proposed system has been tested by generating instruction-set simulators for ARM9E-S, ADSP-TS20x, and TMS320C2x architectures. Experiments were performed by checking the functions of the $4{\times}4$ matrix multiplication, 16-bit IIR filter, 32-bit multiplication, and the FFT using the generated simulators. Experimental results show the functional accuracy of the generated simulators.