• Title/Summary/Keyword: Algorithm Instruction

Design of a Parallel Pipelined Processor Architecture (병렬 파이프라인 프로세서 아키덱처의 설계)

  • 이상정;김광준
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.3 / pp.11-23 / 1995
  • This paper proposes a parallel pipelined processor model that acts as a small VLIW processor architecture, together with a scheduling algorithm for extracting instruction-level parallelism on this architecture. The proposed model has a dual-instruction mode in which up to four basic operations can execute in parallel. By combining these basic operations, different instruction sets can be designed for various applications. The scheduling algorithm schedules basic operations for parallel execution and removes pipeline hazards by examining data dependency and resource conflict relations. To examine the operation and evaluate the performance of the architecture, a C compiler and a simulator were developed. The characteristics and performance of the proposed architecture were measured by simulating various test programs with the compiler and the simulator.

Techniques for special instruction generation for DSP ASIP (DSP영 ASIP을 위한 특수 명령어 생성 기법)

  • 김홍철;황승호
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.7 / pp.1-10 / 1998
  • The first step in designing an application-specific instruction set processor is to obtain an instruction set that closely matches the hardware characteristics. This instruction set design problem becomes more complicated when combined with the problem of selecting an implementation method for each instruction. Our processor model supports two kinds of instructions: primitive and special. Primitive instructions are implemented using common multifunctional hardware such as an ALU. Special instructions require a set of dedicated hardware, which effectively functions as a coprocessor to the main processor. In this case, special instructions and primitive instructions can be executed independently. In this paper, we present a novel algorithm for generating special instructions for a given application. Parallelism between special instructions and primitive instructions is also considered during the performance estimation stage of the generated special instructions.

A Design of Dual-Phase Instructions for a effective Logarithm and Exponent Arithmetic (효율적인 로그와 지수 연산을 위한 듀얼 페이즈 명령어 설계)

  • Kim, Chi-Yong;Lee, Kwang-Yeob
    • Journal of IKEEE / v.14 no.2 / pp.64-68 / 2010
  • This paper proposes efficient log and exponent calculation methods for a mobile environment that use a dual-phase instruction set and require no additional ALU. Using the dual-phase instruction set, the method extracts the exponent and mantissa from the floating-point representation and computes a log approximation in 24-bit single-precision floating point using a Taylor series expansion. The dual-phase instruction set also reduces the number of instruction execution cycles. The proposed dual-phase architecture reduces performance degradation and maintains a smaller size.
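
The exponent-and-mantissa approach described in this abstract can be illustrated in software. The following is a minimal sketch, not the paper's instruction set: it assumes IEEE 754 32-bit floats rather than the paper's 24-bit format, and it assumes a particular series (the Taylor expansion of atanh) for the mantissa term. The function names `split_float32` and `approx_log2` are hypothetical.

```python
import math
import struct


def split_float32(x: float) -> tuple:
    """Return (unbiased exponent, mantissa in [1, 2)) for a positive normal float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exp_field = (bits >> 23) & 0xFF
    if (bits >> 31) or exp_field in (0, 255):
        # Assumption: zero, negatives, subnormals, inf and NaN are out of scope.
        raise ValueError("only positive normal float32 values are handled")
    mantissa = 1.0 + (bits & 0x7FFFFF) / float(1 << 23)
    return exp_field - 127, mantissa


def approx_log2(x: float, terms: int = 4) -> float:
    """Approximate log2(x) as exponent + log2(mantissa).

    ln(m) is evaluated as 2 * atanh(s) with s = (m - 1) / (m + 1), using the
    first `terms` terms of the Taylor series s + s^3/3 + s^5/5 + ...
    Because m is in [1, 2), s is in [0, 1/3) and the series converges quickly.
    """
    exponent, m = split_float32(x)
    s = (m - 1.0) / (m + 1.0)
    s_squared = s * s
    series, power = 0.0, s
    for k in range(terms):
        series += power / (2 * k + 1)
        power *= s_squared
    return exponent + (2.0 * series) / math.log(2.0)
```

With four series terms the result should agree with `math.log2` to roughly four decimal places over the normal float32 range; raising `terms` trades extra multiply-adds for accuracy.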

Design of a Graphic Processor for Multimedia Data Processing (멀티미디어 데이타 처리를 위한 그래픽 프로세서 설계)

  • 고익상;한우종;선우명동
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.10 / pp.56-65 / 1999
  • This paper presents an architecture and instruction set for a graphic coprocessor (GCP) that can be used in a multimedia server. The proposed instruction set employs parallel architecture concepts such as SIMD and superscalar execution. The GCP consists of a scheduler and four functional units. The scheduler resolves the instruction bottleneck caused by sharing the GCP among four general processors (GPs). The GCP can execute up to 4 instructions in parallel. It consists of about 56,000 gates and operates at a 30 MHz clock frequency because of the speed limitation of SOG technology. The GCP meets the real-time DCT requirement of the CIF image format and can process up to 63 frames/sec for the DCT algorithm and 21 frames/sec for the full block matching algorithm on CIF images.

Design of Chip Set for CDMA Mobile Station

  • Yeon, Kwang-Il;Yoo, Ha-Young;Kim, Kyung-Soo
    • ETRI Journal / v.19 no.3 / pp.228-241 / 1997
  • In this paper, we present the design of modem and vocoder digital signal processor (DSP) chips for a CDMA mobile station. The modem chip integrates a CDMA reverse link modulator, a CDMA forward link demodulator, and a Viterbi decoder. This chip contains 89,000 gates and 29 kbit of RAM, measures $10\;mm{\times}10.1\;mm$, and is fabricated using a $0.8\;{\mu}m$ 2-metal CMOS technology. To carry out the system-level simulation, models of the base station modulator, the fading channel, the automatic gain control loop, and the microcontroller were developed and interfaced with a gate-level description of the modem application-specific integrated circuit (ASIC). The modem chip is now working successfully in a real CDMA mobile station on its first fab-out. A new DSP architecture was designed to implement the Qualcomm code excited linear prediction (QCELP) vocoder algorithm efficiently. The 16-bit vocoder DSP chip has an architecture that supports direct and immediate addressing modes in one instruction cycle, combined with a RISC-type instruction set. This proves effective for implementing the vocoder algorithm in terms of performance and power consumption. The implementation of the QCELP algorithm on our DSP requires only 28 million instructions per second (MIPS) of computation and 290 mW of power. The DSP chip contains 32,000 gates, 32K ($2k{\times}16\;bit$) RAM, and 240k ($10k{\times}24\;bit$) ROM. The die size is $8.7\;mm{\times}8.3\;mm$, and the chip is fabricated using $0.8\;{\mu}m$ CMOS technology.

An Aggressive Register Allocation Algorithm for EPIC Architectures (EPIC 아키텍쳐를 위한 적극적 레지스터 할당 알고리듬)

  • Choe, Jun-Gi;Lee, Sang-Jeong
    • The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.497-511 / 1999
  • Recently, many parallel processing technologies have been developed, and the performance of ILP (instruction-level parallelism) processors has grown very rapidly. In particular, EPIC (Explicitly Parallel Instruction Computing) architectures attempt to enhance performance through hardware support for predicated execution and speculative execution. In this paper, a new register allocation algorithm is proposed that improves code scheduling opportunities by exploiting the characteristics of EPIC architectures. Experiments show that, when predicated execution is applied, the proposed register allocation algorithm is more efficient than the conventional scheme, with a performance improvement of about 19%. These results indicate that the proposed scheme is an effective register allocation method.

A Predicate-Sensitive Scheduling Algorithm in Instruction-Level Parallelism Processors (ILP 프로세서를 위한 조건실행 지원 스케쥴링 알고리즘)

  • Yoo, Byung-Kang;Lee, Sang-Jeong
    • The Transactions of the Korea Information Processing Society / v.5 no.1 / pp.202-214 / 1998
  • Exploitation of instruction-level parallelism (ILP) is an effective mechanism for improving the performance of modern superscalar and VLIW processors. Various software techniques can be applied to increase ILP. Among these techniques, predicated execution increases the degree of ILP by removing branch instructions so that instructions from different basic blocks can be merged into a single basic block. In this paper, a global predicate-sensitive scheduling algorithm is proposed to improve the performance of ILP processors that support predicated execution. To examine the performance of the proposed algorithm, a C compiler and a simulator were developed. The performance of the algorithm was measured, and its effectiveness verified, by simulating various benchmark programs with the compiler and the simulator. Measurements with 1-, 2-, and 4-issue execution confirmed an average performance improvement of 20% or more.

A Test Algorithm for Instruction Decoding Function of MC 68000$\mu$P (MC68000$\mu$P의 명령어디코오딩 기능에 관한 시험알고리즘)

  • 김종호;안광선
    • Journal of the Korean Institute of Telematics and Electronics / v.22 no.6 / pp.124-132 / 1985
  • The functional testing of microprocessors has become a time-consuming task as LSI/VLSI technology has progressed. In this paper, we present an efficient method for testing the instruction decoding function of the MC 68000, which is what makes its functional testing complicated. This method is based on analysis of the operation word, the instruction decoding information available to the user in the microprocessor's manual. The instructions are partitioned into representative instructions and party instructions. A minimal set of 332 test instruction pairs is then chosen from 69 basic instructions to detect instruction decoding function faults, and the test procedure for these pairs is discussed.

Scalar First Replacement Strategy for Reference Prediction Table Used in Prefetching Streaming Data (스트리밍 데이터의 선인출에 사용되는 참조예측표의 스칼라 우선 교체 전략)

  • Lim, Chul-hoo;Chon, Young-Suk;Kim, Suk-il;Jeon, Joong-nam
    • The KIPS Transactions: Part A / v.11A no.3 / pp.163-172 / 2004
  • Multimedia applications tend to access their data in a streaming pattern with regular intervals. This characteristic can be exploited to prefetch multimedia data into cache memory and so reduce execution time. The reference-prediction prefetch algorithm predicts the memory address likely to be used next, based on the history of memory references stored in the reference prediction table. This paper proposes a strategy for managing a reference prediction table that holds all data reference instructions, whether they access scalar or streaming data. We observe that scalar reference instructions do not contribute to data prefetching. Therefore, when replacing an entry in the reference prediction table, the proposed algorithm selects a scalar reference instruction in preference to a stream reference instruction. Stream reference instructions thus stay in the table longer than under the FIFO replacement policy, which ultimately improves the performance of data prefetching.
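
The replacement policy described in this abstract can be sketched as a small software model. This is an illustration under stated assumptions rather than the authors' design: the table is keyed by a hypothetical instruction address `pc`, an entry counts as scalar when its last observed stride is 0 (so a brand-new entry is also treated as scalar), and a prefetch address is issued only after the same non-zero stride is seen twice in a row. The class and method names are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    last_addr: int
    stride: int = 0          # 0 means "scalar so far" (assumption)
    confirmed: bool = False  # True once the same non-zero stride repeats


class ReferencePredictionTable:
    def __init__(self, capacity: int):
        self.capacity = capacity
        # pc -> Entry; a dict preserves insertion order, which gives FIFO age.
        self.entries = {}

    def _victim(self) -> int:
        """Scalar-first replacement: evict the oldest scalar entry if one
        exists, otherwise fall back to plain FIFO (the oldest entry)."""
        for pc, entry in self.entries.items():
            if entry.stride == 0:
                return pc
        return next(iter(self.entries))

    def access(self, pc: int, addr: int) -> Optional[int]:
        """Record a memory reference; return an address to prefetch, or None."""
        entry = self.entries.get(pc)
        if entry is None:
            if len(self.entries) >= self.capacity:
                del self.entries[self._victim()]
            self.entries[pc] = Entry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        entry.confirmed = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if entry.confirmed else None
```

In a two-entry table holding one stream instruction (inserted first) and one scalar instruction, a third instruction evicts the scalar entry under this policy, whereas plain FIFO would evict the older stream entry and lose its stride history.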

Comparison of Nios II Core-based Accelerators (Niod II 코어기반 가속기 비교)

  • Song, Gi-Yong
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.1 / pp.639-645 / 2015
  • Checksum and residue checking accelerators were implemented on a Nios II core-based platform using three methods: the component method, in which the corresponding hardware is implemented with HDL coding; the custom instruction method, in which the instruction set of the processor is extended; and the C2H method, in which the corresponding logic is created automatically by the C2H compiler. The processing results from each accelerator for each algorithm were then examined and compared. The comparison showed that the accelerator implemented with the C2H method is the fastest in terms of execution time, and that the accelerator implemented with a custom instruction requires the least add-on hardware.