• Title/Summary/Keyword: Instruction-set Architecture

Search Result 87, Processing Time 0.029 seconds

Using a H/W ADL-based Compiler for Fixed-point Audio Codec Optimization thru Application Specific Instructions (응용프로그램에 특화된 명령어를 통한 고정 소수점 오디오 코덱 최적화를 위한 ADL 기반 컴파일러 사용)

  • Ahn Min-Wook;Paek Yun-Heung;Cho Jeong-Hun
    • The KIPS Transactions:PartA
    • /
    • v.13A no.4 s.101
    • /
    • pp.275-288
    • /
    • 2006
  • Rapid design space exploration is crucial to customizing embedded system design for exploiting the application behavior. As the time-to-market becomes a key concern of the design, the approach based on an application specific instruction-set processor (ASIP) is considered more seriously as one alternative design methodology. In this approach, the instruction set architecture (ISA) for a target processor is frequently modified to best fit the application with regard to code size and speed. Two goals of this paper is to introduce our new retargetable compiler and how it has been used in ASIP-based design space exploration for a popular digital signal processing (DSP) application. Newly developed retargetable compiler provides not only the functionality of previous retargetable compilers but also visualizes the features of the application program and profiles it so that it can help architecture designers and application programmers to insert new application specific instructions into target architecture for performance increase. Given an initial RISC-style ISA for the target processor, we characterized the application code and incrementally updated the ISA with more application specific instructions to give the compiler a better chance to optimize assembly code for the application. We get 32% performance increase and 20% program size reduction using 6 audio codec specific instructions from retargetable compiler. Our experimental results manifest a glimpse of evidence that a higgly retargetable compiler is essential to rapidly prototype a new ASIP for a specific application.

A Custom Code Generation Technique for ASIPs from High-level Language (고급 언어에서 ASIP을 위한 전용 부호 생성 기술 연구)

  • Alam, S.M. Shamsul;Choi, Goangseog
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.11 no.3
    • /
    • pp.31-43
    • /
    • 2015
  • In this paper, we discuss a code generation technique for custom transport triggered architecture (TTA) from a high-level language structure. This methodology is implemented by using TTA-based Co-design Environment (TCE) tool. The results show how the scheduler exploits instruction level parallelism in the custom target architecture and source program. Thus, the scheduler generates parallel TTA instructions using lower cycle counts than the sequential scheduling algorithm. Moreover, we take Tensilica tool to make a comparison with TCE. Because of the efficiency of TTA, TCE takes less execution cycles compared to Tensilica configurations. Finally, this paper shows that it requires only 7 cycles to generate the parallel TTA instruction set for implementing Cyclic Redundancy Check (CRC) applications as an input design, and presents the code generation technique to move complexity from the processor software to hardware architecture. This method can be applicable lots of channel Codecs like CRC and source Codecs like High Efficiency Video Coding (HEVC).

Design of a Graphic Processor for Multimedia Data Processing (멀티미디어 데이타 처리를 위한 그래픽 프로세서 설계)

  • 고익상;한우종;선우명동
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.10
    • /
    • pp.56-65
    • /
    • 1999
  • This paper presents an architecture and its instruction set for a graphic coprocessor(GCP) which can be used for a multimedia server. The proposed instruction set employs parallel architecture concepts, such as SIMD and Superscalar. GCP consists of a scheduler and four functional units. The scheduler solves an instruction bottleneck problem causing by sharing with four general processors(GPs). GCP can execute up to 4 instructions in parallel. It consists of about 56,000 gates and operates at 30 MHz clock frequency due to speed limitation of SOG technology. GCP meets the real-time DCT algorithm requirement of the CIF image format and can process up to 63 frames/sec for the DCT Algorithm and 21 frames/sec for the Full Block matching Algorithm of the CIF image format.

  • PDF

Design of Chip Set for CDMA Mobile Station

  • Yeon, Kwang-Il;Yoo, Ha-Young;Kim, Kyung-Soo
    • ETRI Journal
    • /
    • v.19 no.3
    • /
    • pp.228-241
    • /
    • 1997
  • In this paper, we present a design of modem and vocoder digital signal processor (DSP) chips for CDMA mobile station. The modem chip integrates CDMA reverse link modulator, CDMA forward link demodulator and Viterbi decoder. This chip contains 89,000 gates and 29 kbit RAMs, and the chip size is $10 mm{\times}10.1 mm$ which is fabricated using a $0.8{\mu}m$ 2 metal CMOs technology. To carry out the system-level simulation, models of the base station modulator, the fading channel, the automatic gain control loop, and the microcontroller were developed and interfaced with a gate-level description of the modem application specific integrated circuit (ASIC). The Modem chip is now successfully working in the real CDMA mobile station on its first fab-out. A new DSP architecture was designed to implement the Qualcomm code exited linear prediction (QCELP) vocoder algorithm in an efficient way. The 16 bit vocoder DSP chip has an architecture which supports direct and immediate addressing modes in one instruction cycle, combined with a RISC-type instruction set. This turns out to be effective for the implementation of vocoder algorithm in terms of performance and power consumption. The implementation of QCELP algorithm in our DSP requires only 28 million instruction per second (MIPS) of computation and 290 mW of power consumption. The DSP chip contains 32,000 gates, 32K ($2k{\times}16\;bit$) RAM, and 240k ($10k{\times}24\;bit$) ROM. The die size is $8.7\;mm{\times}8.3\;mm$ and chip is fabricated using $0.8\;{\mu}m$ CMOS technology.

  • PDF

MultiRing An Efficient Hardware Accelerator for Design Rule Checking (멀티링 설계규칙검사를 위한 효과적인 하드웨어 가속기)

  • 노길수;경종민
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.6
    • /
    • pp.1040-1048
    • /
    • 1987
  • We propose a hardware architecture called Multiring which is applicable for various geometrical operations on rectilinear objects such as design rule checking in VLSI layout and many image processing operations including noise suppression and coutour extraction. It has both a fast execution speed and extremely high flexibility. The whole architecture is mainly divided into four parts` I/O between host and Multiring, ring memory, linear processor array and instruction decoder. Data transmission between host and Multiring is bit serial thereby reducing the bandwidth requirement for teh channel and the number of external pins, while each row data in the bit map stored in ring memory is processed in the corresponding processor in full parallelism. Each processor is simultaneously configured by the instruction decoder/controller to perform one of the 16 basic instructions such as Boolean (AND, OR, NOT, and Copy), geometrical(Expand and Shrink), and I/O operations each ring cycle, which gives Multiring maximal flexibility in terms of design rule change or the instruction set enhancement. Correct functional behavior of Multiring was confirmed by successfully running a software simulator having one-to-one structural correspondence to the Multiring hardware.

  • PDF

A study on the efficient method of constrained iterative regular expression pattern matching (제약 반복적인 정규표현식 패턴 매칭의 효율적인 방법에 관한 연구)

  • Seo, Byung-Suk
    • Design & Manufacturing
    • /
    • v.16 no.3
    • /
    • pp.34-38
    • /
    • 2022
  • Regular expression pattern matching is widely used in applications such as computer virus vaccine, NIDS and DNA sequencing analysis. Hardware-based pattern matching is used when high-performance processing is required due to time constraints. ReCPU, SMPU, and REMP, which are processor-based regular expression matching processors, have been proposed to solve the problem of the hardware-based method that requires resynthesis whenever a pattern is updated. However, these processor-based regular expression matching processors inefficiently handle repetitive operations of regular expressions. In this paper, we propose a new instruction set to improve the inefficient repetitive operations of ReCPU and SMPU. We propose REMPi, a regular expression matching processor that enables efficient iterative operations based on the REMP instruction set. REMPi improves the inefficient method of processing a particularly short sub-pattern as a repeat operation OR, and enables processing with a single instruction. In addition, by using a down counter and a counter stack, nested iterative operations are also efficiently processed. REMPi was described with Verilog and synthesized on Intel Stratix IV FPGA.

Efficient Loop Accelerator for Motion Estimation Specific Instruction-set Processor (움직임 추정 전용 프로세서를 위한 효율적인 루프 가속기)

  • Ha, Jae Myung;Jung, Ho Sun;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.7
    • /
    • pp.159-166
    • /
    • 2013
  • This paper proposes an efficient loop accelerator for a motion estimation specific instruction-set processor. ME algorithms in nature contain complex and multiple loop operations. To support efficient hardware (HW) loop operations, this paper introduces four loop instructions and their specific HW architecture. The simulation results show that the proposed loop accelerator can reduce about 29% average instruction cycles for ME early-termination schemes compared with typical implementation having a combination of compare and conditional jump instructions. The proposed loop accelerator of the motion estimation specific instruction-set processor can significantly reduce the number of program memory accesses and greatly save power consumption. Hence, it can be quite suitable for low power and flexible ME implementation.

Design of an ARM7 Core with a Singed Integer Division Instruction (Signed Integer Division 명령어를 추가한 ARM7 Core 설계)

  • 오민석;조태헌;남기훈;이광엽
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1391-1394
    • /
    • 2003
  • 본 논문은 ARM7 TDMI 마이크로프로세서의 연산기능 중 구현되지 알은 나눗셈 연산 기능을 추가로 구현하였다. 이를 위해 ARM ISA(Instruction Set Architecture)에 부호를 고려한 나눗셈 명령어인 'SDIV' 명령어를 추가로 정의하였으며, 나눗셈 알고리즘 Signed Nonrestoring Division을 수행할 수 있도록 ARM7 TDMI 마이크로프로세서의 Data Path를 재 설계하였다. 제안된 방법의 타당성을 검증하기 위하여 현재 ARM7 TDMI 마이크로프로세서의 정수 나눗셈 연산처리 방법과 제안된 구조에서의 정수 나눗셈 연산 처리 방법을 비교하였으며, 그 겉과 수행 cycle의 수가 40%로 감소되는 것을 확인하였다

  • PDF

The Design of A Program Counter Unit for RISC Processors (RISC 프로세서의 프로그램 카운터 부(PCU)의 설계)

  • 홍인식;임인칠
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.7
    • /
    • pp.1015-1024
    • /
    • 1990
  • This paper proposes a program counter unit(PCU) on the pipelined architecture of RISC (Reduced Instruction Set Computer) type high performance processors, PCU is used for supplying instruction addresses to memory units(Instruction Cache) efficiently. A RISC processor's PCU has to compute the instruction address within required intervals continnously. So, using the method of self-generated incrementor, is more efficient than the conventional one's using ALU or private adder. The proposed PCU is designed to have the fast +4(Byte Address) operation incrementor that has no carry propagation delay. Design specifications are taken by analyzing the whole data path operation of target processor's default and exceptional mode instructions. CMOS and wired logic circuit technologic are used in PCU for the fast operation which has small layout area and power dissipation. The schematic capture and logic, timing simulation of proposed PCU are performed on Apollo W/S using Mentor Graphics CAD tooks.

  • PDF

The Design of low-cost SIMD MAC/MAS for Embedded Systems (임베디드 시스템을 위한 저비용 SIMD MAC/MAS 블록 설계)

  • Lee Yong Joo;Jung Jin Woo;Lee Yong Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.10C
    • /
    • pp.1460-1468
    • /
    • 2004
  • In this paper, we developed a low-area and low-cost SIMD MAC/MAS(Single Instruction Multiple Data Multiply and ACcumulate/Multiply And Subtract) for multimedia that is used much in real life. We compared the result of this research with a previously developed more large and high performance SIMD MAC/MAS. This paper is consist of 5 parts, which are an introduction, the contents of designing SIMD MAC/MAS hardware, a special qualities for previous works, the result of synthesis and conclusion. The design result reduced by size 32% of whole hardware than 64 bit SIMD MAC/MAS block of designed for high performance. This improved ISA (Instruction Set Architecture) to be suitable to embedded DSP(Digital Signal Processor), and shortened bit range of 64-bit data to 32-bit and implement more optimally.