• Title/Summary/Keyword: Instruction set design

Search Result 119, Processing Time 0.037 seconds

A study on the efficient method of constrained iterative regular expression pattern matching (제약 반복적인 정규표현식 패턴 매칭의 효율적인 방법에 관한 연구)

  • Seo, Byung-Suk
    • Design & Manufacturing
    • /
    • v.16 no.3
    • /
    • pp.34-38
    • /
    • 2022
  • Regular expression pattern matching is widely used in applications such as computer virus vaccine, NIDS and DNA sequencing analysis. Hardware-based pattern matching is used when high-performance processing is required due to time constraints. ReCPU, SMPU, and REMP, which are processor-based regular expression matching processors, have been proposed to solve the problem of the hardware-based method that requires resynthesis whenever a pattern is updated. However, these processor-based regular expression matching processors inefficiently handle repetitive operations of regular expressions. In this paper, we propose a new instruction set to improve the inefficient repetitive operations of ReCPU and SMPU. We propose REMPi, a regular expression matching processor that enables efficient iterative operations based on the REMP instruction set. REMPi improves the inefficient method of processing a particularly short sub-pattern as a repeat operation OR, and enables processing with a single instruction. In addition, by using a down counter and a counter stack, nested iterative operations are also efficiently processed. REMPi was described with Verilog and synthesized on Intel Stratix IV FPGA.

MultiRing An Efficient Hardware Accelerator for Design Rule Checking (멀티링 설계규칙검사를 위한 효과적인 하드웨어 가속기)

  • 노길수;경종민
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.6
    • /
    • pp.1040-1048
    • /
    • 1987
  • We propose a hardware architecture called Multiring which is applicable for various geometrical operations on rectilinear objects such as design rule checking in VLSI layout and many image processing operations including noise suppression and coutour extraction. It has both a fast execution speed and extremely high flexibility. The whole architecture is mainly divided into four parts` I/O between host and Multiring, ring memory, linear processor array and instruction decoder. Data transmission between host and Multiring is bit serial thereby reducing the bandwidth requirement for teh channel and the number of external pins, while each row data in the bit map stored in ring memory is processed in the corresponding processor in full parallelism. Each processor is simultaneously configured by the instruction decoder/controller to perform one of the 16 basic instructions such as Boolean (AND, OR, NOT, and Copy), geometrical(Expand and Shrink), and I/O operations each ring cycle, which gives Multiring maximal flexibility in terms of design rule change or the instruction set enhancement. Correct functional behavior of Multiring was confirmed by successfully running a software simulator having one-to-one structural correspondence to the Multiring hardware.

  • PDF

Using a H/W ADL-based Compiler for Fixed-point Audio Codec Optimization thru Application Specific Instructions (응용프로그램에 특화된 명령어를 통한 고정 소수점 오디오 코덱 최적화를 위한 ADL 기반 컴파일러 사용)

  • Ahn Min-Wook;Paek Yun-Heung;Cho Jeong-Hun
    • The KIPS Transactions:PartA
    • /
    • v.13A no.4 s.101
    • /
    • pp.275-288
    • /
    • 2006
  • Rapid design space exploration is crucial to customizing embedded system design for exploiting the application behavior. As the time-to-market becomes a key concern of the design, the approach based on an application specific instruction-set processor (ASIP) is considered more seriously as one alternative design methodology. In this approach, the instruction set architecture (ISA) for a target processor is frequently modified to best fit the application with regard to code size and speed. Two goals of this paper is to introduce our new retargetable compiler and how it has been used in ASIP-based design space exploration for a popular digital signal processing (DSP) application. Newly developed retargetable compiler provides not only the functionality of previous retargetable compilers but also visualizes the features of the application program and profiles it so that it can help architecture designers and application programmers to insert new application specific instructions into target architecture for performance increase. Given an initial RISC-style ISA for the target processor, we characterized the application code and incrementally updated the ISA with more application specific instructions to give the compiler a better chance to optimize assembly code for the application. We get 32% performance increase and 20% program size reduction using 6 audio codec specific instructions from retargetable compiler. Our experimental results manifest a glimpse of evidence that a higgly retargetable compiler is essential to rapidly prototype a new ASIP for a specific application.

Design and Implementation of a DSP Chip for Portable Multimedia Applications (휴대 멀티미디어 응용을 위한 DSP 칩 설계 및 구현)

  • 윤성현;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.12
    • /
    • pp.31-39
    • /
    • 1998
  • This paper presents the design and implementation of a new multimedia fixed-point DSP (MDSP) core for portable multimedia applications. The MDSP instruction set is designed through the analysis of multimedia algorithms and DSP instruction sets. The MDSP architecture employs parallel processing techniques, such as SIMD and vector processing as well as DSP techniques. The instruction set can handle various data formats and MDSP can perform two MAC operations in parallel. The switching network and packing network can increase the performance by overlapping data rearrangement cycles with computation cycles. We have designed Verilog HDL models and the 0.6 $\mu\textrm{m}$ Samsung KG75000 SOG library is used. The total gate count is 68,831 and the clock frequency is 30 MHz.

  • PDF

Design of Chip Set for CDMA Mobile Station

  • Yeon, Kwang-Il;Yoo, Ha-Young;Kim, Kyung-Soo
    • ETRI Journal
    • /
    • v.19 no.3
    • /
    • pp.228-241
    • /
    • 1997
  • In this paper, we present a design of modem and vocoder digital signal processor (DSP) chips for CDMA mobile station. The modem chip integrates CDMA reverse link modulator, CDMA forward link demodulator and Viterbi decoder. This chip contains 89,000 gates and 29 kbit RAMs, and the chip size is $10 mm{\times}10.1 mm$ which is fabricated using a $0.8{\mu}m$ 2 metal CMOs technology. To carry out the system-level simulation, models of the base station modulator, the fading channel, the automatic gain control loop, and the microcontroller were developed and interfaced with a gate-level description of the modem application specific integrated circuit (ASIC). The Modem chip is now successfully working in the real CDMA mobile station on its first fab-out. A new DSP architecture was designed to implement the Qualcomm code exited linear prediction (QCELP) vocoder algorithm in an efficient way. The 16 bit vocoder DSP chip has an architecture which supports direct and immediate addressing modes in one instruction cycle, combined with a RISC-type instruction set. This turns out to be effective for the implementation of vocoder algorithm in terms of performance and power consumption. The implementation of QCELP algorithm in our DSP requires only 28 million instruction per second (MIPS) of computation and 290 mW of power consumption. The DSP chip contains 32,000 gates, 32K ($2k{\times}16\;bit$) RAM, and 240k ($10k{\times}24\;bit$) ROM. The die size is $8.7\;mm{\times}8.3\;mm$ and chip is fabricated using $0.8\;{\mu}m$ CMOS technology.

  • PDF

Porting LLVM Compiler to a Custom Processor Architecture Using Synopsys Processor Designer

  • Jung, Hyungyun;Shin, Jangseop;Heo, Ingoo;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.11a
    • /
    • pp.53-56
    • /
    • 2014
  • Application specific instruction-set processor (ASIP) is a suitable design choice for system designers who seek both flexibility to handle various applications in the domain together with the performance. Successful development of an ASIP, however, requires a software development kit (SDK) to be provided along with the processor. Synopsys Processor Designer is an ASIP development tool, which takes as input a set of files written in a high-level architecture description language called LISA (Language for Instruction Set Architecture), and generates SDK as well as RTL. Recently, they have added support for the generation of LLVM compiler backend, though some manual work is required. In this paper, we introduce some details in porting LLVM compiler to a custom processor architecture in Synopsys Processor Designer.

AMEX: Extending Addressing Mode of 16-bit Thumb Instruction Set Architecture (AMEX: 16비트 Thumb 명령어 집합 구조의 주소 지정 방식 확장)

  • Kim, Dae-Hwan
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.11
    • /
    • pp.1-10
    • /
    • 2012
  • In this paper, the extension of the addressing mode in the 16-bit Thumb instruction set architecture is proposed to improve the performance of 16-bit Thumb code. The key idea of the proposed approach is the introduction of new addressing modes for more frequent instructions by using the saved bits from the reduction of the register fields in less frequently used instructions. The proposed approach adopts efficient addressing modes from the 32-bit ARM architecture, which is the superset of the 16-bit Thumb architecture. To speed up access to a data list, scaled register offset addressing mode and post-indexed addressing mode are introduced for load and store instructions. Experiments show that the proposed approach improves performance by an average of 8.5% when compared to the conventional approach.

An ASIP Design for Deblocking Filter of H.264/AVC (H.264/AVC 표준의 디블록킹 필터를 가속하기 위한 ASIP 설계)

  • Lee, Hyoung-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.3
    • /
    • pp.142-148
    • /
    • 2008
  • Though a deblocking filter of H.264/AVC provides enhanced image quality by removing blocking artifact on block boundary, the complex filtering operation on this process is a dominant factor of the whole decoding time. In this paper, we designed an ASIP to accelerate deblocking filter operation with the proposed instruction set. We designed a processor based on a MIPS structure with LISA, simulated a deblocking later model, and compared the execution time on the proposed instruction set. In addition, we generated HDL model of the processor through CoWare's Processor Designer and synthesized with TSMC 0.25um CMOS cell library by Synopsys Design Compiler. As the result of the synthesis, the area and delay time increased 7.5% and 3.2%, respectively. However, due to the proposed instruction set, total execution performance is improved by 18.18% on average.

Design of an ARM7 Core with a Singed Integer Division Instruction (Signed Integer Division 명령어를 추가한 ARM7 Core 설계)

  • 오민석;조태헌;남기훈;이광엽
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1391-1394
    • /
    • 2003
  • 본 논문은 ARM7 TDMI 마이크로프로세서의 연산기능 중 구현되지 알은 나눗셈 연산 기능을 추가로 구현하였다. 이를 위해 ARM ISA(Instruction Set Architecture)에 부호를 고려한 나눗셈 명령어인 'SDIV' 명령어를 추가로 정의하였으며, 나눗셈 알고리즘 Signed Nonrestoring Division을 수행할 수 있도록 ARM7 TDMI 마이크로프로세서의 Data Path를 재 설계하였다. 제안된 방법의 타당성을 검증하기 위하여 현재 ARM7 TDMI 마이크로프로세서의 정수 나눗셈 연산처리 방법과 제안된 구조에서의 정수 나눗셈 연산 처리 방법을 비교하였으며, 그 겉과 수행 cycle의 수가 40%로 감소되는 것을 확인하였다

  • PDF

Design of an AE32000-compatible 32-bit EISC Microprocessor (AE32000 호환 32-비트 EISC 마이크로프로세서 설계)

  • 곽기영;박진국;이두영;이범근;정연모
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10c
    • /
    • pp.700-702
    • /
    • 2002
  • 본 논문은 16-비트 고정된 명령어 형식을 갖는 32-비트 EISC(Extendable Instruction Set Computer) 코어 구현에 대하여 기술하였다. EISC구조는 코드 밀도가 높은 확장 오퍼랜드(operand) 형식을 사용하여 메모리 크기를 줄일 수 있으므로 ASIC 구현시 저전력 시스템 및 소형화된 임베디드 시스템을 위한 프로세서 구현을 가능하게 한다. 설계된 프로세서는 AE32000 명령어 셋과 호환이 가능하도록 설계되었으며 5단 파이프라인을 적용하여 프로세서의 성능을 높였다. 또한 BTB(Branch Target Buffer)를 사용하여 분기 지연을 줄여 낮은 CPI(Clock Per Instruction)을 유지하게 하였다.