• Title/Summary/Keyword: instruction selection algorithm

Search Result 10, Processing Time 0.024 seconds

Register Pressure Aware Code Selection Algorithm for Multi-Output Instructions (Register Pressure를 고려한 다중 출력 명령어를 위한 개선된 코드 생성 방법)

  • Youn, Jong-Hee M.;Paek, Yun-Heung;Ko, Kwang-Man
    • The KIPS Transactions:PartA
    • /
    • v.19A no.1
    • /
    • pp.45-50
    • /
    • 2012
  • The demand for faster execution time and lower energy consumption has compelled architects of embedded processors to customize it to the needs of their target applications. These processors consequently provide a rich set of specialized instructions in order to enable programmers to access these features. Such an instruction is typically a $multi$-$output$ $instruction$ (MOI), which outputs multiple results parallely in order to exploit inherent underlying hardware parallelism. Earlier study has exhibited that MOIs help to enhance performance in aspect of instruction counts and code size. However the earlier algorithm does not consider the register pressure. So, some selected MOIs introduce register spill/reload code that increases the code size and instruction count. To attack this problem, we introduce a novel iterated instruction selection algorithm based on the register pressure of each selected MOIs. The experimental results show the suggested algorithm achieves 3% code-size reduction and 2.7% speed-up on average.

Study on LLVM application in Parallel Computing System (병렬 컴퓨팅 시스템에서 LLVM 응용 연구)

  • Cho, Jungseok;Cho, Doosan;Kim, Yongyeon
    • The Journal of the Convergence on Culture Technology
    • /
    • v.5 no.1
    • /
    • pp.395-399
    • /
    • 2019
  • In order to support various parallel computing systems, it is necessary to extend LLVM IR to more efficiently support vector / matrix and to design LLVM IR to machine code as a new algorithm. As shown in the IR example, RISC instruction generation is naturally generated because the RISC instruction is basically composed of the RISC instruction, and the vector instruction is also not supported. There is a need for new IR structures, command generation algorithms and related extensions to support vector / matrix more robustly. To do this, it is important to map each instruction in the LLVM IR to the appropriate instruction in the target architecture (vector / matrix) (instruction selection algorithm). It is necessary to understand the meaning of LLVM IR command, to compare the meaning of each instruction of the target architecture with syntax, and to select the instruction that matches the pattern to make mapping efficient.

Profile Guided Selection of ARM and Thumb Instructions at Function Level (함수 수준에서 프로파일 정보를 이용한 ARM과 Thumb 명령어의 선택)

  • Soh Changho;Han Taisook
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.3
    • /
    • pp.227-235
    • /
    • 2005
  • In the embedded system domain, both memory requirement and energy consumption are great concerns. To save memory and energy, the 32 bit ARM processor supports the 16 bit Thumb instruction set. For a given program, the Thumb code is typically smaller than the ARM code. However, the limitations of the Thumb instruction set can often lead to generation of poorer quality code. To generate codes with smaller size but a little slower execution speed, Krishnaswarmy suggests a profiling guided selection algorithm at module level for generating mixed ARM and Thumb codes for application programs. The resulting codes of the algorithm give significant code size reductions with a little loss in performance. When the instruction set is selected at module level, some functions, which should be compiled in Thumb mode to reduce code size, are compiled to ARM code. It means we have additional code size reduction chance. In this paper, we propose a profile guided selection algorithm at function level for generating mixed ARM and Thumb codes for application programs so that the resulting codes give additional code size reductions without loss in performance compared to the module level algorithm. We can reduce 2.7% code size additionally with no performance penalty

Techniques for special instruction generation for DSP ASIP (DSP영 ASIP을 위한 특수 명령어 생성 기법)

  • 김홍철;황승호
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.7
    • /
    • pp.1-10
    • /
    • 1998
  • The first thing in designing application-specific instruction set processor is having instruction set closely matching hardware characteristics. This instruction set design problem can be more complicated when cobined with implementation method selection problem of each instruction. Our processor model supports two kinds of instructions-primitive or special instructions. Primitive instructions are implemented using common multifunctional hardware such as ALU. Special instructions require a set of dedicated hardware, which actually functions as a coprocessor to the main processor. In this case, special instructions and primitive instructions can be executed independently. In this paper, we present novel algorithm for genrating special instructions for given application. Parallelism between special instructions and primitive instructions is also considered during the performance estimation stage of generated special instructions.

  • PDF

A Code Scheduling Algorithm for Pipelined Architecture (파이프라인 아키텍쳐를 위한 코드 스케쥴링 알고리듬)

  • 김은성;임인칠
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.7
    • /
    • pp.746-758
    • /
    • 1988
  • This paper proposes a code scheduling algorithm which gives a software solution to the pipeline interlock. This algorithm provides a heuristic solution by recordering the instructions, instead of using hardware interlock mechanism when pipeline interlock prevents the execution of a machine instruction in a pipelined architecture. Program code size and overall execution time can be reduced due to the increased flexibility in the selection of instructions, which is possible from the alleviated ordering restriction on the use of conflict resources.

  • PDF

Application Specific Processor Design for H.264 Decoder with a Configurable Embedded Processor

  • Han, Jin-Ho;Lee, Mi-Young;Bae, Young-Hwan;Cho, Han-Jin
    • ETRI Journal
    • /
    • v.27 no.5
    • /
    • pp.491-496
    • /
    • 2005
  • An application specific processor for an H.264 decoder with a configurable embedded processor is designed in this research. The motion compensation, inverse integer transform, inverse quantization, and entropy decoding algorithm of H.264 decoder software are optimized. We improved the performance of the processor with instruction-level hardware optimization, which is tailored to configurable embedded processor architecture. The optimized instructions for video processing can be used in other video compression standards such as MPEG 1, 2, and 4. A significant performance improvement is achieved with high flexibility. Experimental results show that we could achieve 300% performance for the H.264 baseline profile level 2 decoder.

  • PDF

An Optimal Instruction Fetch Strategy for SMT Processors (SMT 프로세서에 최적화된 명령어 페치 전략에 관한 연구)

  • 홍인표;문병인;김문경;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.5C
    • /
    • pp.512-521
    • /
    • 2002
  • Recently, conventional superscalar RISC processors arrive their performance limit, and many researches on the next-generation architecture are concentrated on SMT(Simultaneous Multi-Threading). In SMT processors, multiple threads are executed simultaneously and share hardware resources dynamically. In this case, it is more important to supply instructions from multiple threads to processor core efficiently than ever. Because SMT architecture shows higher IPC(Instructions per cycle) than superscalar architecture, performance is influenced by fetch bandwidth and the size of fetch queue. Moreover, to use TLP(Thread Level Parallelism) efficiently, fetch thread selection algorithm and fetch bandwidth for each selected threads must be carefully designed. Thus, in this paper, the performance values influenced by these factors are analyzed. Based on the results, an optimal instruction fetch strategy for SMT processors is proposed.

Effective Dimensionality Reduction of Payload-Based Anomaly Detection in TMAD Model for HTTP Payload

  • Kakavand, Mohsen;Mustapha, Norwati;Mustapha, Aida;Abdullah, Mohd Taufik
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.8
    • /
    • pp.3884-3910
    • /
    • 2016
  • Intrusion Detection System (IDS) in general considers a big amount of data that are highly redundant and irrelevant. This trait causes slow instruction, assessment procedures, high resource consumption and poor detection rate. Due to their expensive computational requirements during both training and detection, IDSs are mostly ineffective for real-time anomaly detection. This paper proposes a dimensionality reduction technique that is able to enhance the performance of IDSs up to constant time O(1) based on the Principle Component Analysis (PCA). Furthermore, the present study offers a feature selection approach for identifying major components in real time. The PCA algorithm transforms high-dimensional feature vectors into a low-dimensional feature space, which is used to determine the optimum volume of factors. The proposed approach was assessed using HTTP packet payload of ISCX 2012 IDS and DARPA 1999 dataset. The experimental outcome demonstrated that our proposed anomaly detection achieved promising results with 97% detection rate with 1.2% false positive rate for ISCX 2012 dataset and 100% detection rate with 0.06% false positive rate for DARPA 1999 dataset. Our proposed anomaly detection also achieved comparable performance in terms of computational complexity when compared to three state-of-the-art anomaly detection systems.

Real-time Optimization of H.264 Software Encoder on Embedded DSP System (임베디드 DSP 기반 시스템을 위한 H.264 소프트웨어 부호기의 실시간 최적화)

  • Roh, Si-Bong;Ahn, Hee-June;Lee, Myeong-Jin;Oh, Hyuk-Jun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.10C
    • /
    • pp.983-991
    • /
    • 2009
  • While H.264/AVC is in wide use for multimedia applications such as DMB and IPTV service, we have limited usage cases for embedded real-time applications due to its high computational demand. The paper provides judicious guide line for optimization method selection, by presenting the detailed experiments data through the development process of a real time H.264 software encoder on embedded DSP. The experimental analysis includes an intensive profiling analysis, fast algorithm application, optimal memory assignment, and intrinsic-based instruction selection. We have realized a real-time software that encodes CIF resolution videos 15 fps on TMS320DM64x processors.

Design and Implementation of an e-NIE Learning Model for Technical High Schools (공업계 고등학교를 위한 전자신문활용교육 학습 모형의 설계 및 구현)

  • Kang Oh-Han;Lee Gyoung-Hwan
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.11 no.2
    • /
    • pp.18-28
    • /
    • 2006
  • We consider a Direct Input Output Manufacturing System(DIOMS) which has a munber of machine centers placed along a built-in Automated Storage/Retrieval System(AS/RS). The Storage/Retrieval (S/R) machine handles parts placed on pallets for the operational aspect of DIOMS and determines the optimal operating policy by combining computer simulation and genetic algorithm. The operational problem includes: input sequencing control, dispatching rule of the S/R machine, machine center-based part type selection rule, and storage assignment policy. For each operating policy, several different policies are considered based on the known research results. In this paper, using the computer simulation and genetic algorithm we suggest a method which gives the optimal configuration of operating policies within reasonable computation time.

  • PDF