• Title/Summary/Keyword: Algorithm Instruction

Search Result 155, Processing Time 0.032 seconds

A Efficient Calculation for log and exponent with A Dual Phase Instruction Architecture (효율적인 로그와 지수 연산을 위한 듀얼 페이즈 명령어 구조)

  • Kim, Jun-Seo;Lee, Kwang-Yeob;Kwak, Jae-Chang
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2010.05a
    • /
    • pp.320-323
    • /
    • 2010
  • This paper proposes efficient log and exponent calculation methods using a dual phase instruction set without additional ALU unit for a mobile enviroment. Using the Dual Phase Instruction set, it extracts exponent and mantissa from expression of floating point and calculates 24bit single precision floating point of log approximation using the Taylor series expansion algorithm. And with dual phase instruction set, it reduces instruction excution cycles. The proposed Dual Phase architecture reduces the performance degradation and maintain smaller size.

  • PDF

An Implementation of Efficient Quicksort Utilizing SIMD-Based VBP Technique (SIMD 기반의 VBP 기법을 적용한 효율적인 퀵정렬의 구현)

  • Hong, Gilseok;Kim, Hongyeon;Kang, Seonghyeon;Min, Jun-Ki
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.8
    • /
    • pp.498-503
    • /
    • 2017
  • SIMD (Single Instruction Multiple Data) is a representative parallelization architecture that processes multiple data loaded in a SIMD register with a single instruction. Quicksort is a sorting algorithm that picks an element as a pivot from the array and reorders the array such that all elements having the values less than the pivot value are located in the left side on the pivot as well as all elements having the value greater than the pivot value are located in the right side on the pivot and then the algorithm performs the same task on both sublist recursively. In this paper, we propose an efficient Quicksort algorithm applying the SIMD instructions which minimally invokes conditional branches to avoid the performance degradation incurred by branch misprediction in a pipeline architecture. In addition, we improve the performance of the Quicksort algorithm by fetching data into a SIMD register as a byte unit to apply VBP (Vertical Bit Parallel) and the early pruning technique.

Profile Guided Selection of ARM and Thumb Instructions at Function Level (함수 수준에서 프로파일 정보를 이용한 ARM과 Thumb 명령어의 선택)

  • Soh Changho;Han Taisook
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.3
    • /
    • pp.227-235
    • /
    • 2005
  • In the embedded system domain, both memory requirement and energy consumption are great concerns. To save memory and energy, the 32 bit ARM processor supports the 16 bit Thumb instruction set. For a given program, the Thumb code is typically smaller than the ARM code. However, the limitations of the Thumb instruction set can often lead to generation of poorer quality code. To generate codes with smaller size but a little slower execution speed, Krishnaswarmy suggests a profiling guided selection algorithm at module level for generating mixed ARM and Thumb codes for application programs. The resulting codes of the algorithm give significant code size reductions with a little loss in performance. When the instruction set is selected at module level, some functions, which should be compiled in Thumb mode to reduce code size, are compiled to ARM code. It means we have additional code size reduction chance. In this paper, we propose a profile guided selection algorithm at function level for generating mixed ARM and Thumb codes for application programs so that the resulting codes give additional code size reductions without loss in performance compared to the module level algorithm. We can reduce 2.7% code size additionally with no performance penalty

Fine Grain Real-Time Code Scheduling Using an Adaptive Genetic Algorithm (적합 유전자 알고리즘을 이용한 실시간 코드 스케쥴링)

  • Chung, Tai-Myoung
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.6
    • /
    • pp.1481-1494
    • /
    • 1997
  • In hard real-time systems, a timing fault may yield catastrophic results. Dynamic scheduling provides the flexibility to compensate for unexpected events at runtime; however, scheduling overhead at runtime is relatively large, constraining both the accuracy of the timing and the complexity of the scheduling analysis. In contrast, static scheduling need not have any runtime overhead. Thus, it has the potential to guarantee the precise time at which each instruction implementing a control action will execute. This paper presents a new approach to the problem of analyzing high-level language code, augmented by arbitrary before and after timing constraints, to provide a valid static schedule. Our technique is based on instruction-level complier code scheduling and timing analysis, and can ensure the timing of control operations to within a single instruction clock cycle. Because the search space for a valid static schedule is very large, a novel adaptive genetic search algorithm was developed.

  • PDF

Application Specific Processor Design for H.264 Decoder with a Configurable Embedded Processor

  • Han, Jin-Ho;Lee, Mi-Young;Bae, Young-Hwan;Cho, Han-Jin
    • ETRI Journal
    • /
    • v.27 no.5
    • /
    • pp.491-496
    • /
    • 2005
  • An application specific processor for an H.264 decoder with a configurable embedded processor is designed in this research. The motion compensation, inverse integer transform, inverse quantization, and entropy decoding algorithm of H.264 decoder software are optimized. We improved the performance of the processor with instruction-level hardware optimization, which is tailored to configurable embedded processor architecture. The optimized instructions for video processing can be used in other video compression standards such as MPEG 1, 2, and 4. A significant performance improvement is achieved with high flexibility. Experimental results show that we could achieve 300% performance for the H.264 baseline profile level 2 decoder.

  • PDF

Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio

  • Ituero, Pablo;Lopez-Vallejo, Marisa
    • ETRI Journal
    • /
    • v.30 no.1
    • /
    • pp.113-128
    • /
    • 2008
  • Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It displays the good performance of application specific designs along with the versatility of processors, which makes it compliant with leading edge standards. The machine deals with multi-operand instructions in an innovative way, the fetching and assertion of data is serialized and the addressing is automatized and transparent for the programmer. The performance-area trade-off of the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation.

  • PDF

Instruction-corruption-less Binary Modification Mechanism for Static Stack Protections (이진 조작을 통한 정적 스택 보호 시 발생하는 명령어 밀림현상 방지 기법)

  • Lee, Young-Rim;Kim, Young-Pil;Yoo, Hyuck
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.1
    • /
    • pp.71-75
    • /
    • 2008
  • Many sensor operating systems have memory limitation constraint; therefore, stack memory areas of threads resides in a single memory space. Because most target platforms do not have hardware MMY (Memory Management Unit), it is difficult to protect each stack area. The method to solve this problem is to exchange original stack handling instructions in binary code for wrapper routines to protect stack area. In this exchanging phase, instruction corruption problem occurs due to difference of each instruction length between stack handling instructions and branch instructions. In this paper, we propose the algorithm to call a target routine without instruction corruption problem. This algorithm can reach a target routine by repeating branch instructions to have a short range. Our solution makes it easy to apply security patch and maintain upgrade of software of sensor node.

Efficient ARIA Cryptographic Extension to a RISC-V Processor (RISC-V 프로세서상에서의 효율적인 ARIA 암호 확장 명령어)

  • Lee, Jin-jae;Park, Jong-uk;Kim, Min-jae;Kim, Ho-won
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.3
    • /
    • pp.309-322
    • /
    • 2021
  • In this study, an extension instruction set for high-speed operation of the ARIA block cipher algorithm on RISC-V processor is added to support high-speed cryptographic operation on low performance IoT devices. We propose the efficient ARIA cryptographic instruction set which runs on a conventional 32-bit processor. Compared to the existing software cryptographic operation, there is a significant performance improvement with proposed instruction set.

A Code Optimization Algorithm of RISC Pipelined Architecture (RISC 파이프라인 아키텍춰의 코드 최적화 알고리듬)

  • 김은성;임인칠
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.8
    • /
    • pp.937-949
    • /
    • 1988
  • This paper proposes a code optimization algorithm for dealing with hazards which are occurred in pipelined architecture due to resource dependence between executed instructions. This algorithm solves timing hazard which results from resource conflict between concurrently executing instructions, and sequencing hazard due to the delay time for branch target decision by reconstructing of instruction sequence without pipeline interlock. The reconstructed codes can be generated efficiently by considering timing hazard and sequencing hazard simultaneously. And dynamic execution time of program is improved by considering structral hazard which can be existed when pipeline is controlled dynamically.

  • PDF

A Code Scheduling Algorithm for Pipelined Architecture (파이프라인 아키텍쳐를 위한 코드 스케쥴링 알고리듬)

  • 김은성;임인칠
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.7
    • /
    • pp.746-758
    • /
    • 1988
  • This paper proposes a code scheduling algorithm which gives a software solution to the pipeline interlock. This algorithm provides a heuristic solution by recordering the instructions, instead of using hardware interlock mechanism when pipeline interlock prevents the execution of a machine instruction in a pipelined architecture. Program code size and overall execution time can be reduced due to the increased flexibility in the selection of instructions, which is possible from the alleviated ordering restriction on the use of conflict resources.

  • PDF