• Title/Summary/Keyword: Compressed Instruction


Code Size Reduction and Execution Performance Improvement with Instruction Set Architecture Design Based on Non-homogeneous Register Partition (코드감소와 성능향상을 위한 이질 레지스터 분할 및 명령어 구조 설계)

  • Kwon, Young-Jun; Lee, Hyuk-Jae
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.12 / pp.1575-1579 / 1999
  • Embedded processors often support two instruction sets: a standard instruction set and a compressed instruction set. With a compressed instruction set, code size can be reduced, but the instruction count (and consequently the execution time) can increase. To achieve code size reduction without a significant increase in execution time, this paper proposes a new compressed instruction set architecture called TOE (Two Operations Execution). The proposed instruction format includes a parallel bit that indicates whether an instruction can be executed simultaneously with the next instruction. To make room for the parallel bit, the TOE instruction format shrinks the destination register field. This reduction limits the number of registers that a single instruction can access. To overcome the limited accessibility of registers, TOE adopts a non-homogeneous register partition in which the registers are divided into multiple subsets, each accessed by a different group of instructions. With non-homogeneous registers, each instruction can access only a limited number of registers, but an entire program can still access all available registers. With an efficient non-homogeneous register allocator, all registers can be used in a balanced manner, so the increase in code size due to register spills is negligible. Experimental results show that more than 30% of TOE instructions can be executed in parallel without a significant increase in code size compared to the existing Thumb instruction set.

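The parallel bit and the split register file described in the TOE abstract can be pictured with a small decoder. The sketch below is a hedged illustration only: the 16-bit field layout, the opcode-based register grouping, and the 16-register file size are assumptions made for this example, not the encoding defined in the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical TOE-like 16-bit format; widths and positions are
 * assumptions for illustration, not the paper's actual encoding:
 *   [15]    P   - parallel bit: may issue with the next instruction
 *   [14:10] op  - opcode
 *   [9:7]   rd  - 3-bit destination, indexing a register SUBSET
 *   [6:3]   rs  - source register
 *   [2:0]   imm - small immediate
 */
typedef struct {
    int parallel;
    unsigned op, rd, rs, imm;
} toe_insn;

/* Non-homogeneous partition (illustrative): ALU-type opcodes write
 * r0-r7, memory-type opcodes write r8-r15, so a 3-bit field still
 * lets the whole program reach all 16 registers. */
static unsigned map_dest(unsigned op, unsigned rd_field)
{
    return (op < 16) ? rd_field : 8 + rd_field;
}

static toe_insn decode(uint16_t raw)
{
    toe_insn d;
    d.parallel = (raw >> 15) & 0x1;
    d.op       = (raw >> 10) & 0x1F;
    d.rd       = map_dest(d.op, (raw >> 7) & 0x7);
    d.rs       = (raw >> 3) & 0xF;
    d.imm      =  raw & 0x7;
    return d;
}

int main(void)
{
    uint16_t program[] = { 0x8451, 0x0123 };   /* first word has P = 1 */
    toe_insn a = decode(program[0]);
    toe_insn b = decode(program[1]);
    if (a.parallel)                            /* dual-issue this pair */
        printf("issue ops %u and %u together\n", a.op, b.op);
    return 0;
}
```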

The Compressed Instruction Set Architecture for the OpenRISC Processor (OpenRISC 프로세서를 위한 압축 명령어 집합 구조)

  • Kim, Dae-Hwan
    • Journal of the Korea Society of Computer and Information / v.17 no.10 / pp.11-23 / 2012
  • To achieve efficient code size reduction, this paper proposes a new compressed instruction set architecture for the OpenRISC architecture. The new instructions and their corresponding formats are designed from profiling information on existing instruction usage. New 16-bit instructions are proposed to compress frequent 32-bit instructions, and new 32-bit instructions are proposed to compress common instruction sequences. The proposed instructions can be classified into three types. The first is 16-bit counterparts of frequent 32-bit instructions such as add, load, store, branch, and jump. The second is 32-bit instructions that compress two consecutive load instructions, two consecutive store instructions, and 32-bit data move sequences. Finally, two new 32-bit instructions compress the function prolog and epilog code, respectively. The OpenRISC hardware decoder is extended to support the new instructions. Experiments show that code size is reduced by an average of 30.4% compared to the OR1200 instruction set architecture, with no loss of execution performance.
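
The decode-time re-expansion that this approach requires can be sketched as follows. The 16-bit field layout, the compressed opcode values, and the exact 32-bit bit positions below are assumptions for illustration; only the overall expand-on-decode pattern reflects the abstract above.

```c
#include <stdint.h>

/* Illustrative 16-bit compressed format (layout is an assumption):
 *   [15:11] cop - compressed opcode
 *   [10:6]  rd
 *   [5:1]   ra
 *   [0]     unused
 * The expander rewrites the word as a full 32-bit OpenRISC-style
 * instruction; the 32-bit field positions are likewise only a sketch. */
static uint32_t expand16(uint16_t c)
{
    uint32_t cop = (c >> 11) & 0x1F;
    uint32_t rd  = (c >> 6)  & 0x1F;
    uint32_t ra  = (c >> 1)  & 0x1F;

    switch (cop) {
    case 0x01: /* c.add rd, rd, ra  ->  l.add-like rd, rd, ra */
        return (0x38u << 26) | (rd << 21) | (rd << 16) | (ra << 11);
    case 0x02: /* c.lwz rd, 0(ra)   ->  l.lwz-like rd, 0(ra)  */
        return (0x21u << 26) | (rd << 21) | (ra << 16);
    default:   /* unrecognized: emit a no-op style filler word  */
        return 0x15000000u;
    }
}
```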

A Program Code Compression Method with Very Fast Decoding for Mobile Devices (휴대장치를 위한 고속복원의 프로그램 코드 압축기법)

  • Kim, Yong-Kwan; Wee, Young-Cheul
    • Journal of KIISE: Software and Applications / v.37 no.11 / pp.851-858 / 2010
  • Most mobile devices use NAND flash memory as their secondary storage. The firmware is stored in the NAND flash memory in compressed form in order to reduce its size and its loading time from the NAND flash memory to main memory. For demand paging to work properly, the compressed code must be decompressed very quickly. This paper introduces a new dictionary-based compression algorithm designed for fast decompression. Unlike conventional LZ-style methods, the proposed algorithm stores the exclusive-or (XOR) of the two instruction words when the instruction being compressed is not identical to the referenced instruction. The paper also introduces a new compression format that minimizes bit operations in order to improve decompression speed. Experimental results show that decoding time is reduced by up to 5 times and the compression ratio is improved by up to 4% compared to zlib. Moreover, the fast decoding leads to a 10-20% speedup in boot time compared to booting uncompressed code.
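
The XOR-difference idea can be sketched in a few lines. This is a hedged, simplified model: the token layout, the fixed 256-entry dictionary, and the "smallest XOR" reference selection are assumptions for illustration, not the paper's actual format or dictionary construction.

```c
#include <stdint.h>
#include <stddef.h>

#define DICT_SIZE 256

typedef struct {
    uint8_t  index;   /* dictionary entry referenced              */
    uint32_t diff;    /* 0 if exact match, otherwise insn ^ entry */
} token;

static void compress(const uint32_t *code, size_t n,
                     const uint32_t dict[DICT_SIZE], token *out)
{
    for (size_t i = 0; i < n; i++) {
        /* pick the entry whose XOR with the instruction is numerically
         * smallest -- a simple stand-in for the reference selection   */
        size_t best = 0;
        uint32_t best_diff = code[i] ^ dict[0];
        for (size_t j = 1; j < DICT_SIZE; j++) {
            uint32_t d = code[i] ^ dict[j];
            if (d < best_diff) { best = j; best_diff = d; }
        }
        out[i].index = (uint8_t)best;
        out[i].diff  = best_diff;
    }
}

static void decompress(const token *in, size_t n,
                       const uint32_t dict[DICT_SIZE], uint32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = dict[in[i].index] ^ in[i].diff;  /* one lookup + one XOR */
}
```

The point of the scheme is visible in `decompress`: restoring each instruction costs a single table lookup and a single XOR, which is what makes demand paging from NAND flash fast.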

A Design and Implementation of 32-bit Pipeline RISC-V Processor Supporting Compressed Instructions for Memory Efficiency (메모리 효율성을 높이기 위한 압축 명령어를 지원하는 32-비트 파이프라인 RISC-V프로세서 설계 및 구현)

  • Hyeonjin Sim; Yongwoo Kim
    • Journal of the Semiconductor & Display Technology / v.23 no.3 / pp.7-13 / 2024
  • With the development of technologies such as the Internet of Things (IoT) and autonomous vehicles, research is being conducted on embedded processors that offer high performance, low power, and memory efficiency. The "C" (compressed) extension of the RISC-V ISA is needed to improve memory efficiency. In this paper, we propose an RV32IC processor and compare its benchmark scores and GCC-generated code size with those of an RV32I processor. In addition, we propose memory access and instruction-combining methods to support 16-bit compressed instructions, together with an instruction expansion method. The proposed RV32IC processor meets a maximum operating frequency of 50 MHz on an Artix-7 FPGA. Performance was measured with the Dhrystone and CoreMark benchmark programs, and the code sizes of RV32I and RV32IC binaries generated by the GCC compiler were compared. Compared to RV32I, the proposed RV32IC processor shows a 2.72% decrease in DMIPS/MHz and a 0.61% decrease in CoreMark/MHz, while the CoreMark code size decreases by 14.93%.

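As an example of what the "C" extension support has to do at decode time, the sketch below expands one compressed instruction, C.ADDI, into its 32-bit ADDI equivalent following the standard RV32C encoding. A real decoder handles every compressed form; the paper's specific fetch-alignment and combining scheme is not modeled here.

```c
#include <stdint.h>

/* Expand RV32C C.ADDI (quadrant 1, funct3=000) into its 32-bit RV32I
 * equivalent, ADDI rd, rd, imm.  Only this single form is handled.  */
static uint32_t expand_c_addi(uint16_t c)
{
    uint32_t rd  = (c >> 7) & 0x1F;                       /* rd = rs1 */
    int32_t  imm = ((c >> 2) & 0x1F) | (((c >> 12) & 0x1) << 5);
    if (imm & 0x20)                             /* sign-extend 6 bits */
        imm |= ~0x3F;
    /* I-type: imm[11:0] | rs1 | funct3=000 | rd | opcode=0x13 (OP-IMM) */
    return ((uint32_t)(imm & 0xFFF) << 20) | (rd << 15) | (rd << 7) | 0x13;
}
```

For example, the 16-bit word 0x0505 (c.addi x10, 1) expands to 0x00150513, which is addi x10, x10, 1.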

THE THERAPEUTIC EFFECT OF FLUORIDE-CONTAINING ADHESIVE TAPE ON DENTIN HYPERSENSITIVITY (불소함유 접착 테이프의 상아질 지각과민증 치료효과)

  • Jang, Hyang-Gil; Lee, Nan-Young; Lee, Sang-Ho
    • Journal of the Korean Academy of Pediatric Dentistry / v.36 no.3 / pp.367-376 / 2009
  • In this clinical study, a fluoride tape (SCMC-T-5) containing sodium fluoride (NaF) was developed and manufactured, and its therapeutic effect in patients with dentin hypersensitivity was evaluated and compared with that of an existing fluoride varnish (CavityShield™). Twenty-two healthy adult patients (88 teeth) with dentin hypersensitivity participated in this study and were divided into two groups. The fluoride products were applied according to the manufacturers' instructions, and the level of tooth pain after stimulation with compressed air and an ice stick was measured immediately after application, after 3 days, after 1 week, and after 4 weeks using a visual analog scale (VAS). In the experimental group, all VAS scores showed statistically significant decreases compared with the baseline scores. In the control group, all VAS scores showed statistically significant decreases compared with baseline, except for the compressed-air VAS score of 34.091 measured after 3 days. Both the fluoride tape and the fluoride varnish used in this study were able to treat dentin hypersensitivity effectively.


Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong; Kim, Jong-Myon
    • Journal of KIISE: Computer Systems and Theory / v.35 no.7 / pp.305-317 / 2008
  • As the mobile computing environment changes rapidly, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. This paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet these computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, exploiting the fact that the color components are perceptually less significant, supports parallel operations on two packed compressed 16-bit YCbCr values (6-bit Y and 5-bit Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the reduced data format lowers system cost, and the reduced data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor, whereas MMX (Intel's representative multimedia extension) achieves an average speedup of only 3.7x over the same baseline. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI achieves these gains with only a 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
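
The two-pixel packing that CMI operates on can be sketched as below. The bit ordering of the 6-5-5 YCbCr fields inside each 16-bit half, and the saturating luma add used as the example operation, are assumptions for illustration rather than the instruction set defined in the paper.

```c
#include <stdint.h>

/* Pack two YCbCr pixels, each compressed to 16 bits (6-bit Y, 5-bit Cb,
 * 5-bit Cr), into one 32-bit word, as the CMI datapath assumes.       */
static uint32_t pack2(uint8_t y0, uint8_t cb0, uint8_t cr0,
                      uint8_t y1, uint8_t cb1, uint8_t cr1)
{
    uint16_t p0 = ((y0 & 0x3F) << 10) | ((cb0 & 0x1F) << 5) | (cr0 & 0x1F);
    uint16_t p1 = ((y1 & 0x3F) << 10) | ((cb1 & 0x1F) << 5) | (cr1 & 0x1F);
    return ((uint32_t)p1 << 16) | p0;
}

/* A CMI-style instruction would adjust both packed pixels in one step;
 * here the "parallel saturating luma add" is emulated field by field. */
static uint32_t add_luma2(uint32_t px, uint8_t dy)
{
    uint32_t y0 = ((px >> 10) & 0x3F) + dy;
    uint32_t y1 = ((px >> 26) & 0x3F) + dy;
    if (y0 > 0x3F) y0 = 0x3F;                  /* saturate to 6 bits */
    if (y1 > 0x3F) y1 = 0x3F;
    return (px & ~((0x3Fu << 10) | (0x3Fu << 26)))
         | (y0 << 10) | (y1 << 26);
}
```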

Improved Original Entry Point Detection Method Based on PinDemonium (PinDemonium 기반 Original Entry Point 탐지 방법 개선)

  • Kim, Gyeong Min; Park, Yong Su
    • KIPS Transactions on Computer and Communication Systems / v.7 no.6 / pp.155-164 / 2018
  • Many malicious programs are compressed or encrypted with commercial packers to hinder reverse engineering, so malware analysts must first decompress or decrypt them. The OEP (Original Entry Point) is the address of the first instruction executed after the packed executable has been restored to its original binary state. Several unpackers, including PinDemonium, execute the packed file, keep track of the addresses visited until the OEP appears, and search for the OEP among those addresses. However, instead of identifying a single exact OEP, existing unpackers produce a relatively large set of OEP candidates, and sometimes the true OEP is missing from that set; in other words, they have difficulty finding the correct OEP. We have developed a new tool that produces fewer OEP candidates by adding two methods based on properties of the OEP, namely that the function-call sequence and the parameters are the same in the packed program and the original program. The first method is based on function calls. Programs written in C/C++ are compiled into binary code, and compiler-specific system functions are added to the compiled program. After examining these functions, we extended PinDemonium to detect the end of unpacking by matching the patterns of system functions called in the packed and unpacked programs. The second method is based on parameters, which include not only user-supplied inputs but also system inputs. We extended PinDemonium to find the OEP using the system parameters of a particular function in stack memory. OEP detection experiments were performed on sample programs packed with 16 commercial packers. Compared to PinDemonium, the number of OEP candidates is reduced by more than 40% on average, excluding 2 commercial packers that could not be executed due to anti-debugging techniques.
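
The first idea, matching the compiler-inserted startup calls observed after an OEP candidate, can be sketched as a simple candidate filter. The API names in the pattern and the data layout below are illustrative assumptions; PinDemonium's actual Pin-based instrumentation records richer information than a flat name list.

```c
#include <string.h>
#include <stddef.h>

/* A candidate is kept only if the system functions observed right
 * after it match the startup pattern the compiler typically inserts
 * (the names here are illustrative, not an exhaustive signature).  */
static const char *startup_pattern[] = { "GetSystemTimeAsFileTime",
                                         "GetCurrentProcessId",
                                         "GetCurrentThreadId" };

static int matches_startup(const char **trace, size_t trace_len)
{
    size_t n = sizeof startup_pattern / sizeof startup_pattern[0];
    if (trace_len < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(trace[i], startup_pattern[i]) != 0)
            return 0;
    return 1;
}

/* traces[i] holds the API-call names recorded after candidate i;
 * the filter marks which candidates survive and returns the count. */
static size_t filter_candidates(const char **traces[], const size_t *lens,
                                size_t count, int *keep)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        keep[i] = matches_startup(traces[i], lens[i]);
        kept += (size_t)keep[i];
    }
    return kept;
}
```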