• Title/Summary/Keyword: Register Access

Search Result 93, Processing Time 0.024 seconds

Performance Enhancement of Embedded Software Using Register Promotion (레지스터 프로모션을 이용한 내장형 소프트웨어의 성능 향상)

  • Lee Jong-Yeol
    • The KIPS Transactions:PartA
    • /
    • v.11A no.5
    • /
    • pp.373-382
    • /
    • 2004
  • In this paper, a register promotion technique that translates memory accesses to register accesses is presented to enhance embedded software performance. In the proposed method, a source code is profiled to generate a memory trace. From the profiling results, target functions with high dynamic call counts are selected, and the proposed register promotion technique is applied only to the target functions to save the compilation time. The memory trace of the target functions is searched for the memory accesses that result in cycle count reduction when replaced by register accesses, and they are translated to register accesses by modifying the intermediate code and allocating promotion registers. The experiments on MediaBench and DSPstone benchmark programs show that the proposed method increases the performance by 14% and 18% on the average for ARM and MCORE, respectively.

Design of Vector Register Architecture in DSP Processor for Efficient Multimedia Processing

  • Wu, Chou-Pin;Wu, Jen-Ming
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.7 no.4
    • /
    • pp.229-234
    • /
    • 2007
  • In this paper, we present an efficient instruction set architecture using vector register file hardware to accelerate operation of general matrix-vector operations in DSP microprocessor. The technique enables in-situ row-access as well as column access to the register files. It can reduce the number of memory access significantly. The technique is especially useful for block-based video signal processing kernels such as FFT/IFFT, DCT/IDCT, and two-dimensional filtering. We have applied the new instruction set architecture to in-loop deblocking filter processing in H.264 decoder. Performance comparisons show that the required load/store operations for the in-loop deblocking filter can be reduced about 42%. The architecture would improve the processing speed, and code density in DSP microprocessor especially for video signal processing substantially.

Code Size Reduction and Execution performance Improvement with Instruction Set Architecture Design based on Non-homogeneous Register Partition (코드감소와 성능향상을 위한 이질 레지스터 분할 및 명령어 구조 설계)

  • Kwon, Young-Jun;Lee, Hyuk-Jae
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.12
    • /
    • pp.1575-1579
    • /
    • 1999
  • Embedded processors often accommodate two instruction sets, a standard instruction set and a compressed instruction set. With the compressed instruction set, code size can be reduced while instruction count (and consequently execution time) can be increased. To achieve code size reduction without significant increase of execution time, this paper proposes a new compressed instruction set architecture, called TOE (Two Operations Execution). The proposed instruction set format includes the parallel bit that indicates an instruction can be executed simultaneously with the next instruction. To add the parallel bit, TOE instruction format reduces the destination register field. The reduction of the register field limits the number of registers that are accessible by an instruction. To overcome the limited accessibility of registers, TOE adapts non-homogeneous register partition in which registers are divided into multiple subsets, each of which are accessed by different groups of instructions. With non-homogeneous registers, each instruction can access only a limited number of registers, but an entire program can access all available registers. With efficient non-homogeneous register allocator, all registers can be used in a balanced manner. As a result, the increase of code size due to register spills is negligible. Experimental results show that more than 30% of TOE instructions can be executed in parallel without significant increase of code size when compared to existing Thumb instruction set.

  • PDF

A New Register Allocation Technique for Performance Enhancement of Embedded Software (내장형 소프트웨어의 성능 향상을 위한 새로운 레지스터 할당 기법)

  • Jong-Yeol, Lee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.10
    • /
    • pp.85-94
    • /
    • 2004
  • In this paper, a register allocation techlique that translates memory accesses to register accesses Is presented to enhance embedded software performance. In the proposed method, a source code is profiled to generate a memory trace. From the profiling results, target functions with high dynamic call counts are selected, and the proposed register allocation technique is applied only to the target functions to save the compilation time. The memory trace of the target functions is searched for the memory accesses that result in cycle count reduction when replaced by register accesses, and they are translated to register accesses by modifying the intermediate code and allocating Promotion registers. The experiments where the performance is measured in terms of the cycle count on MediaBench and DSPstone benchmark programs show that the proposed method increases the performance by 14% and 18% on the average for ARM and MCORE, respectively.

Computing and Reducing Transient Error Propagation in Registers

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.121-130
    • /
    • 2011
  • Recent research indicates that transient errors will increasingly become a critical concern in microprocessor design. As embedded processors are widely used in reliability-critical or noisy environments, it is necessary to develop cost-effective fault-tolerant techniques to protect processors against transient errors. The register file is one of the critical components that can significantly affect microprocessor system reliability, since registers are typically accessed very frequently, and transient errors in registers can be easily propagated to functional units or the memory system, leading to silent data error (SDC) or system crash. This paper focuses on investigating the impact of register file soft errors on system reliability and developing cost-effective techniques to improve the register file immunity to soft errors. This paper proposes the register vulnerability factor (RVF) concept to characterize the probability that register transient errors can escape the register file and thus potentially affect system reliability. We propose an approach to compute the RVF based on register access patterns. In this paper, we also propose two compiler-directed techniques and a hybrid approach to improve register file reliability cost-effectively by lowering the RVF value. Our experiments indicate that on average, RVF can be reduced to 9.1% and 9.5% by the hyperblock-based instruction re-scheduling and the reliability-oriented register assignment respectively, which can potentially lower the reliability cost significantly, without sacrificing the register value integrity.

A Design of General Communication Protocol for Online Game (온라인 게임을 위한 범용 통신 프로토콜 설계)

  • 두길수;정성종;안동언
    • Proceedings of the IEEK Conference
    • /
    • 2001.06c
    • /
    • pp.233-236
    • /
    • 2001
  • When we execute a online game in the network, a lots of game informations and user informations are transfered between game server and game clients. This information has deferent data structure and communication protocol. So, game developers will be design the communication protocols and packet structures after analyze the game. And, they will be design the database for user management. It will be lots of overload to the game developers. From a user point of view, they must be register to each game server to use several games. It is a undesible operation for the game users. This problems are solved by using the communication server which operates commonly with several game servers. Game providers can get the benefit of multiple access of users and get freedom from user management. Game users can access the several games which connected to communication server by register the communication server only. In this paper we design the communication packet and propose a communication protocol which operates on the communication server described previous.

  • PDF

A Design and Verification of an Efficient Control Unit for Optical Processor (광프로세서를 위한 효율적인 제어회로 설계 및 검증)

  • Lee Won-Joo
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.43 no.4 s.310
    • /
    • pp.23-30
    • /
    • 2006
  • This paper presents design andd verification of a circuit that improves the control-operation problems of Stored Program Optical Computer (SPOC), which is an optical computer using $LiNbO_3$ optical switching element. Since the memory of SPOC takes the Delay Line Memory (DLM) architecture and instructions that are needless of operands should go though memory access stages, SPOC memory have problems; it takes immoderate access time and unnecessary operations are executed in Arithmetic Logical Unit (ALU) because desired operations can't be selectively executed. In this paper, improvement on circuit has been achieved by removing the memory access of instructions that are needless of operands by decoding instructions before locating operand. Unnecessary operations have been reduced by sending operands to some specific operational units, not to all the operational units in ALD. We show that total execution time of a program is minimized by using the Dual Instruction Register(DIR) architecture.

Register Pressure Aware Code Selection Algorithm for Multi-Output Instructions (Register Pressure를 고려한 다중 출력 명령어를 위한 개선된 코드 생성 방법)

  • Youn, Jong-Hee M.;Paek, Yun-Heung;Ko, Kwang-Man
    • The KIPS Transactions:PartA
    • /
    • v.19A no.1
    • /
    • pp.45-50
    • /
    • 2012
  • The demand for faster execution time and lower energy consumption has compelled architects of embedded processors to customize it to the needs of their target applications. These processors consequently provide a rich set of specialized instructions in order to enable programmers to access these features. Such an instruction is typically a $multi$-$output$ $instruction$ (MOI), which outputs multiple results parallely in order to exploit inherent underlying hardware parallelism. Earlier study has exhibited that MOIs help to enhance performance in aspect of instruction counts and code size. However the earlier algorithm does not consider the register pressure. So, some selected MOIs introduce register spill/reload code that increases the code size and instruction count. To attack this problem, we introduce a novel iterated instruction selection algorithm based on the register pressure of each selected MOIs. The experimental results show the suggested algorithm achieves 3% code-size reduction and 2.7% speed-up on average.

A Register-Based Caching Technique for the Advanced Performance of Multithreaded Models (다중스레드 모델의 성능 향상을 위한 가용 레지스터 기반 캐슁 기법)

  • Go, Hun-Jun;Gwon, Yeong-Pil;Yu, Won-Hui
    • The KIPS Transactions:PartA
    • /
    • v.8A no.2
    • /
    • pp.107-116
    • /
    • 2001
  • A multithreaded model is a hybrid one which combines locality of execution of the von Neumann model with asynchronous data availability and implicit parallelism of the dataflow model. Much researches that have been made toward the advanced performance of multithreaded models are about the cache memory which have been proved to be efficient in the von Neumann model. To use an instruction cache or operand cache, the multithreaded models must have cache memories. If cache memories are added to the multithreaded model, they may have the disadvantage of high implementation cost in the mode. To solve these problems, we did not add cache memory but applied the method of executing the caching by using available registers of the multithreaded models. The available register-based caching method is one that use the registers which are not used on the execution of threads. It may accomplish the same effect as the cache memory. The multithreaded models can compute the number of available registers to be used during the process of the register optimization, and therefore this method can be easily applied on the models. By applying this method, we can also remove the access conflict and the bottleneck of frame memories. When we applied the proposed available register-based caching method, we found that there was an improved performance of the multithreaded model. Also, when the available-register-based caching method is compared with the cache based caching method, we found that there was the almost same execution overhead.

  • PDF

Study for Block Cipher Operating Mode Using Counter (카운터를 사용한 블록암호 운영모드에 관한 연구)

  • Yang, Sang-Keun;Kim, Gil-Ho;Park, Chang-Soo;Cho, Gyeong-Yeon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2008.10a
    • /
    • pp.243-246
    • /
    • 2008
  • This thesis suggests block cipher operating mode using ASR(Arithmetic Shift Register). ASR is ratted arithmetic shift register which is sequence that is not 0 but initial value $A_0$ multiplies not 0 or 1 but free number D on $GF(2^n)$. This thesis proposes ASR mode which changes output multiplying d and Floating ASR mode which has same function but having strengthened stability altering d. If we use ASR's output as a counter, there's advantage that it has higher stability and better speed than CTR. Also, ASR mode and FASR mode have advantage of Random access which is not being functioned on CTR mode, they can be widely used to any part which Random access is needed.

  • PDF