Search | Korea Science

A Dual Integer Register File Structure for Temperature-Aware Microprocessors (온도 인지 마이크로프로세서를 위한 듀얼 레지스터 파일 구조)

Choi, Jin-Hang;Kong, Joon-Ho;Chung, Eui-Young;Chung, Sung-Woo
- Journal of KIISE: Computer Systems and Theory / v.35 no.12 / pp.540-551 / 2008
Today's microprocessor designs must deal with temperature as well as power consumption. As process technology scales down, the power density of on-chip circuitry increases, which causes excessive temperatures (hotspots). Dynamic Thermal Management (DTM) has been proposed to tackle thermal problems cost-effectively: DTM techniques improve thermal reliability and reduce cooling cost, but they trade thermal control against performance loss. This paper proposes a dual integer register file structure that minimizes the performance degradation caused by DTM invocations. For on-chip thermal control, the most important functional unit is the integer register file, which becomes a hotspot because of its frequent read and write accesses. The proposed dual integer register file adds an extra register file and migrates read accesses to it, thereby reducing per-unit dynamic power dissipation. As a result, the proposed structure completely eliminates localized hotspots in the integer register file and greatly reduces performance degradation, improving performance by 13.35% on average (18% at maximum) compared with the conventional DTM architecture.
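The abstract does not say how reads are steered between the two register files, so the following Python sketch assumes a simple policy: every write is applied to both copies (keeping them identical), and each read goes to whichever copy has served fewer reads so far. The class names `RegFileCopy` and `DualRegFile` and the trace format are hypothetical. Counting accesses per copy shows why such a scheme lowers per-unit activity, a rough proxy for per-unit dynamic power density.

```python
from dataclasses import dataclass, field


@dataclass
class RegFileCopy:
    """One physical copy of the integer register file, with access counters."""
    regs: list = field(default_factory=lambda: [0] * 32)
    reads: int = 0
    writes: int = 0


class DualRegFile:
    """Hypothetical dual integer register file: every write updates both copies
    so they stay identical, and each read is steered to the copy that has
    served fewer reads so far (an assumed policy, not the paper's)."""

    def __init__(self) -> None:
        self.copies = [RegFileCopy(), RegFileCopy()]

    def write(self, idx: int, value: int) -> None:
        for copy in self.copies:  # keep both copies coherent
            copy.regs[idx] = value
            copy.writes += 1

    def read(self, idx: int) -> int:
        copy = min(self.copies, key=lambda c: c.reads)  # least-read copy
        copy.reads += 1
        return copy.regs[idx]


def accesses_per_copy(trace):
    """Replay ('r', idx) / ('w', idx, value) events and return the total
    number of accesses (reads + writes) each copy absorbed."""
    rf = DualRegFile()
    for event in trace:
        if event[0] == "w":
            rf.write(event[1], event[2])
        else:
            rf.read(event[1])
    return [c.reads + c.writes for c in rf.copies]


if __name__ == "__main__":
    # Two source reads per result write, as in a typical two-operand ALU op.
    trace = []
    for i in range(100):
        trace += [("r", 1), ("r", 2), ("w", 3, i)]
    print(accesses_per_copy(trace))  # [200, 200] vs. 300 for a single file
```

Duplicating writes raises the total number of accesses (400 here versus 300 for one file), so in this model the benefit comes from spreading activity over a larger area rather than from reducing total energy.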

Accelerating Symmetric and Asymmetric Cryptographic Algorithms with Register File Extension for Multi-words or Long-word Operation (다수 혹은 긴 워드 연산을 위한 레지스터 파일 확장을 통한 대칭 및 비대칭 암호화 알고리즘의 가속화)

Lee Sang-Hoon;Choi Lynn
- Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.2 s.308 / pp.1-11 / 2006
In this paper, we propose a new register file architecture called Register File Extension for Multi-words or Long-word Operation (RFEMLO) to accelerate both symmetric and asymmetric cryptographic algorithms. Based on the observation that most cryptographic algorithms make heavy use of multi-word or long-word operations, RFEMLO allows multiple contiguous registers to be specified as a single operand. Thus, a single instruction can specify either a SIMD-style multi-word operation or a long-word operation. RFEMLO can be applied to general-purpose processors by adding instructions for multi-word or long-word operands and the functional units that execute them. To evaluate RFEMLO, we use SimpleScalar/ARM 3.0 (with gcc 2.95.2) and run detailed simulations of various symmetric and asymmetric cryptographic algorithms. Applying RFEMLO reduces the total instruction count by up to 62% for symmetric and up to 70% for asymmetric cryptographic algorithms. On a processor with an in-order pipeline, RFEMLO yields speedups of 1.4 to 2.6 for symmetric and 2.5 to 3.3 for asymmetric cryptographic algorithms. We also found that RFEMLO improves the performance of these algorithms at much lower cost than increasing the issue width of a superscalar implementation. Moreover, RFEMLO can also be applied to superscalar processors, yielding additional performance gains of 83% and 138% for symmetric and asymmetric cryptographic algorithms, respectively.
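The difference between the two operand interpretations can be sketched in Python, assuming 32-bit registers, a (base register, count k) operand encoding, and least-significant-word-first ordering for long words; none of these details, nor the method names, come from the paper. A multi-word operation performs k independent word additions, while a long-word operation chains the carry so that k registers behave as one 32k-bit integer, as in the multi-precision arithmetic used by public-key algorithms.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Toy 32-entry file of 32-bit registers with RFEMLO-style operands:
    an operand is (base register, count k), naming k contiguous registers."""

    def __init__(self, n: int = 32) -> None:
        self.r = [0] * n

    def load_int(self, base: int, k: int, value: int) -> None:
        """Spread a (32*k)-bit integer over r[base..base+k-1], low word first."""
        for i in range(k):
            self.r[base + i] = (value >> (32 * i)) & MASK32

    def read_int(self, base: int, k: int) -> int:
        """Reassemble the integer held in r[base..base+k-1]."""
        return sum(self.r[base + i] << (32 * i) for i in range(k))

    def multi_word_add(self, rd: int, rs1: int, rs2: int, k: int) -> None:
        """SIMD-style: k independent 32-bit additions, no carries between words."""
        for i in range(k):
            self.r[rd + i] = (self.r[rs1 + i] + self.r[rs2 + i]) & MASK32

    def long_word_add(self, rd: int, rs1: int, rs2: int, k: int) -> int:
        """Long-word: one (32*k)-bit addition; the carry ripples from the
        low word to the high word. Returns the final carry-out."""
        carry = 0
        for i in range(k):
            s = self.r[rs1 + i] + self.r[rs2 + i] + carry
            self.r[rd + i] = s & MASK32
            carry = s >> 32
        return carry


if __name__ == "__main__":
    rf = RegisterFile()
    rf.load_int(0, 4, 2**128 - 1)   # r0..r3 = all ones
    rf.load_int(4, 4, 1)            # r4..r7 = 1
    carry = rf.long_word_add(8, 0, 4, 4)
    print(hex(rf.read_int(8, 4)), carry)  # 0x0 1
```

In this model one `long_word_add` call stands in for the sequence of add-with-carry instructions a conventional 32-bit ISA would need for a 128-bit addition.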

A VLIW Code Generation Technique Utilizing NOP Instruction Slot (NOP 명령어 슬롯을 활용하는 VLIW 코드 생성기법)

문현주;이승수;김석주;김석일
- Proceedings of the Korean Information Science Society Conference / 2000.10c / pp.615-617 / 2000
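This paper studies a technique that inserts duplicated, useful instructions into the NOP instruction slots of VLIW object code, resolving data dependences that existed under the original method and thereby preventing execution-time delays. Because a single long instruction may then contain two or more identical instructions, conflicts are unavoidable in the write stage that follows execution, when several instructions write to the same register file address. We therefore propose the TiPS architecture, whose instructions carry, for each processing unit, information on whether that unit may write its result to the register file in the write stage, together with an object-code generation algorithm suited to TiPS. To resolve data dependences between instructions executed consecutively on each processing unit, the code generation algorithm can copy an instruction destined for another processing unit and assign the copy in place of a NOP. Experimental results show that the instruction duplication technique substantially reduces the total number of execution cycles compared with the existing technique.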
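The write-permission idea can be illustrated with a toy VLIW model in Python. The `Op` record, its `write_enable` flag, and the helpers below are hypothetical, since the abstract does not give the TiPS instruction encoding. The sketch only shows how a copied instruction placed in a NOP slot can execute without its result being written back, which avoids a register-file write conflict; it does not model the timing benefit the copy brings to the other processing unit.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    dest: int                   # destination register index
    opcode: str                 # 'add' or 'sub' in this toy model
    src1: int
    src2: int
    write_enable: bool = True   # hypothetical per-slot "may write back" bit


Bundle = List[Optional[Op]]     # one VLIW word; None marks a NOP slot


def duplicate_into_nop(bundle: Bundle, src_slot: int, nop_slot: int) -> Bundle:
    """Copy the op in src_slot into an empty (NOP) slot. The copy computes the
    same value on another unit but has write-back disabled, so only the
    original commits to the register file."""
    if bundle[nop_slot] is not None:
        raise ValueError("target slot is not a NOP")
    new = list(bundle)
    new[nop_slot] = replace(bundle[src_slot], write_enable=False)
    return new


def execute(bundle: Bundle, regs: List[int]) -> List[int]:
    """Execute every slot in parallel (all reads before any write) and commit
    only write-enabled results. Two enabled writers to one register conflict."""
    results = {}
    for op in bundle:
        if op is None:
            continue
        a, b = regs[op.src1], regs[op.src2]
        value = a + b if op.opcode == "add" else a - b
        if op.write_enable:
            if op.dest in results:
                raise RuntimeError(f"write conflict on r{op.dest}")
            results[op.dest] = value
    for dest, value in results.items():
        regs[dest] = value
    return regs


if __name__ == "__main__":
    word = duplicate_into_nop([Op(3, "add", 1, 2), None], 0, 1)
    print(execute(word, [0, 5, 7, 0]))  # [0, 5, 7, 12]
```

Without the disabled write bit, the same two-slot word would raise the write-conflict error, which is the situation the per-unit write permission is meant to prevent.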

Research on Conditional Execution Out-of-order Instruction Issue Microprocessor Using Register Renaming Method (레지스터 리네이밍 방법을 사용하는 조건부 실행 비순차적 명령어 이슈 마이크로프로세서에 관한 연구)

최규백;김문경;홍인표;이용석
- The Journal of Korean Institute of Communications and Information Sciences / v.28 no.9A / pp.763-773 / 2003
In this paper, we present a register renaming method for conditional-execution, out-of-order instruction issue microprocessors. Register renaming removes false data dependences (write-after-read (WAR) and write-after-write (WAW)). To implement such a microprocessor with register renaming, we use a register file that contains both in-order-state and look-ahead-state physical registers, shared among all logical registers. We also design an in-order state indicator, a renaming state indicator, a physical register assignment indicator, a condition prediction buffer, and a reorder buffer. With this hardware, we can perform register renaming and track the in-order state. The improved register renaming method requires fewer hardware resources than conventional register renaming, eliminates associative lookup, and provides a short recovery time.
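For readers unfamiliar with renaming, the following Python sketch shows the textbook map-table and free-list scheme, not the paper's in-order/look-ahead register organization. It demonstrates how giving each destination a fresh physical register removes WAR and WAW dependences while keeping true (read-after-write) dependences. The function name and register counts are illustrative assumptions.

```python
from collections import deque


def rename(instrs, num_arch=8, num_phys=16):
    """Textbook map-table renaming of (dest, src1, src2) architectural
    register triples. Sources are looked up before the destination is
    remapped, and every destination gets a fresh physical register, so
    WAR and WAW hazards on architectural names disappear while true (RAW)
    dependences are preserved. Freeing registers at commit is omitted."""
    map_table = list(range(num_arch))             # arch r_i -> phys p_i at reset
    free_list = deque(range(num_arch, num_phys))  # physical regs not yet mapped
    renamed = []
    for dest, src1, src2 in instrs:
        p1, p2 = map_table[src1], map_table[src2]
        if not free_list:
            raise RuntimeError("no free physical register: rename would stall")
        pd = free_list.popleft()
        map_table[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


if __name__ == "__main__":
    prog = [(1, 2, 3),   # I0: r1 <- r2 op r3
            (2, 1, 4),   # I1: r2 <- r1 op r4  (RAW on r1, WAR on r2 vs I0)
            (1, 5, 6)]   # I2: r1 <- r5 op r6  (WAW on r1 vs I0)
    print(rename(prog))  # [(8, 2, 3), (9, 8, 4), (10, 5, 6)]
```

After renaming, I2 writes p10 instead of reusing I0's register, and I1 writes p9 while I0 still reads p2, so both false dependences are gone; the RAW link from I0 to I1 survives as p8.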

A Real-time Architecture for Viterbi Scoring in HMM-Based Isolated Word Recognition Systems (HMM을 이용한 고립 단어 인식 시스템에서의 Viterbi Scoring을 위한 실시간 VLSI 구조)

윤순영;이황수
- The Journal of the Acoustical Society of Korea / v.10 no.6 / pp.64-70 / 1991
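This paper proposes a dedicated VLSI architecture for the Viterbi algorithm in real-time isolated word recognition systems based on the Hidden Markov Model (HMM). The proposed architecture reduces the input/output load with a dual-port register file and is designed for real-time operation by using a parallel structure of add-min/max units. By attaching tags to the values of the model parameters and state variables, the architecture can easily implement representative HMM structures.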
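The computation such an architecture accelerates is the Viterbi recurrence. In the log domain each state update is an addition followed by a maximum over predecessor states; with negative log probabilities the same step becomes add-min, which corresponds to the add-min/max operation mentioned above. The Python sketch below assumes discrete observation symbols; the function names and the single-state word models in the usage are hypothetical and say nothing about the paper's hardware.

```python
import math

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_score(log_pi, log_a, log_b, obs):
    """Best-path log probability of a discrete-observation HMM.
    log_pi[j]: initial, log_a[i][j]: transition i->j, log_b[j][k]: emission of
    symbol k in state j. Each update is an add (path metric + transition)
    followed by a max over predecessors: the add-max step."""
    n = len(log_pi)
    delta = [log_pi[j] + log_b[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        delta = [max(delta[i] + log_a[i][j] for i in range(n)) + log_b[j][o]
                 for j in range(n)]
    return max(delta)


def recognize(models, obs):
    """Isolated-word recognition: the word whose HMM gives the highest score."""
    return max(models, key=lambda word: viterbi_score(*models[word], obs))


if __name__ == "__main__":
    def one_state(p0):  # hypothetical single-state word model over symbols {0, 1}
        return ([0.0], [[0.0]], [[safe_log(p0), safe_log(1 - p0)]])

    models = {"yes": one_state(0.9), "no": one_state(0.1)}
    print(recognize(models, [0, 0, 1, 0]))  # yes
```

The inner `max(...)` over predecessors for every state is independent across states, which is what makes a parallel array of add-compare units a natural hardware mapping.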

A Partial Access Mechanism on a Register for Low-cost Embedded Multimedia ASIP (저비용 내장형 멀티미디어 프로세서를 위한 분할 레지스터 접근 구조)

Joe, Min-Young;Jeong, Ha-Young;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.9 / pp.50-56 / 2008
In this paper, we propose a partial access mechanism for low-cost multimedia processors. Because adding SIMD register files and execution blocks increases cost, it is difficult to apply SIMD instructions to low-cost embedded multimedia processors. The proposed mechanism reduces the cost burden of additional hardware while improving the overall performance of SIMD operations. We designed an ASIP that applies the mechanism and compared the latency of SIMD operations according to the instruction sets used in a DSP benchmark. We then analyzed the overall performance improvement and the reduction in area overhead by synthesizing the ASIP in TSMC 0.25 μm CMOS technology. Simulation of the proposed mechanism shows approximately a 38% performance increase for a 13.4% area increase.
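The abstract does not describe the mechanism in detail. As a general illustration of sub-word access, the following Python sketch treats one 32-bit register as four 8-bit lanes: it reads or writes a single lane and performs a packed lane-wise addition with the standard SWAR (SIMD within a register) masking trick. This is a software analogue, assumed for illustration, of operating on parts of an ordinary register instead of a separate SIMD register file; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def read_lane(reg: int, lane: int, width: int = 8) -> int:
    """Read one sub-word lane of a 32-bit register (lane 0 = least significant)."""
    return (reg >> (lane * width)) & ((1 << width) - 1)


def write_lane(reg: int, lane: int, value: int, width: int = 8) -> int:
    """Return reg with one lane replaced and all other bits unchanged."""
    mask = ((1 << width) - 1) << (lane * width)
    return (reg & ~mask & MASK32) | ((value << (lane * width)) & mask)


def packed_add8(a: int, b: int) -> int:
    """Four independent wrap-around 8-bit additions inside one 32-bit register.
    Adding only the low 7 bits of each lane keeps carries from crossing lane
    boundaries; each lane's top bit is then fixed up with XOR."""
    low7 = 0x7F7F7F7F
    top = 0x80808080
    s = (a & low7) + (b & low7)
    return (s ^ ((a ^ b) & top)) & MASK32


if __name__ == "__main__":
    print(hex(packed_add8(0x01FF7F80, 0x01018081)))  # 0x200ff01
    print(hex(write_lane(0x11223344, 1, 0xAB)))      # 0x1122ab44
```

In hardware the same effect is obtained by cutting the carry chain at lane boundaries, which is far cheaper than duplicating the register file and datapath.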

Flexible Register File with a Window Structure (유연한 창문 구조를 갖는 레지스터 파일)

Gi Hyun Jung
- Journal of the Korean Institute of Telematics and Electronics B / v.29B no.7 / pp.1-10 / 1992
This paper gives an overview of the register windowing structure and presents its advantages and limitations. Based on these, an original approach to the design of large register files is presented, analyzed, and compared with existing approaches. The advantages and disadvantages of this new approach are discussed, and the conditions under which it works better than the existing approaches are outlined. Design tradeoffs are examined in an analytic and empirical study, the results of which are summarized in the conclusion of this paper.
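The paper's own design is not described in the abstract, but the conventional overlapping register windows it starts from can be sketched. The Python model below assumes a SPARC-like layout (8 globals plus windows of 8 ins and 8 locals, with a window's outs aliasing the next window's ins) and one reserved window; the class and method names are hypothetical. It shows the key property of windowing: a procedure call passes arguments by moving a window pointer rather than copying registers.

```python
class WindowedRegisterFile:
    """Hypothetical SPARC-like windowed register file. Architectural r0-r7 are
    globals; r8-r15 'outs', r16-r23 'locals', r24-r31 'ins'. Physically, each
    of n windows owns 8 ins + 8 locals, and a window's outs alias the ins of
    the next window, so a call passes arguments without copying."""

    def __init__(self, n_windows: int = 4) -> None:
        self.n = n_windows
        self.globals = [0] * 8
        self.windowed = [0] * (16 * n_windows)
        self.cwp = 0    # current window pointer
        self.depth = 0  # nested calls currently held in windows

    def _index(self, r: int) -> int:
        if 8 <= r < 16:    # outs = ins of the next window (circular)
            return 16 * ((self.cwp + 1) % self.n) + (r - 8)
        if 16 <= r < 24:   # locals
            return 16 * self.cwp + 8 + (r - 16)
        if 24 <= r < 32:   # ins
            return 16 * self.cwp + (r - 24)
        raise IndexError(f"no architectural register r{r}")

    def read(self, r: int) -> int:
        return self.globals[r] if 0 <= r < 8 else self.windowed[self._index(r)]

    def write(self, r: int, value: int) -> None:
        if 0 <= r < 8:
            self.globals[r] = value
        else:
            self.windowed[self._index(r)] = value

    def call(self) -> None:
        # One window is kept in reserve so the deepest live window's outs alias
        # only the reserved window; a real design would spill a window to
        # memory here instead of failing.
        if self.depth == self.n - 2:
            raise OverflowError("window overflow: spill required")
        self.cwp = (self.cwp + 1) % self.n
        self.depth += 1

    def ret(self) -> None:
        if self.depth == 0:
            raise RuntimeError("window underflow: fill required")
        self.cwp = (self.cwp - 1) % self.n
        self.depth -= 1


if __name__ == "__main__":
    rf = WindowedRegisterFile()
    rf.write(8, 99)     # caller puts an argument in its first out register
    rf.call()
    print(rf.read(24))  # 99: the callee sees it in its first in register
```

The overflow path is where the classic limitation of windowing shows up: deep call chains force spills and fills, so the benefit depends on the number of windows relative to typical call depth.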

The Hardware Architecture of Efficient Intra Predictor for H.264/AVC Decoder (H.264/AVC 복호기를 위한 효율적인 인트라 예측기 하드웨어 구조)

Kim, Ok;Ryoo, Kwang-Ki
- Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.5 / pp.24-30 / 2010
In this paper, we describe intra prediction, one of the techniques H.264/AVC uses to achieve higher compression performance, and propose an intra predictor design that processes intra prediction modes efficiently. The proposed system consists of processing elements, precomputation processing elements, an intra prediction controller, an internal memory, and a register controller. The processing elements and precomputation processing elements reduce the number of computation cycles, and the internal memory and register architecture reduce the number of accesses to external memory. We designed the proposed system in Verilog-HDL and verified it with test vectors generated from encoded YUV files. The proposed architecture targets the baseline profile of the H.264/AVC decoder and is suitable for portable devices such as cellular phones with a 176×144 picture size. Experimental results show that the proposed intra predictor achieves about 60% higher performance than the previous design.
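As a concrete picture of what an intra predictor computes, the Python sketch below implements three of the nine H.264/AVC 4×4 luma intra prediction modes (Vertical, Horizontal, and DC, numbered 0 to 2 in the standard) for 8-bit samples. It says nothing about the paper's processing-element organization; the function name and argument layout are assumptions.

```python
def predict_4x4(mode, top=None, left=None):
    """Return a 4x4 predicted block (list of rows) for three of the nine
    H.264/AVC 4x4 luma intra modes. `top` and `left` are the 4 reconstructed
    neighbour samples above and to the left, or None if unavailable."""
    if mode == 0:  # Vertical: copy the row above downwards
        if top is None:
            raise ValueError("vertical mode needs the top neighbours")
        return [list(top) for _ in range(4)]
    if mode == 1:  # Horizontal: copy the left column across
        if left is None:
            raise ValueError("horizontal mode needs the left neighbours")
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of whichever neighbours are available
        if top is not None and left is not None:
            dc = (sum(top) + sum(left) + 4) >> 3
        elif top is not None:
            dc = (sum(top) + 2) >> 2
        elif left is not None:
            dc = (sum(left) + 2) >> 2
        else:
            dc = 128  # 8-bit video: 1 << (bit_depth - 1)
        return [[dc] * 4 for _ in range(4)]
    raise NotImplementedError("only modes 0-2 are sketched here")


if __name__ == "__main__":
    print(predict_4x4(2, [10, 20, 30, 40], [50, 60, 70, 80])[0])  # [45, 45, 45, 45]
```

Every mode reads the same few neighbouring samples, which is why keeping them in on-chip registers or internal memory, rather than refetching them from external memory, saves bandwidth.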

An Analysis of the Partition Algorithm for Digital System Design (디지털 시스템 설계를 위한 분할 알고리즘의 분석)

최정필;한강룡;황인재;송기용
- Proceedings of the Korea Institute of Convergence Signal Processing / 2001.06a / pp.69-72 / 2001
High-level synthesis generates a structural design that implements the given behavior and satisfies design constraints for area, performance, power consumption, packaging, testing, and other criteria. That is, high-level synthesis generates a register-transfer (RT) level structure from an algorithm-level description. High-level synthesis consists of compiling, partitioning, and scheduling. In this paper, we study the partitioning process and analyze the min-cut algorithm and the simulated annealing algorithm.
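The min-cut objective and the simulated annealing algorithm discussed above can be sketched in Python for a balanced two-way partition. The move set (swapping one node from each side), the geometric cooling schedule, and the parameter defaults are assumptions chosen for illustration, and a dedicated min-cut heuristic such as Kernighan-Lin is not shown.

```python
import math
import random


def cut_size(edges, side):
    """Number of edges whose two endpoints are in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])


def anneal_bipartition(n, edges, t0=2.0, cooling=0.95, steps_per_t=50,
                       t_min=0.01, seed=0):
    """Balanced two-way min-cut partitioning by simulated annealing.
    A move swaps one node from each side, so both halves keep their size;
    a move that worsens the cut by delta is accepted with prob. exp(-delta/T)."""
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    side = [i % 2 for i in range(n)]          # initial balanced assignment
    cost = cut_size(edges, side)
    best, best_cost = side[:], cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            a = rng.choice([i for i in range(n) if side[i] == 0])
            b = rng.choice([i for i in range(n) if side[i] == 1])
            side[a], side[b] = 1, 0           # tentative swap
            delta = cut_size(edges, side) - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta                 # accept the move
                if cost < best_cost:
                    best, best_cost = side[:], cost
            else:
                side[a], side[b] = 0, 1       # reject: undo the swap
        t *= cooling                          # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    def clique(base):
        return [(base + i, base + j) for i in range(4) for j in range(i + 1, 4)]

    edges = clique(0) + clique(4) + [(3, 4)]  # two 4-cliques joined by one edge
    part, cut = anneal_bipartition(8, edges)
    print(part, cut)
```

For this example graph the optimum separates the two cliques and cuts only the bridging edge. Accepting occasional uphill moves at high temperature is what lets annealing escape the local minima that trap purely greedy min-cut improvement.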

An Efficient Resource-constrained Scheduling Algorithm (효율적 자원제한 스케줄링 알고리즘)

송호정;정회균;황인재;송기용
- Proceedings of the Korea Institute of Convergence Signal Processing / 2001.06a / pp.73-76 / 2001
High-level synthesis generates a structural design that implements the given behavior and satisfies design constraints for area, performance, power consumption, packaging, testing, and other criteria. That is, high-level synthesis generates a register-transfer (RT) level structure from an algorithm-level description. High-level synthesis consists of compiling, partitioning, and scheduling. In this paper, we propose an efficient scheduling algorithm that determines the number of functional units and schedules operations into the minimum number of control steps under a silicon-area resource constraint.
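A common baseline for resource-constrained scheduling is list scheduling, sketched below in Python under assumed conditions: unit-latency operations, a fixed number of functional units per type, and critical-path (longest path to a sink) priority. This is not the paper's algorithm, which also determines the number of functional units under an area constraint; here the unit counts are given as input, and all names are hypothetical.

```python
from functools import lru_cache


def list_schedule(ops, preds, limits):
    """Resource-constrained list scheduling with unit-latency operations.
    ops: {name: unit_type}; preds: {name: [operations it depends on]};
    limits: {unit_type: units available per control step}.
    Returns {name: control step}, numbering steps from 1."""
    succs = {o: [] for o in ops}
    for o, ps in preds.items():
        for p in ps:
            succs[p].append(o)

    @lru_cache(maxsize=None)
    def height(o):
        # Length of the longest path from o to any sink (critical-path priority).
        return 1 + max((height(s) for s in succs[o]), default=0)

    step_of = {}
    step = 0
    while len(step_of) < len(ops):
        step += 1
        # Ready = unscheduled ops whose predecessors all finished in earlier steps.
        ready = [o for o in ops if o not in step_of
                 and all(step_of.get(p, step) < step for p in preds.get(o, []))]
        ready.sort(key=lambda o: (-height(o), o))
        used = {t: 0 for t in limits}
        for o in ready:
            if used[ops[o]] < limits[ops[o]]:
                step_of[o] = step
                used[ops[o]] += 1
    return step_of


if __name__ == "__main__":
    # m1 and m2 feed a1, a1 feeds m3; a2 is independent.
    ops = {"m1": "mul", "m2": "mul", "a1": "add", "m3": "mul", "a2": "add"}
    preds = {"a1": ["m1", "m2"], "m3": ["a1"]}
    print(list_schedule(ops, preds, {"mul": 1, "add": 1}))
    # {'m1': 1, 'a2': 1, 'm2': 2, 'a1': 3, 'm3': 4}
```

Running the same graph with two multipliers shortens the schedule from 4 to 3 control steps, which is the kind of area-versus-latency trade-off a resource-constrained scheduler has to explore.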
