Search | Korea Science

A Wide-Window Superscalar Microprocessor Profiling Performance Model Using Multiple Branch Prediction (대형 윈도우에서 다중 분기 예측법을 이용하는 수퍼스칼라 프로세서의 프로화일링 성능 모델)

Lee, Jong-Bok
- The Transactions of The Korean Institute of Electrical Engineers
- /
- v.58 no.7
- /
- pp.1443-1449
- /
- 2009
This paper presents a profiling model of a wide-window superscalar microprocessor using multiple branch prediction. The key idea is to apply statistical profiling technique to the superscalar microprocessor with a wide instruction window and a multiple branch predictor. The statistical profiling data are used to obtain a synthetical instruction trace, and the consecutive multiple branch prediction rates are utilized for running trace-driven simulation on the synthesized instruction trace. We describe our design and evaluate it with the SPEC 2000 integer benchmarks. Our performance model can achieve accuracy of 8.5 % on the average.
PDF KSCI

A Theoretical Superscalar Microprocessor Performance Model with Limited Functional Units Using Instruction Dependencies (한정된 연산유닛에서 명령어 종속성을 이용하는 수퍼스칼라 프로세서의 이론적 성능 모델)

Lee, Jong-Bok
- The Transactions of The Korean Institute of Electrical Engineers
- /
- v.59 no.2
- /
- pp.423-428
- /
- 2010
In the initial design phase of superscalar microprocessors, a performance model is necessary. A theoretic performance model is very useful since performance for various architecture parameters can be obtained by simply computing equations, without repeating simulations, Previous studies established theoretic performance models using the relation between the instruction window size and the issue width, with the penalties due to branch mispredictions and cache misses. However, the study was intended for unlimited number of functional units, which is insufficient for the real case application. This paper proposes a superscalar microprocessor theoretical performance model which also works for the limited functional units. To enhance the accuracy of our limited functional unit model, instruction dependency rates are employed. By using trace-driven data of SPEC 2000 integer programs as input, this paper shows that the theoretically computed performance of superscalar microprocessor with limited number of functional units is quite similar to the measured performance.
https://doi.org/10.5370/KIEE.2010.59.2.423 인용 PDF KSCI

A design of floating-point arithmetic unit for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 연산회로의 설계)

최병윤;손승일;이문기
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.21 no.5
- /
- pp.1345-1359
- /
- 1996
This paper presents a floating point arithmetic unit (FPAU) for supescalar microprocessor that executes fifteen operations such as addition, subtraction, data format converting, and compare operation using two pipelined arithmetic paths and new rounding and normalization scheme. By using two pipelined arithmetic paths, each aritchmetic operation can be assigned into appropriate arithmetic path which high speed operation is possible. The proposed normalization an rouding scheme enables the FPAU to execute roundig operation in parallel with normalization and to reduce timing delay of post-normalization. And by predicting leading one position of results using input operands, leading one detection(LOD) operation to normalize results in the conventional arithmetic unit can be eliminated. Because the FPAU can execuate fifteen single-precision or double-precision floating-point arithmetic operations through three-stage pipelined datapath and support IEEE standard 754, it has appropriate structure which can be ingegrated into superscalar microprocessor.
PDF

On-Chip Multiprocessor with Simultaneous Multithreading

Park, Kyoung;Choi, Sung-Hoon;Chung, Yong-Wha;Hahn, Woo-Jong;Yoon, Suk-Han
- ETRI Journal
- /
- v.22 no.4
- /
- pp.13-24
- /
- 2000
As more transistors are integrated onto bigger die, an on-chip multiprocessor will become a promising alternative to the superscalar microprocessor that dominates today's microprocessor marketplace. This paper describes key parts of a new on-chip multiprocessor, called Raptor, which is composed of four 2-way superscalar processor cores and one graphic co-processor. To obtain performance characteristics of Raptor, a program-driven simulator and its programming environment were developed. The simulation results showed that Raptor can exploit thread level parallelism effectively and offer a promising architecture for future on-chip multi-processor designs.
PDF

Performance Evaluation of an On-Chip Multiprocessor for Object Recognition (객체 인식을 위한 다중처리 마이크로프로세서의 성능 평가)

Chung, Yong-Wha;Park, Kyoung;Choi, Sung-Hoon;Hahn, Woo-Jong
- Journal of KIISE:Computer Systems and Theory
- /
- v.27 no.6
- /
- pp.558-566
- /
- 2000
Object recognition is a challenging application for high-performance computing. Currently, the superscalar architecture dominates todays microprocessor marketplace. As more transistors are integrated onto larger die, however, an on-chip multiprocessor is regarded as a promising alternative to the superscalar microprocessor. This paper examines the behavior of the object recognition on the on-chip multiprocessor, which will be employed in general-purpose parallel machines. To obtain the performance characteristics of the microprocessor, a program-driven simulator and its programming environment were developed. The simulation results showed that the on-chip multiprocessor can exploit thread level parallelisms effectively and offer a promising architecture for the object recognition application.
PDF

The Processor Performance Model Using Statistical Simulation (통계적 모의실험을 이용하는 프로세서의 성능 모델)

Lee Jong-Bok
- Journal of KIISE:Computer Systems and Theory
- /
- v.33 no.5
- /
- pp.297-305
- /
- 2006
Trace-driven simulation is widely used for measuring the performance of a microprocessor in its initial design phase. However, since it requires much time and disk space, the statistical simulation has been studied as an alternative method. In this paper, statistical simulations are performed for a high performance superscalar microprocessor with a perceptron-based multiple branch predictor. For the verification, various hardware configurations are simulated using SPEC2000 benchmarks programs as input. As a result, we show that the statistical simulation is quite accurate and time saving for the evaluation of microprocessor architectures with multiple branch prediction.
PDF KSCI

Exploiting Parallelism in the Block Encryption Algorithms RC6 and Rijndael (블록 암호화 알고리즘 RC6 및 Rijndael에서의 병렬성 활용)

정용화;정교일;손승원
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.11 no.2
- /
- pp.3-12
- /
- 2001
Currently, the superscalar architecture dominates todays microprocessor marketplase. As, more transistors are integrated onto larger die, however, an on-chip multiprocessor is regarded as a promising alternative to the superscalar microprocessor. This paper examines the behavior of the next generation block encryption algorithms RC6 and Rijndael on the on-chip multiprocessing microprocessor. Based on the simulation results by using a program-driven simulator, the on-chip multiprocessor can exploit thread level parallelism effectively and overcome the limitation of instruction level parallelism in the next generation block encryption algorithms.
https://doi.org/10.13089/JKIISC.2001.11.2.3 인용 PDF HTML

Design of Floating-point Processing Unit for Multi-chip Superscalar Microprocessor (다중 칩 수퍼스칼라 마이크로프로세서용 부동소수점 연산기의 설계)

이영상;강준우
- Proceedings of the IEEK Conference
- /
- 1998.10a
- /
- pp.1153-1156
- /
- 1998
We describe a design of a simple but efficient floatingpoint processing architecture expoiting concurrent execution of scalar instructions for high performance in general-purpose microprocessors. This architecture employs 3 stage pipeline asyncronously working with integer processing unit to regulate instruction flows between two arithmetic units.
PDF

An optimized superscalar instruction issue architecture using the instruction buffer (명령어 버퍼를 이용한 최적화된 수퍼스칼라 명령어 이슈 구조)

문병인;이용환;안상준;이용석
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.34C no.9
- /
- pp.43-52
- /
- 1997
Processors using the superscalar rchitecture can achieve high performance by executing multipel instructions in a clock cycle. It is made possible by having multiple functional units and issuing multiple instructions to functional units simultaneously. But instructions can be dependent on one another and these dependencies prevent some instructions form being issued at the same cycle. In this paper, we designed an issue unit of a superscalar RISC microprocessor that can issue four instructions per cycle. The issue unit receives instructions form a prefetch unit, and issues them in order at a rate of as high as four instructions in one cycle for maximum utilization of functional units. By using an instruction buffer, the unit decouples instruction fetch and issue to improve instruction ussue rate. The issue unit is composed of an instruction buffer and an instruction decoder. The instruction buffer aligns and stores instructions from the prefetch unit, and sends the earliest four available isstructions to the instruction decoder. The instruction decoder decodes instructions, and issues them if they are free form data dependencies and necessary functional units and rgister file prots are available. The issue unit is described with behavioral level HDL (lhardware description language). The result of simulation using C programs shows that instruction issue rate is improved as the instruction buffer size increases, and 12-entry instruction buffer is found to be optimum considering performance and hardware cost of the instruction buffer.
PDF

A design of floating-point multiplier for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 승산기의 설계)

최병윤;이문기
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.21 no.5
- /
- pp.1332-1344
- /
- 1996
This paper presents a pipelined floating point multiplier(FMUL) for superscalar microprocessors that conbines radix-16 recoding scheme based on signed-digit(SD) number system and new rouding and normalization scheme. The new rounding and normalization scheme enable the FMUL to compute sticky bit in parallel with multiple operation and elminate timing delay due to post-normalization. By expoliting SD radix-16 recoding scheme, we can achieves further reduction of silicon area and computation time. The FMUL can execute signle-precision or double-precision floating-point multiply operation through three-stage pipelined datapath and support IEEE standard 754. The algorithm andstructure of the designed multiplier have been successfully verified through Verilog HOL modeling and simulation.
PDF

Search Result 15, Processing Time 0.017 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)