Search | Korea Science

Design of Floating-point Processing Unit for Multi-chip Superscalar Microprocessor (다중 칩 수퍼스칼라 마이크로프로세서용 부동소수점 연산기의 설계)

이영상;강준우
- Proceedings of the IEEK Conference
- /
- 1998.10a
- /
- pp.1153-1156
- /
- 1998
We describe a design of a simple but efficient floatingpoint processing architecture expoiting concurrent execution of scalar instructions for high performance in general-purpose microprocessors. This architecture employs 3 stage pipeline asyncronously working with integer processing unit to regulate instruction flows between two arithmetic units.
PDF

Attributed AND-OR Graph for Synthesis of Superscalar Processor Simulator (슈퍼스칼라 프로세서 시뮬레이터의 생성을 위한 Attributed AND-OR 그래프)

Jun Kyoung Kim;Tag Gon Kim
- Proceedings of the Korea Society for Simulation Conference
- /
- 2003.06a
- /
- pp.73-78
- /
- 2003
This paper proposes the simulator synthesis scheme which is based on the exploration of the total design space in attributed AND-OR graph. Attributed AND-OR graph is a systematic design space representation formalism which enables to represent all the design space by decomposition rule and specialization rule. In addition, attributes attached to the design entity provides flexible modeling. Based on this design space representation scheme, a pruning algorithm which can transform the total design space into sub-design space that satisfies the user requirements is given. We have shown the effectiveness of our framework by (ⅰ) constructing the design space of superscalar processor in attributed AND-OR graph (ⅱ) pruning it to obtain the ARM9 processor architecture. (ⅲ) modeling the components of the architecture and (ⅳ) simulating the ARM9 model.
PDF

Design of Embedded Processor Architecture Applicable to Mobile Multimedia (Mobile Multimedia 지원을 위한 Embedded Processor 구조 설계)

이호석;한진호;배영환;조한진
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.41 no.5
- /
- pp.71-80
- /
- 2004
This paper describes embedded processor architecture design which is applicable to multimedia in mobile platform The main description is based on basic processor architecture and consideration about energy efficiency when used in mobile platform To design processor data path architecture (pipeline, branch prediction, multiple issue superscalar, function unit number) which is optimal to multimedia application and cache hierarchy and its structure, we have nut the simulation with variant architecture using MPEG4 test bench as multimedia application. We analyzed energy efficiency of architecture to check if it is applicable to mobile platform and decide basic processor architecture based on analysis result. The suggested basic processor architecture not only can be applied to mobile platform but also can be applied to basic processor architecture of configurable processor which is designed through automatic design environment.
PDF KSCI

An Optimal Instruction Fetch Strategy for SMT Processors (SMT 프로세서에 최적화된 명령어 페치 전략에 관한 연구)

홍인표;문병인;김문경;이용석
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.27 no.5C
- /
- pp.512-521
- /
- 2002
Recently, conventional superscalar RISC processors arrive their performance limit, and many researches on the next-generation architecture are concentrated on SMT(Simultaneous Multi-Threading). In SMT processors, multiple threads are executed simultaneously and share hardware resources dynamically. In this case, it is more important to supply instructions from multiple threads to processor core efficiently than ever. Because SMT architecture shows higher IPC(Instructions per cycle) than superscalar architecture, performance is influenced by fetch bandwidth and the size of fetch queue. Moreover, to use TLP(Thread Level Parallelism) efficiently, fetch thread selection algorithm and fetch bandwidth for each selected threads must be carefully designed. Thus, in this paper, the performance values influenced by these factors are analyzed. Based on the results, an optimal instruction fetch strategy for SMT processors is proposed.
PDF KSCI

An optimized superscalar instruction issue architecture using the instruction buffer (명령어 버퍼를 이용한 최적화된 수퍼스칼라 명령어 이슈 구조)

문병인;이용환;안상준;이용석
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.34C no.9
- /
- pp.43-52
- /
- 1997
Processors using the superscalar rchitecture can achieve high performance by executing multipel instructions in a clock cycle. It is made possible by having multiple functional units and issuing multiple instructions to functional units simultaneously. But instructions can be dependent on one another and these dependencies prevent some instructions form being issued at the same cycle. In this paper, we designed an issue unit of a superscalar RISC microprocessor that can issue four instructions per cycle. The issue unit receives instructions form a prefetch unit, and issues them in order at a rate of as high as four instructions in one cycle for maximum utilization of functional units. By using an instruction buffer, the unit decouples instruction fetch and issue to improve instruction ussue rate. The issue unit is composed of an instruction buffer and an instruction decoder. The instruction buffer aligns and stores instructions from the prefetch unit, and sends the earliest four available isstructions to the instruction decoder. The instruction decoder decodes instructions, and issues them if they are free form data dependencies and necessary functional units and rgister file prots are available. The issue unit is described with behavioral level HDL (lhardware description language). The result of simulation using C programs shows that instruction issue rate is improved as the instruction buffer size increases, and 12-entry instruction buffer is found to be optimum considering performance and hardware cost of the instruction buffer.
PDF

Study of an In-order SMT Architecture and Grouping Schemes

Moon, Byung-In;Kim, Moon-Gyung;Hong, In-Pyo;Kim, Ki-Chang;Lee, Yong-Surk
- International Journal of Control, Automation, and Systems
- /
- v.1 no.3
- /
- pp.339-350
- /
- 2003
In this paper, we propose a simultaneous multithreading (SMT) architecture that improves instruction throughput by exploiting instruction level parallelism (ILP) and thread level parallelism (TLP). The proposed architecture issues and completes instructions belonging to the same thread in exact program order. The issue and completion policy greatly reduces the design complexity and hardware cost of our architecture, compared with others that employ out-of-order issue and completion. On the other hand, when the instructions belong to different threads, the issue and completion orders for those instructions may not necessarily be identical to the fetch order. The processor issues instructions simultaneously from multiple threads to functional units by exploiting ILP and TLP, and by dynamic resource sharing. That parallel execution notably improves performance and resource utilization with minimal additional hardware cost over the conventional superscalar processors. This paper proposes an SMT architecture with grouping as well as one without grouping. Without grouping, all threads dynamically and flexibly share most resources. On the other hand, in the SMT architecture with grouping, in which resources and threads are divided into several groups for design simplification, resources are shared only among threads belonging to the same group as those resources. Simulation results show that our processors with four and eight threads improve performance by three or more times over the conventional superscalar processor with comparable execution resources and policies, and that reasonable grouping reduces the design complexity of SMT processors with little negative effect on performance.
PDF KSCI

Design of a Graphic Processor for Multimedia Data Processing (멀티미디어 데이타 처리를 위한 그래픽 프로세서 설계)

고익상;한우종;선우명동
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.36C no.10
- /
- pp.56-65
- /
- 1999
This paper presents an architecture and its instruction set for a graphic coprocessor(GCP) which can be used for a multimedia server. The proposed instruction set employs parallel architecture concepts, such as SIMD and Superscalar. GCP consists of a scheduler and four functional units. The scheduler solves an instruction bottleneck problem causing by sharing with four general processors(GPs). GCP can execute up to 4 instructions in parallel. It consists of about 56,000 gates and operates at 30 MHz clock frequency due to speed limitation of SOG technology. GCP meets the real-time DCT algorithm requirement of the CIF image format and can process up to 63 frames/sec for the DCT Algorithm and 21 frames/sec for the Full Block matching Algorithm of the CIF image format.
PDF

Performance Evaluation of an On-Chip Multiprocessor for Object Recognition (객체 인식을 위한 다중처리 마이크로프로세서의 성능 평가)

Chung, Yong-Wha;Park, Kyoung;Choi, Sung-Hoon;Hahn, Woo-Jong
- Journal of KIISE:Computer Systems and Theory
- /
- v.27 no.6
- /
- pp.558-566
- /
- 2000
Object recognition is a challenging application for high-performance computing. Currently, the superscalar architecture dominates todays microprocessor marketplace. As more transistors are integrated onto larger die, however, an on-chip multiprocessor is regarded as a promising alternative to the superscalar microprocessor. This paper examines the behavior of the object recognition on the on-chip multiprocessor, which will be employed in general-purpose parallel machines. To obtain the performance characteristics of the microprocessor, a program-driven simulator and its programming environment were developed. The simulation results showed that the on-chip multiprocessor can exploit thread level parallelisms effectively and offer a promising architecture for the object recognition application.
PDF

On-Chip Multiprocessor with Simultaneous Multithreading

Park, Kyoung;Choi, Sung-Hoon;Chung, Yong-Wha;Hahn, Woo-Jong;Yoon, Suk-Han
- ETRI Journal
- /
- v.22 no.4
- /
- pp.13-24
- /
- 2000
As more transistors are integrated onto bigger die, an on-chip multiprocessor will become a promising alternative to the superscalar microprocessor that dominates today's microprocessor marketplace. This paper describes key parts of a new on-chip multiprocessor, called Raptor, which is composed of four 2-way superscalar processor cores and one graphic co-processor. To obtain performance characteristics of Raptor, a program-driven simulator and its programming environment were developed. The simulation results showed that Raptor can exploit thread level parallelism effectively and offer a promising architecture for future on-chip multi-processor designs.
PDF

Design of an ALU for SMT Microprocessors (SMT 마이크로프로세서에 적합한 ALU의 설계)

김상철;홍인표;이용석
- Proceedings of the IEEK Conference
- /
- 2003.07d
- /
- pp.1383-1386
- /
- 2003
In this paper, an ALU for Simultaneous Multi-Threading (SMT) microprocessors is designed. The SMT architecture improves notably performance and utilization of processes compared with conventional superscalar architectures by executing instructions from multiple threads at the same time. This ALU adopts data bypassing method to process multi-threads. And it can flush instructions in the same thread that generate exceptions such as branch misprediction. interrupt etc, performance of SMT microprocessors with data bypassing and exception handler can be improved.
PDF

Search Result 39, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)