• Title/Summary/Keyword: enhanced processor-architecture


Enhanced Processor-Architecture for the Faster Processing of Genetic Algorithm (유전 알고리즘 처리속도 향상을 위한 강화 프로세서 구조)

  • Yoon, Han-Ul;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.2 / pp.224-229 / 2005
  • Generally, a genetic algorithm (GA) incurs high time and space complexity when it runs on a typical processor, which forces the use of high-performance, expensive processors. This is also a barrier to implementing real devices, such as a small mobile robot, that require only simple rules. To solve this problem, this paper proposes an enhanced processor architecture for faster GA processing. A typical processor architecture can be enhanced and specialized by two approaches: a sorting network and a residue number system (RNS). A sorting network can improve the time complexity of comparing the fitness values of the population. An RNS can reduce the magnitude of the largest bit, which dictates the speed of arithmetic operations. Consequently, it can reduce the total logic size and speed up arithmetic operations.
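
The two techniques named in this abstract can be sketched in software as follows. This is only an illustrative sketch, not the hardware design from the paper: the 4-input comparator schedule, the moduli (3, 5, 7), and all function names are assumptions chosen for the example.

```python
from math import prod

# Assumed example: a standard 4-input sorting network (5 comparators, 3 layers).
# The comparator order is fixed in advance and does not depend on the data,
# which is what lets hardware run the comparators of one layer in parallel.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def sort_with_network(values, network=NETWORK_4):
    """Sort a fixed-size list with data-independent compare-exchange steps."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out

# Assumed example moduli: pairwise coprime, so values 0..104 are representable.
MODULI = (3, 5, 7)

def to_rns(x, moduli=MODULI):
    """Represent x by its residues; each residue needs only a few bits."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Add channel by channel; no carry crosses between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel on small residues."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Recover the integer with the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m

if __name__ == "__main__":
    print(sort_with_network([3, 1, 4, 2]))
    print(from_rns(rns_mul(to_rns(17), to_rns(4))))
```

Each RNS channel works on residues smaller than its modulus, so the widest arithmetic unit is set by the largest modulus rather than by the full operand width.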

Performance Study of Multi-core In-Order Superscalar Processor Architecture (멀티코어 순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.5 / pp.123-128 / 2012
  • To overcome hardware complexity and performance limits, multi-core architectures have recently become prevalent. For hardware simplicity, a RISC processor is usually adopted as the unit core. However, if the performance of the unit core is enhanced, the overall performance of the multi-core processor architecture can be enhanced further. In this paper, an in-order superscalar processor is used as the core of the multi-core processor architecture. Using SPEC 2000 benchmarks as input, extensive trace-driven simulation was performed for 2 to 16 superscalar cores and window sizes of 4 to 16. As a result, the 16-core superscalar processor with a window size of 16 achieves an 8.4-times speedup over the single-core superscalar processor. For the same number of cores, the multi-core superscalar processor delivers twice the performance of the multi-core RISC processor.
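
To put the reported 8.4-times speedup in context, it can be converted to parallel efficiency (speedup divided by core count). The helper below is a hypothetical illustration and is not taken from the paper; only the two input figures come from the abstract.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved: speedup / number of cores."""
    return speedup / cores

if __name__ == "__main__":
    # Figures from the abstract: 8.4x speedup on 16 in-order superscalar cores.
    eff = parallel_efficiency(8.4, 16)
    print(f"{eff:.3f}")  # 0.525
```

An efficiency of about 0.53 means each of the 16 cores contributes roughly half of what ideal linear scaling would give.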

A Performance Study of Multi-core Out-of-Order Superscalar Processor Architecture (멀티코어 비순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jong-Bok
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.10 / pp.1502-1507 / 2012
  • To overcome hardware complexity and power consumption problems, multi-core architectures have recently become prevalent. For hardware simplicity, a RISC processor is usually adopted as the unit core. However, if the performance of the unit core is enhanced, the overall performance of the multi-core processor architecture can be increased further. In this paper, an out-of-order superscalar processor is used as the core of the multi-core processor architecture. Using SPEC 2000 benchmarks as input, extensive trace-driven simulation was performed for 2 to 16 out-of-order superscalar cores. As a result, the 16-core out-of-order superscalar processor with a window size of 16 achieved a 17.4-times speedup over the single-core out-of-order superscalar processor and a 50-times speedup over the single-core RISC processor. Averaged over configurations with the same number of cores, the multi-core out-of-order superscalar processor achieved a 3.2-times speedup over the multi-core RISC processor and a 1.6-times speedup over the multi-core in-order superscalar processor.
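
The speedups quoted in this abstract imply two further ratios that are not stated directly. The short calculation below derives them; the function name is hypothetical, and the derived values are arithmetic consequences of the quoted figures, not results reported by the paper.

```python
def implied_ratio(speedup_over_a, speedup_over_b):
    """If X is speedup_over_a times faster than A and speedup_over_b times
    faster than B, then B is speedup_over_a / speedup_over_b times faster than A."""
    return speedup_over_a / speedup_over_b

if __name__ == "__main__":
    # 50x over single-core RISC and 17.4x over single-core out-of-order:
    # implied single-core out-of-order speedup over single-core RISC.
    print(round(implied_ratio(50, 17.4), 2))   # 2.87
    # 3.2x over multi-core RISC and 1.6x over multi-core in-order:
    # implied multi-core in-order speedup over multi-core RISC.
    print(round(implied_ratio(3.2, 1.6), 2))   # 2.0
```

The second ratio agrees with the in-order study listed above, which reports that the multi-core in-order superscalar processor doubles the performance of the multi-core RISC processor.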

PASC Processor Architecture for Enhanced Loop Execution (루프를 효과적으로 처리하는 PASC 프로세서 구조)

  • Ji, Seung-Hyeon;Park, No-Gwang;Jeon, Jung-Nam;Kim, Seok-Il
    • The Transactions of the Korea Information Processing Society / v.6 no.5 / pp.1225-1240 / 1999
  • This paper proposes the PASC (PArtitioned SCheduler) processor architecture, which is equipped with a number of functional units, each paired with its own scheduler. Every scheduler of the PASC processor determines whether a unit instruction can be issued to its associated functional unit or must wait until the next cycle because of a resource collision or data dependency. In the PASC processor, only the functional unit with a resource collision or data dependency waits, by executing a NOP (No OPeration) instruction, while the other functional units execute their own instructions. We can therefore expect a code compaction effect on the PASC processor. In particular, the last instruction of a loop in one iteration and the first instruction of the loop in the next iteration can be scheduled simultaneously if the two instructions do not incur any resource collision or data dependency. Such instructions are packed into the same very long instruction word and executed concurrently at run time. As a result, a loop takes fewer execution cycles than on a traditional VLIW or SVLIW processor architecture. Simulation results also show faster loop execution on the PASC processor architecture than on VLIW and SVLIW processor architectures.
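
The benefit of stalling only the affected functional unit can be illustrated with a toy cycle count. This is a deliberately simplified model, not the simulator used in the paper: it assumes each instruction slot has a fixed, known stall count and ignores dependencies between units. The function names and the input layout are hypothetical.

```python
def lockstep_cycles(stalls):
    """Cycle count when all units advance together, so every long
    instruction word waits for its slowest slot.
    stalls[w][u] = stall cycles of unit u's instruction in word w."""
    return sum(1 + max(word) for word in stalls)

def per_unit_cycles(stalls):
    """Cycle count when each unit has its own scheduler and only the
    stalled unit waits; the loop ends when the slowest unit finishes."""
    units = len(stalls[0])
    return max(sum(1 + word[u] for word in stalls) for u in range(units))

if __name__ == "__main__":
    # Three instruction words on two functional units (assumed stall counts).
    stalls = [[0, 2], [1, 0], [0, 0]]
    print(lockstep_cycles(stalls))   # 6
    print(per_unit_cycles(stalls))   # 5
```

In this toy case the per-unit model saves one cycle because the stall on unit 0 in the second word overlaps with work already delayed on unit 1.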


Superscalar RISC Microprocessor Architecture with enhanced Multimedia Instructions (멀티미디어 명령어를 강화한 수퍼스칼라 RISC 마이크로프로세서 구조)

  • 이용환;문병인;이용석
    • Proceedings of the IEEK Conference / 1999.11a / pp.931-934 / 1999
  • Multimedia applications, for which pure RISC microprocessors are poorly suited, require a new generation of fast and flexible microprocessors. In this paper, as a technique for integrating DSP functionality into a general RISC processor, a RISC processor that can execute DSP extension instructions is developed to improve the performance of multimedia applications. This processor can execute DSP instructions in parallel with ALU instructions for fast, efficient execution. In addition, integer instruction execution is improved by enhancing the RISC core itself.


Architectural Design Issues in a Clockless 32-Bit Processor Using an Asynchronous HDL

  • Oh, Myeong-Hoon;Kim, Young Woo;Kwak, Sanghoon;Shin, Chi-Hoon;Kim, Sung-Nam
    • ETRI Journal / v.35 no.3 / pp.480-490 / 2013
  • As technology evolves into the deep submicron level, synchronous circuit designs based on a single global clock have incurred problems in such areas as timing closure and power consumption. An asynchronous circuit design methodology is one of the strong candidates to solve such problems. To verify the feasibility and efficiency of a large-scale asynchronous circuit, we design a fully clockless 32-bit processor. We model the processor using an asynchronous HDL and synthesize it using a tool specialized for asynchronous circuits with a top-down design approach. In this paper, two microarchitectures, basic and enhanced, are explored. The results from a pre-layout simulation utilizing 0.13-μm CMOS technology show that the performance and power consumption of the enhanced microarchitecture are respectively improved by 109% and 30% with respect to the basic architecture. Furthermore, the measured power efficiency is about 238 μW/MHz and is comparable to that of a synchronous counterpart.

A VLSI implementation of image processor for facsimile and digital copier (팩시밀리 및 디지털 복사기를 위한 고속 영상 처리기의 VLSI구현)

  • 박창대;정영훈;김형수;김진수;권오준;홍기상;장동구;박기용;김윤수
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.1 / pp.105-113 / 1998
  • A new image processor is implemented for high-speed digital copiers and facsimiles. The image processor performs CCD and CIS interfacing, pre-processing, enlargement and reduction of gray-level images, and various halftoning algorithms. The implemented halftoning algorithms are simple thresholding, fuzzy-based mixed-mode thresholding, dithering, and edge-enhanced error diffusion. The binarization result is transferred to a printer through serial or parallel output ports. A line-by-line pipelined data processing architecture is employed with time-shared access to the external memory. In receiving mode, it converts the resolution of the received binary image for compatibility with conventional facsimiles. In copy mode, one line of A3 paper at 400 dpi is processed within 2.5 ms. The prototype of the image processor was implemented using a Laser Programmable Gate Array (LPGA) with 0.8-μm technology.
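
As a rough illustration of error-diffusion halftoning, the sketch below uses the standard Floyd-Steinberg kernel. This is a substitute for illustration only: the paper's edge-enhanced variant is not described in the abstract and is not reproduced here. The function name, the threshold of 128, and the list-of-rows image layout are assumptions.

```python
def floyd_steinberg(image, threshold=128):
    """Binarize a grayscale image (rows of 0..255 values) line by line,
    pushing each pixel's quantization error onto unprocessed neighbors."""
    height, width = len(image), len(image[0])
    work = [[float(p) for p in row] for row in image]  # running pixel values
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Floyd-Steinberg weights: 7/16 right, 3/16 below-left,
            # 5/16 below, 1/16 below-right (skipped outside the image).
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out

if __name__ == "__main__":
    # A single mid-gray row alternates between white and black pixels.
    print(floyd_steinberg([[128, 128, 128, 128]]))
```

Because each row only needs the error carried from the row above, this kind of algorithm fits the line-by-line processing described in the abstract.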


Embedded Multithreading Processor Architecture for Personal Information Devices (개인용 정보 단말장치를 위한 내장형 멀티스레딩 프로세서 구조)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.9 / pp.7-13 / 2010
  • In this paper, we propose a processor architecture suitable for next-generation embedded applications, especially personal information devices such as smartphones and tablet PCs. The latest high-performance embedded processors are developed to achieve high clock speeds. Because increasing performance this way makes design more difficult and induces large overhead, architectural evolution in the embedded processor field is necessary. Among the more advanced processor types, the out-of-order superscalar processor is not a candidate for embedded applications because of its excessive complexity and its relatively low performance gain compared to its overhead. Therefore, a new architecture with moderate complexity must be designed. In this paper, we develop a low-cost SMT architecture model and compare its performance with other architectures, including scalar, superscalar, and multiprocessor designs. Because current personal information devices tend to execute multiple tasks simultaneously, SMT or CMP can be a good choice. Our simulation results show that SMT is the most efficient of the architectures considered.

Low Cost Hardware Engine of Atomic Pipeline Broadcast Based on Processing Node Status (프로세서 노드 상황을 고려하는 저비용 파이프라인 브로드캐스트 하드웨어 엔진)

  • Park, Jongsu
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.8 / pp.1109-1112 / 2020
  • This paper presents a low-cost hardware message-passing engine for an enhanced atomic pipelined broadcast based on processing node status. In this algorithm, the previous atomic pipelined broadcast algorithm is modified to reduce the waiting time until the next broadcast communication. To do this, the processor changes the transmission order of the processing nodes based on the nodes' communication channels. The hardware message-passing engine architecture of the proposed algorithm is also modified for adoption in a multi-core processor. The synthesized logic area of the proposed hardware message-passing engine was reduced by about 16% compared with the pre-existing hardware message-passing engine.

A Study on the Design of FFT Processor for UWB Ultrafast Wireless Communication Systems (UWB 초고속 무선통신 시스템을 위한 FFT 프로세서 설계에 관한 연구)

  • Lee, Sang-Il;Chun, Young-Il
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.12 / pp.2140-2145 / 2008
  • We design and synthesize a 128-point FFT processor for multi-band OFDM, which can be applied to a UWB transceiver. The structure of the 128-point FFT processor is based on a radix-2 FFT algorithm and an R2SDF pipeline architecture. The algorithm is modeled in VHDL and simulated using ModelSim. The design is then synthesized on a Xilinx Virtex-II FPGA, and an operating frequency of 18.7 MHz is obtained. The proposed 128-point FFT processor is expected to be applicable within a complete FFT block as one of several FFTs processed in parallel. To increase the maximum operating frequency, we design an FFT module consisting of four 128-point FFT processors operating in parallel. As a result, we meet the performance requirement of computing the FFT module within the multi-band OFDM symbol timing in a 90 nm ASIC process.
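
The radix-2 algorithm that underlies this design can be sketched in software as below. This recursive decimation-in-time version only illustrates the butterfly arithmetic; it does not model the R2SDF (radix-2 single-path delay feedback) pipeline or the VHDL implementation from the paper, and the function name is an assumption.

```python
import cmath

def fft_radix2(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size results with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

if __name__ == "__main__":
    # A 128-point transform needs log2(128) = 7 butterfly stages.
    impulse = [1] + [0] * 127
    spectrum = fft_radix2(impulse)
    print(max(abs(v - 1) for v in spectrum) < 1e-9)  # True: flat spectrum
```

In a pipelined hardware design, each of the seven recursion levels would correspond to one butterfly stage processing a continuous sample stream.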