• Title/Summary/Keyword: Instruction-set Architecture

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.975-976
    • /
    • 1993
  • This talk presents an overview of the author's research and development activities on fuzzy inference hardware. We have pursued two distinct approaches. The first approach is to use application-specific integrated circuit (ASIC) technology: the fuzzy inference method is directly implemented in silicon. The second approach, which is in its preliminary stage, is to use a more conventional microprocessor architecture. Here, we use a quantitative technique used by designers of reduced instruction set computers (RISC) to modify the architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly in silicon. The mechanism is based on the max-min compositional rule of inference and Mamdani's method of fuzzy implication. Two VLSI fuzzy inference chips were designed, fabricated, and fully tested. Both used a full-custom CMOS technology. The second and more elaborate chip was designed at the University of North Carolina (UNC) in cooperation with MCNC. Both VLSI digital fuzzy inference chips had multiple datapaths for rule evaluation, and they executed multiple fuzzy if-then rules in parallel. The AT&T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock cycle and achieved approximately 80,000 Fuzzy Logical Inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof-of-concept prototype chip, it had a minimal amount of peripheral logic for system integration. The UNC/MCNC chip consists of 688,131 transistors, of which 476,160 are used for RAM. It ran with a 10 MHz clock cycle. The chip has a 3-stage pipeline and initiates the computation of a new inference every 64 cycles. This chip achieved approximately 160,000 FLIPS. The new architecture has the following important improvements over the AT&T chip: programmable rule set memory (RAM); on-chip fuzzification by a table lookup method; on-chip defuzzification by a centroid method; a reconfigurable architecture for processing two rule formats; and RAM/datapath redundancy for higher yield. It can store and execute 51 if-then rules of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the following simpler format using the same datapath: IF A and B Then Do E. With this format the chip takes two inputs and produces one output. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL. Researchers use this board for experiments in autonomous robot navigation. The Fuzzy Logic system board places the Fuzzy chip into a VMEbus environment. High-level C language functions hide the operational details of the board from the applications programmer. The programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but it is limited in generality: many aspects of the design are limited or fixed. We have proposed designing a fuzzy information processor as an application-specific processor using a quantitative approach. The quantitative approach was developed by RISC designers.
In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As the first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules by the introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union (a sketch of such an inference kernel, and the min/max inner loops these instructions would accelerate, follows Table I below). We performed measurements using a MIPS R3000 as a base microprocessor. The initial result is encouraging: we can achieve as high as a 2.5-fold increase in inference speed if the R3000 had min and max instructions. They are also useful for speeding up other fuzzy operations such as bounded product and bounded sum. An embedded processor's main task is to control some device or process, and it usually runs a single program or a small, fixed set of programs; tailoring an embedded processor in this way to create an embedded processor for fuzzy control is therefore very effective. Table I shows the measured speed of the inference by a MIPS R3000 microprocessor, a fictitious MIPS R3000 microprocessor with min and max instructions, and the UNC/MCNC ASIC fuzzy inference chip. The software used on the microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds of 6000 inferences using 51 rules, where each fuzzy set is represented by an array of 64 elements. The second row is the time required to perform a single inference. The last row is the fuzzy logical inferences per second (FLIPS) measured for each device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes; an ASIC approach is extremely expensive. It is, therefore, an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes both in run time and in design time/cost.

TABLE I. INFERENCE TIME BY 51 RULES

                      MIPS R3000 (regular)   MIPS R3000 (with min/max)   ASIC
  6000 inferences     125 s                  49 s                        0.0038 s
  1 inference         20.8 ms                8.2 ms                      6.4 μs
  FLIPS               48                     122                         156,250
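
As a rough illustration of the computation described above, the following C sketch runs a max-min (Mamdani-style) inference over membership functions sampled at 64 points, with rules of the simpler two-input form "IF A and B Then Do E", and defuzzifies by centroid. The rule contents, 8-bit membership values, and toy initialization are assumptions made for this example, not the UNC/MCNC datapath or the authors' simulator; the min/max inner loops are exactly the operations the proposed R3000 instructions would accelerate.

/*
 * Illustrative sketch only: a software max-min (Mamdani) inference kernel
 * over membership functions sampled at 64 points. Rule contents and the
 * toy initialization are generic choices made for this example.
 */
#include <stdio.h>

#define U 64          /* samples per membership function */
#define NRULES 2      /* tiny rule base, purely for illustration */

static unsigned char min_u8(unsigned char a, unsigned char b) { return a < b ? a : b; }
static unsigned char max_u8(unsigned char a, unsigned char b) { return a > b ? a : b; }

/* antecedent memberships A_r(x), B_r(y) and consequent E_r(z), values 0..255 */
static unsigned char A[NRULES][U], B[NRULES][U], E[NRULES][U];

/* One inference: crisp inputs are indices x, y into the sampled universes. */
static int infer(int x, int y)
{
    unsigned char agg[U] = {0};

    for (int r = 0; r < NRULES; r++) {
        /* rule strength = min(A_r(x), B_r(y))  -- fuzzy AND */
        unsigned char w = min_u8(A[r][x], B[r][y]);

        /* clip consequent and aggregate with max -- this inner loop is
         * where dedicated min/max instructions would pay off */
        for (int z = 0; z < U; z++)
            agg[z] = max_u8(agg[z], min_u8(w, E[r][z]));
    }

    /* centroid defuzzification: sum(z*agg[z]) / sum(agg[z]) */
    unsigned long num = 0, den = 0;
    for (int z = 0; z < U; z++) { num += (unsigned long)z * agg[z]; den += agg[z]; }
    return den ? (int)(num / den) : 0;
}

int main(void)
{
    /* toy ramp-shaped memberships so the sketch runs end to end */
    for (int z = 0; z < U; z++) {
        A[0][z] = B[0][z] = (unsigned char)(z * 4);        /* "high" */
        A[1][z] = B[1][z] = (unsigned char)(255 - z * 4);  /* "low"  */
        E[0][z] = (unsigned char)(255 - z * 4);
        E[1][z] = (unsigned char)(z * 4);
    }
    printf("defuzzified output index = %d\n", infer(48, 40));
    return 0;
}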

Analysis of Syllabi for Landscape Architectural Design Courses as Project-Based Classes and Improvement Strategies (프로젝트 기반 수업으로서의 조경설계 교과목 수업계획서 분석과 개선방안)

  • Kim, Ah-Yeon
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.44 no.1
    • /
    • pp.51-65
    • /
    • 2016
  • A syllabus can be considered a master plan for good educational results. This study tries to diagnose the current status of landscape architectural design education and suggest improvement strategies for better landscape design courses through the analysis of the syllabi of mid-level landscape design studio classes collected from four-year undergraduate programs. The findings and suggestions are as follows. First, it is necessary to take advantage of a syllabus as a contract as well as a plan and a learning tool. Second, it is crucial to make more detailed statements from the perspective of learners. Third, more customized components for design courses should be developed; the syllabus should convey the structure of a design class as an integration and synthesis of other courses. Fourth, it is necessary to increase the interrelationship and relevance among the components, especially between course objectives and evaluation criteria, and between course activities and references. Fifth, a syllabus needs to function as a communication tool in a flexible manner. Sixth, a syllabus needs to give comprehensive information about the site and the design project. Finally, instructors need to introduce a set of detailed evaluation rubrics or criteria acceptable to students in order to increase the fairness and transparency of evaluation.

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Improving the speed of image processing is in great demand with the spread of high-quality visual media and massive image applications such as 3D TV, movies, and augmented reality (AR). A SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system which, along with multiple processing elements (PEs), is adequate for building a high-performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval at an arbitrary position in an M×N array of data elements, where the number of memory modules (MMs), m, is a prime number greater than pq (a sketch of such a conflict-free module assignment follows this entry). MAMS-PP4 is the first realization of the MAMS architecture, consisting of four PEs in a single chip and five MMs. This paper presents the implementation of image processing algorithms and a performance analysis for MAMS-PP16, which consists of 16 PEs with 17 MMs, as an extension of the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64-bit instruction format and an application-specific instruction set. The authors developed a simulator of the MAMS-PP16 system on which the implemented algorithms can be executed. Performance analysis was done with this simulator executing the implemented image processing algorithms. The results verify the consistent response of MAMS-PP16 for the pyramid operation in image processing algorithms compared with a Pentium-based serial processor: executing the pyramid operation on MAMS-PP16 yields consistent processing times, whereas the serial processor shows varying response times.
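
The conflict-free access property described above (pq elements of a row, column, or p×q block landing in distinct memory modules when the number of modules m is a prime greater than pq) can be illustrated with a generic prime-modulus skewing scheme. The sketch below uses p = q = 4 and m = 17, mirroring the 16-PE/17-MM figures in the abstract, but the mapping function is a textbook-style example chosen for illustration, not the exact MAMS address function, and the per-module local address computation is omitted.

/*
 * Illustrative sketch only: a prime-modulus skewing scheme that spreads
 * pq elements of a row, a column, or a p x q block over m distinct memory
 * modules when m is a prime greater than pq. Not the MAMS address function.
 */
#include <stdio.h>
#include <string.h>

enum { P = 4, Q = 4, M = 17 };   /* pq = 16 PEs, m = 17 memory modules */

/* module assignment for element (i, j) of the image array */
static int module_of(int i, int j) { return (i * Q + j) % M; }

/* verify that a set of pq elements touches pq distinct modules */
static int conflict_free(const int mod[], int n)
{
    int seen[M];
    memset(seen, 0, sizeof seen);
    for (int k = 0; k < n; k++) {
        if (seen[mod[k]]) return 0;
        seen[mod[k]] = 1;
    }
    return 1;
}

int main(void)
{
    int mod[P * Q];
    int i0 = 5, j0 = 9;          /* arbitrary starting position */

    /* horizontal subarray: (i0, j0), (i0, j0+1), ... */
    for (int k = 0; k < P * Q; k++) mod[k] = module_of(i0, j0 + k);
    printf("row access conflict-free:    %s\n", conflict_free(mod, P * Q) ? "yes" : "no");

    /* vertical subarray: (i0, j0), (i0+1, j0), ... */
    for (int k = 0; k < P * Q; k++) mod[k] = module_of(i0 + k, j0);
    printf("column access conflict-free: %s\n", conflict_free(mod, P * Q) ? "yes" : "no");

    /* p x q block starting at (i0, j0) */
    for (int a = 0; a < P; a++)
        for (int b = 0; b < Q; b++) mod[a * Q + b] = module_of(i0 + a, j0 + b);
    printf("block access conflict-free:  %s\n", conflict_free(mod, P * Q) ? "yes" : "no");
    return 0;
}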

The Design and Implementation of a Parallel Processing System Using the Nios® II Embedded Processor (Nios® II 임베디드 프로세서를 사용한 병렬처리 시스템의 설계 및 구현)

  • Lee, Si-Hyun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.11
    • /
    • pp.97-103
    • /
    • 2009
  • In this paper, we discuss the implementation of a parallel processing system which achieves a high degree of efficiency (size, cost, performance, and flexibility) by using the Nios® II (a 32-bit RISC (Reduced Instruction Set Computer) processor) embedded processor on a DE2-70® reference board. The designed parallel processing system is a master-slave, shared-memory, MIMD (Multiple Instruction, Multiple Data stream) architecture with four processors. An N-point FFT is used for the performance test of the system. The results show the following speed-ups: with 2 processors (cores), the average speed-up is 1.8 times that of a single processor; with 4 processors, the average speed-up is 2.4 times (a back-of-envelope Amdahl's-law reading of these figures is sketched after this entry).
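
As a back-of-envelope check (not part of the paper), the reported two-core speed-up can be fitted to Amdahl's law to see what it would predict for four cores; the gap between that prediction and the measured 2.4x is the kind of shared-memory or bus overhead such a system has to absorb.

/*
 * Back-of-envelope check only (not from the paper): fit Amdahl's law to
 * the reported 2-core speed-up and see what it would predict for 4 cores.
 */
#include <stdio.h>

/* Amdahl's law: speedup(p) = 1 / ((1 - f) + f / p), f = parallel fraction */
static double amdahl(double f, double p) { return 1.0 / ((1.0 - f) + f / p); }

int main(void)
{
    double s2 = 1.8;                                   /* measured with 2 cores */
    double f  = (1.0 - 1.0 / s2) / (1.0 - 1.0 / 2.0);  /* solve Amdahl for f */

    printf("implied parallel fraction f    = %.3f\n", f);              /* ~0.889 */
    printf("Amdahl prediction for 4 cores  = %.2fx\n", amdahl(f, 4.0)); /* ~3.0  */
    printf("measured 4-core speed-up       = 2.40x\n");
    return 0;
}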

An Efficient Hardware-Software Co-Implementation of an H.263 Video Codec (하드웨어 소프트웨어 통합 설계에 의한 H.263 동영상 코덱 구현)

  • 장성규;김성득;이재헌;정의철;최건영;김종대;나종범
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.4B
    • /
    • pp.771-782
    • /
    • 2000
  • In this paper, an H.263 video codec is implemented by adopting the concept of hardware/software co-design. Each module of the codec is investigated to find whether a hardware or a software approach is better for achieving real-time processing speed as well as flexibility. The hardware portion includes motion-related engines, such as motion estimation and compensation, and a memory control part. The remaining portion of the H.263 video codec is implemented in software using a RISC processor. This paper also introduces efficient design methods for the hardware and software modules. In hardware, an area-efficient architecture for the motion estimator of a multi-resolution block matching algorithm using multiple candidates and spatial correlation in motion vector fields (MRMCS) is suggested to reduce the chip size. Software optimization techniques are also explored by using the statistics of transformed coefficients and the minimum sum of absolute differences (SAD) obtained from the motion estimator (a sketch of the SAD kernel is given after this entry).
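
The sum of absolute differences (SAD) mentioned above is the basic cost function behind block-matching motion estimation. The sketch below shows a plain full-search version in C for a QCIF frame with 16x16 blocks; the block size, search range, and exhaustive search are generic assumptions, and the paper's MRMCS estimator (multi-resolution, multiple candidates, spatial correlation) is not reproduced here.

/*
 * Illustrative sketch only: SAD-based full-search block matching.
 * Generic parameters; not the MRMCS architecture from the paper.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

enum { W = 176, H = 144, B = 16, RANGE = 7 };   /* QCIF frame, 16x16 blocks */

/* SAD between a current block and a candidate block in the reference frame */
static unsigned sad16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned s = 0;
    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++)
            s += (unsigned)abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return s;
}

/* full-search motion estimation for the block at (bx, by); returns best SAD */
static unsigned full_search(const uint8_t *cur, const uint8_t *ref,
                            int bx, int by, int *mvx, int *mvy)
{
    unsigned best = UINT_MAX;
    for (int dy = -RANGE; dy <= RANGE; dy++) {
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            int x = bx + dx, y = by + dy;
            if (x < 0 || y < 0 || x + B > W || y + B > H) continue;
            unsigned s = sad16(cur + by * W + bx, ref + y * W + x, W);
            if (s < best) { best = s; *mvx = dx; *mvy = dy; }
        }
    }
    return best;
}

int main(void)
{
    static uint8_t cur[W * H], ref[W * H];
    for (int i = 0; i < W * H; i++) { ref[i] = (uint8_t)(i * 31); cur[i] = ref[i]; }

    int mvx = 0, mvy = 0;
    unsigned best = full_search(cur, ref, 32, 32, &mvx, &mvy);
    printf("best SAD = %u at motion vector (%d, %d)\n", best, mvx, mvy);
    return 0;
}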

An Efficient Programmable Memory BIST for Dual-Port Memories (이중 포트 메모리를 위한 효율적인 프로그램 가능한 메모리 BIST)

  • Park, Young-Kyu;Han, Tae-Woo;Kang, Sung-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.8
    • /
    • pp.55-62
    • /
    • 2012
  • The development of memory design and process technology has enabled the production of high-density memory. As the share of embedded memory within a System-on-Chip (SoC) gradually increases to 80-90% of the total number of transistors, the importance of testing embedded dual-port memories in SoCs increases. This paper proposes a new micro-code based programmable memory Built-In Self-Test (PMBIST) architecture for dual-port memories that supports various test algorithms. Various test algorithms, including March-based algorithms and dual-port memory test algorithms, can be efficiently programmed through the proposed algorithm instruction set. This PMBIST has optimized hardware overhead, since test algorithms can be implemented with a minimal number of bits using the optimized algorithm instructions (a sketch of how a March test decomposes into such an instruction table is given after this entry).
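
To make the micro-code idea concrete, the sketch below encodes a March test as a small table of elements and replays it over a simulated memory. The element encoding and the choice of March C- are generic textbook material, and dual-port specific tests are not modelled, so this only illustrates how a test algorithm decomposes into programmable instructions; it is not the instruction set proposed in the paper.

/*
 * Illustrative sketch only: a March test expressed as an instruction-like
 * table and replayed over a memory array, with a trivial stuck-at fault model.
 */
#include <stdio.h>
#include <stdint.h>

#define N 256                              /* words in the memory under test */

static uint8_t mem[N];
static int stuck_addr = -1, stuck_val = 0; /* trivial stuck-at fault model */

static void mem_write(int a, int v) { mem[a] = (uint8_t)((a == stuck_addr) ? stuck_val : v); }
static int  mem_read (int a)        { return (a == stuck_addr) ? stuck_val : mem[a]; }

typedef struct {
    int up;                    /* 1 = ascending address order, 0 = descending */
    const char *ops;           /* "r0w1" = read expecting 0, then write 1 */
} MarchElem;

/* March C-: up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); up(r0) */
static const MarchElem march_c_minus[] = {
    {1, "w0"}, {1, "r0w1"}, {1, "r1w0"}, {0, "r0w1"}, {0, "r1w0"}, {1, "r0"},
};

/* replay the element table; return the first failing address, or -1 */
static int run_march(const MarchElem *e, int nelem)
{
    for (int k = 0; k < nelem; k++)
        for (int i = 0; i < N; i++) {
            int a = e[k].up ? i : N - 1 - i;
            for (const char *p = e[k].ops; *p; p += 2) {
                int bit = p[1] - '0';
                if (*p == 'r' && mem_read(a) != bit) return a;
                if (*p == 'w') mem_write(a, bit);
            }
        }
    return -1;
}

int main(void)
{
    int nelem = (int)(sizeof march_c_minus / sizeof march_c_minus[0]);
    printf("fault-free memory: fail address = %d\n", run_march(march_c_minus, nelem));

    stuck_addr = 42; stuck_val = 1;        /* inject a stuck-at-1 cell */
    printf("stuck-at-1 at 42:  fail address = %d\n", run_march(march_c_minus, nelem));
    return 0;
}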

Code Size Reduction Through Efficient use of Multiple Load/store Instructions (복수의 메모리 접근 명령어의 효율적인 이용을 통한 코드 크기의 감소)

  • Ahn Minwook;Cho Doosan;Paek Yunheung;Cho Jeonghun
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.8
    • /
    • pp.819-833
    • /
    • 2005
  • Code size reduction is becoming ever more important for compilers targeting embedded processors, because these processors are often severely limited by storage constraints, so a reduced code size can have a significant positive impact on their performance. Various code size reduction techniques have different motivations and a variety of application contexts utilizing special hardware features of their target processors. In this work, we propose a novel technique that fully utilizes a set of hardware instructions, called multiple load/store (MLS), that are specially featured for reducing code size by minimizing the number of memory operations in the code. Many microprocessors support MLS instructions, yet no existing compiler fully exploits their potential benefit; they are used only in limited cases. This is mainly because optimizing memory accesses with MLS instructions for general cases is an NP-hard problem that necessitates complex assignments of registers and memory offsets for variables in a stack frame. Our technique uses a couple of heuristics to handle this problem efficiently within a polynomial time bound (a simplified sketch of the kind of grouping such instructions enable is given after this entry).
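
The kind of opportunity an MLS instruction exposes can be sketched as follows: once variables have been assigned stack offsets and registers, runs of loads from consecutive word offsets with ascending register numbers (the usual LDM-style constraint) can be folded into one instruction. The greedy pass and the example stack layout below are simplified assumptions for illustration only, not the heuristics or the offset-assignment method proposed in the paper.

/*
 * Illustrative sketch only: fold runs of single loads into one multiple-load
 * (MLS/LDM-style) instruction when offsets are consecutive words and register
 * numbers ascend. A simplified stand-in, not the paper's heuristics.
 */
#include <stdio.h>

typedef struct { int reg; int offset; } Access;   /* "load reg from sp+offset" */

/* greedily emit LDM for maximal qualifying runs; return instruction count */
static int emit(const Access *a, int n)
{
    int insns = 0;
    for (int i = 0; i < n; ) {
        int j = i + 1;
        while (j < n && a[j].offset == a[j - 1].offset + 4 && a[j].reg > a[j - 1].reg)
            j++;
        if (j - i > 1) {
            printf("  ldm sp+%d, {", a[i].offset);
            for (int k = i; k < j; k++) printf("%sr%d", k > i ? "," : "", a[k].reg);
            printf("}   ; %d loads folded into one\n", j - i);
        } else {
            printf("  ldr r%d, [sp+%d]\n", a[i].reg, a[i].offset);
        }
        insns++;
        i = j;
    }
    return insns;
}

int main(void)
{
    /* hypothetical stack layout after offset/register assignment, sorted by offset */
    Access a[] = { {1, 0}, {2, 4}, {3, 8}, {5, 12}, {4, 20}, {6, 24} };
    int n = (int)(sizeof a / sizeof a[0]);

    int insns = emit(a, n);
    printf("memory instructions: %d (vs %d single loads)\n", insns, n);
    return 0;
}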

IEEE std. 1500 based an Efficient Programmable Memory BIST (IEEE 1500 표준 기반의 효율적인 프로그램 가능한 메모리 BIST)

  • Park, Youngkyu;Choi, Inhyuk;Kang, Sungho
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.2
    • /
    • pp.114-121
    • /
    • 2013
  • As the share of embedded memory within a System-on-Chip (SoC) rapidly increases to 80-90% of the total number of transistors, the importance of testing embedded memory in SoCs increases. This paper proposes an IEEE Std. 1500 wrapper-based Programmable Memory Built-In Self-Test (PMBIST) architecture which can support various kinds of test algorithms. The proposed PMBIST guarantees high flexibility, programmability, and fault coverage using not only March algorithms but also non-March algorithms such as Walking and Galloping (a sketch of a Walking-1s access pattern is given after this entry). The PMBIST has minimal hardware overhead thanks to an optimized program instruction set and a smaller program memory. Furthermore, the proposed fault information processing scheme improves memory yield by effectively supporting three types of diagnostic methods for repair and diagnosis.
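
Of the non-March algorithms named above, Walking 1s is the simplest to write out; the sketch below spells out its access pattern in plain C. A hardware PMBIST would generate the same sequence with address and data generators rather than software loops, and the simple array model and O(n^2) structure here are textbook material, not the paper's design.

/*
 * Illustrative sketch only: a Walking-1s memory test written as plain C so
 * the access pattern is visible. Textbook algorithm, not the paper's design.
 */
#include <stdio.h>
#include <stdint.h>

#define N 64                       /* small memory so the O(n^2) test is quick */

static uint8_t mem[N];

/* returns the first address whose read-back disagrees, or -1 if none */
static int walking_ones(void)
{
    for (int i = 0; i < N; i++) mem[i] = 0;           /* background of 0s */

    for (int base = 0; base < N; base++) {
        mem[base] = 1;                                 /* walk the 1 */
        for (int j = 0; j < N; j++) {
            uint8_t expect = (j == base) ? 1 : 0;      /* only the base cell holds 1 */
            if (mem[j] != expect) return j;            /* coupling/stuck fault seen */
        }
        mem[base] = 0;                                 /* restore background */
    }
    return -1;
}

int main(void)
{
    printf("walking-1s result: fail address = %d\n", walking_ones());
    return 0;
}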