• Title/Summary/Keyword: On-Chip Memory

Search Result 296, Processing Time 0.024 seconds

Adaptive Memory Controller for High-performance Multi-channel Memory

  • Kim, Jin-ku;Lim, Jong-bum;Cho, Woo-cheol;Shin, Kwang-Sik;Kim, Hoshik;Lee, Hyuk-Jun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.6
    • /
    • pp.808-816
    • /
    • 2016
  • As the number of CPU/GPU cores and IPs in SOC increases and applications require explosive memory bandwidth, simultaneously achieving good throughput and fairness in the memory system among interfering applications is very challenging. Recent works proposed priority-based thread scheduling and channel partitioning to improve throughput and fairness. However, combining these different approaches leads to performance and fairness degradation. In this paper, we analyze the problems incurred when combining priority-based scheduling and channel partitioning and propose dynamic priority thread scheduling and adaptive channel partitioning method. In addition, we propose dynamic address mapping to further optimize the proposed scheme. Combining proposed methods could enhance weighted speedup and fairness for memory intensive applications by 4.2% and 10.2% over TCM or by 19.7% and 19.9% over FR-FCFS on average whereas the proposed scheme requires space less than TCM by 8%.

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.975-976
    • /
    • 1993
  • This talk presents the overview of the author's research and development activities on fuzzy inference hardware. We involved it with two distinct approaches. The first approach is to use application specific integrated circuits (ASIC) technology. The fuzzy inference method is directly implemented in silicon. The second approach, which is in its preliminary stage, is to use more conventional microprocessor architecture. Here, we use a quantitative technique used by designer of reduced instruction set computer (RISC) to modify an architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly on silicon. The mechanism is beaded on a max-min compositional rule of inference, and Mandami's method of fuzzy implication. The two VLSI fuzzy inference chips are designed, fabricated, and fully tested. Both used a full-custom CMOS technology. The second and more claborate chip was designed at the University of North Carolina(U C) in cooperation with MCNC. Both VLSI chips had muliple datapaths for rule digital fuzzy inference chips had multiple datapaths for rule evaluation, and they executed multiple fuzzy if-then rules in parallel. The AT & T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock cycle and achieved an approximately 80.000 Fuzzy Logical inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof of concept prototype chip, it had minimal amount of peripheral logic for system integration. UNC/MCNC chip consists of 688,131 transistors of which 476,160 are used for RAM memory. It ran with a 10 MHz clock cycle. The chip has a 3-staged pipeline and initiates a computation of new inference every 64 cycle. This chip achieved an approximately 160,000 FLIPS. The new architecture have the following important improvements from the AT & T chip: Programmable rule set memory (RAM). On-chip fuzzification operation by a table lookup method. On-chip defuzzification operation by a centroid method. Reconfigurable architecture for processing two rule formats. RAM/datapath redundancy for higher yield It can store and execute 51 if-then rule of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the following simpler format using the same datapath: IF A and B Then Do E. With this format the chip takes two inputs and produces one outputs. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL. Researchers uses this board for experiment in autonomous robot navigation. The Fuzzy Logic system board places the Fuzzy chip into a VMEbus environment. High level C language functions hide the operational details of the board from the applications programme . The programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but they are limited in generality. Many aspects of the design are limited or fixed. We have proposed to designing a are limited or fixed. We have proposed to designing a fuzzy information processor as an application specific processor using a quantitative approach. The quantitative approach was developed by RISC designers. In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As the first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules by an introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union. We performed measurements using a MIPS R3000 as a base micropro essor. The initial result is encouraging. We can achieve as high as a 2.5 increase in inference speed if the R3000 had min and max instructions. Also, they are useful for speeding up other fuzzy operations such as bounded product and bounded sum. The embedded processor's main task is to control some device or process. It usually runs a single or a embedded processer to create an embedded processor for fuzzy control is very effective. Table I shows the measured speed of the inference by a MIPS R3000 microprocessor, a fictitious MIPS R3000 microprocessor with min and max instructions, and a UNC/MCNC ASIC fuzzy inference chip. The software that used on microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds of 6000 inferences using 51 rules where each fuzzy set is represented by an array of 64 elements. The second row is the time required to perform a single inference. The last row is the fuzzy logical inferences per second (FLIPS) measured for ach device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes. An ASIC approach is extremely expensive. It is, therefore, an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes both in run time and design time/cost. TABLEI INFERENCE TIME BY 51 RULES {{{{Time }}{{MIPS R3000 }}{{ASIC }}{{Regular }}{{With min/mix }}{{6000 inference 1 inference FLIPS }}{{125s 20.8ms 48 }}{{49s 8.2ms 122 }}{{0.0038s 6.4㎲ 156,250 }} }}

  • PDF

Architecture design for speeding up Multi-Access Memory System(MAMS) (Multi-Access Memory System(MAMS)의 속도 향상을 위한 아키텍처 설계)

  • Ko, Kyung-sik;Kim, Jae Hee;Lee, S-Ra-El;Park, Jong Won
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.54 no.6
    • /
    • pp.55-64
    • /
    • 2017
  • High-capacity, high-definition image applications need to process considerable amounts of data at high speed. Accordingly, users of these applications demand a high-speed parallel execution system. To increase the speed of a parallel execution system, Park (2004) proposed a technique, called MAMS (Multi-Access Memory System), to access data in several execution units without the conflict of parallel processing memories. Since then, many studies on MAMS have been conducted, furthering the technique to MAMS-PP16 and MAMS-PP64, among others. As a memory architecture for parallel processing, MAMS must be constructed in one chip; therefore, a method to achieve the identical functionality as the existing MAMS while minimizing the architecture needs to be studied. This study proposes a method of miniaturizing the MAMS architecture in which the architectures of the ACR (Address Calculation and Routing) circuit and MMS (Memory Module Selection) circuit, which deliver data in memories to parallel execution units (PEs), do not use the MMS circuit, but are constructed as one shift and conditional statements whose number is the same as that of memory modules inside the ACR circuit. To verify the performance of the realized architecture, the study conducted the processing time of the proposed MAMS-PP64 through an image correlation test, the results of which demonstrated that the ratio of the image correlation from the proposed architecture was improved by 1.05 on average.

A Study on the Design and Fabrication of Content Addressable Memory (연상메모리 설계 및 제작에 관한 연구)

  • 박상봉;박노경;차균현
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.16 no.2
    • /
    • pp.145-154
    • /
    • 1991
  • In this dissertation, the same reading and writing operation of general SRAM, the algonthm and hardware of 8 bit $\times$16 word CAM(Content Addressable Memory) which carry out the parallel that search is presented. The designed CAM chip consists of five functional blocks (CAM cell array, Address Deceden, Address Encoden. Data Selector, Sense Amplifier). The smulation is performed using logic smmulator on Apollo workstation and PSPICE eitcut simulation on PC/AT. The designed CAM was fabricated by 3um CMOS N Well process (ETRI) design nitles and testing was performed.

  • PDF

A Memory Intensive Real-time 3x3 Neighborhood processor for Image Processing (Memory Intensive 실시간 영상신호처리용 3 $\times$ 3 Neighborhood VLSI 처리기)

  • 김진홍;남철우;우성일;김용태
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.6
    • /
    • pp.963-971
    • /
    • 1990
  • This paper proposes a memory intensive VLSI architecture for the realization of real-time 3x3 neighborhood processor based on the distributed arithmetic. The proposed architecture is characterized by a bit serial and multi-kernel parallel processing which exploits the pixel kernel parallelism and concurrency. The chip implements 8 neighborhood processing elements in parallel with efficirnt input and output modules which operate concurrently. Besides the a4chitectural design of a neighborhood processor, the design methodology using module generator concept has been considered and MOGOT(MOdule Generator Oriented VLSI design Tool) has been constructed based on the workstation. Based on these design environments MOGOT, it has been shown that the main part of the suggested architecture can be designed efficiently using 2\ulcorner double metal CMOS technology. It includes design of input delay and data conversion module, look-up table for inner product operation, carry save accumulator, output data converter and delay module, and control module.

  • PDF

Implementation of FPGA Verification System with Slave FIFO Interface and FX3 USB 3 Bridge Chip (FX3 USB 3 브릿지 칩과 slave FIFO 인터페이스를 사용하는 FPGA 검증 시스템 구현)

  • Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.2
    • /
    • pp.259-266
    • /
    • 2021
  • USB bus not only works with convenience but also transmits data fast and becomes a standard peripheral interface between FPGA development board and personal computer. In this paper FPGA verification system with slave FIFO interface for Cypress FX3 USB 3 bridge chip was implemented. The designed slave FIFO interface consists of host interface module based on FIFO structure, master bus controller and command decoder and supports streaming communication interface for FX3 bridge chip and memory-mapped input and output interface for user design circuit. The ZestSC3 board with Cypress FX3 USB 3 bridge chip and Xilinx Artix FPGA(XC7A35T-1C5G3241) was used to implement FPGA verification system. It was verified that the FPGA verification system for user design circuit operated correctly under various clock frequencies using GUI software developed by visual C# and C++ DLL. The designed slave FIFO interface for FPGA verification system has modular structure and can be applicable to the different user designs with memory-mapped I/O interface.

Design of Miniaturized Wireless Sensor Node Using System-on-Chip (SoC를 이용한 소형 무선 센서 노드 설계)

  • Kim, Hyun-Joong;Yang, Hyun-Ho
    • Proceedings of the KAIS Fall Conference
    • /
    • 2009.12a
    • /
    • pp.190-193
    • /
    • 2009
  • The most essential element in wireless sensor network is wireless sensor node which collects environmental information and transmits it to the user application systems. Recently, due to the technological advancement, wireless sensor nodes are become smaller, more intelligent and less power consuming. Especially, SoC(System-on-Chip) technology, which unifies the MCU, RF module, memory and other element inside one chip, plays an important part for miniaturization of sensor node, hence reduces the manufacturing expenses. In this paper, we have designed a miniaturized wireless sensor node for wireless sensor network using commercial SoC technology and discussed about some application scenario and additional considerations.

  • PDF

A Genetic Algorithm for Directed Graph-based Supply Network Planning in Memory Module Industry

  • Wang, Li-Chih;Cheng, Chen-Yang;Huang, Li-Pin
    • Industrial Engineering and Management Systems
    • /
    • v.9 no.3
    • /
    • pp.227-241
    • /
    • 2010
  • A memory module industry's supply chain usually consists of multiple manufacturing sites and multiple distribution centers. In order to fulfill the variety of demands from downstream customers, production planners need not only to decide the order allocation among multiple manufacturing sites but also to consider memory module industrial characteristics and supply chain constraints, such as multiple material substitution relationships, capacity, and transportation lead time, fluctuation of component purchasing prices and available supply quantities of critical materials (e.g., DRAM, chip), based on human experience. In this research, a directed graph-based supply network planning (DGSNP) model is developed for memory module industry. In addition to multi-site order allocation, the DGSNP model explicitly considers production planning for each manufacturing site, and purchasing planning from each supplier. First, the research formulates the supply network's structure and constraints in a directed-graph form. Then, a proposed genetic algorithm (GA) solves the matrix form which is transformed from the directed-graph model. Finally, the final matrix, with a calculated maximum profit, can be transformed back to a directed-graph based supply network plan as a reference for planners. The results of the illustrative experiments show that the DGSNP model, compared to current memory module industry practices, determines a convincing supply network planning solution, as measured by total profit.

An Efficient Architecture of The MF-VLD (MF-VLD에 대한 효율적인 하드웨어 구조)

  • Suh, Ki-Bum
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.11
    • /
    • pp.57-62
    • /
    • 2011
  • In this paper, an efficient architecture for MFVLD(Multi-Format Variable Length Decoder) which can process H.264, MPEG-2, MPEG-4, AVS, VC-1 bitstream is proposed. The proposed MF-VLD is designed to be adapted to the MPSOC (Multi-processor System on Chip) architecture, uses bit-plane algorithm for the processing of inverse quantized data to reduce the width of AHB bus. External SDRAM is used to minimize the internal memory size. In this architecture, the adding or removing each variable length decoder can be easily done by using multiplexor. The designed MF-VLD can be operated in 200MHz at 0.18um process. The gate size is 657K gate and internal memory size is 27Kbyte.

The Hardware Design of Real-time Image Processing System-on-chip for Visual Auxiliary Equipment (시각보조기기를 위한 실시간 영상처리 SoC 하드웨어 설계)

  • Jo, Heungsun;Kim, Jiho;Shin, Hyuntaek;Im, Junseong;Ryoo, Kwangki
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.11a
    • /
    • pp.1525-1527
    • /
    • 2013
  • 본 논문에서는 저시력자의 개선된 독서 환경을 제공하는 시각보조기기를 위한 실시간 영상처리 SoC(System on Chip) 하드웨어 구조 설계에 대해서 기술한다. 기존의 시각보조기기는 화면 영상이 실제 움직임보다 늦게 출력되는 잔상 현상이 발생하며, 색 변환 기능도 제한적이다. 따라서 본 논문에서 제안하는 실시간 영상처리 SoC 하드웨어 구조는 데이터 연산을 최소화함으로써 잔상 현상이 감소되며, 저시력자를 위한 다양한 색상 모드를 지원한다. 제안하는 영상처리 SoC 하드웨어 구조는 Core-A 모듈, Memory Controller 모듈, AMBA AHB bus 모듈, ISP(Image Signal Processing) 모듈, TFT-LCD Controller 모듈, VGA Controller 모듈, CIS Controller 모듈, UART 모듈, Block Memory 모듈로 구성된다. 시각보조기기를 위한 실시간 영상처리 SoC 하드웨어 구조는 Virtex4 XC4VLX80 FPGA 디바이스를 이용하여 검증하였으며, TSMC 180nm 셀 라이브러리로 합성한 결과 동작주파수는 54MHz, 게이트 수 197k이다.