• Title/Summary/Keyword: On-Chip Memory


An On-chip Cache and Main Memory Compression System Optimized by Considering the Compression Rate Distribution of Compressed Blocks (압축블록의 압축률 분포를 고려해 설계한 내장캐시 및 주 메모리 압축시스템)

  • Yim, Keun-Soo;Lee, Jang-Soo;Hong, In-Pyo;Kim, Ji-Hong;Kim, Shin-Dug;Lee, Yong-Surk;Koh, Kern
    • Journal of KIISE: Computer Systems and Theory, v.31 no.1_2, pp.125-134, 2004
  • Recently, an on-chip compressed cache system was presented to alleviate the processor-memory performance gap by reducing the on-chip cache miss rate and expanding memory bandwidth. This research presents an extended on-chip compressed cache system that also significantly expands main memory capacity. Several techniques are attempted to expand main memory capacity, on-chip cache capacity, and memory bandwidth, as well as to reduce decompression time and metadata size. To evaluate the performance of the proposed system against existing systems, we use an execution-driven simulation method built by modifying a superscalar microprocessor simulator. This methodology has higher accuracy than the previous trace-driven simulation method. The simulation results show that the proposed system reduces execution time by 4-23% compared with a conventional memory system, even without considering the benefits obtained from main memory expansion. The expansion rates of the data and code areas of main memory are 57-120% and 27-36%, respectively. (A minimal sketch of the size-class bookkeeping follows below.)
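
To make the bookkeeping concrete, here is a minimal C sketch of a compressed-block directory, assuming (as many compressed-cache designs do; the paper's actual layout is not stated here) that 64-byte blocks are stored in a few fixed size classes chosen to match the measured compression-rate distribution. All names and sizes are illustrative.

```c
/* Illustrative compressed-block directory, not the paper's design.
 * Each 64-byte block is stored in the smallest size class that fits
 * its compressed length; the metadata records class and location. */
#include <stdint.h>
#include <stdio.h>

static const uint8_t class_bytes[4] = { 16, 24, 32, 64 }; /* 64 = uncompressed */

typedef struct {
    uint32_t segment_offset; /* where the compressed data lives */
    uint8_t  size_class;     /* index into class_bytes[] */
} BlockMeta;

/* Pick the smallest size class that holds the compressed length. */
static uint8_t pick_class(unsigned compressed_len) {
    for (uint8_t c = 0; c < 4; c++)
        if (compressed_len <= class_bytes[c]) return c;
    return 3; /* incompressible: store the block uncompressed */
}

int main(void) {
    BlockMeta m = { .segment_offset = 0, .size_class = pick_class(22) };
    printf("block stored in class %u (%u bytes)\n",
           m.size_class, class_bytes[m.size_class]);
    return 0;
}
```

Coarse size classes keep the per-block metadata small (here, two bits of class per block), which matters because metadata size is one of the overheads the paper explicitly tries to reduce.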

On-Demand Remote Software Code Execution Unit Using On-Chip Flash Memory Cloudification for IoT Environment Acceleration

  • Lee, Dongkyu;Seok, Moon Gi;Park, Daejin
    • Journal of Information Processing Systems, v.17 no.1, pp.191-202, 2021
  • In an Internet of Things (IoT)-configured system, each device executes on-chip software. Recent IoT devices require fast execution of complex services, such as analyzing large amounts of data, while maintaining low-power computation. As service complexity increases, the service requires high-performance computing and more embedded memory space. However, the low performance of IoT edge devices and their small memory size can hinder the complex and diverse operations of IoT services. In this paper, we propose a remote on-demand software code execution unit using cloudification of on-chip code memory to accelerate program execution on an IoT edge device with a low-performance processor. We propose a simulation approach that distributes remote code between the server side and the edge side according to the program's computation and communication needs. Our on-demand remote code execution unit simulation platform, which includes an instruction set simulator based on the 16-bit ARM Thumb instruction set architecture, successfully emulates the architectural behavior of on-chip flash memory, enabling embedded devices to accelerate and execute software using remote execution code in the IoT environment. (A toy offloading cost model is sketched below.)
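
The server/edge distribution decision such a simulator explores can be illustrated with a toy cost model; the structure below is an assumption for illustration, not the authors' model.

```c
/* Toy offloading cost model (illustrative, not the paper's simulator):
 * run a code section remotely only when shipping it over the link and
 * executing it on the server beats executing it locally on the edge. */
#include <stdbool.h>

typedef struct {
    double local_cycles;  /* edge-side execution cost    */
    double server_cycles; /* server-side execution cost  */
    double bytes_moved;   /* code/data shipped over link */
} Section;

static bool run_remotely(const Section *s, double edge_hz,
                         double server_hz, double link_bytes_per_s) {
    double t_local  = s->local_cycles / edge_hz;
    double t_remote = s->bytes_moved / link_bytes_per_s
                    + s->server_cycles / server_hz;
    return t_remote < t_local;
}
```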

SEC-DED-DAEC codes without mis-correction for protecting on-chip memories (오정정 없이 온칩 메모리 보호를 위한 SEC-DED-DAEC 부호)

  • Jun, Hoyoon
    • Journal of the Korea Institute of Information and Communication Engineering, v.26 no.10, pp.1559-1562, 2022
  • As electronic device technology scales down into the deep-submicron regime to achieve high-density, low-power, and high-performance integrated circuits, multiple bit upsets caused by soft errors have become a major threat to on-chip memory systems. To address the soft error problem, single error correction, double error detection, and double adjacent error correction (SEC-DED-DAEC) codes have recently been proposed. However, these codes do not resolve the mis-correction problem. We propose a SEC-DED-DAEC code without mis-correction. The decoder for the proposed code is implemented in hardware and verified. The results show that there is no mis-correction in the proposed code and that the decoder can be employed in on-chip memory systems. (A generic decode sketch follows below.)
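
For readers new to DAEC decoding, the sketch below shows the generic syndrome-matching logic such a decoder implements; it is an illustration, not the paper's decoder. The H-matrix columns are passed in as a parameter because constructing them so that no two correctable patterns collide is exactly the mis-correction problem the paper addresses.

```c
/* Generic SEC-DED-DAEC decode sketch. h[i] is the syndrome of a single
 * error in bit i of the n-bit codeword r; a mis-correction-free
 * H-matrix keeps all single and adjacent-pair syndromes distinct. */
#include <stdint.h>

/* Returns 0: no error, 1: corrected, -1: detected but uncorrectable. */
static int daec_decode(uint8_t r[], const uint8_t h[], int n) {
    uint8_t s = 0;
    for (int i = 0; i < n; i++)        /* syndrome s = H * r */
        if (r[i]) s ^= h[i];
    if (s == 0) return 0;              /* error-free word */
    for (int i = 0; i < n; i++)        /* single-bit error */
        if (s == h[i]) { r[i] ^= 1; return 1; }
    for (int i = 0; i + 1 < n; i++)    /* double-adjacent error */
        if (s == (uint8_t)(h[i] ^ h[i + 1])) {
            r[i] ^= 1; r[i + 1] ^= 1; return 1;
        }
    return -1;                         /* e.g. a non-adjacent double error */
}
```

Mis-correction occurs when a non-adjacent double error happens to yield a syndrome equal to some h[i] or h[i] ^ h[i+1]; removing those collisions is what the paper's construction guarantees.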

Error correction codes to manage multiple bit upset in on-chip memories (온칩 메모리 내 다중 비트 이상에 대처하기 위한 오류 정정 부호)

  • Jun, Hoyoon
    • Journal of the Korea Institute of Information and Communication Engineering, v.26 no.11, pp.1747-1750, 2022
  • As the semiconductor process shrinks into the deep sub-micron regime to achieve high-density, low-power, and high-performance integrated circuits, multiple bit upset (MBU) caused by soft errors is one of the major challenges for on-chip memory systems. To address MBU, single error correction, double error detection, and double adjacent error correction (SEC-DED-DAEC) codes have recently been proposed. However, these codes do not resolve mis-correction. We propose a SEC-DED-DAEC-TAED (triple adjacent error detection) code without mis-correction. The H-matrix generated by the proposed heuristic algorithm is implemented in hardware and verified. The results show that there is no mis-correction in the proposed code and that the 2-stage pipelined decoder can be employed in on-chip memory systems. (The syndrome-uniqueness check such a heuristic must enforce is sketched below.)
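
At a minimum, a heuristic H-matrix search for this code class must enforce the syndrome-uniqueness constraints checked below. This is our reading of the requirements, not the paper's exact algorithm, and the 64-column bound is an arbitrary assumption.

```c
/* Check that an H-matrix (given by its column syndromes col[0..n-1])
 * supports SEC-DED-DAEC-TAED without mis-correction: all single and
 * adjacent-pair syndromes are nonzero and pairwise distinct, and every
 * triple-adjacent syndrome differs from all of them, so a triple
 * adjacent error is detected instead of being mis-corrected. */
#include <stdbool.h>
#include <stdint.h>

static bool h_matrix_ok(const uint8_t col[], int n) { /* assumes n <= 64 */
    uint8_t pat[127];
    int k = 0;
    for (int i = 0; i < n; i++)     pat[k++] = col[i];
    for (int i = 0; i + 1 < n; i++) pat[k++] = col[i] ^ col[i + 1];
    for (int i = 0; i < k; i++) {   /* correctable patterns must be unique */
        if (pat[i] == 0) return false;
        for (int j = i + 1; j < k; j++)
            if (pat[i] == pat[j]) return false;
    }
    for (int i = 0; i + 2 < n; i++) {  /* TAED: detect, never correct */
        uint8_t t = col[i] ^ col[i + 1] ^ col[i + 2];
        if (t == 0) return false;
        for (int j = 0; j < k; j++)
            if (t == pat[j]) return false;
    }
    return true;
}
```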

Programmable Digital On-Chip Terminator

  • Kim, Su-Chul;Kim, Nam-Seog;Kim, Tae-Hyung;Cho, Uk-Rae;Byun, Hyun-Guen;Kim, Suki
    • Proceedings of the IEEK Conference, 2002.07c, pp.1571-1574, 2002
  • This paper describes the circuit and operation of a programmable digital on-chip terminator, designed with CMOS circuits, for use in high-speed I/O interfaces. The on-chip terminator matches an external reference resistor with an accuracy of ±4.1% over process, voltage, and temperature variations. The digital impedance codes are generated in a programmable impedance controller (PIC) and sent serially to terminator transistor arrays at the input pads to reduce the number of signal lines. The transistor array is thermometer-coded to reduce impedance glitches during code updates, and it is segmented into two blocks of thermometer-coded transistor arrays to reduce the number of transistors. The terminator impedance is periodically updated during hold time to minimize inter-symbol interference. (Thermometer coding and segmentation are sketched below.)
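
The glitch argument for thermometer coding is easy to see in code: stepping between adjacent impedance codes toggles exactly one array leg. The widths below (5-bit flat code, 6-bit segmented code) are illustrative assumptions, not the paper's.

```c
/* Thermometer coding sketch: code k turns on the k lowest legs, so a
 * k -> k+1 update toggles a single leg and the terminator impedance
 * moves monotonically, without a large transient glitch. */
#include <stdint.h>

static uint32_t thermometer(unsigned code) {   /* code in [0,31] */
    return (1u << code) - 1u;                  /* e.g. 3 -> 0b0111 */
}

/* Segmentation: splitting a 6-bit code into coarse (upper 3 bits) and
 * fine (lower 3 bits) thermometer blocks needs 7 + 7 = 14 legs rather
 * than the 63 a flat thermometer array would require. */
typedef struct { uint8_t coarse, fine; } SegCode;

static SegCode segment6(unsigned code) {       /* code in [0,63] */
    SegCode s = { (uint8_t)((1u << (code >> 3)) - 1u),
                  (uint8_t)((1u << (code & 7u)) - 1u) };
    return s;
}
```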


WARP: Memory Subsystem Effective for Wrapping Bursts of a Cache

  • Jang, Wooyoung
    • ETRI Journal, v.39 no.3, pp.428-436, 2017
  • State-of-the-art processors require increasingly complicated memory services for high performance and low power consumption. In particular, they request the transfers within a burst in wrap-around order to minimize the miss penalty of a cache. However, synchronous dynamic random access memories (SDRAMs) do not always generate transfers in the wrap-around order required by the processors. Thus, a memory subsystem rearranges the SDRAM transfers into wrap-around order, but the rearrangement process may increase memory latency and waste the bandwidth of on-chip interconnects. In this paper, we present a memory subsystem that is effective for the wrapping bursts of a cache. The proposed memory subsystem makes SDRAMs generate transfers in an intermediate order in which the transfers can be rearranged into wrap-around order with minimal penalty. The transfers are then delivered with priority depending on the program's spatial locality. Experimental results showed that the proposed memory subsystem minimizes the memory performance loss resulting from wrapping bursts and thus improves program execution time. (The wrap-around ordering itself is sketched below.)
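
The wrap-around order itself is simple to state: for a power-of-two burst of length len, transfer i carries word (start + i) mod len of the aligned block, with the critical word first. The mismatch the paper tackles arises because an SDRAM's native burst orders do not always match this sequence. A minimal sketch:

```c
/* Wrap-around (critical-word-first) burst order for a cache refill. */
#include <stdio.h>

static void wrap_order(unsigned start, unsigned len, unsigned out[]) {
    for (unsigned i = 0; i < len; i++)
        out[i] = (start + i) & (len - 1); /* len must be a power of two */
}

int main(void) {
    unsigned seq[8];
    wrap_order(5, 8, seq);               /* miss on word 5 of the block */
    for (int i = 0; i < 8; i++)
        printf("%u ", seq[i]);           /* prints: 5 6 7 0 1 2 3 4 */
    printf("\n");
    return 0;
}
```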

High-Speed Low-Power Global On-Chip Interconnect Based on Delayed Symbol Transmission

  • Park, Kwang-Il;Koo, Ja-Hyuck;Shin, Won-Hwa;Jun, Young-Hyun;Kong, Bai-Sun
    • JSTS: Journal of Semiconductor Technology and Science, v.12 no.2, pp.168-174, 2012
  • This paper describes a novel global on-chip interconnect scheme in which a one-UI-delayed symbol, as well as the current symbol, is sent to ease the sensing operation at the receiver end. With this approach, the voltage swing on the channel required for reliable sensing can be reduced, improving power consumption, peak current, and delay spread due to PVT variations compared with conventional repeater insertion schemes. Evaluation of on-chip interconnects of various lengths in a 130 nm CMOS process indicated that the proposed scheme achieves a power reduction of up to 71.3%. The peak current during data transmission and the delay spread due to PVT variations were also reduced by as much as 52.1% and 65.3%, respectively.

Run-time Memory Optimization Algorithm for the DDMB Architecture (DDMB 구조에서의 런타임 메모리 최적화 알고리즘)

  • Cho, Jeong-Hun;Paek, Yun-Heung;Kwon, Soo-Hyun
    • The KIPS Transactions: Part A, v.13A no.5 s.102, pp.413-420, 2006
  • Most vendors of digital signal processors (DSPs) support a Harvard architecture, which has two or more memory buses, one for program and one or more for data, and allows the processor to access multiple words of data from memory in a single instruction cycle. We already addressed how to efficiently assign data to multiple memory banks in our previous work. This paper reports on our recent attempt to optimize run-time memory. The run-time environment for dual data memory banks (DDMBs) requires two run-time stacks to control the activation records located in the two memory banks of the calling procedures. However, the two activation-record parts of a procedure may have different sizes, so the dual run-time stacks can become unbalanced whenever a procedure is called. This imbalance means that usage of one memory bank can exceed the extent of the on-chip memory area even though free area remains in the other bank. In this paper, we attempt to balance the dual run-time stacks to improve utilization of on-chip memory. The experimental results reveal that although our algorithm is relatively simple, it utilizes run-time memory efficiently, enabling our compiler to run extremely fast while minimizing run-time memory usage in the target code. (A simplified balancing rule is sketched below.)
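
As a simplified illustration of the balancing idea (not the paper's algorithm, which must also respect the compile-time assignment of data to banks), the two stack tops can be kept close by placing the larger part of each split activation record on the currently shorter stack:

```c
/* Dual run-time stack balancing sketch. Each call pushes an activation
 * record split into two parts, one per data memory bank; placing the
 * larger part on the shorter stack bounds the imbalance between banks.
 * Assumes the two parts may be swapped between banks, a simplification. */
typedef struct { unsigned top_x, top_y; } DualStacks;

static void push_record(DualStacks *s, unsigned big, unsigned small) {
    /* caller passes big >= small */
    if (s->top_x <= s->top_y) { s->top_x += big;   s->top_y += small; }
    else                      { s->top_x += small; s->top_y += big;   }
}
```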

Research on the Main Memory Access Count According to the On-Chip Memory Size of an Artificial Neural Network (인공 신경망 가속기 온칩 메모리 크기에 따른 주메모리 접근 횟수 추정에 대한 연구)

  • Cho, Seok-Jae;Park, Sungkyung;Park, Chester Sungchung
    • Journal of IKEEE, v.25 no.1, pp.180-192, 2021
  • One widely used algorithm for image recognition and pattern detection is the convolutional neural network (CNN). To efficiently handle the convolution operations, which account for the majority of computations in a CNN, hardware accelerators are used to improve the performance of CNN applications. With these accelerators, the CNN fetches data from off-chip DRAM, as the massive volume of data makes it difficult to derive performance improvements from the memory inside the accelerator alone. In other words, data communication between off-chip DRAM and the memory inside the accelerator has a significant impact on the performance of CNN applications. In this paper, a CNN simulator is developed to analyze main memory (DRAM) accesses with respect to the size of the on-chip memory, or global buffer, inside the CNN accelerator. For AlexNet, one of the CNN architectures, simulation with increasing global buffer sizes shows that a global buffer larger than 100 KB incurs about 0.8x the DRAM access count of a buffer smaller than 100 KB. (A rough access-count estimator is sketched below.)
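
A rough estimator conveys why DRAM traffic drops once the global buffer covers a layer's working set; the fits-or-refetch model below is an assumption for illustration, not the paper's simulator.

```c
/* Rough per-layer DRAM traffic estimate: if the working set fits the
 * global buffer, everything is fetched from DRAM once; otherwise the
 * input feature map is re-fetched once per output tile while weights
 * and outputs are streamed once. All modeling choices are assumptions. */
typedef struct {
    unsigned long ifmap_bytes, weight_bytes, ofmap_bytes;
} Layer;

static unsigned long dram_bytes(const Layer *l, unsigned long buf_bytes,
                                unsigned long tiles) {
    unsigned long ws = l->ifmap_bytes + l->weight_bytes + l->ofmap_bytes;
    if (ws <= buf_bytes)
        return ws;                           /* fetched once, kept on-chip */
    return tiles * l->ifmap_bytes            /* ifmap re-fetched per tile  */
         + l->weight_bytes + l->ofmap_bytes; /* streamed once              */
}
```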

A Memory-efficient Hand Segmentation Architecture for Hand Gesture Recognition in Low-power Mobile Devices

  • Choi, Sungpill;Park, Seongwook;Yoo, Hoi-Jun
    • JSTS: Journal of Semiconductor Technology and Science, v.17 no.3, pp.473-482, 2017
  • Hand gesture recognition is regarded as a new Human-Computer Interaction (HCI) technology for the next generation of mobile devices. Previous hand gesture implementations require large memory and computation power for hand segmentation, which fails to give users real-time interaction with mobile devices. Therefore, in this paper, we present a low-latency and memory-efficient hand segmentation architecture for natural hand gesture recognition. To obtain both high memory efficiency and low latency, we propose a streaming hand contour tracing unit and a fast contour filling unit. As a result, it achieves 7.14 ms latency with only 34.8 KB of on-chip memory, which is 1.65 times less latency and 1.68 times less on-chip memory, respectively, compared to the best-in-class. (A simplified contour-filling pass is sketched below.)
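
A one-pass, row-parity fill shows why streaming contour filling needs no frame-sized intermediate buffer. The sketch below is a simplification that ignores the corner cases (horizontal contour runs, tangent points) a real filling unit must handle; the 320x240 resolution is an assumption.

```c
/* Streaming contour fill sketch: scan each row once, toggling an
 * inside/outside flag at every contour crossing, so the fill mask is
 * produced line by line with no full-frame intermediate storage. */
enum { W = 320, H = 240 };

static void parity_fill(const unsigned char contour[H][W],
                        unsigned char mask[H][W]) {
    for (int y = 0; y < H; y++) {
        int inside = 0;
        for (int x = 0; x < W; x++) {
            /* a 0 -> 1 transition marks a contour crossing */
            if (contour[y][x] && (x == 0 || !contour[y][x - 1]))
                inside = !inside;
            mask[y][x] = contour[y][x] || inside;
        }
    }
}
```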