• Title/Summary/Keyword: On-Chip Memory

Search Result 296, Processing Time 0.027 seconds

Analysis of Power Noises by Chip-to-Chip Power Coupling on High-Speed Memory Modules (고속 메모리 모듈에서 칩 간의 파워커플링에 의한 파워 잠음 분석)

  • 위재경
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.10
    • /
    • pp.31-39
    • /
    • 2004
  • This paper illustrates the noise characteristics under chip's core operations according to types of packages and modules for DDR DRAM For analyzing this, the impedance profiles and power noises are analyzed with DRAM chips having commercial TSOP package and commercial FBGA package on TSOP-based DIMM and FBGA-based DIMH In controversy with common concepts, we find that the noise-isolation characteristics of FBGA package are more weak and sensitive on transferred noises than those of the TSOP package. In addition, the simulated results show that the decoupling capacitor locations of modules are more important to control the self and transfer noise characteristics than the lead inductance of the packages. Therefore, satisfying the target spec of the noise suppression and isolation can be achieved through the design of power distribution systems only with considering not only the package types but also the whole module system.

Performance Analysis of Flash Translation Layer using TPC-C Benchmark (플래시 변환 계층에 대한 TPC-C 벤치마크를 통한 성능분석)

  • Park, Sung-Hwan;Jang, Ju-Yeon;Suh, Young-Ju;Park, Won-Joo;Park, Sang-Won
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.2
    • /
    • pp.201-205
    • /
    • 2008
  • The flash memory is widely used as a main storage of embedded devices. It is adopted as a storage of database as growing the capacity of the flash memory. We run TPC-C benchmark on various FTL algorithms. But, the database shows poor performance on flash memory because the characteristic of I/O requests is full random. In this paper, we show the performance of all existing FTL algorithms is very poor. Especially, the FTL algorithm known as good at small mobile equipment shows worst performance. In addition, the chip-inter leaving which is a technique to improve the performance of the flash memory doesn't work well. In this paper, we inform you the reason that we need a new FTL algorithm and the direction for the database in the future.

Wafer-Level Three-Dimensional Monolithic Integration for Intelligent Wireless Terminals

  • Gutmann, R.J.;Zeng, A.Y.;Devarajan, S.;Lu, J.Q.;Rose, K.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.4 no.3
    • /
    • pp.196-203
    • /
    • 2004
  • A three-dimensional (3D) IC technology platform is presented for high-performance, low-cost heterogeneous integration of silicon ICs. The platform uses dielectric adhesive bonding of fully-processed wafer-to-wafer aligned ICs, followed by a three-step thinning process and copper damascene patterning to form inter-wafer interconnects. Daisy-chain inter-wafer via test structures and compatibility of the process steps with 130 nm CMOS sal devices and circuits indicate the viability of the process flow. Such 3D integration with through-die vias enables high functionality in intelligent wireless terminals, as vertical integration of processor, large memory, image sensors and RF/microwave transceivers can be achieved with silicon-based ICs (Si CMOS and/or SiGe BiCMOS). Two examples of such capability are highlighted: memory-intensive Si CMOS digital processors with large L2 caches and SiGe BiCMOS pipelined A/D converters. A comparison of wafer-level 3D integration 'lith system-on-a-chip (SoC) and system-in-a-package (SiP) implementations is presented.

A Logic-compatible Embedded DRAM Utilizing Common-body Toggled Capacitive Cross-talk

  • Cheng, Weijie;Das, Hritom;Chung, Yeonbae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.6
    • /
    • pp.781-792
    • /
    • 2016
  • This paper presents a new approach to enhance the data retention of logic-compatible embedded DRAMs. The memory bit-cell in this work consists of two logic transistors implemented in generic triple-well CMOS process. The key idea is to use the parasitic junction capacitance built between the common cell-body and the data storage node. For each write access, a voltage transition on the cell-body couples up the data storage levels. This technique enhances the data retention and the read performance without using additional cell devices. The technique also provides much strong immunity from the write disturbance in the nature. Measurement results from a 64-kbit eDRAM test chip implemented in a 130 nm logic CMOS technology demonstrate the effectiveness of the proposed circuit technique. The refresh period for 99.9% bit yield measures $600{\mu}s$ at 1.1 V and $85^{\circ}C$, enhancing by % over the conventional design approach.

Design and Implementation of a Main-Memory Database System for Real-time Mobile GIS Application (실시간 모바일 GIS 응용 구축을 위한 주기억장치 데이터베이스 시스템 설계 및 구현)

  • Kang, Eun-Ho;Yun, Suk-Woo;Kim, Kyung-Chang
    • The KIPS Transactions:PartD
    • /
    • v.11D no.1
    • /
    • pp.11-22
    • /
    • 2004
  • As random access memory chip gets cheaper, it becomes affordable to realize main memory-based database systems. Consequently, reducing cache misses emerges as the most important issue in current main memory databases, in which CPU speeds have been increasing at 60% per year, compared to the memory speeds at 10% per you. In this paper, we design and implement a main-memory database system for real-time mobile GIS. Our system is composed of 5 modules: the interface manager provides the interface for PDA users; the memory data manager controls spatial and non-spatial data in main-memory using virtual memory techniques; the query manager processes spatial and non-spatial query : the index manager manages the MR-tree index for spatial data and the T-tree index for non-spatial index : the GIS server interface provides the interface with disk-based GIS. The MR-tree proposed propagates node splits upward only if one of the internal nodes on the insertion path has empty space. Thus, the internal nodes of the MR-tree are almost 100% full. Our experimental study shows that the two-dimensional MR-tree performs search up to 2.4 times faster than the ordinary R-tree. To use virtual memory techniques, the memory data manager uses page tables for spatial data, non- spatial data, T-tree and MR-tree. And, it uses indirect addressing techniques for fast reloading from disk.

Compact Field Remapping for Dynamically Allocated Structures (동적으로 할당된 구조체를 위한 압축된 필드 재배치)

  • Kim, Jeong-Eun;Han, Hwan-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.10
    • /
    • pp.1003-1012
    • /
    • 2005
  • The most significant difference of embedded systems from general purpose systems is that embedded systems are allowed to use only limited resources including battery and memory. Especially, the number of applications increases which deal with multimedia data. In those systems with high data computations, the delay of memory access is one of the major bottlenecks hurting the system performance. As a result, many researchers have investigated various techniques to reduce the memory access cost. Most programs generally have locality in memory references. Temporal locality of references means that a resource accessed at one point will be used again in the near future. Spatial locality of references is that likelihood of using a resource gets higher if resources near it were just accessed. The latest embedded processors usually adapt cache memory to exploit these two types of localities. Processors access faster cache memory than off-chip memory, reducing the latency. In this paper we will propose the enhanced dynamic allocation technique for structure-type data in order to eliminate unused memory space and to reduce both the cache miss rate and the application execution time. The proposed approach aggregates fields from multiple records dynamically allocated and consecutively remaps them on the memory space. Experiments on Olden benchmarks show $13.9\%$ L1 cache miss rate drop and $15.9\%$ L2 cache miss drop on average, compared to the previously proposed techniques. We also find execution time reduced by $10.9\%$ on average, compared to the previous work.

Design of AMBA AX I Slave Unit for Pipelined Arithmetic Unit (파이프라인 구조 연산회로를 위한 AMBA AXI Slave 설계)

  • Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.712-713
    • /
    • 2011
  • In this paper, the AMBA AXI slave unit that can verify the pipelined arithmetic unit is proposed and the 2-stage 16-bit pipelined multiplier is introduced as design example. The proposed AXI slave unit consists of input buffer block memory, control registers, pipelined arithmetic unit, control unit, output buffer block memory, and AXI slave interface unit. The main operational procedures are divided into the following steps, such as burst-mode input data loading for the input buffer memory, programming of control registers, arithmetic operations for block data in the input buffer memory, and burst-mode output data unloading from output buffer memory to host processor. Because the proposed AXI slave unit is general structure, it can be efficiently applicable to AMBA AXI and AHB slave unit with pipelined arithmetic unit.

  • PDF

Performance Evaluation and Optimization of Dual-Port SDRAM Architecture for Mobile Embedded Systems (모바일 내장형 시스템을 위한 듀얼-포트SDRAM의 성능 평가 및 최적화)

  • Yang, Hoe-Seok;Kim, Sung-Chan;Park, Hae-Woo;Kim, Jin-Woo;Ha, Soon-Hoi
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.5
    • /
    • pp.542-546
    • /
    • 2008
  • Recently dual-port SDRAM (DPSDRAM) architecture tailored for dual-processor based mobile embedded systems has been announced where a single memory chip plays the role of the local memories and the shared memory for both processors. In order to maintain memory consistency from simultaneous accesses of both ports, every access to the shared memory should be protected by a synchronization mechanism, which can result in substantial access latency. We propose two optimization techniques by exploiting the communication patterns of target applications: lock-priority scheme and static-copy scheme. Further, by dividing the shared bank into multiple blocks, we allow simultaneous accesses to different blocks thus achieve considerable performance gain. Experiments on a virtual prototyping system show a promising result - we could achieve about 20-50% performance gain compared to the base DPSDRAM architecture.

Design of a 32-Bit eFuse OTP Memory for PMICs (PMIC용 32bit eFuse OTP 설계)

  • Kim, Min-Sung;Yoon, Keon-Soo;Jang, Ji-Hye;Jin, Liyan;Ha, Pan-Bong;Kim, Young-Hee
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.10
    • /
    • pp.2209-2216
    • /
    • 2011
  • In this paper, we design a 32-bit eFuse OTP memory for PMICs using MagnaChip's $0.18{\mu}m$ process. We solve a problem of an electrical shortage between an eFuse link and the VSS of a p-substrate in programming by placing an n-well under the eFuse link. Also, we propose a WL driver circuit which activates the RWL (read word-line) or WWL (write word-line) of a dual-port eFuse OTP memory cell selectively when a decoded WERP (WL enable for read or program) signal is inputted to the eFuse OTP memory directly. Furthermore, we reduce the layout area of the control circuit by removing a delay chain in the BL precharging circuit. We'can obtain an yield of 100% at a program voltage of 5.5V on 94 manufactured sample dies when measured with memory tester equipment.

Efficient Interface circuits of Embedded Memory for RISC-based DSP Microprocessor (RICS-based DSP의 효율적인 임베디드 메모리 인터페이스)

  • Kim, You-Jin;Cho, Kyoung-Rok;Kim, Sung-Sik;Cheong, Eui-Seok
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.9
    • /
    • pp.1-12
    • /
    • 1999
  • In this paper, we designed an embedded processor with 128Kbytes EPROM and 4Kbytes SRAM based on GMS30C2132 which RISC processor with DSP functions. And a new architecture of bus sharing to control the embedded memory and external memory unit i proposed aiming at one-cycle access between memories and CPU. For embedded 128Kbytes EPROM, we designed the new expansion interface for data size at data ordering with memory organization and the efficient interface for test. The embedded SRAM supports an extended stack area high speed DSP operation, instruction cache and variable data-length control which is accessed with 4K modulo addressing schemes. The proposed new architecture and circuits reduced the memory access cycle time from 40ns and improved operation speed 2-times for program benchmark test. The chip is occupied $108.68mm^2$ using $0.6{\mu}m$ CMOS technology.

  • PDF