• Title/Summary/Keyword: On-Chip Memory

Search Result 298, Processing Time 0.028 seconds

Real-time Implementation of Speech and Channel Coder on a DSP Chip for Radio Communication System (무선통신 적용을 위한 단일 DSP칩상의 음성/채널 부호화기 실시간 구현)

  • Kim Jae-Won;Sohn Dong-Chul
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.6
    • /
    • pp.1195-1201
    • /
    • 2005
  • This paper deals with procedures and results for teal time implementation of G.729 speech coder and channel coder including convolution codec, viterbi decoder, and interleaver using a fixed point DSP chip for radio communication systems. We described the method for real-time implementation based on integer simulation results and explained the implemented results by quality performance and required complexity for real-time operation. The required complexity was 24MIPS and 9MIPS in computational load, and 12K words and 4K words in execution code length for speech and channel. The functional evaluation was performed into two steps. The one was bit exact comparison with a fixed point C code, the other was executed by actual speech samples and error test vectors. Unlik other results such as individual implementation, We implemented speech and channel coders on a DSP chip with 160MIPS computation capability and 64 K words memory on chip. This results outweigh the conventional methods in the point of system complexity and implementation cost for radio communication system.

Memory data layout and DMA transfer technique research For efficient data transfer of CNN accelerator (CNN 가속기의 효율적인 데이터 전송을 위한 메모리 데이터 레이아웃 및 DMA 전송기법 연구)

  • Cho, Seok-Jae;Park, Sungkyung;Park, Chester Sungchung
    • Journal of IKEEE
    • /
    • v.24 no.2
    • /
    • pp.559-569
    • /
    • 2020
  • One of the deep-running algorithms, CNN's artificial intelligence application uses off-chip memory to store data on the Convolution Layer. DMA can reduce processor load at every data transfer. It can also reduce application performance degradation by varying the order in which data from the Convolution layer is transmitted to the global buffer of the accelerator. For basic layouts with continuous memory addresses, SG-DMA showed about 3.4 times performance improvement in pre-setting DMA compared to using ordinaly DMA, and for Ideal layouts with discontinuous memory addresses, the ordinal DMA was about 1396 cycles faster than SG-DMA. Experiments have shown that a combination of memory data layout and DMA can reduce the DMA preset load by about 86 percent.

Analysis of Power Noises by Chip-to-Chip Power Coupling on High-Speed Memory Modules (고속 메모리 모듈에서 칩 간의 파워커플링에 의한 파워 잠음 분석)

  • 위재경
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.10
    • /
    • pp.31-39
    • /
    • 2004
  • This paper illustrates the noise characteristics under chip's core operations according to types of packages and modules for DDR DRAM For analyzing this, the impedance profiles and power noises are analyzed with DRAM chips having commercial TSOP package and commercial FBGA package on TSOP-based DIMM and FBGA-based DIMH In controversy with common concepts, we find that the noise-isolation characteristics of FBGA package are more weak and sensitive on transferred noises than those of the TSOP package. In addition, the simulated results show that the decoupling capacitor locations of modules are more important to control the self and transfer noise characteristics than the lead inductance of the packages. Therefore, satisfying the target spec of the noise suppression and isolation can be achieved through the design of power distribution systems only with considering not only the package types but also the whole module system.

Performance Analysis of Flash Translation Layer using TPC-C Benchmark (플래시 변환 계층에 대한 TPC-C 벤치마크를 통한 성능분석)

  • Park, Sung-Hwan;Jang, Ju-Yeon;Suh, Young-Ju;Park, Won-Joo;Park, Sang-Won
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.2
    • /
    • pp.201-205
    • /
    • 2008
  • The flash memory is widely used as a main storage of embedded devices. It is adopted as a storage of database as growing the capacity of the flash memory. We run TPC-C benchmark on various FTL algorithms. But, the database shows poor performance on flash memory because the characteristic of I/O requests is full random. In this paper, we show the performance of all existing FTL algorithms is very poor. Especially, the FTL algorithm known as good at small mobile equipment shows worst performance. In addition, the chip-inter leaving which is a technique to improve the performance of the flash memory doesn't work well. In this paper, we inform you the reason that we need a new FTL algorithm and the direction for the database in the future.

Wafer-Level Three-Dimensional Monolithic Integration for Intelligent Wireless Terminals

  • Gutmann, R.J.;Zeng, A.Y.;Devarajan, S.;Lu, J.Q.;Rose, K.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.4 no.3
    • /
    • pp.196-203
    • /
    • 2004
  • A three-dimensional (3D) IC technology platform is presented for high-performance, low-cost heterogeneous integration of silicon ICs. The platform uses dielectric adhesive bonding of fully-processed wafer-to-wafer aligned ICs, followed by a three-step thinning process and copper damascene patterning to form inter-wafer interconnects. Daisy-chain inter-wafer via test structures and compatibility of the process steps with 130 nm CMOS sal devices and circuits indicate the viability of the process flow. Such 3D integration with through-die vias enables high functionality in intelligent wireless terminals, as vertical integration of processor, large memory, image sensors and RF/microwave transceivers can be achieved with silicon-based ICs (Si CMOS and/or SiGe BiCMOS). Two examples of such capability are highlighted: memory-intensive Si CMOS digital processors with large L2 caches and SiGe BiCMOS pipelined A/D converters. A comparison of wafer-level 3D integration 'lith system-on-a-chip (SoC) and system-in-a-package (SiP) implementations is presented.

A Logic-compatible Embedded DRAM Utilizing Common-body Toggled Capacitive Cross-talk

  • Cheng, Weijie;Das, Hritom;Chung, Yeonbae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.6
    • /
    • pp.781-792
    • /
    • 2016
  • This paper presents a new approach to enhance the data retention of logic-compatible embedded DRAMs. The memory bit-cell in this work consists of two logic transistors implemented in generic triple-well CMOS process. The key idea is to use the parasitic junction capacitance built between the common cell-body and the data storage node. For each write access, a voltage transition on the cell-body couples up the data storage levels. This technique enhances the data retention and the read performance without using additional cell devices. The technique also provides much strong immunity from the write disturbance in the nature. Measurement results from a 64-kbit eDRAM test chip implemented in a 130 nm logic CMOS technology demonstrate the effectiveness of the proposed circuit technique. The refresh period for 99.9% bit yield measures $600{\mu}s$ at 1.1 V and $85^{\circ}C$, enhancing by % over the conventional design approach.

Design and Implementation of a Main-Memory Database System for Real-time Mobile GIS Application (실시간 모바일 GIS 응용 구축을 위한 주기억장치 데이터베이스 시스템 설계 및 구현)

  • Kang, Eun-Ho;Yun, Suk-Woo;Kim, Kyung-Chang
    • The KIPS Transactions:PartD
    • /
    • v.11D no.1
    • /
    • pp.11-22
    • /
    • 2004
  • As random access memory chip gets cheaper, it becomes affordable to realize main memory-based database systems. Consequently, reducing cache misses emerges as the most important issue in current main memory databases, in which CPU speeds have been increasing at 60% per year, compared to the memory speeds at 10% per you. In this paper, we design and implement a main-memory database system for real-time mobile GIS. Our system is composed of 5 modules: the interface manager provides the interface for PDA users; the memory data manager controls spatial and non-spatial data in main-memory using virtual memory techniques; the query manager processes spatial and non-spatial query : the index manager manages the MR-tree index for spatial data and the T-tree index for non-spatial index : the GIS server interface provides the interface with disk-based GIS. The MR-tree proposed propagates node splits upward only if one of the internal nodes on the insertion path has empty space. Thus, the internal nodes of the MR-tree are almost 100% full. Our experimental study shows that the two-dimensional MR-tree performs search up to 2.4 times faster than the ordinary R-tree. To use virtual memory techniques, the memory data manager uses page tables for spatial data, non- spatial data, T-tree and MR-tree. And, it uses indirect addressing techniques for fast reloading from disk.

Compact Field Remapping for Dynamically Allocated Structures (동적으로 할당된 구조체를 위한 압축된 필드 재배치)

  • Kim, Jeong-Eun;Han, Hwan-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.10
    • /
    • pp.1003-1012
    • /
    • 2005
  • The most significant difference of embedded systems from general purpose systems is that embedded systems are allowed to use only limited resources including battery and memory. Especially, the number of applications increases which deal with multimedia data. In those systems with high data computations, the delay of memory access is one of the major bottlenecks hurting the system performance. As a result, many researchers have investigated various techniques to reduce the memory access cost. Most programs generally have locality in memory references. Temporal locality of references means that a resource accessed at one point will be used again in the near future. Spatial locality of references is that likelihood of using a resource gets higher if resources near it were just accessed. The latest embedded processors usually adapt cache memory to exploit these two types of localities. Processors access faster cache memory than off-chip memory, reducing the latency. In this paper we will propose the enhanced dynamic allocation technique for structure-type data in order to eliminate unused memory space and to reduce both the cache miss rate and the application execution time. The proposed approach aggregates fields from multiple records dynamically allocated and consecutively remaps them on the memory space. Experiments on Olden benchmarks show $13.9\%$ L1 cache miss rate drop and $15.9\%$ L2 cache miss drop on average, compared to the previously proposed techniques. We also find execution time reduced by $10.9\%$ on average, compared to the previous work.

Design of AMBA AX I Slave Unit for Pipelined Arithmetic Unit (파이프라인 구조 연산회로를 위한 AMBA AXI Slave 설계)

  • Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.712-713
    • /
    • 2011
  • In this paper, the AMBA AXI slave unit that can verify the pipelined arithmetic unit is proposed and the 2-stage 16-bit pipelined multiplier is introduced as design example. The proposed AXI slave unit consists of input buffer block memory, control registers, pipelined arithmetic unit, control unit, output buffer block memory, and AXI slave interface unit. The main operational procedures are divided into the following steps, such as burst-mode input data loading for the input buffer memory, programming of control registers, arithmetic operations for block data in the input buffer memory, and burst-mode output data unloading from output buffer memory to host processor. Because the proposed AXI slave unit is general structure, it can be efficiently applicable to AMBA AXI and AHB slave unit with pipelined arithmetic unit.

  • PDF

Performance Evaluation and Optimization of Dual-Port SDRAM Architecture for Mobile Embedded Systems (모바일 내장형 시스템을 위한 듀얼-포트SDRAM의 성능 평가 및 최적화)

  • Yang, Hoe-Seok;Kim, Sung-Chan;Park, Hae-Woo;Kim, Jin-Woo;Ha, Soon-Hoi
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.5
    • /
    • pp.542-546
    • /
    • 2008
  • Recently dual-port SDRAM (DPSDRAM) architecture tailored for dual-processor based mobile embedded systems has been announced where a single memory chip plays the role of the local memories and the shared memory for both processors. In order to maintain memory consistency from simultaneous accesses of both ports, every access to the shared memory should be protected by a synchronization mechanism, which can result in substantial access latency. We propose two optimization techniques by exploiting the communication patterns of target applications: lock-priority scheme and static-copy scheme. Further, by dividing the shared bank into multiple blocks, we allow simultaneous accesses to different blocks thus achieve considerable performance gain. Experiments on a virtual prototyping system show a promising result - we could achieve about 20-50% performance gain compared to the base DPSDRAM architecture.