• Title/Summary/Keyword: memory access reduction

Search Result 72, Processing Time 0.029 seconds

Efficient Flash Memory Access Power Reduction Techniques for IoT-Driven Rare-Event Logging Application (IoT 기반 간헐적 이벤트 로깅 응용에 최적화된 효율적 플래시 메모리 전력 소모 감소기법)

  • Kwon, Jisu;Cho, Jeonghun;Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.2
    • /
    • pp.87-96
    • /
    • 2019
  • Low power issue is one of the most critical problems in the Internet of Things (IoT), which are powered by battery. To solve this problem, various approaches have been presented so far. In this paper, we propose a method to reduce the power consumption by reducing the numbers of accesses into the flash memory consuming a large amount of power for on-chip software execution. Our approach is based on using cooperative logging structure to distribute the sampling overhead in single sensor node to adjacent nodes in case of rare-event applications. The proposed algorithm to identify event occurrence is newly introduced with negative feedback method by observing difference between past data and recent data coming from the sensor. When an event with need of flash access is determined, the proposed approach only allows access to write the sampled data in flash memory. The proposed event detection algorithm (EDA) result in 30% reduction of power consumption compared to the conventional flash write scheme for all cases of event. The sampled data from the sensor is first traced into the random access memory (RAM), and write access to the flash memory is delayed until the page buffer of the on-chip flash memory controller in the micro controller unit (MCU) is full of the numbers of the traced data, thereby reducing the frequency of accessing flash memory. This technique additionally reduces power consumption by 40% compared to flash-write all data. By sharing the sampling information via LoRa channel, the overhead in sampling data is distributed, to reduce the sampling load on each node, so that the 66% reduction of total power consumption is achieved in several IoT edge nodes by removing the sampling operation of duplicated data.

Accelerating Memory Access with Address Phase Skipping in LPDDR2-NVM

  • Park, Jaehyun;Shin, Donghwa;Chang, Naehyuck;Lee, Hyung Gyu
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.6
    • /
    • pp.741-749
    • /
    • 2014
  • Low power double data rate 2 non-volatile memory (LPDDR2-NVM) has been deemed the standard interface to connect non-volatile memory devices such as phase-change memory (PCM) directly to the main memory bus. However, most of the previous literature does not consider or overlook this standard interface. In this paper, we propose address phase skipping by reforming the way of interfacing with LPDDR2-NVM. To verify effectiveness and functionality, we also develop a system-level prototype that includes our customized LPDDR2-NVM controller and commercial PCM devices. Extensive simulations and measurements demonstrate up to a 3.6% memory access time reduction for commercial PCM devices and a 31.7% reduction with optimistic parameters of the PCM research prototypes in industries.

Bit Flip Reduction Schemes to Improve PCM Lifetime: A Survey

  • Han, Miseon;Han, Youngsun
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.5
    • /
    • pp.337-345
    • /
    • 2016
  • Recently, as the number of cores in computer systems has increased, the need for larger memory capacity has also increased. Unfortunately, dynamic random access memory (DRAM), popularly used as main memory for decades, now faces a scalability limitation. Phase change memory (PCM) is considered one of the strong alternatives to DRAM due to its advantages, such as high scalability, non-volatility, low idle power, and so on. However, since PCM suffers from short write endurance, direct use of PCM in main memory incurs a significant problem due to its short lifetime. To solve the lifetime limitation, many studies have focused on reducing the number of bit flips per write request. In this paper, we describe the PCM operating principles in detail and explore various bit flip reduction schemes. Also, we compare their performance in terms of bit reduction rate and lifetime improvement.

Efficient Utilization of Burst Data Transfers of DMA (직접 메모리 접근 장치에서 버스트 데이터 전송 기능의 효과적인 활용)

  • Lee, Jongwon;Cho, Doosan;Paek, Yunheung
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.5
    • /
    • pp.255-264
    • /
    • 2013
  • Resolving of memory access latency is one of the most important problems in modern embedded system design. Recently, tons of studies are presented to reduce and hide the access latency. Burst/page data transfer modes are representative hardware techniques for achieving such purpose. The burst data transfer capability offers an average access time reduction of more than 65 percent for an eight-word sequential transfer. However, solution of utilizing such burst data transfer to improve memory performance has not been accomplished at commercial level. Therefore, this paper presents a new technique that provides the maximum utilization of burst transfer for memory accesses with local variables in code by reorganizing variables placement.

Memory Access Reduction Scheme for H.264/AVC Decoder Motion Compensation (H.264/AVC 디코더의 움직임 보상을 위한 메모리 접근 감소 기법)

  • Park, Kyoung-Oh;Hong, You-Pyo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.4C
    • /
    • pp.349-354
    • /
    • 2009
  • In this paper, a new motion compensation scheme to reduce external memory access frequency which is one of the major bottlenecks for real-time decoding is proposed. Most H.264/AVC decoders store reference pictures in external memories due to the large size and reference blocks are read into the decoder core as needed during decoding. If the reference data access is done for each reference block in decoding sequence, the memory bandwidth can be unacceptable for real-time decoding. This paper presents a memory access scheme for motion compensation to read as many reference data as possible with reduced memory access frequency by analyzing reference data access pattern for each macroblock. Experimental results show that the proposed motion compensation scheme leads to approximately 30% improvement in memory bandwidth requirement.

Reduction of Read Access Latency by Invalid Hint in Directory-Based Cache Coherence Scheme (디렉토리를 이용한 캐쉬 일관성 유지 기법에서 무효화 힌트를 이용한 읽기 접근 시간 감소)

  • Oh, Seung-Taek;Rhee, Yun-Seok;Maeng, Seung-Ryoul;Lee, Joon-Won
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.4
    • /
    • pp.408-415
    • /
    • 2000
  • Large scale shared memory multiprocessors have suffered from large access latency to shared memory. The large latency partially stems from a feature of directory-based cache coherence schemes which require a shared memory access to be serviced at a home node of the memory block. The home visit results in three or more hops traversal for a memory read access. The traversal becomes much longer as a system scales up. In this paper, we propose a new cache coherence scheme that reduces read access latency. The proposed scheme exploits ideas of invalid hint. Invalid hint for a cache block means which node has invalidated the cache block before. Thus a read access request can be directly sent to and serviced by the node (called owner) without help of a home node. Execution-driven simulation is employed to evaluate performance of the proposed scheme. The simulation results show that read access latency and execution time are reduced.

  • PDF

An Adaptive Prefetching Technique for Software Distributed Shared Memory Systems (소프트웨어 분산공유메모리시스템을 위한 적응적 선인출 기법)

  • Lee, Sang-Kwon;Yun, Hee-Chul;Lee, Joon-Won;Maeng, Seung-Ryoul
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.9
    • /
    • pp.461-468
    • /
    • 2001
  • Though shared virtual memory (SVM) system promise low cost solutions for high performance computing they suffer from long memory latencies. These latencies are usually caused by repetitive invalidations on shared data. Since shared data are accessed through synchronization and the patterns by which threads synchronizes are repetitive, a prefetching scheme bases on such repetitiveness would reduce memory latencies. Based on this observation, we propose a prefetching technique which predicts future access behavior by analyzing access history per synchronization variable. Our technique was evaluated on an 8-node SVM system using the SPLASH-2 benchmark. The results show the our technique could achieve 34%~45% reduction in memory access latencies.

  • PDF

Electrical characteristic for Phase-change Random Access Memory according to the $Ge_{1}Se_{1}Te_{2}$ thin film of cell structure (상변화 메모리 응용을 위한 $Ge_{1}Se_{1}Te_{2}$ 박막의 셀 구조에 따른 전기적 특성)

  • Na, Min-Seok;Lim, Dong-Kyu;Kim, Jae-Hoon;Choi, Hyuk;Chung, Hong-Bay
    • Proceedings of the KIEE Conference
    • /
    • 2007.07a
    • /
    • pp.1335-1336
    • /
    • 2007
  • Among the emerging non-volatile memory technologies, phase change memories are the most attractive in terms of both performance and scalability perspectives. Phase-change random access memory(PRAM), compare with flash memory technologies, has advantages of high density, low cost, low consumption energy and fast response speed. However, PRAM device has disadvantages of set operation speed and reset operation power consumption. In this paper, we investigated scalability of $Ge_{1}Se_{1}Te_{2}$ chalcogenide material to improve its properties. As a result, reduction of phase change region have improved electrical properties of PRAM device.

  • PDF

Performance Enhancement of Embedded Software Using Register Promotion (레지스터 프로모션을 이용한 내장형 소프트웨어의 성능 향상)

  • Lee Jong-Yeol
    • The KIPS Transactions:PartA
    • /
    • v.11A no.5
    • /
    • pp.373-382
    • /
    • 2004
  • In this paper, a register promotion technique that translates memory accesses to register accesses is presented to enhance embedded software performance. In the proposed method, a source code is profiled to generate a memory trace. From the profiling results, target functions with high dynamic call counts are selected, and the proposed register promotion technique is applied only to the target functions to save the compilation time. The memory trace of the target functions is searched for the memory accesses that result in cycle count reduction when replaced by register accesses, and they are translated to register accesses by modifying the intermediate code and allocating promotion registers. The experiments on MediaBench and DSPstone benchmark programs show that the proposed method increases the performance by 14% and 18% on the average for ARM and MCORE, respectively.

270 MHz Full HD H.264/AVC High Profile Encoder with Shared Multibank Memory-Based Fast Motion Estimation

  • Lee, Suk-Ho;Park, Seong-Mo;Park, Jong-Won
    • ETRI Journal
    • /
    • v.31 no.6
    • /
    • pp.784-794
    • /
    • 2009
  • We present a full HD (1080p) H.264/AVC High Profile hardware encoder based on fast motion estimation (ME). Most processing cycles are occupied with ME and use external memory access to fetch samples, which degrades the performance of the encoder. A novel approach to fast ME which uses shared multibank memory can solve these problems. The proposed pixel subsampling ME algorithm is suitable for fast motion vector searches for high-quality resolution images. The proposed algorithm achieves an 87.5% reduction of computational complexity compared with the full search algorithm in the JM reference software, while sustaining the video quality without any conspicuous PSNR loss. The usage amount of shared multibank memory between the coarse ME and fine ME blocks is 93.6%, which saves external memory access cycles and speeds up ME. It is feasible to perform the algorithm at a 270 MHz clock speed for 30 frame/s real-time full HD encoding. Its total gate count is 872k, and internal SRAM size is 41.8 kB.