• Title/Summary/Keyword: Efficient Memory

Considering Read and Write Characteristics of Page Access Separately for Efficient Memory Management

  • Hyokyung Bahn
    • International Journal of Advanced Smart Convergence / v.12 no.1 / pp.70-75 / 2023
  • With the recent proliferation of memory-intensive workloads such as deep learning, analyzing memory access characteristics for efficient memory management is becoming increasingly important. Since read and write operations in memory access have different characteristics, an efficient memory management policy should take into account the characteristics of these two operations separately. Although some previous studies have considered the different characteristics of reads and writes, they require a modified hardware architecture supporting read bits and write bits. Unlike previous approaches, we propose a software-based management policy that considers read/write characteristics under the existing memory architecture. The proposed policy logically partitions memory space into the read/write area and the write area by making use of reference bits and dirty bits provided in modern paging systems. Simulation experiments with memory access traces show that our approach performs better than the CLOCK algorithm by 23% on average, and the effect is similar to the previous policy with hardware support.
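
For context, the CLOCK baseline named in the abstract above can be sketched as follows. This is a generic second-chance CLOCK that tracks a reference bit and a dirty bit per frame, not the partitioning policy proposed in the paper; the class name `ClockCache` and its fields are hypothetical and chosen only for illustration.

```python
class ClockCache:
    """Generic CLOCK (second-chance) page replacement with reference and dirty bits."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames   # page held by each frame
        self.ref = [False] * num_frames     # reference bit per frame
        self.dirty = [False] * num_frames   # dirty bit per frame
        self.where = {}                     # page -> frame index
        self.hand = 0                       # clock hand position
        self.faults = 0
        self.writebacks = 0

    def access(self, page, is_write=False):
        idx = self.where.get(page)
        if idx is None:
            # Page fault: pick a victim frame and load the page into it.
            self.faults += 1
            idx = self._evict()
            self.frames[idx] = page
            self.where[page] = idx
            self.dirty[idx] = False
        self.ref[idx] = True
        if is_write:
            self.dirty[idx] = True

    def _evict(self):
        n = len(self.frames)
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % n
            if self.frames[i] is None:
                return i                    # free frame, nothing to evict
            if self.ref[i]:
                self.ref[i] = False         # give the page a second chance
                continue
            if self.dirty[i]:
                self.writebacks += 1        # a dirty victim must be written back
            del self.where[self.frames[i]]
            return i
```

Counting `writebacks` separately from `faults` shows why reads and writes are worth distinguishing: evicting a dirty page costs an extra write, while evicting a clean page does not.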

Efficient Hybrid Transactional Memory Scheme using Near-optimal Retry Computation and Sophisticated Memory Management in Multi-core Environment

  • Jang, Yeon-Woo;Kang, Moon-Hwan;Chang, Jae-Woo
    • Journal of Information Processing Systems / v.14 no.2 / pp.499-509 / 2018
  • Recently, hybrid transactional memory (HyTM) has gained much interest from researchers because it combines the advantages of hardware transactional memory (HTM) and software transactional memory (STM). To provide concurrency control for transactions, existing HyTM-based studies use a Bloom filter. However, they fail to overcome the false positive errors typical of a Bloom filter. In addition, although existing studies use a global lock, the efficiency of global lock-based memory allocation is very low in a multi-core environment. In this paper, we propose an efficient hybrid transactional memory scheme using near-optimal retry computation and sophisticated memory management in order to efficiently process transactions in a multi-core environment. First, we propose a near-optimal retry computation algorithm that provides an efficient HTM configuration using machine learning algorithms, according to the characteristics of a given workload. Second, we provide efficient concurrency control for transactions in different environments by using a sophisticated Bloom filter. Third, we propose a memory management scheme optimized for the CPU cache line in order to provide fast transaction processing. Finally, our performance evaluation using the Stanford transactional applications for multi-processing (STAMP) benchmarks shows that our HyTM scheme achieves up to 2.5 times better performance than state-of-the-art algorithms.
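
The false positive problem mentioned in the abstract above is a property of any plain Bloom filter. A minimal sketch follows, assuming SHA-256-derived hash positions; the class `BloomFilter` and its parameters are hypothetical, and this is a textbook filter, not the "sophisticated" variant the paper proposes.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))
```

In a transactional memory setting, a false positive from `might_contain` would look like a conflict between transactions that never touched the same address, which leads to unnecessary aborts. Shrinking `num_bits` makes this easy to observe: with a single bit, every query returns `True` once anything has been added.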

Memory-Efficient Hypercube Key Establishment Scheme for Micro-Sensor Networks

  • Lhee, Kyung-Suk
    • ETRI Journal / v.30 no.3 / pp.483-485 / 2008
  • A micro-sensor network is comprised of a large number of small sensors with limited memory capacity. Current key-establishment schemes for symmetric encryption require too much memory for micro-sensor networks on a large scale. In this paper, we propose a memory-efficient hypercube key establishment scheme that only requires logarithmic memory overhead.

Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System (CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현)

  • Park, Hyun-moon;Kwon, Jinsan;Hwang, Tae-ho;Kim, Dong-Sun
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.57-65 / 2016
  • In this paper, we propose a system for efficient use of shared memory between CPU and GPU. The system, called Fusion Architecture, assures consistency of the shared memory and minimizes the cache misses that frequently occur on Heterogeneous System Architecture or Unified Virtual Memory based systems. It also maximizes the performance of memory-intensive jobs through efficient allocation of GPU cores. To compare architectures across various scenarios, we introduce the Fusion Architecture Analyzer, which compares OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. The results show that the Fusion Architecture runs benchmarks 55% faster and reduces memory overhead by 220% on average.

A Memory-efficient Hand Segmentation Architecture for Hand Gesture Recognition in Low-power Mobile Devices

  • Choi, Sungpill;Park, Seongwook;Yoo, Hoi-Jun
    • JSTS: Journal of Semiconductor Technology and Science / v.17 no.3 / pp.473-482 / 2017
  • Hand gesture recognition is regarded as a new human-computer interaction (HCI) technology for the next generation of mobile devices. Previous hand gesture implementations require a large memory and computation power for hand segmentation, and therefore fail to give users real-time interaction with mobile devices. In this paper, we present a low-latency and memory-efficient hand segmentation architecture for natural hand gesture recognition. To obtain both high memory efficiency and low latency, we propose a streaming hand contour tracing unit and a fast contour filling unit. As a result, it achieves 7.14 ms latency with only 34.8 KB of on-chip memory, which are 1.65 times lower latency and 1.68 times less on-chip memory, respectively, compared to the best-in-class.

A Memory-Efficient VLC Decoder Architecture for MPEG-2 Application

  • Lee, Seung-Joon;Suh, Ki-bum;Chong, Jong-wha
    • Proceedings of the IEEK Conference / 1999.11a / pp.360-363 / 1999
  • Video data compression is a key technology in the field of multimedia applications. Variable-length coding is the most popular data compression technique and has been used in many data compression standards, such as JPEG, MPEG, and other image data compression standards. In this paper, we present a memory-efficient VLC decoder architecture for MPEG-2 applications that achieves a small memory footprint and higher throughput. To reduce the memory size, we propose a new grouping, remainder generation method, and merged lookup table (LUT) for variable-length decoders (VLDs). In MPEG-2, discrete cosine transform (DCT) coefficient tables zero and one are mapped onto a single memory whose space requirement is minimized by an efficient memory mapping strategy. The proposed memory size is only 256 words despite mapping two DCT coefficient tables.

Characterizing Memory References for Smartphone Applications and Its Implications

  • Lee, Soyoon;Bahn, Hyokyung
    • JSTS: Journal of Semiconductor Technology and Science / v.15 no.2 / pp.223-231 / 2015
  • As smartphones support a variety of applications and their memory demand keeps increasing, the design of an efficient memory management policy is becoming increasingly important. Meanwhile, as nonvolatile memory (NVM) technologies such as PCM and STT-MRAM have emerged as new memory media for smartphones, memory references need to be characterized for NVM-based smartphone memory systems. For a deeper understanding of memory access features in smartphones, this paper performs a comprehensive analysis of memory references for various smartphone applications. We first analyze the temporal locality and frequency of memory reference behaviors to quantify the effects of the two properties with respect to the re-reference likelihood of pages. We also analyze the skewed popularity of memory references and model it as a Zipf-like distribution. We expect that the results of this study will provide good guidance for designing an efficient memory management policy for future smartphones.
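
To make the Zipf-like model in the abstract above concrete, the sketch below computes reference probabilities in which the page of popularity rank i is accessed with probability proportional to 1 / i^alpha. The function name `zipf_probabilities` and the default `alpha=1.0` are assumptions for illustration; the abstract does not state the fitted exponent.

```python
def zipf_probabilities(num_pages, alpha=1.0):
    """Return access probabilities for pages ranked 1..num_pages under a Zipf-like law."""
    # Unnormalized weight of the page at each popularity rank.
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_pages + 1)]
    total = sum(weights)
    # Normalize so the probabilities sum to 1.
    return [w / total for w in weights]


def top_k_share(num_pages, k, alpha=1.0):
    """Fraction of all references that go to the k most popular pages."""
    return sum(zipf_probabilities(num_pages, alpha)[:k])
```

A larger `alpha` means more skew, so a small set of hot pages receives a larger share of references; `alpha=0` reduces to a uniform distribution.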

Automated optimization for memory-efficient high-performance deep neural network accelerators

  • Kim, HyunMi;Lyuh, Chun-Gi;Kwon, Youngsu
    • ETRI Journal / v.42 no.4 / pp.505-517 / 2020
  • The increasing size and complexity of deep neural networks (DNNs) necessitate the development of efficient high-performance accelerators. An efficient memory structure and operating scheme provide an intuitive solution for high-performance accelerators along with dataflow control. Furthermore, the processing of various neural networks (NNs) requires a flexible memory architecture, programmable control scheme, and automated optimizations. We first propose an efficient architecture with flexibility while operating at a high frequency despite the large memory and PE-array sizes. We then improve the efficiency and usability of our architecture by automating the optimization algorithm. The experimental results show that the architecture increases the data reuse; a diagonal write path improves the performance by 1.44× on average across a wide range of NNs. The automated optimizations significantly enhance the performance from 3.8× to 14.79× and further provide usability. Therefore, automating the optimization as well as designing an efficient architecture is critical to realizing high-performance DNN accelerators.

An efficient Storage Reclamation Algorithm for RISC Parallel Processing (RISC 병렬 처리를 위한 기억공간의 효율적인 활용 알고리즘)

  • 이철원;임인칠
    • Journal of the Korean Institute of Telematics and Electronics B / v.28B no.9 / pp.703-711 / 1991
  • In this paper, an efficient storage reclamation algorithm for RISC parallel processing in object-oriented programming environments is presented. Memory management for dynamic memory allocation and frequent memory access in object-oriented programming is the main factor that decreases RISC parallel processing performance. The proposed algorithm efficiently allocates the memory space of a RISCy computer that requires frequent memory access, and thereby increases RISC parallel processing performance. The efficiency of the proposed algorithm is verified by implementing it in C on a SUN SPARC (4.3 BSD UNIX).

A Study on Efficient Use of Dual Data Memory Banks in Flight Control Computers

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication / v.9 no.1 / pp.29-34 / 2017
  • Over the past several decades, embedded system and flight control computer technologies have evolved to meet the diverse needs of the mobile device market. Current embedded systems are at the heart of technologies that can take advantage of small, specialized hardware while still providing high-efficiency performance at low cost. One of these key technologies is multiple memory banks. For example, a dual memory bank can provide twice the memory bandwidth in the same memory space, which lowers the cost of providing the same bandwidth. However, there are still few software technologies that support the efficient use of multiple memory banks. In this study, we present a technique to efficiently exploit multiple memory banks through software support. Specifically, our technique uses an interference graph to optimally allocate data to different memory banks in an optimizing compiler. As a result, the execution time can be improved by up to 7% with the proposed technique.
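
The interference-graph idea in the abstract above can be illustrated with a small sketch. It assumes that an edge joins two variables that are accessed together and would therefore benefit from sitting in different banks, and it uses a simple breadth-first two-coloring heuristic. The function `assign_banks` is hypothetical and is not the paper's optimal allocation method.

```python
from collections import deque


def assign_banks(variables, interference_edges):
    """Assign each variable to bank 0 or 1, placing neighbours in opposite banks where possible.

    Returns the bank mapping and the list of edges whose endpoints ended up
    in the same bank (these accesses cannot be served in parallel).
    """
    neighbours = {v: set() for v in variables}
    for a, b in interference_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    bank = {}
    for start in variables:
        if start in bank:
            continue
        bank[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in sorted(neighbours[v]):   # sorted for deterministic output
                if u not in bank:
                    bank[u] = 1 - bank[v]     # put the neighbour in the other bank
                    queue.append(u)

    conflicts = [(a, b) for a, b in interference_edges if bank[a] == bank[b]]
    return bank, conflicts
```

When the graph is bipartite, this heuristic leaves no conflicts. An odd cycle (for example, three variables that are all accessed pairwise) always leaves at least one pair in the same bank, which is where a weighted or optimal formulation would matter.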