• Title/Summary/Keyword: Memory systems

Search Result 2,153, Processing Time 0.023 seconds

Memory Hierarchy Optimization in Embedded Systems using On-Chip SRAM (On-Chip SRAM을 이용한 임베디드 시스템 메모리 계층 최적화)

  • Kim, Jung-Won;Kim, Seung-Kyun;Lee, Jae-Jin;Jung, Chang-Hee;Woo, Duk-Kyun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.36 no.2
    • /
    • pp.102-110
    • /
    • 2009
  • The memory wall is the growing disparity of speed between CPU and memory outside the CPU chip. An economical solution is a memory hierarchy organized into several levels, such as processor registers, cache, main memory, disk storage. We introduce a novel memory hierarchy optimization technique in Linux based embedded systems using on-chip SRAM for the first time. The optimization technique allocates On-Chip SRAM to the code/data that selected by programmers by using virtual memory systems. Experiments performed with nine applications indicate that the runtime improvements can be achieved by up to 35%, with an average of 14%, and the energy consumption can be reduced by up to 40%, with an average of 15%.

Performane Modeling of Flash Memory Storage Systems Using Simulink (시뮬링크를 이용한 플래시메모리 저장장치 성능 모델링)

  • Min, Hang Jun;Park, Jeong Su;Lee, Joo Il;Min, Sang Lyul;Kim, Kanghee
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.5
    • /
    • pp.263-272
    • /
    • 2011
  • The complexity of flash memory based storage systems is high due to diverse host interfaces and other design choices such as mapping granularity, flash memory controller execution models and so on. Thus, it is possible that the actual performance after implementation is not consistent with the target performance. This paper demonstrates that the performance prediction of flash memory based storage systems is possible through performance modeling that takes into account various design parameters. In the performance modeling, the FTL, which is the core element of flash memory based storage systems, is modeled as a set of (copy-on-write) logs and their interactions. Also, the flash memory controller is modeled based on the classification proposed in the design of the Ozone flash controller. In this study, the performance model has been implemented using Simulink and experimental results are presented and analyzed.

An Implementation of Single Stack Multi-threading for Small Embedded Systems

  • Kim, Yong-Seok
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.4
    • /
    • pp.1-8
    • /
    • 2016
  • In small embedded systems including IoT devices, memory size is very small and it is important to reduce memory amount for execution of application programs. For multi-threaded applications, stack may consume a large amount of memory because each thread has its own stack of sufficiently large size for worst case. This paper presents an implementation of single stack multi-threading, called SSThread (Single Stack Thread), by sharing a stack for all threads to reduce stack memory size. By using SSThread, multi-threaded applications can be programmed based on normal C language environment and there is no requirement of transporting multi-threading operating systems. It consists of several library functions and various C macro definitions. Even though some functional restrictions in comparison to operating systems supporting complete multi-thread functionalities, it is very useful for small embedded systems with tiny memory size and it is simple to setup programming environment for multi-thread applications.

Performance and Energy Optimization for Low-Write Performance Non-volatile Main Memory Systems (낮은 쓰기 성능을 갖는 비휘발성 메인 메모리 시스템을 위한 성능 및 에너지 최적화 기법)

  • Jung, Woo-Soon;Lee, Hyung-Gyu
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.13 no.5
    • /
    • pp.245-252
    • /
    • 2018
  • Non-volatile RAM devices have been increasingly viewed as an alternative of DRAM main memory system. However some technologies including phase-change memory (PCM) are still suffering from relatively poor write performance as well as limited endurance. In this paper, we introduce a proactive last-level cache management to efficiently hide a low write performance of non-volatile main memory systems. The proposed method significantly reduces the cache miss penalty by proactively evicting the part of cachelines when the non-volatile main memory system is in idle state. Our trace-driven simulation demonstrates 24% performance enhancement, compared with a conventional LRU cache management, on the average.

Page Replacement for Write References in NAND Flash Based Virtual Memory Systems

  • Lee, Hyejeong;Bahn, Hyokyung;Shin, Kang G.
    • Journal of Computing Science and Engineering
    • /
    • v.8 no.3
    • /
    • pp.157-172
    • /
    • 2014
  • Contemporary embedded systems often use NAND flash memory instead of hard disks as their swap space of virtual memory. Since the read/write characteristics of NAND flash memory are very different from those of hard disks, an efficient page replacement algorithm is needed for this environment. Our analysis shows that temporal locality is dominant in virtual memory references but that is not the case for write references, when the read and write references are monitored separately. Based on this observation, we present a new page replacement algorithm that uses different strategies for read and write operations in predicting the re-reference likelihood of pages. For read operations, only temporal locality is used; but for write operations, both write frequency and temporal locality are used. The algorithm logically partitions the memory space into read and write areas to keep track of their reference patterns precisely, and then dynamically adjusts their size based on their reference patterns and I/O costs. Without requiring any external parameter to tune, the proposed algorithm outperforms CLOCK, CAR, and CFLRU by 20%-66%. It also supports optimized implementations for virtual memory systems.

Memory Design for Artificial Intelligence

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.1
    • /
    • pp.90-94
    • /
    • 2020
  • Artificial intelligence (AI) is software that learns large amounts of data and provides the desired results for certain patterns. In other words, learning a large amount of data is very important, and the role of memory in terms of computing systems is important. Massive data means wider bandwidth, and the design of the memory system that can provide it becomes even more important. Providing wide bandwidth in AI systems is also related to power consumption. AlphaGo, for example, consumes 170 kW of power using 1202 CPUs and 176 GPUs. Since more than 50% of the consumption of memory is usually used by system chips, a lot of investment is being made in memory technology for AI chips. MRAM, PRAM, ReRAM and Hybrid RAM are mainly studied. This study presents various memory technologies that are being studied in artificial intelligence chip design. Especially, MRAM and PRAM are commerciallized for the next generation memory. They have two significant advantages that are ultra low power consumption and nearly zero leakage power. This paper describes a comparative analysis of the four representative new memory technologies.

Large-Memory Data Processing on a Remote Memory System using Commodity Hardware (대용량 메모리 데이타 처리를 위한 범용 하드웨어 기반의 원격 메모리 시스템)

  • Jung, Hyung-Soo;Han, Hyuck;Yeom, Heon-Y.
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.9
    • /
    • pp.445-458
    • /
    • 2007
  • This article presents a novel infrastructure for large-memory database processing using commodity hardware with operating system support. We exploit inexpensive PCs and a high-speed network capable of Remote Direct Memory Access (RDMA) operations to build a new memory hierarchy between fast volatile memory and slow disk storage. The new memory hierarchy guarantees a reasonable response time, and its storage size enables us to run large-memory database systems with little performance degradation. The proposed architecture has two main components: (1) a remote memory system inside the Linux kernel to manage other computers' memory pages efficiently and (2) a remote memory pager responsible for manipulating remote read/write operations on remote memory pages. We insist that the proposed architecture is practical enough to support the rigorous demands of commercial in-memory database systems by demonstrating the performance of publicly available main-memory databases (e.g., MySQL) on our prototyped system. The experimental results show very interesting results from the TPC-C benchmark.

Efficient Management of PCM-based Swap Systems with a Small Page Size

  • Park, Yunjoo;Bahn, Hyokyung
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.5
    • /
    • pp.476-484
    • /
    • 2015
  • Due to the recent advances in non-volatile memory technologies such as PCM, a new memory hierarchy of computer systems is expected to appear. In this paper, we explore the performance of PCM-based swap systems and discuss how this system can be managed efficiently. Specifically, we introduce three management techniques. First, we show that the page fault handling time can be reduced by attaching PCM on DIMM slots, thereby eliminating the software stack overhead of block I/O and the context switch time. Second, we show that it is effective to reduce the page size and turn off the read-ahead option under the PCM swap system where the page fault handling time is sufficiently small. Third, we show that the performance is not degraded even with a small DRAM memory under a PCM swap device; this leads to the reduction of DRAM's energy consumption significantly compared to HDD-based swap systems. We expect that the result of this paper will lead to the transition of the legacy swap system structure of "large memory - slow swap" to a new paradigm of "small memory - fast swap."

A File System for Large-scale NAND Flash Memory Based Storage System

  • Son, Sunghoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.9
    • /
    • pp.1-8
    • /
    • 2017
  • In this paper, we propose a file system for flash memory which remedies shortcomings of existing flash memory file systems. Besides supporting large block size, the proposed file system reduces time in initializing file system significantly by adopting logical address comprised of erase block number and bitmap for pages in the block to find a page. The file system is suitable for embedded systems with limited main memory since it has small in-memory data structures. It also provides efficient management of obsolete blocks and free blocks, which contribute to the reduction of file update time. Finally the proposed file system can easily configure the maximum file size and file system size limits, which results in portability to emerging larger flash memories. By conducting performance evaluation studies, we show that the proposed file system can contribute to the performance improvement of embedded systems.

Bandwidth-aware Memory Placement on Hybrid Memories targeting High Performance Computing Systems

  • Lee, Jongmin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Modern computers provide tremendous computing capability and a large memory system. Hybrid memories consist of next generation memory devices and are adopted in high performance systems. However, the increased complexity of the microprocessor makes it difficult to operate the system effectively. In this paper, we propose a simple data migration method called Bandwidth-aware Data Migration (BDM) to efficiently use memory systems for high performance processors with hybrid memory. BDM monitors the status of applications running on the system using hardware performance monitoring tools and migrates the appropriate pages of selected applications to High Bandwidth Memory (HBM). BDM selects applications whose bandwidth usages are high and also evenly distributed among the threads. Experimental results show that BDM improves execution time by an average of 20% over baseline execution.