• Title/Summary/Keyword: Memory Performance

Search Result 3,157, Processing Time 0.03 seconds

Gen-Z memory pool system implementation and performance measurement

  • Kwon, Won-ok;Sok, Song-Woo;Park, Chan-ho;Oh, Myeong-Hoon;Hong, Seokbin
    • ETRI Journal
    • /
    • v.44 no.3
    • /
    • pp.450-461
    • /
    • 2022
  • The Gen-Z protocol is a memory semantic protocol between the memory and CPU used in computer architectures with large memory pools. This study presents the implementation of the Gen-Z hardware system configured using Gen-Z specification 1.0 and reports its performance. A hardware prototype of a DDR4 Gen-Z memory pool with an optimized character, a block device driver, and a file system for the Gen-Z hardware was designed. The Gen-Z IP was targeted to the FPGA, and a 512 GB Gen-Z memory pool was configured on an ×86 server. In the experiments, the latency and throughput of the Gen-Z memory were measured and compared with those of the local memory, SATA SSD, and NVMe using character or block device interfaces. The Gen-Z hardware exhibited superior throughput and latency performance compared with SATA SSD and NVMe at block sizes under 4 kB. The MySQL and File IO benchmark of Gen-Z showed good write performance in all block sizes and threads. Besides, it showed low latency in RocksDB's fillseq dbbench using the ext4 direct access filesystem.

Bandwidth-aware Memory Placement on Hybrid Memories targeting High Performance Computing Systems

  • Lee, Jongmin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Modern computers provide tremendous computing capability and a large memory system. Hybrid memories consist of next generation memory devices and are adopted in high performance systems. However, the increased complexity of the microprocessor makes it difficult to operate the system effectively. In this paper, we propose a simple data migration method called Bandwidth-aware Data Migration (BDM) to efficiently use memory systems for high performance processors with hybrid memory. BDM monitors the status of applications running on the system using hardware performance monitoring tools and migrates the appropriate pages of selected applications to High Bandwidth Memory (HBM). BDM selects applications whose bandwidth usages are high and also evenly distributed among the threads. Experimental results show that BDM improves execution time by an average of 20% over baseline execution.

Performance of the Finite Difference Method Using Cache and Shared Memory for Massively Parallel Systems (대규모 병렬 시스템에서 캐시와 공유메모리를 이용한 유한 차분법 성능)

  • Kim, Hyun Kyu;Lee, Hyo Jong
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.4
    • /
    • pp.108-116
    • /
    • 2013
  • Many algorithms have been introduced to improve performance by using massively parallel systems, which consist of several hundreds of processors. A typical example is a GPU system of many processors which uses shared memory. In the case of image filtering algorithms, which make references to neighboring points, the shared memory helps improve performance by frequently accessing adjacent pixels. However, using shared memory requires rewriting the existing codes and consequently results in complexity of the codes. Recent GPU systems support both L1 and L2 cache along with shared memory. Since the L1 cache memory is located in the same area as the shared memory, the improvement of performance is predictable by using the cache memory. In this paper, the performance of cache and shared memory were compared. In conclusion, the performance of cache-based algorithm is very similar to the one of shared memory. The complexity of the code appearing in a shared memory system, however, is resolved with the cache-based algorithm.

Analysis on the GPU Performance according to Hierarchical Memory Organization (계층적 메모리 구성에 따른 GPU 성능 분석)

  • Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.3
    • /
    • pp.22-32
    • /
    • 2014
  • Recently, GPGPU has been widely used for general-purpose processing as well as graphics processing by providing optimized hardware for parallel processing. Memory system has big effects on the performance of parallel processing units such as GPU. In the GPU, hierarchical memory architecture is implemented for high memory bandwidth. Moreover, both memory address coalescing and memory request merging techniques are widely used. This paper analyzes the GPU performance according to various memory organizations. According to our simulation results, GPU performance improves by 15.5%, 21.5%, 25.5%, 30.9% as adding 8KB L1, 16KB L1, 32KB L1, 64KB L1 cache, respectively, compared to case without L1 cache. However, experimental results show that some benchmarks decrease performance since memory transaction increases due to data dependency. Moreover, average memory access latency is increased as the depth of hierarchical cache level increases when cache miss occurs significantly.

Design and Implementation of FTL Performance Measurement Tool using Multi Block Erase of Fusion Flash Memory (다중 블록 지우기 기능을 적용한 퓨전 플래시 메모리의 FTL 성능 측정 도구 설계 및 구현)

  • Lee, Dong-Hwan;Cho, Won-Hee;Kim, Deok-Hwan
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.647-648
    • /
    • 2008
  • Traditional FTL and flash file systems based of NAND flash memory may not be adaptively applied to new fusion flash memory which combines the advantages of NAND and NOR flash memory. In this paper, we propose a FTL performance measurement tool using Multi Block Erase function of fusion flash memory. The performance measurement tool shows that multi block erase function can be effectively utilized in performance enhancement of garbage collection for fusion flash memory.

  • PDF

A Study of HDD Performance Improvement through Filter Driver & NAND FLASH Memory (Filter Driver 와 NAND FLASH Memory를 이용한 HDD 장치의 성능 개선에 관한 연구)

  • Kim, Woo-Gil;Kim, Young-Kil
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2010.10a
    • /
    • pp.58-61
    • /
    • 2010
  • In this paper, we research the method for HDD I/O Performance improvement by Filter Driver &NAND FLASH Memory. We analyze the effect of the operation of the Device Driver & NAND FLASH Memory and propose the method for the HDD I/O Performance improvement.

  • PDF

An Optimized File System for SSD (SSD를 위한 최적화 파일시스템)

  • Park, Je-Ho
    • Journal of the Semiconductor & Display Technology
    • /
    • v.9 no.2
    • /
    • pp.67-72
    • /
    • 2010
  • Recently increasing application of flash memory in mobile and ubiquitous related devices is due to its non-volatility, fast response time, shock resistance and low power consumption. Following this trend, SSD(Solid State Disk) using multiple flash chips, instead of hard-drive based storage system, started to widely used for its advantageous features. However, flash memory based storage subsystem should resolve the performance bottleneck for writing in perspective of speed and lifetime according to its disadvantageous physical property. In order to provide tangible performance, solutions are studied in aspect of reclaiming of invalid regions by decreasing the number of erasures and distributing the erasures uniformly over the whole memory space as much as possible. In this paper, we study flash memory recycling algorithms with multiple management units and demonstrate that the proposed algorithm provides feasible performance. The proposed method utilizes the partitions of the memory space by utilizing threshold values and reconfigures the management units if necessary. The performance of the proposed policies is evaluated through a number of simulation based experiments.

A Study on the Performance Evaluation of Application Transaction in the Main Memory DBMS (메모리 상주 DBMS에서의 응용 트랜잭션 성능평가에 관한 연구)

  • Kim, Hee Wan;Rhee, Hae Kyung
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.5 no.4
    • /
    • pp.19-26
    • /
    • 2009
  • Recently, the Main Memory DBMS is gradually being expanded by the appearance of a large capacity of a Main Memory System, the increase in business area where it requires a real time process, and the rise of the users' required level. The Main Memory DBMS, which is able to go through a large capacity data process of the disk-based DBMS and guarantees a high efficiency, has domestically developed and has been put to a practical use. This paper presents an examination of the applied technologies and the limits of Altibase system, which is Main Memory DBMS. Moreover, it evaluated and performed a comparative analysis on the performance level of the Main Memory DBMS and the disk-based DBMS based on the same application. After five trials of the experiment based on the operating application, it was confirmed that the performance level of the Main Memory DBMS is enhanced and is higher by 4.13 to 7.89 times than the disk-based DBMS.

A Memory-Efficient Block-wise MAP Decoder Architecture

  • Kim, Sik;Hwang, Sun-Young;Kang, Moon-Jun
    • ETRI Journal
    • /
    • v.26 no.6
    • /
    • pp.615-621
    • /
    • 2004
  • Next generation mobile communication system, such as IMT-2000, adopts Turbo codes due to their powerful error correction capability. This paper presents a block-wise maximum a posteriori (MAP) Turbo decoding structure with a low memory requirement. During this research, it has been observed that the training size and block size determine the amount of required memory and bit-error rate (BER) performance of the block-wise MAP decoder, and that comparable BER performance can be obtained with much shorter blocks when the training size is sufficient. Based on this observation, a new decoding structure is proposed and presented in this paper. The proposed block-wise decoder employs a decoding scheme for reducing the memory requirement by setting the training size to be N times the block size. The memory requirement for storing the branch and state metrics can be reduced 30% to 45%, and synthesis results show that the overall memory area can be reduced by 5.27% to 7.29%, when compared to previous MAP decoders. The decoder throughput can be maintained in the proposed scheme without degrading the BER performance.

  • PDF

Divided Disk Cache and SSD FTL for Improving Performance in Storage

  • Park, Jung Kyu;Lee, Jun-yong;Noh, Sam H.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.17 no.1
    • /
    • pp.15-22
    • /
    • 2017
  • Although there are many efficient techniques to minimize the speed gap between processor and the memory, it remains a bottleneck for various commercial implementations. Since secondary memory technologies are much slower than main memory, it is challenging to match memory speed to the processor. Usually, hard disk drives include semiconductor caches to improve their performance. A hit in the disk cache eliminates the mechanical seek time and rotational latency. To further improve performance a divided disk cache, subdivided between metadata and data, has been proposed previously. We propose a new algorithm to apply the SSD that is flash memory-based solid state drive by applying FTL. First, this paper evaluates the performance of such a disk cache via simulations using DiskSim. Then, we perform an experiment to evaluate the performance of the proposed algorithm.