• Title/Abstract/Keywords: Memory performance

Search results: 3,115 items; processing time: 0.035 seconds

Gen-Z memory pool system implementation and performance measurement

  • Kwon, Won-ok;Sok, Song-Woo;Park, Chan-ho;Oh, Myeong-Hoon;Hong, Seokbin
    • ETRI Journal / Vol. 44, No. 3 / pp. 450-461 / 2022
  • The Gen-Z protocol is a memory-semantic protocol between memory and CPU, used in computer architectures with large memory pools. This study presents the implementation of a Gen-Z hardware system configured according to Gen-Z specification 1.0 and reports its performance. A hardware prototype of a DDR4 Gen-Z memory pool was designed, together with optimized character and block device drivers and a file system for the Gen-Z hardware. The Gen-Z IP was targeted to an FPGA, and a 512 GB Gen-Z memory pool was configured on an x86 server. In the experiments, the latency and throughput of the Gen-Z memory were measured and compared with those of local memory, a SATA SSD, and NVMe using character or block device interfaces. The Gen-Z hardware exhibited better throughput and latency than the SATA SSD and NVMe at block sizes under 4 kB. In the MySQL and File IO benchmarks, Gen-Z showed good write performance across all block sizes and thread counts. It also showed low latency in RocksDB's fillseq dbbench using the ext4 direct access filesystem.

Bandwidth-aware Memory Placement on Hybrid Memories targeting High Performance Computing Systems

  • Lee, Jongmin
    • 한국컴퓨터정보학회논문지 / Vol. 24, No. 8 / pp. 1-8 / 2019
  • Modern computers provide tremendous computing capability and a large memory system. Hybrid memories consist of next-generation memory devices and are adopted in high-performance systems. However, the increased complexity of the microprocessor makes it difficult to operate the system effectively. In this paper, we propose a simple data migration method called Bandwidth-aware Data Migration (BDM) to use memory systems efficiently on high-performance processors with hybrid memory. BDM monitors the status of applications running on the system using hardware performance monitoring tools and migrates the appropriate pages of selected applications to High Bandwidth Memory (HBM). BDM selects applications whose bandwidth usage is high and evenly distributed among the threads. Experimental results show that BDM improves execution time by an average of 20% over the baseline.
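
The selection rule described in this abstract (high bandwidth usage that is also evenly distributed among threads) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `select_for_hbm`, the two thresholds, and the use of the coefficient of variation as the evenness measure are all assumptions, and the per-thread bandwidth samples are made-up numbers standing in for hardware counter readings.

```python
from statistics import mean, pstdev


def select_for_hbm(apps, min_bandwidth_gbps=10.0, max_imbalance=0.25):
    """Return the names of apps whose bandwidth is high and evenly spread over threads.

    apps maps an application name to a non-empty list of per-thread
    bandwidth samples in GB/s. Both thresholds are hypothetical values.
    """
    selected = []
    for name, per_thread in apps.items():
        total = sum(per_thread)
        avg = mean(per_thread)
        # Coefficient of variation: 0 means perfectly even across threads.
        imbalance = pstdev(per_thread) / avg if avg > 0 else float("inf")
        if total >= min_bandwidth_gbps and imbalance <= max_imbalance:
            selected.append(name)
    return selected


# Made-up samples: one even heavy user, one skewed heavy user, one idle app.
apps = {
    "stream": [4.0, 4.1, 3.9, 4.0],
    "skewed": [15.0, 0.5, 0.3, 0.2],
    "idle": [0.1, 0.1, 0.1, 0.1],
}
print(select_for_hbm(apps))  # ['stream']
```

Only `stream` passes both checks: `skewed` has the same total bandwidth but concentrates it in one thread, and `idle` is even but uses too little bandwidth to be worth migrating.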

대규모 병렬 시스템에서 캐시와 공유메모리를 이용한 유한 차분법 성능 (Performance of the Finite Difference Method Using Cache and Shared Memory for Massively Parallel Systems)

  • 김현규;이효종
    • 전자공학회논문지 / Vol. 50, No. 4 / pp. 108-116 / 2013
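  • Recently, many methods have been developed to improve performance using massively parallel systems composed of hundreds of processors, such as GPU systems. A representative example is the use of shared memory on GPUs as a concept similar to caching. For algorithms such as image filters, which reference neighboring values to compute each output value, neighbor references occur frequently, so using shared memory improved performance. However, using shared memory requires reimplementing existing code, which increases code complexity. Recent GPU systems support L1 and L2 cache memory in addition to shared memory. Because L1 cache memory resides on the same hardware as shared memory, using the cache is expected to help improve performance. This paper therefore compares the performance of cache memory and shared memory. The results show that algorithms using cache memory and algorithms using shared memory performed similarly. In particular, using cache memory also resolved the increase in code complexity that arises when programming with shared memory.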

계층적 메모리 구성에 따른 GPU 성능 분석 (Analysis on the GPU Performance according to Hierarchical Memory Organization)

  • 최홍준;김종면;김철홍
    • 한국콘텐츠학회논문지 / Vol. 14, No. 3 / pp. 22-32 / 2014
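  • GPGPU technology has recently attracted great interest as a way to use GPUs, whose hardware is optimized for parallel computation, for general-purpose tasks beyond graphics. In massively parallel processing devices such as GPUs, the memory system has a large impact on performance. To improve memory system efficiency, GPUs use a hierarchical memory structure that reduces memory bandwidth utilization, along with techniques such as memory address coalescing and memory request merging that reduce the number of memory transactions. This paper experiments with various memory organizations to quantitatively evaluate and analyze how these techniques affect GPU performance. According to the experimental results, adding an L1 cache of 8 KB, 16 KB, 32 KB, or 64 KB improves performance by an average of 15.5%, 21.5%, 25.5%, and 30.9%, respectively, compared with using no cache. However, in some benchmark programs, performance actually decreases because memory transactions increase in order to maintain data coherence. In addition, when memory requests miss frequently, the average memory access latency can increase as the number of cache levels increases.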
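
The last observation in this abstract, that average memory access latency can rise with more cache levels when misses are frequent, follows from the standard average memory access time (AMAT) recurrence. The sketch below is only a worked illustration of that textbook formula; the function name `amat` and every latency and miss rate in it are assumed values, not measurements from the paper.

```python
def amat(levels, memory_latency):
    """Average memory access time for a cache hierarchy.

    levels is a list of (hit_latency, miss_rate) pairs ordered from L1 outward;
    memory_latency is the cost of going all the way to main memory.
    Every access pays the hit latency of a level it reaches, and a fraction
    miss_rate of those accesses also pays the cost of the next level.
    """
    total = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


MEMORY = 400  # cycles, hypothetical

# Hypothetical latencies and miss rates for a workload that misses almost always.
no_cache = amat([], MEMORY)
l1_only = amat([(30, 0.95)], MEMORY)
l1_and_l2 = amat([(30, 0.95), (100, 0.95)], MEMORY)
print(f"{no_cache:.1f} {l1_only:.1f} {l1_and_l2:.1f}")

# The same L1 with a low miss rate, for contrast.
print(f"{amat([(30, 0.2)], MEMORY):.1f}")
```

With a 95% miss rate, each added level mostly adds its own lookup latency on the way to memory, so the average grows with depth. With a 20% miss rate the same L1 cuts the average well below the uncached latency.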

다중 블록 지우기 기능을 적용한 퓨전 플래시 메모리의 FTL 성능 측정 도구 설계 및 구현 (Design and Implementation of FTL Performance Measurement Tool using Multi Block Erase of Fusion Flash Memory)

  • 이동환;조원희;김덕환
    • 대한전자공학회: Conference Proceedings / 대한전자공학회 2008 Summer Conference / pp. 647-648 / 2008
  • Traditional FTLs and flash file systems based on NAND flash memory may not adapt well to new fusion flash memory, which combines the advantages of NAND and NOR flash memory. In this paper, we propose an FTL performance measurement tool that uses the Multi Block Erase function of fusion flash memory. The tool shows that the multi-block erase function can be used effectively to improve garbage collection performance for fusion flash memory.


Filter Driver 와 NAND FLASH Memory를 이용한 HDD 장치의 성능 개선에 관한 연구 (A Study of HDD Performance Improvement through Filter Driver & NAND FLASH Memory)

  • 김우길;김영길
    • 한국정보통신학회: Conference Proceedings / 한국해양정보통신학회 2010 Fall Conference / pp. 58-61 / 2010
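  • This paper studies a method for improving the slow I/O performance of HDD storage devices by applying an I/O filter driver and NAND flash memory. We applied NAND flash memory, a semiconductor component with fast I/O performance, together with a filter driver (device driver) to operate it. On this basis, we analyze the improved I/O performance of the HDD storage device and propose a method for improving it.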


SSD를 위한 최적화 파일시스템 (An Optimized File System for SSD)

  • 박제호
    • 반도체디스플레이기술학회지 / Vol. 9, No. 2 / pp. 67-72 / 2010
  • Flash memory is increasingly used in mobile and ubiquitous computing devices because of its non-volatility, fast response time, shock resistance, and low power consumption. Following this trend, SSDs (Solid State Disks), which use multiple flash chips instead of a hard-drive-based storage system, have come into wide use for their advantageous features. However, a flash memory-based storage subsystem must resolve the write performance bottleneck, in terms of both speed and lifetime, that stems from the physical properties of flash. To provide tangible performance, solutions have been studied that reclaim invalid regions while decreasing the number of erasures and distributing the erasures as uniformly as possible over the whole memory space. In this paper, we study flash memory recycling algorithms with multiple management units and demonstrate that the proposed algorithm provides feasible performance. The proposed method partitions the memory space using threshold values and reconfigures the management units if necessary. The performance of the proposed policies is evaluated through a number of simulation-based experiments.

메모리 상주 DBMS에서의 응용 트랜잭션 성능평가에 관한 연구 (A Study on the Performance Evaluation of Application Transaction in the Main Memory DBMS)

  • 김희완;이혜경
    • 디지털산업정보학회논문지 / Vol. 5, No. 4 / pp. 19-26 / 2009
  • Recently, the use of main memory DBMSs has been expanding with the appearance of large-capacity main memory systems, the growth of business areas that require real-time processing, and rising user expectations. Main memory DBMSs, which can handle the large-capacity data processing of disk-based DBMSs while guaranteeing high efficiency, have been developed domestically and put to practical use. This paper examines the applied technologies and limits of the Altibase system, a main memory DBMS. It also evaluates and comparatively analyzes the performance of the main memory DBMS and a disk-based DBMS running the same application. Five trials of the experiment with the operating application confirmed that the performance of the main memory DBMS was 4.13 to 7.89 times higher than that of the disk-based DBMS.

A Memory-Efficient Block-wise MAP Decoder Architecture

  • Kim, Sik;Hwang, Sun-Young;Kang, Moon-Jun
    • ETRI Journal / Vol. 26, No. 6 / pp. 615-621 / 2004
  • Next-generation mobile communication systems, such as IMT-2000, adopt Turbo codes because of their powerful error correction capability. This paper presents a block-wise maximum a posteriori (MAP) Turbo decoding structure with a low memory requirement. We observed that the training size and block size determine the amount of required memory and the bit-error rate (BER) performance of the block-wise MAP decoder, and that comparable BER performance can be obtained with much shorter blocks when the training size is sufficient. Based on this observation, a new decoding structure is proposed. The proposed block-wise decoder reduces the memory requirement by setting the training size to N times the block size. The memory requirement for storing the branch and state metrics can be reduced by 30% to 45%, and synthesis results show that the overall memory area can be reduced by 5.27% to 7.29% compared with previous MAP decoders. The proposed scheme maintains decoder throughput without degrading BER performance.


Divided Disk Cache and SSD FTL for Improving Performance in Storage

  • Park, Jung Kyu;Lee, Jun-yong;Noh, Sam H.
    • JSTS: Journal of Semiconductor Technology and Science / Vol. 17, No. 1 / pp. 15-22 / 2017
  • Although there are many efficient techniques to minimize the speed gap between the processor and memory, the gap remains a bottleneck for various commercial implementations. Since secondary memory technologies are much slower than main memory, it is challenging to match memory speed to the processor. Hard disk drives usually include semiconductor caches to improve their performance. A hit in the disk cache eliminates the mechanical seek time and rotational latency. To further improve performance, a divided disk cache, subdivided between metadata and data, has been proposed previously. We propose a new algorithm that applies this approach to a flash memory-based solid state drive (SSD) through its FTL. First, this paper evaluates the performance of such a disk cache via simulations using DiskSim. Then, we perform an experiment to evaluate the performance of the proposed algorithm.
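
The idea of a disk cache divided between metadata and data, mentioned in this abstract, can be illustrated with two independent LRU partitions. This is a minimal sketch under stated assumptions: the class names `LRUCache` and `DividedDiskCache`, the LRU replacement policy, the partition sizes, and the access trace are all hypothetical, and the code does not model DiskSim or the paper's FTL algorithm.

```python
from collections import OrderedDict


class LRUCache:
    """A fixed-capacity cache of block identifiers with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit; on a miss, insert the block and evict the LRU entry if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return False


class DividedDiskCache:
    """Two separate LRU partitions, so data traffic cannot evict metadata."""

    def __init__(self, metadata_capacity, data_capacity):
        self.metadata = LRUCache(metadata_capacity)
        self.data = LRUCache(data_capacity)

    def access(self, block, is_metadata):
        partition = self.metadata if is_metadata else self.data
        return partition.access(block)


# Hypothetical trace: one metadata block, a scan over six data blocks,
# then the same metadata block again.
trace = [("inode", True)] + [(f"d{i}", False) for i in range(6)] + [("inode", True)]

unified = LRUCache(4)
divided = DividedDiskCache(metadata_capacity=1, data_capacity=3)

unified_hits = [unified.access(block) for block, _ in trace]
divided_hits = [divided.access(block, is_meta) for block, is_meta in trace]
print(unified_hits[-1], divided_hits[-1])  # False True
```

With the same total capacity of four blocks, the data scan pushes the metadata block out of the unified cache, while the divided cache still holds it when it is requested again.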