• 제목/요약/키워드: In-Memory Computing

검색결과 752건 처리시간 0.021초

Hybrid in-memory storage for cloud infrastructure

  • Kim, Dae Won;Kim, Sun Wook;Oh, Soo Cheol
    • 인터넷정보학회논문지
    • /
    • 제22권5호
    • /
    • pp.57-67
    • /
    • 2021
  • Modern cloud computing is rapidly changing from traditional hypervisor-based virtual machines to container-based cloud-native environments. Due to limitations in I/O performance required for both virtual machines and containers, the use of high-speed storage (SSD, NVMe, etc.) is increasing, and in-memory computing using main memory is also emerging. Running a virtual environment on main memory gives better performance compared to other storage arrays. However, RAM used as main memory is expensive and due to its volatile characteristics, data is lost when the system goes down. Therefore, additional work is required to run the virtual environment in main memory. In this paper, we propose a hybrid in-memory storage that combines a block storage such as a high-speed SSD with main memory to safely operate virtual machines and containers on main memory. In addition, the proposed storage showed 6 times faster write speed and 42 times faster read operation compared to regular disks for virtual machines, and showed the average 12% improvement of container's performance tests.

Gen-Z memory pool system implementation and performance measurement

  • Kwon, Won-ok;Sok, Song-Woo;Park, Chan-ho;Oh, Myeong-Hoon;Hong, Seokbin
    • ETRI Journal
    • /
    • 제44권3호
    • /
    • pp.450-461
    • /
    • 2022
  • The Gen-Z protocol is a memory semantic protocol between the memory and CPU used in computer architectures with large memory pools. This study presents the implementation of the Gen-Z hardware system configured using Gen-Z specification 1.0 and reports its performance. A hardware prototype of a DDR4 Gen-Z memory pool with an optimized character, a block device driver, and a file system for the Gen-Z hardware was designed. The Gen-Z IP was targeted to the FPGA, and a 512 GB Gen-Z memory pool was configured on an ×86 server. In the experiments, the latency and throughput of the Gen-Z memory were measured and compared with those of the local memory, SATA SSD, and NVMe using character or block device interfaces. The Gen-Z hardware exhibited superior throughput and latency performance compared with SATA SSD and NVMe at block sizes under 4 kB. The MySQL and File IO benchmark of Gen-Z showed good write performance in all block sizes and threads. Besides, it showed low latency in RocksDB's fillseq dbbench using the ext4 direct access filesystem.

CXL 메모리 및 활용 소프트웨어 기술 동향 (Technology Trends in CXL Memory and Utilization Software )

  • 안후영;김선영;박유미;한우종
    • 전자통신동향분석
    • /
    • 제39권1호
    • /
    • pp.62-73
    • /
    • 2024
  • Artificial intelligence relies on data-driven analysis, and the data processing performance strongly depends on factors such as memory capacity, bandwidth, and latency. Fast and large-capacity memory can be achieved by composing numerous high-performance memory units connected via high-performance interconnects, such as Compute Express Link (CXL). CXL is designed to enable efficient communication between central processing units, memory, accelerators, storage, and other computing resources. By adopting CXL, a composable computing architecture can be implemented, enabling flexible server resource configuration using a pool of computing resources. Thus, manufacturers are actively developing hardware and software solutions to support CXL. We present a survey of the latest software for CXL memory utilization and the most recent CXL memory emulation software. The former supports efficient use of CXL memory, and the latter offers a development environment that allows developers to optimize their software for the hardware architecture before commercial release of CXL memory devices. Furthermore, we review key technologies for improving the performance of both the CXL memory pool and CXL-based composable computing architecture along with various use cases.

QPlayer: Lightweight, scalable, and fast quantum simulator

  • Ki-Sung Jin;Gyu-Il Cha
    • ETRI Journal
    • /
    • 제45권2호
    • /
    • pp.304-317
    • /
    • 2023
  • With the rapid evolution of quantum computing, digital quantum simulations are essential for quantum algorithm verification, quantum error analysis, and new quantum applications. However, the exponential increase in memory overhead and operation time is challenging issues that have not been solved for years. We propose a novel approach that provides more qubits and faster quantum operations with smaller memory than before. Our method selectively tracks realized quantum states using a reduced quantum state representation scheme instead of loading the entire quantum states into memory. This method dramatically reduces memory space ensuring fast quantum computations without compromising the global quantum states. Furthermore, our empirical evaluation reveals that our proposed idea outperforms traditional methods for various algorithms. We verified that the Grover algorithm supports up to 55 qubits and the surface code algorithm supports up to 85 qubits in 512 GB memory on a single computational node, which is against the previous studies that support only between 35 qubits and 49 qubits.

가상 구역에 따른 메모리 자가 치유에 대한 분석 알고리즘 (Analysis Algorithm for Memory BISR as Imagination Zone)

  • 박재흥;심은성;장훈
    • 대한전자공학회논문지SD
    • /
    • 제46권12호
    • /
    • pp.73-79
    • /
    • 2009
  • 최근 VLSI 회로 직접도가 급속도로 증가함에 따라 하나의 시스템 칩에 고밀도와 고용량의 내장 메모리(Embedded Memory)가 구현되고 있다. 고장난 메모리를 여분 메모리(Spare Memory)로 재배치함으로써 메모리 수율 향상과 사용자에게 메모리를 투명하게 사용할 수 있도록 제공할 수 있다. 본 논문에서는 고장난 메모리 부분을 여분 메모리의 행과 열 메모리 사용으로 고장난 메모리를 고장이 없는 메모리처럼 사용할 수 있도록 여분 메모리 재배치 알고리즘인 MRI를 제안하고자 한다.

CXL 인터커넥트 기술 연구개발 동향 (Trends in Compute Express Link(CXL) Technology)

  • 김선영;안후영;박유미;한우종
    • 전자통신동향분석
    • /
    • 제38권5호
    • /
    • pp.23-33
    • /
    • 2023
  • With the widespread demand from data-intensive tasks such as machine learning and large-scale databases, the amount of data processed in modern computing systems is increasing exponentially. Such data-intensive tasks require large amounts of memory to rapidly process and analyze massive data. However, existing computing system architectures face challenges when building large-scale memory owing to various structural issues such as CPU specifications. Moreover, large-scale memory may cause problems including memory overprovisioning. The Compute Express Link (CXL) allows computing nodes to use large amounts of memory while mitigating related problems. Hence, CXL is attracting great attention in industry and academia. We describe the overarching concepts underlying CXL and explore recent research trends in this technology.

CCIX 연결망과 메모리 확장기술 동향 (Trends of the CCIX Interconnect and Memory Expansion Technology)

  • 김선영;안후영;전성익;박유미;한우종
    • 전자통신동향분석
    • /
    • 제37권1호
    • /
    • pp.42-52
    • /
    • 2022
  • With the advent of the big data era, the memory capacity required for computing systems is rapidly increasing, especially in High Performance Computing systems. However, the number of DRAMs that can be used in a computing node is limited by the structural limitations of the hardware (for example, CPU specifications). Memory expansion technology has attracted attention as a means of overcoming this limitation. This technology expands the memory capacity by leveraging the external memory connected to the host system through hardware interface such as PCIe and CCIX. In this paper, we present an overview and describe the development trends of the memory expansion technology. We also provide detailed descriptions and use cases of the CCIX that provides higher bandwidth and lower latency than cases of the PCIe.

SRAM 이중-포트를 위한 내장된 메모리 BIST IP 자동생성 시스템 개발 (The Development on Embedded Memory BIST IP Automatic Generation System for the Dual-Port of SRAM)

  • 심은성;이정민;이찬영;장훈
    • 대한전자공학회논문지SD
    • /
    • 제42권2호
    • /
    • pp.57-64
    • /
    • 2005
  • 본 논문에서는 내장된 메모리의 테스트를 편리하게 하기 위하여 간단한 사용자 설정에 의해 자동으로 BIST IP를 생성해 내는 범용 CAD 툴을 개발하였다. 기존의 툴들은 널리 사용되고 있는 알고리즘에 국한되어 있어 메모리의 모델이 변하게 되면 다시 메모리 모델에 따라 BIST IP를 설계해야 하는 번거로움이 있었다. 하지만 본 논문에서는 사용자가 원하는 메모리 모델에 따라 알고리즘을 적용해 자동으로 BIST IP를 생성해 주는 툴을 개발하였다. 내장된 메모리로는 리프레쉬가 필요 없는 다중-포트 비동기식 SRAM이 가장 많이 사용되며, 본 연구에서는 이중-포트 SRAM에 대하여 연구 하였다.

OpenCL을 활용한 이기종 파이프라인 컴퓨팅 기반 Spark 프레임워크 (Spark Framework Based on a Heterogenous Pipeline Computing with OpenCL)

  • 김대희;박능수
    • 전기학회논문지
    • /
    • 제67권2호
    • /
    • pp.270-276
    • /
    • 2018
  • Apache Spark is one of the high performance in-memory computing frameworks for big-data processing. Recently, to improve the performance, general-purpose computing on graphics processing unit(GPGPU) is adapted to Apache Spark framework. Previous Spark-GPGPU frameworks focus on overcoming the difficulty of an implementation resulting from the difference between the computation environment of GPGPU and Spark framework. In this paper, we propose a Spark framework based on a heterogenous pipeline computing with OpenCL to further improve the performance. The proposed framework overlaps the Java-to-Native memory copies of CPU with CPU-GPU communications(DMA) and GPU kernel computations to hide the CPU idle time. Also, CPU-GPU communication buffers are implemented with switching dual buffers, which reduce the mapped memory region resulting in decreasing memory mapping overhead. Experimental results showed that the proposed Spark framework based on a heterogenous pipeline computing with OpenCL had up to 2.13 times faster than the previous Spark framework using OpenCL.

Design and Implementation of Memory-Centric Computing System for Big Data Analysis

  • Jung, Byung-Kwon
    • 한국컴퓨터정보학회논문지
    • /
    • 제27권7호
    • /
    • pp.1-7
    • /
    • 2022
  • 최근 대용량 데이터를 프로그램 자체에서 생성시키면서 구동되는 빅데이터 프로그램, 머신 러닝 프로그램 같은 응용 프로그램의 사용이 일상화됨에 따라 기존의 메인 메모리만으로는 메모리가 부족하여 프로그램의 빠른 실행이 어려운 경우가 발생하고 있다. 특히, 코로나 변이 바이러스 발생으로 염기서열 전체의 유전 변이 여부를 분석해야 하는 상황에는 더욱 빠르게 결과를 도출해야 하는 필요성이 대두되었다. 대용량 데이터를 병렬실행으로 빠른 결과를 필요로 하는 전장유전체(WGS; Whole Genome Sequencing) 분석 방법에 기존 SSD에서 대용량 데이터를 처리하는 것이 아닌 자체 개발한 메모리풀 MOCA host adapter가 장착된 컴퓨팅 시스템에 적용하여 성능을 측정한 결과 기존 SSD 시스템에 비해 16%의 성능 향상이 있었다. 그리고, 그 외의 다양한 벤치마크 시험에서도 워크플로우의 task별 SortSampleBam, ApplyBQSR, GatherBamFiles등 메모리풀 MOCA host adapter가 장착된 컴퓨팅 시스템에서도 SSD를 사용한 경우보다 IO 성능이 각각 92.8%, 80.6%, 32.8% 실행시간 단축을 보였다. 전장유전체파이프라인 분석같이 대용량 데이터 분석시 본 연구에서 개발한 메모리풀 MOCA host adapter가 장착된 컴퓨팅 시스템에서 분석할 경우 런타임(run time)시 발생하는 측정 지연을 줄일 수 있을 것으로 판단된다.