• Title/Summary/Keyword: cache coherence (캐시 일관성)


A Development of Fusion Processor Architecture for Efficient Main Memory Access in CPU-GPU Environment (CPU-GPU환경에서 효율적인 메인메모리 접근을 위한 융합 프로세서 구조 개발)

  • Park, Hyun-Moon;Kwon, Jin-San;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea institute of electronic communication sciences, v.11 no.2, pp.151-158, 2016
  • The HSA (Heterogeneous System Architecture) resolves a long-standing problem with existing CPU and GPU architectures by allowing both units to directly access each other's memory pools via unified virtual memory. In a physically realized system, however, frequent data exchanges between the CPU and GPU for a virtual memory block result in bottlenecks and coherence request overheads. In this paper, we propose a Fusion Processor Architecture for efficient main memory access from both the CPU and GPU. It consists of a Job Manager, a Re-mapper, and a Pre-fetcher, which control, organize, and distribute workloads and working areas for the GPU cores. These components help reduce memory exchanges between the two processors and improve overall efficiency by eliminating faulty page table requests. To verify the proposed architecture, we develop an emulator based on QEMU and compare it against several alternatives such as CUDA (Compute Unified Device Architecture), OpenMP, and OpenCL. As a result, the proposed fusion processor architecture is 198% faster than the others because it removes unnecessary memory copies and cache-miss overheads.

A Performance Study on CPU-GPU Data Transfers of Unified Memory Device (통합메모리 장치에서 CPU-GPU 데이터 전송성능 연구)

  • Kwon, Oh-Kyoung;Gu, Gibeom
    • KIPS Transactions on Computer and Communication Systems, v.11 no.5, pp.133-138, 2022
  • As GPU performance has improved, GPUs have become more common in HPC and artificial intelligence, but GPU programming is still a major obstacle to productivity. In particular, because managing host memory and GPU memory separately is difficult, convenience and performance are being actively researched, and various CPU-GPU memory transfer programming methods have been proposed. Meanwhile, many SoC (System on a Chip) products, such as Apple M1 and NVIDIA Tegra, that bundle a CPU, a GPU, and integrated memory into one large silicon package have recently emerged. This study examines the performance of data transfers between the CPU and GPU on such integrated memory devices, which show different characteristics from the existing environment in which host memory on the CPU side and GPU memory are separated. We compare performance by CPU-GPU data transfer method on NVIDIA SoC chips, which are integrated memory devices, and on NVIDIA SMX-based V100 GPU devices. The experimental workload for the comparison is two-dimensional matrix transposition, an example frequently used in HPC applications. We analyzed the following performance factors: the difference in GPU kernel performance according to the CPU-GPU memory transfer method for each GPU device, the difference in transfer performance between page-locked memory and pageable memory, overall performance, and performance by workload size. The experiment confirmed that the NVIDIA Xavier can maximize the benefits of integrated memory in the SoC chip by supporting I/O cache coherence.

Implementation of a DB-Based Virtual File System for Lightweight IoT Clouds (경량 사물 인터넷 클라우드를 위한 DB 기반 가상 파일 시스템 구현)

  • Lee, Hyung-Bong;Kwon, Ki-Hyeon
    • KIPS Transactions on Computer and Communication Systems, v.3 no.10, pp.311-322, 2014
  • IoT (Internet of Things) is a concept of a connected internet that pursues direct access to devices or sensors in a fused environment of personal, industrial, and public areas. In an IoT environment, real-time data can be accessed, and the data formats and topologies of devices are diverse. There is also bidirectional communication between users and devices in order to control actuators. In this respect, IoT differs from the conventional internet, in which data are produced by humans at desktops and gathered in server systems by way of simple one-way internet communication. A cloud or portal service for IoT therefore needs a file management framework that supports a systematic naming service and a unified data access interface encompassing the variety of IoT things. This paper implements a DB-based virtual file system that maintains the attributes of IoT things in a UNIX-style file system view. Users who have logged in to the virtual shell can explore IoT things by navigating the virtual file system and can access IoT things directly via UNIX-style file I/O APIs. The implemented virtual file system is lightweight and flexible because it maintains only the directory structure and descriptors for the distributed IoT things. A test of virtual shell primitives such as mkdir() and chdir() shows that the virtual file system functions smoothly. In addition, when a simple directory cache mechanism is adopted, the exploration performance of the file system is better than that of the Windows file system.
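
The idea in this abstract, a database that holds only the directory structure and descriptors for things, navigated through mkdir()/chdir()-style primitives with a simple directory cache, can be illustrated with a minimal sketch. This is not the authors' implementation: the use of SQLite, the `node` table layout, the `VirtualFS` class, the `register()` and `descriptor()` helpers, and the example `coap://` address are all assumptions made for illustration, and `..` path components are not handled.

```python
import sqlite3


class VirtualFS:
    """Toy DB-backed virtual file system (hypothetical schema and API).

    The database stores only directory structure and a descriptor string
    per thing; no device data is kept here.
    """

    ROOT = 1

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
            "name TEXT, is_dir INTEGER, descriptor TEXT, UNIQUE(parent, name))"
        )
        self.db.execute("INSERT INTO node VALUES (1, NULL, '', 1, NULL)")  # root
        self.cwd = self.ROOT
        self.cache = {}  # simple directory cache: (parent id, name) -> node id

    def _lookup(self, parent, name):
        key = (parent, name)
        if key in self.cache:  # cache hit: no database query needed
            return self.cache[key]
        row = self.db.execute(
            "SELECT id FROM node WHERE parent = ? AND name = ?", key
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        self.cache[key] = row[0]
        return row[0]

    def _resolve(self, path):
        node = self.ROOT if path.startswith("/") else self.cwd
        for part in path.split("/"):
            if part not in ("", "."):
                node = self._lookup(node, part)
        return node

    def _split(self, path):
        head, _, name = path.rstrip("/").rpartition("/")
        if head:
            return self._resolve(head), name
        return (self.ROOT if path.startswith("/") else self.cwd), name

    def _is_dir(self, node):
        row = self.db.execute(
            "SELECT is_dir FROM node WHERE id = ?", (node,)
        ).fetchone()
        return bool(row[0])

    def _create(self, path, is_dir, descriptor):
        parent, name = self._split(path)
        if not self._is_dir(parent):
            raise NotADirectoryError(path)
        self.db.execute(
            "INSERT INTO node (parent, name, is_dir, descriptor) VALUES (?, ?, ?, ?)",
            (parent, name, is_dir, descriptor),
        )

    def mkdir(self, path):
        self._create(path, 1, None)

    def register(self, path, descriptor):
        """Add a thing entry whose descriptor says how to reach the device."""
        self._create(path, 0, descriptor)

    def chdir(self, path):
        node = self._resolve(path)
        if not self._is_dir(node):
            raise NotADirectoryError(path)
        self.cwd = node

    def listdir(self, path="."):
        rows = self.db.execute(
            "SELECT name FROM node WHERE parent = ? ORDER BY name",
            (self._resolve(path),),
        )
        return [r[0] for r in rows]

    def descriptor(self, path):
        row = self.db.execute(
            "SELECT descriptor FROM node WHERE id = ?", (self._resolve(path),)
        ).fetchone()
        return row[0]


fs = VirtualFS()
fs.mkdir("/home")
fs.mkdir("/home/sensors")
fs.chdir("/home")
fs.register("sensors/temp1", "coap://10.0.0.7/temp")  # made-up address
print(fs.listdir("sensors"))
print(fs.descriptor("/home/sensors/temp1"))
```

In this sketch the cache is a plain dictionary keyed by parent id and name, so repeated path resolution skips the database entirely; a real system would also need invalidation when entries are renamed or removed.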