• Title/Summary/Keyword: memory I/O

APC: An Adaptive Page Prefetching Control Scheme in Virtual Memory System (APC: 가상 메모리 시스템에서 적응적 페이지 선반입 제어 기법)

  • Ahn, Woo-Hyun;Yang, Jong-Cheol;Oh, Jae-Won
    • Journal of KIISE:Computer Systems and Theory / v.37 no.3 / pp.172-183 / 2010
  • Virtual memory (VM) systems reduce the disk I/Os caused by page faults through page prefetching, which reads additional pages together with the desired page in a single disk I/O at a page fault. Operating systems including 4.4BSD attempt to prefetch as many pages as possible at a page fault, regardless of the page access patterns of applications. However, this approach increases the disk access time taken to service a page fault when a large portion of the prefetched pages is never referenced. More seriously, it can cause memory pollution, a problem in which prefetched pages that will not be accessed evict other pages that will be referenced soon. To solve these problems, we propose an adaptive page prefetching control scheme (APC), which periodically monitors the access patterns of prefetched pages on a per-process basis. Such a pattern is represented as the ratio of prefetched pages that are referenced before they are evicted from memory. APC then uses the ratio to adjust the number of pages that the 4.4BSD VM intends to prefetch at a page fault. Thus APC allows the 4.4BSD VM to prefetch a number of pages suited to reducing disk I/Os, even though the page access patterns of an application vary at runtime. Experiments with our technique implemented in FreeBSD 6.2 show that APC improves the execution times of the SOR, SMM, and FFT benchmarks over the 4.4BSD VM by up to 57%.
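The feedback loop described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' FreeBSD implementation: the function name, the thresholds `low` and `high`, the page limits, and the halve/double rule are all assumptions chosen for the example, since the abstract only states that the referenced-to-prefetched ratio drives the adjustment.

```python
def adjust_prefetch_count(current, prefetched, referenced,
                          min_pages=1, max_pages=16, low=0.3, high=0.7):
    """Return the prefetch count to use in the next monitoring period.

    `prefetched` and `referenced` are per-process counts gathered during
    the last period: pages brought in by prefetching, and those among
    them that were referenced before eviction.
    All thresholds and limits are hypothetical values for illustration.
    """
    if prefetched == 0:
        return current  # nothing was observed, so keep the current setting
    ratio = referenced / prefetched
    if ratio < low:
        # Most prefetched pages were wasted: prefetch fewer pages.
        return max(min_pages, current // 2)
    if ratio > high:
        # Most prefetched pages were used: prefetch more pages.
        return min(max_pages, current * 2)
    return current
```

For example, a process that referenced only 10 of 100 prefetched pages would have its prefetch count halved, while one that referenced 90 of 100 would have it doubled, up to the assumed limit.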

Multiple ASR for efficient defense against brute force attacks (무차별 공격에 효과적인 다중 Address Space Randomization 방어 기법)

  • Park, Soo-Hyun;Kim, Sun-Il
    • The KIPS Transactions:PartC / v.18C no.2 / pp.89-96 / 2011
  • ASR is an excellent program security technique that protects various data memory areas without run-time overhead. ASR hides the addresses of variables from attackers by reordering variables within a data memory area; however, it can be broken by brute force attacks because the data memory space is limited. In this paper, we propose Multiple ASR to overcome this limitation of previous ASR approaches. Multiple ASR separates a data memory area into original and duplicated areas, and compares the variables in each memory area to detect an attack. Variables are arranged in opposite orders in the original and duplicated data memory areas. This makes it impossible to overwrite the same variables in the different data areas in a single attack. Although programs with Multiple ASR show a relatively high run-time overhead due to duplicated execution, programs with many I/O operations such as web servers, a favorite attack target, show only 40~50% overhead. In this paper we develop and test a tool that transforms a program into one with Multiple ASR applied.

MultiRing: An Efficient Hardware Accelerator for Design Rule Checking (멀티링 설계규칙검사를 위한 효과적인 하드웨어 가속기)

  • 노길수;경종민
    • Journal of the Korean Institute of Telematics and Electronics / v.24 no.6 / pp.1040-1048 / 1987
  • We propose a hardware architecture called Multiring which is applicable to various geometrical operations on rectilinear objects, such as design rule checking in VLSI layout and many image processing operations including noise suppression and contour extraction. It has both a fast execution speed and extremely high flexibility. The architecture is divided into four main parts: I/O between host and Multiring, ring memory, linear processor array, and instruction decoder. Data transmission between host and Multiring is bit serial, thereby reducing the bandwidth requirement for the channel and the number of external pins, while each row of the bit map stored in ring memory is processed by its corresponding processor in full parallelism. In each ring cycle, every processor is simultaneously configured by the instruction decoder/controller to perform one of the 16 basic instructions, such as Boolean (AND, OR, NOT, and Copy), geometrical (Expand and Shrink), and I/O operations, which gives Multiring maximal flexibility with respect to design rule changes or instruction set enhancement. Correct functional behavior of Multiring was confirmed by successfully running a software simulator having one-to-one structural correspondence to the Multiring hardware.
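As a rough software analogue of the row-parallel geometrical instructions, the sketch below stores each bitmap row as an integer and applies Expand and Shrink row by row. The abstract does not define these instructions precisely, so the one-cell, eight-neighbour growth and shrinkage used here, and the helper `min_width_violations`, are assumptions for illustration only.

```python
def expand(rows, width):
    """Grow every set pixel by one cell in all eight directions."""
    mask = (1 << width) - 1
    # Horizontal pass: each row is handled independently, as one
    # processor per row would do.
    horiz = [(r | (r << 1) | (r >> 1)) & mask for r in rows]
    # Vertical pass: combine each row with its neighbours; rows outside
    # the bitmap count as empty.
    padded = [0] + horiz + [0]
    return [padded[i - 1] | padded[i] | padded[i + 1]
            for i in range(1, len(padded) - 1)]


def shrink(rows, width):
    """Keep a pixel only if all eight neighbours are also set."""
    mask = (1 << width) - 1
    horiz = [r & (r << 1) & (r >> 1) & mask for r in rows]
    padded = [0] + horiz + [0]
    return [padded[i - 1] & padded[i] & padded[i + 1]
            for i in range(1, len(padded) - 1)]


def min_width_violations(rows, width):
    """Pixels lost by shrink-then-expand, i.e. features under 3 cells wide."""
    opened = expand(shrink(rows, width), width)
    return [r & ~o for r, o in zip(rows, opened)]
```

A simple minimum-width check then falls out of composing the two operations: a 3x3 block survives shrink-then-expand unchanged, whereas a one-cell-wide line disappears and is reported in full as a violation.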

Development of Simulation Tool SMPLE and Its Application to Performance Analysis of Multiprocessor Systems (시뮬레이션 도구 SMPLE의 개발 및 멀티프로세서 시스템 성능 분석에의 활용)

  • 조성만
    • Journal of the Korea Society for Simulation / v.1 no.1 / pp.87-102 / 1992
  • This paper presents the development of the event-driven, system-level simulation tool SMPLE (Smpl Extended, an extension of smpl) and its application to the performance analysis of multiprocessor computer systems. Because of its data structures, it is very difficult to change, expand, or add new functions to the simulation language smpl implemented by MacDougall. In SMPLE, we rebuild the data structures using structures and pointers, add new functions, and enable dynamic memory management. Using the new data structures, facilities, and functions added in SMPLE, we simulate the job processing of a shared-bus multiprocessor system with an autonomous hierarchical I/O subsystem. We assess the contributions of subsystems and units to system performance. The impact of disk I/O on system performance is evaluated under various conditions of number of processors, processing power, memory access time, and disk seek time.

A Performance Study on CPU-GPU Data Transfers of Unified Memory Device (통합메모리 장치에서 CPU-GPU 데이터 전송성능 연구)

  • Kwon, Oh-Kyoung;Gu, Gibeom
    • KIPS Transactions on Computer and Communication Systems / v.11 no.5 / pp.133-138 / 2022
  • As GPU performance has improved, GPUs have become common in HPC and artificial intelligence, but GPU programming remains a major obstacle to productivity. In particular, because managing host memory and GPU memory separately is difficult, research on both convenience and performance is active, and various CPU-GPU memory transfer programming methods have been proposed. Meanwhile, many SoC (System on a Chip) products, such as Apple M1 and NVIDIA Tegra, that bundle the CPU, GPU, and integrated memory into one large silicon package have recently emerged. In this study, we investigate the performance of data transfers between the CPU and GPU on such an integrated memory device, which shows different characteristics from the conventional environment in which host memory and GPU memory are separate. We compare the performance of CPU-GPU data transfer methods on NVIDIA SoC chips, which are integrated memory devices, and on NVIDIA SXM-based V100 GPU devices. As the experimental workload for the comparison, we used two-dimensional matrix transposition, an example frequently used in HPC applications. We analyzed the following performance factors: the difference in GPU kernel performance according to the CPU-GPU memory transfer method on each GPU device, the difference in transfer performance between page-locked memory and pageable memory, the overall performance, and the performance by workload size. The experiments confirmed that the NVIDIA Xavier can maximize the benefits of integrated memory in the SoC chip by supporting I/O cache coherence.

Caching and Prefetching Policies Using Program Page Reference Patterns on a File System Layer for NAND Flash Memory (NAND 플래시 메모리용 파일 시스템 계층에서 프로그램의 페이지 참조 패턴을 고려한 캐싱 및 선반입 정책)

  • Kim, Gyeong-San;Kim, Seong-Jo
    • Proceedings of the IEEK Conference / 2006.06a / pp.777-778 / 2006
  • In this thesis, we design and implement a Flash Cache Core Module (FCCM) which operates on the YAFFS NAND flash file system. The FCCM applies a memory replacement policy and a prefetching policy based on the page reference patterns of applications. We also implement the Clean-First memory replacement technique, which takes the characteristics of flash memory into account. In this method, whether to apply the prefetch waiting area is decided according to page hits. The FCCM decreases I/O hit frequency by up to 37% compared with the Linux cache and prefetching policy. It also uses less memory for prefetching (24% less at maximum and 16% less on average) than the Linux kernel.
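The general idea behind clean-first replacement can be sketched as below. The abstract does not describe FCCM's actual data structures, so this is a generic illustration under stated assumptions: the class name `CleanFirstCache`, the LRU ordering, and the `flash_writes` counter are hypothetical. Clean pages are evicted before dirty ones because dropping a clean page costs nothing, whereas evicting a dirty page forces a flash write.

```python
from collections import OrderedDict


class CleanFirstCache:
    """LRU page cache that prefers to evict clean pages (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> dirty flag, oldest first
        self.flash_writes = 0       # write-backs caused by dirty evictions

    def access(self, page, write=False):
        if page in self.pages:
            # Hit: keep the dirty state and move the page to the newest end.
            dirty = self.pages.pop(page) or write
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = write
        self.pages[page] = dirty

    def _evict(self):
        # Prefer the least recently used clean page.
        victim = next((p for p, d in self.pages.items() if not d), None)
        if victim is None:
            # Every page is dirty: evict the oldest and pay a flash write.
            victim = next(iter(self.pages))
            self.flash_writes += 1
        del self.pages[victim]
```

With a capacity of two pages, writing page 1 and then reading pages 2 and 3 evicts the clean page 2 rather than the older dirty page 1, so no flash write is needed.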

Memory-mapped I/O Implication of Virtual Machine in Cloud System (클라우드 환경에서 가상 머신의 효율적인 호스트 메모리 사용을 위한 메모리 사상 기법)

  • Song, Nae Young;Choe, Chan-Ho;Eom, Hyeonsang;Yeom, Heon Young
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.264-267 / 2012
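  • As resource usage and data volumes grow, cloud systems are coming to the fore, and many techniques have been proposed for using virtual machines efficiently in cloud environments. One of them prevents overcommitment of host machine memory by having virtual machines memory-map the same files wherever possible. The mmap() function used for this has advantages, such as performing I/O without going through the storage stack, but it scales poorly. In this paper, we examine the problems that arise when virtual machines access host memory through mmap() and propose a new mmap() I/O path to resolve them. The improved mmap() I/O path improved execution time by about 40%.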
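For readers unfamiliar with the interface, ordinary file-backed memory mapping looks like this from user space, using Python's standard `mmap` module. This only illustrates the existing mmap() usage pattern that the abstract refers to; it is not the proposed kernel I/O path, and the helper name `read_via_mmap` is hypothetical.

```python
import mmap


def read_via_mmap(path, offset, length):
    """Read bytes from a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing the mapping reads from mapped pages instead of
            # issuing an explicit read() call for the range.
            return m[offset:offset + length]
```

Processes that map the same file are typically backed by the same cached pages, which is why mapping common files can reduce duplicated memory use.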

An Adaptive Polling Selection Technique for Ultra-Low Latency Storage Systems (초저지연 저장장치를 위한 적응형 폴링 선택 기법)

  • Chun, Myoungjun;Kim, Yoona;Kim, Jihong
    • IEMEK Journal of Embedded Systems and Applications / v.14 no.2 / pp.63-69 / 2019
  • Recently, ultra-low latency flash storage devices such as Z-SSD and Optane SSD have been introduced thanks to significant technological improvements in storage devices, and they provide much faster response times than other current NVMe SSDs. With such ultra-low-latency ($10{\mu}s$) storage devices, the cost of a context switch can become a significant overhead in the interrupt-driven I/O completion process. Because interrupt-driven I/O completion incurs interrupt handling overhead, polling or hybrid polling for I/O completion is known to perform better. In this paper, we analyze the tail latency problem in the polling process caused by process scheduling in a data center environment, where multiple applications run simultaneously on one system, and we introduce an adaptive polling selection technique which dynamically selects the more efficient of the two processing methods according to the system's conditions.

Dynamic Bandwidth Distribution Method for High Performance Non-volatile Memory in Cloud Computing Environment (클라우드 환경에서 고성능 저장장치를 위한 동적 대역폭 분배 기법)

  • Kwon, Piljin;Ahn, Sungyong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.3 / pp.97-103 / 2020
  • Linux Cgroups plays a fundamental role in sharing system resources among multiple containers in container-based cloud computing environments. For I/O resources in particular, Linux Cgroups supports a mechanism for sharing I/O bandwidth in proportion to I/O weight. However, the current mechanism of Linux Cgroups, which uses the BFQ I/O scheduler, seriously degrades I/O performance on high-bandwidth storage devices such as NVMe SSDs. In this paper, we propose a new feedback-based I/O bandwidth sharing scheme for Linux Cgroups which allocates I/O credits to containers according to their I/O weights and adjusts the amount of credits to the performance fluctuation of NVMe SSDs. The proposed scheme is implemented on Linux kernel 5.3 and evaluated. The evaluation results show that it can share the I/O bandwidth among multiple containers in proportion to their I/O weights while improving I/O performance to more than twice that of the existing scheme.
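The two ingredients named in this abstract, weight-proportional credit allocation and feedback on the total amount of credits, can be sketched as follows. This is a simplified user-space illustration, not the kernel implementation: the function names, the `growth` factor, and the rule of shrinking the pool to the credits actually consumed are assumptions, since the abstract does not give the adjustment formula.

```python
def allocate_credits(weights, total_credits):
    """Split the credit pool among containers in proportion to I/O weight."""
    total_weight = sum(weights.values())
    return {name: total_credits * w // total_weight
            for name, w in weights.items()}


def adjust_total(total_credits, consumed, growth=1.25, floor=1):
    """Feedback step run once per period (hypothetical rule).

    If every credit was consumed, the device kept up and may have spare
    bandwidth, so the pool grows. Otherwise the pool shrinks to what the
    device actually served during the period.
    """
    if consumed >= total_credits:
        return int(total_credits * growth)
    return max(floor, consumed)
```

For instance, two containers with weights 100 and 300 sharing 1000 credits receive 250 and 750 credits, and the pool is resized each period according to how many credits were actually consumed.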

Development of FPGA-based Programmable Timing Controller

  • Cho, Soung-Moon;Jeon, Jae-Wook
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2003.10a / pp.1016-1021 / 2003
  • As technology develops, electronic products are becoming smaller overall, which makes their small components difficult to inspect with the human eye. Automated inspection systems have therefore been used. Existing systems use a microprocessor or a Programmable Logic Controller (PLC). Microprocessor-based controllers and PLCs are basically composed of memory devices such as ROM and RAM together with I/O ports. As a result, the system becomes not only complicated and large but also expensive. In this paper, we implement an FPGA-based one-chip programmable timing controller for inspecting small components to resolve these problems, and we design the high-performance controller using VHDL. With rapid development, high-capacity FPGAs that can contain memory and PLLs have been introduced. Using a high-capacity FPGA, the peripherals of an existing controller, such as memory and I/O ports, can be implemented in a single FPGA. Because this simplifies the complicated system, noise and power dissipation problems can be minimized and the cost is reduced. Since the proposed controller is organized to have internal registers, counters, and software routines for generating timing signals, users do not have to program the details of the timing signals and need only send some values describing the inspection system through an RS232C port. By selecting values appropriate for a given inspection system, the desired timing signals can be generated.
