Search | Korea Science

Performance Analysis of NVMe SSDs and Design of Direct Access Engine on Virtualized Environment (가상화 환경에서 NVMe SSD 성능 분석 및 직접 접근 엔진 개발)

Kim, Sewoog;Choi, Jongmoo
- KIISE Transactions on Computing Practices / v.24 no.3 / pp.129-137 / 2018
An NVMe (Non-Volatile Memory Express) SSD (Solid State Drive) is a high-performance storage device that uses flash memory as its storage cells, PCIe as its interface, and NVMe as the protocol on that interface. It supports multiple I/O queues, which makes it feasible to process I/Os in parallel in multi-core environments and to provide higher bandwidth than SATA SSDs. Hence, the NVMe SSD is considered a next-generation storage device for data centers and cloud computing systems. In virtualized systems, however, the performance of NVMe SSDs is not fully utilized because of the bottleneck in the software I/O stack. In particular, when a virtual machine uses the I/O stack of a hypervisor or host operating system such as Xen or KVM, I/O performance degrades seriously because the I/O stack is duplicated between the host and the virtual machine. In this paper, we propose a new I/O engine for the QEMU emulator, called the Direct-AIO (Direct-Asynchronous I/O) engine, that accesses the NVMe SSD directly to improve I/O performance. We implement the proposed I/O engine and analyze the I/O performance differences between the existing I/O engine and the Direct-AIO engine.
https://doi.org/10.5626/KTCP.2018.24.3.129

Design and Implementations for Network Asynchronous I/O for Linux kernel 2.6 (리눅스 커널 2.6을 위한 Network Asynchronous I/O의 설계와 구현)

Lim, Eun-Ji;Kim, Chei-Yul;Cha, Gyu-Il;Ahn, Baik-Song;Jung, Sung-In
- Proceedings of the Korean Information Science Society Conference / 2006.10a / pp.356-361 / 2006
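In Internet servers that must serve large numbers of concurrent users, handling many connections efficiently is an important problem. Network asynchronous I/O is one alternative for overcoming the limitations of the conventional multi-threaded and event-driven approaches. With network asynchronous I/O, a thread is not blocked between issuing a request and its completion and can immediately proceed with other work, which allows a single thread to handle multiple connections efficiently. In this paper, we implement network asynchronous I/O in the Linux kernel and analyze its performance through experiments.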
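
The programming model described in this abstract, where one thread issues network requests and carries on without blocking, can be illustrated with a minimal Python sketch. This is only an assumption-laden illustration: it uses the standard `asyncio` library, which is built on readiness notification in user space and is not the kernel-level mechanism the paper implements. The names `handle`, `client`, and `demo` are hypothetical.

```python
import asyncio


async def handle(reader, writer):
    # One coroutine per connection; each await hands control back to the
    # event loop instead of blocking the single server thread.
    while data := await reader.readline():
        writer.write(data.upper())
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, msg):
    # Send one line and wait (without blocking the thread) for the reply.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(msg.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def demo(n_clients=5):
    # Port 0 asks the OS for any free port (an assumption for the demo).
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All client connections are in flight at once on one thread.
        return await asyncio.gather(
            *(client(port, f"req{i}") for i in range(n_clients))
        )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Calling `asyncio.run(demo(3))` serves three connections concurrently from a single thread and returns the replies in request order.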

Design of an Asynchronous eFuse One-Time Programmable Memory IP of 1 Kilo Bits Based on a Logic Process (Logic 공정 기반의 비동기식 1Kb eFuse OTP 메모리 IP 설계)

Lee, Jae-Hyung;Kang, Min-Cheol;Jin, Liyan;Jang, Ji-Hye;Ha, Pan-Bong;Kim, Young-Hee
- Journal of the Korea Institute of Information and Communication Engineering / v.13 no.7 / pp.1371-1378 / 2009
We propose a low-power eFuse one-time programmable (OTP) memory cell based on a logic process. The eFuse OTP memory cell uses separate transistors optimized for program mode and read mode, and reduces the operating current in read mode by reducing the parasitic capacitances on both the word line (WL) and the bit line (BL). An asynchronous interface, separate I/O, and a BL sense amplifier (SA) circuit using a digital sensing method are used to obtain a low-power, small-area eFuse OTP memory IP. Computer simulation shows that the operating currents at the logic supply voltage VDD and at the I/O interface supply voltage VIO are 349.5 $\mu$A and 3.3 $\mu$A, respectively. The layout size of the designed eFuse OTP memory IP in Dongbu HiTek's 0.18 $\mu$m generic process is $300 \times 557\ \mu\mathrm{m}^2$.
https://doi.org/10.6109/JKIICE.2009.13.7.1371

Asynchronous plural I/O index scan using flash SSD (플래시 SSD를 활용한 비동기 복수 I/O 인덱스 스캔)

Park, Ji-Young;Kang, Woon-Hak;Lee, Sang-Won
- Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.1389-1391 / 2012
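Indexes are used to speed up data retrieval, and large-scale database systems that store large amounts of data mainly use B+-tree indexes. When a range search is performed with a B-tree index, an I/O is requested for each record individually, so the process frequently enters a waiting state and incurs considerable overhead. To solve this problem, this paper proposes an asynchronous plural I/O index scan method. The asynchronous plural I/O index scan was up to 6.5 times faster.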
https://doi.org/10.3745/PKIPS.y2012m11a.1389
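
The contrast drawn in the abstract above, one I/O per record versus several I/Os outstanding at once, can be sketched as follows. This is a hedged illustration, not the paper's implementation: the abstract does not say which asynchronous interface is used, so a thread pool with `os.pread` (available on Unix-like systems) stands in for it, and `PAGE_SIZE`, `scan_sync`, `scan_plural`, `make_demo_file`, and `demo` are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # hypothetical page size, not taken from the paper


def scan_sync(fd, page_ids):
    """Issue one read at a time; each read finishes before the next starts."""
    return [os.pread(fd, PAGE_SIZE, pid * PAGE_SIZE) for pid in page_ids]


def scan_plural(fd, page_ids, depth=8):
    """Keep up to `depth` reads outstanding; results are in request order."""
    def read_page(pid):
        return os.pread(fd, PAGE_SIZE, pid * PAGE_SIZE)

    with ThreadPoolExecutor(max_workers=depth) as pool:
        return list(pool.map(read_page, page_ids))


def make_demo_file(n_pages=64):
    """Create a temporary file whose page i is filled with byte value i."""
    f = tempfile.NamedTemporaryFile(delete=False)
    for i in range(n_pages):
        f.write(bytes([i % 256]) * PAGE_SIZE)
    f.close()
    return f.name


def demo():
    """Fetch the same scattered pages both ways and compare the results."""
    path = make_demo_file()
    fd = os.open(path, os.O_RDONLY)
    try:
        page_ids = [5, 40, 7, 63, 0]  # stand-in for record pointers from an index
        sync_pages = scan_sync(fd, page_ids)
        plural_pages = scan_plural(fd, page_ids)
        return sync_pages == plural_pages, [p[0] for p in plural_pages]
    finally:
        os.close(fd)
        os.unlink(path)
```

Both functions return the same pages. The difference is only in how many requests the storage device sees at once, which is where a flash SSD with internal parallelism would be expected to benefit.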

I/O Optimization Strategies for a GPU-based Graph Engine with High-Performance Storage (고성능 스토리지를 갖는 GPU 기반 그래프 분석 엔진을 위한 I/O 최적화 전략)

Jeong-Min Park;Myung-Hwan Jang;Sang-Wook Kim
- Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.386-388 / 2023
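This paper proposes I/O optimization strategies for a GPU-based graph analysis engine that analyzes large-scale graphs in an environment with high-performance storage. Through preliminary experiments, we found that RealGraph^GPU, a state-of-the-art GPU-based graph engine, does not fully utilize the bandwidth of high-performance storage. To improve this, we applied two optimization strategies, (1) user-space I/O and (2) asynchronous I/O, and confirmed experimentally that both strategies are effective in improving the graph analysis performance of RealGraph^GPU.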
https://doi.org/10.3745/PKIPS.y2023m05a.386

A design on low-power and small-area EEPROM for UHF RFID tag chips (UHF RFID 태그 칩용 저전력, 저면적 비동기식 EEPROM 설계)

Baek, Seung-Myun;Lee, Jae-Hyung;Song, Sung-Young;Kim, Jong-Hee;Park, Mu-Hun;Ha, Pan-Bong;Kim, Young-Hee
- Journal of the Korea Institute of Information and Communication Engineering / v.11 no.12 / pp.2366-2373 / 2007
In this paper, a low-power and small-area asynchronous 1-kilobit EEPROM for passive UHF RFID tag chips is designed with 0.18 $\mu$m EEPROM cells. To reduce area, the command and address buffers are removed by adopting an asynchronous I/O interface, and the data output buffer is removed by using separate I/O. To stably supply the high voltages VPP and VPPL used in the cell array from the low supply voltage VDD, a Dickson charge pump is designed with Schottky diodes instead of PN junction diodes. This reduces the number of charge pump stages and, in turn, the layout area of the charge pump. To reduce power, the write current is lowered by the proposed VPPL power switching circuit, which selects the voltage needed in either program or write mode. A test chip of the asynchronous 1-kilobit EEPROM is fabricated, and its layout area is $554.8 \times 306.9\ \mu\mathrm{m}^2$, 11% smaller than its synchronous counterpart.
https://doi.org/10.6109/jkiice.2007.11.12.2366

Efficient Prefetching and Asynchronous Writing for Flash Memory (플래시 메모리를 위한 효율적인 선반입과 비동기 쓰기 기법)

Park, Kwang-Hee;Kim, Deok-Hwan
- Journal of KIISE: Computing Practices and Letters / v.15 no.2 / pp.77-88 / 2009
As the NAND flash memory used as the storage system of mobile devices grows in capacity, the performance of address translation and life-cycle management in the FTL (Flash Translation Layer), which interacts with the file system, becomes very important. In this paper, we propose continuity counters, which represent the number of contiguous physical blocks whose logical addresses are consecutive, to reduce the number of address translations. Furthermore, we propose a prefetching method that preloads frequently accessed pages into main memory to enhance the I/O performance of flash memory. In addition, we use 2-bit write prediction and an asynchronous writing method to predict addresses that are repeatedly referenced by the host and to avoid write overhead. The experiments show that the proposed method improves I/O performance and extends the life cycle of flash memory. As a result, address translation in the proposed CFTL (Clustered Flash Translation Layer) is 20% faster than in conventional FTLs, and CFTL reduces writing time by about 50% compared with conventional FTLs.
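
The idea of a continuity counter, as defined in the abstract above, can be illustrated with a small sketch. The abstract does not describe how CFTL stores or consults these counters, so everything below is an assumption made for illustration: the run layout `(lbn_start, pbn_start, count)` and the names `build_runs` and `translate` are hypothetical.

```python
from bisect import bisect_right


def build_runs(mapping):
    """Compress a logical-to-physical block map into runs.

    `mapping[lbn]` is the physical block number for logical block `lbn`.
    Each run is (lbn_start, pbn_start, count), where `count` plays the
    role of a continuity counter: the number of consecutive logical
    blocks that map to consecutive physical blocks.
    """
    runs = []
    for lbn, pbn in enumerate(mapping):
        if runs:
            l0, p0, n = runs[-1]
            if pbn == p0 + n:  # physically contiguous with the current run
                runs[-1] = (l0, p0, n + 1)
                continue
        runs.append((lbn, pbn, 1))
    return runs


def translate(runs, lbn):
    """Translate a logical block number using one run lookup plus an offset."""
    starts = [r[0] for r in runs]  # a real table would keep this precomputed
    i = bisect_right(starts, lbn) - 1
    l0, p0, n = runs[i]
    if not 0 <= lbn - l0 < n:
        raise KeyError(lbn)
    return p0 + (lbn - l0)
```

For the mapping `[10, 11, 12, 13, 50, 51, 7]`, `build_runs` yields three runs instead of seven entries, and every block inside a run is translated by adding an offset to the run's first physical block.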

Design of a Low-Power and Low-Area EEPROM IP of 256 Bits for an UHF RFID Tag Chip (UHF RFID 태그 칩용 저전력, 저면적 256b EEPROM IP 설계)

Kang, Min-Cheol;Lee, Jae-Hyung;Kim, Tae-Hoon;Jang, Ji-Hye;Ha, Pan-Bong;Kim, Young-Hee
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.05a / pp.671-674 / 2009
We design a low-power and low-area asynchronous 256-bit EEPROM for use in a passive UHF RFID tag chip. For low power, we use a supply voltage of 1.8 V and design a Dickson charge pump using N-type Schottky diodes with a low-voltage characteristic. For low area in the peripheral circuit of the designed EEPROM, we use an asynchronous interface and a separate I/O method. The Dickson charge pump using N-type Schottky diodes also reduces the area of the DC-DC converter. The layout area of the designed 256-bit EEPROM, with an array of 16 rows and 16 columns in a 0.18 $\mu$m EEPROM process, is $311.66 \times 490.59\ \mu\mathrm{m}^2$.

Addressing Concurrency Design for HealthCare Web Service Gateway in Remote Healthcare Monitoring System

Nkenyereye, Lionel;Jang, Jong-Wook
- International Journal of Advanced Smart Convergence / v.5 no.3 / pp.32-39 / 2016
With the help of a small wearable device, patients who reside in an isolated village and need constant monitoring can gain better access to care at a lower healthcare delivery cost. As the number of simultaneous patient requests increases, the web service gateway located in the village hall reaches its limits in handling them successfully and concurrently. The RESTful gateway responsible for handling patients' requests suffers from increased latency when a large number of requests are submitted to it. In this paper, we propose design tasks for the web service gateway for handling concurrent events. In designing these tasks, concurrency is best understood by employing multiple levels of abstraction. The preferred way to achieve concurrency is to build an object-oriented environment with support for message passing between concurrent objects. We also investigate the performance of an event-driven architecture for building the web service gateway using Node.js. The experimental results show that server-side JavaScript with Node.js and MongoDB as the database is 40% faster than Apache Sling. With Node.js, developers can build a high-performance, asynchronous, event-driven healthcare hub server that handles an increasing number of concurrent connections for a Remote Healthcare Monitoring System in an isolated village with no access to local medical care.
https://doi.org/10.7236/IJASC.2016.5.3.32

Design of a High-Level Synthesis System Supporting Asynchronous Interfaces (비동기 인터페이스를 지원하는 정원 수준 합성 시스템의 설계)

이형종;이종화;황선영
- Journal of the Korean Institute of Telematics and Electronics A / v.31A no.2 / pp.116-124 / 1994
This paper describes the design of a high-level synthesis system, ISyn (Interface Synthesis System for ISPS-A), which generates hardware satisfying timing constraints. The original version of ISPS is extended into ISPS-A so that it can be used to describe and capture interface operations and timing constraints. To generate a schedule that satisfies the interface constraints, the scheduling process is divided into two steps: pre-scheduling and post-scheduling. ISyn allocates hardware modules with I/O ports using the clique partitioning algorithm. Experimental results show that ISyn is capable of synthesizing hardware modules effectively for internal and/or interactive operations.

Search Result 15, Processing Time 0.025 seconds
