Search | Korea Science

GPU Memory Management Technique to Improve the Performance of GPGPU Task of Virtual Machines in RPC-Based GPU Virtualization Environments (RPC 기반 GPU 가상화 환경에서 가상머신의 GPGPU 작업 성능 향상을 위한 GPU 메모리 관리 기법)

Kang, Jihun
- KIPS Transactions on Computer and Communication Systems / v.10 no.5 / pp.123-136 / 2021
RPC (Remote Procedure Call)-based Graphics Processing Unit (GPU) virtualization is one technology for sharing a GPU among multiple user virtual machines. However, unlike CPUs or memory, general GPUs in a cloud environment do not provide a resource isolation mechanism that can limit the resource usage of each virtual machine. In particular, because the GPU tasks of each virtual machine run as multiple processes in an RPC-based virtualization environment, the lack of resource isolation causes performance degradation due to resource contention. In addition, GPU memory contention worsens the degradation as the resource demand of the virtual machines increases, and fairness decreases because equal performance across virtual machines cannot be guaranteed. This paper analyzes the performance degradation caused by resource contention when the GPU memory requirement of the virtual machines exceeds the available GPU memory capacity in an RPC-based GPU virtualization environment, and proposes a GPU memory management technique to solve this problem. Experiments show that the proposed technique improves the performance of GPGPU tasks.
https://doi.org/10.3745/KTCCS.2021.10.5.123

Design of Shared Memory Controller Device Driver in Embedded System (임베디드 시스템에서의 공유 메모리 컨트롤러 디바이스 드라이버 설계)

Moon, Ji-Hoon;Oh, Jae-Chul
- The Journal of the Korea institute of electronic communication sciences / v.9 no.6 / pp.703-709 / 2014
In an AMP (Asymmetric Multiprocessing) dual-core system that runs a core-specific operating system on each core of a single processor, shared memory is used to transfer data between the cores. To use shared memory across different operating systems, message communication and synchronization between the two operating systems must be resolved. In this paper, a separate memory controller is used for data sharing between the processor cores in a dual-core environment. The controller provides two slave ports to allow simultaneous access from the two processors, and when both processors access data at the same time, a memory mediator determines the priority of the slave ports. For sending data from processor A to processor B, the SRAM area is logically divided into 8 pages. Each page is 4 KByte, which lets multiple processes use the memory area, and a 4-byte control register indicates whether the current page is available.
https://doi.org/10.13067/JKIECS.2014.9.6.703
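
The page scheme in this abstract can be illustrated with a small model. This is only a sketch under stated assumptions: the abstract gives the page count (8), the page size (4 KByte) and the register width (4 bytes), but not the register layout. The class name `PageControl` and the convention that bit i set means page i is in use are hypothetical.

```python
PAGE_COUNT = 8          # from the abstract: SRAM split into 8 logical pages
PAGE_SIZE = 4 * 1024    # from the abstract: 4 KByte per page
REG_MASK = 0xFFFFFFFF   # from the abstract: 4-byte control register


class PageControl:
    """Hypothetical register model: bit i set means page i is in use."""

    def __init__(self):
        self.register = 0

    def acquire(self):
        """Claim the lowest free page; return its index, or None if all are busy."""
        for page in range(PAGE_COUNT):
            if not self.register & (1 << page):
                self.register = (self.register | (1 << page)) & REG_MASK
                return page
        return None

    def release(self, page):
        """Clear the in-use bit so the page can be claimed again."""
        self.register &= ~(1 << page) & REG_MASK

    def base_offset(self, page):
        """Byte offset of the page from the start of the shared SRAM area."""
        return page * PAGE_SIZE
```

In this model a sender would call `acquire()`, write at `base_offset(page)`, and the receiver would call `release(page)` after reading. The real controller's arbitration between the two slave ports is not modelled.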

THE INFLUENCE OF THE TIME SLICING OF A PROCESSOR SHARING COMMUNICATION MODEL

LIM JONG SEUL;PARK CHIN HONG;AHN SEONG JOON
- Journal of applied mathematics & informatics / v.17 no.1_2_3 / pp.737-746 / 2005
Average memory occupancy and congestion in a computer or communication system may be reduced further if new jobs are admitted only when the number of jobs queued at the CPU is below a certain threshold, the run queue cutoff (RQ). In our previous paper we showed that the response time of a job is invariant with respect to RQ if jobs do not communicate with each other. In this paper, we prove the invariance property by considering the evolution of the queue lengths as point processes. We also present an approximate method for the delay due to context switching under time slicing.
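
The admission rule described in this abstract can be sketched as follows. This is a simplified illustration, not the paper's model: the class name `CutoffAdmission` is hypothetical, and jobs leave the CPU queue in FIFO order here, whereas a processor-sharing system would serve all queued jobs concurrently.

```python
from collections import deque


class CutoffAdmission:
    """Admit a new job to the CPU queue only while fewer than rq jobs are queued."""

    def __init__(self, rq):
        self.rq = rq                # run queue cutoff (RQ) from the abstract
        self.cpu_queue = deque()    # jobs currently admitted to the CPU
        self.waiting = deque()      # jobs held back by the cutoff

    def arrive(self, job):
        """A new job arrives; it is admitted only if the cutoff allows it."""
        self.waiting.append(job)
        self._admit()

    def complete(self):
        """Finish one admitted job (FIFO is an assumption) and admit held jobs."""
        job = self.cpu_queue.popleft() if self.cpu_queue else None
        self._admit()
        return job

    def _admit(self):
        # Move held jobs to the CPU queue while it is below the cutoff.
        while self.waiting and len(self.cpu_queue) < self.rq:
            self.cpu_queue.append(self.waiting.popleft())
```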

Flow Labeling Method for Realtime Detection of Heavy Traffic Sources (대량 트래픽 전송자의 실시간 탐지를 위한 플로우 라벨링 방법)

Lee, KyungHee;Nyang, DaeHun
- KIPS Transactions on Computer and Communication Systems / v.2 no.10 / pp.421-426 / 2013
As more traffic is generated on the Internet, knowing the size of each flow becomes more important. Many studies have addressed traffic measurement, mostly focusing on how to increase measurement accuracy with a limited amount of memory. In this paper, we propose an explicit flow labeling technique that can be used to find the names of the top flows and to increase the counting upper bound of the existing scheme. The labeling technique is applied to CSM (Counter Sharing Method), the most recent traffic measurement algorithm, and its performance is evaluated using the CAIDA dataset.
https://doi.org/10.3745/KTCCS.2013.2.10.421

Design of High-speed Pointer Switching Fabric (초고속 포인터 스위칭 패브릭의 설계)

Ryu, Kyoung-Sook;Choe, Byeong-Seog
- Journal of Internet Computing and Services / v.8 no.5 / pp.161-170 / 2007
The proposed switch separates the data plane from the switching plane, so it can store packet data and switch memory address pointers in parallel, and it can switch variable-length IP packets. The proposed architecture does not require the complicated arbitration algorithms of VOQ and is designed to provide the QoS of a generic output-queued switch as well as an input-queued one. Simulation results show that the proposed architecture has a lower average packet delay than the memory-sharing based architecture and keeps the average packet delay at a certain level as the switch size increases.

Buffer Invalidation Schemes for High Performance Transaction Processing in Shared Database Environment (공유 데이터베이스 환경에서 고성능 트랜잭션 처리를 위한 버퍼 무효화 기법)

김신희;배정미;강병욱
- The Journal of Information Systems / v.6 no.1 / pp.159-180 / 1997
A database sharing system (DBSS) is a system for high-performance transaction processing. In a DBSS, the processing nodes are locally coupled via a high-speed network and share a common database at the disk level. Each node has a local memory, a separate copy of the operating system, and a DBMS. To reduce the number of disk accesses, each node caches database pages in its local memory buffer. However, since multiple nodes may cache the same page simultaneously, cache consistency must be ensured so that every node can always access the latest version of a page. In this paper, we propose efficient buffer invalidation schemes for a DBSS in which the database is logically partitioned using primary copy authority to reduce locking overhead. The proposed schemes can improve performance by reducing the disk access overhead and the message overhead of maintaining cache consistency. Furthermore, they perform well even when database workloads vary dynamically.

A Study on the Block Lookup and Replacement in Global Memory (전역적 메모리에서의 블록 룩업과 재배치에 관한 연구)

이영섭;김은경;정병수
- Proceedings of the IEEK Conference / 2000.11c / pp.51-54 / 2000
With the emergence of high-speed networks, interest in accessing remote data has grown. This interest motivates cooperative caching, which uses remote caches like a local cache by sharing other clients' caches. Conventional algorithms such as GMS (Global Memory Service) suffer from bottlenecks and degraded performance because many messages are exchanged with the server or manager. The hint-based algorithm, on the other hand, resolves the GMS server bottleneck because each client holds hint information for all blocks. However, the hint-based algorithm can hold inaccurate information if its hints are too old. In this paper, we propose a policy that addresses both the bottleneck and the inaccuracy by using a file identifier to search the lookup table and by periodically exchanging the oldest block information between clients.

Ultrasound Synthetic Aperture Beamformer Architecture Based on the Simultaneous Multi-scanning Approach (동시 다중 주사 방식의 초음파 합성구경 빔포머 구조)

Lee, Yu-Hwa;Kim, Seung-Soo;Ahn, Young-Bok;Song, Tai-Kyong
- Journal of Biomedical Engineering Research / v.28 no.6 / pp.803-810 / 2007
Although synthetic aperture focusing techniques can improve the spatial resolution of ultrasound imaging, they have not been employed in a commercial product because they require a real-time N-channel beamformer with a tremendously increased hardware complexity for simultaneous beamforming along M multiple lines. In this paper, a hardware-efficient beamformer architecture for synthetic aperture focusing is presented. In contrast to the straightforward design using NM delay calculators, the proposed method utilizes only M delay calculators by sharing the same values among the focusing delays which should be calculated at the same time between the N channels for all imaging points along the M scan lines. In general, synthetic aperture beamforming requires M 2-port memories. In the proposed beamformer, the input data for each channel is first upsampled with a 4-fold interpolator and each polyphase component of the interpolator output is stored into a 2-port memory separately, requiring 4M 2-port memories for each channel. By properly limiting the area formed with the synthetic aperture focusing, the input memory buffer can be implemented with only 4 2-port memories and one short multi-port memory.
https://doi.org/10.9718/JBER.2007.28.6.803
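
The hardware counts quoted in this abstract reduce to simple arithmetic, sketched below. The function names are hypothetical, and the values of N and M in the usage note are illustrative only, since the abstract gives none.

```python
def delay_calculators(n_channels, m_lines):
    """Return (straightforward, proposed) delay-calculator counts.

    The straightforward design uses N*M calculators; the proposed design
    shares delay values across channels and uses only M.
    """
    return n_channels * m_lines, m_lines


def two_port_memories_per_channel(m_lines):
    """2-port memories per channel when each of the 4 polyphase components
    of the 4-fold interpolator output is stored separately: 4*M.
    This is the count before the focusing area is limited."""
    return 4 * m_lines
```

For example, with an assumed N = 64 channels and M = 4 scan lines, `delay_calculators(64, 4)` compares 256 calculators with 4. According to the abstract, limiting the synthetic aperture focusing area then reduces the per-channel input buffer from 4M 2-port memories to four 2-port memories plus one short multi-port memory.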

Design of Lightweight RTOS for MCU (MCU를 위한 경량화된 RTOS 설계)

Bak, Chang-Gyu
- Journal of the Korea Institute of Information and Communication Engineering / v.15 no.6 / pp.1301-1306 / 2011
An RTOS is a powerful tool for designing multitasking in embedded systems. However, existing RTOSes take up a large share of an MCU's limited memory, which makes them difficult to apply. In this paper, I remove less frequently used features from a traditional RTOS and design a lightweight RTOS that schedules and manages resources with minimal code. I use a shared stack to free up user memory and to reduce the overhead of context switching. Considering the ratio of kernel to application code, the RTOS designed in this paper can be used on MCUs with more than 4KB of program memory.
https://doi.org/10.6109/jkiice.2011.15.6.1301

Design and Fabrication of Low Power Sensor Network Platform for Ubiquitous Health Care

Lee, Young-Dong;Jeong, Do-Un;Chung, Wan-Young
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.1826-1829 / 2005
Recent advances in wireless communications and electronics have enabled the development of low-power sensor networks. Wireless sensor networks are often used in remote monitoring and control applications, health care, security, and environmental monitoring. They are an emerging technology consisting of small, low-power, low-cost devices that integrate limited computation, sensing, and radio communication capabilities. A sensor network platform for health care has been designed, fabricated, and tested. The system consists of an embedded microcontroller, a Radio Frequency (RF) transceiver, power management, I/O expansion, and serial communication (RS-232). The hardware platform uses an Atmel ATmega128L 8-bit ultra-low-power RISC processor with 128KB of flash memory as program memory and 4KB of SRAM as data memory. The radio transceiver (Chipcon CC1000) operates in the ISM band at 433MHz or 916MHz with a maximum data rate of 76.8kbps, and the indoor radio range is approximately 20-30m. When many sensors have to communicate with the controller, standard communication interfaces such as Serial Peripheral Interface (SPI) or Inter-Integrated Circuit ($I^{2}C$) allow them to share a single communication bus. With its low-power, small, low-cost design, the wireless sensor network system and its wireless sensing electronics collect health-related information on human vitality and the main physiological parameters (ECG, temperature, perspiration, blood pressure, and other vitality parameters).
