• Title/Summary/Keyword: memory contention


QEMU/KVM Based In-Memory Block Cache Module for Virtualization Environment (가상화 환경을 위한 QEMU/KVM 기반의 인메모리 블록 캐시 모듈 구현)

  • Kim, TaeHoon;Song, KwangHyeok;No, JaeChun;Park, SungSoon
    • Journal of KIISE / v.44 no.10 / pp.1005-1018 / 2017
  • Recently, virtualization has become an essential component of cloud computing due to its various strengths, including maximizing server resource utilization, easy-to-maintain software, and enhanced data protection. However, since virtualization allows VMs to share physical resources, system performance can deteriorate due to device contention. In this paper, we first investigate the I/O overhead as a function of the number of VMs on the same server platform and analyze the block I/O process of the KVM hypervisor. We also propose an in-memory block cache mechanism, called QBic, to overcome I/O virtualization latency. QBic monitors the block I/O process of the hypervisor and stores frequently accessed data in the cache. As a result, QBic provides a fast response for VMs and reduces I/O contention on physical devices. Finally, we present a performance measurement of QBic to verify its effectiveness.
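
The caching idea in this abstract (keep frequently accessed blocks in memory so that repeated reads avoid the physical device) can be illustrated with a minimal sketch. The abstract does not describe QBic's data structures or admission policy, so the class name `FrequencyBlockCache`, the `min_hits` admission threshold, and the least-frequently-used eviction below are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


class FrequencyBlockCache:
    """Toy in-memory block cache that admits only frequently read blocks.

    Hypothetical sketch: capacity, min_hits and LFU eviction are assumptions.
    """

    def __init__(self, capacity, min_hits=2):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity      # maximum number of cached blocks
        self.min_hits = min_hits      # accesses needed before a block is cached
        self.hits = Counter()         # access count per block number
        self.blocks = {}              # block number -> cached bytes

    def read(self, block_no, read_from_disk):
        """Return (data, served_from_cache) for one block read."""
        self.hits[block_no] += 1
        if block_no in self.blocks:
            return self.blocks[block_no], True
        data = read_from_disk(block_no)   # slow path: go to the physical device
        if self.hits[block_no] >= self.min_hits:
            self._admit(block_no, data)
        return data, False

    def _admit(self, block_no, data):
        if len(self.blocks) >= self.capacity:
            # Evict the cached block with the fewest recorded accesses.
            victim = min(self.blocks, key=lambda b: self.hits[b])
            del self.blocks[victim]
        self.blocks[block_no] = data
```

In this sketch a block is served from the device until it has been read `min_hits` times; later reads of that block are answered from memory, which is the effect the abstract attributes to caching high-frequency data.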

Design and Implementation of An I/O System for Irregular Application under Parallel System Environments (병렬 시스템 환경하에서 비정형 응용 프로그램을 위한 입출력 시스템의 설계 및 구현)

  • No, Jae-Chun;Park, Seong-Sun;Gwon, O-Yeong
    • Journal of KIISE:Computer Systems and Theory / v.26 no.11 / pp.1318-1332 / 1999
  • In this paper we present the design, implementation, and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two designs, namely "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O simultaneously, which makes scheduling of I/O requests simpler but creates a possibility of contention at the I/O nodes. In the second approach, processors are divided into several groups so that only one group performs I/O at a time while the next group performs communication to rearrange data; this entire process is pipelined to reduce I/O node contention dynamically. In other words, the design provides support for dynamic contention management. We then present a software caching method using collective I/O to reduce I/O cost by reusing data already present in the memory of other nodes. Finally, chunking and on-line compression mechanisms are included in both models. We demonstrate that we can obtain significantly higher I/O performance than has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine. Application-level I/O bandwidth of up to 55% of the peak is observed.

Framework-assisted Selective Page Protection for Improving Interactivity of Linux Based Mobile Devices (리눅스 기반 모바일 기기에서 사용자 응답성 향상을 위한 프레임워크 지원 선별적 페이지 보호 기법)

  • Kim, Seungjune;Kim, Jungho;Hong, Seongsoo
    • Journal of KIISE / v.42 no.12 / pp.1486-1494 / 2015
  • While Linux-based mobile devices such as smartphones are increasingly used, they often exhibit poor response time. One of the factors that influences user-perceived interactivity is the high page fault rate of interactive tasks. Pages owned by interactive tasks can be evicted from main memory due to memory contention between interactive and background tasks. Since this increases the page fault rate of the interactive tasks, their execution tends to suffer from increased delays. This paper proposes a framework-assisted selective page protection mechanism for improving the interactivity of Linux-based mobile devices. The mechanism enables the run-time system to identify interactive tasks at the framework level and to deliver their IDs to the kernel. As a result, the kernel can retain the pages owned by the identified interactive tasks and avoid page faults. The experimental results demonstrate that the selective page protection technique reduces response time by up to 11% by reducing the page fault rate by 37%.
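
The protection idea in this abstract (the framework reports interactive task IDs, and the kernel keeps those tasks' pages resident) can be sketched as an LRU reclaim loop that skips protected owners. This is only an illustrative model: the class `PageReclaimer` and its methods `protect_task`, `touch`, and `reclaim` are hypothetical names, and real Linux page reclaim is considerably more involved than a single LRU list.

```python
from collections import OrderedDict


class PageReclaimer:
    """Toy LRU page list that skips pages owned by protected (interactive) tasks."""

    def __init__(self):
        self.lru = OrderedDict()      # page id -> owning task id, oldest first
        self.protected = set()        # task ids reported by the framework

    def protect_task(self, task_id):
        """Mark a task as interactive so its pages are not reclaimed."""
        self.protected.add(task_id)

    def touch(self, page_id, task_id):
        """Record an access, moving the page to the most-recently-used end."""
        self.lru[page_id] = task_id
        self.lru.move_to_end(page_id)

    def reclaim(self, count):
        """Evict up to `count` pages, oldest first, skipping protected owners."""
        victims = [page for page, owner in self.lru.items()
                   if owner not in self.protected][:count]
        for page_id in victims:
            del self.lru[page_id]
        return victims
```

Under memory pressure, `reclaim` in this sketch takes pages only from unprotected (background) tasks, so the protected task does not fault on its pages later.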

A design of GPU container co-execution framework measuring interference among applications (GPU 컨테이너 동시 실행에 따른 응용의 간섭 측정 프레임워크 설계)

  • Kim, Sejin;Kim, Yoonhee
    • KNOM Review / v.23 no.1 / pp.43-50 / 2020
  • As General Purpose Graphics Processing Units (GPGPUs) have come to play an essential role in high-performance computing, several cloud service providers offer GPU services. Most container-based cluster orchestration platforms in cloud environments allocate an integer number of GPUs to each job and do not allow a node to be shared with other jobs. In this case, the resource utilization of a GPU node may be low if a job does not intensively use either many cores or a large amount of GPU memory. GPU virtualization brings opportunities to realize kernel concurrency and share resources. However, performance may vary depending on the characteristics of the applications running concurrently and on the interference among them caused by resource contention on a node. This paper proposes a GPU container co-execution framework, based on the Kubernetes container orchestration platform and supporting multiple server creation and execution, for measuring the interference that may occur when GPU resources are shared. Performance changes under different scheduling policies were investigated by executing several jobs on a GPU. The results show that optimal scheduling is not possible when only GPU memory and computing resource usage are considered. Interference caused by co-execution among applications is measured using the framework.

A Comparative Performance Study for Compute Node Sharing

  • Park, Jeho;Lam, Shui F.
    • Journal of Computing Science and Engineering / v.6 no.4 / pp.287-293 / 2012
  • We introduce a methodology for the study of the application-level performance of time-sharing parallel jobs on a set of compute nodes in high performance clusters and report our findings. We assume that parallel jobs arriving at a cluster need to share a set of nodes with the jobs of other users, in that they must compete for processor time in a time-sharing manner and for other limited resources such as memory and I/O in a space-sharing manner. Under this assumption, we developed a methodology to simulate job arrivals to a set of compute nodes, and to gather and process performance data to calculate the percentage slowdown of parallel jobs. Our goal in this study is to identify a better combination of jobs that minimizes performance degradation due to resource sharing and contention. Through our experiments, we found a couple of interesting behaviors for overlapped parallel jobs, which may be used to suggest alternative job allocation schemes aiming to reduce the slowdowns that inevitably result from resource sharing on a high performance computing cluster. We suggest three job allocation strategies based on our empirical results and propose further studies of the results using a supercomputing facility at the San Diego Supercomputing Center.
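
The slowdown metric mentioned in this abstract can be made concrete with a small sketch. The abstract does not give the exact formula, so this assumes the common definition, slowdown (%) = 100 × (shared runtime − dedicated runtime) / dedicated runtime, and picks the job pairing with the lowest mean slowdown. The function names and the job timings in the usage note are hypothetical.

```python
def percentage_slowdown(dedicated_seconds, shared_seconds):
    """Slowdown of a job on shared nodes relative to a dedicated run, in percent.

    Assumed definition; the paper's exact formula is not given in the abstract.
    """
    if dedicated_seconds <= 0:
        raise ValueError("dedicated runtime must be positive")
    return 100.0 * (shared_seconds - dedicated_seconds) / dedicated_seconds


def best_pairing(dedicated, shared):
    """Return the job pair whose co-execution gives the lowest mean slowdown.

    dedicated: {job: runtime alone}
    shared:    {(job_a, job_b): (runtime_a, runtime_b)} measured when co-run
    """
    def mean_slowdown(pair):
        job_a, job_b = pair
        time_a, time_b = shared[pair]
        return (percentage_slowdown(dedicated[job_a], time_a)
                + percentage_slowdown(dedicated[job_b], time_b)) / 2.0

    return min(shared, key=mean_slowdown)
```

For example, with dedicated runtimes `{"A": 100, "B": 200, "C": 50}` and co-run measurements `{("A", "B"): (150, 260), ("A", "C"): (110, 60)}`, the pair `("A", "C")` has the lower mean slowdown and would be preferred by this sketch.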

Video Retrieval System supporting Adaptive Streaming Service (적응형 스트리밍 서비스를 지원하는 비디오 검색 시스템)

  • 이윤채;전형수;장옥배
    • Journal of KIISE:Computing Practices and Letters / v.9 no.1 / pp.1-12 / 2003
  • Recently, much research has been conducted on distributed processing over the Internet and on multimedia data processing. Fast, convenient, high-quality multimedia services are needed. In this paper, we design and implement a real-time, clip-based video retrieval system for the Web environment. Our system consists of a content-based indexing system that supports convenient services for video content providers, and a Web-based retrieval system that makes varied information retrieval easy for users on the Web. Three important methods are used in the content-based indexing system: a key frame extraction method that divides the video data, a clip file creation method that clusters related information, and a video database construction method based on clip units. In the Web-based retrieval system, a keyword retrieval method, a two-dimensional key frame browsing method, and a real-time clip display method are used. The system supports real-time retrieval of video clips in the Web environment and provides a stable multimedia service. The proposed methods are useful for providing video content and offer an easy way to search for the intended video content.

Analysis on the Performance Impact of Partitioned LLC for Heterogeneous Multicore Processors (이종 멀티코어 프로세서에서 분할된 공유 LLC가 성능에 미치는 영향 분석)

  • Moon, Min Goo;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing / v.15 no.2 / pp.39-49 / 2019
  • Recently, CPU-GPU integrated heterogeneous multicore processors have been widely used to improve the performance of computing systems. Heterogeneous multicore processors integrate CPUs and GPUs on a single chip, where the CPUs and GPUs share the LLC (Last Level Cache). This causes a serious cache contention problem inside the processor, resulting in significant performance degradation. In this paper, we propose a partitioned LLC architecture to solve the cache contention problem in heterogeneous multicore processors. We analyze the performance impact of varying the LLC size assigned to the CPUs and to the GPUs. According to our simulation results, CPU performance improves by up to 21% as the LLC size assigned to the CPU increases. However, the GPU shows a negligible performance difference when its assigned LLC size increases. In other words, the GPU is less likely to lose performance when its LLC size decreases. Because the performance degradation due to reducing the GPU's LLC size is much smaller than the performance improvement due to increasing the CPU's LLC size, the overall performance of heterogeneous multicore processors is expected to improve when a partitioned LLC is applied to CPUs and GPUs. In addition, if a memory management technique that can maximize the performance of each core is developed in the future, the performance of heterogeneous multicore processors can be greatly improved.

Understanding the Mismatch between ERP and Organizational Information Needs and Its Responses: A Study based on Organizational Memory Theory (조직의 정보 니즈와 ERP 기능과의 불일치 및 그 대응책에 대한 이해: 조직 메모리 이론을 바탕으로)

  • Jeong, Seung-Ryul;Bae, Uk-Ho
    • Asia pacific journal of information systems / v.22 no.2 / pp.21-38 / 2012
  • Until recently, successful implementation of ERP systems has been a popular topic among ERP researchers, who have attempted to identify its various contributing factors. None of these efforts, however, explicitly recognizes the need to identify disparities that can exist between organizational information requirements and ERP systems. Since ERP systems are in fact "packages" (that is, software programs developed by independent software vendors for sale to organizations that use them), they are designed to meet the general needs of numerous organizations, rather than the unique needs of a particular organization, as is the case with custom-developed software. By adopting standard packages, organizations can substantially reduce many of the potential implementation risks commonly associated with custom-developed software. However, it is also true that the nature of the package itself could be a risk factor as the features and functions of the ERP systems may not completely comply with a particular organization's informational requirements. In this study, based on the organizational memory mismatch perspective that was derived from organizational memory theory and cognitive dissonance theory, we define the nature of disparities, which we call "mismatches," and propose that the mismatch between organizational information requirements and ERP systems is one of the primary determinants in the successful implementation of ERP systems. Furthermore, we suggest that customization efforts as a coping strategy for mismatches can play a significant role in increasing the possibilities of success. In order to examine the contention we propose in this study, we employed a survey-based field study of ERP project team members, resulting in a total of 77 responses. 
The results of this study show that, as anticipated from the organizational memory mismatch perspective, the mismatch between organizational information requirements and ERP systems has a significantly negative impact on the implementation success of ERP systems. This finding confirms our hypothesis that the more mismatch there is, the more difficult successful ERP implementation is, and thus more attention should be drawn to mismatch as a major failure source in ERP implementation. This study also found that, as a coping strategy for mismatch, the effects of customization are significant. In other words, utilizing the appropriate customization method could lead to the implementation success of ERP systems. This is somewhat interesting because it runs counter to the argument of some literature and ERP vendors that minimized customization (or even the lack thereof) is required for successful ERP implementation. In many ERP projects, there is a tendency among ERP developers to adopt default ERP functions without any customization, adhering to the slogan of "the introduction of best practices." However, this study asserts that we cannot expect successful implementation if we don't attempt to customize ERP systems when mismatches exist. For a more detailed analysis, we identified three types of mismatches: Non-ERP, Non-Procedure, and Hybrid. Among these, only Non-ERP mismatches (a situation in which ERP systems cannot support the existing information needs that are currently fulfilled) were found to have a direct influence on the implementation of ERP systems. Neither Non-Procedure nor Hybrid mismatches were found to have significant impact in the ERP context. These findings provide meaningful insights since they could serve as the basis for discussing how the ERP implementation process should be defined and what activities should be included in the implementation process. 
They show that ERP developers may not want to include organizational (or business processes) changes in the implementation process, suggesting that doing so could lead to failed implementation. And in fact, this suggestion eventually turned out to be true when we found that the application of process customization led to higher possibilities of failure. From these discussions, we are convinced that Non-ERP is the only type of mismatch we need to focus on during the implementation process, implying that organizational changes must be made before, rather than during, the implementation process. Finally, this study found that among the various customization approaches, bolt-on development methods in particular seemed to have significantly positive effects. Interestingly again, this finding is not in the same line of thought as that of the vendors in the ERP industry. The vendors' recommendations are to apply as many best practices as possible, thereby resulting in the minimization of customization and utilization of bolt-on development methods. They particularly advise against changing the source code and rather recommend employing, when necessary, the method of programming additional software code using the computer language of the vendor. As previously stated, however, our study found active customization, especially bolt-on development methods, to have positive effects on ERP, and found source code changes in particular to have the most significant effects. Moreover, our study found programming additional software to be ineffective, suggesting there is much difference between ERP developers and vendors in viewpoints and strategies toward ERP customization. In summary, mismatches are inherent in the ERP implementation context and play an important role in determining its success. 
Considering the significance of mismatches, this study proposes a new model for successful ERP implementation, developed from the organizational memory mismatch perspective, and provides many insights by empirically confirming the model's usefulness.


The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU (내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성)

  • Sung, Hanul;Eom, Hyeonsang;Yeom, HeonYoung
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.507-512 / 2014
  • Recently, distributed computing has begun using both the CPU (central processing unit) and the GPU (graphics processing unit) to improve performance and to overcome the dark silicon problem, in which not all transistors can be used because of power limitations. In an integrated graphics processor, the CPU and GPU share memory and the last-level cache (LLC). However, there are no LLC access rules between the CPU and GPU, so if GPU and CPU processes run at the same time, the performance of both degrades because of contention on the LLC. This paper presents evidence of the need for cache partitioning and describes a cache partitioning design that uses page coloring to allocate L3 cache space exclusively to the GPU process in order to guarantee its performance.
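
Page coloring, the technique named in this abstract, relies on the fact that in a physically indexed cache the physical page frame number determines which group of cache sets ("color") a page maps to. The sketch below shows the standard textbook arithmetic under idealized assumptions: a simple modulo index (real LLCs may hash addresses across slices), and hypothetical cache parameters and function names that do not come from the paper.

```python
PAGE_BYTES = 4096  # assumed page size


def num_colors(cache_bytes, ways, page_bytes=PAGE_BYTES):
    """Number of page colors: cache size divided by (associativity * page size)."""
    return cache_bytes // (ways * page_bytes)


def page_color(phys_addr, colors, page_bytes=PAGE_BYTES):
    """Color of the page containing phys_addr (frame number modulo color count)."""
    return (phys_addr // page_bytes) % colors


def pick_frames(free_frames, allowed_colors, colors):
    """Keep only the free page frames whose color is reserved for one process."""
    return [frame for frame in free_frames if frame % colors in allowed_colors]
```

For an assumed 8 MiB, 16-way cache with 4 KiB pages, `num_colors` gives 128 colors. An allocator that serves the GPU process only from frames returned by `pick_frames` for its reserved colors, and serves all other processes from the remaining colors, keeps their cache lines in disjoint sets.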

VDI deployment and performance analysis for multi-core-based applications (멀티코어 기반 어플리케이션 운용을 위한 데스크탑 가상화 구성 및 성능 분석)

  • Park, Junyong
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.10 / pp.1432-1440 / 2022
  • Recently, as Virtual Desktop Infrastructure (VDI) has become widely used not only in office work environments but also for workloads that run high-spec multi-core-based applications, the requirements for real-time responsiveness and stability of VDI are increasing. Accordingly, the display protocol used for remote access in VDI and the performance optimization of virtual machines have also become more important. In this paper, we propose two ways to configure desktop virtualization for operating multi-core-based applications. First, we propose a codec configuration of the display protocol that gives optimal performance under the high load caused by multi-processing. Second, we propose a virtual CPU scheduling optimization method to reduce scheduling delay in the case of CPU contention between virtual machines. The tests confirmed that the H.264 codec of Blast Extreme showed the best and most stable frame rate, and that the scheduling performance of the virtual CPU was improved through scheduling optimization.