• Title/Summary/Keyword: multiple CPU's

Event Routing Scheme to Improve I/O Latency of SMP VM (SMP 가상 머신의 I/O 지연 시간 감소를 위한 이벤트 라우팅 기법)

  • Shin, Jungsub;Kim, Hagyoung
    • Journal of KIISE
    • /
    • v.42 no.11
    • /
    • pp.1322-1331
    • /
    • 2015
  • Under the hypervisor scheduler, a vCPU (virtual CPU) is in one of two states: the running state or the stop state. When a vCPU is in the stop state, incoming events are delayed until that vCPU returns to the running state. The latency in handling events sent to the vCPU is regarded as the I/O latency. Since an SMP (symmetric multiprocessing) VM (virtual machine) incorporates multiple vCPUs, the event latency on an SMP VM can vary depending on which vCPU receives the event. In this paper, we propose a new scheme named event routing that sends events according to the operation state of each vCPU to reduce the event latency on an SMP VM. We implemented the proposed event routing scheme in the Xen ARM hypervisor and confirmed the reduction in I/O latency by measuring the network RTT (round trip time) and the TCP bandwidth under a variety of testing conditions. The network RTT decreases by up to 94% and the TCP bandwidth increases by up to 35% compared with native Xen ARM.
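The routing idea in this abstract can be illustrated with a minimal sketch. It is not the paper's Xen ARM implementation: the `VCpu` type, the `route_event` function, and the fallback to a default vCPU when none is running are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VCpu:
    """Hypothetical model of a vCPU as seen by the event router."""
    vcpu_id: int
    running: bool  # True = running state, False = stop state


def route_event(vcpus: Sequence[VCpu], default_id: int = 0) -> int:
    """Return the id of the vCPU that should receive an incoming event.

    Assumption: prefer any vCPU currently in the running state, so the
    event is handled without waiting for a stopped vCPU to be scheduled.
    If no vCPU is running, fall back to the default target.
    """
    for vcpu in vcpus:
        if vcpu.running:
            return vcpu.vcpu_id
    return default_id


# Example: vCPU 0 is stopped and vCPU 1 is running, so the event goes to vCPU 1.
target = route_event([VCpu(0, running=False), VCpu(1, running=True)])
```

A real hypervisor would also have to respect interrupt affinity and guest expectations; the sketch only shows the state-based selection step.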

Design and Implementation of Real-Time Parallel Engine for Discrete Event Wargame Simulation (이산사건 워게임 시뮬레이션을 위한 실시간 병렬 엔진의 설계 및 구현)

  • Kim, Jin-Soo;Kim, Dae-Seog;Kim, Jung-Guk;Ryu, Keun-Ho
    • The KIPS Transactions:PartA
    • /
    • v.10A no.2
    • /
    • pp.111-122
    • /
    • 2003
  • Military wargame simulation models must support the HLA (High Level Architecture) in order to facilitate interoperability with other simulations, and using a parallel simulation engine offers efficiency by reducing the system overhead generated by interoperability. However, legacy military simulation model engines process events using a sequential event-driven method. This is due to problems generated by parallel processing, such as synchronous reference to global data domains. Additionally, using legacy simulation platforms results in insufficient utilization of multiple CPUs even when a multiple-CPU system is in use. Therefore, in this paper, we propose converting the simulation engine to an object model-based parallel simulation engine to ensure the military wargame model's improved system processing capability, synchronous reference to global data domains, external simulation time processing, and the sequence of parallel-processed events during a crash recovery. The converted parallel simulation engine is designed and implemented to enable parallel execution on a multiple-CPU system (SMP).

Performance Evaluation of Microservers to drive for Cloud Computing Applications (클라우드 컴퓨팅 응용 구동을 위한 마이크로서버 성능평가)

  • Myeong-Hoon Oh
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.4
    • /
    • pp.85-91
    • /
    • 2023
  • In order to utilize KOSMOS, the performance evaluation results are presented in this paper with CloudSuite, an application service-based benchmark program in the cloud computing area. CloudSuite offers several distinct applications as cloud services in two parts: offline applications and online applications on containers. Compared with other microservers whose hardware specifications are similar to those of KOSMOS, KOSMOS was observed to be superior in all CloudSuite benchmark applications. KOSMOS also showed higher performance than Intel Xeon CPU-based servers in an offline application. In an experimental configuration using multiple KOSMOS nodes, KOSMOS reduced the completion time of Graph Analytics by 30.3% and 72.3% compared with two Intel Xeon CPU-based servers.

Development of robot control system using DSP (DSP를 이용한 로보트 제어시스템 개발)

  • Lee, Bo-Hee;Kim, Jin-Geol
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.1 no.1
    • /
    • pp.50-57
    • /
    • 1995
  • In this paper, the design and the implementation of the controller for an articulated robot, which was developed in our Automatic Control Laboratory, are mainly discussed. The controller reduces the software computational load via a distributed processing method using multiple CPUs, and simplifies the structure through time-division control with a TMS320C31 DSP chip. The control method is based on fuzzy-compensated PID control with a scale factor, which compensates for the influence of load variation resulting from the various postures of the robot with the conventional PID scheme. The application of the proposed controller to the robot system with DC servo-motors shows excellent control capabilities. Also, the response characteristics of the system for various trajectory commands verify the superiority of the controller.

QoS-Aware Power Management of Mobile Games with High-Load Threads (CPU 부하가 큰 쓰레드를 가진 모바일 게임에서 QoS를 고려한 전력관리 기법)

  • Kim, Minsung;Kim, Jihong
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.5
    • /
    • pp.328-333
    • /
    • 2017
  • Mobile game apps, which are popular in various mobile devices, tend to be power-hungry and rapidly drain the device's battery. Since a long battery lifetime is a key design requirement of mobile devices, reducing the power consumption of mobile game apps has become an important research topic. In this paper, we investigate the power consumption characteristics of popular mobile games with multiple threads, focusing on the inter-thread. From our power measurement study of popular mobile game apps, we observed that some of these apps have abnormally high-load threads that barely affect the user's gaming experience, despite the high energy consumption. In order to reduce the wasted power from these abnormal threads, we propose a novel technique that detects such abnormal threads during run time and reduces their power consumption without degrading user experience. Our experimental results on an Android smartphone show that the proposed technique can reduce the energy consumption of mobile game apps by up to 58% without any negative impact on the user's gaming experience.

Low Power TLB Supporting Multiple Page Sizes without Operation System (운영체제 도움 없이 멀티 페이지를 지원하는 저전력 TLB 구조)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.12
    • /
    • pp.1-9
    • /
    • 2013
  • Even though TLBs supporting multiple page sizes are effective in improving performance, a conventional method with OS support cannot utilize multiple page sizes in user applications. Thus, we propose a new multiple-TLB structure supporting multiple page sizes for high performance and low power consumption without any operating system support. The proposed TLB is organized into two parts: an S-TLB (Small TLB) with a small page size and an L-TLB (Large TLB) with a large page size. Both are designed as fully associative bank structures. The S-TLB stores small pages that are evicted from the L-TLB, and the L-TLB stores large pages including a small page generated by the CPU. One bank module of the S-TLB and one of the L-TLB can be selectively accessed based on a particular one bit and two bits, respectively, of the virtual address generated by the CPU. Energy savings are achieved by reducing the number of entries accessed at a time. Also, this paper proposes a simple 1-bit LRU policy to improve performance. The proposed LRU policy can indicate recently referenced blocks by using one additional bit in each TLB entry. This method can simply select a least recently used page from the L-TLB. According to the simulation results, the proposed TLB can reduce Energy * Delay by about 76%, 57%, and 6% compared with a fully associative TLB, an ARM TLB, and a Dual TLB, respectively.
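As a rough illustration of how a 1-bit LRU approximation can work, the sketch below keeps one reference bit per entry. The abstract does not give the exact update and victim-selection rules, so everything here is an assumption: the class name `OneBitLruBank`, the rule that a reference sets the bit, the rule that all other bits are cleared once every bit is set, and the choice of the first entry with a cleared bit as the victim.

```python
class OneBitLruBank:
    """Hypothetical fully associative bank with one reference bit per entry."""

    def __init__(self, num_entries: int) -> None:
        if num_entries < 2:
            raise ValueError("need at least two entries for a 1-bit policy")
        self.tags = [None] * num_entries   # stored page tags (None = empty)
        self.ref = [False] * num_entries   # one "recently referenced" bit each

    def _touch(self, idx: int) -> None:
        # Mark the entry as recently referenced.
        self.ref[idx] = True
        if all(self.ref):
            # Every bit is set: clear all but the most recent reference so
            # that a victim can still be distinguished on the next miss.
            self.ref = [False] * len(self.ref)
            self.ref[idx] = True

    def access(self, tag) -> bool:
        """Look up a tag; return True on a hit, False on a miss (then fill)."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        # Victim: first entry whose bit is cleared (empty entries qualify).
        victim = self.ref.index(False)
        self.tags[victim] = tag
        self._touch(victim)
        return False


bank = OneBitLruBank(2)
bank.access("a")   # miss, fills entry 0
bank.access("b")   # miss, fills entry 1
bank.access("a")   # hit, "a" becomes the most recent reference
bank.access("c")   # miss, evicts "b", the entry not recently referenced
```

With only two entries this behaves like exact LRU; with more entries it is an approximation that trades accuracy for a single bit of state per entry.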

Algebraic Method for Computation of Natural Frequency and Mode Shape Sensitivities (고유진동수와 모드의 민감도를 계산하기 위한 대수적 방법)

  • Jung, Gil-Ho;Kim, Dong-Ok;Lee, Chong-Won;Lee, In-Won
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.21 no.5
    • /
    • pp.707-718
    • /
    • 1997
  • This paper presents an efficient numerical method for the computation of eigenpair derivatives for a real symmetric eigenvalue problem with distinct and multiple eigenvalues. The method has a very simple algorithm and gives an exact solution. Furthermore, it saves computer storage and CPU time. The algorithm preserves not only the symmetry but also the bandwidth of the matrices, allowing efficient computer storage and solution techniques. Results from the proposed method for calculating the eigenpair derivatives are compared with those from Rudisill and Chu's method and Nelson's method, which is known to be efficient in the case of distinct natural frequencies. As an example to demonstrate the efficiency of the proposed method in the case of distinct eigenvalues, a cantilever plate is considered. The design parameter of the cantilever plate is its thickness. For the eigenvalue problem with multiple natural frequencies, the adjacent eigenvectors are used in the algebraic equation as side conditions; they lie adjacent to the m (multiplicity of the multiple natural frequency) distinct eigenvalues, which appear when the design parameter varies. A cantilever beam is used to demonstrate the efficiency of the proposed method in the case of multiple natural frequencies. Results from the proposed method for calculating the eigenpair derivatives are compared with those from Dailey's method (an amendment of Ojalvo's work), which finds the exact eigenvector derivatives. The design parameter of the cantilever beam is its height. Data are presented showing the amount of CPU time used to compute the first ten eigenpair derivatives by each method. It is important to note that the numerical stability of the proposed method is proved.
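For the distinct-eigenvalue case, a symmetric algebraic system of the kind this abstract describes can be derived as follows. The notation is an assumption, since the abstract gives no formulas: stiffness matrix K, mass matrix M, eigenpair (λ, φ) normalized so that φᵀMφ = 1, and design parameter p. Whether this matches the paper's formulation term by term is not confirmed by the abstract.

```latex
% Eigenproblem and mass normalization (assumed notation):
%   (K - \lambda M)\,\phi = 0, \qquad \phi^{T} M \phi = 1 .
%
% Differentiating both relations with respect to p gives
%   (K - \lambda M)\,\phi_{,p} - \lambda_{,p}\, M \phi
%       = -\,(K_{,p} - \lambda M_{,p})\,\phi ,
%   -\,\phi^{T} M\, \phi_{,p} = \tfrac{1}{2}\, \phi^{T} M_{,p}\, \phi .
%
% Stacked into one symmetric linear system:
\begin{bmatrix}
  K - \lambda M   & -M\phi \\
  -\phi^{T} M     & 0
\end{bmatrix}
\begin{bmatrix}
  \phi_{,p} \\ \lambda_{,p}
\end{bmatrix}
=
\begin{bmatrix}
  -(K_{,p} - \lambda M_{,p})\,\phi \\
  \tfrac{1}{2}\,\phi^{T} M_{,p}\,\phi
\end{bmatrix}
```

Premultiplying the first block row by φᵀ recovers the familiar eigenvalue sensitivity λ,p = φᵀ(K,p − λM,p)φ. The coefficient matrix stays symmetric, and apart from the single border row and column it keeps the band structure of K and M, which is consistent with the storage argument made in the abstract.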

Algebraic Method for Evaluating Natural Frequency and Mode Shape Sensitivities (고유진동수와 모우드의 미분을 구하기 위한 대수적 방법)

  • 정길호;김동욱;이인원
    • Proceedings of the Computational Structural Engineering Institute Conference
    • /
    • 1995.10a
    • /
    • pp.225-233
    • /
    • 1995
  • This paper presents an efficient numerical method for the computation of eigenpair derivatives for the real symmetric eigenvalue problem with distinct and multiple eigenvalues. The method has a very simple algorithm and gives an exact solution. Furthermore, it saves computer storage and CPU time. The algorithm preserves the symmetry and bandwidth of the matrices, allowing efficient computer storage and solution techniques. Thus, the algorithm of the proposed method can be easily inserted into commercial FEM codes. Results of the proposed method for calculating the eigenpair derivatives are compared with those of Rudisill and Chu's method and Nelson's method, which is efficient in the case of distinct natural frequencies. As an example to demonstrate the efficiency of the proposed method in the case of distinct eigenvalues, a cantilever plate is considered. The design parameter of the cantilever plate is its thickness. For the eigenvalue problem with multiple natural frequencies, the adjacent eigenvectors are used in the algebraic equation as side conditions; they lie adjacent to the m (multiplicity of the multiple natural frequency) distinct eigenvalues, which appear when the design parameter varies. As an example to demonstrate the efficiency of the proposed method in the case of multiple natural frequencies, a cantilever beam is considered. Results of the proposed method for calculating the eigenpair derivatives are compared with those of Dailey's method (an amendment of Ojalvo's work), which finds the exact eigenvector derivatives. The design parameter of the cantilever beam is its height. Data are presented showing the amount of CPU time used to compute the first ten eigenpair derivatives by each method. It is important to note that the numerical stability of the proposed method is proved.

Performance Analysis of Open Source Based Distributed Deduplication File System (오픈 소스 기반 데이터 분산 중복제거 파일 시스템의 성능 분석)

  • Jung, Sung-Ouk;Choi, Hoon
    • KIISE Transactions on Computing Practices
    • /
    • v.20 no.12
    • /
    • pp.623-631
    • /
    • 2014
  • A comparison of two representative deduplication file systems, LessFS and SDFS, shows that LessFS is better in execution time and CPU utilization while SDFS is better in storage usage (around 1/8 less than general file systems). In this paper, a new system is proposed in which the advantages of SDFS and LessFS are combined. The new system uses multiple DFEs and one DSE to maintain the integrity and consistency of the data. An evaluation study comparing the Single DFE and the Dual DFE indicates that the Dual DFE was better than the Single DFE. The Dual DFE reduced CPU usage and provided faster deduplication. This shows that the proposed system can be used to address the growth in large data storage and power consumption.

Large-scale 3D fast Fourier transform computation on a GPU

  • Jaehong Lee;Duksu Kim
    • ETRI Journal
    • /
    • v.45 no.6
    • /
    • pp.1035-1045
    • /
    • 2023
  • We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (i.e., 3D-FFT) problem whose data size is larger than the GPU's memory. A 1D FFT-based 3D-FFT computational approach is used to solve the limited device memory issue. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are communicated between the host and device memories efficiently through the pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with the state-of-the-art multicore CPU FFT library (i.e., fastest Fourier transform in the West [FFTW]) and a prior GPU-based 3D-FFT algorithm. Our method achieves higher performance (up to 2.89 times) than FFTW, and the performance gap widens as the data size increases. The performance of the prior GPU algorithm decreases considerably in massive-scale problems, whereas our method's performance is stable.
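The 1D-based decomposition mentioned in this abstract rests on the separability of the 3D discrete Fourier transform: applying a 1D transform along each axis in turn gives the full 3D result. The sketch below shows only that property on the CPU in plain Python. It uses a naive O(n²) DFT in place of an FFT, and the function names `dft1d` and `dft3d_by_1d` are hypothetical; none of the paper's GPU transposition, pinned-buffer, or stream logic is represented.

```python
import cmath


def dft1d(seq):
    """Naive 1D discrete Fourier transform (a stand-in for a real FFT)."""
    n = len(seq)
    return [
        sum(seq[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft3d_by_1d(vol):
    """3D DFT of a nested list vol[x][y][z], one axis at a time."""
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[complex(vol[x][y][z]) for z in range(nz)]
            for y in range(ny)] for x in range(nx)]
    # Pass 1: transform every line along the z axis (contiguous in memory).
    for x in range(nx):
        for y in range(ny):
            out[x][y] = dft1d(out[x][y])
    # Pass 2: transform every line along the y axis (strided gather/scatter).
    for x in range(nx):
        for z in range(nz):
            line = dft1d([out[x][y][z] for y in range(ny)])
            for y in range(ny):
                out[x][y][z] = line[y]
    # Pass 3: transform every line along the x axis.
    for y in range(ny):
        for z in range(nz):
            line = dft1d([out[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = line[x]
    return out


# A constant volume has all of its energy in the zero-frequency bin.
ones = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]
spectrum = dft3d_by_1d(ones)
```

Because each pass touches only independent 1D lines, the lines can be processed in batches that fit in limited device memory. The strided gathers in passes 2 and 3 are the kind of access pattern that a transposition into a contiguous layout is meant to avoid.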