• Title/Summary/Keyword: CPU 시간

Search Result 518, Processing Time 0.038 seconds

An Application-Specific and Adaptive Power Management Technique for Portable Systems (휴대장치를 위한 응용프로그램 특성에 따른 적응형 전력관리 기법)

  • Egger, Bernhard;Lee, Jae-Jin;Shin, Heon-Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.8
    • /
    • pp.367-376
    • /
    • 2007
  • In this paper, we introduce an application-specific and adaptive power management technique for portable systems that support dynamic voltage scaling (DVS). We exploit both the idle time of multitasking systems running soft real-time tasks as well as memory- or CPU-bound code regions. Detailed power and execution time profiles guide an adaptive power manager (APM) that is linked to the operating system. A post-pass optimizer marks candidate regions for DVS by inserting calls to the APM. At runtime, the APM monitors the CPU's performance counters to dynamically determine the affinity of the each marked region. for each region, the APM computes the optimal voltage and frequency setting in terms of energy consumption and switches the CPU to that setting during the execution of the region. Idle time is exploited by monitoring system idle time and switching to the energy-wise most economical setting without prolonging execution. We show that our method is most effective for periodic workloads such as video or audio decoding. We have implemented our method in a multitasking operating system (Microsoft Windows CE) running on an Intel XScale-processor. We achieved up to 9% of total system power savings over the standard power management policy that puts the CPU in a low Power mode during idle periods.

A Relative Performance Index-based Job Migration in Grid Computing Environment (그리드 컴퓨팅 환경에서의 상대성능지수에 기반한 작업 이주)

  • Kim Young-Gyun;Oh Gil-Ho;Cho Kum Won;Ko Soon-Heum
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.11 no.4
    • /
    • pp.293-304
    • /
    • 2005
  • In this paper, we research on job migration in a grid computing environment with cactus and MPICH-C2 based on Globus. Our concepts are to perform job migration by finding the site with plenty of computational resources that would decrease execution time in a grid computing environment. The Migration Manager recovers the job from the checkpointing files and restarts the job on the migrated site. To select a migrating site, the proposed method considers system's performance index, cpu's load, network traffic to send migration job tiles and the execution time predicted on a migration site. Then it selects a site with maximal performance gains. By selecting a site with minimum migration time and minimum execution time. this approach implements a more efficient grid computing environment. The proposed method Is proved by effectively decreasing total execution time at the $K\ast{Grid}$.

CUDA Implementation for the Four-Russian Algorithm (4-러시안 알고리즘의 CUDA 구현)

  • Kim, Young Ho;Jeong, Ju-Hui;Kang, Dae Woong;Sim, Jeong Seop;Kim, Minho;Park, Soo-jun;Lim, Myungeun;Jung, Ho-Youl
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.261-264
    • /
    • 2012
  • 상수 크기의 알파벳 ${\Sigma}$에 대해 길이가 각각 m, n인 두 문자열 X와 Y의 편집거리는 X를 Y로 변환하기 위해 필요한 최소 편집연산의 수로 정의된다. 두 문자열의 편집거리는 잘 알려진 동적프로그래밍을 이용하여 O(mn) 시간과 공간에 계산할 수 있으며, 4-러시안 알고리즘을 이용해도 계산할 수 있다. 4-러시안 알고리즘은 블록 크기를 상수 t라 할 때, 전처리 단계에서 $O\((3{\mid}{\Sigma}{\mid})^{2t}t^2\)$ 시간과 $O\((3{\mid}{\Sigma}{\mid})^{2t}t^2\)$ 공간이 필요하며, 계산 단계에서 O(mn/t) 시간과 O(mn) 공간을 이용하여 편집거리를 계산하는 알고리즘이다. 본 논문에서는 4-러시안 알고리즘의 계산 단계를 CUDA를 이용하여 구현하고 실험을 통해 CPU 기반의 순차적인 수행시간과 GPU 기반의 병렬적인 수행시간의 비교결과를 제시한다. 본 논문의 병렬알고리즘은 m/t개의 쓰레드를 사용하여 O(m+n) 시간에 편집거리를 계산한다. GPU 기반의 알고리즘이 CPU 기반의 알고리즘 보다 t=1일 때 약 10배 빠르고, t=2일 때 약 3배 빠른 결과를 보였다.

An Accelerated Approach to Dose Distribution Calculation in Inverse Treatment Planning for Brachytherapy (근접 치료에서 역방향 치료 계획의 선량분포 계산 가속화 방법)

  • Byungdu Jo
    • Journal of the Korean Society of Radiology
    • /
    • v.17 no.5
    • /
    • pp.633-640
    • /
    • 2023
  • With the recent development of static and dynamic modulated brachytherapy methods in brachytherapy, which use radiation shielding to modulate the dose distribution to deliver the dose, the amount of parameters and data required for dose calculation in inverse treatment planning and treatment plan optimization algorithms suitable for new directional beam intensity modulated brachytherapy is increasing. Although intensity-modulated brachytherapy enables accurate dose delivery of radiation, the increased amount of parameters and data increases the elapsed time required for dose calculation. In this study, a GPU-based CUDA-accelerated dose calculation algorithm was constructed to reduce the increase in dose calculation elapsed time. The acceleration of the calculation process was achieved by parallelizing the calculation of the system matrix of the volume of interest and the dose calculation. The developed algorithms were all performed in the same computing environment with an Intel (3.7 GHz, 6-core) CPU and a single NVIDIA GTX 1080ti graphics card, and the dose calculation time was evaluated by measuring only the dose calculation time, excluding the additional time required for loading data from disk and preprocessing operations. The results showed that the accelerated algorithm reduced the dose calculation time by about 30 times compared to the CPU-only calculation. The accelerated dose calculation algorithm can be expected to speed up treatment planning when new treatment plans need to be created to account for daily variations in applicator movement, such as in adaptive radiotherapy, or when dose calculation needs to account for changing parameters, such as in dynamically modulated brachytherapy.

Adaptive Search Range Decision for Accelerating GPU-based Integer-pel Motion Estimation in HEVC Encoders (HEVC 부호화기에서 GPU 기반 정수화소 움직임 추정을 고속화하기 위한 적응적인 탐색영역 결정 방법)

  • Kim, Sangmin;Lee, Dongkyu;Sim, Dong-Gyu;Oh, Seoung-Jun
    • Journal of Broadcast Engineering
    • /
    • v.19 no.5
    • /
    • pp.699-712
    • /
    • 2014
  • In this paper, we propose a new Adaptive Search Range (ASR) decision algorithm for accelerating GPU-based Integer-pel Motion Estimation (IME) of High Efficiency Video Coding (HEVC). For deciding the ASR, we classify a frame into two models using Motion Vector Differences (MVDs) then adaptively decide the search ranges of each model. In order to apply the proposed algorithm to the GPU-based ME process, starting points of the ME are decided using only temporal Motion Vectors (MVs). The CPU decides the ASR as well as the starting points and transfers them to the GPU. Then, the GPU performs the integer-pel ME. The proposed algorithm reduces the total encoding time by 37.9% with BD-rate increase of 1.1% and yields 951.2 times faster ME against the CPU-based anchor. In addition, the proposed algorithm achieves the time reduction of 57.5% in the ME running time with the negligible coding loss of 0.6%, compared with the simple GPU-based ME without ASR decision.

Implementation of a TCP/IP Offload Engine Using High Performance Lightweight TCP/IP (고성능 경량 TCP/IP를 이용한 소프트웨어 기반 TCP/IP 오프로드 엔진 구현)

  • Jun, Yong-Tae;Chung, Sang-Hwa;Yoon, In-Su
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.4
    • /
    • pp.369-377
    • /
    • 2008
  • Today, Ethernet technology is rapidly developing to have a bandwidth of 10Gbps beyond 1Gbps. In such high-speed networks, the existing method that host CPU processes TCP/IP in the operating system causes numerous overheads. As a result of the overheads, user applications cannot get the enough computing power from the host CPU. To solve this problem, the TCP/IP Offload Engine(TOE) technology was emerged. TOE is a specialized NIC which processes the TCP/IP instead of the host CPU. In this paper, we implemented a high-performance, lightweight TCP/IP(HL-TCP) for the TOE and applied it to an embedded system. The HL-TCP supports existing fundamental TCP/IP functions; flow control, congestion control, retransmission, delayed ACK, processing out-of-order packets. And it was implemented to utilize Ethernet MAC's hardware features such as TCP segmentation offload(TSO), checksum offload(CSO) and interrupt coalescing. Also we eliminated the copy overhead from the host memory to the NIC memory when sending data and we implemented an efficient DMA mechanism for the TCP retransmission. The TOE using the HL-TCP has the CPU utilization of less than 6% and the bandwidth of 453Mbps.

Interrupt Processing in Dynamic Frequency Scaling Processor System (동적 프리퀀시 스케일링을 사용한 프로세서의 인터럽트 처리와 I/O 시스템 성능 향상 기법)

  • Yoo See-Hwan;Yoo Chuck
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06a
    • /
    • pp.328-330
    • /
    • 2006
  • 동적 전력 관리 기법을 활용한 프로세서의 등장은 고성능 임베디드 장치들의 저전력 설계에 있어서 큰 영향을 주고 있다 특히, XSCALE과 같은 고성능 프로세서의 소비전력은 동작 클럭의 속도와 비례하여 빠르게 증가하고 있으며, 이를 극복하기 위한 다양한 기법이 제시되었다. 동적 전력 관리 기법은 크게 1) 동적 전압 관리 기법과 동적 프리퀀시 관리 기법으로 구분된다. 동적 프리퀀시 관리 기법을 사용한 프로세서는 필요에 따라 프로세서의 동작 클럭 속도를 변경한다. 이는 전체적인 프로세서 성능의 저하를 수반하게 된다 특히, 주변 장치들의 전력 관리가 동시에 이루어지지 않을 경우에는 시스템의 전체적인 성능에 큰 영향을 끼치게 된다. I/O 장치의 인터럽트는 CPU의 현재 실행을 잠시 멈추고, 인터럽트 처리를 우선적으로 수행하도록 한다. 따라서 CPU가 처리할 수 있는 양보다 많은 인터럽트 발생은 인터럽트 처리 이후에 실제 응용 프로그램들이 동작할 시간을 줄이게 되어 CPU는 살아있으나, 인터럽트 이외의 실제 프로세스 실행을 진행할 수 없는 라이브륵(livelock) 현상이 발생한다. 동적 프리퀀시 스케일링을 사용하는 경우, 프로세서의 동작 속도 저하로 인한 livelock 현상이 발생할 수 있으며 이를 막기 위하여, 인터럽트 처리를 제한하는 기법을 제시한다.

  • PDF

Parallelization of CUSUM Test in a CUDA Environment (CUDA 환경에서 CUSUM 검증의 병렬화)

  • Son, Changhwan;Park, Wooyeol;Kim, HyeongGyun;Han, KyungSook;Pyo, Changwoo
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.7
    • /
    • pp.476-481
    • /
    • 2015
  • We have parallelized the cumulative sum (CUSUM) test of NIST's statistical random number test suite in a CUDA environment. Storing random walks in an array instead of in scalar variables eliminates data dependence. The change in data structure makes it possible to apply parallel scans, scatters, and reductions at each stage of the test. In addition, serial data exchanges between CPU and GPU are removed by migrating CPU's tasks to GPU. Finally we have optimized global memory accesses. The overall speedup is 23 times over the sequential version. Our results contribute to improving security of random numbers for cryptographic keys as well as reducing the time for evaluation of randomness.

GPGPU Task Management Technique to Mitigate Performance Degradation of Virtual Machines due to GPU Operation in Cloud Environments (클라우드 환경에서 GPU 연산으로 인한 가상머신의 성능 저하를 완화하는 GPGPU 작업 관리 기법)

  • Kang, Jihun;Gil, Joon-Min
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.9
    • /
    • pp.189-196
    • /
    • 2020
  • Recently, GPU cloud computing technology applying GPU(Graphics Processing Unit) devices to virtual machines is widely used in the cloud environment. In a cloud environment, GPU devices assigned to virtual machines can perform operations faster than CPUs through massively parallel processing, which can provide many benefits when operating high-performance computing services in a variety of fields in a cloud environment. In a cloud environment, a GPU device can help improve the performance of a virtual machine, but the virtual machine scheduler, which is based on the CPU usage time of a virtual machine, does not take into account GPU device usage time, affecting the performance of other virtual machines. In this paper, we test and analyze the performance degradation of other virtual machines due to the virtual machine that performs GPGPU(General-Purpose computing on Graphics Processing Units) task in the direct path based GPU virtualization environment, which is often used when assigning GPUs to virtual machines in cloud environments. Then to solve this problem, we propose a GPGPU task management method for a virtual machine.

Design and implementation of the early projection in BADA-II (바다-II에서 조기 프로젝션의 설계와 구현)

  • 김정아;박영철
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10b
    • /
    • pp.208-210
    • /
    • 2003
  • 기존의 바다-II는 늦은 프로젝션(late projection) 기법을 사용하기 때문에 중간 임시 결과 파일의 디스크 쓰기와 디스크 읽기에 많은 디스크 자원, 메모리 자원, 그리고 CPU 시간이 소모되었다. 본 논문은 바다-II에 조기 프로젝션(early projection) 기법의 설계와 구현 그리고 그에 따른 성능 향상을 보인다.

  • PDF