• Title/Summary/Keyword: CPU Processing Time

Efficient Workload Distribution of Photomosaic Using OpenCL into a Heterogeneous Computing Environment (이기종 컴퓨팅 환경에서 OpenCL을 사용한 포토모자이크 응용의 효율적인 작업부하 분배)

  • Kim, Heegon;Sa, Jaewon;Choi, Dongwhee;Kim, Haelyeon;Lee, Sungju;Chung, Yongwha;Park, Daihee
    • KIPS Transactions on Computer and Communication Systems / v.4 no.8 / pp.245-252 / 2015
  • Recently, parallel processing methods using accelerators have been introduced into high-performance computing and mobile computing. The photomosaic application can be parallelized by exploiting its inherent data parallelism and an accelerator. In this paper, we propose a way to distribute the workload of the photomosaic application across a heterogeneous CPU and GPU computing environment. That is, the photomosaic application is parallelized using both CPU and GPU resources in the asynchronous mode of OpenCL, and the optimal workload distribution rate is then estimated by measuring the execution times with CPU-only and GPU-only distribution rates. The proposed approach is simple but very effective, and it can be applied to parallelize other applications in a heterogeneous CPU and GPU computing environment. The experimental results confirm that, with the optimal workload distribution, performance in the heterogeneous computing environment improves by 141% compared with the GPU-only method.
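The abstract does not give the estimation formula, so the following is only a minimal sketch under an assumed linear model: each device's run time is proportional to the share of work it receives, and the CPU and GPU run fully concurrently. Under that assumption, the best split is the one where both devices finish at the same moment. The function names and the sample timings are hypothetical, not from the paper.

```python
def optimal_gpu_share(t_cpu_only: float, t_gpu_only: float) -> float:
    """Fraction of the work to send to the GPU so that both devices finish together.

    Assumes (hypothetically) that each device's run time scales linearly with
    the share of work it receives and that the two devices run fully in parallel.
    """
    if t_cpu_only <= 0 or t_gpu_only <= 0:
        raise ValueError("execution times must be positive")
    # Solve r * t_gpu_only == (1 - r) * t_cpu_only for the GPU share r.
    return t_cpu_only / (t_cpu_only + t_gpu_only)


def predicted_time(t_cpu_only: float, t_gpu_only: float, gpu_share: float) -> float:
    """Makespan under the linear model: the slower of the two devices decides."""
    return max(gpu_share * t_gpu_only, (1.0 - gpu_share) * t_cpu_only)


if __name__ == "__main__":
    # Hypothetical measurements in seconds (not taken from the paper).
    t_cpu, t_gpu = 3.0, 1.0
    r = optimal_gpu_share(t_cpu, t_gpu)
    print(f"GPU share: {r:.2f}, predicted time: {predicted_time(t_cpu, t_gpu, r):.2f} s")
```

With a CPU-only time of 3 s and a GPU-only time of 1 s, the model sends 75% of the work to the GPU and predicts 0.75 s. Real devices rarely scale perfectly linearly, which is one reason to confirm a candidate split by measurement.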

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

  • Ha, Woo-Seok;Kim, Soo-Mee;Park, Min-Jae;Lee, Dong-Soo;Lee, Jae-Sung
    • Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.459-467 / 2009
  • Purpose: Maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique for the ML-EM algorithm on a GPU (graphics processing unit). Materials and Methods: Using a GeForce 9800 GTX+ graphics card and NVIDIA's CUDA (Compute Unified Device Architecture), the projection and backprojection steps of the ML-EM algorithm were parallelized. The computation time per iteration was measured for projection, for computing the error between measured and estimated data, and for backprojection. The total time included the latency of data transfer between RAM and GPU memory. Results: The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively, a speedup of about 15 times on the GPU. When the number of iterations was increased to 1024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively. This speedup of about 135 times was caused by delays in the CPU-based computation after a certain number of iterations. In contrast, the GPU-based computation showed very little variation in computation time per iteration owing to its use of shared memory. Conclusion: GPU-based parallel computation of ML-EM significantly improved computing speed and stability. The developed GPU-based ML-EM algorithm could easily be adapted to other imaging geometries.
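The ML-EM update that the abstract parallelizes is the standard multiplicative iteration $\lambda_j^{(k+1)} = \frac{\lambda_j^{(k)}}{\sum_i a_{ij}} \sum_i a_{ij} \frac{y_i}{\sum_l a_{il}\lambda_l^{(k)}}$, where $a_{ij}$ is the system matrix, $y_i$ the measured counts and $\lambda_j$ the image. Below is a minimal serial (CPU-style) sketch in pure Python that makes the projection and backprojection steps explicit; the dense toy system matrix and the function names are illustrative assumptions, not the paper's CUDA implementation. Each output element of `forward_project` and `back_project` is an independent sum, which is what makes these two steps natural candidates for GPU threads.

```python
from typing import List

Matrix = List[List[float]]


def forward_project(a: Matrix, image: List[float]) -> List[float]:
    """Projection: expected counts in each detector bin i, sum_j a[i][j] * image[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, image)) for row in a]


def back_project(a: Matrix, values: List[float]) -> List[float]:
    """Backprojection: sum_i a[i][j] * values[i] for each image pixel j."""
    n_pixels = len(a[0])
    return [sum(a[i][j] * values[i] for i in range(len(a))) for j in range(n_pixels)]


def ml_em(a: Matrix, measured: List[float], n_iter: int) -> List[float]:
    """Run n_iter ML-EM iterations starting from a uniform positive image."""
    n_pixels = len(a[0])
    image = [1.0] * n_pixels
    sensitivity = back_project(a, [1.0] * len(a))  # s_j = sum_i a_ij
    for _ in range(n_iter):
        estimated = forward_project(a, image)
        # Ratio of measured to estimated counts (bins with zero estimate contribute 0).
        ratio = [m / e if e > 0 else 0.0 for m, e in zip(measured, estimated)]
        correction = back_project(a, ratio)
        image = [x * c / s if s > 0 else 0.0
                 for x, c, s in zip(image, correction, sensitivity)]
    return image
```

For example, with the toy system `a = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]` and noise-free data `[4.0, 3.0, 1.0]` generated from the image `[3, 1]`, `ml_em(a, [4.0, 3.0, 1.0], 100)` converges to approximately `[3.0, 1.0]`.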

An Overhead Analysis of Pfair Real-Time Multi-Core Scheduler with CPU Affinity on Embedded Systems (임베디드 시스템에서 CPU 선호도를 고려한 Pfair 실시간 멀티코어 스케줄러의 오버헤드 분석)

  • Lee, Jung-in;Park, Sangsoo
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.66-68 / 2011
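  • A real-time scheduling algorithm with low overhead is one of the most important requirements for using multicore processors in embedded systems. In a multicore environment, scheduling overhead arises mainly from task migration between cores, which degrades memory performance. In this paper, we measured the scheduling overhead of the Pfair scheduling algorithm, which is known to be optimal in terms of system utilization, under different schemes for assigning tasks to CPU cores at scheduling time. The experimental results show that adopting a same-core-based task assignment scheme greatly reduces the number of task migrations.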
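To illustrate why the core-assignment step matters, here is a small simulation sketch. It is not the paper's scheduler: it uses the simpler EPDF rule (earliest pseudo-deadline first, ties broken by task index) instead of the full Pfair tie-breaking rules, and the function names and task set are hypothetical. Each slot, the same set of subtasks is chosen either way; only the mapping of chosen tasks to cores differs. The "naive" mapping hands out cores in task-index order, while the affinity-aware mapping first keeps each task on the core it last ran on.

```python
from typing import List, Optional, Tuple

Task = Tuple[int, int]  # (e, p): e quanta of work every p slots, weight e/p


def window(task: Task, k: int) -> Tuple[int, int]:
    """Pfair pseudo-release and pseudo-deadline of the k-th subtask (k >= 1)."""
    e, p = task
    release = (k - 1) * p // e          # floor((k - 1) * p / e)
    deadline = -(-k * p // e)           # ceil(k * p / e)
    return release, deadline


def count_migrations(tasks: List[Task], cores: int, horizon: int,
                     keep_affinity: bool) -> int:
    """Simulate EPDF scheduling for `horizon` slots and count task migrations."""
    done = [0] * len(tasks)                       # subtasks executed so far
    last_core: List[Optional[int]] = [None] * len(tasks)
    migrations = 0
    for t in range(horizon):
        eligible = []
        for i, task in enumerate(tasks):
            release, deadline = window(task, done[i] + 1)
            if release <= t:
                eligible.append((deadline, i))
        # Earliest pseudo-deadline first; ties broken by task index (simplified).
        chosen = sorted(i for _, i in sorted(eligible)[:cores])
        free = list(range(cores))
        assignment = {}
        if keep_affinity:
            # First pass: keep each task on the core it last ran on, if still free.
            for i in chosen:
                if last_core[i] in free:
                    assignment[i] = last_core[i]
                    free.remove(last_core[i])
        # Remaining tasks take the lowest-numbered free cores.
        for i in chosen:
            if i not in assignment:
                assignment[i] = free.pop(0)
        for i, core in assignment.items():
            if last_core[i] is not None and last_core[i] != core:
                migrations += 1
            last_core[i] = core
            done[i] += 1
    return migrations
```

For the hypothetical task set `[(1, 2), (1, 1), (1, 2)]` on 2 cores over 4 slots, `count_migrations(..., keep_affinity=False)` returns 3 while `keep_affinity=True` returns 0: the full-utilization task keeps bouncing between cores under index-order mapping but stays put when affinity is preserved.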

Optimization of Ship Management System (선박관리 시스템의 최적화)

  • Syan, Lim Chia;Park, Soo-Hong
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.6 / pp.839-846 / 2013
  • In this paper, we design and develop an optimized programming model for a real-time ship management system. Replacing the conventional interrupt-driven programming model, an embedded real-time operating system (RTOS) has been implemented on the system, allowing processes to run virtually simultaneously (multitasking). Data management algorithms were designed and developed in the RTOS to facilitate data distribution among tasks and to optimize CPU processing time through intelligent resource utilization. Finally, data loss in the system was minimized by improving the data processing rate under the optimized programming model.

The GPU-based Parallel Processing Algorithm for Fast Inspection of Semiconductor Wafers (반도체 웨이퍼 고속 검사를 위한 GPU 기반 병렬처리 알고리즘)

  • Park, Youngdae;Kim, Joon Seek;Joo, Hyonam
    • Journal of Institute of Control, Robotics and Systems / v.19 no.12 / pp.1072-1080 / 2013
  • Today, many vision inspection techniques are used in industrial production. In the semiconductor industry in particular, the vision inspection system for wafers is very important, and inspection techniques for semiconductor wafer production must provide both high precision and fast inspection. Achieving these objectives requires parallel processing of the inspection algorithm. In this paper, we propose a GPU (Graphics Processing Unit)-based parallel processing algorithm for the fast inspection of semiconductor wafers. The proposed algorithm is implemented on NVIDIA GPU boards. The defect detection performance of the proposed algorithm on the GPU is the same as that of a single-CPU implementation, but it runs about 210 times faster than on a single CPU.
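The abstract does not describe the inspection algorithm itself. As a hedged illustration of the kind of per-pixel work that maps well onto a GPU, the sketch below uses a common generic approach, reference-image differencing with a fixed threshold; this technique, the threshold, and the function names are assumptions for illustration only, not the paper's method. Every output pixel depends only on the two input pixels at the same position, so a GPU could assign one thread per pixel.

```python
from typing import List, Tuple

Image = List[List[int]]


def defect_mask(test: Image, reference: Image, threshold: int) -> List[List[bool]]:
    """Flag pixels whose absolute difference from the reference exceeds threshold.

    Each output element is computed independently of all others, which is the
    data parallelism a GPU implementation would exploit (one thread per pixel).
    """
    return [
        [abs(t - r) > threshold for t, r in zip(test_row, ref_row)]
        for test_row, ref_row in zip(test, reference)
    ]


def defect_coordinates(mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """List the (row, column) positions flagged as defects."""
    return [(r, c) for r, row in enumerate(mask) for c, flag in enumerate(row) if flag]
```

For a 2 × 2 reference image of uniform intensity 100 and a test image `[[100, 160], [95, 100]]` with a threshold of 20, only pixel `(0, 1)` is reported.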

Design and Implementation of a Hardware-based Transmission/Reception Accelerator for a Hybrid TCP/IP Offload Engine (하이브리드 TCP/IP Offload Engine을 위한 하드웨어 기반 송수신 가속기의 설계 및 구현)

  • Jang, Han-Kook;Chung, Sang-Hwa;Yoo, Dae-Hyun
    • Journal of KIISE:Computer Systems and Theory / v.34 no.9 / pp.459-466 / 2007
  • TCP/IP processing imposes a heavy load on the host CPU on a very high-speed network. Recently, the TCP/IP Offload Engine (TOE), which processes TCP/IP on a network adapter instead of the host CPU, has become an attractive way to reduce the load on the host CPU. There have been two approaches to implementing a TOE: the software TOE, in which TCP/IP is processed by an embedded processor, and the hardware TOE, in which TCP/IP is processed by a dedicated ASIC. The software TOE performs poorly, while the hardware TOE is neither flexible nor extensible enough to add new features. In this paper, we designed and implemented a hybrid TOE architecture, in which TCP/IP is processed cooperatively by hardware and software, on an FPGA with two embedded processor cores. The hybrid TOE achieves high performance by processing time-critical operations, such as creating and processing data packets, in hardware. Software based on embedded Linux performs operations that are not time-critical, such as connection establishment, flow control, and congestion control, so the hybrid TOE retains sufficient flexibility and extensibility. To improve the performance of the hybrid TOE, we developed a hardware-based transmission/reception accelerator that processes important operations such as creating data packets. In the experiments, the hybrid TOE shows a minimum latency of about 19 μs, CPU utilization below 6%, and a maximum bandwidth of about 675 Mbps.

The development of parallel computation method for the fire-driven-flow in the subway station (도시철도역사에서 화재유동에 대한 병렬계산방법연구)

  • Jang, Yong-Jun;Lee, Chang-Hyun;Kim, Hag-Beom;Park, Won-Hee
    • Proceedings of the KSR Conference / 2008.06a / pp.1809-1815 / 2008
  • This experiment simulated the fire-driven flow in an underground station using a parallel processing method. The fire analysis program FDS (Fire Dynamics Simulator), which uses LES (Large Eddy Simulation), was run on a 6-node parallel cluster with two 3.0 GHz CPUs per node. The simulation model was based on the Kwangju-geumnan underground subway station, and the total simulated time was 600 s. First, the whole underground passage was divided into 1 mesh and into 8 meshes to compare single-CPU and multi-CPU computation. With a grid of 15 × 10^6 points, more than a single CPU can handle, the fire-driven flow from the center of the platform and from the subway itself was analyzed. The single-CPU and multi-CPU results showed almost no difference. In a computing-time test with 3 × 10^6 grid points, the 2-CPU and 7-CPU computations were two and five times faster than the 1-CPU computation, respectively. This study confirmed that the limitations of a single CPU can be overcome by parallel computation.
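One way to read the reported speedups (2× on 2 CPUs, 5× on 7 CPUs) is through Amdahl's law, $S(n) = 1 / (s + (1 - s)/n)$, where $s$ is the effective serial fraction. The paper does not use this model; it is applied here only as an assumed back-of-the-envelope check, and the fitted $s$ lumps together true serial work and communication overhead.

```python
def serial_fraction(speedup: float, n_cpus: int) -> float:
    """Invert Amdahl's law S = 1 / (s + (1 - s) / n) for the serial fraction s."""
    # 1/S = s + (1 - s)/n  =>  s = (1/S - 1/n) / (1 - 1/n)
    return (1.0 / speedup - 1.0 / n_cpus) / (1.0 - 1.0 / n_cpus)


def amdahl_speedup(s: float, n_cpus: int) -> float:
    """Predicted speedup on n_cpus for a serial fraction s."""
    return 1.0 / (s + (1.0 - s) / n_cpus)


if __name__ == "__main__":
    s = serial_fraction(5.0, 7)          # fit to the 7-CPU result
    print(f"serial fraction ~ {s:.4f}")
    print(f"predicted 2-CPU speedup ~ {amdahl_speedup(s, 2):.3f}")
```

Fitting the 7-CPU result gives $s = 1/15 \approx 0.067$ (a parallel efficiency of 5/7, about 71%), and the same $s$ predicts a 2-CPU speedup of 1.875, close to the two-fold figure in the abstract.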

A study on game physics engine focused on real time physics (물리 엔진에 관한 고찰 : 실시간 물리 기술을 중심으로)

  • Ha, You-Jong;Park, Kyoung-Ju
    • Journal of Korea Game Society / v.9 no.5 / pp.43-52 / 2009
  • This paper analyzes four game physics engines in terms of real-time techniques. Real-time physics is the technology of simplifying physics-based simulation so that it can be applied to real-time applications such as games. Our study covers two commercial physics engines, Havok's Physics SDK and NVIDIA's PhysX SDK, and two open-source projects, Open Dynamics Engine and the Bullet physics engine. We find that most of them cover rigid-body dynamics, and some also include deformable-body simulation, fluid simulation, or both. For real-time simulation, they adopt simplified numerical methods and efficient collision detection/response, and they also use parallel processing hardware, i.e., multi-core CPUs, physics processing units (PPUs), or graphics processing units (GPUs).
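As a hedged example of the "simplified numerical methods" and cheap collision tests the abstract mentions, the sketch below combines semi-implicit (symplectic) Euler integration with an axis-aligned bounding box (AABB) overlap test. Both are widely used simplifications in real-time physics, but this is a generic 2D illustration with hypothetical names, not code from, or a description of, any of the four engines surveyed.

```python
from dataclasses import dataclass


@dataclass
class Body:
    x: float
    y: float
    vx: float
    vy: float
    half_w: float   # half-extent of the bounding box along x
    half_h: float   # half-extent of the bounding box along y


def step(body: Body, dt: float, gravity: float = -9.81) -> None:
    """Semi-implicit Euler: update velocity first, then position with the new velocity."""
    body.vy += gravity * dt
    body.x += body.vx * dt
    body.y += body.vy * dt


def aabb_overlap(a: Body, b: Body) -> bool:
    """Broad-phase test: boxes overlap only if they overlap on both axes."""
    return (abs(a.x - b.x) <= a.half_w + b.half_w
            and abs(a.y - b.y) <= a.half_h + b.half_h)
```

Updating velocity before position is what distinguishes semi-implicit Euler from explicit Euler, and it tends to keep energy bounded at the large fixed time steps games use. The AABB test only rules pairs in or out cheaply; a narrow-phase test would still be needed for exact contacts.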

Secure VPN Performance in IP Layers (IP계층에서의 VPN 전송성능에 관한 연구)

  • 임형진;권윤주;정태명
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.11C / pp.102-112 / 2001
  • This paper analyzes security performance and processing performance, measuring performance between nodes using the AH and ESP protocols. An IPsec VPN provides applications with security services implemented in the IP layer, but traffic overhead and packet processing time increase because of the encryption, decryption, and authentication performed in AH and ESP. We measured the overall packet processing time and the IPsec module processing time. The efficiency tests showed that the factors influencing transmission efficiency were the size of the transmitted packets, the encryption algorithms used for tunneling, the authentication functions, the host CPU speed, and the IPsec implementation. IPsec transmission was not appropriate for high-capacity traffic, because transmission was more than ten times slower than without IPsec.
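Packet size matters partly because ESP adds a roughly fixed per-packet overhead. The sketch below computes the on-the-wire size of an ESP tunnel-mode packet following the RFC 4303 layout. The default parameters (AES-CBC with a 16-byte IV and block size, a 12-byte ICV as with HMAC-SHA1-96, and a 20-byte outer IPv4 header without options) are assumptions for illustration; the abstract does not state which algorithms were measured.

```python
def esp_tunnel_packet_size(inner_len: int, iv_len: int = 16, block: int = 16,
                           icv_len: int = 12, outer_ip_len: int = 20) -> int:
    """Size in bytes of an ESP tunnel-mode packet carrying an inner IP packet.

    Layout (RFC 4303): outer IP header | SPI (4) + sequence number (4) | IV |
    inner packet | padding | pad length (1) | next header (1) | ICV.
    Defaults are assumed values: AES-CBC IV/block of 16 bytes, 12-byte ICV.
    """
    esp_header = 8                       # SPI + sequence number
    trailer = 2                          # pad length + next header
    # Pad so that (inner packet + padding + trailer) is a multiple of the block size.
    padding = (-(inner_len + trailer)) % block
    return outer_ip_len + esp_header + iv_len + inner_len + padding + trailer + icv_len


if __name__ == "__main__":
    for size in (64, 1000):
        total = esp_tunnel_packet_size(size)
        print(f"{size}-byte packet -> {total} bytes ({total / size:.3f}x)")
```

Under these assumptions a 1000-byte inner packet grows to 1064 bytes (about 6% overhead), while a 64-byte packet grows to 136 bytes, more than double. Byte overhead alone cannot explain a ten-fold slowdown, which is consistent with the abstract's emphasis on cryptographic processing cost and host CPU speed.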

GPGPU Task Management Technique to Mitigate Performance Degradation of Virtual Machines due to GPU Operation in Cloud Environments (클라우드 환경에서 GPU 연산으로 인한 가상머신의 성능 저하를 완화하는 GPGPU 작업 관리 기법)

  • Kang, Jihun;Gil, Joon-Min
    • KIPS Transactions on Computer and Communication Systems / v.9 no.9 / pp.189-196 / 2020
  • Recently, GPU cloud computing technology that assigns GPU (Graphics Processing Unit) devices to virtual machines has been widely used in cloud environments. GPU devices assigned to virtual machines can perform operations faster than CPUs through massively parallel processing, which benefits high-performance computing services in a variety of fields. However, although a GPU device can improve the performance of the virtual machine it is assigned to, the virtual machine scheduler, which is based on each virtual machine's CPU usage time, does not take GPU usage time into account, and this affects the performance of other virtual machines. In this paper, we test and analyze the performance degradation of other virtual machines caused by a virtual machine performing GPGPU (General-Purpose computing on Graphics Processing Units) tasks in a direct-path-based GPU virtualization environment, which is often used when assigning GPUs to virtual machines in cloud environments. To solve this problem, we then propose a GPGPU task management method for virtual machines.