• Title/Summary/Keyword: CPU time


Assessment of Parallel Computing Performance of Agisoft Metashape for Orthomosaic Generation (정사모자이크 제작을 위한 Agisoft Metashape의 병렬처리 성능 평가)

  • Han, Soohee;Hong, Chang-Ki
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.427-434 / 2019
  • In the present study, we assessed the parallel computing performance of Agisoft Metashape for orthomosaic generation. Metashape performs aerial triangulation, generates a three-dimensional point cloud, and produces an orthomosaic based on SfM (Structure from Motion) technology. Due to the nature of SfM, most of the time is spent on Align photos, which performs relative orientation, and Build dense cloud, which generates a three-dimensional point cloud. Metashape can parallelize these two processes using multiple CPU (Central Processing Unit) cores and the GPU (Graphics Processing Unit). An orthomosaic was created from large UAV (Unmanned Aerial Vehicle) images under six conditions that combine three parallel methods (CPU only, GPU only, and CPU + GPU) with two operating systems (Windows and Linux). To assess the consistency of the results across conditions, the RMSE (Root Mean Square Error) of aerial triangulation was measured using ground control points that were automatically detected in the images without human intervention. The results of orthomosaic generation from 521 UAV images of 42.2 million pixels showed that the combination of CPU and GPU performed best on the present system, and that Linux outperformed Windows in all conditions. However, the RMSE values of aerial triangulation differed slightly, within the error range, among the combinations. Metashape therefore appears to leave room for improvement in delivering consistent results regardless of parallel method and operating system.
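
The consistency check described in this abstract rests on an RMSE over ground control points. The abstract does not give the exact formulation, so the following is only a minimal sketch assuming a 3D point-to-point error; the function name `rmse` and the coordinates are hypothetical.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between matched points (assumed 3D here)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty point lists of equal length")
    # Squared Euclidean distance for each matched pair of points.
    squared = [
        sum((e - r) ** 2 for e, r in zip(est, ref))
        for est, ref in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical ground control point coordinates in metres.
    surveyed = [(10.00, 20.00, 5.00), (30.00, 40.00, 6.00)]
    triangulated = [(10.02, 19.99, 5.03), (29.97, 40.01, 6.02)]
    print(f"RMSE: {rmse(triangulated, surveyed):.4f} m")
```

Computing this value once per condition (for example CPU only on Linux versus CPU + GPU on Windows) gives the kind of side-by-side comparison the abstract reports.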

Cascade CNN with CPU-FPGA Architecture for Real-time Face Detection (실시간 얼굴 검출을 위한 Cascade CNN의 CPU-FPGA 구조 연구)

  • Nam, Kwang-Min;Jeong, Yong-Jin
    • Journal of IKEEE / v.21 no.4 / pp.388-396 / 2017
  • Since a face detection problem involves many variables such as pose, illumination, and occlusion, a high-performance detection system is required. Although CNNs are excellent at image classification, CNN operation requires high-performance hardware resources, whereas small and mobile systems need low-cost, low-power environments. In this paper, a CPU-FPGA integrated system is designed based on a 3-stage cascade CNN architecture using a small FPGA. An adaptive Region of Interest (ROI) is applied to reduce the number of CNN operations using face information from the previous frame. We use a Field Programmable Gate Array (FPGA) to accelerate the CNN computations. The accelerator reads multiple feature maps at once on the FPGA and performs Multiply-Accumulate (MAC) operations in parallel for the convolution operation. The system is implemented on an Altera Cyclone V FPGA in which an ARM Cortex-A9 and on-chip SRAM are embedded. The system runs at 30 FPS with HD-resolution input images. The CPU-FPGA integrated system showed 8.5 times the power efficiency of systems using the CPU only.
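
The convolution that the accelerator parallelizes reduces to repeated multiply-accumulate steps over several feature maps. The sketch below is a plain sequential reference in Python, not the authors' FPGA design; the function name `conv2d_mac` and the "valid" (no padding, stride 1) layout are assumptions made for illustration.

```python
def conv2d_mac(feature_maps, kernels):
    """Convolve C input feature maps with C k-by-k kernels into one output map.

    Uses CNN-style convolution (no kernel flip), no padding, stride 1.
    Every innermost step is a single multiply-accumulate (MAC).
    """
    channels = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    k = len(kernels[0])
    out = [[0.0] * (width - k + 1) for _ in range(height - k + 1)]
    for y in range(height - k + 1):
        for x in range(width - k + 1):
            acc = 0.0
            # Hardware can unroll these loops so many MACs run at once;
            # here they simply run one after another.
            for c in range(channels):
                for dy in range(k):
                    for dx in range(k):
                        acc += feature_maps[c][y + dy][x + dx] * kernels[c][dy][dx]
            out[y][x] = acc
    return out
```

The loops over `c`, `dy`, and `dx` carry no dependency other than the accumulation itself, which is why reading several feature maps at once and running the MACs in parallel is a natural fit for an FPGA.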

FPGA-based design and implementation of data acquisition and real-time processing for laser ultrasound propagation

  • Abbas, Syed Haider;Lee, Jung-Ryul;Kim, Zaeill
    • International Journal of Aeronautical and Space Sciences / v.17 no.4 / pp.467-475 / 2016
  • Ultrasonic propagation imaging (UPI) has shown great potential for detecting impairments in complex structures and can be used in a wide range of non-destructive evaluation and structural health monitoring applications. Software implementations of such algorithms tend to take longer as the scan area increases, because the processor shares its resources with a number of programs running at the same time. This issue was addressed by using a field programmable gate array (FPGA), a dedicated processing solution used for high-speed signal processing algorithms. For this purpose, we need an independent and flexible block of logic that can be used with continuously evolvable FPGA-based hardware. In this paper, we developed an FPGA-based ultrasonic propagation imaging system in which the FPGA serves both as the data acquisition system and as the real-time ultrasonic signal processor. The developed UPI system using an FPGA board provides better cost-effectiveness and resolution than digitizers, and much faster signal processing than a CPU; this was tested using basic ultrasonic propagation algorithms such as ultrasonic wave propagation imaging and multi-directional adjacent wave subtraction. Finally, a comparison of processing time between a CPU-based UPI system and the novel FPGA-based system is presented to justify the objective of this research.

Synthesis of Ocean Wave Models and Simulation Using GPU (바다물결 모형의 합성 및 GPU를 이용한 시뮬레이션)

  • Lee, Dong-Min;Lee, Sung-Kee
    • The KIPS Transactions:PartA / v.14A no.7 / pp.421-434 / 2007
  • Among the many natural scenes generated with CG, the representation of ocean surfaces is one of the most complicated and time-consuming problems because of their large extent and complex surface movement. We present a hybrid method to represent and animate unbounded deep-water ocean surfaces by using the graphics processor as both the simulation and the rendering core. Our technique is mainly based on spectral approaches that generate a highly detailed height field using the Fourier transform on a 2D regular grid. Additionally, we incorporate the Gerstner model and generate a low-detail height field on a 2D projected grid in order to represent large waves and the main structure of the ocean surface. There is no interruption between the CPU and GPU, and no need to transfer simulation results from system memory to graphics hardware, because the entire simulation and rendering processes are done on the graphics processor. As a result, we can synthesize and render realistic water surfaces in real time. The proposed techniques are readily applicable to real-time applications such as computer games that place a heavy workload on the CPU but still demand plausible natural scenes.
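
For readers unfamiliar with the Gerstner model mentioned in this abstract, the sketch below evaluates the standard textbook form of a sum of Gerstner waves for one grid point, using the deep-water dispersion relation ω = sqrt(g·|k|). It is a CPU-side illustration only, not the authors' GPU implementation; the function name `gerstner_point` and the wave parameters are hypothetical.

```python
import math

GRAVITY = 9.81  # m/s^2


def gerstner_point(x0, z0, t, waves):
    """Displace the rest position (x0, z0) by a sum of Gerstner waves.

    waves: iterable of (amplitude, kx, kz), where (kx, kz) is the wave vector.
    Returns (x, y, z) with y as the surface height.
    """
    x, y, z = x0, 0.0, z0
    for amplitude, kx, kz in waves:
        k = math.hypot(kx, kz)            # wave number |k|
        omega = math.sqrt(GRAVITY * k)    # deep-water dispersion relation
        phase = kx * x0 + kz * z0 - omega * t
        # Horizontal displacement pulls points toward the crests,
        # which is what sharpens the wave tops.
        x -= (kx / k) * amplitude * math.sin(phase)
        z -= (kz / k) * amplitude * math.sin(phase)
        y += amplitude * math.cos(phase)
    return x, y, z


if __name__ == "__main__":
    # Two hypothetical large waves evaluated at one grid point.
    waves = [(0.5, 0.30, 0.00), (0.2, 0.10, 0.25)]
    print(gerstner_point(4.0, 2.0, t=1.5, waves=waves))
```

Because each grid point depends only on its own rest position and the time, the evaluation maps directly onto one GPU thread or shader invocation per vertex.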

Implementation of Pedestrian Detection and Tracking with GPU at Night-time (GPU를 이용한 야간 보행자 검출과 추적 시스템 구현)

  • Choi, Beom-Joon;Yoon, Byung-Woo;Song, Jong-Kwan;Park, Jangsik
    • Journal of Broadcast Engineering / v.20 no.3 / pp.421-429 / 2015
  • This paper presents an approach for pedestrian detection and tracking with infrared imagery. We used CUDA (Compute Unified Device Architecture), a parallel processing platform, to improve the speed of video-based pedestrian detection and tracking. The detection phase is performed by the Adaboost algorithm based on Haar-like features. The Adaboost classifier is trained with datasets generated from infrared images. After detecting pedestrians with the Adaboost classifier, we apply a particle filter tracking strategy that adaptively exploits HSV histogram features. The proposed approach is implemented on an NVIDIA Jetson TK1 developer board, a full-featured device suited to software development in a Linux environment. We present the results of parallel processing with the NVIDIA GPU in the CUDA development environment for the detection and tracking of pedestrians, and compare the detection and tracking processing time for night-time images on the GPU and the CPU. The results showed that pedestrian detection and tracking on the GPU is approximately 6 times faster than on the CPU.

A Real-time Copper Foil Inspection System using Multi-thread (다중 스레드를 이용한 실시간 동판 검사 시스템)

  • Lee Chae-Kwang;Choi Dong-Hyuk
    • Journal of KIISE:Computing Practices and Letters / v.10 no.6 / pp.499-506 / 2004
  • A copper foil surface inspection system is necessary for factory automation and product quality. The developed system is composed of a high-speed line scan camera, an image capture board, and a processing computer. A multi-threaded architecture is introduced for system resource utilization and real-time processing. There are one image capture thread, two or more defect detection threads, and one defect communication thread. To process the high-speed input image data, I/O is overlapped through double buffering. A defect is first detected using a predetermined threshold. To cope with irregular lighting, a compensation process is applied. After defect detection, the defect type is classified using the defect width, the eigenvalue ratio of the defect covariance matrix, and the gray level of the defect. In the experiment, real-time processing of high-speed input image data was possible with the multi-threaded architecture, and 89.4% of the 141 defects were correctly classified.
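
The thread layout described in this abstract (one capture thread, two or more detection threads, one communication thread) can be sketched with Python's standard `threading` and `queue` modules. This is a minimal illustration under stated assumptions, not the authors' system: the bounded queue of size 2 only stands in for double buffering, and the threshold value, the "brighter than threshold" rule, and all names are hypothetical.

```python
import queue
import threading

THRESHOLD = 200   # hypothetical gray-level threshold
STOP = None       # sentinel that tells a thread to finish


def capture(frames, frame_q, n_workers):
    """Capture thread: push frames, then one STOP per detection thread."""
    for frame in frames:
        frame_q.put(frame)   # blocks while both buffer slots are full
    for _ in range(n_workers):
        frame_q.put(STOP)


def detect(frame_q, defect_q):
    """Detection thread: flag pixels brighter than the threshold."""
    while True:
        item = frame_q.get()
        if item is STOP:
            defect_q.put(STOP)
            break
        frame_id, pixels = item
        hits = [i for i, value in enumerate(pixels) if value > THRESHOLD]
        if hits:
            defect_q.put((frame_id, hits))


def communicate(defect_q, n_workers, report):
    """Communication thread: collect defects until every worker has stopped."""
    finished = 0
    while finished < n_workers:
        item = defect_q.get()
        if item is STOP:
            finished += 1
        else:
            report.append(item)


def run_pipeline(frames, n_workers=2):
    frame_q = queue.Queue(maxsize=2)   # two slots, analogous to a double buffer
    defect_q = queue.Queue()
    report = []
    threads = [threading.Thread(target=capture, args=(frames, frame_q, n_workers))]
    threads += [threading.Thread(target=detect, args=(frame_q, defect_q))
                for _ in range(n_workers)]
    threads.append(threading.Thread(target=communicate,
                                    args=(defect_q, n_workers, report)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(report)   # workers finish in any order, so sort by frame id
```

Calling `run_pipeline(frames, n_workers=3)` adds a third detection thread without touching the capture or communication side, which is the main appeal of this layout.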

A Study on Efficient User Management System of Combat System

  • Hee-Soo Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.191-198 / 2024
  • In this paper, we propose a user management system for efficient operation of the combat system within a naval ship. Recently, naval ships have seen performance enhancements through various sensors, features, and continuous system development. This progress has led to an increase in multi-function consoles that can manipulate various sensors and features within a naval ship, and consequently to an increase in the number of operators for these consoles. Therefore, a user management system that can control and manage multi-function consoles and operators in real time is necessary for efficient management within a naval ship. This paper suggests a user management system that can effectively manage the real-time situation of users accessing multi-function consoles. Additionally, a parallelization method using GPUs is proposed to reduce the CPU workload in operating various functions of the combat system. Compared to the method using CPUs, the proposed user management system decreased the response time by approximately 82% and the occupancy by approximately 20%.

QoS-Aware Power Management of Mobile Games with High-Load Threads (CPU 부하가 큰 쓰레드를 가진 모바일 게임에서 QoS를 고려한 전력관리 기법)

  • Kim, Minsung;Kim, Jihong
    • KIISE Transactions on Computing Practices / v.23 no.5 / pp.328-333 / 2017
  • Mobile game apps, which are popular in various mobile devices, tend to be power-hungry and rapidly drain the device's battery. Since a long battery lifetime is a key design requirement of mobile devices, reducing the power consumption of mobile game apps has become an important research topic. In this paper, we investigate the power consumption characteristics of popular mobile games with multiple threads, focusing on the inter-thread. From our power measurement study of popular mobile game apps, we observed that some of these apps have abnormally high-load threads that barely affect the user's gaming experience, despite the high energy consumption. In order to reduce the wasted power from these abnormal threads, we propose a novel technique that detects such abnormal threads during run time and reduces their power consumption without degrading user experience. Our experimental results on an Android smartphone show that the proposed technique can reduce the energy consumption of mobile game apps by up to 58% without any negative impact on the user's gaming experience.

Efficient Task Distribution for Pig Monitoring Applications Using OpenCL (OpenCL을 이용한 돈사 감시 응용의 효율적인 태스크 분배)

  • Kim, Jinseong;Choi, Younchang;Kim, Jaehak;Chung, Yeonwoo;Chung, Yongwha;Park, Daihee;Kim, Hakjae
    • KIPS Transactions on Computer and Communication Systems / v.6 no.10 / pp.407-414 / 2017
  • Pig monitoring applications consisting of many tasks can take advantage of inherent data parallelism and enable parallel processing using performance accelerators. In this paper, we propose a method for distributing the tasks of pig monitoring applications onto a heterogeneous computing platform consisting of a multicore CPU and a manycore GPU. That is, a parallel program written in OpenCL is developed, and then the most suitable processor is determined based on the measured execution time of each task. The proposed method is simple but very effective, and can be applied to parallelize other applications consisting of many tasks on a heterogeneous computing platform consisting of a CPU and a GPU. Experimental results on three different heterogeneous computing platforms show that the proposed task distribution method improves on the performance of the typical GPU-only method, in which every task is executed on the GPU, by factors of 1.5, 8.7, and 2.7, respectively.
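
The selection rule in this abstract (run each task on whichever processor measured fastest) can be sketched in a few lines. The task names and timings below are hypothetical, and the sketch assumes tasks run one after another so that total time is a simple sum; the abstract does not state how the authors account for transfers or overlap.

```python
def assign_tasks(measured_ms):
    """Map each task to the processor with the smallest measured time."""
    return {task: min(times, key=times.get) for task, times in measured_ms.items()}


def total_ms(measured_ms, assignment):
    """Total time if tasks run sequentially on their assigned processors."""
    return sum(measured_ms[task][proc] for task, proc in assignment.items())


if __name__ == "__main__":
    # Hypothetical per-task execution times in milliseconds.
    timings = {
        "background_subtraction": {"cpu": 4.0, "gpu": 1.5},
        "labeling": {"cpu": 2.0, "gpu": 6.0},
        "tracking": {"cpu": 3.0, "gpu": 3.5},
    }
    best = assign_tasks(timings)
    gpu_only = {task: "gpu" for task in timings}
    print(best)
    print(total_ms(timings, gpu_only) / total_ms(timings, best))  # speedup factor
```

With these made-up numbers only the first task stays on the GPU, which illustrates why a GPU-only mapping can lose to a mixed one when some tasks parallelize poorly.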

Applying TIPC Protocol for Increasing Network Performance in Hadoop-based Distributed Computing Environment (Hadoop 기반 분산 컴퓨팅 환경에서 네트워크 I/O의 성능개선을 위한 TIPC의 적용과 분석)

  • Yoo, Dae-Hyun;Chung, Sang-Hwa;Kim, Tae-Hun
    • Journal of KIISE:Computer Systems and Theory / v.36 no.5 / pp.351-359 / 2009
  • With the recent growth of data on the Internet, platform technologies that can process huge volumes of data effectively, such as the Google platform and Hadoop, have attracted attention. In this kind of platform, sending task outputs incurs network I/O overhead because of the MapReduce operation, a programming model that supports parallel computation in large cluster systems. In this paper, we suggest applying the TIPC (Transparent Inter-Process Communication) protocol to reduce network I/O overhead and increase network performance in distributed computing environments. TIPC has a lightweight protocol stack and uses relatively less CPU time than TCP because of its simple connection establishment and logical addressing. We analyze the main features of the Hadoop-based distributed computing system and build an experimental model that can be used to compare the performance of various protocols. In the experimental results, TIPC shows higher bandwidth and lower CPU overhead than the other protocols.