• Title/Summary/Keyword: Parallel computing model

Fundamental Function Design of Real-Time Unmanned Monitoring System Applying YOLOv5s on NVIDIA TX2™ AI Edge Computing Platform

  • LEE, SI HYUN
    • International journal of advanced smart convergence / v.11 no.2 / pp.22-29 / 2022
  • In this paper, to design a real-time unmanned monitoring system, the YOLOv5s (small) object detection model was applied on the NVIDIA TX2™ AI (Artificial Intelligence) edge computing platform to implement the fundamental function of an unmanned monitoring system that can detect objects in real time. YOLOv5s was selected for our real-time unmanned monitoring system based on a performance evaluation of object detection algorithms (for example, R-CNN, SSD, RetinaNet, and YOLOv5). In addition, the performance of the four YOLOv5 models (small, medium, large, and xlarge) was compared and evaluated. Based on these results, the YOLOv5s model, which suits the design purpose of this paper, was ported to the NVIDIA TX2™ AI edge computing system and confirmed to operate normally. The resulting real-time unmanned monitoring system can be applied to various application fields such as security or monitoring systems. Future research will apply NMS (Non-Maximum Suppression) modification, model reconstruction, and parallel processing programming techniques using CUDA (Compute Unified Device Architecture) to improve object detection speed and performance.
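
The abstract names NMS (Non-Maximum Suppression) as a target for future modification. As background, a minimal sketch of the standard greedy NMS step is given below in plain Python. The box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Each candidate is compared against every box already kept, so the cost grows with the number of detections; this pairwise comparison is the kind of step that speed-oriented NMS variants typically try to reduce or parallelize.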

A STUDY ON THE EFFICIENCY OF AERODYNAMIC DESIGN OPTIMIZATION IN DISTRIBUTED COMPUTING ENVIRONMENT (분산컴퓨팅 환경에서 공력 설계최적화의 효율성 연구)

  • Kim Y.J.;Jung H.J.;Kim T.S.;Son C.H.;Joh C.Y.
    • Journal of computational fluids engineering / v.11 no.2 s.33 / pp.19-24 / 2006
  • A study was carried out to evaluate the efficiency of design optimization for an aerodynamic design optimization problem in a distributed computing environment. The aerodynamic analyses, which account for most of the computational work during design optimization, were divided into several jobs and allocated to associated PC clients over a network. This is not a parallel process based on domain decomposition within a single analysis, but rather a set of simultaneous distributed analyses using network-distributed computers. GBOM (Gradient-Based Optimization Method), SAO (Sequential Approximate Optimization), and RSM (Response Surface Method) were implemented to perform design optimization of transonic airfoils and to evaluate their efficiencies. The one-dimensional minimization followed by direction search involved in the GBOM was found to be an obstacle to improving the efficiency of the design process in the present distributed computing system. The SAO was found to be fairly suitable for the distributed computing environment even though it has the handicap of local search. The RSM is apparently the most efficient algorithm in the present distributed computing environment, but the additional trial-and-error work needed to enhance the reliability of the approximation model degrades its efficiency from a practical point of view.
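
The job-level distribution described in this abstract (whole analyses farmed out, rather than one analysis split by domain decomposition) can be sketched as follows. This is only an illustration under stated assumptions: local threads stand in for the networked PC clients, and `run_analysis` and `evaluate_designs` are hypothetical names with a toy objective in place of a real flow solver.

```python
from concurrent.futures import ThreadPoolExecutor


def run_analysis(design):
    """Hypothetical stand-in for one aerodynamic analysis of a design point."""
    # Toy objective; a real job would run a flow solver on a remote client.
    return sum(x * x for x in design)


def evaluate_designs(designs, n_workers=4):
    """Run one independent analysis per design point, concurrently."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map preserves input order, so results[i] belongs to designs[i].
        return list(pool.map(run_analysis, designs))
```

A call such as `evaluate_designs([(1, 2), (3, 4)])` evaluates both design points at once. A method that needs many independent evaluations per iteration keeps all workers busy in this pattern, whereas a step that needs each result before choosing the next point cannot.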

Parallel Finite Element Analysis of the Drag of a Car under Road Condition

  • Choi H. G.;Kim B. J.;Kim S. W.;Yoo J. Y.
    • Korean Society for Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / 2003.10a / pp.84-85 / 2003
  • A parallelized FEM code based on the domain decomposition method has recently been developed for large-scale computational fluid dynamics. A 4-step splitting finite element algorithm is adopted for unsteady computation of the incompressible Navier-Stokes equations, and the Smagorinsky LES (Large Eddy Simulation) model is chosen for turbulent flow computation. The METIS and MPI libraries are used for domain partitioning and for data communication between processors, respectively. The Hyundai Tiburon is chosen as the computational model at $Re=7.5{\times}10^{5}$, which is based on the car height. It is confirmed that the drag under road condition is smaller than that under wind tunnel condition.

Efficient Process Network Implementation of Ray-Tracing Application on Heterogeneous Multi-Core Systems

  • Jung, Hyeonseok;Yang, Hoeseok
    • IEIE Transactions on Smart Processing and Computing / v.5 no.4 / pp.289-293 / 2016
  • As more mobile devices are equipped with multi-core CPUs and are required to execute many compute-intensive multimedia applications, it is important to optimize these systems with the underlying parallel hardware architecture in mind. In this paper, we implement and optimize a ray-tracing application tailored to a given mobile computing platform with multiple heterogeneous processing elements. A lightweight ray-tracing application is specified and implemented in the Kahn process network (KPN) model of computation, which is known to be suitable for describing real-time applications. We take an open-source C/C++ implementation of ray tracing and adapt it to a KPN description in the Distributed Application Layer framework. Then, several possible configurations are evaluated on the target mobile computing platform (Exynos 5422), in which eight heterogeneous ARM cores are integrated. We derive the optimal degree of parallelism and a suitable distribution of the replicated tasks tailored to the target architecture.

All Phase Discrete Sine Biorthogonal Transform and Its Application in JPEG-like Image Coding Using GPU

  • Shan, Rongyang;Zhou, Xiao;Wang, Chengyou;Jiang, Baochen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.9 / pp.4467-4486 / 2016
  • The discrete cosine transform (DCT) based JPEG standard significantly improves the coding efficiency of image compression, but it suffers from serious blocking artifacts at low bit rates and low efficiency on high-definition images. In light of all phase digital filtering theory, this paper proposes a novel transform based on the discrete sine transform (DST), called the all phase discrete sine biorthogonal transform (APDSBT). Applying APDSBT to the JPEG scheme reduces the blocking artifacts significantly. The reconstructed image of APDSBT-JPEG is better than that of DCT-JPEG in terms of both objective quality and subjective effect. To improve the efficiency of JPEG coding, the structure of JPEG is analyzed. We analyze key factors in the design and evaluation of JPEG compression on massively parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model. Experimental results show that the maximum speedup ratio of the parallel APDSBT-JPEG algorithm can exceed 100 times even on a low-end GPU. Some new parallel strategies are illustrated in this paper for improving the performance of the parallel algorithm. With the optimal strategy, the efficiency can be improved by more than 10%.

Resource Availability-based Multi Auction Model for Cloud Service Reservation and Resource Brokering System (자원 가용성 기반 다중 경매 모델을 이용한 서비스 예약형 클라우드 자원 거래 시스템)

  • Lee, Seok Woo;Kim, Tae Young;Lee, Jong Sik
    • Journal of the Korea Society for Simulation / v.23 no.1 / pp.1-10 / 2014
  • Cloud computing is a form of parallel and distributed computing. It provides services to users through virtual resources. However, users' service requests do not follow a regular time pattern. As a result, resources show different availability at any given time. This difference affects the quality of service (QoS) and resource selection for users. Therefore, we propose a resource availability-based multi-auction model for a cloud service reservation and resource brokering system. The proposed system selects the proper resource provider based on the users' requests. It adopts a multi-phase auction to transact resources. The system evaluates the availability factor of each resource during the auction phase, and finally reserves the service on the adaptive queue. The proposed model shows better performance than existing methods.

A STUDY ON THE EFFICIENCY OF AERODYNAMIC DESIGN OPTIMIZATION USING DISTRIBUTED COMPUTATION (분산컴퓨팅 환경에서 공력 설계최적화의 효율성 연구)

  • Kim Y.-J.;Jung H.-J.;Kim T.-S.;Joh C.-Y.
    • Korean Society for Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / 2005.10a / pp.163-167 / 2005
  • A study was performed to evaluate the efficiency of design optimization for an aerodynamic design optimization problem in a distributed computing environment. The aerodynamic analyses, which account for most of the computational work during design optimization, were divided into several jobs and allocated to associated PC clients over a network. This is not a parallel process based on domain decomposition, but rather a simultaneous distributed-analyses process using network-distributed computers. GBOM (Gradient-Based Optimization Method), SAO (Sequential Approximate Optimization), and RSM (Response Surface Method) were implemented to perform design optimization of a transonic airfoil and to evaluate their efficiencies. The one-dimensional minimization followed by direction search involved in the GBOM was found to be an obstacle to improving the efficiency of the design process in a distributed computing environment. The SAO was found to be quite suitable for the distributed computing environment even though it has the handicap of local search. The RSM is apparently the best fit for a distributed computing environment, but the additional trial-and-error work needed to enhance the reliability of the approximation model is tedious and time-consuming, so it often impairs the automatic capability of design optimization and also degrades efficiency from a practical point of view.

An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines (동적 분산병렬 하둡시스템 및 분산추론기에 응용한 서버가상화 빅데이터 플랫폼)

  • Song, Dong Ho;Shin, Ji Ae;In, Yean Jin;Lee, Wan Gon;Lee, Kang Se
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1129-1139 / 2015
  • The inference process generates additional triples from knowledge represented as RDF triples in semantic web technology. Tens of millions of triples as initial big data, together with the additionally inferred triples, become a knowledge base for applications such as a QA (question and answer) system. The inference engine requires more computing resources to process the triples generated during inference. Additional computing resources supplied by the underlying resource pool in cloud computing can shorten the execution time. This paper presents an algorithm that allocates the number of computing nodes "elastically" at runtime on Hadoop, depending on the size of the knowledge data fed in. The model proposed in this paper has a layered architecture: the top layer for applications, the middle layer for the distributed parallel inference engine that processes the triples, and the lower layer for elastic Hadoop and server virtualization. System algorithms and test data are analyzed and discussed in this paper. The model has the benefit that rich legacy Hadoop applications can run faster on this system without any modification.

Disjunctive Process Patterns Refinement and Probability Extraction from Workflow Logs

  • Kim, Kyoungsook;Ham, Seonghun;Ahn, Hyun;Kim, Kwanghoon Pio
    • Journal of Internet Computing and Services / v.20 no.3 / pp.85-92 / 2019
  • In this paper, we extract the quantitative relation data of activities from a workflow event log file recorded in the XES standard format and connect them to rediscover the workflow process model. We then extract the workflow process patterns and their proportions from the rediscovered model. Four types of control-flow elements should be used to extract workflow process patterns and proportions from log files: linear (sequential) routing, disjunctive (selective) routing, conjunctive (parallel) routing, and iterative routing patterns. Of these four elements, we focus on disjunctive routing and conjunctive routing. A framework implemented by the authors' research group extracts and arranges the activity data from the log and converts the repetition of duplicate relationships into a quantitative value. In addition, because a parallel process is recorded in the log file in order of execution time, algorithms for finding and eliminating the resulting information distortion are designed and implemented for accurate analysis. With these refined data, we rediscover the workflow process model by following the relationships between the activities. The experiments are conducted using the Large Bank Transaction Process Model provided by 4TU, and the experimental process and results are visualized.
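
As a rough illustration of turning repeated activity relationships into quantitative values, the sketch below counts directly-follows pairs in a set of traces and normalizes them into per-activity branch proportions. It is not the authors' framework: the function name `branch_probabilities` and the representation of a trace as a list of activity names are assumptions, XES parsing is omitted, and the sketch does nothing to correct the distortion that parallel routing introduces.

```python
from collections import Counter, defaultdict


def branch_probabilities(traces):
    """Estimate, for each activity, the share of each directly-following activity."""
    counts = defaultdict(Counter)
    for trace in traces:
        # Count every adjacent (current, following) pair in the trace.
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    probs = {}
    for activity, successors in counts.items():
        total = sum(successors.values())
        probs[activity] = {nxt: n / total for nxt, n in successors.items()}
    return probs
```

For example, if activity `A` is followed by `B` in three traces and by `C` in one, the function reports proportions of 0.75 and 0.25 for the two branches out of `A`.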

Setting Up of Parallel Cluster System and Reproduction of the Yellow Sea Tidal Hydrodynamics Using a FEM Model (병렬 클러스터 시스템 구축 및 유한요소모형을 이용한 황해 조석재현)

  • Suh, Seung-Won;Lee, Hwa-Young
    • Journal of Korean Society of Coastal and Ocean Engineers / v.19 no.1 / pp.1-15 / 2007
  • In this study, an 8-node parallel Linux cluster system is constructed and tested to evaluate its computational efficiency and its reliability in reproducing the Yellow Sea tidal hydrodynamics, prior to computing storm surge inundation along the west coast of the Korean Peninsula. Computational efficiency increases by up to 7 times according to the NPB benchmark test. The results simulated by pADCIRC in reproducing the Yellow Sea tidal hydrodynamics agree well with previous studies. According to model parameter tests, the bottom friction coefficient, which must appropriately represent the shallow depths along the west coast, is an essential factor in the simulation.
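
To put the reported figure in context: if the up-to-7-times gain is read as the speedup $S$ obtained with all $N = 8$ nodes (an assumption; the abstract does not state how many nodes were used for that measurement), the standard parallel efficiency $E = S/N$ would be:

```latex
E = \frac{S}{N} = \frac{7}{8} = 0.875
```

That is, about 88% of ideal linear scaling under this reading.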