• Title/Summary/Keyword: Parallelization method

MPEG-I RVS Software Speed-up for Real-time Application (실시간 렌더링을 위한 MPEG-I RVS 가속화 기법)

  • Ahn, Heejune;Lee, Myeong-jin
    • Journal of Broadcast Engineering / v.25 no.5 / pp.655-664 / 2020
  • Free-viewpoint image synthesis is one of the important technologies in the MPEG-I (Immersive) standard. RVS (Reference View Synthesizer), developed by MPEG-I and used within the MPEG group, is a DIBR (Depth Image-Based Rendering) program that generates an image at a virtual (intermediate) viewpoint from inputs captured at multiple viewpoints. RVS uses a mesh-surface method based on computer graphics and outperforms the previous pixel-based method by 2.5 dB or more. Although its OpenGL version is 10 times faster than the non-OpenGL one, it still falls short of real-time speed, reaching only 0.75 fps on two 2K-resolution input images. In this paper, we analyze the internals of the RVS implementation and modify its structure, achieving a 34-fold speed-up and thus real-time performance (22-26 fps) through three key improvements: 1) reuse of OpenGL buffers and texture objects, 2) parallelization of file I/O and OpenGL execution, and 3) parallelization of GPU shader program execution and buffer transfer.
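
The second improvement listed in this abstract, overlapping file I/O with rendering, can be illustrated with a small producer-consumer sketch. This is not the RVS code: `load_frame` and `render` are hypothetical stand-ins for disk reads and the OpenGL synthesis step, and the queue depth of 2 is an arbitrary assumption.

```python
import queue
import threading


def load_frame(index):
    # Hypothetical stand-in for reading one input frame from disk.
    return [index] * 4


def render(frame):
    # Hypothetical stand-in for the GPU view-synthesis step.
    return sum(frame)


def run_pipeline(num_frames, depth=2):
    """Read frames on a background thread while the main thread renders."""
    # Bounded queue: the reader stays at most `depth` frames ahead of the renderer.
    frames = queue.Queue(maxsize=depth)

    def reader():
        for i in range(num_frames):
            frames.put(load_frame(i))
        frames.put(None)  # sentinel: no more frames

    threading.Thread(target=reader, daemon=True).start()

    results = []
    while True:
        frame = frames.get()
        if frame is None:
            break
        results.append(render(frame))
    return results
```

With this structure the reader can fetch frame n+1 while frame n is being rendered, so the per-frame time approaches the slower of the two stages rather than their sum. As a consistency check on the reported figures, and assuming the 34-fold gain is measured against the 0.75 fps baseline, 0.75 × 34 = 25.5 fps, which lies inside the stated 22-26 fps range.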

Parallelization of Probabilistic RoadMap for Generating UAV Path on a DTED Map (DTED 맵에서 무인기 경로 생성을 위한 Probabilistic RoadMap 병렬화)

  • Noh, Geemoon;Park, Jihoon;Min, Chanoh;Lee, Daewoo
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.50 no.3 / pp.157-164 / 2022
  • In this paper, we describe how to implement mountainous terrain, radar, and an air defense network for UAV path planning in a 3-D environment, and we perform path planning and re-planning using PRM, a sampling-based path planning algorithm. In the original PRM algorithm, the check for an obstacle between nodes is performed one node pair at a time and in succession, so the amount of computation depends heavily on the number of nodes and the link distance between them. To address this, the proposed LineGridMask method simplifies the obstacle check and reduces the path planning time through parallelization. A comparison with existing PRM algorithms confirmed that computation time was reduced by up to 88% in path planning and up to 94% in re-planning.
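
The obstacle check described in this abstract is independent for each candidate edge, which is what makes it a natural target for parallelization. The sketch below is not the authors' LineGridMask method. It is a generic line-of-sight test on a height grid, with hypothetical names (`segment_is_clear`, `clear_edges`), an assumed sample count of 64 per edge, and a thread pool standing in for whatever parallel hardware is used.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def segment_is_clear(height_map, start, end, samples=64):
    """Return True if the straight segment stays above the terrain at every sample.

    start and end are (row, col, altitude) triples in grid units (an assumption).
    """
    t = np.linspace(0.0, 1.0, samples)
    rows = np.rint(start[0] + t * (end[0] - start[0])).astype(int)
    cols = np.rint(start[1] + t * (end[1] - start[1])).astype(int)
    alts = start[2] + t * (end[2] - start[2])
    # All samples along the edge are compared with the terrain in one vectorized step.
    return bool(np.all(alts > height_map[rows, cols]))


def clear_edges(height_map, nodes, pairs, workers=4):
    """Check every candidate edge independently and keep the unobstructed ones."""
    def check(pair):
        a, b = pair
        return segment_is_clear(height_map, nodes[a], nodes[b])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(check, pairs))
    return [pair for pair, ok in zip(pairs, flags) if ok]
```

For example, with a flat 10×10 grid that has a ridge of height 100 along row 5, an edge flown at altitude 50 across the ridge is rejected, while an edge that runs along row 0 at the same altitude is kept.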

The parallelization of binarization using a GP-GPU

  • Han, Seong Hyeon;Yoo, Suk Won
    • International Journal of Advanced Culture Technology / v.4 no.4 / pp.57-63 / 2016
  • In this paper, we propose an optimized binarization on a GP-GPU. Because binarization is easily parallelized, we propose two binarization methods that utilize the GP-GPU. The first method divides the work into data load, subtraction and conversion, and data store. The second method processes these steps collectively. The second method was 2.52 times faster than the first. After synthesizing the GP-GPU onto an FPGA, binarization on the GP-GPU was compared with binarization on the ODROID XU. Binarization on the GP-GPU was 1.89 times faster than on the ODROID XU.
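
The two variants in this abstract can be contrasted with a small NumPy sketch. It runs on the CPU and is only an analogy for the GP-GPU kernels: the function names are hypothetical, and mapping pixels at or above the threshold to 255 is an assumption, since the abstract does not state the convention.

```python
import numpy as np


def binarize_staged(image, threshold):
    """Staged variant: subtraction, conversion, and store as separate passes."""
    diff = image.astype(np.int16) - threshold  # subtraction pass (widened to avoid wrap-around)
    mask = diff >= 0                           # conversion pass
    out = np.empty_like(image)                 # store pass
    out[mask] = 255
    out[~mask] = 0
    return out


def binarize_fused(image, threshold):
    """Fused variant: compare and write the result in a single expression."""
    return np.where(image >= threshold, 255, 0).astype(np.uint8)
```

Both functions produce the same output. The fused form avoids the intermediate `diff` array, which is the same kind of saving the abstract reports for processing the steps collectively.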

Parallelized Unstructured-Grid Finite Volume Method for Modeling Radiative Heat Transfer

  • Kim Gunhong;Kim Seokgwon;Kim Yongmo
    • Journal of Mechanical Science and Technology / v.19 no.4 / pp.1006-1017 / 2005
  • In this work, we developed an accurate and efficient unstructured-grid finite volume method for radiative heat transfer, applicable to complex 2D planar and 3D geometries. The present numerical model has been fully validated against several benchmark cases, including radiative heat transfer in a quadrilateral enclosure with an isothermal medium, a tetrahedral enclosure, and a three-dimensional idealized furnace, as well as convection-coupled radiative heat transfer in a square enclosure. The numerical results for all cases agree well with previous results. Special emphasis is given to the parallelization of the unstructured-grid radiative FVM using the domain decomposition approach. Numerical results indicate that the present parallel unstructured-grid FVM performs well in terms of accuracy, geometric flexibility, and computational efficiency.

A Study on the Efficient m-step Parallel Generalization

  • Kim, Sun-Kyung
    • Proceedings of the Korea Society of Information Technology Applications Conference / 2005.11a / pp.13-16 / 2005
  • It is desirable to have methods for specific problems whose communication costs are low relative to their computation costs, and in specific applications algorithms need to be developed and mapped onto parallel computer architectures. Main memory access in shared-memory systems and global communication in message-passing systems degrade computation speed. In this paper, we find that the m-step generalization of the block Lanczos method enhances parallel properties by forming m simultaneous search direction vector blocks. QR factorization, which slows execution on parallel computers, is not necessary in the m-step block Lanczos method. The m-step method minimizes synchronization points, which in turn minimizes global communication compared to the standard methods.

Fast Random-Forest-Based Human Pose Estimation Using a Multi-scale and Cascade Approach

  • Chang, Ju Yong;Nam, Seung Woo
    • ETRI Journal / v.35 no.6 / pp.949-959 / 2013
  • Since the recent launch of Microsoft Xbox Kinect, research on 3D human pose estimation has attracted a lot of attention in the computer vision community. Kinect shows impressive estimation accuracy and real-time performance on massive graphics processing unit hardware. In this paper, we focus on further reducing the computational complexity of the existing state-of-the-art method so that real-time 3D human pose estimation becomes applicable to devices with lower computing power. To this end, we propose two simple approaches to speed up the random-forest-based human pose estimation method. In the original algorithm, the random forest classifier is applied to all pixels of the segmented human depth image. We first use a multi-scale approach to reduce the number of such calculations. Second, the complexity of the random forest classification itself is decreased by the proposed cascade approach. Experimental results on real data show that our method is effective and works in real time (30 fps) without any parallelization efforts.

Scheduling and Load Balancing Methods of Multithread Parallel Linear Solver of Finite Element Structural Analysis (유한요소 구조해석 다중쓰레드 병렬 선형해법의 스케쥴링 및 부하 조절 기법 연구)

  • Kim, Min Ki;Kim, Seung Jo
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.42 no.5 / pp.361-367 / 2014
  • In this paper, we introduce task scheduling and load balancing methods for multifrontal solvers in finite element structural analysis on modern multicore machines. Many structural analysis problems have irregular grids and many kinds of properties and materials. These irregularities and heterogeneities create parallelization bottlenecks and cause idle time during analysis. Therefore, task scheduling and load balancing are needed to reduce inefficiency. Several multithreaded parallelization methods are presented, and static and dynamic task scheduling are compared. To reduce the idle time caused by irregularly partitioned subdomains, two computational load balancing methods are devised: balancing all tasks and minmax task pairing balancing. Theoretical and actual elapsed times are shown, and the reasons for the gap between them are discussed.
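
The difference between static and dynamic task scheduling mentioned in this abstract can be seen in a toy model that only tracks per-worker load. This is a generic illustration, not the paper's scheduler or its balancing methods: the function names are hypothetical, task costs are assumed to be known in advance, and scheduling overhead is ignored.

```python
import heapq


def static_makespan(costs, workers):
    """Assign tasks to workers in fixed contiguous blocks, decided before execution."""
    loads = [0.0] * workers
    block = -(-len(costs) // workers)  # ceiling division
    for i, cost in enumerate(costs):
        loads[i // block] += cost
    return max(loads)


def dynamic_makespan(costs, workers):
    """Hand each task, in order, to whichever worker becomes idle first."""
    loads = [0.0] * workers
    heapq.heapify(loads)
    for cost in costs:
        earliest = heapq.heappop(loads)         # the worker that finishes soonest
        heapq.heappush(loads, earliest + cost)  # it takes the next task
    return max(loads)
```

For task costs `[8, 7, 1, 1, 1, 1]` on two workers, the static split gives one worker 16 units and the other 3, so the run takes 16 units with long idle time. The dynamic policy finishes in 10 units, close to the ideal of 9.5.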

Statistical Characteristics and Complexity Analysis of HEVC Encoder Software (HEVC 부호화기 소프트웨어의 통계적 특성 및 복잡도 분석)

  • Ahn, Yongjo;Hwang, Taejin;Yoo, Sungeun;Han, Woo-Jin;Sim, Donggyu
    • Journal of Broadcast Engineering / v.17 no.6 / pp.1091-1105 / 2012
  • In this paper, we analyze the statistical characteristics and complexity of the HEVC encoder as a preliminary study for acceleration, optimization, and parallelization. HEVC provides approximately twice the compression performance of H.264/AVC, but the increase in encoder complexity remains a problem to be solved. Before conducting research on acceleration, optimization, and parallelization to reduce the high complexity of the HEVC encoder, we measure the complexity of each encoder module using its reference software, HM 7.1. We also measure the predicted complexity of fast HEVC encoder software, as used in real applications, by applying fast encoding methods to HM 7.1. Complexity is measured in terms of the operating cycles of the encoder software under the common test sequences and conditions in a Windows PC environment. In addition, we analyze the statistical characteristics of HEVC encoder software according to encoding structures and limitations using coded bitstreams.

Development and Evaluation of Parallel Beam Optic for X-ray (엑스선용 평행빔 광학소자 개발 및 평가)

  • Park, Byunghun;Cho, Hyungwook;Chon, Kwonsu
    • Journal of the Korean Society of Radiology / v.6 no.6 / pp.477-481 / 2012
  • An X-ray diffractometer equipped with various X-ray optics can provide qualitative and quantitative information about a sample through nondestructive analysis. A parallel beam optic passes the parallel beam and removes the divergent beam generated by an X-ray tube. The parallel beam optic used in the X-ray diffractometer was fabricated by wire cutting and grading of stainless steel plates, and its performance was evaluated using an X-ray imaging system. The measured parallelization of 6.6 mrad for the fabricated parallel beam optic was very close to the expected value of 6 mrad. The X-ray imaging technique for evaluating parallel beam optics can estimate the parallelization of each plate and can be applied to other X-ray optics.

Parallelization of Genome Sequence Data Pre-Processing on Big Data and HPC Framework (빅데이터 및 고성능컴퓨팅 프레임워크를 활용한 유전체 데이터 전처리 과정의 병렬화)

  • Byun, Eun-Kyu;Kwak, Jae-Hyuck;Mun, Jihyeob
    • KIPS Transactions on Computer and Communication Systems / v.8 no.10 / pp.231-238 / 2019
  • Analyzing next-generation genome sequencing data in the conventional way on a single server may take several tens of hours, depending on the data size. However, to cope with emergency situations in which results are needed within a few hours, the performance of a single genome analysis must be improved. In this paper, we propose a parallelized method for pre-processing genome sequence data that reduces analysis time by using big data technology and a high-performance computing cluster connected by a high-speed network and sharing a parallel file system. To preserve the reliability of the analytical data, we chose a strategy of parallelizing the existing analytical tools and algorithms for the new environment. Parallelized processing, data distribution, and parallel merging techniques were developed, and performance improvements were confirmed through experiments.
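
The distribute, process, and merge pattern named at the end of this abstract can be sketched on a single machine. This is not the authors' pipeline: sorting integer records stands in for a real pre-processing tool, the function names are hypothetical, and a thread pool is used here only to keep the example self-contained, where a cluster framework would be used in practice.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Distribute records into at most `parts` nearly equal contiguous chunks."""
    size = max(1, -(-len(records) // parts))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def sort_distributed(records, parts=4):
    """Sort each chunk independently, then merge the sorted chunks."""
    chunks = split(records, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))  # independent per-chunk work
    return list(heapq.merge(*sorted_chunks))            # k-way merge of sorted runs
```

Because each chunk is processed by the unmodified tool (here, the built-in `sorted`), the result matches what a single-server run would produce, which mirrors the abstract's stated reason for parallelizing existing tools rather than replacing them.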