• Title/Summary/Keyword: Software parallelization

Search Result 35

Absorbing Boundary Conditions and Parallelization for Waveguide Electromagnetic Analysis Using Finite Element Method (유한요소법을 이용한 도파관 전자기 해석의 흡수경계조건 고찰 및 병렬화)

  • Park, Woobin;Kim, Moonseong;Lee, Woochan
    • Journal of Internet Computing and Services / v.23 no.3 / pp.67-76 / 2022
  • Power and signal transmission using electromagnetic waves is essential in modern technology, and a guiding structure is needed to transmit electromagnetic waves efficiently along a desired path. This paper performs electromagnetic simulations of 2-D/3-D waveguides using an in-house finite element method (FEM) code. The accuracy of the analysis was verified by comparison with results from HFSS, a representative electromagnetic simulation software package. In addition, the performance of the absorbing boundary condition (ABC), which is essential for truncating the infinite computational domain in computational electromagnetics, was analyzed. Finally, a parallelization technique was applied to accelerate the simulation, and the resulting performance improvement is demonstrated.
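
To make the role of the absorbing boundary condition concrete, here is a minimal sketch, not the authors' in-house code: a 1-D scalar Helmholtz model u'' + k²u = 0 discretized with linear finite elements, driven by u(0) = 1 and truncated at x = L with a first-order ABC u'(L) = -jku(L). The exp(+jωt) time convention, the 1-D reduction, and all function names are assumptions made for illustration. If the ABC works, the computed field should match the outgoing wave exp(-jkx) with no reflected component.

```python
import cmath
import math


def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm (no pivoting) for a tridiagonal system; works with complex values.

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused),
    d: right-hand side. All lists have the same length.
    """
    n = len(b)
    cp = [0j] * n
    dp = [0j] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0j] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def fem_helmholtz_1d(k, length, n_elems):
    """Linear-element FEM for u'' + k^2 u = 0 on [0, length] (assumed 1-D model).

    u(0) = 1 is a Dirichlet source; the first-order ABC u'(L) = -j k u(L) adds
    +j k to the last diagonal entry of K - k^2 M. Returns nodal values u_0..u_N.
    """
    h = length / n_elems
    diag = 2.0 / h - 2.0 * k * k * h / 3.0  # interior diagonal of K - k^2 M
    off = -1.0 / h - k * k * h / 6.0        # off-diagonal of K - k^2 M
    n = n_elems                             # unknowns are nodes 1..N
    a = [complex(off)] * n
    b = [complex(diag)] * n
    c = [complex(off)] * n
    # Last node: contribution of a single element plus the ABC boundary term.
    b[-1] = 1.0 / h - k * k * h / 3.0 + 1j * k
    d = [0j] * n
    d[0] = -off * 1.0                       # move the known u_0 = 1 to the right-hand side
    return [1 + 0j] + solve_tridiagonal(a, b, c, d)


# Usage: wavelength 1 (k = 2*pi), a domain two wavelengths long, 400 elements.
k = 2 * math.pi
u = fem_helmholtz_1d(k, 2.0, 400)
xs = [i * 2.0 / 400 for i in range(401)]
max_err = max(abs(ui - cmath.exp(-1j * k * x)) for ui, x in zip(u, xs))
print(f"max |u - exp(-jkx)| = {max_err:.2e}")
```

In 1-D at normal incidence this first-order ABC is exact for the continuous problem, so the remaining error comes only from discretization; in 2-D/3-D waveguides, as analyzed in the paper, such conditions are approximate, which is why their performance needs to be evaluated.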

A Data Dependency Elimination Algorithm for Extracting Maximum Parallelism (최대 병렬성 추출을 위한 자료 종속성 제거 알고리즘)

  • 송월봉;박두순
    • Journal of KIISE:Software and Applications / v.26 no.1 / pp.139-139 / 1999
  • In most application programs, loops account for most of the computation and are the most important source of parallelism. Several compile-time parallelization methods have been introduced for the case where data dependencies are uniform in distance. When the dependence distances are non-uniform, however, extracting parallelism at compile time is much more complicated. This paper presents a general method for extracting parallelism from nested loops that applies whether the dependence distances are uniform or non-uniform. Because the statements in nested loops are executed repeatedly, an algorithm that effectively removes these data dependencies is developed to achieve full parallelization of nested loops.
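
A small sketch of the uniform-distance case mentioned above, the situation earlier compile-time methods already handle; the paper's algorithm for non-uniform distances is not reproduced here. For the hypothetical loop `a[i] = a[i - d] + b[i]`, every dependence has distance d, so iterations split into d chains by `i mod d` and the chains can run concurrently. Function names are illustrative; Python threads show the decomposition rather than a real speed-up because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(a, b, d):
    """Reference loop: for i in [d, n): a[i] = a[i - d] + b[i] (uniform distance d)."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a


def independent_chains(n, d):
    """Group iterations d..n-1 by i mod d.

    Iteration i reads only a[i - d], which lies in the same residue class,
    so no dependence crosses chains.
    """
    return [list(range(start, n, d)) for start in range(d, min(2 * d, n))]


def run_parallel(a, b, d, workers=4):
    a = list(a)

    def run_chain(chain):
        for i in chain:  # order within a chain is preserved
            a[i] = a[i - d] + b[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_chain, independent_chains(len(a), d)))
    return a


# Usage: both versions must produce identical results.
a0 = list(range(10))
b0 = [(3 * i) % 7 for i in range(10)]
print(run_parallel(a0, b0, 3) == run_sequential(a0, b0, 3))
```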

High-Performance Computer-Generated Hologram by Optimized Implementation of Parallel GPGPUs

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Yoo, Ji-Sang;Kim, Dong-Wook
    • Journal of the Optical Society of Korea / v.18 no.6 / pp.698-705 / 2014
  • We propose a new method for calculating a computer-generated hologram (CGH) using multiple general-purpose graphics processing units (GPGPUs). To optimize the implementation, we considered CGH parallelization, object point tiling, memory selection for object points, hologram tiling, the CGMA (compute to global memory access) ratio as a function of block size, and memory mapping. The proposed CGH generator was integrated into a digital holographic video system consisting of a camera system for capturing images (object points) and CPU/GPGPU software (S/W) for various image processing tasks. The proposed system can generate about 37 full-HD holograms per second using about 6K object points.
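
CGH computation parallelizes well because every hologram pixel is an independent sum over object points. The sketch below assumes the common Fresnel point-source model I(x, y) = Σⱼ Aⱼ cos(π / (λ zⱼ) · ((x − xⱼ)² + (y − yⱼ)²)); the abstract does not state the exact formula or the GPGPU kernel design, so this is only an illustration. Rows are computed concurrently as a loose CPU stand-in for hologram tiling; the pixel pitch and wavelength in the usage line are hypothetical values.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def hologram_row(iy, width, pitch, wavelength, points):
    """One row of a point-source CGH under the Fresnel approximation.

    points: list of (x_j, y_j, z_j, amplitude_j), lengths in metres.
    """
    y = iy * pitch
    row = []
    for ix in range(width):
        x = ix * pitch
        total = 0.0
        for xj, yj, zj, aj in points:
            r2 = (x - xj) ** 2 + (y - yj) ** 2
            total += aj * math.cos(math.pi / (wavelength * zj) * r2)
        row.append(total)
    return row


def cgh(width, height, pitch, wavelength, points, workers=4):
    """Rows do not depend on each other, so they are computed concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda iy: hologram_row(iy, width, pitch, wavelength, points),
            range(height)))


# Usage: one object point above pixel (2, 1); the pixel directly beneath it
# has zero phase, so its value equals the point's amplitude.
pitch = 8e-6
pts = [(2 * pitch, 1 * pitch, 0.5, 3.0)]
holo = cgh(4, 3, pitch, 532e-9, pts)
print(holo[1][2])
```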

DEVELOPMENT OF SUPERCOMPUTING APPLICATION TECHNOLOGY AND ITS ACHIEVEMENTS (슈퍼컴퓨팅 응용기술 개발 및 성과)

  • Kim, J.H.
    • Proceedings of the Korean Society of Computational Fluids Engineering Conference (한국전산유체공학회:학술대회논문집) / 2006.10a / pp.207-207 / 2006
  • Hardware technologies for high-performance computing have been developing continuously. However, the actual performance of software cannot keep pace with hardware development, because hardware architectures are becoming increasingly complicated and hardware systems increasingly large. Software techniques that use high-performance computing systems more efficiently therefore play an increasingly important role in realizing high-performance computing for computational science. This paper presents efforts, such as performance optimization and parallelization, to enhance software performance on large and complex high-performance computing systems. It also introduces our work on providing high-performance computational kernels, such as high-performance sparse solvers, and the achievements of this work.

PARALLEL OPTIMAL CONTROL WITH MULTIPLE SHOOTING, CONSTRAINTS AGGREGATION AND ADJOINT METHODS

  • Jeon, Moon-Gu
    • Journal of applied mathematics & informatics / v.19 no.1_2 / pp.215-229 / 2005
  • In this paper, constraint aggregation is combined with the adjoint and multiple shooting strategies for optimal control of differential-algebraic equation (DAE) systems. The approach retains the inherent parallelism of the conventional multiple shooting method while being much more efficient for large-scale problems. Constraint aggregation is employed to reduce the number of nonlinear continuity constraints in each multiple shooting interval, and the derivatives of the aggregated constraints are computed by the adjoint DAE solver DASPKADJOINT together with ADIFOR and TAMC, automatic differentiation tools for forward and reverse mode, respectively. Numerical experiments demonstrate the effectiveness of the approach.
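
The two ingredients above can be sketched on a toy ODE. In multiple shooting, each interval is integrated from its own guessed start value, so intervals are independent and can run in parallel; the continuity defects are then collapsed into one aggregated constraint. The abstract does not name the aggregation function, so the Kreisselmeier-Steinhauser (KS) function used here is an assumption (it is one widely used choice), as are the test dynamics x' = -x and the RK4 integrator. The adjoint sensitivity computation with DASPKADJOINT, ADIFOR, and TAMC is not reproduced.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rk4(f, x, t0, t1, steps=50):
    """Integrate x' = f(t, x) from t0 to t1 with the classical RK4 scheme."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x


def continuity_defects(f, grid, s, workers=4):
    """Integrate every shooting interval from its start value s[i] in parallel,
    then return the defects x_i(t_{i+1}) - s[i+1]."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ends = list(pool.map(lambda i: rk4(f, s[i], grid[i], grid[i + 1]),
                             range(len(grid) - 1)))
    return [ends[i] - s[i + 1] for i in range(len(ends))]


def ks_aggregate(g, rho=100.0):
    """KS function: smooth bound with max(g) <= KS(g) <= max(g) + ln(len(g)) / rho."""
    g_max = max(g)
    return g_max + math.log(sum(math.exp(rho * (gi - g_max)) for gi in g)) / rho


# Usage: equality defects c = 0 are written as c <= 0 and -c <= 0, then aggregated.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
decay = lambda t, x: -x
good = continuity_defects(decay, grid, [math.exp(-t) for t in grid])  # exact start values
bad = continuity_defects(decay, grid, [1.0] * len(grid))               # poor initial guess
g_bad = bad + [-c for c in bad]
print(max(abs(c) for c in good), ks_aggregate(g_bad))
```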

Movie Recommendation System using Community Detection and Parallel Programming (커뮤니티 탐지 및 병렬 프로그래밍을 이용한 영화 추천 시스템)

  • Sadriddinov Ilkhomjon;Yixuan Yang;Sony Peng;Sophort Siet;Dae-Young Kim;Doo-Soon Park
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.389-391 / 2023
  • In the era of Big Data, humanity faces a huge overflow of information. Many new technologies are being introduced to overcome this obstacle, and the movie recommendation system is one of them. Much theoretical and practical research on such systems has been conducted to date. Our research also focuses on movie recommendation, combining methods from Social Network Analysis (SNA) and parallel programming. We applied the Girvan-Newman algorithm to detect communities of users and used the future package to perform the parallelization. This approach aims both to improve the accuracy of the system and to shorten its execution time. For our experiments, we used the MovieLens dataset.
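
A minimal sketch of one Girvan-Newman split: compute edge betweenness, remove the edge with the highest value, and repeat until the graph breaks into more components. Edge betweenness is computed with Brandes' accumulation, where each BFS source contributes independently, which is the natural place to parallelize. The abstract names a future package for parallelization; this sketch instead uses Python's standard `concurrent.futures`, a different tool, and the toy graph is hypothetical.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def source_contributions(graph, s):
    """Brandes' accumulation for one BFS source: edge -> betweenness share."""
    dist = {s: 0}
    sigma = {v: 0 for v in graph}
    sigma[s] = 1
    preds = {v: [] for v in graph}
    order = []
    frontier = deque([s])
    while frontier:
        v = frontier.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                frontier.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in graph}
    contrib = {}
    for w in reversed(order):
        for v in preds[w]:
            share = sigma[v] / sigma[w] * (1.0 + delta[w])
            edge = frozenset((v, w))
            contrib[edge] = contrib.get(edge, 0.0) + share
            delta[v] += share
    return contrib


def edge_betweenness(graph, workers=4):
    """Sum per-source contributions (unnormalized; each pair is counted from both ends)."""
    total = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for contrib in pool.map(lambda s: source_contributions(graph, s), list(graph)):
            for edge, value in contrib.items():
                total[edge] = total.get(edge, 0.0) + value
    return total


def components(graph):
    seen, comps = set(), []
    for s in graph:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(graph, workers=4):
    """Remove highest-betweenness edges until the graph splits; return the communities."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    start = len(components(g))
    while len(components(g)) == start and any(g.values()):
        eb = edge_betweenness(g, workers)
        u, v = tuple(max(eb, key=eb.get))
        g[u].discard(v)
        g[v].discard(u)
    return components(g)


# Usage: two triangles joined by the bridge 3-4.
users = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(sorted(sorted(c) for c in girvan_newman_split(users)))
```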

Parallelization of Recursive Functions for Recursive Data Structures (재귀적 자료구조에 대한 재귀 함수의 병렬화)

  • An, Jun-Seon;Han, Tae-Suk
    • Journal of KIISE:Software and Applications / v.26 no.12 / pp.1542-1552 / 1999
  • Data parallelism is obtained by applying the same operation to each element of a data collection. In functional languages, iterative computations on data collections are expressed as recursive functions over recursive data structures. We propose a parallelization method that transforms such recursive functions into data-parallel programs. Polytypic data-parallel primitives, defined over general recursive data types, represent the parallel execution structure of the generated programs, enabling data-parallel execution on general recursive data collections such as trees and lists. To parallelize a recursive function, we classify the parallelism of each subexpression based on the dependencies introduced by the recursive calls, and generate a data-parallel program that applies the appropriate data-parallel primitive to each subexpression.
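
As a rough illustration of the classification idea, consider a recursive `sum_squares` over a binary tree: the per-node square does not depend on the recursive calls, so it becomes a data-parallel map, while the `+` that combines the recursive results becomes a reduction with an associative operator. This sketch uses a single monomorphic binary tree rather than the paper's polytypic primitives, and all names are hypothetical; the map runs on a thread pool, and the reduction is done sequentially for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Iterator, List, Optional


@dataclass
class Tree:
    value: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def elements(t: Optional[Tree]) -> List[int]:
    """Preorder list of the values stored in the tree."""
    if t is None:
        return []
    return [t.value] + elements(t.left) + elements(t.right)


def rebuild(t: Optional[Tree], values: Iterator[int]) -> Optional[Tree]:
    """Copy the shape of t, filling nodes from values in the same preorder."""
    if t is None:
        return None
    v = next(values)
    left = rebuild(t.left, values)
    right = rebuild(t.right, values)
    return Tree(v, left, right)


def tree_map(f: Callable[[int], int], t: Optional[Tree], pool: ThreadPoolExecutor) -> Optional[Tree]:
    """Data-parallel map: f is applied to every element independently."""
    return rebuild(t, iter(pool.map(f, elements(t))))


def tree_reduce(op: Callable[[int, int], int], unit: int, t: Optional[Tree]) -> int:
    """Reduction with an associative op (sequential here; associativity allows a parallel tree)."""
    return reduce(op, elements(t), unit)


def sum_squares_rec(t: Optional[Tree]) -> int:
    """Original sequential recursive function."""
    if t is None:
        return 0
    return t.value * t.value + sum_squares_rec(t.left) + sum_squares_rec(t.right)


def sum_squares_dp(t: Optional[Tree], pool: ThreadPoolExecutor) -> int:
    """Transformed version: independent part -> map, combining part -> reduce."""
    return tree_reduce(lambda p, q: p + q, 0, tree_map(lambda x: x * x, t, pool))


# Usage
tree = Tree(1, Tree(2, Tree(4)), Tree(3, None, Tree(5)))
with ThreadPoolExecutor(max_workers=2) as pool:
    print(sum_squares_dp(tree, pool), sum_squares_rec(tree))
```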

Parallelization of Allocation Module for Scalability and Performance Improvement on Mesos Scheduler (Allocation Module 병렬화를 통한 Mesos 스케줄러의 확장성 및 성능 향상 기법)

  • Han, Ho-Dol;Oh, Sangyoon
    • Proceedings of the Korea Information Processing Society Conference / 2015.04a / pp.139-142 / 2015
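  • In data centers, despite their growing physical scale, multiple distributed processing frameworks cannot run within the same cluster without additional handling, so it is common to partition the whole environment statically and deploy each framework separately. Recent research, however, aims to increase cluster utilization by running multiple frameworks within a single cluster. Mesos is one such system for running multiple distributed processing frameworks in one cluster, and it has a single Allocation Module that supports the scheduling performed by each framework's scheduler. The Allocation Module handles requests from all Slaves and framework schedulers; as the system grows, the load concentrated on the Allocation Module increases, and the resulting drop in allocation speed makes normal operation impossible. To solve this problem, this paper proposes parallelizing the Allocation Module of the Mesos system. The proposed method is expected to distribute the load on the Allocation Module while also resolving the scheduling delay caused by head-of-line blocking.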
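
A toy sketch of the general idea of splitting one allocator into several workers: agents (Slaves) are partitioned into shards by a stable hash, and each shard generates resource offers independently and concurrently, so no shard waits in line behind another shard's agents. The sharding scheme, round-robin offer policy, and all names are assumptions for illustration; Mesos's actual allocator logic and the paper's design are not reproduced.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_of(agent_id, n_shards):
    """Stable assignment of an agent to an allocator shard."""
    return zlib.crc32(agent_id.encode("utf-8")) % n_shards


def allocate_shard(agents, frameworks, offset):
    """Offer each agent's resources to frameworks in round-robin order."""
    if not frameworks:
        return []
    return [(frameworks[(offset + i) % len(frameworks)], agent)
            for i, agent in enumerate(agents)]


def parallel_allocate(agent_ids, frameworks, n_shards=4):
    """Shards own disjoint agent sets and allocate concurrently without waiting on each other."""
    shards = [[] for _ in range(n_shards)]
    for agent in agent_ids:
        shards[shard_of(agent, n_shards)].append(agent)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        per_shard = list(pool.map(
            lambda k: allocate_shard(shards[k], frameworks, k), range(n_shards)))
    return [offer for offers in per_shard for offer in offers]


# Usage
agents = [f"agent-{i}" for i in range(10)]
offers = parallel_allocate(agents, ["fw-a", "fw-b"], n_shards=3)
print(offers)
```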

Study of Parallelization Methods for Software based Real-time HEVC Encoder Implementation (소프트웨어 기반 실시간 HEVC 인코더 구현을 위한 병렬화 기법에 관한 연구)

  • Ahn, Yong-Jo;Hwang, Tae-Jin;Lee, Dongkyu;Kim, Sangmin;Oh, Seoung-Jun;Sim, Dong-Gyu
    • Journal of Broadcast Engineering / v.18 no.6 / pp.835-849 / 2013
  • The Joint Collaborative Team on Video Coding (JCT-VC), founded by ISO/IEC MPEG and ITU-T VCEG, has standardized High Efficiency Video Coding (HEVC). HEVC standardization began with the goal of at least doubling coding efficiency relative to H.264/AVC. However, its flexible, hierarchical coding blocks and recursive coding structure are obstacles that an HEVC encoder must overcome. Many fast encoding algorithms have been proposed to reduce the computational complexity of the HEVC encoder, but it is hard to implement a real-time HEVC encoder with those fast algorithms alone. In this paper, to implement a software-based real-time HEVC encoder, we propose data-level parallelism using SIMD instructions and CPU/GPU multi-threading methods. We also propose appropriate operations and functional modules for applying the proposed methods to the HM 10.0 software. Evaluation of the proposed methods implemented on HM 10.0 showed 20-30 fps for 832×480 sequences and 5-10 fps for 1920×1080 sequences.
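
One common way to multi-thread an HEVC encoder is wavefront scheduling of coding tree units (CTUs), as in HEVC's Wavefront Parallel Processing (WPP) tool. The abstract does not say the paper uses WPP, so this is an assumed illustration of CPU multi-threading under block dependencies: if CTU (x, y) depends on its left neighbor (x-1, y) and its top-right neighbor (x+1, y-1), then all CTUs with the same wave index t = x + 2y are independent. `encode_ctu` is a hypothetical placeholder for the per-CTU work.

```python
from concurrent.futures import ThreadPoolExecutor


def wpp_waves(width, height):
    """Group CTUs (x, y) into waves t = x + 2*y.

    The left CTU (x-1, y) and the top-right CTU (x+1, y-1) both have wave
    index t - 1, so every dependency lies in an earlier wave.
    """
    waves = {}
    for y in range(height):
        for x in range(width):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[t] for t in sorted(waves)]


def encode_frame(width, height, encode_ctu, workers=4):
    """Run the CTUs of each wave concurrently; waves run one after another."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wave in wpp_waves(width, height):
            for ctu, out in zip(wave, pool.map(encode_ctu, wave)):
                results[ctu] = out
    return results


# Usage: a 4x3 CTU frame needs (4 - 1) + 2 * (3 - 1) + 1 = 8 waves.
waves = wpp_waves(4, 3)
print(len(waves), waves[:3])
```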

MPEG-I RVS Software Speed-up for Real-time Application (실시간 렌더링을 위한 MPEG-I RVS 가속화 기법)

  • Ahn, Heejune;Lee, Myeong-jin
    • Journal of Broadcast Engineering / v.25 no.5 / pp.655-664 / 2020
  • Free-viewpoint image synthesis is one of the important technologies in the MPEG-I (Immersive) standard. RVS (Reference View Synthesizer), developed by MPEG-I and used within the MPEG group, is a DIBR (Depth Information-Based Rendering) program that generates an image at a virtual (intermediate) viewpoint from the inputs of multiple viewpoints. RVS uses a mesh-surface method based on computer graphics and outperforms previous pixel-based methods by 2.5 dB or more. Although its OpenGL version is 10 times faster than the non-OpenGL version, it still runs well below real time, at 0.75 fps with two 2K-resolution input images. In this paper, we analyze the internals of the RVS implementation and modify its structure, achieving a 34-fold speed-up and thus real-time performance (22-26 fps) through three key improvements: 1) reuse of OpenGL buffers and texture objects, 2) parallelization of file I/O and OpenGL execution, and 3) parallelization of GPU shader programs and buffer transfers.
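
Improvement 2), overlapping file I/O with rendering, can be sketched as a two-stage pipeline: a loader thread reads upcoming frames into a bounded queue while the calling thread renders the current one. This is an illustrative sketch, not the RVS code; `load` and `render` are hypothetical placeholders for frame reading and OpenGL drawing. Rendering is kept on the calling thread because an OpenGL context can be current on only one thread at a time.

```python
import queue
import threading


def pipelined_render(frame_ids, load, render, depth=2):
    """Overlap file I/O (load) with rendering (render) using a bounded prefetch queue."""
    buf = queue.Queue(maxsize=depth)

    def loader():
        try:
            for fid in frame_ids:
                buf.put(("frame", load(fid)))
        except Exception as exc:  # forward I/O errors to the rendering thread
            buf.put(("error", exc))
        buf.put(("end", None))

    worker = threading.Thread(target=loader, daemon=True)
    worker.start()
    outputs = []
    while True:
        kind, item = buf.get()
        if kind == "error":
            raise item
        if kind == "end":
            break
        outputs.append(render(item))  # frame n renders while frame n+1 loads
    worker.join()
    return outputs


# Usage with stand-in load/render functions.
print(pipelined_render(range(5), lambda i: i * 2, lambda x: x + 1))
```

The bounded queue (`depth`) limits how many decoded frames are held in memory at once while still hiding I/O latency behind rendering.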