• Title/Summary/Keyword: parallel processing

Search Result 2,118, Processing Time 0.028 seconds

A Performance Comparison of Parallel Programming Models on Edge Devices (엣지 디바이스에서의 병렬 프로그래밍 모델 성능 비교 연구)

  • Dukyun Nam
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.18 no.4
    • /
    • pp.165-172
    • /
    • 2023
  • Heterogeneous computing is a technology that utilizes different types of processors to perform parallel processing. It maximizes task processing and energy efficiency by leveraging various computing resources such as CPUs, GPUs, and FPGAs. On the other hand, edge computing has developed with IoT and 5G technologies. It is a distributed computing that utilizes computing resources close to clients, thereby offloading the central server. It has evolved to intelligent edge computing combined with artificial intelligence. Intelligent edge computing enables total data processing, such as context awareness, prediction, control, and simple processing for the data collected on the edge. If heterogeneous computing can be successfully applied in the edge, it is expected to maximize job processing efficiency while minimizing dependence on the central server. In this paper, experiments were conducted to verify the feasibility of various parallel programming models on high-end and low-end edge devices by using benchmark applications. We analyzed the performance of five parallel programming models on the Raspberry Pi 4 and Jetson Orin Nano as low-end and high-end devices, respectively. In the experiment, OpenACC showed the best performance on the low-end edge device and OpenSYCL on the high-end device due to the stability and optimization of system libraries.

Parallel Processing of K-means Clustering Algorithm for Unsupervised Classification of Large Satellite Imagery (대용량 위성영상의 무감독 분류를 위한 K-means 군집화 알고리즘의 병렬처리)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.35 no.3
    • /
    • pp.187-194
    • /
    • 2017
  • The present study introduces a method to parallelize k-means clustering algorithm for fast unsupervised classification of large satellite imagery. Known as a representative algorithm for unsupervised classification, k-means clustering is usually applied to a preprocessing step before supervised classification, but can show the evident advantages of parallel processing due to its high computational intensity and less human intervention. Parallel processing codes are developed by using multi-threading based on OpenMP. In experiments, a PC of 8 multi-core integrated CPU is involved. A 7 band and 30m resolution image from LANDSAT 8 OLI and a 8 band and 10m resolution image from Sentinel-2A are tested. Parallel processing has shown 6 time faster speed than sequential processing when using 10 classes. To check the consistency of parallel and sequential processing, centers, numbers of classified pixels of classes, classified images are mutually compared, resulting in the same results. The present study is meaningful because it has proved that performance of large satellite processing can be significantly improved by using parallel processing. And it is also revealed that it easy to implement parallel processing by using multi-threading based on OpenMP but it should be carefully designed to control the occurrence of false sharing.

High Throughput Parallel Decoding Method for H.264/AVC CAVLC

  • Yeo, Dong-Hoon;Shin, Hyun-Chul
    • ETRI Journal
    • /
    • v.31 no.5
    • /
    • pp.510-517
    • /
    • 2009
  • A high throughput parallel decoding method is developed for context-based adaptive variable length codes. In this paper, several new design ideas are devised and implemented for scalable parallel processing, a reduction in area, and a reduction in power requirements. First, simplified logical operations instead of memory lookups are used for parallel processing. Second, the codes are grouped based on their lengths for efficient logical operation. Third, up to M bits of the input stream can be analyzed simultaneously. For comparison, we designed a logical-operation-based parallel decoder for M=8 and a conventional parallel decoder. High-speed parallel decoding becomes possible with our method. In addition, for similar decoding rates (1.57 codes/cycle for M=8), our new approach uses 46% less chip area than the conventional method.

Parallel Scrambling Techniques for SDH and ATM Transmissions (SDH와 ATM 전송을 위한 병렬혼화 기법)

  • 김석창;이병기
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.8
    • /
    • pp.1146-1158
    • /
    • 1993
  • In this paper, parallel scrambling techniques are considered for practical use in the SDH transmission and the ATM transmission. In the ATM transmission, there are two ways of transmitting ATM cells - the SDH-based and the cell-based - and the corresponding scrambling techniques differ accordingly. For the SDH transmission and the SDH-based ATM transmission, the FSS (frame synchronous scrambling) is applied to the STM frames : while for the cell-based ATM trans-mission, the DSS(distributed sample scrambling) is used on the ATM cell stream. The parallel scrambling techniques are examined for the FSS and the DSS, and applied to achieve the parallel FSSs for use in the SDH and the SDH-based ATM transmission along with the parallel DSS applicable to the cell-based ATM transmission. The resulting(8, 4) PSRG(parallel shift resister generator) and (8, 16) PSRG based parallel scramblings are directly applicable for the STM-1 rate processing of the STM-4 and STM-16 scramblings, respectively. Likewise, the resulting (1, 8)PSRG and double-sampling-double-correction based parallel scrambling techniques can be practically used for a low-rate processing of the SDH-based and the cell-based ATM signal scrambling respectively.

  • PDF

AccessGrid Framework (액세스그리드 프레임워크)

  • Baek Jong-Kwun;Lee Tae-Dong;Jeong Chang-Sung
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06c
    • /
    • pp.214-216
    • /
    • 2006
  • 액세스그리드 프레임워크(AccessGrid Framework)는 지리적인 제한에 관계없이 가상적인 협업 환경을 제공하는 도구이다. 액세스그리드 프레임워크는 기존의 시스템이 갖추지 못한 사용자의 이동성 지원을 추가하고, 유비쿼터스 환경에 적절한 자동화 기능을 제공함으로써 액세스그리드 환경을 확장한다. 이들은 웹 서비스 기반의 기존 구현물인 액세스그리드 툴킷(AccessGrid Toolkit)을 적극적으로 활용하여 개발되었으며, 향후에 불안정한 종속성을 제거하여 개수될 예정이다.

  • PDF

Study on Task Scheduling for Parallel Processing of Nested Loops (다중 루프문의 병렬처리를 위한 타스크 스케줄링에 관한 연구)

  • 허정연;손윤구
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.29B no.1
    • /
    • pp.11-17
    • /
    • 1992
  • This paper is to propose an analytical queuing model for parallel processing of sequential program with nested loops. The analytical results are compared with the results from the implemented multiprocessor system composed of four intel 8088 microprocessor, eight 2KB shared common memories, and a hardware token ring. At results, this study shows that the processed results are almost similar in proposed analytical model and real system. Proposed analytical model can be applied to evaluate parallel processing of sequential program with nested loops.

  • PDF

Rescheduling on Parallel Machines with Compressible Processing Times (작업시간이 압축 가능한 경우 병렬기계의 재일정계획)

  • Kim, Suhwan
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.2
    • /
    • pp.47-55
    • /
    • 2015
  • This paper deals with rescheduling on unrelated parallel-machines with compressible processing times, assuming that the arrival of a set of new jobs triggers rescheduling. It formulates this rescheduling problem as an assignment problem with a side constraint and proposes a heuristic to solve it. Computational tests evaluate the efficacy of the heuristic.

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

  • Jung, Yong-Bum;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.99-108
    • /
    • 2011
  • Processor technology is currently continued to parallel processing techniques, not by only increasing clock frequency of a single processor due to the high technology cost and power consumption. In this paper, a SIMD (Single Instruction Multiple Data) based parallel processor is introduced that efficiently processes massive data inherent in multimedia. In addition, this paper proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that efficiently operate on the image and video pixels. The proposed pixel subword parallel processing instructions store and process four 8-bit pixels on the partitioned four 12-bit registers in a 48-bit datapath architecture. This solves the overflow problem inherent in existing multimedia extensions and reduces the use of many packing/unpacking instructions. Experimental results using the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance. This is in contrast to MMX-type instructions (a representative Intel multimedia extension), which achieve a speedup of only $1.4{\times}$ over the same baseline SIMD array performance. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency than the baseline program.

On Parallel Implementation of Lagrangean Approximation Procedure (Lagrangean 근사과정의 병렬계산)

  • 이호창
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.18 no.3
    • /
    • pp.13-34
    • /
    • 1993
  • By operating on many part of a software system concurrently, the parallel processing computers may provide several orders of magnitude more computing power than traditional serial computers. If the Lagrangean approximation procedure is applied to a large scale manufacturing problem which is decomposable into many subproblems, the procedure is a perfect candidate for parallel processing. By distributing Lagrangean subproblems for given multiplier to multiple processors, concurrently running processors and modifying Lagrangean multipliers at the end of each iteration of a subgradient method,a parallel processing of a Lagrangean approximation procedure may provide a significant speedup. This purpose of this research is to investigate the potential of the parallelized Lagrangean approximation procedure (PLAP) for certain combinational optimization problems in manufacturing systems. The framework of a Plap is proposed for some combinatorial manufacturing problems which are decomposable into well-structured subproblems. The synchronous PLAP for the multistage dynamic lot-sizing problem is implemented on a parallel computer Alliant FX/4 and its computational experience is reported as a promising application of vector-concurrent computing.

  • PDF

Parallel Computing Environment for R with on Supercomputer Systems (빅데이터 분석을 위한 슈퍼컴퓨터 환경에서 R의 병렬처리)

  • Lee, Sang Yeol;Won, Joong Ho
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.39 no.4
    • /
    • pp.19-31
    • /
    • 2014
  • We study parallel processing techniques for the R programming language of high performance computing technology. In this study, we used massively parallel computing system which has 25,408 cpu cores. We conducted a performance evaluation of a distributed memory system using MPI and of a the shared memory system using OpenMP. Our findings are summarized as follows. First, For some particular algorithms, parallel processing is about 150 times faster than serial processing in R. Second, the distributed memory system gets faster as the number of nodes increases while shared memory system is limited in the improvement of performance, due to the limit of the number of cpus in a single system.