• Title/Summary/Keyword: Model/Data Parallelism

Search Result 37, Processing Time 0.021 seconds

Deep Learning Model Parallelism (딥러닝 모델 병렬 처리)

  • Park, Y.M.;Ahn, S.Y.;Lim, E.J.;Choi, Y.S.;Woo, Y.C.;Choi, W.
    • Electronics and Telecommunications Trends
    • /
    • v.33 no.4
    • /
    • pp.1-13
    • /
    • 2018
  • Deep learning (DL) models have been widely applied to AI applications such image recognition and language translation with big data. Recently, DL models have becomes larger and more complicated, and have merged together. For the accelerated training of a large-scale deep learning model, model parallelism that partitions the model parameters for non-shared parallel access and updates across multiple machines was provided by a few distributed deep learning frameworks. Model parallelism as a training acceleration method, however, is not as commonly used as data parallelism owing to the difficulty of efficient model parallelism. This paper provides a comprehensive survey of the state of the art in model parallelism by comparing the implementation technologies in several deep learning frameworks that support model parallelism, and suggests a future research directions for improving model parallelism technology.

An Efficient Algorithm for Big Data Prediction of Pipelining, Concurrency (PCP) and Parallelism based on TSK Fuzzy Model (TSK 퍼지 모델 이용한 효율적인 빅 데이터 PCP 예측 알고리즘)

  • Kim, Jang-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.10
    • /
    • pp.2301-2306
    • /
    • 2015
  • The time to address the exabytes of data has come as the information age accelerates. Big data transfer technology is essential for processing large amounts of data. This paper posits to transfer big data in the optimal conditions by the proposed algorithm for predicting the optimal combination of Pipelining, Concurrency, and Parallelism (PCP), which are major functions of GridFTP. In addition, the author introduced a simple design process of Takagi-Sugeno-Kang (TSK) fuzzy model and designed a model for predicting transfer throughput with optimal combination of Pipelining, Concurrency and Parallelism. Hence, the author evaluated the model of the proposed algorithm and the TSK model to prove the superiority.

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data (빅데이터를 위한 H-RTGL 기반 단일 분류기 분산 처리 프레임워크 설계)

  • Kim, Do Gyun;Choi, Jin Young
    • Journal of Korean Society for Quality Management
    • /
    • v.48 no.4
    • /
    • pp.553-566
    • /
    • 2020
  • Purpose: The purpose of this study was to design a framework for generating one-class classification algorithm based on Hyper-Rectangle(H-RTGL) in a distributed environment connected by network. Methods: At first, we devised one-class classifier based on H-RTGL which can be performed by distributed computing nodes considering model and data parallelism. Then, we also designed facilitating components for execution of distributed processing. In the end, we validate both effectiveness and efficiency of the classifier obtained from the proposed framework by a numerical experiment using data set obtained from UCI machine learning repository. Results: We designed distributed processing framework capable of one-class classification based on H-RTGL in distributed environment consisting of physically separated computing nodes. It includes components for implementation of model and data parallelism, which enables distributed generation of classifier. From a numerical experiment, we could observe that there was no significant change of classification performance assessed by statistical test and elapsed time was reduced due to application of distributed processing in dataset with considerable size. Conclusion: Based on such result, we can conclude that application of distributed processing for generating classifier can preserve classification performance and it can improve the efficiency of classification algorithms. In addition, we suggested an idea for future research directions of this paper as well as limitation of our work.

Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport

  • Romano, Paul K.;Siegel, Andrew R.
    • Nuclear Engineering and Technology
    • /
    • v.49 no.6
    • /
    • pp.1165-1171
    • /
    • 2017
  • The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, the vector speedup is also limited by differences in the execution time for events being carried out in a single event-iteration.

A Preliminary Architecture for a Data Flow Machine Model with Node Labelling (Node Label에 의한 기본적 Data Flow Machine 모델)

  • 김원섭;박희순
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.34 no.8
    • /
    • pp.301-307
    • /
    • 1985
  • The first four generations of computers are all based on a single basic design: the Von Neuman Processor, which is sequential and does one operation at a time. Efforts to develop concurrent or parallel computers have been carried on for many years. Data flow approach is significant in these efforts to make high speed parallel machines and expected a great deal of parallelism. In this paper we propose a preliminary data Flow Machine Model operating asynchronously on the base of Node Labelling. We introduce a concept of Node Labeling for this purpose which is relevant to the Data dependency and Parallelism. And we explain how the Node Tokens are fired in the proposed system.

  • PDF

A Matched Filter with Two Data Flow Paths for Searching Sychronization in DSSS (DSSS 동기탐색을 위한 이중 데이터 흐름 경로를 갖는 정합필터)

  • Song Myong-Lyol
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1A
    • /
    • pp.99-106
    • /
    • 2004
  • In this Paper, the matched filter for searching initial synchronization in DSSS (direct sequence spread spectrum) receiver is studied. The matched filter with a single data flow path is described which can be presented by HDL (Hardware Description Language). In order to improve the processing time of operations for the filter, equations are arranged to represent two data flow paths and the associated hardware model is proposed. The model has an architecture based on parallelism and pipeline for fast processing, in which two data flow paths with a series of memory, multiplier and accumulator are placed in parallel. The performance of the model is analyzed and compared with the matched filter with a single data flow path.

Static forwardin: an approach to reduce data hazards in VLIW processor (정적 포워딩에 의한 VLIW 프로세서의 데이터 hazard 처리)

  • 박형준;김이섭
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.2
    • /
    • pp.1-9
    • /
    • 1998
  • To achieve high performance in VLIW processors, they must exploit the parallelism on application programs. Data dependency makes it difficult to find the instruction-level parallelism. Among the three kinds of data dependency, true dependency causes RAW(Read After Wirte) hazards that occur most frequently in VILW processors. Forwarding is a widely used technique to reduce the performance degradation caused by RAW hazards. However, forwarding requires too much area of the chip when it is applied to VLIW processors. In this paper, static forwarding is proposed to reduce the hardware cost of forwarding circuits. It needs an extended compiler to detect RAW hazards and control the proposed forwarding scheme via instruction. And it uses the modified register file to shrink the area of forwarding path. VLIW Processor Model is also designed to verify static forwarding. This paper describes the operation of static forwarding and the comparison with the conventional forwarding.

  • PDF

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal
    • /
    • v.43 no.5
    • /
    • pp.825-834
    • /
    • 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

Design of Multiprocess Models for Parallel Protocol Implementation (병렬 프로토콜 구현을 위한 다중 프로세스 모델의 설계)

  • Choi, Sun-Wan;Chung, Kwang-Sue
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.10
    • /
    • pp.2544-2552
    • /
    • 1997
  • This paper presents three multiprocess models for parallel protocol implementation, that is, (1)channel communication model, (2)fork-join model, and (3)event polling model. For the specification of parallelism for each model, a parallel programming language, Par. C System, is used. to measure the performance of multiprocess models, we implemented the Internet Protocol Suite(IPS) Internet Protocol (IP) for each model by writing the parallel language on the Transputer. After decomposing the IP functions into two parts, that is, the sending side and the receiving side, the parallelism in both sides is exploited in the form of Multiple Instruction Single Data (MISD). Three models are evaluated and compared on the basis of various run-time overheads, such as an event sending via channels in the parallel channel communication model, process creating in the fork-join model and context switching in the event polling model, at the sending side and the receiving side. The event polling model has lower processing delays as about 77% and 9% in comparison with the channel communication model and the fork-join model at the sending side, respectively. At the receiving side, the fork-join model has lower processing delays as about 55% and 107% in comparison with the channel communication model and the event polling model, respectively.

  • PDF

Efficient Processing of Huge Airborne Laser Scanned Data Utilizing Parallel Computing and Virtual Grid (병렬처리와 가상격자를 이용한 대용량 항공 레이저 스캔 자료의 효율적인 처리)

  • Han, Soo-Hee;Heo, Joon;Lkhagva, Enkhbaatar
    • Journal of Korea Spatial Information System Society
    • /
    • v.10 no.4
    • /
    • pp.21-26
    • /
    • 2008
  • A method for processing huge airborne laser scanned data using parallel computing and virtual grid is proposed and the method is tested by generating raster DSM(Digital Surface Model) with IDW(Inverse Distance Weighting). Parallelism is involved for fast interpolation of huge point data and virtual grid is adopted for enhancing searching efficiency of irregularly distributed point data. Processing time was checked for the method using cluster constituted of one master node and six slave nodes, resulting in efficiency near to 1 and load scalability property. Also large data which cannot be processed with a sole system was processed with cluster system.

  • PDF