• Title/Abstract/Keywords: Model/Data Parallelism

37 search results (processing time: 0.022 s)

Deep Learning Model Parallelism

  • 박유미;안신영;임은지;최용석;우영춘;최완
    • 전자통신동향분석 / Vol. 33, No. 4 / pp. 1-13 / 2018
  • Deep learning (DL) models have been widely applied to AI applications such as image recognition and language translation with big data. Recently, DL models have become larger and more complicated, and have been merged together. To accelerate the training of large-scale DL models, a few distributed deep learning frameworks provide model parallelism, which partitions the model parameters across multiple machines so that each partition is accessed and updated in parallel without sharing. As a training acceleration method, however, model parallelism is not as commonly used as data parallelism because it is difficult to implement efficiently. This paper provides a comprehensive survey of the state of the art in model parallelism by comparing the implementation technologies in several deep learning frameworks that support it, and suggests future research directions for improving model parallelism technology.
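The partitioning idea in this abstract can be illustrated with a minimal sketch. It is not taken from the surveyed frameworks: the function names, the column-wise split, and the plain Python lists standing in for tensors are all assumptions made for illustration. Each hypothetical worker holds one shard of a layer's weight matrix and computes only its slice of the output.

```python
# Toy illustration of column-wise model parallelism for one linear layer.
# Plain lists stand in for tensors; the "workers" are hypothetical.

def matmul(x, w):
    """Multiply a (batch x in) matrix by an (in x out) matrix."""
    return [[sum(xi * w[k][j] for k, xi in enumerate(row))
             for j in range(len(w[0]))]
            for row in x]

def split_columns(w, parts):
    """Partition the columns of w into `parts` contiguous shards."""
    n_out = len(w[0])
    bounds = [n_out * p // parts for p in range(parts + 1)]
    return [[row[bounds[p]:bounds[p + 1]] for row in w] for p in range(parts)]

def model_parallel_forward(x, w, parts):
    """Each worker multiplies the input by its own shard of w."""
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, parts)]
    # Concatenate the per-worker output slices row by row.
    return [sum((out[i] for out in partial_outputs), []) for i in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
# The sharded computation reproduces the unsharded result.
assert model_parallel_forward(x, w, parts=2) == matmul(x, w)
```

In data parallelism, by contrast, every worker would hold the full `w` and receive a different slice of the rows of `x`.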

An Efficient Algorithm for Big Data Prediction of Pipelining, Concurrency (PCP) and Parallelism based on TSK Fuzzy Model

  • 김장영
    • 한국정보통신학회논문지 / Vol. 19, No. 10 / pp. 2301-2306 / 2015
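  • As the amount of information grows rapidly, large volumes of data sometimes have to be transferred, and this calls for big data transfer technology. To transfer big data at an optimized speed, this paper uses PCP, the main feature of GridFTP, and develops an algorithm that predicts PCP values. In addition, a TSK fuzzy model is applied to measure the optimized transfer rate as a function of PCP. The proposed PCP prediction algorithm based on the TSK model thus demonstrates the merit of this work.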

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data

  • 김도균;최진영
    • 품질경영학회지 / Vol. 48, No. 4 / pp. 553-566 / 2020
  • Purpose: The purpose of this study was to design a framework for generating a one-class classification algorithm based on Hyper-Rectangle (H-RTGL) in a distributed environment connected by a network. Methods: First, we devised an H-RTGL-based one-class classifier that can be executed by distributed computing nodes, taking model and data parallelism into account. We then designed supporting components for executing the distributed processing. Finally, we validated both the effectiveness and the efficiency of the classifier obtained from the proposed framework in a numerical experiment using a data set from the UCI machine learning repository. Results: We designed a distributed processing framework capable of H-RTGL-based one-class classification in a distributed environment consisting of physically separated computing nodes. It includes components that implement model and data parallelism, which enable distributed generation of the classifier. In the numerical experiment, a statistical test showed no significant change in classification performance, and the elapsed time was reduced by distributed processing on a data set of considerable size. Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation can preserve classification performance while improving the efficiency of classification algorithms. We also suggest directions for future research and discuss the limitations of our work.

Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport

  • Romano, Paul K.;Siegel, Andrew R.
    • Nuclear Engineering and Technology / Vol. 49, No. 6 / pp. 1165-1171 / 2017
  • The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, the vector speedup is also limited by differences in the execution time for events being carried out in a single event-iteration.
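The link between bank size and vector efficiency described here can be explored with a toy calculation. This is a sketch under simplifying assumptions that are not taken from the paper: a single event type with constant cost, and every live particle needing exactly one event per iteration. The function name and its inputs are hypothetical.

```python
import math

def vector_efficiency(events_per_particle, vector_width):
    """Fraction of vector lanes doing useful work in a toy event-based loop.

    `events_per_particle` lists how many events each particle history needs
    (each value is assumed to be at least 1).
    """
    if not events_per_particle:
        raise ValueError("the particle bank must not be empty")
    useful_events = sum(events_per_particle)
    vector_ops = 0
    remaining = list(events_per_particle)
    while remaining:
        # Live particles are processed in batches of `vector_width`;
        # the last batch of each iteration may be only partly filled.
        vector_ops += math.ceil(len(remaining) / vector_width)
        # Particles that have finished their history leave the bank.
        remaining = [n - 1 for n in remaining if n > 1]
    return useful_events / (vector_ops * vector_width)

# Four particles needing 3, 1, 2 and 1 events, processed two lanes at a time.
print(vector_efficiency([3, 1, 2, 1], vector_width=2))
```

As particles terminate, the bank shrinks and the trailing batches are less full, which is why a bank much larger than the vector width helps in this toy setting.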

A Preliminary Architecture for a Data Flow Machine Model with Node Labelling

  • 김원섭;박희순
    • 대한전기학회논문지 / Vol. 34, No. 8 / pp. 301-307 / 1985
  • The first four generations of computers are all based on a single basic design: the von Neumann processor, which is sequential and performs one operation at a time. Efforts to develop concurrent or parallel computers have been under way for many years. The data flow approach is significant among these efforts to build high-speed parallel machines and is expected to offer a great deal of parallelism. In this paper we propose a preliminary data flow machine model that operates asynchronously on the basis of node labelling. For this purpose we introduce the concept of node labelling, which relates to data dependency and parallelism, and we explain how node tokens are fired in the proposed system.

A Matched Filter with Two Data Flow Paths for Searching Synchronization in DSSS

  • 송명렬
    • 한국통신학회논문지 / Vol. 29, No. 1A / pp. 99-106 / 2004
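  • This paper studies a matched filter that can be used for initial synchronization search in a DSSS (Direct Sequence Spread Spectrum) receiver. A matched filter with a single data flow path, expressible in a hardware description language (HDL), is described first. To improve the processing time of the filter operation, the equations are rearranged so that the data flow can be expressed as two paths, and a corresponding hardware model is presented. For high-speed processing, the proposed model is based on parallel processing and pipelining, and consists of two data flow paths arranged in parallel, each made up of a series of memories, multipliers, and accumulators. The performance of the proposed model is analyzed and compared with that of a matched filter with a single data flow path.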

Static forwarding: an approach to reduce data hazards in VLIW processor

  • 박형준;김이섭
    • 전자공학회논문지C / Vol. 35C, No. 2 / pp. 1-9 / 1998
  • To achieve high performance, VLIW processors must exploit the parallelism in application programs. Data dependency makes it difficult to find instruction-level parallelism. Among the three kinds of data dependency, true dependency causes RAW (Read After Write) hazards, which occur most frequently in VLIW processors. Forwarding is a widely used technique for reducing the performance degradation caused by RAW hazards. However, forwarding requires too much chip area when it is applied to VLIW processors. In this paper, static forwarding is proposed to reduce the hardware cost of forwarding circuits. It relies on an extended compiler to detect RAW hazards and to control the proposed forwarding scheme through instructions, and it uses a modified register file to shrink the area of the forwarding path. A VLIW processor model is also designed to verify static forwarding. This paper describes the operation of static forwarding and compares it with conventional forwarding.
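The compiler-side detection of RAW hazards mentioned in this abstract might look roughly like the sketch below. It is a simplified illustration, not the paper's algorithm: instructions are modelled as a flat sequence rather than VLIW bundles, and the `Instr` type, the `window` parameter, and the register names are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction

def find_raw_hazards(program: List[Instr], window: int) -> List[Tuple[int, int, str]]:
    """Return (producer, consumer, register) triples for RAW dependencies.

    A dependency is reported when the most recent writer of a source
    register is at most `window` instructions before the reader.
    """
    hazards = []
    for j, consumer in enumerate(program):
        for src in consumer.srcs:
            # Search backwards for the most recent writer of `src`.
            for i in range(j - 1, max(j - window, 0) - 1, -1):
                if program[i].dest == src:
                    hazards.append((i, j, src))
                    break
    return hazards

program = [
    Instr("r1", ("r2", "r3")),  # 0: r1 = r2 op r3
    Instr("r4", ("r1", "r5")),  # 1: reads r1 written by instruction 0
    Instr("r6", ("r7", "r8")),  # 2: independent
    Instr("r9", ("r1", "r4")),  # 3: reads r1 (from 0) and r4 (from 1)
]
print(find_raw_hazards(program, window=2))
```

A compiler that knows these triples at build time could, in principle, mark the consuming instruction so that the hardware selects a forwarded value, which is the kind of instruction-level control the abstract refers to.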

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal / Vol. 43, No. 5 / pp. 825-834 / 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

Design of Multiprocess Models for Parallel Protocol Implementation

  • 최선완;정광수
    • 한국정보처리학회논문지 / Vol. 4, No. 10 / pp. 2544-2552 / 1997
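  • This paper presents three types of multiprocess models for parallel protocol implementation: (1) the channel communication model, (2) the fork-join model, and (3) the event polling model. The parallel programming language Par. C System is used to specify the parallelization of each model. To measure the performance of the proposed models, the Internet Protocol (IP) of the Internet protocol stack is implemented on a Transputer. The IP functions are divided into a sending side and a receiving side, and both sides are parallelized using a Multiple Instruction Single Data (MISD) architecture. The proposed models are evaluated and compared with respect to different run-time overheads: sending events through channels in the channel communication model, process creation in the fork-join model, and context switching between processes in the event polling model, each analyzed for the sending and receiving sides. On the sending side, the event polling model was 77% and 9% faster than the channel communication model and the fork-join model, respectively. On the receiving side, the fork-join model was 55% and 107% faster than the channel communication model and the event polling model, respectively.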

Efficient Processing of Huge Airborne Laser Scanned Data Utilizing Parallel Computing and Virtual Grid

  • 한수희;허준;엥흐바타르
    • 한국공간정보시스템학회 논문지 / Vol. 10, No. 4 / pp. 21-26 / 2008
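  • In this study, a parallel processing technique and a virtual grid structure were introduced to process huge airborne laser scanned data efficiently, and a regular-grid DSM was generated by IDW (Inverse Distance Weighting) to evaluate the effectiveness of the proposed method. Parallel processing was used to interpolate the huge data set quickly, and a virtual grid was used to make searching the irregularly distributed points more efficient. Processing times measured on a cluster consisting of a master node and six slave nodes showed an efficiency close to 1 even as the number of nodes increased, and the load scalability property was also satisfied. It was also confirmed that data too large to be processed on a single system because of capacity limits can be processed on the cluster system.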
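How a virtual grid can narrow the point search for IDW is sketched below. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the 3x3 cell neighbourhood, and the default power of 2 are all choices made here for the example.

```python
import math
from collections import defaultdict

def build_virtual_grid(points, cell_size):
    """Bucket (x, y, z) points by the grid cell that contains them."""
    grid = defaultdict(list)
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[cell].append((x, y, z))
    return grid

def idw_at(grid, cell_size, qx, qy, power=2.0):
    """IDW estimate at (qx, qy) using only the 3x3 block of cells around it."""
    cx, cy = math.floor(qx / cell_size), math.floor(qy / cell_size)
    weighted_sum = weight_total = 0.0
    for i in range(cx - 1, cx + 2):
        for j in range(cy - 1, cy + 2):
            for x, y, z in grid.get((i, j), ()):
                d = math.hypot(x - qx, y - qy)
                if d == 0.0:
                    return z  # the query coincides with a sample point
                w = 1.0 / d ** power
                weighted_sum += w * z
                weight_total += w
    # None signals that no sample point lies in the neighbourhood.
    return weighted_sum / weight_total if weight_total else None

points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
grid = build_virtual_grid(points, cell_size=1.0)
print(idw_at(grid, 1.0, 1.0, 0.0))
```

Because each DSM cell is interpolated independently of the others, the output grid can be split into blocks and handed to separate workers, which is what makes this kind of workload a natural fit for a cluster.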
