• 제목/요약/키워드: Data Parallelism

검색결과 188건 처리시간 0.02초

A Study on Effect of Code Distribution and Data Replication for Multicore Computing Architectures

  • Cho, Doosan
    • International Journal of Advanced Culture Technology
    • /
    • 제9권4호
    • /
    • pp.282-287
    • /
    • 2021
  • A multicore system must be able to take full advantage of the program's instruction and data parallelism. This study introduces the data replication technique as a support technique to maximize the program's instruction and data parallelism. Instruction level parallelism can be limited by data dependency. In this case, if data is replicated to each processor core and used, instruction level parallelism can be used to the maximum. The technique proposed in this study can maximize the performance improvement effect when applied to scientific applications such as matrix multiplication operation.

딥러닝 모델 병렬 처리 (Deep Learning Model Parallelism)

  • 박유미;안신영;임은지;최용석;우영춘;최완
    • 전자통신동향분석
    • /
    • 제33권4호
    • /
    • pp.1-13
    • /
    • 2018
  • Deep learning (DL) models have been widely applied to AI applications such image recognition and language translation with big data. Recently, DL models have becomes larger and more complicated, and have merged together. For the accelerated training of a large-scale deep learning model, model parallelism that partitions the model parameters for non-shared parallel access and updates across multiple machines was provided by a few distributed deep learning frameworks. Model parallelism as a training acceleration method, however, is not as commonly used as data parallelism owing to the difficulty of efficient model parallelism. This paper provides a comprehensive survey of the state of the art in model parallelism by comparing the implementation technologies in several deep learning frameworks that support model parallelism, and suggests a future research directions for improving model parallelism technology.

다중스레드 데이타 병렬 프로그램의 표현 : PCFG(Parallel Control Flow Graph) (A Representation for Multithreaded Data-parallel Programs : PCFG(Parallel Control Flow Graph))

  • 김정환
    • 한국정보과학회논문지:시스템및이론
    • /
    • 제29권12호
    • /
    • pp.655-664
    • /
    • 2002
  • 데이타 병렬 모델은 대규모 병렬성을 용이하게 얻을 수 있는 장점이 있지만, 데이타 분산으로 인한 통신 지연시간은 상당한 부담이 된다. 본 논문에서는 데이타 병렬 프로그램에 내재되어 있는 태스크 병렬성을 추출하여 이러한 통신 지연시간을 감추는데 이용할 수 있음을 보인다. 기존의 태스크 병렬성 추출은 데이타 병렬성을 고려하지 않았지만, 여기서는 데이타 병렬성을 그대로 유지하면서 태스크 병렬성을 활용하는 방법에 대해 설명한다. 데이타 병렬 루프를 포함할 수 있는 다수의 태스크 스레드들로 구성된 다중스레드 프로그램을 표현하기 위해 본 논문에서는 PCFG(Parallel Control Flow Graph)라는 표현 형태를 제안한다. PCFG는 단일 스레드인 원시 데이타 병렬 프로그램으로부터 HDG(Hierarchical Dependence Graph)를 통해 생성될 수 있으며, 또한 PCFG로부터 다중스레드 코드를 쉽게 생성할 수 있다.

추상해석법을 이용한 논리언어의 AND-병렬 태스크 추출 기법 (Static Analysis of AND-parallelism in Logic Programs based on Abstract Interpretation)

  • Kim, Hiecheol;Lee, Yong-Doo
    • 한국산업정보학회:학술대회논문집
    • /
    • 한국산업정보학회 1997년도 추계학술대회 발표논문집:21세기를 향한 정보통신 기술의 전망
    • /
    • pp.79-89
    • /
    • 1997
  • Logic programming has many advantages as a paradigm for parallel programming because it offers ease of programming while retaining high expressive power due to its declarative semantics. In parallel logic programming, one of the important issues is the compile-time parallelism detection. Static data-dependency analysis has been widely used to gather some information needed for the detection of AND-parallelism. However, the static data-dependency analysis cannot fully detect AND-parallelism because it does not provide some necessary functions such as the propagation of groundness. As an alternative approach, abstract interpretation provides a promising way to deal with AND-parallelism detection, while a full-blown abstract interpretation is not efficient in terms of computation since it inherently employs some complex operations not necessary for gathering the information on AND-parallelism. In this paper, we propose an abstract domain which can provide a precise and efficient way to use the abstract interpretation for the detection of AND-parallelism of logic programs.

  • PDF

병렬 계산을 위한 최대 병렬성 추출 방법 (Extracting Maximum Parallelism for Parallel Computing)

  • 박두순
    • 컴퓨터교육학회논문지
    • /
    • 제8권1호
    • /
    • pp.93-103
    • /
    • 2005
  • 대부분의 프로그램 실행 시간은 루프 구조에서 소비되기 때문에 순차 루프 프로그램으로부터 병렬성을 추출하는 것은 프로그램을 빠르게 실행하는 데 필수적이다. 병렬성을 추출하기 위한 기존의 연구들은 주로 불변 자료 종속 거리에 초점을 맞추어왔다. 본 논문에서는 중첩 루프에서 자료 종속성을 제거하는 방법과 자료 종속성 제거 방법을 확장한 프로시저 호출을 가진 루프에서 병렬성을 추출하는 방법을 제안한다. 이 두 가지 방법들은 모두 자료 종속 거리에 관계없이 적용할 수 있다. 중첩 루프에서의 자료 종속성 제거 방법과 프로시저 호출을 가진 루프에서 병렬성을 추출하는 방법을 기존의 방법들과 CRAY-T3E에서 성능 평가를 하였다. 두 개의 방법 모두가 기존의 방법들보다 매우 우수함을 보였다.

  • PDF

Efficient Use of On-chip Memory through Profile-Driven Array Reorganization

  • Cho, Doosan;Youn, Jonghee
    • 대한임베디드공학회논문지
    • /
    • 제6권6호
    • /
    • pp.345-359
    • /
    • 2011
  • In high performance embedded systems, the use of multiple on-chip memories is an essential architectural feature for exploiting inherent parallelism in multimedia applications. This feature allows multiple data accesses to be executed in parallel. However, it remains difficult to effectively exploit of multiple on-chip memories. The successful use of this architecture strongly depends on how to efficiently detect and exploit memory parallelism in target applications. In this paper, we propose a technique based on a linear array access descriptor [1], which is generated from profiled data, to detect and exploit memory parallelism. The proposed technique tackles an array reorganization problem to maximize memory parallelism in multimedia applications. We present preliminary experiments applying the proposed technique onto a representative coarse grained reconfigurable array processor (CGRA) with multimedia kernel codes. Our experimental results demonstrate that our technique optimizes data placement by putting independent data on separate storage. The results exhibit 9.8% higher performance on average compared to the existing method.

TSK 퍼지 모델 이용한 효율적인 빅 데이터 PCP 예측 알고리즘 (An Efficient Algorithm for Big Data Prediction of Pipelining, Concurrency (PCP) and Parallelism based on TSK Fuzzy Model)

  • 김장영
    • 한국정보통신학회논문지
    • /
    • 제19권10호
    • /
    • pp.2301-2306
    • /
    • 2015
  • 정보가 급증함에 따라 큰 용량의 데이터를 전송해야 할 경우가 있다. 빅 데이터 전송 기술은 큰 용량의 데이터를 전송할 때 필요하다. 본 논문은 빅 데이터를 최적화된 속도로 전송하기 위해 GridFTP의 주된 기능인 PCP를 사용하며 또한 PCP 값을 예측하는 알고리즘을 개발한다. 또한, TSK 퍼지 모델을 적용하여 PCP에 따른 최적화된 전송률을 측정하는데 사용된다. 따라서, 제안된 TSK모델을 이용한 PCP 예측 알고리즘은 본 논문의 우수성을 입증한다.

최대 병렬성 추출을 위한 자료 종속성 제거 알고리즘 (A Data Dependency Elimination Algorithm for Extracting Maximum Parallelism)

  • 송월봉;박두순
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • 제26권1호
    • /
    • pp.139-139
    • /
    • 1999
  • In most application programs, loops usually comprise most of the computation in a program and the most important source of parallelism. When the data dependency relation is uniformin terms of distance, several compile time parallelization methods were introduced. On the otherhand,when the data dependency relation is non-uniform in distance, the compile time extraction ofparallelism is much complicated. In this paper, a general method the extracting parallelism in nestedloops is presented. This algorithm can be applicable where the dependency relation is both uniform andnon-uniform in distance. According to execution repeatedly the statements in nested loops, thealgorithm which effectively removes these kind of data dependencies is developed in order to presentthe total parallelization of nested loops.

데이터 중첩을 통한 페트리네트의 병렬 시뮬레이션 (Parallel Simulation of Bounded Petri Nets using Data Packing Scheme)

  • 김영찬;김탁곤
    • 한국시뮬레이션학회논문지
    • /
    • 제11권2호
    • /
    • pp.67-75
    • /
    • 2002
  • This paper proposes a parallel simulation algorithm for bounded Petri nets in a single processor, which exploits the SIMD(Single Instruction Multiple Data)-type parallelism. The proposed algorithm is based on a data packing scheme which packs multiple bytes data in a single register, thereby being manipulated simultaneously. The parallelism can reduce simulation time of bounded Petri nets in a single processor environment. The effectiveness of the algorithm is demonstrated by presenting speed-up of simulation time for two bounded Petri nets.

  • PDF

Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport

  • Romano, Paul K.;Siegel, Andrew R.
    • Nuclear Engineering and Technology
    • /
    • 제49권6호
    • /
    • pp.1165-1171
    • /
    • 2017
  • The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, the vector speedup is also limited by differences in the execution time for events being carried out in a single event-iteration.