Search | Korea Science

Parallelizing Imperfectly Nested Loops

Kim, Ki-Chang
- Journal of Electrical Engineering and Information Science
- /
- v.1 no.1
- /
- pp.140-150
- /
- 1996
Loops are some of the richest program constructs where parallelism is available. Exploiting fine-grain parallelism in these constructs is particularly important in light of the growing popularity of superscalar and VLIW machines. This paper explains how fine-grain parallelization techniques can be generalized to handle nested loops. Our technique integrates nested loop parallelization techniques at the fine-grain level, thus exposing more fine-grain parallelism, and is flexible enough to handle non-perfectly nested loops. Examples and some experimental results are presented to illustrate our approach.

Parallelism for Nested Loops with Simple Subscripts

Jeong, Sam-Jin
- International Journal of Contents
- /
- v.4 no.4
- /
- pp.1-6
- /
- 2008
In this paper, we propose an improved loop splitting method for maximizing the parallelism of single loops with non-constant dependence distances. Using the iteration and distance of the source of the first dependence, together with the theorems we define, we present generalized and optimal algorithms for single loops with non-uniform dependences (MPSL). We also extend the MPSL method to exploit parallelism in nested loops with simple subscripts, based on cycle shrinking and loop interchanging. The algorithms generalize the transformation of single loops with non-uniform dependences, as well as nested loops with simple subscripts, into parallel loops.
https://doi.org/10.5392/IJoC.2008.4.4.001
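
As a rough illustration of the cycle shrinking idea mentioned in this abstract, the sketch below handles only the simplest case: a single loop with one constant dependence distance `d`. It is not the MPSL algorithm from the paper; the function names and the loop body `a[i] = a[i - d] + 1` are assumptions chosen for the example. Iterations are processed in groups of `d`; within a group every read refers to an element written by an earlier group, so the group's iterations are independent and could run in parallel.

```python
def sequential(a, d):
    """Reference loop with a flow dependence of constant distance d."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + 1
    return a


def cycle_shrunk(a, d):
    """Same loop, restructured into sequential groups of d independent iterations."""
    a = list(a)
    n = len(a)
    for start in range(d, n, d):          # groups execute one after another
        group = range(start, min(start + d, n))
        # Every a[i - d] read here was written before this group started,
        # so these iterations do not depend on each other (parallelizable).
        new_values = [a[i - d] + 1 for i in group]
        for i, value in zip(group, new_values):
            a[i] = value
    return a


if __name__ == "__main__":
    data = [0, 10, 20, 0, 0, 0, 0, 0]
    print(sequential(data, 3))
    print(cycle_shrunk(data, 3))
```

Both functions should produce the same list; the list comprehension only simulates the parallel step, and a real implementation would hand each group to worker threads or SIMD lanes.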

Study on Task Scheduling for Parallel Processing of Nested Loops (다중 루프문의 병렬처리를 위한 타스크 스케줄링에 관한 연구)

허정연;손윤구
- Journal of the Korean Institute of Telematics and Electronics B
- /
- v.29B no.1
- /
- pp.11-17
- /
- 1992
This paper proposes an analytical queuing model for the parallel processing of sequential programs with nested loops. The analytical results are compared with measurements from an implemented multiprocessor system composed of four Intel 8088 microprocessors, eight 2KB shared common memories, and a hardware token ring. The comparison shows that the results of the proposed analytical model closely match those of the real system. The proposed model can therefore be applied to evaluate the parallel processing of sequential programs with nested loops.

Locality-Conscious Nested-Loops Parallelization

Parsa, Saeed;Hamzei, Mohammad
- ETRI Journal
- /
- v.36 no.1
- /
- pp.124-133
- /
- 2014
To speed up data-intensive programs, two complementary techniques, namely nested loops parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse grain parallelism through exploiting the largest possible groups of outer permutable loops in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.
https://doi.org/10.4218/etrij.14.0113.0266
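
The tiling step mentioned at the end of this abstract can be pictured with a generic example. The sketch below is not the paper's unified transformation; it is plain loop tiling applied to a matrix multiplication, with the function name `matmul_tiled` and the `tile` parameter chosen for illustration. The three outer loops walk over tiles (the `ii`/`jj` tiles write disjoint parts of the result, so they are candidates for coarse-grain parallelism), while the inner loops stay inside one tile so that operands are reused while they are still close at hand.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply two square matrices (lists of lists) using tiled loops."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    # Outer loops enumerate tiles; distinct (ii, jj) tiles touch disjoint
    # blocks of c and could be distributed across processors.
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # Inner loops stay within one tile to improve data reuse.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(matmul_tiled(m, m, tile=2))
```

In pure Python the tiled version is not faster; the point is only the loop structure, which a compiler would generate for a language where cache behavior dominates.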

Unfolding Nested Loops of Functional Languages for Multithreaded Architectures (다중스레드 구조를 위한 함수형 언어의 중첩루프 펼침)

하상호
- Journal of KIISE:Software and Applications
- /
- v.29 no.11
- /
- pp.826-836
- /
- 2002
Effectively exploiting the massive parallelism in nested loops of functional languages such as Id requires an enormous amount of memory for name spaces as well as additional processors. If there is not enough memory to exploit that parallelism, program execution can abort during loop unfolding. Additionally, if loops are over-unfolded relative to the number of available processors, system performance can degrade severely due to the overhead of loop unfolding. This paper proposes and analyzes an algorithm for effectively unfolding nested loops of functional languages on multithreaded architectures. The algorithm unfolds a given nested loop safely and near-optimally, taking into account the processor and memory resources available when the loop is to be unfolded.

Enhanced Region Partitioning Method of Non-perfect nested Loops with Non-uniform Dependences

Jeong Sam-Jin
- International Journal of Contents
- /
- v.1 no.1
- /
- pp.40-44
- /
- 2005
This paper introduces a region partitioning method for non-perfect nested loops with non-uniform dependences. Loops of this kind normally cannot be parallelized by existing parallelizing compilers and transformations, and in the rare instances where they are parallelized, performance is very poor. Based on Convex Hull theory, which provides adequate information for handling non-uniform dependences, this paper proposes an enhanced region partitioning method that divides the iteration space into a minimum number of parallel regions, where all the iterations inside each parallel region can be executed in parallel by using variable renaming after copying.

The Study of the Method that to Choice Efficient Nested Loops Join Order and the Index Design (효율적인 Nested Loops Join을 위한 조인순서 선정 및 인덱스 구성에 관한 연구)

Liu, Chen;Yeo, Jeong-mo
- Proceedings of the Korea Information Processing Society Conference
- /
- 2013.05a
- /
- pp.877-880
- /
- 2013
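In relational databases, which underpin information systems, performance varies with the volume of data. Insufficient understanding of database features causes many performance problems, and join performance accounts for a large share of them, because except in rare cases most data processing requires more than one table. Using joins correctly can bring large performance gains. This study investigates how to select the join order and design indexes so that the Nested Loops Join, the most basic join method in relational databases, executes efficiently. To evaluate the results, we extracted SQL Trace output and compared performance, demonstrating that the selected join order is efficient. We also showed that evaluating performance by the number of data blocks accessed improves join performance more fundamentally than the conventional evaluation based on response time. Future work will study performance improvements for more complex join forms and other join methods.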
https://doi.org/10.3745/PKIPS.y2013m05a.877
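
To make the role of the index in a Nested Loops Join concrete, here is a minimal sketch, not taken from the paper. Rows are plain dictionaries, a Python `dict` stands in for a database index on the inner table, and the `probes` counter is a hypothetical stand-in for the "number of data blocks accessed" metric the abstract describes. All function and column names are assumptions made for the example.

```python
from collections import defaultdict


def nested_loops_join(outer, inner, key):
    """Naive nested loops join: scan every inner row for each outer row."""
    result, probes = [], 0
    for o in outer:
        for i in inner:
            probes += 1                      # one inner row visited
            if o[key] == i[key]:
                result.append({**o, **i})
    return result, probes


def index_nested_loops_join(outer, inner, key):
    """Nested loops join that looks up inner rows through an index on `key`."""
    index = defaultdict(list)                # stand-in for a database index
    for i in inner:
        index[i[key]].append(i)
    result, probes = [], 0
    for o in outer:
        matches = index.get(o[key], [])
        probes += len(matches)               # only matching rows are visited
        for i in matches:
            result.append({**o, **i})
    return result, probes


if __name__ == "__main__":
    depts = [{"dept_id": 1, "dept": "R&D"}, {"dept_id": 2, "dept": "Sales"}]
    emps = [
        {"dept_id": 1, "name": "Kim"},
        {"dept_id": 2, "name": "Lee"},
        {"dept_id": 1, "name": "Park"},
    ]
    print(nested_loops_join(depts, emps, "dept_id")[1])
    print(index_nested_loops_join(depts, emps, "dept_id")[1])
```

Both joins return the same rows, but the indexed version visits only matching inner rows. The sketch ignores the cost of building and maintaining the index, which a real database pays separately.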

A Data Dependency Elimination Algorithm for Extracting Maximum Parallelism (최대 병렬성 추출을 위한 자료 종속성 제거 알고리즘)

송월봉;박두순
- Journal of KIISE:Software and Applications
- /
- v.26 no.1
- /
- pp.139-139
- /
- 1999
In most application programs, loops account for most of the computation and are the most important source of parallelism. When the data dependency relation is uniform in terms of distance, several compile-time parallelization methods have been introduced. On the other hand, when the data dependency relation is non-uniform in distance, compile-time extraction of parallelism is much more complicated. In this paper, a general method for extracting parallelism in nested loops is presented. The algorithm is applicable whether the dependency relation is uniform or non-uniform in distance. Based on the repeated execution of the statements in nested loops, an algorithm that effectively removes these kinds of data dependencies is developed to achieve total parallelization of nested loops.

A New Synchronization Scheme for Parallel Processing on Perfectly Nested Do Loops (완전 중첩 루프에서 병렬처리를 위한 새로운 동기화 기법)

이광형;황종선;박두순;김병수
- Journal of the Korean Institute of Telematics and Electronics B
- /
- v.31B no.10
- /
- pp.1-10
- /
- 1994
In most application programs, loops contain most of the computation and are the most important source of parallelism. When loops are executed on multiprocessors, cross-iteration data dependences must be enforced by synchronization between processors. In this paper, we propose a new synchronization scheme (Free/Hold) that reduces the overhead caused by synchronization variables in data-oriented schemes and the delay caused by synchronization instructions in statement-oriented schemes. The Free/Hold mechanism enforces the correct execution order by inserting synchronization instructions between instances with a data dependence relationship, using the RD (Real dependence Distance). We also present an algorithm for removing unnecessary dependences in one-to-many dependences.

An Improving Method of Restructuring Parallel Programs for Data Race Detection

Ha, Keum-Sook;Lee, Sung woo;Yoo, Kee-Young
- Proceedings of the IEEK Conference
- /
- 2000.07b
- /
- pp.715-718
- /
- 2000
Although shared memory parallel programs are designed to be deterministic in both their final results and intermediate states, the races that occur when different processes access a common memory location in an order not guaranteed by synchronization can result in unintended non-deterministic executions. Detecting races, particularly first data races, is therefore important for debugging explicit shared memory parallel programs. It is possible that all data races reported by other on-the-fly algorithms would disappear once the first races were removed. Detecting races in parallel programs with nested loops and inter-thread coordination requires guaranteeing the order of synchronization operations in an execution instance. In this paper, we propose an improved restructuring method that guarantees the ordering of the execution instance and preserves the semantics of the original program. This method requires O(np) time and (s + up) space, where n is the total number of operations, s is the number of synchronization operations, and p is the degree of parallelism in the execution. The method also makes on-the-fly race detection for parallel programs with nested loops and inter-thread coordination easier in terms of space and time complexity.
