http://dx.doi.org/10.4218/etrij.14.0113.0266

Locality-Conscious Nested-Loops Parallelization  

Parsa, Saeed (School of Computer Engineering, Iran University of Science and Technology)
Hamzei, Mohammad (School of Computer Engineering, Iran University of Science and Technology)
Publication Information
ETRI Journal, vol. 36, no. 1, 2014, pp. 124-133
Abstract
To speed up data-intensive programs, two complementary techniques should be considered together: nested-loop parallelization and data locality optimization. Effective parallelization distributes the computation and the data it needs across different processors, whereas data locality optimization tries to keep related data on the same processor. Consequently, locality and parallelization may demand different loop transformations, and an integrated approach that combines the two can generate much better results than either approach alone. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation yields coarse-grain parallelism, by exploiting the largest possible groups of outer permutable loops, together with data locality, by satisfying dependences at inner loops. These groups can be further tiled to improve data locality by exploiting data reuse in multiple dimensions.
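The abstract states the idea without code. As a hedged illustration (not the authors' algorithm), the Python sketch below shows the standard notions it builds on. A band of outer loops is fully permutable when every dependence distance is non-negative at each level of the band. Such a band can be tiled rectangularly without violating dependences. A loop that carries no dependence can run in parallel. The sketch assumes constant, lexicographically positive dependence distance vectors and a square 0..n-1 iteration space. The function names `outer_permutable_band`, `parallel_levels`, `tiled_order`, and `respects` are hypothetical.

```python
# Hypothetical helpers illustrating outer permutable bands and tiling.
# Assumptions: constant, lexicographically positive dependence distance
# vectors and a rectangular 0..n-1 iteration space.


def outer_permutable_band(deps, depth):
    """Length of the largest fully permutable band of outermost loops.

    With constant distance vectors, loops 0..k-1 are fully permutable when
    every dependence has a non-negative distance at each of those levels.
    """
    k = 0
    while k < depth and all(d[k] >= 0 for d in deps):
        k += 1
    return k


def parallel_levels(deps, depth):
    """Loop levels that carry no dependence and can run as DOALL loops.

    A dependence is carried (satisfied) by the level of its first
    non-zero distance component.
    """
    carried = set()
    for d in deps:
        for level, dist in enumerate(d):
            if dist != 0:
                carried.add(level)
                break
    return [level for level in range(depth) if level not in carried]


def tiled_order(n, tile):
    """Iteration order of a 2-D n x n nest after rectangular tiling."""
    order = []
    for ti in range(0, n, tile):                      # tile loops
        for tj in range(0, n, tile):
            for i in range(ti, min(ti + tile, n)):    # point loops
                for j in range(tj, min(tj + tile, n)):
                    order.append((i, j))
    return order


def respects(order, deps):
    """True if every dependence source runs before its sink in `order`."""
    pos = {point: t for t, point in enumerate(order)}
    for src in order:
        for d in deps:
            sink = tuple(a + b for a, b in zip(src, d))
            if sink in pos and pos[sink] < pos[src]:
                return False
    return True


if __name__ == "__main__":
    # A[i][j] = A[i-1][j] + A[i][j-1]: distances (1,0) and (0,1)
    stencil = [(1, 0), (0, 1)]
    print(outer_permutable_band(stencil, 2))      # 2: both loops permutable
    print(respects(tiled_order(8, 4), stencil))   # True: tiling is legal

    # A[i][j] = A[i-1][j+1]: distance (1,-1) keeps loop j out of the band
    skewed = [(1, -1)]
    print(outer_permutable_band(skewed, 2))       # 1
    print(respects(tiled_order(8, 4), skewed))    # False: tiling is illegal

    # A[i][j] = A[i][j-1]: dependence carried only by the inner loop
    print(parallel_levels([(0, 1)], 2))           # [0]: outer loop is DOALL
```

The `respects` check enumerates iterations explicitly, so it only suits small examples. A compiler would instead reason about the dependence vectors symbolically, for example through a polyhedral representation.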
Keywords
Automatic nested loops parallelization; data locality; loop tiling