Speculative Parallelism Characterization Profiling in General Purpose Computing Applications

  • Wang, Yaobin (Department of Computer Science and Technology, Southwest University of Science and Technology) ;
  • An, Hong (Department of Computer Science and Technology, University of Science and Technology of China) ;
  • Liu, Zhiqin (Department of Computer Science and Technology, Southwest University of Science and Technology) ;
  • Li, Li (Department of Computer Science and Technology, Southwest University of Science and Technology) ;
  • Yu, Liang (Department of Computer Science and Technology, Southwest University of Science and Technology) ;
  • Zhen, Yilu (Department of Computer Science and Technology, Southwest University of Science and Technology)
  • Received : 2014.04.12
  • Reviewed : 2015.03.04
  • Published : 2015.03.30

Abstract

Procedure-level speculation has not yet been thoroughly explored for general purpose computing applications, especially by means of lightweight profiling. This paper proposes a lightweight profiling mechanism to characterize speculative parallelism in several classic general purpose computing applications from the SPEC CPU2000 benchmark. By comparing the key performance factors in loop-level and procedure-level speculation, it reports new findings on the behavior of loop-level and procedure-level parallelism in these applications. The experimental results are as follows. In loop-level speculation, the best-performing application, gzip, achieves only a 2.4X speedup, while in procedure-level speculation the best-performing application, mcf, achieves almost a 3.5X speedup. The results also indicate that our lightweight profiling method is effective. Between loop-level and procedure-level thread-level speculation (TLS), the latter performs better in several cases, which runs against the conventional perception. This is especially evident in applications whose 'hot' procedure bodies consist of 'hot' loops.
