http://dx.doi.org/10.5139/JKSAS.2002.30.7.068

Acceleration of LU-SGS Code on Latest Microprocessors Considering the Increase of Level 2 Cache Hit-Rate  

Choi, J.Y. (Department of Aerospace Engineering, Pusan National University)
Oh, Se-Jong (Department of Aerospace Engineering, Pusan National University)
Publication Information
Journal of the Korean Society for Aeronautical & Space Sciences, Vol. 30, No. 7, 2002, pp. 68-80
Abstract
An approach to writing performance-optimized computational code for the latest microprocessors is suggested. The concept of this code optimization, called localization here, is to maximize the utilization of the second-level (L2) cache, which is common to all of the latest computer systems, and to minimize access to the system main memory. In this study, localized optimization of an LU-SGS (Lower-Upper Symmetric Gauss-Seidel) code for solving the fluid dynamic equations was carried out at three different levels and tested on several of the microprocessor architectures most widely used today. The localized code achieved a remarkable performance gain, solving up to 7.35 times faster than the baseline algorithm, depending on the system, while producing exactly the same solution on the same computer system.
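The localization idea described above (keeping data in cache between the point where it is produced and the point where it is consumed, instead of streaming it through main memory in separate passes) can be illustrated with a deliberately small example. The sketch below is not the authors' code and not one of their three optimization levels. It assumes a hypothetical model problem, the implicit 1D heat equation u_t = nu u_xx with u = 0 at both walls, solved with a scalar LU-SGS-style (symmetric Gauss-Seidel) approximate factorization, and applies one common localization technique, loop fusion. The names `residual`, `lusgs_step_baseline` and `lusgs_step_localized` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model problem (not from the paper): one implicit step of
 * u_t = nu * u_xx on n interior points with u = 0 at both walls, using the
 * scalar approximate factorization
 *   (D + L) D^{-1} (D + U) du = -dt * R(u),
 * with D = 1 + 2c, L = U = -c and c = nu * dt / dx^2. */

/* Residual R_i = -nu * (u_{i+1} - 2 u_i + u_{i-1}) / dx^2, zero wall values. */
static double residual(const double *u, size_t n, size_t i,
                       double nu, double dx2)
{
    double left  = (i > 0)     ? u[i - 1] : 0.0;
    double right = (i + 1 < n) ? u[i + 1] : 0.0;
    return -nu * (right - 2.0 * u[i] + left) / dx2;
}

/* Baseline: four separate passes; r, ds, du are caller-supplied work
 * arrays of length n, each written in one pass and re-read in the next. */
void lusgs_step_baseline(double *u, double *r, double *ds, double *du,
                         size_t n, double nu, double dt, double dx)
{
    double dx2 = dx * dx, c = nu * dt / dx2, d = 1.0 + 2.0 * c;

    for (size_t i = 0; i < n; i++)            /* pass 1: residual */
        r[i] = residual(u, n, i, nu, dx2);

    for (size_t i = 0; i < n; i++) {          /* pass 2: forward sweep */
        double prev = (i > 0) ? ds[i - 1] : 0.0;
        ds[i] = (-dt * r[i] + c * prev) / d;  /* (D + L) ds = -dt R */
    }

    for (size_t k = n; k-- > 0;) {            /* pass 3: backward sweep */
        double next = (k + 1 < n) ? du[k + 1] : 0.0;
        du[k] = ds[k] + c * next / d;         /* D^{-1}(D + U) du = ds */
    }

    for (size_t i = 0; i < n; i++)            /* pass 4: update */
        u[i] += du[i];
}

/* Localized: same arithmetic in the same order, fused into two passes
 * with a single grid-sized work array ds. */
void lusgs_step_localized(double *u, double *ds, size_t n,
                          double nu, double dt, double dx)
{
    double dx2 = dx * dx, c = nu * dt / dx2, d = 1.0 + 2.0 * c;

    /* Forward sweep with the residual fused in: R_i is consumed right after
     * it is computed, so no r array is written or re-read. u is not
     * modified in this loop, so R_i sees the same values as the baseline. */
    for (size_t i = 0; i < n; i++) {
        double prev = (i > 0) ? ds[i - 1] : 0.0;
        ds[i] = (-dt * residual(u, n, i, nu, dx2) + c * prev) / d;
    }

    /* Backward sweep with the update fused in: du_{k+1} is carried in a
     * scalar instead of a du array. u is not read here, so updating it in
     * place does not change the result. */
    double next = 0.0;
    for (size_t k = n; k-- > 0;) {
        double duk = ds[k] + c * next / d;
        u[k] += duk;
        next = duk;
    }
}
```

Either function is called once per time step, e.g. `lusgs_step_localized(u, ds, n, nu, dt, dx)`. For a grid much larger than the L2 cache, the baseline makes four full passes over u and three grid-sized work arrays, so each pass refetches its data from main memory; the fused version makes two passes and touches one work array. Because both versions perform the same operations in the same order, they should produce the same solution; any speedup depends on the grid size and the cache and memory system, and none is claimed for this sketch.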
Keywords
Computer Code Optimization; Localization; LU-SGS (Lower-Upper Symmetric Gauss-Seidel) scheme; Microprocessors; Level 2 Cache; Cache Hit-Rate
References
1 http://www.netlib.org/atlas/index.html.
2 Intel Pentium 4 and Xeon Processor Optimization Reference Manual, Intel Corp., 1999-2001, http://developer.intel.com.
3 Anderson, E., et al., LAPACK Users' Guide, Third Edition, SIAM, Philadelphia, PA, 1999, http://www.netlib.org/lapack/index.html.
4 Patankar, S.V., Numerical Heat Transfer and Fluid Flow, Hemisphere, 1980.
5 Moore, G.E., "Cramming more components onto integrated circuits," Electronics, Vol. 38, No. 8, April 19, 1965, http://www.intel.com/research/silicon/mooreslaw.htm.
6 Crandall, R.E., "PowerPC G4 for Engineering, Science, and Education," Apple Computer, Inc., Oct. 2000, http://www.apple.com/powermac/pdf/PowerPC-G4velocityengine.pdf.
7 Johnson, J.J., "The AMD-760™ MPX Platform for the AMD Athlon™ MP Processor," White Paper PID# 25787A, AMD Inc., Jan. 2002, http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_756_809,00.html.
8 Schreiber, R. and Dongarra, J., "Automatic Blocking of Nested Loops," University of Tennessee Computer Science Technical Report CS-90-108, May 1990, http://www.netlib.org/utk/people/JackDongarra/pdf/autoblock.pdf.
9 Dongarra, J.J., Du Croz, J., Duff, I.S. and Hammarling, S., "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Transactions on Mathematical Software, Vol. 16, 1990, pp. 1-17, http://www.netlib.org/blas/index.html.
10 Intel Corp., "Intel® 850 Chipset: 82850 Memory Controller Hub (MCH) Datasheet," Intel Document Number 290691-001, Nov. 2000, http://www.intel.com/design/chipsets/850/.
11 Intel Corp., "Intel® 845 Chipset: 82845 Memory Controller Hub (MCH) for SDR Datasheet," Intel Document Number 290725-002, Jan. 2002, http://www.intel.com/design/chipsets/845/.
12 Intel Architecture Optimization Reference Manual, Intel Corp., 1998-1999, http://developer.intel.com.
13 http://www.polyhedron.co.uk.
14 Tendler, J.M., Dodson, S., Fields, S., Le, H. and Sinharoy, B., "POWER4 System Microarchitecture," IBM Corp., Oct. 2001, http://www-1.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf.
15 Intel Corp., "The Xeon Processor MP Product Overview," Intel Corp., http://www.intel.com/design/Xeon/xeonmp/prodbref/index.htm.
16 Yoon, S. and Jameson, A., "Lower-Upper Symmetric-Gauss-Seidel Method for the Euler and Navier-Stokes Equations," AIAA Journal, Vol. 26, No. 9, 1988, pp. 1025-1026.
17 Choi, J.-Y., Jeung, I.-S. and Yoon, Y., "Computational Fluid Dynamics Algorithms for Unsteady Shock-Induced Combustion, Part 1: Validation," AIAA Journal, Vol. 38, No. 7, July 2000, pp. 1179-1187.