A Data Prefetching Scheme Exploiting the Grain Size in Parallel Programs using Data Arrays

데이타 배열을 사용하는 병렬 프로그램에서 그레인 크기를 이용한 데이타 선인출 기법

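  • 정인범 (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • 이준원 (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))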
  • Published : 2000.01.15

Abstract

Data prefetching is an effective technique for reducing main memory access latency by overlapping processor computation with data accesses. However, if prefetched data replace other useful data already in the cache, or are never used in computation, program performance degrades. This problem stems from the lack of accurate predictions of which data will be used in the future. When parallel programs operate on data arrays, the grain size is useful information for a prefetching scheme because it indicates the range of data that will be used in the computation. Based on this information, we propose a new data prefetching scheme that exploits the grain size of the parallel program. Simulation results show that, compared with existing prefetching schemes, the proposed scheme reduces bus transactions and increases the number of useful prefetches, thereby improving the performance of the simulated parallel programs.
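
The idea that the grain size bounds the range of data a processor will touch can be illustrated with a toy model. The sketch below is not the scheme evaluated in the paper. It assumes a block-partitioned array, one private cache of unlimited capacity per processor, a strictly sequential sweep over each grain, and a simple next-`degree`-blocks prefetcher. The names `prefetch_targets`, `simulate`, `degree`, and `grain_bounded` are hypothetical. It only counts how many prefetches fall inside the processor's own grain (useful) versus beyond it (useless).

```python
def prefetch_targets(block, degree, limit):
    """Blocks requested after a miss on `block`: the next `degree` blocks,
    never going past `limit` (inclusive)."""
    return [b for b in range(block + 1, block + degree + 1) if b <= limit]


def simulate(num_blocks, num_procs, degree, grain_bounded):
    """Return (useful, useless) prefetch counts for a block-partitioned array.

    Assumption: num_blocks divides evenly among num_procs, and each
    processor sweeps its own grain sequentially with a private cache.
    """
    grain = num_blocks // num_procs  # blocks per processor
    useful = useless = 0
    for p in range(num_procs):
        start, end = p * grain, (p + 1) * grain - 1
        # Without grain information the prefetcher stops only at the array end.
        limit = end if grain_bounded else num_blocks - 1
        cached = set()
        for b in range(start, end + 1):
            if b in cached:
                continue  # hit on a block that was prefetched earlier
            cached.add(b)  # demand miss fetches the block itself
            for t in prefetch_targets(b, degree, limit):
                cached.add(t)
                if t <= end:
                    useful += 1   # inside this processor's grain, used later
                else:
                    useless += 1  # belongs to another processor's grain
    return useful, useless


if __name__ == "__main__":
    print("unbounded    :", simulate(64, 4, 4, grain_bounded=False))
    print("grain-bounded:", simulate(64, 4, 4, grain_bounded=True))
```

In this toy setting, clipping prefetches at the grain boundary removes the requests that cross into a neighbouring processor's data while leaving the useful ones unchanged. Each removed request would otherwise be an extra bus transaction.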
