A Data Prefetching Scheme Exploiting the Grain Size in Parallel Programs using Data Arrays

Jung, In-Bum; Lee, Joon-Won

Journal of KIISE:Computer Systems and Theory (한국정보과학회논문지:시스템및이론)

Volume 27, Issue 1, pp. 101-108, 2000; pISSN 1229-683X

Korean Institute of Information Scientists and Engineers (한국정보과학회)


데이타 배열을 사용하는 병렬 프로그램에서 그레인 크기를 이용한 데이타 선인출 기법

In-Bum Jung (Department of Computer Science, KAIST);
Joon-Won Lee (Department of Computer Science, KAIST)

Published: 2000.01.15


Abstract

Data prefetching is an effective technique for reducing main-memory access latency by overlapping processor computation with data accesses. However, program performance degrades when prefetched data displace other useful data in the cache, or when the prefetched data are never used in computation. Both problems stem from inaccurate prediction of the data that will be used in the future. When parallel programs use data arrays for computation, the grain size is useful information for data prefetching because it indicates the range of data that will be used in subsequent computation. Based on this information, we propose a new data prefetching scheme that exploits the grain size of parallel programs. In simulation, the proposed scheme improves the performance of the tested parallel programs over existing prefetching schemes by reducing bus transactions and increasing the number of useful prefetches.
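The central idea, limiting prefetches to the range of array data implied by a processor's grain, can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the authors' scheme: it assumes a contiguous block partitioning of a one-dimensional array, a sequential scan within each grain, an infinite cache (so replacement of useful data is not modelled), and tagged sequential prefetching of `degree` blocks as the baseline. The names `BLOCK_ELEMS`, `grain_range`, `run`, and `PrefetchStats` are hypothetical.

```python
from dataclasses import dataclass

BLOCK_ELEMS = 4  # hypothetical: array elements per cache block


@dataclass
class PrefetchStats:
    issued: int = 0  # prefetch operations sent to the bus
    useful: int = 0  # prefetched blocks later referenced by this processor


def grain_range(pid: int, n_elems: int, n_procs: int) -> tuple[int, int]:
    """Assumed block partitioning: processor `pid` owns elements [lo, hi)."""
    grain = n_elems // n_procs  # assumes n_elems is a multiple of n_procs
    lo = pid * grain
    return lo, lo + grain


def run(pid: int, n_elems: int, n_procs: int, degree: int,
        grain_bounded: bool) -> PrefetchStats:
    """Scan this processor's grain sequentially with tagged sequential
    prefetching of `degree` blocks; optionally clip prefetches to the grain."""
    lo, hi = grain_range(pid, n_elems, n_procs)
    last_block = (hi - 1) // BLOCK_ELEMS
    cached: set[int] = set()   # infinite cache: no evictions are modelled
    pending: set[int] = set()  # prefetched blocks not yet referenced
    stats = PrefetchStats()
    for i in range(lo, hi):
        block = i // BLOCK_ELEMS
        if block in pending:  # first use of a prefetched block (tag hit)
            pending.remove(block)
            stats.useful += 1
            trigger = True
        else:
            trigger = block not in cached  # demand miss triggers prefetch
            cached.add(block)
        if not trigger:
            continue
        for nxt in range(block + 1, block + degree + 1):
            if grain_bounded and nxt > last_block:
                break  # block lies outside this grain: do not prefetch it
            if nxt not in cached:
                cached.add(nxt)
                pending.add(nxt)
                stats.issued += 1
    return stats


if __name__ == "__main__":
    # 32 elements split over 2 processors; processor 0 owns blocks 0..3.
    print(run(0, 32, 2, degree=2, grain_bounded=False))
    # PrefetchStats(issued=5, useful=3)
    print(run(0, 32, 2, degree=2, grain_bounded=True))
    # PrefetchStats(issued=3, useful=3)
```

In this toy run the unbounded baseline issues two prefetches into the neighboring processor's grain that are never used; clipping at the grain boundary removes them without losing any useful prefetch. It only illustrates the kind of bus-traffic saving the abstract describes; it does not reproduce the paper's simulation.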


