1 |
T. Tannenbaum, and M. Litzkow, 'Checkpointing and migration of Unix processes in the Condor distributed system,' Dr. Dobb's Journal, pp.40-48, Feb. 1995
|
2 |
K. M. Chandy, and L. Lamport, 'Distributed snapshots: Determining global states of distributed systems,' ACM Trans. on Computer Systems, 3(1):pp.63-75, Feb. 1985
|
3 |
J.S. Plank, 'Efficient Checkpointing on MIMD Architectures,' PhD thesis, Princeton University, June 1993
|
4 |
M. Hayden, 'The Ensemble System,' Doctoral dissertation, Cornell University, Dept. of Computer Science, 1997
|
5 |
G.F. Fagg, and J.J. Dongarra, 'FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world,' EuroPVM/MPI User's Group Meeting 2000, Springer-Verlag, pp.346-353, 2000
|
6 |
A. Agbaria, and R. Friedman, 'Starfish: Fault-tolerant Dynamic MPI programs on clusters of workstations,' Eighth IEEE International Symposium on High Performance Distributed Computing, 1999
|
7 |
MPI Forum, 'MPI: A message-passing interface standard,' International Journal of Supercomputer Applications, 8(3/4):pp.165-414, 1994
|
8 |
G. Burns, R. Daoud, and J. Vaigl, 'LAM: An open cluster environment for MPI,' In Proc. of Supercomp. Symp., 1994
|
9 |
W. Gropp, E. Lusk, N. Doss, and A. Skjellum, 'MPICH: A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard,' Parallel Computing, Vol. 22, No.6, pp.789-828, Sep. 1996
|
10 |
MPI Software Technology, Inc., 'MPI/Pro,' http://mpi-softtech.com/, 1999
|
11 |
G. Stellner, 'CoCheck: Checkpointing and Process Migration for MPI,' Proc. of the International Parallel Processing Symposium, IEEE Computer Soc. Press, pp.526-531, 1996
|
12 |
Sriram Rao, Lorenzo Alvisi, and Harrick M. Vin, 'Egida: An Extensible Toolkit For Low-overhead Fault-Tolerance,' Symposium on Fault-Tolerant Computing, 1999
|
13 |
W. Gropp, S. Huss-Lederman, et al., 'MPI-The Complete Reference, Vol-2, The MPI Extensions,' ISBN, MIT Press, 1998
|
14 |
Victor C. Zandy, Barton P. Miller, and Miron Livny, 'Process Hijacking,' The Eighth IEEE International Symposium on High Performance Distributed Computing (HPDC'99), pp.177-184, August 1999
|
15 |
J. S. Plank, M. Beck, G. Kingsley, and K. Li, 'Libckpt: Transparent Checkpointing under Unix,' In Usenix Winter 1995 Technical Conference, pp.213-223, January 1995
|
16 |
David Bailey, et al., 'The NAS Parallel Benchmarks 2.0,' Technical Report NAS-95-020, NASA Ames Research Center, December 1995
|
17 |
Ian Foster, and Carl Kesselman, The Grid: Blueprint for a New Computing Infrastructure, MK Publications, 1999
|
18 |
Y. Chen, J. S. Plank, and Kai Li, 'CLIP: A Checkpointing Tool for Message-Passing Parallel Programs,' Proceedings of the ACM/IEEE conference on Supercomputing, 1997
|
19 |
George Bosilca, et al., 'MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes,' In Proceedings of SC2002. IEEE, 2002
|
20 |
Rajanikanth Batchu, et al., 'MPI/FT: Architecture and Taxonomies for Fault-Tolerant, Message-Passing Middleware for Performance-Portable Parallel Computing,' 1st International Symposium on Cluster Computing and the Grid, 2001
|