References
- F. Cappello, Fault tolerance in petascale/exascale systems: Current knowledge, challenges and research opportunities, Int. J. High Perform. Comput. Appl. 23 (2009), 212-226. https://doi.org/10.1177/1094342009106189
- D. P. Chandrashekar, Robust and Fault-Tolerant Scheduling for Scientific Workflows in Cloud Computing Environments, Ph.D. dissertation, Dept. Computing and Inf. Syst., University of Melbourne, Melbourne, Australia, 2015.
- G. Aupy et al., Checkpointing strategies for scheduling computational workflows, Int. J. Network. Comput. 6 (2016), 2-26. https://doi.org/10.15803/ijnc.6.1_2
- R. N. Calheiros and R. Buyya, Meeting deadlines of scientific workflows in public clouds with tasks replication, IEEE Trans. Parallel Dist. Syst. 25 (2014), 1787-1796. https://doi.org/10.1109/TPDS.2013.238
- K. Plankensteiner and R. Prodan, Meeting soft deadlines in scientific workflows using resubmission impact, IEEE Trans. Parallel Dist. Syst. 23 (2012), 890-901. https://doi.org/10.1109/TPDS.2011.221
- M. T. Rahman et al., Check pointing to minimize completion time for inter-dependent parallel processes on volunteer grids, in Proc. IEEE/ACM Int. Symp. Cluster, Cloud Grid Comput. (Cartagena, Colombia), May 16-19, 2016, pp. 331-335.
- J. W. Young, A first order approximation to the optimum check point interval, Commun. ACM 17 (1974), 530-531. https://doi.org/10.1145/361147.361115
- J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Gener. Comp. Syst. 22 (2006), 303-312. https://doi.org/10.1016/j.future.2004.11.016
- M.-S. Bouguerra et al., A flexible checkpoint/restart model in distributed systems, in Proc. Parallel Process. Appl. Math. (Wroclaw, Poland), Sept. 13-16, 2009, pp. 206-215.
- A. Benoit, M. Hakem, and Y. Robert, Fault tolerant scheduling of precedence task graphs on heterogeneous platforms, in Proc. Int. Symp. Parallel Distrib. (Miami, FL, USA), Apr. 14-18, 2008, pp. 1-8.
- H. Topcuoglu, S. Hariri, and M.-Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Trans. Parallel Dist. Syst. 13 (2002), 260-274. https://doi.org/10.1109/71.993206
- S. K. Jayadivya, J. S. Nirmala, and M. S. S. Bhanu, Fault tolerant workflow scheduling based on replication and resubmission of tasks in cloud computing, Int. J. Comput. Sci. Eng. 4 (2012), 996-1006.
- R. Sirvent, R. M. Badia, and J. Labarta, Graph-based task replication for workflow applications, in Proc. IEEE Int. Conf. High Performance Comput. Commun. (Seoul, Rep. of Korea), June 25-27, 2009, pp. 20-28.
- S. Abrishami, M. Naghibzadeh, and D. H. J. Epema, Deadline constrained workflow scheduling algorithms for infrastructure as a service clouds, Future Gener. Comp. Syst. 29 (2013), 158-169. https://doi.org/10.1016/j.future.2012.05.004
- L. Zhao, Y. Ren, and K. Sakurai, Reliable workflow scheduling with less resource redundancy, Parallel Comput. 39 (2013), 567-585. https://doi.org/10.1016/j.parco.2013.06.003
- M. Wieczorek, R. Prodan, and A. Hoheisel, Taxonomies of the multi-criteria grid workflow scheduling problem, Institute on Resource Management and Scheduling, Innsbruck, Austria, CoreGRID Tech. Rep. TR-0106, Aug. 2007.
- Y. Zhang et al., Combined fault tolerance and scheduling techniques for workflow applications on computational grids, in Proc. IEEE/ACM Int. Symp. Clust. Comput. Grid (Shanghai, China), May 18-21, 2009, pp. 244-251.
- A. Benoit et al., Combining checkpointing and replication for reliable execution of linear workflows, in Proc. IEEE Int. Parallel Distrib. Process. Symp. Workshops (Vancouver, Canada), May 2018, pp. 793-802.
- A. Benoit et al., Optimal checkpointing period with replicated execution on heterogeneous platforms, in Proc. Workshop FTXS@HPDC (Washington, DC, USA), June 26-27, 2017, pp. 9-16.
- M. Chtepen et al., Adaptive task checkpointing and replication: toward efficient fault-tolerant grids, IEEE Trans. Parallel Distrib. Syst. 20 (2009), 180-190. https://doi.org/10.1109/TPDS.2008.93
- J. Daly, A model for predicting the optimum checkpoint interval for restart dumps, in Proc. Int. Conf. Comput. Sci. (Melbourne, Australia), June 2-4, 2003, pp. 3-12.
- S. Sadi and B. Yagoubi, Communication-aware approaches for transparent checkpointing in cloud computing, Scalable Comput.: Practice Experience 17 (2016), 251-270.
- M. Bougeret et al., Checkpointing strategies for parallel jobs, in Proc. Int. Conf. High Performance Comput. Netw. Storage Anal. (Seattle, WA, USA), Nov. 12-18, 2011, pp. 1-11.
- G. Aupy and J. Herrmann, Periodicity in optimal hierarchical checkpointing schemes for adjoint computations, Optim. Methods Softw. 32 (2017), 594-624. https://doi.org/10.1080/10556788.2016.1230612
- H. Nguyen et al., An execution environment for robust parallel computing on volunteer PC Grids, in Proc. Int. Conf. Parallel Process. (Pittsburgh, PA, USA), Sept. 10-13, 2012, pp. 158-167.
- G. Aupy et al., On the combination of silent error detection and checkpointing, in Proc. IEEE Pacific Rim Int. Symp. Dependable Comput. (Vancouver, Canada), Dec. 2-4, 2013, pp. 11-20.
- A. Benoit et al., Two-level checkpointing and verifications for linear task graphs, in Proc. IEEE Int. Parallel Distrib. Process. Symp. (Chicago, IL, USA), May 23-27, 2016, pp. 1239-1248.
- A. Benoit et al., Multi-level checkpointing and silent error detection for linear workflows, J. Comput. Sci. 28 (2018), 398-415. https://doi.org/10.1016/j.jocs.2017.03.024
- L. Han et al., A generic approach to scheduling and checkpointing workflows, in Proc. Int. Conf. Parallel Process. (Eugene, OR, USA), July 29-Aug. 3, 2018, pp. 1-10.
- S. Sadi and B. Yagoubi, On the optimum checkpointing interval selection for variable size checkpoint dumps, in Proc. Int. Conf. Comput. Sci. Applicat. (Saida, Algeria), May 20-21, 2015, pp. 599-610.
- G. Juve et al., Characterizing and profiling scientific workflows, Future Gener. Comput. Syst. 29 (2013), 682-692. https://doi.org/10.1016/j.future.2012.08.015