Acknowledgement
This work was supported by a Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant 22AMDP-C161962-02).
References
- "What is Kubernetes?," https://kubernetes.io/docs/home/
- N. Dryden, R. Bohringer, T. Ben-Nun, and T. Hoefler, "Clairvoyant prefetching for distributed machine learning I/O," in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'21), St. Louis, MO, US, 2021, pp. 1-15.
- "What is HDFS?," https://hadoop.apache.org/
- "Introducing the Lustre File System," https://doc.lustre.org/lustre_manual.xhtml
- "AWS S3," https://aws.amazon.com/ko/s3
- "TensorFlow Overview," https://www.tensorflow.org/overview
- "PyTorch Documentation," https://pytorch.org/docs/stable/index.html
- "Spark Overview," https://spark.apache.org/docs/latest/
- R. Gu, K. Zhang, Z. Xu, Y. Che, B. Fan, H. Hou, H. Dai, L. Yi, Y. Ding, G. Chen, and Y. Huang, "Fluid: Dataset abstraction and elastic acceleration for cloud-native deep learning training jobs," in Proc. of 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 2022, pp. 2182-2195.
- L. Wang, S. Ye, B. Yang, Y. Lu, H. Zhang, S. Yan, and Q. Luo, "Diesel: A dataset-based distributed storage and caching system for large-scale deep learning training," in Proc. of 49th International Conference on Parallel Processing (ICPP'20), Edmonton, AB, Canada, 2020, pp. 17-20.
- M. Abdi, A. Mosayyebzadeh, M.H. Hajkazemi, E.U. Kaynar, A. Turk, L. Rudolph, O. Krieger, and P. Desnoyers, "A community cache with complete information," in Proc. of 19th USENIX Conference on File and Storage Technologies (FAST'21), 2021, pp. 23-25.
- "Kubeflow Introduction," https://www.kubeflow.org/docs/started/introduction/
- "Apache Ignite Documentation Overview," https://ignite.apache.org/docs/latest/