Acknowledgments
This research was financially supported by Hansung University.
References
- Owens, John D., et al. "GPU computing," Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, 2008. https://doi.org/10.1109/JPROC.2008.917757
- Choquette, Jack, et al. "NVIDIA A100 tensor core GPU: Performance and innovation," IEEE Micro, vol. 41, no. 2, pp. 29-35, 2021. https://doi.org/10.1109/MM.2021.3061394
- Wang, Yu Emma, Gu-Yeon Wei, and David Brooks. "Benchmarking TPU, GPU, and CPU platforms for deep learning," arXiv preprint arXiv:1907.10701, 2019.
- Choi, Yujeong, and Minsoo Rhu. "Prema: A predictive multi-task scheduling algorithm for preemptible neural processing units," 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2020.
- Hoefler, Torsten, et al. "Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks," The Journal of Machine Learning Research, vol. 22, no. 1, pp. 10882-11005, 2021.
- Wu, Hao, et al. "Integer quantization for deep learning inference: Principles and empirical evaluation," arXiv preprint arXiv:2004.09602, 2020.
- Gou, Jianping, et al. "Knowledge distillation: A survey," International Journal of Computer Vision, vol. 129, pp. 1789-1819, 2021. https://doi.org/10.1007/s11263-021-01453-z
- NVIDIA H100 Tensor Core GPU Architecture: Exceptional Performance, Scalability, and Security for the Data Center (2022), https://www.advancedclustering.com/wp-content/uploads/2022/03/gtc22-whitepaper-hopper.pdf, (accessed Mar. 18, 2024)
- Anne C. Elster, et al. "Nvidia Hopper GPU and Grace CPU Highlights," Computing in Science & Engineering, vol. 24, no. 2, pp. 95-100, 2022.
- MTIA v1: Meta's first-generation AI inference accelerator (2023), https://ai.meta.com/blog/meta-training-inference-accelerator-AI-MTIA, (accessed Mar. 18, 2024)
- E. Talpes, et al. "The microarchitecture of DOJO, Tesla's exa-scale computer," IEEE Micro, vol. 43, no. 3, pp. 31-39, 2023. https://doi.org/10.1109/MM.2023.3258906
- Deploying Transformers on the Apple Neural Engine (2022), https://machinelearning.apple.com/research/neural-engine-transformers, (accessed Mar. 18, 2024)
- Yong Cheol Peter Cho, et al. "AB9: A neural processor for inference acceleration," ETRI Journal, vol. 42, no. 4, pp. 491-504, Aug. 2020. https://doi.org/10.4218/etrij.2020-0134
- Sapeon (2024), https://www.sapeon.com/, (accessed Mar. 18, 2024)
- FuriosaAI (2024), https://furiosa.ai/warboy/specs, (accessed Mar. 18, 2024)
- ATOM: 5nm Versatile Inference SoC, Versatile yet Energy Efficient AI System-on-Chip (2023), http://rebellions.ai/rebellions-product/atom-2/, (accessed Mar. 18, 2024)
- Albericio, Jorge, et al. "Cnvlutin: Ineffectual-neuron-free deep neural network computing," ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 1-13, 2016. https://doi.org/10.1145/3007787.3001138
- Liu, Shaoli, et al. "Cambricon: An instruction set architecture for neural networks," ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 393-405, 2016. https://doi.org/10.1145/3007787.3001179