Acknowledgement
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) in 2022 (No. 2018-0-00769, Neuromorphic Computing Software Platform for Artificial Intelligence Systems).