GPU-Based ECC Decode Unit for Efficient Massive Data Reception Acceleration

  • Kwon, Jisu (School of Electronic and Electrical Engineering, Kyungpook National University) ;
  • Seok, Moon Gi (School of Computer Science and Engineering, Nanyang Technological University) ;
  • Park, Daejin (School of Electronic and Electrical Engineering, Kyungpook National University)
  • Received : 2020.03.19
  • Accepted : 2020.06.09
  • Published : 2020.12.31

Abstract

When a device transmits and receives large amounts of data, reliable communication is crucial for normal operation and for preventing malfunctions caused by errors. This paper therefore assumes an environment in which massive data is received sequentially and is protected by an error correction code (ECC) that can detect and correct errors on its own. Because an embedded system has limited resources, such as a low-performance processor or a small memory, its applications must run efficiently. We propose accelerating ECC decoding with the graphics processing unit (GPU) built into the embedded system when a large amount of data is received. The Hamming code used for the ECC operation is computed as a matrix-vector multiplication; the matrix is expressed in compressed sparse row (CSR) format, and a sparse matrix-vector product is used. The multiplication is performed in a GPU kernel, and the Hamming code computation is likewise accelerated so that the ECC operation runs in parallel. The proposed technique is implemented with CUDA on a GPU-embedded target board, the NVIDIA Jetson TX2, and its execution time is compared with that of the CPU.
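The CSR-based Hamming computation described in the abstract can be illustrated with a small sequential sketch. It is not the authors' CUDA implementation: it assumes a (7,4) Hamming code whose parity-check matrix has the binary representation of j as its j-th column, and the names `ROW_PTR`, `COL_IDX`, `csr_syndrome`, and `decode` are hypothetical. On a GPU, each received word (or each matrix row) would typically be assigned to its own thread; here a plain loop stands in for that parallelism.

```python
# Assumed (7,4) Hamming parity-check matrix H (3 x 7) in CSR form.
# Column j (1-based) of H is the binary representation of j, so a
# nonzero syndrome directly names the position of a single flipped bit.
ROW_PTR = [0, 4, 8, 12]                # start offset of each row in COL_IDX
COL_IDX = [0, 2, 4, 6,                 # row 0: positions with bit 0 set
           1, 2, 5, 6,                 # row 1: positions with bit 1 set
           3, 4, 5, 6]                 # row 2: positions with bit 2 set
# Every stored entry equals 1 over GF(2), so no values array is kept.


def csr_syndrome(row_ptr, col_idx, word):
    """Sparse matrix-vector product H * word over GF(2)."""
    syndrome = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        # Only the nonzero columns of row r are visited.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc ^= word[col_idx[k]]    # addition in GF(2) is XOR
        syndrome.append(acc)
    return syndrome


def decode(word):
    """Correct at most one bit error; return (data bits, error position)."""
    s = csr_syndrome(ROW_PTR, COL_IDX, word)
    pos = s[0] | (s[1] << 1) | (s[2] << 2)   # 0 means no error detected
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1                  # flip the erroneous bit
    # Data bits sit at 1-based positions 3, 5, 6, 7.
    return [fixed[i] for i in (2, 4, 5, 6)], pos


# Sequential stand-in for the per-word parallelism of a GPU kernel.
received = [[0, 1, 1, 0, 0, 1, 1],     # valid codeword
            [0, 1, 1, 0, 1, 1, 1]]     # same codeword with position 5 flipped
results = [decode(w) for w in received]
```

In this sketch both received words decode to the data bits `[1, 0, 1, 1]`; the second reports error position 5. Because each word's syndrome depends only on that word, the outer loop has no cross-iteration dependencies, which is the property that makes the computation a natural fit for a GPU kernel.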

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. NRF-2019R1A2C2005099) and the Ministry of Education (No. NRF-2018R1A6A1A03025109), and by the BK21 Four project funded by the Ministry of Education, Korea (September 2020).
