Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL

Parallelization of a CMMB LDPC Decoder on CPU and GPU Using OpenCL (Korean title, translated)

  • Received : 2016.06.24
  • Accepted : 2016.10.04
  • Published : 2016.12.31

Abstract

Open Computing Language (OpenCL) was recently proposed as a framework for programming heterogeneous computing platforms. With OpenCL, digital communication systems can support various protocols within a single, unified computing environment while achieving both high portability and high performance. This article presents a parallel software decoder for the Low Density Parity Check (LDPC) codes used in China Multimedia Mobile Broadcasting (CMMB), running on a heterogeneous CPU-GPU platform. Because each step of LDPC decoding has different parallelization characteristics, steps suited to task-level parallelism are executed on the CPU, while steps suited to data-level parallelism are processed on the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding, explicit thread scheduling, loop unrolling, and efficient data-transfer techniques are applied. By exploiting heterogeneous multi-core processors within a unified computing framework, the proposed LDPC decoder achieves high performance.
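The abstract does not say which LDPC update rule the decoder uses. As a hedged illustration of why some decoding steps expose data-level parallelism, the sketch below implements a toy min-sum decoder with a flooding schedule. Its check-node update is a loop over rows of the parity-check matrix and its variable-node update is a loop over columns; iterations of each loop do not depend on one another, which is the kind of work an OpenCL kernel could spread across work-items. Everything here is an assumption for illustration: the min-sum rule, the flooding schedule, the function names, and the hypothetical 3x7 matrix `H_ROWS` (the (7,4) Hamming code, far smaller than a CMMB code and not actually low-density). It is plain Python, not OpenCL, and not the paper's decoder.

```python
# Toy min-sum LDPC decoder: a plain-Python sketch, NOT the paper's OpenCL code.
# Assumptions: min-sum update rule, flooding schedule, LLR >= 0 favours bit 0,
# and a hypothetical 3x7 parity-check matrix H_ROWS (the (7,4) Hamming code,
# which is not actually low-density) stored as column indices per row.

H_ROWS = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]


def check_node_update(rows, v2c):
    """Check-to-variable messages. Rows are independent of each other,
    so an OpenCL port could plausibly assign one work-item per row."""
    c2v = {}
    for r, cols in enumerate(rows):
        for c in cols:
            others = [v2c[(r, k)] for k in cols if k != c]
            sign = 1
            for m in others:
                if m < 0:
                    sign = -sign
            # Min-sum: product of signs times the smallest magnitude.
            c2v[(r, c)] = sign * min(abs(m) for m in others)
    return c2v


def variable_node_update(rows, channel_llr, c2v):
    """Posterior LLRs and new variable-to-check messages.
    Each bit (column) is likewise independent of the others."""
    total = list(channel_llr)
    for (r, c), m in c2v.items():
        total[c] += m
    # Extrinsic rule: exclude the message that arrived on the same edge.
    v2c = {(r, c): total[c] - c2v[(r, c)]
           for r, cols in enumerate(rows) for c in cols}
    return v2c, total


def syndrome_ok(rows, bits):
    """True when every parity check (row of H) has even parity."""
    return all(sum(bits[c] for c in cols) % 2 == 0 for cols in rows)


def decode(rows, channel_llr, max_iter=20):
    # Variable-to-check messages start as the channel LLRs.
    v2c = {(r, c): channel_llr[c] for r, cols in enumerate(rows) for c in cols}
    bits = [0 if x >= 0 else 1 for x in channel_llr]
    for _ in range(max_iter):
        c2v = check_node_update(rows, v2c)                         # per-row step
        v2c, total = variable_node_update(rows, channel_llr, c2v)  # per-column step
        bits = [0 if x >= 0 else 1 for x in total]                 # hard decision
        if syndrome_ok(rows, bits):                                # early stop
            break
    return bits


# All-zero codeword with bit 0 weakly received in error.
print(decode(H_ROWS, [-0.5, 2, 2, 2, 2, 2, 2]))  # -> [0, 0, 0, 0, 0, 0, 0]
```

Storing messages in a dictionary keyed by (row, column) edge keeps the sketch short. A GPU implementation would more plausibly keep messages in flat per-edge arrays, since those are also the buffers that would be transferred between host and device.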
