Multi-Threaded Parallel H.264/AVC Decoder for Multi-Core Systems

  • Kim, Won-Jin (Department of Electronics, Computer & Communication Engineering, Hanyang University) ;
  • Cho, Keol (Department of Electronics, Computer & Communication Engineering, Hanyang University) ;
  • Chung, Ki-Seok (Department of Electronics Engineering, Hanyang University)
  • Received : 2010.07.07
  • Reviewed : 2010.10.25
  • Published : 2010.11.25

Abstract

As high-resolution video services become widespread, research on high-speed video processing is increasingly active. In particular, the growing adoption of multi-core processors has led to a variety of proposals for parallelizing H.264/AVC decoders on multi-core systems. Parallel H.264/AVC decoding is challenging for two reasons: the decoding software has complicated data dependencies, and the threads take different amounts of time to process their data, so their synchronization state must be checked continuously. This synchronization overhead limits the speedup that parallelization can deliver. To overcome these issues, we propose Multi-Threaded Parallelization (MTP). MTP partitions each frame into groups of macroblocks and decodes them in parallel; to reduce synchronization overhead, it allocates a separate thread to each stage of the pipeline, and to reduce the memory requirement, it reuses memory. To verify the approach, we parallelized the FFmpeg H.264/AVC decoder with MTP using OpenMP and ran experiments on an Intel quad-core platform. The MTP decoder outperforms the unparallelized FFmpeg H.264/AVC decoder by up to 53%. Compared with the widely used 2Dwave parallelization method, it also reduces memory usage by 65% for high-definition (HD) video and by 81% for full high-definition (FHD) video.
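
The structure described in the abstract, with macroblock groups flowing through a pipeline in which each stage has its own thread and buffers are recycled rather than reallocated, can be illustrated with a minimal Python sketch. This is not the authors' FFmpeg/OpenMP implementation: the function names (`split_into_mb_groups`, `run_pipeline`), the three placeholder stages, the group size, and the pool of three buffers are all assumptions made for illustration, and the sketch ignores the dependencies between neighboring macroblocks that a real decoder must honor.

```python
import queue
import threading

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def split_into_mb_groups(width, height, rows_per_group):
    """Split a frame into groups of macroblock rows.

    Returns (first_row, end_row_exclusive, macroblock_count) tuples.
    """
    mb_cols = (width + MB_SIZE - 1) // MB_SIZE
    mb_rows = (height + MB_SIZE - 1) // MB_SIZE
    groups = []
    for first in range(0, mb_rows, rows_per_group):
        end = min(first + rows_per_group, mb_rows)
        groups.append((first, end, (end - first) * mb_cols))
    return groups


def make_stage(name):
    """Build a placeholder stage; real decoding work would go here."""
    def stage(buf):
        buf["trace"].append(name)
    return stage


# Hypothetical stage split, chosen only for illustration.
STAGES = [make_stage(n) for n in ("entropy_decode", "reconstruct", "deblock")]


def run_pipeline(groups, stages, num_buffers=3):
    """Run one thread per stage; a fixed pool of buffers is reused."""
    free = queue.Queue()
    for i in range(num_buffers):
        free.put({"id": i, "group": None, "trace": []})
    links = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            buf = inbox.get()
            if buf is None:          # end-of-stream marker
                outbox.put(None)
                return
            stage(buf)
            outbox.put(buf)

    def feeder():
        for group in groups:
            buf = free.get()         # blocks until a buffer is recycled
            buf["group"] = group
            buf["trace"] = []
            links[0].put(buf)
        links[0].put(None)

    threads = [threading.Thread(target=worker, args=(s, links[i], links[i + 1]))
               for i, s in enumerate(stages)]
    threads.append(threading.Thread(target=feeder))
    for t in threads:
        t.start()

    results, used_ids = [], set()
    while True:
        buf = links[-1].get()
        if buf is None:
            break
        results.append((buf["group"], tuple(buf["trace"])))
        used_ids.add(buf["id"])
        free.put(buf)                # return the buffer to the pool
    for t in threads:
        t.join()
    return results, used_ids
```

For a 1280x720 frame, `split_into_mb_groups(1280, 720, 5)` yields nine groups of five macroblock rows (720 / 16 = 45 rows, 80 macroblocks per row). Because each stage is a single thread reading from a FIFO queue, groups leave the pipeline in input order, and the only synchronization points are the queue hand-offs. The fixed buffer pool bounds how much data is in flight at once, which is the general idea behind reusing memory instead of allocating per group.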
