Hardware Design and Implementation of a Parallel Processor for High-Performance Multimedia Processing

  • Kim, Yong-Min (School of Electrical Engineering, University of Ulsan) ;
  • Hwang, Chul-Hee (School of Electrical Engineering, University of Ulsan) ;
  • Kim, Cheol-Hong (Department of Electronics and Computer Engineering, Chonnam National University) ;
  • Kim, Jong-Myon (School of Electrical Engineering, University of Ulsan)
  • Received: 2010.11.04
  • Reviewed: 2011.02.09
  • Published: 2011.05.31

Abstract

As the use of mobile multimedia devices has grown in recent years, so has the need for high-performance multimedia processors. In this paper, we propose a SIMD (Single Instruction, Multiple Data) based parallel processor that supports high-performance multimedia applications with low power consumption. The proposed parallel processor consists of 16 processing elements (PEs) and is designed as a 3-stage pipeline. Simulation results indicate that the proposed processor achieves higher relative throughput per PE than conventional parallel processors. In addition, with the same 130 nm technology and the same 720 MHz clock frequency, it outperforms the commercial high-performance TI C6416 DSP by 1.4 to 31.4 times in performance and by 5.9 to 8.1 times in energy efficiency. The proposed parallel processor was designed in Verilog HDL and verified on an FPGA prototype system.
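The abstract describes the architecture only at a high level. The Python sketch below is a behavioral model of the general SIMD idea under stated assumptions, not the authors' Verilog design: one control unit fetches a single instruction per cycle and, through a 3-stage pipeline whose stages are assumed here to be fetch, decode, and execute, broadcasts it to all 16 PEs, which apply it in lockstep to their own local data. The opcodes, the 8-entry register file, the stage roles, and the names `ProcessingElement` and `run_simd` are hypothetical, and the pipeline is idealized (no hazards or stalls), so N instructions take N + 2 cycles.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

NUM_PES = 16         # PE count stated in the abstract
PIPELINE_DEPTH = 3   # pipeline depth stated in the abstract; stage roles are assumed

# Hypothetical instruction format: (opcode, dest_reg, src_reg_a, src_reg_b)
Instr = Tuple[str, int, int, int]


@dataclass
class ProcessingElement:
    # Hypothetical 8-entry local register file; each PE holds its own data.
    regs: List[int] = field(default_factory=lambda: [0] * 8)

    def execute(self, instr: Instr) -> None:
        op, rd, ra, rb = instr
        a, b = self.regs[ra], self.regs[rb]
        if op == "ADD":
            self.regs[rd] = a + b
        elif op == "SUB":
            self.regs[rd] = a - b
        elif op == "MUL":
            self.regs[rd] = a * b
        else:
            raise ValueError(f"unknown opcode: {op}")


def run_simd(program: List[Instr], pes: List[ProcessingElement]) -> int:
    """Issue `program` through an idealized 3-stage pipeline (assumed to be
    fetch, decode, execute) and broadcast each instruction to every PE in
    the execute stage. Returns the number of clock cycles used."""
    stages: List[Optional[Instr]] = [None] * PIPELINE_DEPTH
    pc = 0
    cycles = 0
    # Keep clocking while instructions remain to fetch or are still in flight
    # ahead of the execute stage.
    while pc < len(program) or any(s is not None for s in stages[:-1]):
        fetched = program[pc] if pc < len(program) else None
        if fetched is not None:
            pc += 1
        # Advance every instruction one stage; the oldest leaves the pipeline.
        stages = [fetched] + stages[:-1]
        cycles += 1
        # Execute stage: one instruction, applied to all PEs in lockstep.
        # Operands are read and results written here, so this model has no
        # data hazards or stalls.
        if stages[-1] is not None:
            for pe in pes:
                pe.execute(stages[-1])
    return cycles


if __name__ == "__main__":
    pes = [ProcessingElement() for _ in range(NUM_PES)]
    for i, pe in enumerate(pes):
        pe.regs[1] = i   # per-PE data, e.g. one pixel value per PE
        pe.regs[2] = 2
    program = [("MUL", 3, 1, 2), ("ADD", 4, 3, 1)]
    print(run_simd(program, pes))          # prints 4 (2 instructions + 2 fill cycles)
    print([pe.regs[4] for pe in pes][:4])  # prints [0, 3, 6, 9]
```

Because every PE executes the same instruction on its own data, each executed instruction yields 16 results; this is the data parallelism a SIMD array gains in exchange for giving up independent instruction streams per PE.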

Cited By

  1. Fire Detection Approach using Robust Moving-Region Detection and Effective Texture Features of Fire vol.18, pp.6, 2011, https://doi.org/10.9708/jksci.2013.18.6.021
  2. Parallel Implementation and Performance Analysis of the SIFT Algorithm Using Many-Core Processors vol.18, pp.9, 2011, https://doi.org/10.9708/jksci.2013.18.9.001