http://dx.doi.org/10.3745/KIPSTA.2011.18A.3.099

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors  

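Jung, Yong-Bum (School of Electrical Engineering, University of Ulsan)
Kim, Jong-Myon (School of Computer, Information and Communication Engineering, University of Ulsan)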
Abstract
Because of high technology cost and power consumption, processor technology is now advancing through parallel processing techniques rather than solely through increases in the clock frequency of a single processor. This paper introduces a SIMD (Single Instruction, Multiple Data) based parallel processor that efficiently processes the massive data volumes inherent in multimedia. It also proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that operate efficiently on image and video pixels. The proposed instructions store and process four 8-bit pixels in four partitioned 12-bit registers within a 48-bit datapath. This solves the overflow problem inherent in existing multimedia extensions and reduces the need for many packing/unpacking instructions. Experimental results on the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance, whereas MMX-type instructions (a representative Intel multimedia extension) achieve a speedup of only $1.4{\times}$ over the same baseline. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency.
Keywords
Multimedia Specific Instructions; SIMD Parallel Processor; Image/Video Processing
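
The data layout described in the abstract (four 8-bit pixels held in four 12-bit partitions of a 48-bit word) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's instruction set: the function names `pack`, `unpack`, `add_lanes`, and `saturate_to_pixel` are hypothetical, and the model assumes the four extra bits per partition act as headroom for intermediate results.

```python
LANES = 4                      # pixels per 48-bit word
LANE_BITS = 12                 # 8-bit pixel plus 4 assumed headroom bits
LANE_MASK = (1 << LANE_BITS) - 1
WORD_MASK = (1 << (LANES * LANE_BITS)) - 1


def pack(pixels):
    """Place four 8-bit pixels into the four 12-bit lanes of one word."""
    assert len(pixels) == LANES and all(0 <= p <= 255 for p in pixels)
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (i * LANE_BITS)
    return word


def unpack(word):
    """Return the four 12-bit lane values, lowest lane first."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def add_lanes(a, b):
    """Lane-wise add done as one wide add.

    Valid while every lane sum stays below 2**12, so no carry
    crosses into the neighbouring lane.
    """
    return (a + b) & WORD_MASK


def saturate_to_pixel(word):
    """Clamp each lane back to the 8-bit pixel range (hypothetical helper)."""
    return pack([min(v, 255) for v in unpack(word)])


a = pack([200, 255, 0, 17])
b = pack([100, 255, 0, 3])
total = add_lanes(a, b)
print(unpack(total))                     # [300, 510, 0, 20]
print(unpack(saturate_to_pixel(total)))  # [255, 255, 0, 20]
print((200 + 100) & 0xFF)                # 44: an 8-bit lane with wraparound loses the sum
```

With 12-bit lanes, the intermediate sums 300 and 510 survive intact and can be clamped or scaled later, whereas an 8-bit lane with wraparound arithmetic would need the pixels unpacked to a wider format first to avoid losing them.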