
Color Media Instructions for Embedded Parallel Processors  

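Kim, Cheol-Hong (Department of Electronics and Computer Engineering, Chonnam National University)
Kim, Jong-Myon (School of Computer Engineering and Information Technology, University of Ulsan)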
Abstract
As the mobile computing environment changes rapidly, growing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which exploits the lower perceptual significance of the color components, supports parallel operations on two packed, compressed 16-bit YCbCr pixels (6 bits for Y and 5 bits each for Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the smaller data format reduces system cost, and the reduction in data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor. In contrast, MMX (a representative multimedia extension from Intel) achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI delivers these gains with a mere 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
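The 16-bit YCbCr (6:5:5) format described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's instruction set: the field order (Y in the top 6 bits, then Cb, then Cr), quantization by truncation, and the saturating add are all chosen here for illustration, and the function names are hypothetical. It shows how two such pixels fit in one 32-bit word and how a single operation can act on all six fields at once.

```python
# Hypothetical field layout (an assumption, not taken from the paper):
# bits 15..10 = Y (6 bits), bits 9..5 = Cb (5 bits), bits 4..0 = Cr (5 bits).
FIELDS = ((10, 6), (5, 5), (0, 5))  # (shift, width) for Y, Cb, Cr


def pack_pixel(y, cb, cr):
    """Quantize 8-bit Y, Cb, Cr by truncation and pack them into 16 bits."""
    return ((y >> 2) << 10) | ((cb >> 3) << 5) | (cr >> 3)


def unpack_pixel(p):
    """Return the quantized (Y, Cb, Cr) fields of a 16-bit pixel."""
    return tuple((p >> shift) & ((1 << width) - 1) for shift, width in FIELDS)


def pack_word(p_hi, p_lo):
    """Place two 16-bit pixels in one 32-bit word."""
    return ((p_hi & 0xFFFF) << 16) | (p_lo & 0xFFFF)


def packed_add_saturate(a, b):
    """Add two 32-bit words field by field, clamping each field at its maximum.

    Hardware would do this in one instruction; the loop only models the result.
    """
    result = 0
    for base in (16, 0):  # upper pixel, then lower pixel
        for shift, width in FIELDS:
            mask = (1 << width) - 1
            fa = (a >> (base + shift)) & mask
            fb = (b >> (base + shift)) & mask
            result |= min(fa + fb, mask) << (base + shift)
    return result


if __name__ == "__main__":
    word = pack_word(pack_pixel(200, 128, 64), pack_pixel(16, 16, 16))
    total = packed_add_saturate(word, word)
    print(unpack_pixel(total >> 16), unpack_pixel(total & 0xFFFF))
```

In this model the six fields never carry into one another, which is the property a subword-parallel datapath has to guarantee in hardware.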
Keywords
Color image and video processing; multimedia instructions; embedded SIMD parallel processors