http://dx.doi.org/10.9708/jksci.2011.16.10.011

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor  

Jung, Yong-Bum (School of Electrical Engineering, University of Ulsan)
Kim, Yong-Min (School of Electrical Engineering, University of Ulsan)
Kim, Cheol-Hong (Electronics and Computer Engineering, Chonnam National University)
Kim, Jong-Myon (School of Electrical Engineering, University of Ulsan)
Abstract
This paper introduces a SIMD (Single Instruction, Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia workloads. It also implements MMX (MultiMedia eXtension)-type instructions on this data-parallel processor and evaluates and analyzes their performance. The reference data-parallel processor consists of 16 processors, each with a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024-pixel image indicate that the MMX-type instructions achieve a 50% performance improvement over the baseline instructions on the same data-parallel architecture. In addition, the MMX-type instructions achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia-specific instructions, including MMX-type instructions, have potential for widely used many-core GPUs (Graphics Processing Units) and other types of parallel processors.
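As a rough illustration of what an MMX-type instruction does on a 32-bit datapath, the sketch below models a packed saturating add over four unsigned 8-bit subwords held in one 32-bit word. The abstract does not list the specific instructions the authors implemented, so the operation chosen and the function name `packed_add_saturate_u8` are assumptions made for illustration only; the loop over lanes stands in for work that hardware of this kind would typically perform in a single instruction.

```python
def packed_add_saturate_u8(a: int, b: int) -> int:
    """Model of a packed add on one 32-bit word (hypothetical helper).

    Each 32-bit operand is treated as four independent unsigned 8-bit
    subwords (e.g. four pixel values). Each pair of subwords is added,
    and sums above 255 are clamped to 255 instead of wrapping around.
    """
    result = 0
    for lane in range(4):              # four 8-bit lanes in a 32-bit word
        shift = 8 * lane
        x = (a >> shift) & 0xFF        # extract subword from operand a
        y = (b >> shift) & 0xFF        # extract subword from operand b
        s = min(x + y, 0xFF)           # saturate at the 8-bit maximum
        result |= s << shift           # place the result back in its lane
    return result


# Two words, each packing four 8-bit values.
word_a = 0x10F08001
word_b = 0x20208001
packed_sum = packed_add_saturate_u8(word_a, word_b)
print(hex(packed_sum))
```

With baseline instructions, the same four additions and clamps would be issued one subword at a time; packing them into a single operation is the kind of saving that subword-parallel instructions are designed to provide.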
Keywords
Multimedia-specific instructions; SIMD-based parallel processor; JPEG compression algorithm; many-core GPU