http://dx.doi.org/10.9708/jksci.2011.16.1.001

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing  

Choi, Byong-Kook (School of Electrical Engineering, University of Ulsan)
Kim, Cheol-Hong (Chonnam National University)
Kim, Jong-Myon (School of Electronics and Computer Engineering, University of Ulsan)
Abstract
Recently, as mobile multimedia devices have become increasingly widespread, the need for high-performance, low-energy multimedia processors has grown. Application-specific integrated circuits (ASICs) can deliver the high performance that mobile multimedia requires, but they offer little, if any, of the generality needed to meet diverse application requirements. DSP-based systems can be used for various types of applications because of their generality, but they incur higher cost and energy consumption and deliver lower performance than ASICs. To address this problem, this paper proposes a single instruction, multiple data (SIMD) based many-core processor that supports high-performance, low-power image data processing while retaining generality. The proposed SIMD-based many-core processor, composed of 16 processing elements (PEs), exploits the large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor achieves higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conventional commercial high-performance processors.
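The core idea of the abstract, one instruction stream driving 16 PEs that each work on their own share of the image, can be illustrated with a small software model. This is a minimal sketch under stated assumptions: the interleaved pixel-to-PE mapping, the function names (`distribute`, `simd_execute`, `gather`, `brighten`), and the saturating brightness operation are hypothetical and are not taken from the paper, which does not describe its instruction set or data layout here.

```python
from typing import Callable, List, Tuple

NUM_PES = 16  # PE count stated in the abstract

def distribute(pixels: List[int], num_pes: int = NUM_PES) -> List[List[int]]:
    """Hypothetical interleaved mapping: PE k holds pixels k, k+num_pes, k+2*num_pes, ..."""
    return [pixels[k::num_pes] for k in range(num_pes)]

def simd_execute(op: Callable[[int], int],
                 pe_mems: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Broadcast the same op to every PE, one local element per step (lockstep).

    Returns the updated local memories and the number of broadcast steps.
    """
    steps = max(len(mem) for mem in pe_mems)
    out = [list(mem) for mem in pe_mems]
    for t in range(steps):        # one broadcast instruction per step
        for mem in out:           # conceptually, all PEs act simultaneously
            if t < len(mem):      # PEs with no element at this step stay idle (masked)
                mem[t] = op(mem[t])
    return out, steps

def gather(pe_mems: List[List[int]], num_pixels: int) -> List[int]:
    """Undo the interleaved mapping to rebuild the image in row-major order."""
    num_pes = len(pe_mems)
    pixels = [0] * num_pixels
    for k, mem in enumerate(pe_mems):
        for t, value in enumerate(mem):
            pixels[k + t * num_pes] = value
    return pixels

def brighten(p: int) -> int:
    """Hypothetical example op: 8-bit saturating add."""
    return min(p + 40, 255)

if __name__ == "__main__":
    image = [(i * 3) % 256 for i in range(100)]  # toy 100-pixel image
    pe_mems, steps = simd_execute(brighten, distribute(image))
    result = gather(pe_mems, len(image))
    print(f"{steps} broadcast steps vs {len(image)} scalar steps")  # 7 vs 100
```

Because every PE executes the same broadcast instruction on its own local data, the number of instruction steps shrinks roughly by the PE count (here ceil(100/16) = 7 instead of 100). This is the data-level parallelism the abstract refers to; the model deliberately ignores memory access, inter-PE communication, and control overhead.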
Keywords
Many-core processor; image/video processing; data-level parallelism