http://dx.doi.org/10.6109/jkiice.2015.19.9.2029

A Development Study of the VPT for the Improvement of Hadoop Performance

Yang, Ill Deung (Department of Computer & Information Engineering, Cheongju University)
Kim, Seong Ryeol (Department of Computer & Information Engineering, Cheongju University)
Abstract
Hadoop MapReduce (MR) uses a partition function to pass the outputs of mappers to reducers. The partition function determines the target reducer by calculating a hash value from the key and taking it modulo the number of reducers. This legacy partition function does not divide a job effectively because it is highly sensitive to the key distribution. When the job is divided unevenly, some reducers need more time than others, which can increase the total processing time of the job. This paper proposes the VPT (Virtual Partition Table) and tests it on data with a skewed key distribution. Applying the VPT reduced processing time by three seconds on average, and we expect the improvement to grow as the data volume increases.
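The hash-and-modulo partitioning described in the abstract, and its sensitivity to key distribution, can be illustrated with a minimal Python sketch. This is not the paper's code and it does not implement the VPT, whose design is not given in the abstract. The helper names (`java_string_hash`, `hash_partition`, `reducer_load`) and the sample keys are hypothetical, and the hash is assumed to follow Java's `String.hashCode()` for ASCII keys with the sign bit cleared before the modulo.

```python
from collections import Counter

INT_MAX = 0x7FFFFFFF  # same value as Java's Integer.MAX_VALUE


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit integer (ASCII keys assumed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Hash the key, clear the sign bit, then take it modulo the reducer count."""
    return (java_string_hash(key) & INT_MAX) % num_reducers


def reducer_load(keys, num_reducers):
    """Count how many records each reducer would receive."""
    load = Counter(hash_partition(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]


if __name__ == "__main__":
    # Hypothetical skewed input: one key accounts for 90% of the records.
    keys = ["the"] * 900 + ["hadoop"] * 50 + ["vpt"] * 50
    print(reducer_load(keys, num_reducers=4))
```

Because every record with the same key maps to the same reducer, the reducer that receives the dominant key handles at least 900 of the 1,000 records in this sample, no matter how many reducers are configured. This is the imbalance the abstract attributes to the legacy partition function.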
Keywords
Hadoop; MapReduce; Partition Function;