http://dx.doi.org/10.7465/jkdi.2015.26.5.1129

An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines  

Song, Dong Ho (Software University, Korea Aerospace University)
Shin, Ji Ae (SoftOnNet)
In, Yean Jin (SoftOnNet)
Lee, Wan Gon (Computer Science Department, Sungsil University)
Lee, Kang Se (Computer Science Department, Sungsil University)
Publication Information
Journal of the Korean Data and Information Science Society / v.26, no.5, 2015, pp. 1129-1139
Abstract
The inference process generates additional triples from knowledge represented as RDF triples, the data model of semantic web technology. Tens of millions of initial triples, together with the additionally inferred triples, form a knowledge base for applications such as question answering (QA) systems. The inference engine requires more computing resources as the number of triples generated during inference grows. Additional computing resources supplied by the underlying resource pool in a cloud computing environment can shorten the execution time. This paper presents an algorithm that allocates computing nodes "elastically" at runtime on Hadoop, depending on the size of the knowledge data fed in. The proposed model has a layered architecture: the top layer for applications, the middle layer for the distributed parallel inference engine that processes the triples, and the lower layer for elastic Hadoop and server virtualization. System algorithms and test data are analyzed and discussed. The model has the benefit that the rich body of legacy Hadoop applications can run faster on this system without any modification.
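To make the idea concrete, the following is a minimal sketch and not the algorithm from the paper. It applies one standard RDFS entailment rule (rdfs9: if x has type C and C is a subclass of D, then x has type D) to show how inference enlarges a triple set, and then derives a node count from the enlarged size. The names `nodes_for` and `rdfs9_closure` and the constants `TRIPLES_PER_NODE`, `MIN_NODES`, and `MAX_NODES` are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical tuning parameters; none of these values come from the paper.
TRIPLES_PER_NODE = 5_000_000  # assumed load one node handles comfortably
MIN_NODES = 1                 # assumed floor: always keep one worker
MAX_NODES = 16                # assumed ceiling: size of the resource pool


def nodes_for(triple_count):
    """Return a node count proportional to input size, clamped to the pool limits."""
    if triple_count < 0:
        raise ValueError("triple_count must be non-negative")
    wanted = math.ceil(triple_count / TRIPLES_PER_NODE)
    return max(MIN_NODES, min(MAX_NODES, wanted))


def rdfs9_closure(triples):
    """Apply RDFS rule rdfs9 repeatedly until no new triple is produced."""
    known = set(triples)
    while True:
        # Collect (subclass, superclass) pairs from the current knowledge.
        subclass_of = {(s, o) for s, p, o in known if p == "rdfs:subClassOf"}
        # Derive (x rdf:type D) from (x rdf:type C) and (C rdfs:subClassOf D).
        derived = {
            (x, "rdf:type", d)
            for x, p, c in known if p == "rdf:type"
            for c2, d in subclass_of if c2 == c
        }
        if derived <= known:  # fixpoint reached: nothing new was inferred
            return known
        known |= derived


if __name__ == "__main__":
    base = {
        ("ex:rex", "rdf:type", "ex:Dog"),
        ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
        ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    }
    closed = rdfs9_closure(base)
    print(len(base), "triples before inference,", len(closed), "after")
    # Re-evaluate the node count once the inferred triples are known.
    print("nodes for 12,000,000 triples:", nodes_for(12_000_000))
```

In this sketch the node count is recomputed from the size of the data after inference, which mirrors the abstract's point that inferred triples increase the resources required. A real deployment would obtain the thresholds from measurement rather than from fixed constants.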
Keywords
Bigdata platform; distributed parallel inference engine; elastic Hadoop system; server virtualization