[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5351/KJAS.2014.27.7.1171

Rhipe Platform for Big Data Processing and Analysis

Jung, Byung Ho (Department of Information Statistics, Gyeongsang National University)
Shin, Ji Eun (Department of Information Statistics, Gyeongsang National University)
Lim, Dong Hoon (Department of Information Statistics, Gyeongsang National University)

Publication Information

The Korean Journal of Applied Statistics / v.27, no.7, 2014, pp. 1171-1185

Abstract

Rhipe, which integrates the R and Hadoop environments, makes it possible to process and analyze massive amounts of data in a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe on real and simulated data sets of various sizes. Experiments comparing the computing speeds of the pseudo-distributed and fully-distributed modes for configuring a Hadoop cluster showed that fully-distributed mode was faster than pseudo-distributed mode, and that the computing speed of fully-distributed mode increased as the number of data nodes increased. We also compared the performance of our Rhipe implementation with that of the stats and biglm packages available on bigmemory. The results showed that our Rhipe implementation was faster than the other packages, owing to parallel processing in which the number of map tasks increases as the size of the data increases.
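The abstract does not give the regression algorithm itself. One common way to fit multiple regression with MapReduce, and a plausible reading of the map-task parallelism described above, is to have each map task compute the partial cross-products X_k'X_k and X_k'y_k over its own block of rows, have the reduce step sum them into X'X and X'y, and then solve the normal equations (X'X) b = X'y. The Python sketch below simulates this on a single machine with NumPy. It is not the authors' Rhipe code, and the function names map_block, reduce_blocks and fit_ols_mapreduce, as well as the n_blocks parameter standing in for the number of map tasks, are hypothetical.

```python
# Hypothetical sketch: simulates the map and reduce steps of least-squares
# regression on one machine. It is NOT the paper's Rhipe implementation.
import numpy as np


def map_block(X_block, y_block):
    """Map step (hypothetical): partial cross-products for one block of rows."""
    # Prepend an intercept column of ones to the predictors in this block.
    Xb = np.column_stack([np.ones(X_block.shape[0]), X_block])
    return Xb.T @ Xb, Xb.T @ y_block


def reduce_blocks(partials):
    """Reduce step (hypothetical): sum partial X'X and X'y over all blocks."""
    xtx = sum(p[0] for p in partials)
    xty = sum(p[1] for p in partials)
    return xtx, xty


def fit_ols_mapreduce(X, y, n_blocks=4):
    """Split rows into n_blocks (stand-ins for map tasks), then solve (X'X) b = X'y."""
    row_blocks = np.array_split(np.arange(X.shape[0]), n_blocks)
    partials = [map_block(X[idx], y[idx]) for idx in row_blocks]
    xtx, xty = reduce_blocks(partials)
    return np.linalg.solve(xtx, xty)


if __name__ == "__main__":
    # Simulated data: 3 predictors plus an intercept, small Gaussian noise.
    rng = np.random.default_rng(0)
    n, p = 10_000, 3
    X = rng.normal(size=(n, p))
    true_beta = np.array([1.0, 2.0, -0.5, 0.25])  # intercept first
    y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)
    print(np.round(fit_ols_mapreduce(X, y, n_blocks=8), 2))
```

Because each partial result is only a (p+1) x (p+1) matrix and a (p+1)-vector, the amount of data passed to the reducer stays small however many rows there are, so adding map tasks as the data grow spreads the expensive row-wise work across nodes. Forming X'X explicitly can be numerically less stable than a QR-based approach such as the incremental QR used by biglm; the sketch favors simplicity.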

Keywords

Big data; R; Hadoop; Rhipe; multiple regression analysis

Citations & Related Records

Times Cited By KSCI: 1

References

1	Kane, M. J. and Emerson, J. W. (2010a). bigmemory: Manage massive matrices with shared memory and memory-mapped files, R package version 4.2.3.
2	Kane, M. J. and Emerson, J. W. (2010b). biganalytics: A library of utilities for big.matrix objects of package bigmemory, R package version 1.0.12.
3	Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety, META Group.
4	Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C. and Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute.
5	Prajapati, V. (2013). Big data analytics with R and Hadoop, Packt Publishing Ltd, Birmingham, UK.
6	Sammer, E. (2012). Hadoop Operations, O'Reilly Media, Inc., Sebastopol, CA.
7	White, T. (2012). Hadoop: The Definitive Guide, O'Reilly Media, Inc., Sebastopol, CA.
8	고영준 and 김진석 (2013). Big data processing and analysis using Rhipe, Journal of the Korean Data and Information Science Society, 24(5), 975-987 (in Korean).
9	ASA Data Expo (2009). http://stat-computing.org/dataexpo/2009/the-data.html
10	Adler, D., Nenadic, O., Zucchini, W. and Glaser, C. (2007). The ff package: Handling large data sets in R with memory mapped pages of binary flat files, useR! 2007, http://www.r-project.org/conferences/useR-2007/program/presentations/adler.pdf
11	Ciliendo, E., Kunimasa, T. and Braswell, B. (2007). Linux Performance and Tuning Guidelines, IBM.
12	Guha, S. (2010). Computing environment for the statistical analysis of large and complex data, PhD thesis, Department of Statistics, Purdue University, West Lafayette.
13	Guha, S., Hafen, R., Rounds, J., Xia, J., Li, J., Xi, B. and Cleveland, W. S. (2012). Large complex data: divide and recombine (D&R) with RHIPE, Stat, 1(1), 53-67.
14	Hafen, R., Gibson, T., Dam, K. K. and Critchlow, T. (2014). Power grid data analysis with R and Hadoop, in Data Mining Applications with R, pp. 1-34.
15	Lin, H., Yang, S. and Midkiff, S. P. (2013). A Parallel R Framework for Processing Large Dataset on Distributed Systems, DataCloud.

Cited By KSCI

Shin, Seung-Hyeok (2015). A Design and Implementation of Web-based System for Real-Time Infographics of Airport Refueling Facilities, The Journal of Korea Navigation Institute, 19(4), 305.
Shin, Ji Eun (2015). Big data distributed processing system using RHadoop, Journal of the Korean Data and Information Science Society, 26(5), 1155.
Jung, Byung Ho (2016). Learning algorithms for big data logistic regression on RHIPE platform, Journal of the Korean Data and Information Science Society, 27(4), 911.
Shin, Ji Eun (2016). RHadoop platform for K-Means clustering of big data, Journal of the Korean Data and Information Science Society, 27(3), 609.