Research on Big Data Integration Method

  • Received : 2016.12.08
  • Accepted : 2017.01.26
  • Published : 2017.01.31

Abstract

In this paper we propose an approach to big data integration for analyzing, visualizing, and predicting market trends. The approach builds an integrated data model on the R language, which is widely used for statistics, and on Hadoop, which provides parallel data processing. We consider four methods that combine R and Hadoop: the ff package in R; R with Streaming, a Hadoop utility; and Rhipe and RHadoop, two interface packages between R and Hadoop. We describe and analyze the strengths and weaknesses of the four methods and propose Rhipe and RHadoop as a complete data integration model. R is popular for running statistical algorithms, while Hadoop comprises a distributed file system and a resource management platform and can implement the MapReduce programming model. Integrating the two yields an environment in which R code can be written and deployed on Hadoop without any data movement. This model enables high-performance predictive analysis and a deeper understanding of big data.
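The MapReduce programming model mentioned in the abstract can be illustrated with a minimal, single-process sketch. It is written in Python purely for illustration and is not the authors' R code, nor does it use Rhipe, RHadoop, or Hadoop itself. The function names (`mapper`, `shuffle`, `reducer`, `run_job`) and the word-count task are assumptions chosen for the example. The sketch mimics the flow a Hadoop Streaming job follows: a mapper emits key-value pairs, the framework sorts them by key, and a reducer aggregates each run of equal keys.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit one (word, 1) pair per word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: sort pairs by key so that equal keys become adjacent.

    On a real cluster the framework performs this between map and reduce.
    """
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs):
    """Reduce step: sum the values of each run of identical keys."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def run_job(lines):
    """Chain map, shuffle, and reduce in one process (hypothetical helper)."""
    return dict(reducer(shuffle(mapper(lines))))


if __name__ == "__main__":
    sample = ["big data integration", "big data analysis with R and Hadoop"]
    for word, count in sorted(run_job(sample).items()):
        print(f"{word}\t{count}")
```

In a distributed setting the map and reduce steps run on the nodes that hold the data, which is how code can be deployed on Hadoop without moving the data first.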
