http://dx.doi.org/10.15207/JKCS.2021.12.7.045

Design of Distributed Hadoop Full Stack Platform for Big Data Collection and Processing  

Lee, Myeong-Ho (School of Information Communication, Semyung University)
Publication Information
Journal of the Korea Convergence Society, v.12, no.7, 2021, pp. 45-51
Abstract
With the rapid shift to non-face-to-face environments and mobile-first strategies, the volume of structured and unstructured data created each year is growing explosively, and every field now demands new decision making and services based on big data. However, there have been few reference cases that use the Hadoop Ecosystem to collect this rapidly growing data, load it into a standard platform applicable to a practical environment, and then store and process the well-organized data in a relational database. This study therefore designed and implemented such a platform. Based on Hadoop 2.0 and running on three virtual machine servers in a Spring Framework environment, it collects unstructured data retrieved by keyword search from social network services, loads the collected data into the Hadoop Distributed File System (HDFS) and HBase, and then uses a morpheme analyzer to store standardized big data derived from the loaded data in a relational database. Future research should continue with clustering, classification, and machine-learning analysis using Hive or Mahout for deeper data analysis.
Keywords
Mobile First Strategy; Big Data; Hadoop Ecosystem; Spring Framework; HDFS; Morpheme Analyzer