Design of a Platform for Collecting and Analyzing Agricultural Big Data

  • Nguyen, Van-Quyet (Department of Electronics and Computer Engineering, Chonnam National University)
  • Nguyen, Sinh Ngoc (Department of Electronics and Computer Engineering, Chonnam National University)
  • Kim, Kyungbaek (Department of Electronics and Computer Engineering, Chonnam National University)
  • Received: 2016.12.21
  • Accepted: 2017.02.25
  • Published: 2017.02.28

Abstract

Big data presents both opportunities and challenges for economic development. In agriculture, for instance, combining diverse data (e.g., weather and soil data) and then analyzing the combined data yields valuable information for farmers and agribusinesses. However, agricultural data are generated in large volumes every minute by many kinds of devices and services, such as sensors and agricultural web markets. This raises the familiar big data challenges of data collection, data storage, and data analysis. Although several systems have been proposed to address these challenges, they remain restricted in the type of data, the type of storage, or the size of data they can handle. In this paper, we propose a novel design of a platform for collecting and analyzing agricultural big data. The proposed platform supports (1) multiple methods of collecting data from various data sources using Flume and MapReduce; (2) multiple data storage options, including HDFS, HBase, and Hive; and (3) big data analysis modules built on Spark and Hadoop.
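To make the collect-then-analyze idea concrete, the following is a minimal sketch of a MapReduce-style aggregation written in plain Python. It is not the authors' implementation and does not use the Hadoop, Flume, or Spark APIs. The record layout, the sample values, and every function name are hypothetical, chosen only to illustrate how readings from mixed sources might be grouped and averaged.

```python
from collections import defaultdict

# Hypothetical records: (source, region, metric, value).
# Field names and values are made up for illustration only.
RECORDS = [
    ("sensor", "north", "soil_moisture", 0.30),
    ("sensor", "north", "soil_moisture", 0.34),
    ("weather", "north", "temperature_c", 21.0),
    ("weather", "south", "temperature_c", 25.0),
    ("weather", "south", "temperature_c", 27.0),
]


def map_phase(records):
    """Emit one ((region, metric), value) pair per input record."""
    for _source, region, metric, value in records:
        yield (region, metric), value


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce each group of values to its arithmetic mean."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def average_by_region_metric(records):
    """Run the three phases in sequence over an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    for key, mean in sorted(average_by_region_metric(RECORDS).items()):
        print(key, mean)
```

In a real deployment the map and reduce steps would run in parallel over data held in distributed storage; this sketch keeps everything in memory so that the data flow is easy to follow.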
