Efficient Association Rule Mining Based on the SON Algorithm for a Big Data Platform

  • Nguyen, Giang-Truong (Department of Electronics and Computer Engineering, Chonnam National University) ;
  • Nguyen, Van-Quyet (Department of Electronics and Computer Engineering, Chonnam National University) ;
  • Nguyen, Sinh-Ngoc (Department of Electronics and Computer Engineering, Chonnam National University) ;
  • Kim, Kyungbaek (Department of Electronics and Computer Engineering, Chonnam National University)
  • Received: 2017.09.21
  • Reviewed: 2017.12.25
  • Published: 2017.12.31

Abstract

On a big data platform, association rule mining applications can create value. For instance, on an agricultural big data platform, such an application could recommend specific crops for farmers to grow, which could increase their income. The key step of association rule mining is frequent itemset mining, which finds sets of items that frequently occur together. Earlier approaches to this problem, such as Apriori, do not perform satisfactorily because the huge number of possible itemsets can overload memory. To address this, the SON algorithm was proposed, which divides the set under consideration into many smaller ones and processes them sequentially. On a single machine, however, the SON algorithm takes a long time to run. In this paper, we present a method for finding association rules on our Hadoop-based big data platform by parallelizing the SON algorithm. The entire association rule mining process, comprising preprocessing, SON-based frequent itemset mining, and association rule extraction, is implemented on the Hadoop-based big data platform. Experiments with a real data set confirm that the proposed method outperforms a brute-force method.
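The two-pass idea behind SON, followed by rule extraction, can be illustrated with a minimal single-process sketch. It is not the authors' Hadoop implementation. The function names, the `max_size` cap, the brute-force miner used inside each chunk, and the use of confidence as the rule measure are all assumptions made for illustration. In a MapReduce setting, each chunk in the first pass and each counting step in the second pass would presumably run as a separate map task.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(chunk, min_count, max_size):
    """Stand-in local miner (assumption): count every itemset up to max_size."""
    counts = Counter()
    for transaction in chunk:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[frozenset(itemset)] += 1
    return {s for s, c in counts.items() if c >= min_count}


def son_frequent_itemsets(transactions, min_support, num_chunks=2, max_size=3):
    """Return {itemset: count} for itemsets with support >= min_support (a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    if not transactions:
        return {}
    chunk_len = -(-len(transactions) // num_chunks)  # ceiling division
    chunks = [transactions[i:i + chunk_len]
              for i in range(0, len(transactions), chunk_len)]

    # Pass 1: an itemset frequent overall must be frequent in at least one
    # chunk at the same fractional threshold, so the union of the locally
    # frequent itemsets is a complete candidate set.
    candidates = set()
    for chunk in chunks:
        candidates |= local_frequent_itemsets(
            chunk, min_support * len(chunk), max_size)

    # Pass 2: count every candidate over the full data to drop false positives.
    counts = Counter()
    for transaction in transactions:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    threshold = min_support * len(transactions)
    return {s: c for s, c in counts.items() if c >= threshold}


def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = count(lhs | rhs) / count(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is already present in `frequent`.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

Calling `son_frequent_itemsets` on a list of crop sets with a support fraction such as 0.5, then passing the result to `association_rules`, yields rules of the form "farmers who grow X also grow Y". The first pass can only over-generate candidates, never miss a truly frequent itemset, which is why the second full count is sufficient to make the result exact.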
