Parallel Data Mining with Distributed Frequent Pattern Trees

분산형 FP트리를 활용한 병렬 데이터 마이닝

  • 조두산 (Department of Electrical Engineering, Korea University)
  • 김동승 (Department of Electrical Engineering, Korea University)
  • Published : 2003.07.01

Abstract

Data mining is an effective method for discovering useful information, such as rules and previously unknown patterns, in large databases. The discovery of association rules is an important data mining problem. We have developed a new parallel mining algorithm, called the Distributed Frequent Pattern Tree (DFPT) algorithm, that detects association rules on a distributed shared-nothing parallel system. DFPT is designed for parallel execution of the FP-growth algorithm. Because it eliminates the need to generate candidate itemsets, it requires only two full scans of the database on disk. We achieved good workload balance throughout the mining process by distributing the work equally among all processors. We implemented the algorithm on a PC cluster and observed that it outperformed the Improved Count Distribution scheme.
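
To illustrate why FP-growth needs only two passes over the data, the following is a minimal single-process sketch of FP-tree construction in Python. It is not the DFPT algorithm itself: the abstract does not describe how DFPT partitions the tree or the work across processors, so none of that is shown. The names `FPNode`, `build_fp_tree`, and `min_support` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # maps item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build an FP-tree with exactly two passes over `transactions`.

    Illustrative sketch only; the data is assumed to fit in memory here,
    whereas the paper reads the database from disk.
    """
    # Scan 1: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert each transaction's frequent items, ordered by
    # descending support (ties broken by item name), so that common
    # prefixes share nodes. No candidate itemsets are generated.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

For example, calling `build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]], 2)` yields a tree whose root has a single child for item `b`, since `b` is the most frequent item and therefore comes first in every ordered transaction. Frequent patterns would then be mined from this tree rather than from the database, which is what avoids further scans.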

Keywords