• Title/Summary/Keyword: 트리 마이닝 (tree mining)


Parallel Data Mining with Distributed Frequent Pattern Trees (분산형 FP트리를 활용한 병렬 데이터 마이닝)

  • 조두산;김동승
    • Proceedings of the IEEK Conference / 2003.07c / pp.2561-2564 / 2003
  • Data mining is an effective way to discover useful information, such as rules and previously unknown patterns, in large databases. Discovering association rules is an important data mining problem. We have developed a new parallel mining algorithm, the Distributed Frequent Pattern Tree (DFPT) algorithm, to detect association rules on a distributed shared-nothing parallel system. DFPT is designed for parallel execution of the FP-growth algorithm. Because it eliminates the need to generate candidate items, it needs only two full disk scans of the database. We achieved good workload balance throughout the mining process by distributing the work equally among all processors. We implemented the algorithm on a PC cluster system and observed that it outperformed the Improved Count Distribution scheme.

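Illustrative sketch (not taken from the paper): the two database scans mentioned in the abstract correspond to the standard single-machine FP-tree construction, shown below in Python. The names `FPNode` and `build_fp_tree`, the sample transactions, and the tie-breaking rule are assumptions made for illustration; the distribution of work across processors in DFPT is not shown.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count in how many transactions each item occurs.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Scan 2: insert the frequent items of each transaction, ordered by
    # descending support (ties broken by item name, an assumption).
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical sample data.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

Transactions that share a prefix share tree nodes, which is why no candidate itemsets have to be generated and counted separately.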

Adaptive Decision Tree Algorithm for Data Mining in Real-Time Machine Status Database (실시간 기계 상태 데이터베이스에서 데이터 마이닝을 위한 적응형 의사결정 트리 알고리듬)

  • Baek, Jun-Geol;Kim, Kang-Ho;Kim, Sung-Shick;Kim, Chang-Ouk
    • Journal of Korean Institute of Industrial Engineers / v.26 no.2 / pp.171-182 / 2000
  • Over the last five years, data mining has drawn much attention from researchers and practitioners because of its many application domains. This article presents an adaptive decision tree algorithm for dynamically inferring the causes of machine failures from a real-time, large-scale machine status database. Among the many data mining methods, intelligent decision tree building algorithms are of particular interest because they allow decision rules to be generated automatically from the tree, which facilitates the construction of expert systems. Experiments on a semiconductor etching machine verified that our model outperforms previously proposed decision tree models.

A Reliable Prediction of User-Behavior Patterns Mined from the ACL-Based Data (에이전트 커뮤니케이션 언어 마이닝을 통한 신뢰성있는 사용자 행동 패턴 예측)

  • Lee, Seung-Cheol;Paik, Ju-Ryon;Kim, Ung-Mo
    • Annual Conference of KIPS / 2006.11a / pp.373-376 / 2006
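  • The evolution of computing environments, including low-cost networked sensors and easy Internet access anytime and anywhere, makes a truly mobile environment feasible in everyday life. This development gives rise to a wide variety of mobile agents and helps maximize user convenience. Mobile agents store user information, surrounding-environment information, computing information, application information, and the like in ACML (Agent Communication Markup Language), an XML-based standard language, and then exchange and analyze it. Unlike existing systems that analyze and predict user behavior patterns from tabular information, analysis and prediction in an agent environment operate on tree structures, so a new method is required. The technique proposed in this paper takes the user's context into account when processing information stored in ACML, so that the desired information can be provided to the user automatically, anytime and anywhere.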

A Study for XML DTD Matching Method using Inlining Algorithm (Inlining 알고리즘을 이용한 XML DTD 매칭 방법에 관한 연구)

  • Heo, Bo-Jin;Kim, Hyeong-Seok;Kim, Chang-Suk
    • Annual Conference of KIPS / 2003.11c / pp.1505-1508 / 2003
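  • XML DTD matching is a fundamental research topic for database-related applications such as data integration, data warehousing, web mining, e-commerce, and semantic query processing. As the web has grown, much data has come to be represented in XML, the standard for data exchange on the web, and matching XML DTDs has emerged as a major research topic. Unlike conventional relational database schemas, which are flat, an XML schema has a hierarchical tree structure, which makes it difficult to compare DTDs directly. This paper proposes a method that extracts the hierarchical structure information and integrity constraints of an XML DTD, converts them into a one-dimensional serial structure, and then matches similar DTDs.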

Matching Agent using Automatic Weight-Control (가중치 자동 조절을 이용한 매칭 에이전트)

  • 김동조;박영택
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.11a / pp.439-445 / 2000
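  • Machine learning, one of the artificial intelligence techniques used in data mining, can be applied to extract and exploit knowledge from large databases or information repositories that contain multidimensional attributes. This paper proposes a method that classifies the data set the user wants by applying weights to each attribute based on the query, and improves the search results by dynamically changing the attribute weights through user feedback. To classify the data set, we use a k-nearest neighbor classifier that applies weights to the distances between attributes; to extract the rules that dynamically change the attribute weights, we apply decision rule generation based on decision tree construction. As an experiment to demonstrate the improvement in search results, we implemented the core part of an online couple-matching system and applied the method to it.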

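Illustrative sketch (not taken from the paper): one common way to apply attribute weights in k-nearest neighbor classification is a weighted Euclidean distance, shown below in Python. The function names, the sample data, and the choice of Euclidean distance are assumptions; the feedback-driven weight update and the decision-rule extraction described in the abstract are not shown.

```python
import math
from collections import Counter


def weighted_distance(x, y, weights):
    # Weighted Euclidean distance: each attribute's squared difference
    # is scaled by that attribute's weight.
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def knn_classify(query, samples, labels, weights, k=3):
    # Rank the stored samples by weighted distance to the query and
    # take a majority vote among the k closest ones.
    ranked = sorted(range(len(samples)),
                    key=lambda i: weighted_distance(query, samples[i], weights))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical sample data with two numeric attributes.
samples = [(1.0, 10.0), (1.2, 0.0), (5.0, 10.5), (5.2, 0.5)]
labels = ["A", "A", "B", "B"]
query = (1.1, 10.4)

equal_weights = knn_classify(query, samples, labels, weights=(1.0, 1.0), k=1)
second_only = knn_classify(query, samples, labels, weights=(0.0, 1.0), k=1)
```

With equal weights the nearest sample is the first one (class "A"); when only the second attribute counts, the third sample becomes nearest (class "B"), which shows how changing the weights changes the result.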

Efficient Mining of Frequent Itemsets in a Sparse Data Set (희소 데이터 집합에서 효율적인 빈발 항목집합 탐사 기법)

  • Park In-Chang;Chang Joong-Hyuk;Lee Won-Suk
    • The KIPS Transactions:PartD / v.12D no.6 s.102 / pp.817-828 / 2005
  • The main research problems in mining frequent itemsets are reducing the memory usage and processing time of the mining process. Most previous algorithms for finding frequent itemsets are based on the Apriori property and are multi-scan algorithms. Moreover, their processing time increases greatly with the length of a maximal frequent itemset. To overcome this drawback, other approaches have been actively proposed to reduce the processing time; however, they are not efficient on a sparse data set. This paper proposes an efficient mining algorithm for finding frequent itemsets. It introduces a novel tree structure, called an $L_2$-tree, and an efficient algorithm for mining frequent itemsets with it, called the $L_2$-traverse algorithm. An $L_2$-tree is constructed from $L_2$, i.e., the set of frequent itemsets of size 2, and the $L_2$-traverse algorithm finds its mining result in a short time by traversing the $L_2$-tree once. To reduce the processing time further, this paper also proposes an optimized algorithm, $C_3$-traverse, which removes in advance any itemset in $L_2$ that cannot be part of a frequent itemset of size 3. Various experiments verified that the proposed algorithms are efficient on a sparse data set.
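
Illustrative sketch (not taken from the paper): the Python below computes $L_2$, the set of frequent 2-itemsets, and then lists the 3-itemsets whose 2-subsets are all in $L_2$, which is the Apriori-property check that pruning of this kind relies on. The function names and sample data are assumptions; the $L_2$-tree structure and the $L_2$-traverse and $C_3$-traverse algorithms themselves are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    # Count every 2-itemset that occurs in a transaction, then keep
    # those that meet the minimum support. Pairs are sorted tuples.
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(set(t)), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}


def candidate_triples(l2):
    # A 3-itemset can only be frequent if all three of its 2-subsets
    # are frequent (Apriori property).
    items = sorted({i for pair in l2 for i in pair})
    return [t for t in combinations(items, 3)
            if all(p in l2 for p in combinations(t, 2))]


# Hypothetical sample data.
transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"], ["c", "d"]]
l2 = frequent_pairs(transactions, min_support=2)
c3 = candidate_triples(l2)
```

Pairs in `l2` that appear in no candidate triple are the ones a pruning step could discard before looking for itemsets of size 3.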

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed mining sequential patterns efficiently. Most existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, that algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
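
Illustrative sketch (not taken from the paper): the Python below counts fixed-length contiguous subsequences across a set of sequences and keeps the frequent ones. It only illustrates the idea of starting from fixed-length contiguous windows instead of length-1 patterns. The function name, the sample sequences, and the definition of support as the number of sequences containing a window are assumptions; the fixed-length spanning tree and the extension to maximal sequences are not shown.

```python
from collections import Counter


def frequent_contiguous(sequences, k, min_support):
    # Support of a length-k window = number of sequences that contain
    # it at least once (assumed definition).
    support = Counter()
    for seq in sequences:
        windows = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(windows)
    return {w: c for w, c in support.items() if c >= min_support}


# Hypothetical DNA fragments.
sequences = ["ACGTAC", "TACGTT", "GGACGT"]
result = frequent_contiguous(sequences, k=3, min_support=3)
```

Here "ACG" and "CGT" occur in all three sequences, while "TAC" occurs in only two and is dropped.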

An Efficient Approach for Single-Pass Mining of Web Traversal Sequences (단일 스캔을 통한 웹 방문 패턴의 탐색 기법)

  • Kim, Nak-Min;Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan
    • Journal of KIISE:Databases / v.37 no.5 / pp.221-227 / 2010
  • Web access sequence mining can discover the web pages that users access frequently. Utility-based web access sequence mining handles non-binary occurrences of web pages and extracts more useful knowledge from web logs. However, the existing utility-based web access sequence mining approach considers web access sequences from the very beginning of the web logs and is therefore not suitable for mining data streams, where the volume of data is huge and unbounded. Nor can it adaptively find recent changes of knowledge in data streams. The existing approach has other limitations as well: it considers only forward references of web access sequences, suffers from the level-wise candidate generation-and-test methodology, and needs several database scans. In this paper, we propose a new approach for high utility web access sequence mining over data streams with a sliding window method. Our approach can not only handle large-scale data but also efficiently discover recently generated information from data streams. Moreover, it resolves the other limitations of the existing algorithm over data streams. Extensive performance analyses show that our approach is very efficient and outperforms the existing algorithm.

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method (EPs-TFP 마이닝 기법을 이용한 단백질 Disorder/Order 지역 분류)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.59-72 / 2012
  • A protein displays its specific functions when a disorder region of the protein sequence transitions to an order region while provoking a biological reaction, so separating disorder regions from order regions in sequence data is urgently needed to predict the three-dimensional structure and characteristics of the protein. To classify disorder and order regions efficiently, this paper proposes a classification/prediction method that uses sequence data, obtains results that are not biased toward specific characteristics of a protein, and improves classification speed. The emerging-pattern-based EPs-TFP method uses only the essential emerging patterns, with redundant emerging patterns removed. This classification method finds the sequence patterns of disorder regions, that is, patterns that appear frequently in disorder regions but relatively infrequently in order regions. To improve the performance of the proposed algorithm, we extend the TFP method, which is built on the P-tree and T-tree concepts, into a classification/prediction method. We evaluated the EPs-TFP technique on Disprot 4.9 and CASP 7 data; the order/disorder classification results show a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.

Multiple SVM Classifier for Pattern Classification in Data Mining (데이터 마이닝에서 패턴 분류를 위한 다중 SVM 분류기)

  • Kim Man-Sun;Lee Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.3 / pp.289-293 / 2005
  • Pattern classification extracts various types of pattern information that describe objects in the real world and decides their class. The top priority of pattern classification technologies is to improve classification performance, and researchers have tried various approaches to this end over the last 40 years. Classification methods used in pattern classification include Bayes classifiers based on probabilistic inference over patterns, decision trees, distance-function-based methods, neural networks, and clustering, but they are not efficient for analyzing large amounts of multi-dimensional data. Thus, there is active research on multiple classifier systems, which improve classification performance by combining a number of mutually complementary classifiers. The present study identifies problems in previous research on multiple SVM classifiers and proposes BORSE, a model that, based on a 1:M policy for extending SVM to a multi-class classifier, regards each SVM output as a signal with a non-linear pattern, trains a neural network on that pattern, and combines the final classification results.