• Title/Summary/Keyword: Frequent Pattern Tree Mining (빈발패턴트리 마이닝)

Search Result 25

Ontology based Retrieval System for Shopping Sites Customer (온톨로지 기반의 쇼핑 사이트 고객을 위한 검색 시스템)

  • Gu Mi-Sug; Hwang Jeong-Hee; Ryu Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2004.11a / pp.51-54 / 2004
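  • Unlike the existing web, the Semantic Web defines the meaning of information and supports semantic links between pieces of information, and it has recently emerged as the next-generation web. These semantic links require ontologies, which are the foundation of the Semantic Web. An ontology defines metadata about resources so that they can be semantically linked, which enables efficient information retrieval. To increase retrieval efficiency, this paper proposes an information retrieval system based on ontologies, the core of the Semantic Web. The goal is to support effective marketing on shopping sites by examining users' purchase patterns and recommending suitable information to customers. The ontology was built with Topic Maps based on XTM. On top of the ontology, data mining techniques were used to find users' purchase patterns and deliver accurate information. User patterns were analyzed with a multi-level, multi-dimensional frequent pattern mining algorithm based on the frequent pattern tree (FP-tree) technique, improving retrieval efficiency.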

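The abstract builds on the frequent pattern tree (FP-tree). The code below is a minimal sketch of the standard FP-tree construction and FP-growth recursion, mining frequent itemsets from (items, count) transactions. It is not the authors' multi-level, multi-dimensional algorithm, whose details are not given in the abstract. The names `FPNode`, `build_fp_tree`, and `fp_growth` are hypothetical.

```python
from collections import defaultdict


class FPNode:
    """A node of an FP-tree: one item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> nodes holding that item) and the
    support of every frequent item.
    """
    counts = defaultdict(int)
    for items, cnt in transactions:
        for item in set(items):
            counts[item] += cnt
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, cnt in transactions:
        # Keep frequent items only, in descending support order (ties by name),
        # so transactions that share common items share a prefix path.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += cnt
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frequent itemset, support) pairs via conditional FP-trees."""
    header, frequent = build_fp_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        pattern = (item,) + suffix
        yield frozenset(pattern), frequent[item]
        # Conditional pattern base: the prefix path above every node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, pattern)
```

For example, `dict(fp_growth([(b, 1) for b in baskets], 3))` maps each itemset that occurs in at least three baskets to its support count.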

Efficient Dynamic Weighted Frequent Pattern Mining by using a Prefix-Tree (Prefix-트리를 이용한 동적 가중치 빈발 패턴 탐색 기법)

  • Jeong, Byeong-Soo; Farhan, Ahmed
    • The KIPS Transactions:PartD / v.17D no.4 / pp.253-258 / 2010
  • Traditional frequent pattern mining assigns the same profit/weight value to every item. Because it considers different weights for different items, Weighted Frequent Pattern (WFP) mining has become an important research issue in data mining and knowledge discovery. Existing algorithms in this area are based on fixed weights. However, in real-world scenarios the price/weight/importance of a pattern may change frequently due to unavoidable circumstances. Tracking these dynamic changes is necessary in application areas such as retail market-basket analysis and web click-stream management. In this paper, we propose a novel concept of dynamic weight and an algorithm, DWFPM (dynamic weighted frequent pattern mining), that handles situations in which the price/weight of a pattern varies dynamically. It scans the database exactly once and is also suitable for real-time data processing. To our knowledge, this is the first research work to mine weighted frequent patterns using dynamic weights. Extensive performance analyses show that our algorithm is efficient and scalable for WFP mining with dynamic weights.
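The abstract does not define how dynamic weights enter the support measure. The sketch below assumes a simple batch model. Each batch of transactions arrives with its own item-weight table, and a pattern's weight within a batch is the mean of its items' weights. Its weighted support is the sum, over all batches, of occurrences times that weight. Each batch is read exactly once, but every subset of each transaction up to `max_len` is enumerated, so the sketch only suits small examples. It is not the DWFPM algorithm or its prefix tree, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dynamic_weighted_support(batches, max_len=3):
    """One pass over (weights, transactions) batches.

    Assumed model (hypothetical): each batch has its own item-weight table;
    inside a batch a pattern weighs the mean of its items' weights; the
    weighted support sums, over batches, occurrences times that weight.
    """
    wsup = defaultdict(float)
    for weights, transactions in batches:  # every batch is read exactly once
        for t in transactions:
            items = sorted(set(t))
            # Brute force: every sub-pattern of the transaction up to max_len.
            for k in range(1, min(max_len, len(items)) + 1):
                for pattern in combinations(items, k):
                    wsup[pattern] += sum(weights[i] for i in pattern) / k
    return dict(wsup)


def weighted_frequent_patterns(batches, min_wsup, max_len=3):
    """Patterns whose dynamic weighted support reaches min_wsup."""
    return {p: s for p, s in dynamic_weighted_support(batches, max_len).items()
            if s >= min_wsup}
```

Because a pattern's weight changes from batch to batch, the same occurrence count can produce different weighted supports depending on when the occurrences happened.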

Mining Frequent Contiguous Sequence Patterns in Biological Sequences (생물학적 서열들에서 빈발한 연속 서열 패턴 마이닝)

  • Kang, Tae-Ho; Yoo, Jae-Soo
    • Proceedings of the Korean Information Science Society Conference / 2007.06b / pp.27-31 / 2007
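  • Biological sequence data consist mainly of DNA nucleotide sequences and protein amino acid sequences. These sequences generally contain a large number of items and are therefore very long. Biological sequences usually contain frequently occurring contiguous subsequences, and finding them is useful in many kinds of sequence analysis. Early methods for this task mostly applied sequential pattern mining algorithms based on the Apriori algorithm. Among them, the PrefixSpan algorithm is the most efficient Apriori-based sequential pattern mining technique. However, it grows sequence patterns starting from frequent patterns of length 1, so it is not well suited to searching biological sequences that contain long contiguous subsequences. More recently, MacosVSpan was proposed; it builds on PrefixSpan while reducing repetitive processing. However, it also incurs high costs because it uses a separate projected database larger than the original database, and it is especially inefficient for long sequences. This paper therefore proposes a method that efficiently finds frequent contiguous sequences in large volumes of biological sequence data using a fixed-length extension tree. Experiments in various settings demonstrate that the proposed method outperforms the MacosVSpan algorithm in search performance.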

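The fixed-length extension tree is not described in the abstract. Contiguous sequence mining in general can be illustrated with the level-by-level sketch below, which finds frequent substrings. Support is the number of sequences containing a substring. A length-(k+1) candidate is counted only if its length-k prefix and suffix are both frequent; this pruning is safe because any substring of a frequent substring is itself frequent. The function name and the toy DNA strings are hypothetical.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_sup):
    """Level-wise mining of frequent contiguous subsequences (substrings).

    Support = number of input sequences containing the substring at least once.
    Returns {substring: support}.
    """
    result = {}
    frequent = set()
    k = 1
    while True:
        seen_in = defaultdict(set)  # candidate -> ids of sequences containing it
        for sid, seq in enumerate(sequences):
            for start in range(len(seq) - k + 1):
                cand = seq[start:start + k]
                # Extend only substrings whose prefix and suffix were frequent.
                if k == 1 or (cand[:-1] in frequent and cand[1:] in frequent):
                    seen_in[cand].add(sid)
        frequent = {c for c, ids in seen_in.items() if len(ids) >= min_sup}
        if not frequent:
            return result
        for c in frequent:
            result[c] = len(seen_in[c])
        k += 1
```

With `frequent_contiguous(["ACGTAC", "CGTA", "TACG"], 2)`, the longest frequent contiguous pattern returned is `"CGTA"`, with support 2.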

Frequently Occurred Information Extraction from a Collection of Labeled Trees (라벨 트리 데이터의 빈번하게 발생하는 정보 추출)

  • Paik, Ju-Ryon; Nam, Jung-Hyun; Ahn, Sung-Joon; Kim, Ung-Mo
    • Journal of Internet Computing and Services / v.10 no.5 / pp.65-78 / 2009
  • The most common approach to finding valuable information in tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications, such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms for finding these patterns have recently been proposed. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent tree patterns in massive tree datasets. The major problems stem from (1) modeling data as hierarchical tree structures, (2) the high computational cost of candidate maintenance, (3) repeated scans of the input dataset, and (4) high memory dependency. These problems arise because most of these algorithms are based on the well-known Apriori algorithm and use the anti-monotone property for candidate generation and frequency counting. To solve these problems, we adopt a pattern-growth approach rather than the Apriori approach, and we extract maximal frequent subtree patterns instead of all frequent subtree patterns. The proposed method not only eliminates the step of pruning infrequent subtrees but also completely avoids generating candidate subtrees. Hence, it significantly improves the efficiency of the whole mining process.

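The notion of a maximal frequent pattern can be made concrete with a deliberately simplified sketch. It considers only path-shaped subtrees (parent-to-descendant label chains) and counts how many trees contain each chain. It then keeps the frequent chains that are not contained in a longer frequent chain. This brute-force enumeration only illustrates the definition; it is not the paper's pattern-growth algorithm. The `(label, children)` tree encoding and the function names are assumptions.

```python
from collections import defaultdict


def chains_from(node):
    """All label chains that start at `node` and descend through its children."""
    label, children = node
    yield (label,)
    for child in children:
        for chain in chains_from(child):
            yield (label,) + chain


def all_nodes(node):
    """Every node of a (label, children) tree, in pre-order."""
    yield node
    for child in node[1]:
        yield from all_nodes(child)


def is_subchain(p, q):
    """True if chain p occurs contiguously inside the strictly longer chain q."""
    n = len(p)
    return n < len(q) and any(q[i:i + n] == p for i in range(len(q) - n + 1))


def maximal_frequent_chains(trees, min_sup):
    """Frequent chains not contained in any longer frequent chain."""
    support = defaultdict(int)
    for tree in trees:
        # Count each chain at most once per tree (support = number of trees).
        for chain in {c for n in all_nodes(tree) for c in chains_from(n)}:
            support[chain] += 1
    frequent = {c: s for c, s in support.items() if s >= min_sup}
    return {c: s for c, s in frequent.items()
            if not any(is_subchain(c, other) for other in frequent)}
```

Reporting only maximal patterns shortens the output. In this sketch, the frequent chains `a`, `a/b`, `b`, `b/c`, and `c` are all represented by the single maximal chain `a/b/c`.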

Efficient Mining of Frequent Itemsets in a Sparse Data Set (희소 데이터 집합에서 효율적인 빈발 항목집합 탐사 기법)

  • Park In-Chang; Chang Joong-Hyuk; Lee Won-Suk
    • The KIPS Transactions:PartD / v.12D no.6 s.102 / pp.817-828 / 2005
  • The main research problems in frequent itemset mining are reducing the memory usage and processing time of the mining process. Most previous algorithms for finding frequent itemsets are based on the Apriori property and require multiple database scans. Moreover, their processing time increases greatly as the length of the maximal frequent itemset grows. To overcome this drawback, other approaches that reduce processing time have been actively proposed. However, they are not efficient on sparse data sets. This paper proposes an efficient algorithm for mining frequent itemsets. It introduces a novel tree structure, the $L_2$-tree, together with an efficient frequent itemset mining algorithm that uses it, the $L_2$-traverse algorithm. An $L_2$-tree is constructed from $L_2$, i.e., the set of frequent itemsets of size 2, and the $L_2$-traverse algorithm obtains its mining result in a short time by traversing the $L_2$-tree once. To further reduce processing time, this paper also proposes an optimized algorithm, $C_3$-traverse, which removes in advance the itemsets in $L_2$ that cannot be part of a frequent itemset of size 3. Various experiments verified that the proposed algorithms are efficient on sparse data sets.
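The abstract does not detail the $L_2$-tree. The sketch below models it under one assumption: for each frequent item, it stores the set of larger items with which that item forms a frequent 2-itemset. A depth-first traversal then extends an itemset only with items that are $L_2$-neighbors of every current member. This restriction is safe because every 2-subset of a frequent itemset is frequent. Support is computed by intersecting transaction-id sets. This is an Eclat-style illustration, not the authors' $L_2$-traverse or $C_3$-traverse algorithm, and all names are hypothetical.

```python
from collections import defaultdict


def mine_with_l2(transactions, min_sup):
    """All frequent itemsets, extending only along frequent 2-itemsets (L2).

    Assumption: the "L2-tree" is modelled as a map from each frequent item to
    the larger items it forms a frequent pair with; supports come from
    intersecting transaction-id sets (an Eclat-style traversal).
    """
    tids = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in set(t):
            tids[item].add(tid)
    items = sorted(i for i in tids if len(tids[i]) >= min_sup)
    l2 = {i: {j for j in items if j > i and len(tids[i] & tids[j]) >= min_sup}
          for i in items}
    result = {}

    def extend(itemset, tidset, candidates):
        result[frozenset(itemset)] = len(tidset)
        for j in sorted(candidates):
            new_tids = tidset & tids[j]
            if len(new_tids) >= min_sup:
                # Later items must be L2-neighbours of j and of all earlier members.
                extend(itemset + [j], new_tids, candidates & l2[j])

    for i in items:
        extend([i], tids[i], l2[i])
    return result
```

On sparse data most items have few $L_2$-neighbors, so the candidate sets passed down the recursion shrink quickly.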

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho; Yoo, Jae-Soo
    • Proceedings of the Korea Information Processing Society Conference / 2006.11a / pp.645-648 / 2006
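  • Biological sequence data consist mainly of DNA sequences and protein sequences. These data are extremely voluminous and spread across many databases, and each sequence contains hundreds or thousands of items, making it very long. Sequences generally contain regions or specific patterns that have been conserved through genetic changes or mutations. Such conserved regions and patterns can serve as phylogenetic evidence and are often closely related to function. Algorithms that discover frequently occurring patterns in sequences are therefore needed. Early efforts modified the Apriori algorithm to discover frequent patterns, and more recent work has improved performance using PrefixSpan trees, but these methods still require multiple database accesses, which degrades performance. This paper proposes a method that modifies the suffix tree to drastically reduce database accesses and efficiently discover contiguous sequences that occur frequently across many sequences.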

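A suffix-tree approach can be sketched with a naive generalized suffix trie, assuming the sequences fit in memory. Every suffix of every sequence is inserted once, and each node records the ids of the sequences that pass through it. The node for a substring therefore gives its support directly, with no further database access. Frequent substrings that are not contained in a longer frequent substring are reported as maximal. The trie uses quadratic space and is not the authors' modified suffix tree. The function names are hypothetical.

```python
def build_suffix_trie(sequences):
    """Naive generalized suffix trie; each node stores the ids of the
    sequences containing the substring spelled by the path to that node."""
    root = {"children": {}, "ids": set()}
    for sid, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:]:
                node = node["children"].setdefault(ch, {"children": {}, "ids": set()})
                node["ids"].add(sid)
    return root


def maximal_frequent_substrings(sequences, min_sup):
    """Frequent substrings (support = #sequences) not inside a longer frequent one."""
    root = build_suffix_trie(sequences)
    frequent = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node["children"].items():
            # Support can only shrink going down, so stop at infrequent nodes.
            if len(child["ids"]) >= min_sup:
                frequent[prefix + ch] = len(child["ids"])
                stack.append((child, prefix + ch))
    return {s: n for s, n in frequent.items()
            if not any(len(t) > len(s) and s in t for t in frequent)}
```

The database is read only while the trie is built. All support counts come from the node id sets afterwards.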

High Utility Pattern Mining using a Prefix-Tree (Prefix-Tree를 이용한 높은 유틸리티 패턴 마이닝 기법)

  • Jeong, Byeong-Soo; Ahmed, Chowdhury Farhan; Lee, In-Gi; Yong, Hwan-Seong
    • Journal of KIISE:Databases / v.36 no.5 / pp.341-351 / 2009
  • Recently, high utility pattern (HUP) mining has become one of the most important research issues in data mining because it can consider the different weight values of items. However, existing mining algorithms suffer from performance degradation because the Apriori principle cannot easily be applied to high utility pattern mining. In this paper, we introduce a new high utility pattern mining approach that uses a prefix-tree, as in the FP-Growth algorithm. Our approach stores the weight value of each item in a node and uses these values to prune unnecessary patterns. We compare the performance characteristics of three different prefix-tree structures. Through thorough experiments, we also show that our approach yields a degree of performance improvement.
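Utility is not anti-monotone: a superset can have a higher utility than its subsets, which is why the Apriori principle does not apply directly. A common workaround in high utility mining is the transaction-weighted utilization (TWU) upper bound, defined as the sum of the total utilities of the transactions that contain an itemset. The sketch below prunes items by TWU and then computes the exact utilities of the surviving candidates by enumeration. It omits the prefix-tree and node-stored weights described in the abstract. The data layout (`{item: quantity}` transactions and a `profit` table) and the function name are assumptions.

```python
from itertools import combinations


def high_utility_itemsets(transactions, profit, min_util):
    """Two-phase style sketch: prune by TWU, then compute exact utilities.

    transactions: list of {item: quantity}; profit: {item: unit profit}.
    Returns {itemset tuple: utility} for itemsets with utility >= min_util.
    """
    # Transaction utility: total profit of everything in the transaction.
    tu = [sum(q * profit[i] for i, q in t.items()) for t in transactions]
    twu = {}
    for t, u in zip(transactions, tu):
        for i in t:
            twu[i] = twu.get(i, 0) + u
    # TWU(X) >= u(X), and TWU can only shrink as X grows, so low-TWU items go.
    promising = sorted(i for i, w in twu.items() if w >= min_util)
    result = {}
    for k in range(1, len(promising) + 1):
        for cand in combinations(promising, k):
            containing = [t for t in transactions if all(i in t for i in cand)]
            util = sum(sum(t[i] * profit[i] for i in cand) for t in containing)
            if util >= min_util:
                result[cand] = util
    return result
```

For example, with unit profits `{"a": 5, "b": 2, "c": 1}` and transactions `[{"a": 1, "b": 2}, {"b": 4, "c": 3}, {"a": 2, "c": 1}]`, only `("a",)` reaches a threshold of 15, with utility 5 + 10 = 15.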

Personalized Recommendation System using FP-tree Mining based on RFM (RFM기반 FP-tree 마이닝을 이용한 개인화 추천시스템)

  • Cho, Young-Sung; Ho, Ryu-Keun
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.197-206 / 2012
  • Existing recommendation systems that use association rules suffer from slow processing caused by frequent scans of large data, as well as from limited scalability and accuracy. In this paper, we propose a personalized recommendation system based on a new method, FP-tree mining based on RFM. The system uses an implicit method that does not rely on user profiles for rating. To find items with high purchasability, RFM analysis and FP-tree mining must be combined so that they reflect the attributes of customers and items across all customer and purchase data. The proposed system finds frequent items and generates association rules through RFM-based FP-tree mining, without generating candidate sets. It recommends items efficiently, generating recommendable items according to basic thresholds on the support, confidence, and lift of the association rules. To evaluate its performance, the proposed system was compared with an existing system in experiments on a dataset collected from a cosmetics internet shopping mall, and it showed improvement when evaluated against the criterion of logicality.
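RFM stands for recency, frequency, and monetary value. The abstract does not specify its RFM scoring scheme or how RFM enters the FP-tree. The sketch below shows one possible combination under hypothetical cut-offs. Each customer earns one point per RFM criterion met, and only the baskets of high-scoring customers are kept. A candidate rule is then scored by support, confidence, and lift. The scoring rule, the cut-offs, the product names, and the function names are all assumptions.

```python
from datetime import date


def rfm_scores(purchases, today, r_days, f_min, m_min):
    """One point per RFM criterion met (0-3); the cut-offs are hypothetical.

    purchases: list of (customer, purchase_date, amount, items).
    """
    last, freq, money = {}, {}, {}
    for cust, day, amount, _items in purchases:
        last[cust] = max(last.get(cust, day), day)
        freq[cust] = freq.get(cust, 0) + 1
        money[cust] = money.get(cust, 0) + amount
    return {c: int((today - last[c]).days <= r_days)   # recency
               + int(freq[c] >= f_min)                  # frequency
               + int(money[c] >= m_min)                 # monetary value
            for c in last}


def rule_metrics(baskets, lhs, rhs):
    """Support, confidence and lift of the rule lhs -> rhs (sets of items)."""
    n = len(baskets)
    n_lhs = sum(1 for b in baskets if lhs <= b)
    n_rhs = sum(1 for b in baskets if rhs <= b)
    n_both = sum(1 for b in baskets if (lhs | rhs) <= b)
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return support, confidence, lift


if __name__ == "__main__":
    purchases = [
        ("u1", date(2024, 1, 10), 50, {"toner", "lotion"}),
        ("u1", date(2024, 1, 20), 30, {"toner", "cream"}),
        ("u2", date(2023, 6, 1), 10, {"lotion"}),
        ("u3", date(2024, 1, 25), 80, {"toner", "lotion", "cream"}),
    ]
    scores = rfm_scores(purchases, date(2024, 2, 1), r_days=30, f_min=2, m_min=60)
    # Mine only the baskets of customers meeting at least two RFM criteria.
    baskets = [items for cust, _, _, items in purchases if scores[cust] >= 2]
    print(scores, rule_metrics(baskets, {"toner"}, {"lotion"}))
```

A lift of 1 means the two sides of the rule co-occur exactly as often as independence would predict. Values above 1 indicate a positive association.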

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method (EPs-TFP 마이닝 기법을 이용한 단백질 Disorder/Order 지역 분류)

  • Lee, Heon Gyu; Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.59-72 / 2012
  • A protein performs its specific functions when a disordered region of its sequence transitions to an ordered region in response to a biological reaction. Separating disordered regions from ordered regions in sequence data is therefore essential for predicting the three-dimensional structure and characteristics of a protein. To classify disordered and ordered regions efficiently, this paper proposes a classification/prediction method that uses only sequence data. The method yields results that are not biased toward specific protein characteristics and improves classification speed. The emerging-pattern-based EPs-TFP method uses only essential emerging patterns, from which redundant emerging patterns have been removed. It finds sequence patterns that occur frequently in disordered regions but relatively infrequently in ordered regions. To improve the performance of the proposed algorithm, we extend the TFP method, which is built on the P-tree and T-tree concepts, into a classification/prediction method. We evaluated the EPs-TFP technique on DisProt 4.9 and CASP 7 data; order/disorder classification achieved a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
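An emerging pattern is one whose support grows sharply from one class to another. Its growth rate is its support in the target class divided by its support in the background class. The sketch below applies this definition to fixed-length contiguous patterns (k-mers) in disordered versus ordered regions. It then scores a sequence by the number of distinct emerging patterns it contains. It does not implement the P-tree/T-tree machinery of TFP or the removal of redundant patterns. The thresholds, the toy residue strings, and the scoring rule are hypothetical.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Fraction of sequences that contain each length-k contiguous pattern."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {p: c / len(sequences) for p, c in counts.items()}


def emerging_patterns(target, background, k, min_growth, min_sup):
    """k-mers frequent in `target` whose growth rate reaches min_growth.

    Growth rate = supp_target / supp_background (infinite if absent there).
    """
    st, sb = kmer_support(target, k), kmer_support(background, k)
    eps = {}
    for p, s in st.items():
        if s < min_sup:
            continue
        b = sb.get(p, 0.0)
        growth = float("inf") if b == 0 else s / b
        if growth >= min_growth:
            eps[p] = growth
    return eps


def disorder_score(seq, eps, k):
    """Toy score: how many distinct emerging patterns occur in seq."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)} & set(eps))
```

For example, with disordered fragments `["EEKKPS", "KKPSEE", "PSEEKK"]` and ordered fragments `["LIVAGK", "VAGLIK", "KKLIVA"]`, the 2-mer `"KK"` appears in all disordered fragments but in only one third of the ordered ones, giving a growth rate of 3.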

The Goods Recommendation System based on modified FP-Tree Algorithm (변형된 FP-Tree를 기반한 상품 추천 시스템)

  • Kim, Jong-Hee; Jung, Soon-Key
    • Journal of the Korea Society of Computer and Information / v.15 no.11 / pp.205-213 / 2010
  • This study uses the FP-tree algorithm, one of the data mining techniques. It proposes a new recommendation system based on a modified FP-tree algorithm that derives association rules from frequent 2-itemsets extracted from the transaction database. The modified recommendation system consists of a pre-processing module, a learning module, a recommendation module, and an evaluation module. The study first evaluates the modified recommendation system in terms of precision, recall, F-measure, success rate, and recommendation time. It then compares the system's efficiency with that of other recommendation systems based on sequential pattern mining. Compared with those systems, the modified recommendation system is 5 times more efficient in learning and shows a 20% improvement in recommendation capability. This result shows that the modified system is more valid than recommendation systems based on sequential pattern mining.
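Rules derived from frequent 2-itemsets, together with the precision/recall/F-measure evaluation, can be illustrated with a small sketch. Every frequent pair yields rules in both directions, each with its confidence. A basket is then completed with the items recommended by its strongest firing rules, and the recommendations are scored against the items actually bought. The sketch counts pairs directly instead of building the modified FP-tree. The thresholds and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_sup, min_conf):
    """Rules x -> y from frequent 2-itemsets, mapped to their confidence."""
    item_cnt = Counter(i for t in transactions for i in set(t))
    pair_cnt = Counter(p for t in transactions
                       for p in combinations(sorted(set(t)), 2))
    rules = {}
    for (a, b), n in pair_cnt.items():
        if n < min_sup:
            continue
        for x, y in ((a, b), (b, a)):
            conf = n / item_cnt[x]  # P(y in basket | x in basket)
            if conf >= min_conf:
                rules[(x, y)] = conf
    return rules


def recommend(basket, rules, top_n=3):
    """Rank items not yet in the basket by the best confidence of a firing rule."""
    scores = {}
    for (x, y), conf in rules.items():
        if x in basket and y not in basket:
            scores[y] = max(scores.get(y, 0.0), conf)
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


def precision_recall_f1(recommended, actual):
    """Precision, recall and F-measure of a recommendation list."""
    hits = len(set(recommended) & set(actual))
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(actual) if actual else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Restricting rules to 2-itemsets keeps the rule table small (at most two rules per frequent pair), which helps learning time. The trade-off is that longer antecedents are ignored.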