• Title/Summary/Keyword: 빈발 항목 (frequent items)


Technique for Improving performance of FP-Tree and DRFP (FP-Tree 및 DRFP 의 성능 개선 기법)

  • Cho, Kyung Soo;Jeong, Jae-ho;Kim, Young Hee;Kim, Ung-mo
    • Proceedings of the Korea Information Processing Society Conference / 2010.04a / pp.844-847 / 2010
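  • The FP-tree improved the overall performance of association rule algorithms and reduced the number of database scans to just two. However, it requires a large amount of memory because it keeps the frequent items and the tree information for all transactions resident in memory. The subsequent DRFP algorithm solved the memory problem by keeping the data on a storage device, but unlike the FP-tree it suffers in time performance. We therefore propose the NRFP-tree (Nare disc-Resident Frequent pattern Tree) to address these shortcomings.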
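The two-scan construction mentioned in this abstract can be sketched as follows. This is a minimal in-memory FP-tree build only, not the DRFP or NRFP-tree method; the names `FPNode` and `build_fp_tree`, and the alphabetical tie-breaking rule, are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert the frequent items of each transaction,
    # ordered by descending count (ties broken alphabetically, an assumption).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

Because every transaction's path stays in memory, the tree grows with the number of distinct prefixes, which is the memory cost the abstract refers to.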

Offering system for major article Using Text Mining and Data Mining (텍스트마이닝과 데이터마이닝을 이용한 주요기사 제공 시스템)

  • Song, Sung-Mook;Ryu, Joon-Suk;Kim, Ung-Mo
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.733-734 / 2009
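  • In modern society, the rapid development and spread of the Internet have increased the amount of information available to us, and it is not easy to extract only the information we need. In particular, news articles are unstructured, non-standardized text data. We propose a system that uses text mining to split article headlines into terms and extract them, then applies data mining association rules so that the support of frequent items and the associations among terms provide effective access to article content.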

An Efficient Algorithm for Mining Association Rules using a Binary Representation (이진 표현을 이용한 효율적인 연관 규칙 탐사 알고리즘)

  • Won-Young Kim;Won-Gil Choi;Ung-Mo Kim
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.375-378 / 2008
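  • Today, as we move toward an advanced, knowledge-based information society, attention has focused on finding the necessary knowledge within large volumes of data. Accordingly, research on data mining, which automatically discovers such knowledge in large data sets, is being actively pursued. Because data mining operates on large volumes of data, running time matters as well as accuracy, and many algorithms have been developed to improve performance. The best way to improve data mining performance is to reduce the number of database scans. In this paper, we propose an algorithm that uses a binary representation to improve the performance of the step that finds frequent itemsets in association rule mining.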
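As an illustration of how a binary representation can speed up support counting, the sketch below stores one integer bitmap per item, with bit `tid` set when transaction `tid` contains the item. It is a generic vertical-bitmap scheme, not necessarily the algorithm proposed in the paper; `item_bitmaps` and `support` are hypothetical names.

```python
def item_bitmaps(transactions):
    """Single database scan: build one integer bitmap per item."""
    bitmaps = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Support count of an itemset: AND the item bitmaps, then count set bits."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items match no transaction
    return bin(mask).count("1")
```

After the single scan in `item_bitmaps`, the support of any candidate itemset is computed from the bitmaps alone, without reading the database again.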

Personalized Recommendation System using FP-tree Mining based on RFM (RFM기반 FP-tree 마이닝을 이용한 개인화 추천시스템)

  • Cho, Young-Sung;Ho, Ryu-Keun
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.197-206 / 2012
  • Existing recommendation systems based on association rules suffer from slow processing caused by frequent scans of large data, as well as from limited scalability and accuracy. In this paper, using an implicit method that does not rely on user profiles for rating, we propose a new personalized recommendation system that uses FP-tree mining based on RFM. Combining RFM analysis with FP-tree mining makes it possible to reflect the attributes of customers and items, based on the whole body of customer and purchase data, in order to find items with high purchasability. The proposed method generates frequent items and creates association rules using FP-tree mining based on RFM, without generating candidate sets. Recommendable items are then generated according to basic thresholds on the support, confidence, and lift of the association rules, so that items can be recommended efficiently. To estimate performance, the proposed system is compared with an existing system. The experiment, using a dataset collected from an Internet cosmetics shopping mall, shows an improvement, evaluated according to the criteria of logicality.

Comparison of confidence measures useful for classification model building (분류 모형 구축에 유용한 신뢰도 측도 간의 비교)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.365-371 / 2014
  • Association rule mining, one of the well-studied techniques in data mining, is an exploratory data analysis method for understanding the relevance among items in a huge database. It is used to find relationships between sets of items based on interestingness measures such as support, confidence, lift, and similarity measures. The typical association rule technique generates rules that satisfy minimum support and confidence values. Support and confidence are the most frequently used measures, but they cannot determine the direction of an association because they always take positive values. In this paper, we compare support, basic confidence, and three kinds of confidence measures useful for classification model building in order to overcome this problem. The results confirm that causal confirmed confidence is the best of these from the viewpoint of association mining because it shows the direction of association more precisely.
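The standard measures named in this abstract can be computed as in the sketch below. It covers only support, basic confidence, and lift; the three alternative confidence measures compared in the paper are not reproduced here, and `rule_measures` is a hypothetical helper name.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))         # contain antecedent
    n_c = sum(1 for t in transactions if c <= set(t))         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contain both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift
```

Support and confidence are ratios of counts, so they are never negative. That is the limitation the abstract points out: on their own they cannot signal a negative association.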

A Study on Selecting Bitmap Join Index to Speed up Complex Queries in Relational Data Warehouses (관계형 데이터 웨어하우스의 복잡한 질의의 처리 효율 향상을 위한 비트맵 조인 인덱스 선택에 관한 연구)

  • An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions:PartD / v.19D no.1 / pp.1-14 / 2012
  • Because data warehouses are large, the selection of indices affects the efficiency of query processing. Indices lower the query processing cost, but they occupy large storage areas and incur maintenance costs whenever the database is updated. Bitmap join indices are well suited to optimizing star join queries, which join a fact table with many dimension tables and apply selections on the dimension tables. Although bitmap join indices, with their binary representation, have a low storage cost, selecting the indexing attributes from the huge number of candidate attributes is difficult. Index selection proceeds in two steps: reduce the number of candidate attributes to be indexed, then select the indexing attributes. In this paper, we address the bitmap join index selection problem by reducing the number of candidate attributes with data mining techniques. Whereas existing techniques reduce the candidates using attribute frequencies alone, we consider attribute frequencies together with the size of the dimension tables, the size of their tuples, and the disk page size. We use frequent itemset mining to reduce the large number of candidate attributes. We then build the bitmap join indices with the lowest cost and the smallest storage area that fit the storage constraints, using cost functions applied to the bitmap join indices of the candidate attributes. We compare our technique with existing ones and analyze the results to evaluate its efficiency.

High Utility Pattern Mining using a Prefix-Tree (Prefix-Tree를 이용한 높은 유틸리티 패턴 마이닝 기법)

  • Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan;Lee, In-Gi;Yong, Hwan-Seong
    • Journal of KIISE:Databases / v.36 no.5 / pp.341-351 / 2009
  • High utility pattern (HUP) mining has recently become one of the most important research issues in data mining because it can take the different weight values of items into account. However, existing mining algorithms suffer from performance degradation because they cannot easily apply the Apriori principle to pattern mining. In this paper, we introduce a new high utility pattern mining approach that uses a prefix tree, as in the FP-Growth algorithm. Our approach stores the weight value of each item in a node and uses these values to prune unnecessary patterns. We compare the performance characteristics of three different prefix-tree structures. Through thorough experiments, we also show that our approach yields some performance improvement.

An Efficient Algorithm For Mining Association Rules In Main Memory Systems (대용량 주기억장치 시스템에서 효율적인 연관 규칙 탐사 알고리즘)

  • Lee, Jae-Mun
    • The KIPS Transactions:PartD / v.9D no.4 / pp.579-586 / 2002
  • This paper proposes an efficient algorithm for mining association rules in large main memory systems. To do this, the paper first extends conventional algorithms such as DHP and Partition to make them suitable for large main memory systems, and then proposes an algorithm that improves the Partition algorithm by applying hash table and bitmap techniques. The proposed algorithm is compared with the extended DHP in an experimental environment, and the results show up to a 65% performance improvement over the extended DHP.

Method of Associative Group Using FP-Tree in Personalized Recommendation System (개인화 추천 시스템에서 FP-Tree를 이용한 연관 군집 방법)

  • Cho, Dong-Ju;Rim, Kee-Wook;Lee, Jung-Hyun;Chung, Kyung-Yong
    • The Journal of the Korea Contents Association / v.7 no.10 / pp.19-26 / 2007
  • Because collaborative filtering uses the nearest-neighbor method based on item preference, it not only fails to reflect exact contents but also suffers from sparsity and scalability problems. Item-based collaborative filtering has been used in practice to mitigate these problems, but it still does not reflect the attributes of the item. In this paper, we propose an associative grouping method that uses the FP-Tree to solve the problems of existing recommendation systems. The proposed method generates frequent items and creates association rules using the FP-Tree, without generating candidate sets. We build efficient item groups using an $\alpha$-cut on the confidence of the association rules. To estimate performance, the proposed method is compared with Gibbs Sampling, Expectation Maximization, and K-means on the MovieLens dataset.

A Sliding Window Technique for Open Data Mining over Data Streams (개방 데이터 마이닝에 효율적인 이동 윈도우 기법)

  • Chang Joong-Hyuk;Lee Won-Suk
    • The KIPS Transactions:PartD / v.12D no.3 s.99 / pp.335-344 / 2005
  • Open data mining methods have recently been actively proposed for data streams, which are massive, unbounded sequences of data elements generated continuously at a rapid rate. The knowledge embedded in a data stream is likely to change over time, so quickly identifying recent changes in that knowledge can provide valuable information for analyzing the stream. This paper proposes a sliding window technique for finding recently frequent itemsets that can be applied efficiently in open data mining. The proposed technique keeps memory usage small through delayed-insertion and pruning operations, and it produces the mining result quickly because the data elements within its target range are not traversed repeatedly. Moreover, because the technique focuses on recent data elements, it can capture recent changes in the data stream.
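The basic idea of counting itemsets over only the most recent transactions can be sketched as below. This is a naive sliding window that updates counts incrementally as transactions enter and leave; it does not implement the delayed-insertion or pruning operations described in the abstract. The class name `SlidingWindowCounter` and the `max_len` limit on itemset size are assumptions.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Counts itemsets (up to max_len items) over the last window_size transactions."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len
        self.window = deque()
        self.counts = Counter()

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # A new transaction enters the window: add its itemsets to the counts.
        self.window.append(tuple(transaction))
        self.counts.update(self._itemsets(transaction))
        # The oldest transaction leaves: subtract its itemsets, drop zero counts.
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            for itemset in self._itemsets(old):
                self.counts[itemset] -= 1
                if self.counts[itemset] <= 0:
                    del self.counts[itemset]

    def frequent(self, min_support):
        return {s: c for s, c in self.counts.items() if c >= min_support}
```

Each arriving or expiring transaction is processed once, so the window contents are never rescanned when the frequent itemsets are requested.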