• Title/Summary/Keyword: Frequent Item-sets


A Fast Algorithm for Mining Association Rules in Web Log Data (상품간 연관 규칙의 효율적 탐색 방법에 관한 연구 : 인터넷 쇼핑몰을 중심으로)

  • 오은정;오상봉
    • Proceedings of the Korea Society for Industrial Systems Conference / 2003.11a / pp.621-626 / 2003
  • Mining association rules in web log files can be divided into two steps: 1) discovering frequent item sets in web data; 2) extracting association rules from the frequent item sets found in the previous step. This paper suggests an algorithm for finding frequent item sets efficiently. The essence of the proposed algorithm is to transform transaction data files into matrix format. Our experimental results show that the suggested algorithm outperforms the Apriori algorithm, which is widely used to discover frequent item sets, in terms of scan frequency and execution time.

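
The abstract above does not spell out the matrix layout, so the following is only a minimal sketch of one plausible reading: each transaction becomes a row of a 0/1 matrix, and the support of an item set is the number of rows in which all of its columns are 1. The function names and the sample sessions are hypothetical and are not taken from the paper; the sketch also enumerates column combinations by brute force rather than reproducing the paper's search strategy.

```python
from itertools import combinations

def to_matrix(transactions):
    """Encode transactions as a 0/1 matrix (rows: transactions, columns: items)."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix

def frequent_itemsets(transactions, min_support, max_size=2):
    """Count support from the matrix, so the raw data is read only once."""
    items, matrix = to_matrix(transactions)
    result = {}
    for size in range(1, max_size + 1):
        for cols in combinations(range(len(items)), size):
            # A row supports the item set when every selected column is 1.
            support = sum(all(row[c] for c in cols) for row in matrix)
            if support >= min_support:
                result[tuple(items[c] for c in cols)] = support
    return result

# Hypothetical web sessions: each set holds the pages (or products) visited.
sessions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent = frequent_itemsets(sessions, min_support=3)
```

Once the matrix is built, every support count is a column-wise operation, which is where a reduction in database scans relative to Apriori could come from.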

Designing OLAP Cube Structures for Market Basket Analysis (장바구니 분석용 OLAP 큐브 구조의 설계)

  • Yu, Han-Ju;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information / v.12 no.4 / pp.179-189 / 2007
  • Every purchase a customer makes contributes to patterns in how products are purchased together. The process of finding these patterns, called market basket analysis, consists of two steps in the Microsoft Association Algorithm. The first step is to find frequent item-sets. The second step, which requires much less time than the first, is to generate association rules from the frequent item-sets. Although the first step is the core of market basket analysis, applying it to Online Analytical Processing (OLAP) cubes raises several problems: longitudinal analysis becomes impossible, and many impractical transactions are generated. This paper proposes a new method for designing OLAP cube structures for market basket analysis that makes longitudinal analysis possible and identifies only real customers' purchase patterns.

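
As an illustration of the second step described above (generating rules from frequent item-sets), here is a minimal sketch using the textbook definition of confidence, support(X ∪ Y) / support(X). It is not the Microsoft Association Algorithm's implementation; `generate_rules` and the sample counts are hypothetical.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """Derive rules X -> Y from item-set support counts.

    `supports` maps frozenset item-sets to their support counts and must
    contain every non-empty subset of each frequent item-set.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left and right side
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = count / supports[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

# Hypothetical support counts produced by the first step.
supports = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 5,
    frozenset({"bread", "milk"}): 3,
}
rules = generate_rules(supports, min_confidence=0.7)
```

This step only performs dictionary lookups on counts that already exist, which is consistent with the abstract's remark that rule generation takes much less time than finding the frequent item-sets.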

An Efficient Algorithm for Updating Discovered Association Rules in Data Mining (데이터 마이닝에서 기존의 연관규칙을 갱신하는 효율적인 앨고리듬)

  • 김동필;지영근;황종원;강맹규
    • Journal of Korean Society of Industrial and Systems Engineering / v.21 no.45 / pp.121-133 / 1998
  • This study suggests an efficient algorithm for updating discovered association rules in a large database. A database may be updated frequently or occasionally, and such updates may not only invalidate some existing strong association rules but also turn some weak rules into strong ones. FUP and DMI efficiently update strong association rules in the whole updated database by reusing the information of the old large item-sets. Moreover, these algorithms use a pruning technique to reduce the database size during the update process. The algorithm suggested in this study likewise reuses the information of the old large item-sets. Because it is difficult to find the new set of large item-sets in the whole updated database after an incremental database is added to the original database, the suggested algorithm generates all candidate item-sets at once from the incremental database. This method of generating candidate item-sets differs from that of FUP and DMI. After the candidate item-sets are generated, if a candidate item-set is large in the incremental database, the original database is scanned and the support of that item-set is updated. In this way, all large item-sets in the whole updated database are found. The suggested algorithm does not use a pruning technique to reduce the database size during the update process. As a result, it updates the discovered large item-sets quickly and efficiently.

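
The update procedure described above can be sketched roughly as follows. This is an interpretation of the abstract, not the authors' code: it assumes a relative support threshold (`min_ratio`), caps the item-set size at `max_size`, and rescans the original database once per new candidate for simplicity. All function names and data are hypothetical.

```python
from itertools import combinations

def count_itemsets(transactions, max_size):
    """Count every item-set of up to max_size items that occurs in transactions."""
    counts = {}
    for t in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(t), size):
                counts[combo] = counts.get(combo, 0) + 1
    return counts

def update_large_itemsets(old_large, original_db, increment_db, min_ratio, max_size=2):
    """Return the large item-sets of original_db + increment_db with their counts.

    old_large maps item-sets (sorted tuples) that were large in original_db
    to their support counts there.
    """
    inc_counts = count_itemsets(increment_db, max_size)  # candidates generated at once
    total = len(original_db) + len(increment_db)
    updated = {}
    # Old large item-sets: the stored count is reused, so no rescan is needed.
    for itemset, old_count in old_large.items():
        count = old_count + inc_counts.get(itemset, 0)
        if count >= min_ratio * total:
            updated[itemset] = count
    # New candidates: only item-sets that are large within the increment.
    candidates = [s for s, c in inc_counts.items()
                  if s not in old_large and c >= min_ratio * len(increment_db)]
    for itemset in candidates:
        old_count = sum(set(itemset) <= t for t in original_db)  # scan the original
        count = old_count + inc_counts[itemset]
        if count >= min_ratio * total:
            updated[itemset] = count
    return updated

original_db = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]
old_large = {("a",): 3, ("b",): 3, ("a", "b"): 2}  # large at 50% of 4 transactions
increment_db = [{"a", "c"}, {"c"}]
updated = update_large_itemsets(old_large, original_db, increment_db, min_ratio=0.5)
```

The sketch relies on the fact that an item-set that is large in the merged database must be large in the original database or in the increment, so only those two groups need to be checked.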

Border-based HSFI Algorithm for Hiding Sensitive Frequent Itemsets (민감한 빈발항목집합을 숨기기 위한 경계기반 HSFI 알고리즘)

  • Lee, Dan-Young;An, Hyoung-Keun;Koh, Jae-Jin
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1323-1334 / 2011
  • This paper suggests the border-based HSFI algorithm for hiding sensitive frequent itemsets. Unlike previous approaches, the node structure of its FP-Tree uses the border to minimize the impact on nonsensitive frequent itemsets during the hiding process, and it organizes sensitive information, border information, and all transactions. Applying the HSFI algorithm to an example transaction database significantly reduced the number of lost items, showing that the HSFI algorithm is more effective than the existing algorithm at maintaining the quality of the database.

An Extended Frequent Pattern Tree for Hiding Sensitive Frequent Itemsets (민감한 빈발 항목집합 숨기기 위한 확장 빈발 패턴 트리)

  • Lee, Dan-Young;An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions:PartD / v.18D no.3 / pp.169-178 / 2011
  • Recently, data sharing between enterprises or organizations has become necessary for cooperation on tasks. In this process, when an enterprise opens its database to its affiliates, sensitive information may be leaked. To resolve this problem, sensitive information must be hidden from the database. Previous research on hiding sensitive information applied various heuristic algorithms to maintain the quality of the database. However, few studies have analyzed the effects on the items modified during the hiding process or tried to minimize the number of hidden items. This paper suggests the eFP-Tree (Extended Frequent Pattern Tree), based on the FP-Tree (Frequent Pattern Tree), for hiding sensitive frequent itemsets. Node formation in the eFP-Tree uses the border to minimize the impact on nonsensitive frequent itemsets during the hiding process, organizing all transaction, sensitive, and border information differently from previous methods. When the eFP-Tree was applied to an example transaction database, fewer than 10% of the items were lost, showing that it is more effective than the existing algorithm and maintains the quality of the database.
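
To make the hiding problem concrete, here is a deliberately naive sketch: it deletes one chosen item from supporting transactions until the sensitive itemset falls below the support threshold, and it reports how many item occurrences were lost. It is not the eFP-Tree or the border-based method from the abstract; the function names, the `victim` parameter, and the data are hypothetical.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions)

def hide_itemset(transactions, sensitive, min_support, victim):
    """Delete `victim` from supporting transactions until `sensitive` is infrequent.

    Returns the sanitized copy and the number of deleted item occurrences.
    """
    sanitized = [set(t) for t in transactions]  # leave the input untouched
    removed = 0
    for t in sanitized:
        if support(sanitized, sensitive) < min_support:
            break  # the sensitive itemset is already hidden
        if sensitive <= t:
            t.discard(victim)
            removed += 1
    return sanitized, removed

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}]
clean, removed = hide_itemset(db, frozenset({"a", "b"}), min_support=2, victim="a")
```

In this example the support of the nonsensitive itemset {"a"} also drops from 3 to 1. That side effect on nonsensitive frequent itemsets is what the border-based approach described above aims to minimize.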

Development of Network Event Audit Module Using Data Mining (데이터 마이닝을 통한 네트워크 이벤트 감사 모듈 개발)

  • Han, Seak-Jae;Soh, Woo-Young
    • Convergence Security Journal / v.5 no.2 / pp.1-8 / 2005
  • Network event analysis provides useful information on network status that helps protect against attacks. It involves finding sets of frequently used packet information, such as IP addresses, and by its nature requires real-time processing. The Apriori algorithm used in data mining can be applied to find frequent item sets, but it is not suitable for analyzing network events in real time because its high CPU and memory usage leads to low processing speed. This paper develops a network event audit module that applies association rules to network events using a new algorithm instead of the Apriori algorithm. Test results show that the new algorithm uses drastically less CPU and memory for network event analysis than the existing Apriori algorithm.


An Efficient Data Mining Algorithm based on the Database Characteristics (데이터 베이스 특성에 따른 효율적인 데이터 마이닝 알고리즘)

  • Park, Ji-Hyun;Koh, Chan
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.10 no.1 / pp.107-119 / 2006
  • With recent developments in internet and web technologies, the amount of data stored in databases is increasing rapidly. As a result, the range of database applications has expanded, and research on data mining techniques for finding useful information in huge databases has progressed. Many existing algorithms reduce the item sets and the size of the database, so that the whole database is not required throughout the process of creating frequent item sets. Although these techniques save time in some stages, applying them requires too much time in other stages. This paper proposes an algorithm for transaction databases in which the transactions are short or the number of items is relatively small. The algorithm scans the database once using a hashing technique and, at the same time, stores all subsets that can appear in each transaction in a hash table. Thus, regardless of the minimum support percentage, it can discover the set of frequent items in a shorter time than existing algorithms.

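
The single-scan idea in this abstract can be sketched as below, using a Python dict as the hash table. This is only an illustration under the abstract's stated assumption of short transactions, since each transaction of n items contributes 2^n - 1 subsets. The function names and the data are hypothetical.

```python
from itertools import combinations

def count_all_subsets(transactions):
    """Single scan: store every non-empty subset of each transaction in a hash table."""
    table = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                table[subset] = table.get(subset, 0) + 1
    return table

def frequent(table, n_transactions, min_support_ratio):
    """Filter the table; changing the threshold needs no new database scan."""
    return {s: c for s, c in table.items() if c / n_transactions >= min_support_ratio}

db = [{"x", "y"}, {"x", "y", "z"}, {"x"}, {"y", "z"}]
table = count_all_subsets(db)
at_50_percent = frequent(table, len(db), 0.5)
at_75_percent = frequent(table, len(db), 0.75)
```

Because the table holds the count of every subset that occurs, the minimum support can be changed afterwards without touching the database, which matches the abstract's claim of independence from the support threshold.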

Product-group Recommendation based on Association Rule Mining and Collaborative Filtering in Ubiquitous Computing Environment (유비쿼터스 환경에서 연관규칙과 협업필터링을 이용한 상품그룹추천)

  • Kim, Jae-Kyeong;Oh, Hee-Young;Kwon, Oh-Byung
    • Journal of Information Technology Services / v.6 no.2 / pp.113-123 / 2007
  • In a ubiquitous computing environment such as a ubiquitous marketplace (u-market), there is a need to provide context-based personalization services that consider nomadic user preferences and the corresponding requirements. To do so, recommendation systems must deal with a tremendous amount of context data. Hence, the purpose of this paper is to propose a novel recommendation method that provides a product-group list to customers in a u-market based on their shopping intentions and preferences. We have developed FREPIRS (FREquent Purchased Item-sets Recommendation Service), which recommends a list of product groups rather than individual products. Collaborative filtering and the Apriori algorithm are adopted in FREPIRS to build product groups.

A Study on the Implementation of an optimized Algorithm for association rule mining system using Fuzzy Utility (Fuzzy Utility를 활용한 연관규칙 마이닝 시스템을 위한 알고리즘의 구현에 관한 연구)

  • Park, In-Kyu;Choi, Gyoo-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.1 / pp.19-25 / 2020
  • In frequent pattern mining, the uncertainty of each item is accompanied by a loss of information. Also, in real environments, the importance of patterns changes with time, so fuzzy logic must be applied to meet these requirements, and the dynamic characteristics of the importance of patterns should be considered. In this paper, we propose a fuzzy utility mining technique for extracting frequent web page sets from web log databases through fuzzy utility-based web page set mining. Here, the downward closure property of the fuzzy set is applied to prune a large part of the search space using the minimum fuzzy utility threshold (MFUT) and the user-defined percentile (UDP). Extensive performance analyses show that our algorithm is very efficient and scalable for fuzzy utility mining using dynamic weights.

Finding Pseudo Periods over Data Streams based on Multiple Hash Functions (다중 해시함수 기반 데이터 스트림에서의 아이템 의사 주기 탐사 기법)

  • Lee, Hak-Joo;Kim, Jae-Wan;Lee, Won-Suk
    • Journal of Information Technology Services / v.16 no.1 / pp.73-82 / 2017
  • Recently, in-memory data stream processing has been actively applied to various subjects such as query processing, OLAP, and data mining (e.g., frequent item sets, association rules, and clustering). However, finding regular periodic patterns of events in an infinite data stream has received less attention. Most research on finding periods uses autocorrelation functions to find certain changes in periodic patterns, not the period itself, and usually finds periodic patterns in time-series databases, not in data streams. Literally, a period means the length of time after which some phenomenon recurs at a certain interval. In real applications, however, a data set evolves with tiny differences as time elapses. This kind of period is called a pseudo-period. This paper proposes a new scheme called the FPMH (Finding Periods using Multiple Hash functions) algorithm to find such a set of pseudo-periods over a data stream based on multiple hash functions. According to the type of pseudo-period, this paper categorizes FPMH into three variants: FPMH-E, FPMH-PC, and FPMH-PP. To maximize the performance of the algorithm in the data stream environment and to keep the most recent periodic patterns in memory, a decay mechanism is applied to the FPMH algorithms. The FPMH algorithm minimizes memory usage as well as processing time with acceptable accuracy.