• Title/Summary/Keyword: Frequent Pattern Mining

An Efficient Method for Mining Frequent Patterns based on Weighted Support over Data Streams (데이터 스트림에서 가중치 지지도 기반 빈발 패턴 추출 방법)

  • Kim, Young-Hee;Kim, Won-Young;Kim, Ung-Mo
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.8 / pp.1998-2004 / 2009
  • Recent advances in storage devices and networks have caused the amount of data to grow rapidly. The large volume of data streams imposes unique space and time constraints on the data mining process. Because streaming data arrive continuously, knowledge discovery requires algorithms that scan the stream only once. Most support-based research concentrates on frequent itemsets and ignores infrequent itemsets even when they are crucial. In this paper, we propose an efficient method, WSFI-Mine (Weighted Support Frequent Itemsets Mine), which mines all frequent itemsets from a data stream in one scan. The method discovers closed frequent itemsets using a DCT (Data Stream Closed Pattern Tree). We compare the performance of our algorithm with DSM-FI and THUI-Mine under different minimum supports. The results show that WSFI-Mine not only runs significantly faster but also consumes less memory.
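The one-scan, weighted-support idea in the abstract above can be illustrated with a minimal sketch. It is not WSFI-Mine and does not build a DCT. The abstract does not define weighted support, so the definition used here (mean item weight times relative frequency) is an assumption, as are the function name `weighted_frequent_itemsets`, the `max_len` cap, and the sample data.

```python
from collections import Counter
from itertools import combinations


def weighted_frequent_itemsets(stream, weights, min_wsup, max_len=2):
    """Read `stream` once and return {itemset: weighted support}.

    Assumed definition (not taken from the paper):
    weighted support = mean item weight * (count / number of transactions).
    """
    counts = Counter()
    n = 0
    for transaction in stream:              # every transaction is seen exactly once
        n += 1
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):     # count itemsets up to max_len items
            for itemset in combinations(items, k):
                counts[itemset] += 1

    result = {}
    for itemset, count in counts.items():
        mean_weight = sum(weights[i] for i in itemset) / len(itemset)
        wsup = mean_weight * count / n
        if wsup >= min_wsup:
            result[itemset] = wsup
    return result


# Hypothetical stream and item weights, for illustration only.
stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
weights = {"a": 1.0, "b": 0.5, "c": 0.8}
print(weighted_frequent_itemsets(iter(stream), weights, min_wsup=0.38))
```

With these weights, item `b` appears in three of four transactions but falls below the threshold because of its low weight, while `c` appears in only two and passes. Enumerating every small itemset per transaction is only feasible for short transactions; a real stream miner would use a compact tree instead.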

Finding Frequent Itemsets Over Data Streams in Confined Memory Space (한정된 메모리 공간에서 데이터 스트림의 빈발항목 최적화 방법)

  • Kim, Min-Jung;Shin, Se-Jung;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.15D no.6 / pp.741-754 / 2008
  • Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process regardless of the amount of information generated in the stream. For this purpose, this paper proposes the prime pattern tree (PPT) for finding frequent itemsets over data streams within a confined memory space. Unlike a node of a prefix tree, a node of a PPT can maintain the information needed to estimate the current supports of several itemsets together. Grouping items into prime patterns reduces the total number of nodes, and the length of a prime pattern is controlled by the split_delta $S_{\delta}$. Both the size and the accuracy of the PPT are determined by $S_{\delta}$. Accuracy improves as $S_{\delta}$ becomes smaller, because when $S_{\delta}$ is large the frequencies of many itemsets have to be estimated. It is therefore important to consider the trade-off between the size of a PPT and the accuracy of the mining result. Based on this characteristic, the size and the accuracy of the PPT can be controlled flexibly by merging or splitting nodes during mining. To find all frequent itemsets over a data stream, the PPT replaces the prefix tree in the previously proposed estDec method, which makes it efficient to optimize memory usage in a confined memory space. Finally, the performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.

Analysis of Graph Mining based on Free-Tree (자유트리 기반의 그래프마이닝 기법 분석)

  • YoungSang No;Unil Yun;Keun Ho Ryu;Myung Jun Kim
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.275-278 / 2008
  • Recently, there has been much research on data mining. On transaction datasets, association rules are derived by finding interesting patterns. As one branch of mining, substructure mining is attracting growing interest and is being applied in many advanced technologies. However, graph mining requires more computing time than itemset mining, so an efficient way to avoid duplication is needed. GASTON is the best of the duplication-free algorithms. This paper analyzes GASTON and anticipates future work.

High Utility Itemset Mining by Using Binary PSO Algorithm with V-shaped Transfer Function and Nonlinear Acceleration Coefficient Strategy

  • Tao, Bodong;Shin, Ok Keun;Park, Hyu Chan
    • Journal of information and communication convergence engineering / v.20 no.2 / pp.103-112 / 2022
  • The goal of pattern mining is to identify novel patterns in a database. High utility itemset mining (HUIM) is one research direction in pattern mining. It differs from frequent itemset mining (FIM) in that it additionally considers the quantity and profit of each commodity. Several algorithms have been used to mine high utility itemsets (HUIs). The original BPSO algorithm lacks local search capability in its later stage, so too few HUIs are mined. Compared to the transfer function used in the original PSO algorithm, the V-shaped transfer function better reflects the relationship between a particle's velocity and the probability that its position changes. Considering the influence of the acceleration factor on the particle motion mode and trajectory, a nonlinear acceleration strategy was used to enhance the search ability of the particles. Experiments show that the number of mined HUIs is 73% higher than that of the original BPSO algorithm, which indicates better performance of the proposed algorithm.
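Two ingredients named in the abstract above can be sketched briefly: the utility of an itemset (quantity times profit) and a V-shaped transfer function for binary PSO. This is an illustration under assumptions, not the authors' algorithm. The abstract does not say which V-shaped function is used, so `|tanh(v)|` is an assumed choice; the bit-flip update rule, the function names, and the sample data are likewise assumptions. The nonlinear acceleration coefficient strategy is not covered.

```python
import math
import random


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over the transactions that contain every item."""
    total = 0
    for quantities in transactions:         # dict: item -> purchased quantity
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * profit[item] for item in itemset)
    return total


def v_shaped(velocity):
    """Assumed V-shaped transfer function: 0 at v = 0, approaching 1 as |v| grows."""
    return abs(math.tanh(velocity))


def update_position(bits, velocities, rng):
    """Flip each bit with probability v_shaped(velocity); otherwise keep it."""
    return [1 - b if rng.random() < v_shaped(v) else b
            for b, v in zip(bits, velocities)]


# Hypothetical transactions (item -> quantity) and unit profits.
transactions = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"a": 1, "b": 2, "c": 1}]
profit = {"a": 5, "b": 2, "c": 1}
print(itemset_utility({"a", "b"}, transactions, profit))

rng = random.Random(0)
print(update_position([1, 0, 1], [0.0, 0.0, 0.0], rng))  # zero velocity: nothing flips
```

Unlike an S-shaped (sigmoid) function, which sets a bit to 1 with a probability that depends on the velocity, a V-shaped function of this kind decides whether the current bit is flipped, so a particle with near-zero velocity stays where it is.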

An associative service mining based on dynamic weight (동적 가중치 기반의 연관 서비스 탐사 기법)

  • Hwang, Jeong Hee
    • Journal of Digital Contents Society / v.17 no.5 / pp.359-366 / 2016
  • To provide useful services to users in a ubiquitous environment, a technique is needed that obtains helpful information by considering user activity and preference. In addition, a user's interests change as time passes, so a discovery method that reflects the degree of interest in service information is needed. In this paper, we present a method for finding frequent patterns with a dynamic weight on each individual item, based on a service ontology that we design. Our method can be applied to provide users with service information of interest depending on context.

Discovery of Frequent Sequence Pattern in Moving Object Databases (이동 객체 데이터베이스에서 빈발 시퀀스 패턴 탐색)

  • Vu, Thi Hong Nhan;Lee, Bum-Ju;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.15D no.2 / pp.179-186 / 2008
  • The convergence of location-aware devices and GIS functionalities, together with the increasing accuracy and availability of positioning technologies, paves the way for a range of new types of location-based services. The field of spatiotemporal data mining, in which relationships are defined by the spatial and temporal aspects of data, faces major challenges because of the enlarged search space. In this paper, we therefore propose algorithms for mining spatiotemporal patterns in a mobile environment. Moving patterns are generated using two algorithms called All_MOP and Max_MOP. The first mines all frequent patterns, and the second discovers only maximal frequent patterns. Compared with the DFS_MINE algorithm, our proposed approach reduces running time. In addition, our approach is applicable to location-based services such as tourist services and traffic services.
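The distinction between all frequent patterns and maximal frequent patterns can be shown with a small filter. This sketch does not reproduce All_MOP or Max_MOP, whose internals the abstract does not give. It assumes that patterns are sequences of location labels and that one pattern contains another when the shorter is a subsequence of the longer; the function names and sample patterns are hypothetical.

```python
def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting elements."""
    remaining = iter(long)
    return all(x in remaining for x in short)   # `in` advances the iterator


def maximal_patterns(frequent):
    """Keep only the patterns that have no frequent proper super-pattern."""
    patterns = set(frequent)
    return {p for p in patterns
            if not any(p != q and is_subsequence(p, q) for q in patterns)}


# Hypothetical set of frequent moving patterns (tuples of location labels).
all_frequent = {("A",), ("B",), ("C",), ("A", "B"), ("B", "C"),
                ("A", "B", "C"), ("D",)}
print(sorted(maximal_patterns(all_frequent)))
```

Here seven frequent patterns reduce to two maximal ones, `("A", "B", "C")` and `("D",)`. The maximal set is a compact summary: every frequent pattern is contained in some maximal pattern, although the individual support counts of the shorter patterns are no longer available.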

Mining Frequent Service Patterns using Graph (그래프를 이용한 빈발 서비스 탐사)

  • Hwang, Jeong-Hee
    • Journal of Digital Contents Society / v.19 no.3 / pp.471-477 / 2018
  • Users' interests change over time. In this paper, we propose a method that provides suitable services to users in a ubiquitous environment by dynamically weighting service interests according to age, timing, and seasonal changes. Based on the history of services presented to users according to age or season, we also offer useful services by continuously adding the most recent service rules to reflect changes in service interest. To do this, a set of services is treated as a transaction, and each service is treated as an item in that transaction. We also represent the associations among services as a graph and extract frequent service items that reflect the latest information services for users.
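The transaction view described above (a set of services is a transaction, each service an item, associations drawn as a graph) can be sketched as a co-occurrence graph. The abstract does not specify how the graph is built or weighted, so this is an assumed, simplified construction: edges are unordered service pairs, edge weights are plain co-occurrence counts, and the dynamic weighting by age or season is left out. The function name and sample history are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_service_edges(transactions, min_count):
    """Build a co-occurrence graph and keep edges seen at least `min_count` times."""
    edges = Counter()
    for services in transactions:
        # Each unordered pair of services in one transaction adds to one edge.
        for u, v in combinations(sorted(set(services)), 2):
            edges[(u, v)] += 1
    return {edge: count for edge, count in edges.items() if count >= min_count}


# Hypothetical service history: one list of services per transaction.
history = [
    ["news", "weather", "traffic"],
    ["news", "weather"],
    ["weather", "traffic"],
    ["news", "music"],
]
print(frequent_service_edges(history, min_count=2))
```

To favor recent behavior, as the abstract suggests, each transaction could contribute a time-dependent weight instead of 1; that extension is not shown.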

Probabilistic Models for Local Patterns Analysis

  • Salim, Khiat;Hafida, Belbachir;Ahmed, Rahal Sid
    • Journal of Information Processing Systems / v.10 no.1 / pp.145-161 / 2014
  • Many large organizations now have multiple data sources (MDS) distributed over the different branches of an interstate company. Local pattern analysis has become an effective strategy for MDS mining in national and international organizations. It consists of mining the different datasets to obtain frequent patterns, which are forwarded to a central site for global pattern analysis. Various synthesizing models [2,3,4,5,6,7,8,26] have been proposed to build global patterns from the forwarded patterns. The rules synthesized from the forwarded patterns should closely match the mono-mining results (i.e., the results that would be obtained if all of the databases were put together and mined as one). When a pattern is present at a site but fails to satisfy the minimum support threshold, it is not allowed to take part in the pattern synthesizing process. This process can therefore lose some interesting patterns that could help the decision maker make the right decision. For such situations we propose applying a probabilistic model in the synthesizing process. An adequate choice of probabilistic model can improve the quality of the discovered patterns. In this paper, we perform a comprehensive study of the probabilistic models that can be applied in the synthesizing process, and we choose and improve the one that most improves the synthesizing results. Finally, experiments on a public database are presented to assess the efficiency of our proposed synthesizing method.
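The loss described above, where a pattern below a site's minimum support never reaches the synthesizing step, can be made concrete with a small numeric sketch. The synthesizer here is a plain size-weighted average of the reported supports, used only as a simple baseline; it is not one of the cited synthesizing models and not the probabilistic model the paper proposes. The function name, site names, sizes, and supports are all hypothetical.

```python
def synthesize_support(local_reports, db_sizes):
    """Size-weighted average of reported local supports.

    A pattern that a site did not report is treated as having support 0 there.
    """
    total = sum(db_sizes.values())
    patterns = set().union(*(report.keys() for report in local_reports.values()))
    return {
        p: sum(db_sizes[s] * local_reports[s].get(p, 0.0) for s in db_sizes) / total
        for p in patterns
    }


# Hypothetical true local supports at two sites of equal size.
true_supports = {"site1": {"X": 0.30, "Y": 0.04}, "site2": {"X": 0.20, "Y": 0.30}}
db_sizes = {"site1": 1000, "site2": 1000}
min_support = 0.05

# Each site forwards only the patterns that meet its local threshold.
reports = {site: {p: s for p, s in sup.items() if s >= min_support}
           for site, sup in true_supports.items()}

synthesized = synthesize_support(reports, db_sizes)
mono_mining = synthesize_support(true_supports, db_sizes)  # all data pooled
print(synthesized["Y"], mono_mining["Y"])
```

Pattern `Y` has support 0.04 at `site1`, so it is not forwarded, and its synthesized support (0.15) underestimates the mono-mining value (0.17). A probabilistic model would replace the implicit 0 with an estimate of the unreported support.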

WebPR : A Dynamic Web Page Recommendation Algorithm Based on Mining Frequent Traversal Patterns (WebPR :빈발 순회패턴 탐사에 기반한 동적 웹페이지 추천 알고리즘)

  • Yoon, Sun-Hee;Kim, Sam-Keun;Lee, Chang-Hoon
    • The KIPS Transactions:PartB / v.11B no.2 / pp.187-198 / 2004
  • The World-Wide Web is the largest distributed information space and has grown to encompass diverse information resources. However, although the Web is growing exponentially, an individual's capacity to read and digest content is essentially fixed. Web users can be confused by the explosion of Web information, by constantly changing Web environments, and by a lack of understanding of their needs. In these Web environments, mining traversal patterns is an important problem in Web mining, with a host of application domains including system design and information services. Conventional traversal pattern mining systems use the inter-page association in sessions with only a very restricted mechanism (based on a vector or matrix) for generating frequent k-pagesets. We develop a family of novel algorithms (termed WebPR - Web Page Recommend) for mining frequent traversal patterns and then deriving the pagesets to recommend. Our algorithms provide Web users with new page views, which include the pagesets to recommend, so that users can traverse a Web site effectively. The main distinguishing features are a scheme that applies inter-page association consistently when mining frequent traversal patterns and the proposal of the most efficient tree model. Our experiments with two real data sets, from the Lady Asiana and KBS media server sites, clearly validate that our method outperforms conventional methods.
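The general idea of recommending pages from frequent traversal behavior can be sketched with a much simpler technique than the one in the abstract: counting page-to-page transitions within sessions and recommending the most frequent next pages. This first-order transition count is a stand-in chosen for brevity, not the WebPR algorithms or their tree model. The function names, the `min_count` and `k` parameters, and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def recommend(transitions, page, min_count=2, k=3):
    """Return up to k next pages seen at least `min_count` times after `page`."""
    followers = transitions.get(page, Counter())
    return [p for p, count in followers.most_common(k) if count >= min_count]


# Hypothetical click sessions (ordered page visits).
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["home", "shop"],
    ["news", "sports"],
]
transitions = build_transitions(sessions)
print(recommend(transitions, "home"))
print(recommend(transitions, "news"))
```

Longer traversal patterns (frequent k-pagesets) carry more context than single transitions, which is one reason the abstract argues for a tree model over vector- or matrix-based mechanisms.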

An Efficient Candidate Pattern Tree Structure and Algorithm for Incremental Web Mining (점진적인 웹 마이닝을 위한 효율적인 후보패턴 저장 트리구조 및 알고리즘)

  • Kang, Hee-Seong;Park, Byung-Joon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.44 no.1 / pp.71-79 / 2007
  • Recent advances in Internet infrastructure have resulted in a large number of huge Web sites and portals worldwide. These Web sites are visited by various types of users in many different ways. Among all the Web page access sequences from different users, some occur so frequently that they deserve attention from interested parties. We call these frequent access patterns, and we call access sequences that may become frequent candidate patterns. Since candidate patterns play an important role in incremental Web mining, it is important to generate, add, delete, and search for them efficiently. This paper presents a novel tree structure that can efficiently store the candidate patterns, together with a related set of algorithms for generating the tree structure, adding new patterns, deleting unnecessary patterns, and searching for the needed ones. The proposed tree structure has a kind of three-dimensional link structure, and its nodes are layered.
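The operations listed above (add, delete, and search for candidate patterns) can be illustrated with an ordinary prefix tree. This is a deliberately simpler structure than the one in the abstract: it has no three-dimensional links or layered nodes, which the abstract does not describe in enough detail to reproduce. The class name `PatternTrie` and its methods are hypothetical.

```python
class PatternTrie:
    """Prefix tree over page-access sequences; a plain stand-in structure."""

    def __init__(self):
        self.children = {}   # page -> child node
        self.count = 0       # > 0 only where a stored pattern ends

    def add(self, pattern, count=1):
        """Insert `pattern` or increase its stored count."""
        node = self
        for page in pattern:
            node = node.children.setdefault(page, PatternTrie())
        node.count += count

    def search(self, pattern):
        """Return the stored count of `pattern`, or 0 if it is not stored."""
        node = self
        for page in pattern:
            node = node.children.get(page)
            if node is None:
                return 0
        return node.count

    def delete(self, pattern):
        """Remove `pattern` and prune branches that no longer hold any pattern."""
        path = [self]
        for page in pattern:
            child = path[-1].children.get(page)
            if child is None:
                return False
            path.append(child)
        if path[-1].count == 0:
            return False
        path[-1].count = 0
        # Walk back toward the root, dropping nodes that became useless.
        for page, parent, node in zip(reversed(pattern),
                                      reversed(path[:-1]),
                                      reversed(path[1:])):
            if node.count == 0 and not node.children:
                del parent.children[page]
            else:
                break
        return True


trie = PatternTrie()
trie.add(("a", "b", "c"), count=3)
trie.add(("a", "b"))
print(trie.search(("a", "b", "c")), trie.search(("a", "b")), trie.search(("a",)))
```

Patterns are passed as tuples or lists, since `delete` walks them in reverse. Sharing prefixes keeps storage compact when many candidate patterns start with the same pages, which suits incremental mining where patterns are added and removed as new access logs arrive.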