• Title/Summary/Keyword: Frequent Pattern Mining

Search Result 103, Processing Time 0.023 seconds

Performance Evaluation of the FP-tree and the DHP Algorithms for Association Rule Mining (FP-tree와 DHP 연관 규칙 탐사 알고리즘의 실험적 성능 비교)

  • Lee, Hyung-Bong;Kim, Jin-Ho
    • Journal of KIISE:Databases
    • /
    • v.35 no.3
    • /
    • pp.199-207
    • /
    • 2008
  • The FP-tree(Frequency Pattern Tree) mining association rules algorithm was proposed to improve mining performance by reducing DB scan overhead dramatically, and it is recognized that the performance of it is better than that of any other algorithms based on different approaches. But the FP-tree algorithm needs a few more memory because it has to store all transactions including frequent itemsets of the DB. This paper implements a FP-tree algorithm on a general purpose UNK system and compares it with the DHP(Direct Hashing and Pruning) algorithm which uses hash tree and direct hash table from the point of memory usage and execution time. The results show surprisingly that the FP-tree algorithm is poor than the DHP algorithm in some cases even if the system memory is sufficient for the FP-tree. The characteristics of the test data are as follows. The site of DB is look, the number of total items is $1K{\sim}7K$, avenrage length of transactions is $5{\sim}10$, avergage size of maximal frequent itemsets is $2{\sim}12$(these are typical attributes of data for large-scale convenience stores).

Algorithm for Extracting the General Web Search Path Pattern (일반적인 웹 검색 경로패턴 추출 알고리즘)

  • Jang, Min-Seok;Ha, Eun-Mi
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • v.9 no.1
    • /
    • pp.771-773
    • /
    • 2005
  • There have been researches about analyzing the information retrieval patterns of log file to efficiently obtain the users' information research patters in web environment. The methods frequently used in their researches is to suggest the algorithms by which the frequent one is derived from the path traversal patterns in efficient way. But one of their general problems is not to provide the proper solution in case of complex, that is, general topological patterns. Therefore this paper tries to suggest a efficient algorithm after defining the general information retrieval pattern.

  • PDF

User Access Patterns Discovery based on Apriori Algorithm under Web Logs (웹 로그에서의 Apriori 알고리즘 기반 사용자 액세스 패턴 발견)

  • Ran, Cong-Lin;Joung, Suck-Tae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.6
    • /
    • pp.681-689
    • /
    • 2019
  • Web usage pattern discovery is an advanced means by using web log data, and it's also a specific application of data mining technology in Web log data mining. In education Data Mining (DM) is the application of Data Mining techniques to educational data (such as Web logs of University, e-learning, adaptive hypermedia and intelligent tutoring systems, etc.), and so, its objective is to analyze these types of data in order to resolve educational research issues. In this paper, the Web log data of a university are used as the research object of data mining. With using the database OLAP technology the Web log data are preprocessed into the data format that can be used for data mining, and the processing results are stored into the MSSQL. At the same time the basic data statistics and analysis are completed based on the processed Web log records. In addition, we introduced the Apriori Algorithm of Web usage pattern mining and its implementation process, developed the Apriori Algorithm program in Python development environment, then gave the performance of the Apriori Algorithm and realized the mining of Web user access pattern. The results have important theoretical significance for the application of the patterns in the development of teaching systems. The next research is to explore the improvement of the Apriori Algorithm in the distributed computing environment.

A Review of Machine Learning Algorithms for Fraud Detection in Credit Card Transaction

  • Lim, Kha Shing;Lee, Lam Hong;Sim, Yee-Wai
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.9
    • /
    • pp.31-40
    • /
    • 2021
  • The increasing number of credit card fraud cases has become a considerable problem since the past decades. This phenomenon is due to the expansion of new technologies, including the increased popularity and volume of online banking transactions and e-commerce. In order to address the problem of credit card fraud detection, a rule-based approach has been widely utilized to detect and guard against fraudulent activities. However, it requires huge computational power and high complexity in defining and building the rule base for pattern matching, in order to precisely identifying the fraud patterns. In addition, it does not come with intelligence and ability in predicting or analysing transaction data in looking for new fraud patterns and strategies. As such, Data Mining and Machine Learning algorithms are proposed to overcome the shortcomings in this paper. The aim of this paper is to highlight the important techniques and methodologies that are employed in fraud detection, while at the same time focusing on the existing literature. Methods such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), naïve Bayesian, k-Nearest Neighbour (k-NN), Decision Tree and Frequent Pattern Mining algorithms are reviewed and evaluated for their performance in detecting fraudulent transaction.

The Goods Recommendation System based on modified FP-Tree Algorithm (변형된 FP-Tree를 기반한 상품 추천 시스템)

  • Kim, Jong-Hee;Jung, Soon-Key
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.11
    • /
    • pp.205-213
    • /
    • 2010
  • This study uses the FP-tree algorithm, one of the mining techniques. This study is an attempt to suggest a new recommended system using a modified FP-tree algorithm which yields an association rule based on frequent 2-itemsets extracted from the transaction database. The modified recommended system consists of a pre-processing module, a learning module, a recommendation module and an evaluation module. The study first makes an assessment of the modified recommended system with respect to the precision rate, recall rate, F-measure, success rate, and recommending time. Then, the efficiency of the system is compared against other recommended systems utilizing the sequential pattern mining. When compared with other recommended systems utilizing the sequential pattern mining, the modified recommended system exhibits 5 times more efficiency in learning, and 20% improvement in the recommending capacity. This result proves that the modified system has more validity than recommended systems utilizing the sequential pattern mining.

Discovering Association Rules using Item Clustering on Frequent Pattern Network (빈발 패턴 네트워크에서 아이템 클러스터링을 통한 연관규칙 발견)

  • Oh, Kyeong-Jin;Jung, Jin-Guk;Ha, In-Ay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.1
    • /
    • pp.1-17
    • /
    • 2008
  • Data mining is defined as the process of discovering meaningful and useful pattern in large volumes of data. In particular, finding associations rules between items in a database of customer transactions has become an important thing. Some data structures and algorithms had been proposed for storing meaningful information compressed from an original database to find frequent itemsets since Apriori algorithm. Though existing method find all association rules, we must have a lot of process to analyze association rules because there are too many rules. In this paper, we propose a new data structure, called a Frequent Pattern Network (FPN), which represents items as vertices and 2-itemsets as edges of the network. In order to utilize FPN, We constitute FPN using item's frequency. And then we use a clustering method to group the vertices on the network into clusters so that the intracluster similarity is maximized and the intercluster similarity is minimized. We generate association rules based on clusters. Our experiments showed accuracy of clustering items on the network using confidence, correlation and edge weight similarity methods. And We generated association rules using clusters and compare traditional and our method. From the results, the confidence similarity had a strong influence than others on the frequent pattern network. And FPN had a flexibility to minimum support value.

  • PDF

Frequent Items Mining based on Regression Model in Data Streams (스트림 데이터에서 회귀분석에 기반한 빈발항목 예측)

  • Lee, Uk-Hyun
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.1
    • /
    • pp.147-158
    • /
    • 2009
  • Recently, the data model in stream data environment has massive, continuous, and infinity properties. However the stream data processing like query process or data analysis is conducted using a limited capacity of disk or memory. In these environment, the traditional frequent pattern discovery on transaction database can be performed because it is difficult to manage the information continuously whether a continuous stream data is the frequent item or not. In this paper, we propose the method which we are able to predict the frequent items using the regression model on continuous stream data environment. We can use as a prediction model on indefinite items by constructing the regression model on stream data. We will show that the proposed method is able to be efficiently used on stream data environment through a variety of experiments.

Precision Analysis of the STOMP(FW) Algorithm According to the Spatial Conceptual Hierarchy (공간 개념 계층에 따른 STOMP(FW) 알고리즘의 정확도 분석)

  • Lee, Yon-Sik;Kim, Young-Ja;Park, Sung-Sook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.12
    • /
    • pp.5015-5022
    • /
    • 2010
  • Most of the existing pattern mining techniques are capable of searching patterns according to the continuous change of the spatial information of an object but there is no constraint on the spatial information that must be included in the extracted pattern. Thus, the existing techniques are not applicable to the optimal path search between specific nodes or path prediction considering the nodes that a moving object is required to round during a unit time. In this paper, the precision of the path search according to the spatial hierarchy is analyzed using the Spatial-Temporal Optimal Moving Pattern(with Frequency & Weight) (STOPM(FW)) algorithm which searches for the optimal moving path by considering the most frequent pattern and other weighted factors such as time and cost. The result of analysis shows that the database retrieval time is minimized through the reduction of retrieval range applying with the spatial constraints. Also, the optimal moving pattern is efficiently obtained by considering whether the moving pattern is included in each hierarchical spatial scope of the spatial hierarchy or not.

Privacy Preserving Sequential Patterns Mining for Network Traffic Data (사이트의 접속 정보 유출이 없는 네트워크 트래픽 데이타에 대한 순차 패턴 마이닝)

  • Kim, Seung-Woo;Park, Sang-Hyun;Won, Jung-Im
    • Journal of KIISE:Databases
    • /
    • v.33 no.7
    • /
    • pp.741-753
    • /
    • 2006
  • As the total amount of traffic data in network has been growing at an alarming rate, many researches to mine traffic data with the purpose of getting useful information are currently being performed. However, network users' privacy can be compromised during the mining process. In this paper, we propose an efficient and practical privacy preserving sequential pattern mining method on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model and the retention replacement technique. In addition, our method accelerates the overall mining process by maintaining the meta tables so as to quickly determine whether candidate patterns have ever occurred. The various experiments with real network traffic data revealed tile efficiency of the proposed method.

Mining Frequent Sequential Patterns over Sequence Data Streams with a Gap-Constraint (순차 데이터 스트림에서 발생 간격 제한 조건을 활용한 빈발 순차 패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.9
    • /
    • pp.35-46
    • /
    • 2010
  • Sequential pattern mining is one of the essential data mining tasks, and it is widely used to analyze data generated in various application fields such as web-based applications, E-commerce, bioinformatics, and USN environments. Recently data generated in the application fields has been taking the form of continuous data streams rather than finite stored data sets. Considering the changes in the form of data, many researches have been actively performed to efficiently find sequential patterns over data streams. However, conventional researches focus on reducing processing time and memory usage in mining sequential patterns over a target data stream, so that a research on mining more interesting and useful sequential patterns that efficiently reflect the characteristics of the data stream has been attracting no attention. This paper proposes a mining method of sequential patterns over data streams with a gap constraint, which can help to find more interesting sequential patterns over the data streams. First, meanings of the gap for a sequential pattern and gap-constrained sequential patterns are defined, and subsequently a mining method for finding gap-constrained sequential patterns over a data stream is proposed.