• Title/Summary/Keyword: Frequent itemset

Frequent Itemset Search Using LSI Similarity (LSI 유사도를 이용한 효율적인 빈발항목 탐색 알고리즘)

  • Ko, Younhee;Kim, Hyeoncheol;Lee, Wongyu
    • The Journal of Korean Association of Computer Education / v.6 no.1 / pp.1-8 / 2003
  • We introduce an efficient vertical mining algorithm that significantly reduces the search complexity for frequent k-itemsets. The method sorts items by their LSI (Least Support Itemsets) similarity and then searches for frequent itemsets in a tree-based manner. The search tree structure provides several useful heuristics and therefore reduces the search space significantly at early stages. Experimental results on various data sets show that the proposed algorithm improves search performance compared to other algorithms, especially for databases with long patterns.

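The tree-guided vertical search the abstract describes can be sketched as follows. The LSI-similarity ordering is the paper's own contribution and is not reproduced here; this minimal Eclat-style sketch orders items by plain support instead (an assumption), but keeps the depth-first tidset search and early pruning the abstract refers to.

```python
def vertical_mine(transactions, min_support):
    """Depth-first frequent-itemset search over tidsets (Eclat-style).

    Stand-in for the paper's method: items are ordered by plain
    support here, whereas the paper orders them by LSI similarity.
    """
    # Build item -> set of transaction IDs (the vertical layout).
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Keep frequent single items, ordered (assumption: by support).
    items = sorted(
        (i for i, t in tidsets.items() if len(t) >= min_support),
        key=lambda i: len(tidsets[i]),
    )

    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for idx, item in enumerate(candidates):
            tids = prefix_tids & tidsets[item]
            if len(tids) >= min_support:   # otherwise prune the whole subtree
                itemset = prefix + (item,)
                frequent[itemset] = len(tids)
                extend(itemset, tids, candidates[idx + 1:])

    all_tids = set(range(len(transactions)))
    extend((), all_tids, items)
    return frequent

db = [{'a', 'b', 'c'}, {'a', 'c'}, {'a', 'd'}, {'b', 'c', 'e'}]
print(vertical_mine(db, min_support=2))
```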

Multi-Sized cumulative Summary Structure Driven Light Weight in Frequent Closed Itemset Mining to Increase High Utility

  • Siva S;Shilpa Chaudhari
    • Journal of information and communication convergence engineering / v.21 no.2 / pp.117-129 / 2023
  • High-utility itemset mining (HUIM) has emerged as a key data-mining paradigm for object-of-interest identification and recommendation, serving as a frequent itemset identification tool, a product or service recommender, and the like. It has recently gained widespread attention owing to its growing role in business intelligence, top-N recommendation, and other enterprise solutions. Despite their significance, most existing solutions, including frequent itemset mining, HUIM, and high-average and fast high-utility itemset mining, cannot provide swift and accurate predictions and thus fall short of real-time enterprise demands. Moreover, complex computation and high memory exhaustion limit their scalability as enterprise solutions. To address these limitations, this study proposes a model that extracts high-utility frequent closed itemsets based on an improved cumulative summary list structure (CSLFC-HUIM), reducing the set of candidate items in the search space. It also employs the lift score as the minimum threshold, called the cumulative utility threshold, to prune the search space over a nested-list structure, improving computation time, cost, and memory exhaustion. Simulations over different datasets revealed that the proposed CSLFC-HUIM model outperforms existing methods, such as closed- and frequent closed-HUIM variants, in execution time and memory consumption, making it suitable for different mined items and allied business intelligence goals.
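
The abstract's use of a lift score as a pruning threshold can be illustrated in isolation. A minimal sketch, assuming the lift of an itemset is measured against item independence; the paper's cumulative summary list structure is not reproduced here.

```python
from math import prod

def lift(itemset, transactions):
    """Lift of an itemset against item independence:
    lift(X) = P(X) / prod(P(i) for i in X)."""
    n = len(transactions)
    p_joint = sum(1 for t in transactions if itemset <= t) / n
    p_indep = prod(sum(1 for t in transactions if i in t) / n
                   for i in itemset)
    return p_joint / p_indep if p_indep else 0.0

# Prune candidates whose lift falls below a minimum threshold
# (the role the abstract assigns to its cumulative utility threshold).
db = [{'a', 'b'}, {'a', 'b'}, {'a', 'b', 'c'}, {'c'}]
candidates = [{'a', 'b'}, {'a', 'c'}]
kept = [c for c in candidates if lift(c, db) >= 1.0]
print(kept)   # [{'a', 'b'}]
```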

An efficient algorithm to search frequent itemsets using TID Lists (TID List를 이용한 빈발항목의 효율적인 탐색 알고리즘)

  • Ko, Younhee;Kim, Hyeoncheol
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.136-139 / 2002
  • Much research has been conducted to improve the performance of the Apriori algorithm, the best-known method for finding frequent itemsets in association rule mining. This paper proposes a method that maintains a transaction ID list (TIDList) for the k-itemsets generated in each pass over the transaction database (TDB) and uses it to find (k+1)-itemsets efficiently. The method obtains the frequency and TIDList of each frequent (k+1)-itemset (k>0) directly from the TIDLists of the k-itemsets, without any scan of the TDB. This not only greatly reduces the search complexity for finding frequent itemsets but also provides information on how the distribution of frequent itemsets changes over time.

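The step the abstract describes, deriving each (k+1)-itemset's frequency and TID list directly from the k-itemset TID lists without rescanning the TDB, can be sketched as follows (data and names are illustrative):

```python
def join_step(freq_k):
    """Derive (k+1)-itemset TID sets by intersecting the TID sets of
    two k-itemsets sharing a (k-1)-prefix; no database rescan is
    needed, which is the point the abstract makes."""
    freq_next = {}
    itemsets = sorted(freq_k)
    for i in range(len(itemsets)):
        for j in range(i + 1, len(itemsets)):
            a, b = itemsets[i], itemsets[j]
            if a[:-1] == b[:-1]:                 # shared (k-1)-prefix
                tids = freq_k[a] & freq_k[b]     # support comes for free
                freq_next[a + b[-1:]] = tids
    return freq_next

# 1-itemset TID sets, as built during the single TDB scan (illustrative).
f1 = {('a',): {1, 2, 3}, ('b',): {1, 3}, ('c',): {2, 3}}
f2 = {s: t for s, t in join_step(f1).items() if len(t) >= 2}
print(f2)   # {('a', 'b'): {1, 3}, ('a', 'c'): {2, 3}}
```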

Dummy Data Insert Scheme for Privacy Preserving Frequent Itemset Mining in Data Stream (데이터 스트림 빈발항목 마이닝의 프라이버시 보호를 위한 더미 데이터 삽입 기법)

  • Jung, Jay Yeol;Kim, Kee Sung;Jeong, Ik Rae
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.3 / pp.383-393 / 2013
  • Data stream mining is a technique for obtaining useful information by analyzing data generated in real time. Within data stream mining, frequent itemset mining finds the frequent itemsets while the data is being transmitted; these itemsets are used for pattern analysis and marketing in various fields. Existing frequent itemset mining techniques have a problem: when a malicious attacker sniffs the data, the data provider's real-time information is revealed. This problem can be solved by inserting dummy data, so that an attacker cannot distinguish the original data within the transmitted data. In this paper, we propose a method for privacy-preserving frequent itemset mining that uses dummy data insertion. In addition, the proposed method is computationally efficient because it requires no encryption or other mathematical operations.
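
A minimal sketch of the general dummy-insertion idea, not the paper's concrete scheme (the abstract does not specify how dummies are generated or how the miner later discounts them): the provider interleaves dummy items so that a sniffer sees only an undifferentiated stream.

```python
import random

def transmit_with_dummies(real_items, universe, dummy_rate=0.5):
    """Mix dummy items into the outgoing stream so a sniffer cannot
    tell which items the provider actually generated (illustrative)."""
    out = []
    for item in real_items:
        out.append((item, True))            # True = real, kept provider-side
        if random.random() < dummy_rate:
            out.append((random.choice(universe), False))
    random.shuffle(out)
    wire = [item for item, _ in out]        # what the attacker observes
    return out, wire

random.seed(0)
tagged, wire = transmit_with_dummies(['a', 'b', 'c'], universe=list('abcdef'))
print(wire)   # real and dummy items are indistinguishable on the wire
```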

Distributed Incremental Approximate Frequent Itemset Mining Using MapReduce

  • Mohsin Shaikh;Irfan Ali Tunio;Syed Muhammad Shehram Shah;Fareesa Khan Sohu;Abdul Aziz;Ahmad Ali
    • International Journal of Computer Science & Network Security / v.23 no.5 / pp.207-211 / 2023
  • Traditional data mining methods typically assume that the data is small, centralized, memory-resident, and static. This assumption is no longer acceptable because datasets are growing very fast and becoming huge, so there is a fast-growing need to manage data with efficient mining algorithms. In such a scenario it is inevitable to carry out data mining in a distributed environment, and Frequent Itemset Mining (FIM) is no exception; thus the need for an efficient incremental mining algorithm arises. We propose Distributed Incremental Approximate Frequent Itemset Mining (DIAFIM), an incremental FIM algorithm that works in a distributed parallel MapReduce environment; devising such an algorithm is the key contribution of this research.
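
The counting phase of frequent itemset mining maps naturally onto MapReduce, which DIAFIM builds on. A minimal single-process sketch of that phase follows; the incremental and approximate parts of DIAFIM are not detailed in the abstract and are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, k):
    """Mapper: emit (candidate k-itemset, 1) for each transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1

def reduce_phase(pairs, min_support):
    """Reducer: sum the counts per itemset and keep the frequent ones."""
    counts = defaultdict(int)
    for itemset, one in pairs:
        counts[itemset] += one
    return {s: c for s, c in counts.items() if c >= min_support}

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'b', 'c'}, {'a', 'b', 'c'}]
emitted = [kv for t in db for kv in map_phase(t, k=2)]
print(reduce_phase(emitted, min_support=2))
```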

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services / v.16 no.1 / pp.67-74 / 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been studied intensively as an important issue in the data mining field. According to the strategy used to exploit item importance, such approaches are classified as weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weights of the items in large databases and discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, database analysis can reveal the importance of a given transaction, since a transaction receives a higher weight if it contains many items with high weights. We analyze the advantages and disadvantages of the best-known algorithms for frequent itemset mining based on transactional weights, and compare their performance. As a representative of this family, WIS introduced the concept and strategies of transactional weights; other state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, also extract itemsets with weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. Because each WIT-tree node stores item information such as the item and its transaction IDs, the algorithms need no additional database scan once the WIT-tree has been constructed. In particular, whereas traditional algorithms perform many database scans to mine weighted itemsets, the WIT-tree-based algorithms avoid this overhead by reading the database only once. They generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs combines itemsets using the information of the transactions that contain both of them; WIT-FWIs-MODIFY reduces the operations needed to calculate the frequency of each new itemset; and WIT-FWIs-DIFF uses the difference of the two itemsets' transaction sets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (dense and sparse) and measure runtime and maximum memory usage; a scalability test also evaluates each algorithm's stability as the database size changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY perform best on the dense dataset, while on the sparse dataset WIT-FWIs-DIFF mines more efficiently than the others. Compared to the WIT-tree-based algorithms, WIS, which is based on the Apriori technique, is the least efficient because it requires far more computation on average.
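
A minimal sketch of the transactional-weight idea the abstract analyzes, assuming the common definitions from this literature: a transaction's weight is the mean weight of its items, and an itemset's weighted support is the normalized sum of the weights of the transactions containing it.

```python
def transaction_weight(transaction, item_weights):
    """Weight of a transaction: the mean weight of its items, so a
    transaction of many high-value items scores high (per the abstract)."""
    return sum(item_weights[i] for i in transaction) / len(transaction)

def weighted_support(itemset, transactions, item_weights):
    """Weighted support: sum of weights of transactions containing the
    itemset, normalized by the total transaction weight."""
    weights = [transaction_weight(t, item_weights) for t in transactions]
    covered = sum(w for t, w in zip(transactions, weights) if itemset <= t)
    return covered / sum(weights)

item_w = {'a': 0.9, 'b': 0.7, 'c': 0.2}
db = [{'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
print(weighted_support({'a', 'b'}, db, item_w))   # about 0.583
```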

PRMS: Page Reallocation Method for SSDs (PRMS: SSDs에서의 Page 재배치 방법)

  • Lee, Dong-Hyun;Roh, Hong-Chan;Park, Sang-Hyun
    • The KIPS Transactions:PartD / v.17D no.6 / pp.395-404 / 2010
  • Solid-state disks (SSDs) are currently considered a promising candidate to replace hard disks due to their significantly shorter access time, low power consumption, and shock resistance. SSDs, however, have the drawback that random writes decrease their write throughput and life span, nearly regardless of the SSD controller design. Previous studies have mostly focused on better SSD controller designs and on reducing the number of write operations sent to SSDs. We suggest another method that reallocates data pages that tend to be written simultaneously into contiguous blocks. Our method gathers write operations over a period of time and generates write traces. After transforming each trace into a set of transactions, it mines frequent itemsets from the transactions and reallocates the pages of those frequent itemsets. In addition, we introduce an algorithm that reallocates the pages of the frequent itemsets with moderate time complexity. Experiments using the TPC-C workload demonstrated that our method reduces total logical block accesses by 6%.
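
The pipeline the abstract outlines, windowing the write trace into transactions and co-locating frequently co-written pages, can be sketched as follows. The window size, the greedy placement, and the hard-coded frequent sets are illustrative assumptions; any frequent itemset miner can supply the sets.

```python
def traces_to_transactions(write_trace, window):
    """Split a time-ordered write trace (page IDs) into fixed-size
    windows, each treated as one transaction of co-written pages."""
    return [set(write_trace[i:i + window])
            for i in range(0, len(write_trace), window)]

def reallocation_plan(frequent_page_sets, pages_per_block):
    """Greedy placement: pages frequently written together go into the
    same block, so a later overwrite invalidates whole blocks at once."""
    plan, placed = [], set()
    for pages in sorted(frequent_page_sets, key=len, reverse=True):
        group = [p for p in pages if p not in placed][:pages_per_block]
        if group:
            plan.append(group)
            placed.update(group)
    return plan

trace = [5, 9, 5, 9, 2, 5, 9, 2, 7]
print(traces_to_transactions(trace, window=3))
# Frequent sets hard-coded for illustration; a miner would produce them.
print(reallocation_plan([{5, 9}, {2}], pages_per_block=4))
```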

Association Rule Discovery using TID List Table (TID 리스트 테이블을 이용한 연관 규칙 탐사)

  • Chai, Duck-Jin;Hwang, Bu-Hyun
    • Journal of KIISE:Databases / v.32 no.3 / pp.219-227 / 2005
  • In this paper, we propose an efficient algorithm that generates frequent itemsets with only one database scan. A frequent itemset is a subset of an itemset accessed by a transaction. If, for each item, information about the transactions accessing that item is available, frequent itemsets can be generated simply by extracting items that share transaction IDs. The method proposed in this paper builds, in a single database scan, a data structure that stores the transaction IDs for each item, and generates the 2-frequent itemsets at the same time using a hash technique. The k-frequent itemsets (k ≥ 3) are then found simply by comparing the previously generated data structure and the transaction IDs. The proposed algorithm can thus generate frequent itemsets efficiently with only one database scan.
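
A minimal sketch of the single-scan construction the abstract describes: per-item TID lists and hash-counted 2-itemsets built in the same pass. The k ≥ 3 itemsets would then come from comparing TID lists, as in the TID-list example earlier.

```python
from collections import defaultdict
from itertools import combinations

def one_scan_build(transactions, min_support):
    """Single pass over the DB: build the TID-list table for every item
    and hash-count all 2-itemsets at the same time (per the abstract)."""
    tid_table = defaultdict(set)      # item -> TID list (as a set)
    pair_counts = defaultdict(int)    # hash table of 2-itemsets
    for tid, t in enumerate(transactions):
        for item in t:
            tid_table[item].add(tid)
        for pair in combinations(sorted(t), 2):
            pair_counts[pair] += 1
    frequent2 = {p: c for p, c in pair_counts.items() if c >= min_support}
    return tid_table, frequent2

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'b', 'c'}]
tid_table, f2 = one_scan_build(db, min_support=2)
print(dict(tid_table), f2)
```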

High Utility Itemset Mining by Using Binary PSO Algorithm with V-shaped Transfer Function and Nonlinear Acceleration Coefficient Strategy

  • Tao, Bodong;Shin, Ok Keun;Park, Hyu Chan
    • Journal of information and communication convergence engineering / v.20 no.2 / pp.103-112 / 2022
  • The goal of pattern mining is to identify novel patterns in a database. High utility itemset mining (HUIM) is one research direction within pattern mining; unlike frequent itemset mining (FIM), it additionally considers the quantity and profit of each item. Several algorithms have been used to mine high utility itemsets (HUIs). The original BPSO algorithm lacks local search capability in its later stages, so too few HUIs are mined. Compared to the transfer function used in the original BPSO algorithm, a V-shaped transfer function better reflects the relationship between a particle's velocity and the probability of a position change. Considering the influence of the acceleration coefficients on the particles' motion and trajectories, a nonlinear acceleration strategy is used to enhance the particles' search ability. Experiments show that the number of mined HUIs is 73% higher than with the original BPSO algorithm, indicating the better performance of the proposed algorithm.
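
A minimal sketch of the two ingredients named in the title, under assumed forms since the abstract gives no formulas: |tanh(v)| as the V-shaped transfer function driving the bit-flip rule, and a polynomially decaying acceleration coefficient.

```python
import math
import random

def v_transfer(v):
    """A common V-shaped transfer function: |tanh(v)| maps a velocity
    to a flip probability (the paper's exact form is an assumption here)."""
    return abs(math.tanh(v))

def update_bit(bit, velocity):
    """Binary PSO position update: flip the bit with probability V(v)."""
    return 1 - bit if random.random() < v_transfer(velocity) else bit

def acceleration(c_start, c_end, t, t_max, power=2):
    """Nonlinear acceleration coefficient decaying from c_start to c_end
    over t_max iterations (assumed polynomial decay)."""
    return c_end + (c_start - c_end) * (1 - t / t_max) ** power

random.seed(1)
print([update_bit(0, v) for v in (-2.0, 0.1, 3.0)])
print(acceleration(2.5, 0.5, t=50, t_max=100))   # 1.0 at mid-run
```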

An Efficient Algorithm for Mining Frequent Closed Itemsets Using Transaction Link Structure (트랜잭션 연결 구조를 이용한 빈발 Closed 항목집합 마이닝 알고리즘)

  • Han, Kyong Rok;Kim, Jae Yearn
    • Journal of Korean Institute of Industrial Engineers / v.32 no.3 / pp.242-252 / 2006
  • Data mining is the exploration and analysis of huge amounts of data to discover meaningful patterns, and association rule mining is one of its most important problems. Recent studies of association rule mining have proposed a closure mechanism: it is no longer necessary to mine the set of all frequent itemsets and their association rules; rather, it is sufficient to mine the frequent closed itemsets and their corresponding rules. In the past, algorithms for mining frequent closed itemsets have mostly been based on items. In this paper, we use the transactions themselves to mine frequent closed itemsets: an efficient algorithm is proposed that is based on a link structure between transactions. Our experimental results show that our algorithm is faster than previously proposed methods, and our approach is significantly more efficient for dense databases.
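
The closure test underlying frequent closed itemset mining can be sketched directly: the closure of an itemset is the intersection of all transactions containing it, and the itemset is closed iff that intersection adds nothing. The paper's transaction link structure accelerates exactly this kind of computation; the sketch does not reproduce that structure.

```python
def closure(itemset, transactions):
    """Closure of an itemset: the intersection of every transaction
    that contains it (the basis of closed-itemset mining)."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return set(itemset)
    closed = set(covering[0])
    for t in covering[1:]:
        closed &= t
    return closed

def is_closed(itemset, transactions):
    """An itemset is closed iff no superset has the same support."""
    return closure(itemset, transactions) == set(itemset)

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'b', 'd'}]
print(is_closed({'a'}, db))        # False: every transaction with 'a' has 'b'
print(is_closed({'a', 'b'}, db))   # True
```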