• Title/Summary/Keyword: Apriori algorithm

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items, and their contiguous sequences ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods are based on the Apriori algorithm. In particular, the PrefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because it expands sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, based on the idea of PrefixSpan, to significantly reduce its recursive process. However, that algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
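
To make the problem statement concrete, here is a naive Python sketch of what "maximal frequent contiguous sequence" means. It is not the fixed-length spanning tree method of the paper, nor MacosVSpan; it simply enumerates every contiguous substring, so it is only practical for short inputs. The function names and the sample sequences are hypothetical.

```python
def frequent_contiguous(sequences, min_support):
    """Map each contiguous substring to the number of sequences containing it,
    keeping only those that reach min_support (naive enumeration)."""
    counts = {}
    for seq in sequences:
        # Collect distinct substrings so each sequence is counted once.
        seen = {seq[i:j]
                for i in range(len(seq))
                for j in range(i + 1, len(seq) + 1)}
        for sub in seen:
            counts[sub] = counts.get(sub, 0) + 1
    return {s: n for s, n in counts.items() if n >= min_support}


def maximal(frequent):
    """Keep only patterns that are not contained in a longer frequent pattern."""
    return {s: n for s, n in frequent.items()
            if not any(s != t and s in t for t in frequent)}
```

Calling `maximal(frequent_contiguous(["ACGTAC", "TACGTA", "GACGTT"], 3))` on these made-up sequences leaves a single pattern, `ACGT`, which occurs in all three. The quadratic number of substrings per sequence is the cost that dedicated algorithms aim to avoid.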

Analysis of efficiency of FP-Growth algorithm based on data cardinality (데이터 카디널리티에 따른 FP-Growth 알고리즘의 효율성 분석)

  • Kim, Jin-Hyung;Kim, Byoung-Wook
    • Annual Conference of KIPS / 2019.05a / pp.33-35 / 2019
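  • Association rule analysis examines the associations among different itemsets. The Apriori algorithm is the representative method, but it can require many database scans, and candidate set generation can slow it down. We implement the FP-Growth algorithm, which addresses these inefficiencies, and study its speed on arbitrary data.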
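
The two drawbacks named in the abstract, repeated database scans and candidate generation, can be seen in a minimal Apriori sketch. This is an illustrative Python version, not the implementation evaluated in the paper; the function name `apriori`, the absolute `min_support` count, and the returned `scans` counter are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining.

    Returns (frequent, scans): a dict mapping each frequent itemset to its
    support count, and the number of counting passes over the database.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    scans = 0
    # Level-1 candidates: every distinct item.
    candidates = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates:
        # One full pass over the database per level.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: join frequent k-itemsets, then prune any
        # candidate that has an infrequent k-subset (anti-monotone property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent, scans
```

Each level costs one pass over all transactions, and the join step can produce many candidates when there are many frequent items. FP-Growth avoids both by compressing the database into a tree and mining it without candidate generation.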

Creation of Frequent Patterns using K-means Algorithm for Data Mining Preprocess (데이터 마이닝의 전처리를 위한 K-means 알고리즘을 이용한 빈발패턴 생성)

  • Heui-Jong Yoo;Chi-Yeon Park
    • Annual Conference of KIPS / 2008.11a / pp.336-339 / 2008
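  • The databases we use hold large amounts of data, and the volume keeps growing. Unlike the basic, simple information that queries return from such data, data mining is a way to obtain higher-level information. Among data mining techniques, this paper uses the k-means algorithm to cluster transactions and thereby reduce the number of transactions in the database, aiming to mitigate the performance degradation caused by transaction scans, a known weakness of Apriori, the representative association rule algorithm.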

Automatic Error Detection of Morpho-syntactic Errors of English Writing Using Association Rule Analysis Algorithm (연관 규칙 분석 알고리즘을 활용한 영작문 형태.통사 오류 자동 발견)

  • Kim, Dong-Sung
    • Annual Conference on Human and Language Technology / 2010.10a / pp.3-8 / 2010
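  • This study generates association rules from refined data on English writing error types collected in a series of earlier studies, and uses the rules whose usefulness has been verified through learning to automatically detect morpho-syntactic errors in English writing data. Finding morpho-syntactic errors in English writing data takes considerable time and resources, so automation is essential. Whereas previous studies either concentrate on lexical errors using statistical models or focus on syntactic processing grounded in a linguistic theoretical framework, this study generates association rules from refined data through data mining, validates them, and then detects morpho-syntactic errors. Earlier studies require building large knowledge bases, such as rules fitted to a theoretical framework or large corpus data for language model construction, whereas this study uses a small amount of refined data. The Apriori algorithm was used to generate the morpho-syntactic association rules for English writing error types. Because some of the rules produced by the algorithm may be incorrect, only valid rules were learned, using statistical validation of rule usefulness such as correlation tests and cosine similarity. The accumulated association rules were then used in an experiment on automatically detecting English writing errors.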
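
The rule validation step can be illustrated with the cosine measure, one of the two checks the abstract names. The sketch below is a hedged Python example: the threshold, the rule labels, and the function names are hypothetical, and the correlation test is not shown.

```python
from math import sqrt


def cosine(n_ab, n_a, n_b):
    """Cosine measure of a rule A -> B from counts:
    n_ab = transactions with both A and B, n_a and n_b = with A, with B."""
    if n_a == 0 or n_b == 0:
        return 0.0
    return n_ab / sqrt(n_a * n_b)


def keep_valid_rules(rules, threshold):
    """Keep rules whose cosine measure reaches the threshold.

    `rules` maps an (antecedent, consequent) pair to (n_ab, n_a, n_b).
    """
    kept = {}
    for rule, counts in rules.items():
        score = cosine(*counts)
        if score >= threshold:
            kept[rule] = score
    return kept
```

A value near 1 means the antecedent and consequent almost always occur together, while a value near 0 suggests the rule is a by-product of frequent items and can be discarded.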

An Analysis on the Predictor Keyword of Successful Aging: Focused on Data Mining (데이터마이닝을 활용한 성공적 노후 예측 키워드 분석)

  • Hong, Seo-Youn
    • The Journal of the Korea Contents Association / v.20 no.3 / pp.223-234 / 2020
  • This research applies association rule analysis, using the Apriori algorithm from data mining, to 32 predictive keywords extracted from Hong (2019) that affect successful aging in Korea. To examine the rules and patterns among those keywords, or predictive variables, it uses support, confidence, and lift. The data were analyzed with R version 3.5.1 and visualized using the arulesViz package and visNetwork. The variables most highly associated with successful aging in Korea were 'hobby', 'volunteer service', 'preparation', and 'exercise'. The research concludes that the variable to consider first for successful aging in Korea is 'hobby', followed by 'volunteer service', 'preparation', and 'exercise'.
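
The three measures named in the abstract have standard definitions, sketched here in Python for a single rule. The study itself used R, so this is only an illustration; the function name `rule_metrics` and the toy transactions in the usage note are hypothetical and are not the study's data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = P(A and C)
    confidence = P(A and C) / P(A)
    lift       = confidence / P(C)
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_c = sum(1 for t in transactions if c <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift
```

For example, with the four made-up transactions `[{"hobby", "exercise"}, {"hobby"}, {"exercise"}, {"hobby", "exercise"}]`, the rule `{"hobby"} -> {"exercise"}` has support 0.5 and confidence 2/3. A lift above 1 indicates that the two sides occur together more often than independence would predict; here the lift is below 1.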

Frequently Occurred Information Extraction from a Collection of Labeled Trees (라벨 트리 데이터의 빈번하게 발생하는 정보 추출)

  • Paik, Ju-Ryon;Nam, Jung-Hyun;Ahn, Sung-Joon;Kim, Ung-Mo
    • Journal of Internet Computing and Services / v.10 no.5 / pp.65-78 / 2009
  • The most commonly adopted approach to finding valuable information in tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications, such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms have recently been proposed to find such patterns. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent tree patterns in massive tree datasets. Some of the major problems are due to (1) modeling data as a hierarchical tree structure, (2) the high computational cost of candidate maintenance, (3) repeated scans of the input dataset, and (4) high memory dependency. These problems stem from the fact that most of these algorithms are based on the well-known Apriori algorithm and use the anti-monotone property for candidate generation and frequency counting. To solve these problems, we adopt a pattern-growth approach rather than the Apriori approach, and we extract maximal frequent subtree patterns instead of all frequent subtree patterns. The proposed method not only removes the step of pruning infrequent subtrees but also eliminates the problem of generating candidate subtrees entirely. Hence, it significantly improves the whole mining process.

A Partition Mining Method of Sequential Patterns using Suffix Checking (서픽스 검사를 이용한 단계적 순차패턴 분할 탐사 방법)

  • 허용도;조동영;박두순
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.590-598 / 2002
  • For efficient sequential pattern mining, we need to reduce both the cost of generating candidate patterns and the search space for the generated ones. Although Apriori-like methods such as GSP[8] are simple, they generate many candidate patterns and repeatedly scan a large database. PrefixSpan[2], proposed as an alternative to GSP, constructs prefix-projected databases that are partitioned stepwise during mining. This reduces the search space for estimating the support of candidate patterns, but the cost of constructing the projected databases is still high. To solve these problems, we propose SuffixSpan (Suffix checked Sequential Pattern mining), a new sequential pattern mining method. It generates small candidate pattern sets at low cost using the partition property and the suffix property, and it uses 1-prefix projected databases as the search space to reduce the cost of estimating the support of candidate patterns.
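
The prefix-projected databases mentioned in the abstract can be sketched as follows. This is a simplified PrefixSpan-style recursion in Python for sequences of single items; it is not SuffixSpan, whose partition and suffix properties the abstract does not spell out. The names `project` and `prefixspan` are chosen for the example.

```python
def project(database, item):
    """Prefix-projected database: for each sequence containing `item`,
    keep the non-empty suffix after its first occurrence."""
    projected = []
    for seq in database:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


def prefixspan(database, min_support, prefix=()):
    """Grow patterns one item at a time, counting support only inside
    the projected database of the current prefix."""
    counts = {}
    for seq in database:
        for item in set(seq):  # count each sequence once per item
            counts[item] = counts.get(item, 0) + 1
    patterns = {}
    for item, n in sorted(counts.items()):
        if n >= min_support:
            new_prefix = prefix + (item,)
            patterns[new_prefix] = n
            patterns.update(
                prefixspan(project(database, item), min_support, new_prefix))
    return patterns
```

Every recursive call builds a new projected database, which is the construction cost the abstract refers to. Restricting projection to 1-prefix databases, as SuffixSpan is described as doing, would limit that cost to the first level.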

SME Bakery's Marketing Strategies Based on Apriori Algorithm (Apriori 알고리즘 기반의 중소 베이커리 기업의 대응 전략)

  • Kim, Do Hoon;Lee, Hyeon June;Lee, Bong Gyou
    • Journal of Convergence for Information Technology / v.12 no.4 / pp.328-337 / 2022
  • The importance of online marketing is growing due to the prevalence of COVID-19. To respond to the changing business environment, we collected ten years of sales data from an SME bakery company that experienced a decrease in sales due to COVID-19. The analysis shows that switching from offline markets to omnichannel B2B and B2C markets, and moving from 'small quantity batch production' to 'mass production in a small variety', can improve management. This study presents online and offline marketing strategies through data analysis of small and medium-sized bakery companies, which have relatively insufficient digital capabilities compared to large companies, and could serve as a guideline for many SMEs.

A Personalized Clothing Recommender System Based on the Algorithm for Mining Association Rules (연관 규칙 생성 알고리즘 기반의 개인화 의류 추천 시스템)

  • Lee, Chong-Hyeon;Lee, Suk-Hoon;Kim, Jang-Won;Baik, Doo-Kwon
    • Journal of the Korea Society for Simulation / v.19 no.4 / pp.59-66 / 2010
  • We present a personalized clothing recommender system that mines association rules from transactions described in ontologies and infers recommendations from the rules. The system can forecast frequently changing clothing trends using the Onto-Apriori algorithm, and it makes appropriate recommendations for each user through inference marked as meta nodes. We simulate the rule generator and the inferential search engine of the system with a focus on accuracy and efficiency, and our results validate the system.

An efficient algorithm to search frequent itemsets using TID Lists (TID List를 이용한 빈발항목의 효율적인 탐색 알고리즘)

  • 고윤희;김현철
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.136-139 / 2002
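  • Much research has sought to improve the performance of the Apriori algorithm, the best-known method for finding frequent itemsets in association rule mining. This paper proposes a method that maintains a transaction ID list (TIDList) for each k-itemset generated in each pass over the transaction database (TDB) and uses these lists to find (k+1)-itemsets efficiently. The method obtains the frequency and TIDList of each frequent (k+1)-itemset (k>0) directly from the TIDLists of the k-itemsets, with no scan of the TDB at all. This not only greatly reduces the search complexity of finding frequent itemsets but also provides information on how the frequent itemsets are distributed over time.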
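
The core idea, deriving a (k+1)-itemset's support from the TID lists of k-itemsets, can be sketched with set intersection. The Python below is a generic vertical-format illustration under that assumption and may differ from the paper's algorithm in detail; the names `build_tidlists` and `extend` are hypothetical.

```python
def build_tidlists(transactions, min_support):
    """Single scan of the database: map each frequent 1-itemset
    to the set of transaction IDs that contain it."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(frozenset([item]), set()).add(tid)
    return {s: t for s, t in tidlists.items() if len(t) >= min_support}


def extend(tidlists, min_support):
    """Derive frequent (k+1)-itemsets from k-itemset TID lists.

    The TID list of a union is the intersection of the TID lists,
    so no further database scan is needed.
    """
    result = {}
    itemsets = list(tidlists)
    for i, a in enumerate(itemsets):
        for b in itemsets[i + 1:]:
            union = a | b
            if len(union) == len(a) + 1:
                tids = tidlists[a] & tidlists[b]
                if len(tids) >= min_support:
                    result[union] = tids
    return result
```

The support of an itemset is simply the length of its TID list. If transaction IDs are assigned in time order, the list also shows when the itemset occurred, which is the distribution information the abstract mentions.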
