• Title/Summary/Keyword: association rule algorithm

Technique for Improving performance of FP-Tree and DRFP (FP-Tree 및 DRFP 의 성능 개선 기법)

  • Cho, Kyung Soo;Jeong, Jae-ho;Kim, Young Hee;Kim, Ung-mo
    • Annual Conference of KIPS / 2010.04a / pp.844-847 / 2010
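  • FP-tree improves the overall performance of association rule algorithms and reduces the number of database scans to just two. However, it requires a large amount of memory because it keeps the frequent items and the tree information for all transactions resident in memory. The DRFP algorithm that followed solved the memory requirement problem by storing the data on a storage device, but unlike FP-tree it suffered in time performance. We therefore propose the NRFP-tree (Nare disc-Resident Frequent pattern Tree) to remedy these problems.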
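
The two-scan construction mentioned in this abstract can be sketched as follows. This is a minimal in-memory illustration of a standard FP-tree build, not the NRFP-tree or DRFP method of the paper; the names `FPNode`, `build_fp_tree`, and `count_nodes` and the sample transactions are assumptions made for the example.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and child links."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count the support of every item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Scan 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (ties broken alphabetically).
    root = FPNode()
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, frequent


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical sample data for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "d"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
```

Shared prefixes such as `a, b` are stored once, which is why the tree is compact; the whole structure still lives in memory, which is the cost the abstract points to.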

An analysis of operation status depending on the characteristics of R&D projects in Sciences and Engineering universities (이공계 대학 연구과제 특성 별 운영 형태 현황)

  • Lee, Sang-Soog;Yoo, Inhyeok;Kim, Jinhee
    • Journal of Digital Convergence / v.20 no.4 / pp.93-100 / 2022
  • This study aimed to understand the current status of R&D operations at science and engineering universities (SEUs) according to research project characteristics (e.g., stages and characteristics), and to provide implications for future university R&D support systems and related policies. An online survey of SEU R&D recipients was conducted from October 4 to November 5, 2021. Analysis of 445 valid responses with the Apriori algorithm yielded 16 association rules for R&D operation according to research project characteristics. The rules show that, regardless of research characteristics, SEU R&D projects, particularly in applied research, were funded or operated under the leadership of government or public institutions. In basic research, individual researchers had more autonomy in determining research topics; yet these projects had a short duration (3 years) and an evaluation period unit of more than 3 years. These findings provide empirical evidence of the relationships among the variables involved in operating SEU R&D.

When is the best time to run SNS AD per topic?: through conversation data analysis (SNS 대화 분석을 통한 주제별 적합 광고 시간대 도출)

  • Lee, Jimin;Jeon, Yerim;Lee, Jisun;Woo, Jiyoung
    • Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.335-336 / 2022
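  • This paper presents a method for predicting suitable SNS advertising time slots by category, using the time of day and the conversation topic. The analysis makes it possible to suggest suitable advertising times to advertisers. We used apriori, an association rule analysis algorithm. The topics were narrowed to commerce (shopping), beauty and health, current affairs/education, food and beverage, and leisure. The association analysis showed that conversations about beauty and health were most active at 18:00, 17:00, and 16:00. Conversations about commerce (shopping) were most active at 14:00, 16:00, and 17:00, in that order; current affairs/education at 15:00, 17:00, and 16:00; and food and beverage at 18:00, 17:00, and 19:00. Finally, leisure conversations peaked at 22:00, 23:00, and 21:00, confirming that the busiest time slots differ by conversation topic. As a result, consumers can be recommended suitable advertisements at appropriate times.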

An Item-based Collaborative Filtering Technique by Associative Relation Clustering in Personalized Recommender Systems (개인화 추천 시스템에서 연관 관계 군집에 의한 아이템 기반의 협력적 필터링 기술)

  • 정경용;김진현;정헌만;이정현
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.467-477 / 2004
  • Recommender systems were once used by only a few E-commerce sites, but they are now becoming serious business tools that are reshaping the world of E-commerce. Collaborative filtering has been a very successful recommendation technique in both research and practice. However, personalized recommender systems face two problems: the First-Rating problem and the Sparsity problem. In this paper, we address these problems using associative relation clustering and the “Lift” of association rules. We compute the “Lift” between items from users' rating data and apply a threshold by α-cut to the associations between items. To make the associative relation clusters more efficient, we use not only the existing Hypergraph Clique Clustering algorithm but also the proposed Split Cluster method. Once the clusters are complete, we calculate the similarity between items within each cluster, and the resulting index is saved in the database for fast access. We apply this index to predict preferences for new items. To evaluate performance, the proposed method is compared with existing collaborative filtering techniques. The results show that the proposed method improves prediction accuracy by solving the problems of existing collaborative filtering techniques.
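
As an illustration of the “Lift” measure used here, the sketch below computes lift(A, B) = support(A and B) / (support(A) × support(B)) for every item pair and keeps the pairs above a cutoff. It assumes each user's set of rated items is treated as one transaction and uses a plain threshold of 1.0; the function name `pairwise_lift`, the sample data, and the cutoff are hypothetical and do not reproduce the paper's clustering method.

```python
from itertools import combinations


def pairwise_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair that co-occurs."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        items = sorted(set(basket))
        for i in items:
            item_count[i] = item_count.get(i, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    return {
        (a, b): (c / n) / ((item_count[a] / n) * (item_count[b] / n))
        for (a, b), c in pair_count.items()
    }


# Hypothetical data: the items each user has rated.
ratings = [
    {"item1", "item2"},
    {"item1", "item2", "item3"},
    {"item3"},
    {"item1", "item2"},
]
lift = pairwise_lift(ratings)

# Keep only positively associated pairs (lift above 1.0, an assumed cutoff).
threshold = 1.0
linked = {pair for pair, value in lift.items() if value > threshold}
```

A lift above 1 means the two items are rated together more often than independence would predict, which is what makes it usable as an association strength between items.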

An Algorithm for reducing the search time of Frequent Items (빈발 항목의 탐색 시간을 단축하기 위한 알고리즘)

  • Yun, So-Young;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.1 / pp.147-156 / 2011
  • As information systems have become more widely used, methods for quickly picking out necessary products from large amounts of data have been studied. Association rule search methods for finding hidden patterns have drawn much attention, and the Apriori algorithm is a major one. However, the Apriori algorithm increases search time because of its repeated scans. This paper proposes an algorithm that reduces the search time for frequent items. The proposed algorithm builds a matrix from the transaction database and searches for frequent items using the mean number of items per transaction in the matrix and a defined minimum support. The mean number of items per transaction is used to reduce the number of transactions, and the minimum support is used to cut down the number of items. The performance of the proposed algorithm is assessed by comparing its search time and precision with those of existing algorithms. The results indicate that the proposed algorithm extracts the final frequent items more quickly and efficiently than the existing Apriori and Matrix algorithms.
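
The repeated scans attributed to Apriori above can be seen in a minimal level-wise sketch. This is the baseline algorithm only, not the matrix-based method the paper proposes; the function name `apriori`, the pass counter, and the sample transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({itemset: support count}, number of support-counting passes)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the database per candidate length k.
        scans += 1
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates,
        # then prune any candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Hypothetical sample data for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
frequent, scans = apriori(transactions, min_support=2)
```

Each candidate length costs one more pass over all transactions, so the number of passes grows with the size of the longest frequent itemset.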

Transaction Pattern Discrimination of Malicious Supply Chain using Tariff-Structured Big Data (관세 정형 빅데이터를 활용한 우범공급망 거래패턴 선별)

  • Kim, Seongchan;Song, Sa-Kwang;Cho, Minhee;Shin, Su-Hyun
    • The Journal of the Korea Contents Association / v.21 no.2 / pp.121-129 / 2021
  • In this study, we aim to minimize tariff risk by building a hazardous cargo screening model with Association Rule Mining, a data mining technique. The risk level between supply chains is calculated with the Apriori algorithm, an association analysis algorithm, from big data of import declaration forms of the Korea Customs Service (KCS). We perform data preprocessing and association rule mining to generate a model for screening supply chains. In preprocessing, we remove errors from the import declaration data and then extract the attributes required for rule generation. We then generate rules by feeding the extracted attributes to the Apriori algorithm. The resulting association rule model is loaded into the KCS screening system. When an import declaration to be checked is received, the screening system consults the model and returns a confidence value based on the supply chain information in the declaration. This value is used to decide whether to inspect the import case. Import declaration data covering 2 years and 6 months were divided into training and test data, and 5-fold cross-validation yielded 16.6% precision and 33.8% recall. These results are about 3.4 times higher in precision and 1.5 times higher in recall than frequency-based methods, confirming that the proposed method is an effective way to reduce tariff risk.

Competitor Extraction based on Machine Learning Methods (기계학습 기반 경쟁자 자동추출 방법)

  • Lee, Chung-Hee;Kim, Hyun-Jin;Ryu, Pum-Mo;Kim, Hyun-Ki;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology / 2012.10a / pp.107-112 / 2012
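  • This paper concerns a method for automatically extracting, as competitors, proper nouns that stand in a competitive relationship in general text; it proposes and compares both a rule-based method and a machine-learning-based method. The proposed system targets news articles and aims to extract competitors only when a sentence contains explicit information indicating a competitive relationship. The rule-based competitor extraction system extracts competitors based on cue words indicating that two proper nouns are in competition; 620 such cue words were collected and used. The machine-learning-based system treats competitor extraction as a binary classification problem over competitor candidates. Support Vector Machines were used as the classification algorithm, and the model was trained on five language-independent features that represent the context surrounding the competitors. For evaluation, a test set was built by collecting 623 competitors from news articles for 54 trending hot keywords. As a baseline for comparison, a system that extracts competitors based on related words was implemented; it obtained Recall/Precision/F1 of 0.119/0.214/0.153. In the experiments, the rule-based system scored 0.793/0.207/0.328 and the machine-learning-based system scored 0.578/0.730/0.645. The rule-based system had the best Recall at 0.793, a 67.4% improvement over the baseline. The machine-learning-based system had the best Precision and F1 at 0.730 and 0.645, improvements of 61.6% and 49.2%, respectively, over the baseline. Because the proposed systems substantially improve Recall, Precision, and F1 over the baseline, the proposed methods are shown to be effective.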
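
The F1 figures in this abstract follow from the reported Recall and Precision through the usual harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them; the function name `f1_score` and the dictionary labels are chosen here for illustration, while the numbers are the ones quoted in the abstract.

```python
def f1_score(recall, precision):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) as quoted in the abstract.
reported = {
    "baseline": (0.119, 0.214, 0.153),
    "rule-based": (0.793, 0.207, 0.328),
    "machine-learning": (0.578, 0.730, 0.645),
}

# Recompute F1 for each system and round to three decimals for comparison.
computed = {
    name: round(f1_score(recall, precision), 3)
    for name, (recall, precision, _) in reported.items()
}
```

The rule-based system's high Recall is offset by its low Precision, which is why its F1 stays well below that of the machine-learning-based system.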

Context Prediction based on Sequence Matching for Contexts with Discrete Attribute (이산 속성 컨텍스트를 위한 시퀀스 매칭 기반 컨텍스트 예측)

  • Choi, Young-Hwan;Lee, Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.4 / pp.463-468 / 2011
  • Context prediction methods have been developed along two lines: prediction for discrete contexts and prediction for continuous contexts. Because most prediction methods use algorithms suited to the environment and context characteristics of a specific domain, it is difficult to predict a user's context across varied environments and characteristics. This study proposes a context prediction method that works for both discrete and continuous contexts without being limited to the characteristics of a specific domain or context. To this end, we perform context prediction based on sequence matching, generating sequences from contexts in consideration of the association rules between context attributes and applying variable weights to each context attribute. Simulations for discrete and continuous contexts were conducted to evaluate the proposed method; the results showed performance similar to existing prediction methods, with a prediction accuracy of 80.12% for discrete contexts and 81.43% for continuous contexts.

Developing an Intelligent System for the Analysis of Signs Of Disaster (인적재난사고사례기반의 새로운 재난전조정보 등급판정 연구)

  • Lee, Young Jai
    • Journal of Korean Society of societal Security / v.4 no.2 / pp.29-40 / 2011
  • The objective of this paper is to develop an intelligent decision support system that can advise on disaster countermeasures and the severity of incidents based on collected and analyzed signs of disasters. Concepts from ontology, text mining, and case-based reasoning are adapted to design the system. Its functions include a term-document matrix, frequency normalization, confidence, association rules, and criteria for judgment. Qualitative data collected from signs of new incidents are processed by these functions and then compared with, and reasoned against, similar past disaster cases. The system reports how dangerous the new signs of disaster are and suggests a few countermeasures to the disaster manager. It should help decision-makers judge how dangerous the signs of disaster are and carry out specific countermeasures in advance, so that the disaster can be prevented.

Common XML Structure Extracting Algorithm for Applying Data Mining Techniques (데이터마이닝 기법 적용을 위한 공용 XML 구조 추출 알고리즘)

  • Jang, Min-Seok;Bang, Hyun-Jin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.1 / pp.1072-1076 / 2005
  • XML is growing in importance as a target of data mining because it is widely used as a standard markup language for describing structured data. In particular, research has been done on extracting desired information by applying association rules to XML documents. However, little work has addressed how to obtain information efficiently from similar kinds of XML documents. To address this problem, this paper proposes a method that extracts a common XML structure from XML documents of the same kind that have various XML schemas. The resulting schema structure is expected to be important as a preliminary step, because unifying the structures of various kinds of documents helps us acquire useful information from them.
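
One simple way to picture a common structure across similar XML documents is to intersect the sets of element paths they contain. The sketch below does that with the standard library; it is an assumed stand-in for illustration and not the extraction algorithm of the paper. The names `element_paths` and `common_structure` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text):
    """Return the set of tag paths (e.g. 'book/title') found in one document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def common_structure(documents):
    """Return the tag paths that appear in every document."""
    path_sets = [element_paths(doc) for doc in documents]
    return set.intersection(*path_sets) if path_sets else set()


# Hypothetical documents of the same kind with slightly different schemas.
docs = [
    "<book><title>A</title><author>X</author><price>10</price></book>",
    "<book><title>B</title><author>Y</author><isbn>1</isbn></book>",
]
shared = sorted(common_structure(docs))
```

Here `shared` holds only the paths present in both documents, so later mining steps can run over a single unified set of fields.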
