• Title/Summary/Keyword: Association Rules Mining

A Post-analysis of the Association Rule Mining Applied to Internet Shopping Mall

  • Kim, Jae-Kyeong;Song, Hee-Seok
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.253-260 / 2001
  • Understanding and adapting to changes in customer behavior is important for a company to survive in a continuously changing environment. The aim of this paper is to develop a methodology that automatically detects changes in customer behavior from customer profiles and sales data at different time snapshots. For this purpose, we first define three types of change: emerging patterns, unexpected changes, and added/perished rules. Then we develop similarity and difference measures for rule matching to detect all types of change. Finally, the degree of change is evaluated to detect significantly changed rules. The proposed methodology can evaluate the degree of change as well as automatically detect all kinds of change from data at different time snapshots. A case study for evaluation and the practical business implications of the methodology are also provided.
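A minimal sketch of snapshot comparison, assuming each rule is stored as an (antecedent, consequent) pair of item sets mapped to its support. The Jaccard overlap, the 0.5 similarity threshold, and the 0.05 support increase are hypothetical stand-ins, not the similarity and difference measures defined in the paper, and the sample rules are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; 1.0 means the two item sets are identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_changes(old: Dict[Rule, float], new: Dict[Rule, float],
                     sim_threshold: float = 0.5,
                     support_delta: float = 0.05) -> Dict[str, List[Rule]]:
    """Compare two snapshots of {rule: support} and label the differences."""
    changes: Dict[str, List[Rule]] = {
        "emerging": [], "unexpected": [], "added": [], "perished": []}
    for rule, support in new.items():
        antecedent, consequent = rule
        if rule in old:
            # Same rule in both snapshots: emerging if its support grew enough.
            if support - old[rule] >= support_delta:
                changes["emerging"].append(rule)
        elif any(jaccard(antecedent, a) >= sim_threshold and consequent != c
                 for a, c in old):
            # Similar conditions now lead to a different conclusion.
            changes["unexpected"].append(rule)
        else:
            changes["added"].append(rule)
    for rule in old:
        if rule not in new and not any(
                jaccard(rule[0], a) >= sim_threshold for a, _ in new):
            # No rule with similar conditions survives in the new snapshot.
            changes["perished"].append(rule)
    return changes


def make_rule(antecedent: str, consequent: str) -> Rule:
    return frozenset({antecedent}), frozenset({consequent})


old = {make_rule("bread", "milk"): 0.10, make_rule("beer", "chips"): 0.08,
       make_rule("tea", "sugar"): 0.05}
new = {make_rule("bread", "milk"): 0.18, make_rule("beer", "diapers"): 0.07,
       make_rule("coffee", "cream"): 0.06}
changes = classify_changes(old, new)
```

With this toy data, bread -> milk is labeled emerging, beer -> diapers unexpected, coffee -> cream added, and tea -> sugar perished.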

The Development of Relative Interestingness Measure for Comparing with Degrees of Association

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1269-1279 / 2008
  • Data mining is the technique of finding useful information in huge databases. One of the well-studied problems in data mining is the exploration of association rules. An association rule technique finds relationships among items in massive databases using several interestingness measures. One important and useful way to classify interestingness measures is by user involvement, which yields two categories: objective and subjective measures. This paper presents several relative interestingness measures for comparing the degrees of association of two groups. A comparative study of these relative interestingness measures is illustrated with a numerical example. The results show that relative net confidence is the best relative interestingness measure.
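As a rough illustration of objective interestingness measures compared across two groups, the sketch below computes support, confidence, and lift of a rule a -> b separately in each group. The confidence gap and lift ratio at the end are hypothetical, simple relative comparisons; they are not the paper's relative measures (relative net confidence is not reproduced here), and both groups are invented.

```python
from typing import Dict, FrozenSet, List


def rule_measures(transactions: List[FrozenSet[str]], a: str, b: str) -> Dict[str, float]:
    """Support, confidence, and lift of the rule a -> b within one group."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return {"support": n_ab / n, "confidence": confidence, "lift": lift}


# Two hypothetical groups of transactions.
group1 = [frozenset(t) for t in ({"a", "b"}, {"a", "b"}, {"a"}, {"b"}, {"c"})]
group2 = [frozenset(t) for t in ({"a", "b"}, {"a"}, {"a"}, {"b"}, {"b"}, {"c"})]

m1 = rule_measures(group1, "a", "b")
m2 = rule_measures(group2, "a", "b")
# Simple relative comparisons: how much stronger the rule is in group 1.
confidence_gap = m1["confidence"] - m2["confidence"]
lift_ratio = m1["lift"] / m2["lift"]
```

Here the rule holds with confidence 2/3 in the first group and 1/3 in the second, so the same rule can look interesting in one group and unremarkable in the other.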

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationships between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insights derived from such analyses can be used in accident prevention. Traffic accident data mining seeks useful knowledge about such relationships that is not well known and that users may be interested in. Many studies on mining accident data have been reported over the past two decades. Most of them have focused on predicting accident risk from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks. However, the prediction models derived by these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analysis is needed. Rule-based learning methods are appropriate when we want to derive knowledge about the target domain in a comprehensible form. They derive a set of if-then rules that represent relationships between the target feature and other features.
Because rules are fairly easy for humans to understand, they can provide insight and comprehensible results. Association rule learning and subgroup discovery are representative rule-based learning methods for descriptive tasks. These two approaches have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both association rule learning and subgroup discovery to discover useful patterns in a traffic accident dataset with many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. Association rule learning, an unsupervised learning method, searches the data for frequent itemsets and translates them into rules. In contrast, subgroup discovery is a supervised learning method that discovers rules for user-specified target concepts satisfying a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features.
In a supervised setting with a combinatorial target feature, however, subgroup discovery finds good rules much more easily than association rule learning, which requires considerable effort to tune its parameters.
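The contrast between the two methods can be sketched as follows, assuming records are Python sets of attribute-value items (the item names and thresholds are hypothetical, not the paper's dataset). `frequent_itemsets` is a plain Apriori-style level-wise search, `generate_rules` turns frequent itemsets into if-then rules, and `wracc` scores a candidate subgroup for a fixed target with weighted relative accuracy, one common way to trade off generality (coverage) against unusualness (precision minus base rate). The paper's exact algorithms and quality measures may differ.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Apriori-style level-wise search for itemsets with support >= min_support."""
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    result: Dict[Itemset, float] = {}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        keys = list(frequent)
        level = {a | b for a in keys for b in keys if len(a | b) == k}
        level = {c for c in level
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return result


def generate_rules(freq: Dict[Itemset, float],
                   min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Split each frequent itemset into antecedent -> consequent rules."""
    out = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / freq[ante]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    out.append((ante, itemset - ante, conf))
    return out


def wracc(records: List[Itemset], condition: Itemset, target: str) -> float:
    """Weighted relative accuracy: coverage * (precision - base rate)."""
    n = len(records)
    covered = [r for r in records if condition <= r]
    if not covered:
        return 0.0
    coverage = len(covered) / n
    precision = sum(1 for r in covered if target in r) / len(covered)
    base_rate = sum(1 for r in records if target in r) / n
    return coverage * (precision - base_rate)


# Hypothetical accident records.
records = [
    frozenset({"night", "wet_road", "severe"}),
    frozenset({"night", "wet_road", "severe"}),
    frozenset({"night", "dry_road"}),
    frozenset({"day", "wet_road"}),
    frozenset({"day", "dry_road"}),
]
freq = frequent_itemsets(records, min_support=0.4)
found = generate_rules(freq, min_conf=0.9)
night_wet = wracc(records, frozenset({"night", "wet_road"}), "severe")
```

The unsupervised search returns every high-confidence rule regardless of which item ends up as the consequent, while `wracc` evaluates only candidate conditions for the chosen target `"severe"`, which reflects why the supervised setting needs less parameter tuning.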

Effective Studying Methods during a School Vacation: A Data Mining Approach (데이타 마이닝을 사용한 방학 중 학습방법과 학업성취도의 관계 분석)

  • Kim, Hea-Suk;Moon, Yang-Sae;Kim, Jin-Ho;Loh, Woong-Kee
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.40-51 / 2007
  • To improve academic achievement, most students not only attend regular classes but also take part in various extra programs such as private lessons, private institutes, and educational TV programs. In this paper, we propose a data mining approach to identify which studying methods or daily life patterns during a school vacation affect changes in academic achievement. First, we derive various studying methods and life patterns that are thought to affect changes in academic achievement during a school vacation. Second, we propose a method of transforming and analyzing the data so that decision trees and association rules, two representative data mining techniques, can be applied to them. Third, we construct decision trees and find association rules from real survey data of middle school students. We obtained four representative results from the decision trees. First, for students in the higher rank, private institutes tend to have a positive effect on academic achievement. Second, for most students, Internet learning sites may have a negative effect on achievement. Third, private lessons, which have been thought to have a large impact on achievement, do not have a positive effect on it. Fourth, taking several studying methods in parallel may have a negative effect on achievement. From the association rules, however, we could not find any meaningful relationship between academic achievement and daily life patterns during a school vacation. We believe that our approach will help teachers and parents provide good guidance both in preparing study plans and in selecting studying methods during a school vacation.
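One plausible way to prepare such survey data for both techniques is sketched below, assuming each record lists the studying methods used, the student's rank group, and test scores before and after the vacation. All field names, the 5-point tolerance, and the sample records are hypothetical; the paper's actual transformation may differ. Each record becomes a set of attribute=value items that an association rule miner can consume, with the score change discretized so it can serve as a rule consequent or a decision tree class label.

```python
from typing import Dict, FrozenSet, List


def achievement_change(before: float, after: float, tolerance: float = 5.0) -> str:
    """Discretize a score change into 'up', 'down', or 'same' (hypothetical tolerance)."""
    diff = after - before
    if diff > tolerance:
        return "up"
    if diff < -tolerance:
        return "down"
    return "same"


def to_transaction(record: Dict) -> FrozenSet[str]:
    """Turn one survey record into a set of attribute=value items."""
    items = {f"method={m}" for m in record["methods"]}
    items.add(f"rank={record['rank']}")
    items.add("achievement=" + achievement_change(record["before"], record["after"]))
    return frozenset(items)


# Hypothetical survey records (not the paper's data).
survey: List[Dict] = [
    {"methods": ["private_institute"], "rank": "high", "before": 80, "after": 88},
    {"methods": ["internet_site"], "rank": "middle", "before": 70, "after": 62},
    {"methods": ["private_lesson", "internet_site"], "rank": "low", "before": 55, "after": 54},
]
transactions = [to_transaction(r) for r in survey]
```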

Development of Automatic Rule Extraction Method in Data Mining : An Approach based on Hierarchical Clustering Algorithm and Rough Set Theory (데이터마이닝의 자동 데이터 규칙 추출 방법론 개발 : 계층적 클러스터링 알고리듬과 러프 셋 이론을 중심으로)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.14 no.6 / pp.135-142 / 2009
  • Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for analyzing large data sets. The major techniques used in data mining are association rule mining, classification, and clustering. Since these techniques are used individually, a methodology that integrates them for rule extraction is needed. Rule extraction techniques help humans analyze large data sets and turn the meaningful information they contain into successful decisions. This paper proposes an autonomous rule extraction method using clustering and rough set theory. Experiments were carried out on data sets from the UCI KDD Archive, and the decision rules produced by the proposed method are presented. These rules can be used successfully for decision making.
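The rough-set step can be illustrated with a small sketch, assuming the condition attributes are already discrete. How the paper combines hierarchical clustering with rough sets is not reproduced; this covers only the standard rough-set approximations, and the toy decision table is hypothetical. Rows that are indiscernible on the condition attributes form equivalence classes; a class lying entirely inside one decision class belongs to that class's lower approximation and yields a certain if-then rule, while classes that merely overlap it fall in the upper approximation.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, str]


def equivalence_classes(rows: List[Row], attrs: List[str]) -> Dict[Tuple[str, ...], FrozenSet[int]]:
    """Group row indices that are indiscernible on the condition attributes."""
    classes: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return {key: frozenset(members) for key, members in classes.items()}


def approximations(rows: List[Row], attrs: List[str], decision: str,
                   value: str) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximation of the rows whose decision equals value."""
    target = {i for i, row in enumerate(rows) if row[decision] == value}
    lower: Set[int] = set()
    upper: Set[int] = set()
    for members in equivalence_classes(rows, attrs).values():
        if members <= target:
            lower |= members  # class lies entirely inside the decision class
        if members & target:
            upper |= members  # class overlaps the decision class
    return lower, upper


def certain_rules(rows: List[Row], attrs: List[str], decision: str) -> List[Tuple[Row, str]]:
    """One IF-THEN rule per equivalence class whose rows all share one decision."""
    rules = []
    for key, members in equivalence_classes(rows, attrs).items():
        decisions = {rows[i][decision] for i in members}
        if len(decisions) == 1:
            rules.append((dict(zip(attrs, key)), decisions.pop()))
    return rules


# Hypothetical, already-discretized decision table.
rows = [
    {"temp": "high", "humidity": "high", "play": "no"},
    {"temp": "high", "humidity": "high", "play": "yes"},
    {"temp": "low", "humidity": "high", "play": "yes"},
    {"temp": "low", "humidity": "low", "play": "yes"},
    {"temp": "high", "humidity": "low", "play": "no"},
]
lower, upper = approximations(rows, ["temp", "humidity"], "play", "yes")
rules = certain_rules(rows, ["temp", "humidity"], "play")
```

Rows 0 and 1 are indiscernible but disagree on the decision, so they fall only in the upper approximation and produce no certain rule.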

Semi-Automatic Ontology Generation about XML Documents using Data Mining Method (데이터 마이닝 기법을 이용한 XML 문서의 온톨로지 반자동 생성)

  • Gu, Mi-Sug;Hwang, Jeong-Hee;Ryu, Keun-Ho;Hong, Jang-Eui
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.299-308 / 2006
  • As XML has recently become the standard for exchanging web documents and public documents, XML data are increasing in many areas. To retrieve information from XML documents efficiently, the ontology-based Semantic Web has emerged. Existing ontologies have been constructed manually, which is time-consuming and costly. Therefore, this paper proposes a semi-automatic ontology generation technique based on a data mining technique, association rules. The proposed method uses a data mining algorithm to determine what types of conceptual relationships exist and how many, and to determine the ontology domain level for automatic ontology generation. By applying association rules to XML documents, we find frequent patterns of XML tags and use them to identify the conceptual relationships needed to construct the ontology. Using the conceptual ontology domain level extracted by data mining, we implemented an ontology-based Semantic Web with XML Topic Maps (XTM) and the topic map engine TM4J.
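The tag-pattern mining step can be approximated with Python's standard `xml.etree.ElementTree`, assuming each XML document is treated as one transaction whose items are its distinct element tags. The sketch only counts frequently co-occurring tag pairs; how the paper turns such patterns into conceptual relationships, domain levels, and XTM topic maps is not reproduced, and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def tag_set(xml_text: str) -> FrozenSet[str]:
    """Collect the distinct element tags (including the root) of one document."""
    root = ET.fromstring(xml_text)
    return frozenset(el.tag for el in root.iter())


def frequent_tag_pairs(docs: List[str], min_support: float) -> Dict[Tuple[str, str], float]:
    """Tag pairs that co-occur in at least min_support of the documents."""
    sets = [tag_set(d) for d in docs]
    counts = Counter(pair for s in sets for pair in combinations(sorted(s), 2))
    n = len(sets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical XML documents.
docs = [
    "<book><title>A</title><author>X</author><publisher>P</publisher></book>",
    "<book><title>B</title><author>Y</author></book>",
    "<paper><title>C</title><author>Z</author><journal>J</journal></paper>",
]
pairs = frequent_tag_pairs(docs, min_support=0.6)
```

Tags that co-occur across nearly all documents, such as `title` and `author` here, are natural candidates for related concepts in an ontology.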

Affinity Analysis Between Factors of Fatal Occupational Accidents in Construction Using Data Mining Techniques (데이터마이닝 기법을 활용한 건설 중대 재해요인 간 연관성 분석)

  • Lim, Jiseon;Han, Sanguk;Kang, Youngcheol;Kang, Sanghyeok
    • Korean Journal of Construction Engineering and Management / v.22 no.5 / pp.29-38 / 2021
  • Governments and companies are trying to reduce occupational accidents in the construction industry; however, the number of disasters is not decreasing significantly. This study aims to quantitatively identify the correlations among factors affecting construction disasters. To this end, 1,197 cases of serious disasters provided by the Korea Occupational Safety and Health Administration (KOSHA) were analyzed using affinity analysis, a data mining technique. The KOSHA data were preprocessed and analyzed with the variables accident type, project type, activity type, original cause material, sensory temperature, time of the accident, and fall height, and association rules were derived separately for fall accidents and for other accidents. For fall accidents, 64 association rules with lift ratios of 1.38 or greater were derived; for the other accidents, 59 association rules with lift ratios of 1.54 or greater were derived. After analyzing the derived association rules with a focus on the relationships among accident factors, the study presents the significance of applying affinity analysis and addresses the study's limitations. The significance of this study lies in presenting the correlations among factors affecting construction accidents quantitatively.
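The lift ratio used as the cut-off above can be computed as in this sketch, where lift is confidence(A -> B) divided by the support of B, so values above 1 mean A and B co-occur more often than independence would predict. The records and candidate rules are hypothetical, not KOSHA data; only the 1.38 threshold comes from the abstract.

```python
from typing import List, Set, Tuple


def lift(records: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Lift = confidence(antecedent -> consequent) / support(consequent)."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)
    n_c = sum(1 for r in records if consequent <= r)
    n_ac = sum(1 for r in records if (antecedent | consequent) <= r)
    if n_a == 0 or n_c == 0:
        return 0.0
    return (n_ac / n_a) / (n_c / n)


# Hypothetical accident records, each a set of factor=value items.
records = [
    {"type=fall", "height>=5m", "project=building"},
    {"type=fall", "height>=5m", "project=building"},
    {"type=fall", "project=civil"},
    {"type=struck_by", "project=civil"},
    {"type=struck_by", "project=building"},
]
candidates: List[Tuple[Set[str], Set[str]]] = [
    ({"height>=5m"}, {"type=fall"}),
    ({"project=building"}, {"type=fall"}),
    ({"project=civil"}, {"type=fall"}),
]
MIN_LIFT = 1.38  # cut-off reported for the fall-accident rules
scored = [(a, c, lift(records, a, c)) for a, c in candidates]
kept = [rule for rule in scored if rule[2] >= MIN_LIFT]
```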

Group-wise Keyword Extraction of the External Audit using Text Mining and Association Rules (텍스트마이닝과 연관규칙을 이용한 외부감사 실시내용의 그룹별 핵심어 추출)

  • Seong, Yoonseok;Lee, Donghee;Jung, Uk
    • Journal of Korean Society for Quality Management / v.50 no.1 / pp.77-89 / 2022
  • Purpose: To improve a company's audit quality, in-depth analysis is required to categorize audit reports, which are text documents containing the details of the external audit. This study introduces a systematic methodology that uses audit reports collected as text documents to extract, for each group, the keywords that distinguish groups such as 'audit plan' and 'interim audit'. Methods: The first step of the proposed methodology is to preprocess the documents through text mining. In the second step, the documents are classified into groups using machine learning techniques, and based on this, the important vocabulary that dominantly influences classification performance is extracted. In the third step, association rules are found for each group's documents. In the last step, the final keywords representing the characteristics of each group are extracted by comparing the important vocabulary for classification with the important vocabulary representing each group's association rules. Results: This study quantitatively calculates the importance of the vocabulary used in audit reports based on machine learning, rather than relying on qualitative research methods such as literature review, expert evaluation, and the Delphi technique. The case study showed that the extracted keywords describe the characteristics of each group well. Conclusion: This study is meaningful in that it lays the foundation for quantitative follow-up studies of the key vocabulary at each stage of auditing.
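A minimal sketch of the third and last steps, under two stated assumptions: the classifier's important vocabulary for a group is simply given as an input set (the machine learning step is not reproduced), and "comparing" the two vocabularies is read as taking their intersection. The rule vocabulary comes from single-word rules a -> b mined over tokenized documents; the thresholds and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set


def rule_vocabulary(docs: List[Set[str]], min_support: float, min_conf: float) -> Set[str]:
    """Words that take part in a single-word rule (a -> b or b -> a) meeting both thresholds."""
    n = len(docs)
    word_count = Counter(w for d in docs for w in d)
    pair_count = Counter(p for d in docs for p in combinations(sorted(d), 2))
    vocab: Set[str] = set()
    for (a, b), c in pair_count.items():
        if c / n < min_support:
            continue
        if c / word_count[a] >= min_conf or c / word_count[b] >= min_conf:
            vocab.update((a, b))
    return vocab


def group_keywords(important: Set[str], docs: List[Set[str]],
                   min_support: float = 0.5, min_conf: float = 0.8) -> Set[str]:
    """Assumed comparison step: keep words important to both the classifier and the rules."""
    return important & rule_vocabulary(docs, min_support, min_conf)


# Hypothetical preprocessed 'audit plan' documents (sets of tokens).
plan_docs = [
    {"risk", "assessment", "schedule"},
    {"risk", "assessment", "materiality"},
    {"schedule", "staffing"},
]
# Hypothetical vocabulary ranked important by a group classifier.
classifier_vocab = {"risk", "materiality", "schedule"}
keywords = group_keywords(classifier_vocab, plan_docs)
```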

An Efficient Search Method for High Confidence Association Rules Using CP(Confidence Pattern)-Tree Structure (CP-Tree구조를 이용한 높은 신뢰도를 갖는 연관 규칙의 효율적 탐색 방법)

  • 송한규;김재련
    • Journal of Korean Society of Industrial and Systems Engineering / v.25 no.1 / pp.1-8 / 2002
  • Traditional approaches to association rule mining have relied on a high support condition to find interesting rules. However, in some applications, such as analyzing web page links or discovering unusual combinations of factors that have always caused a certain disease, we are interested in high-confidence rules that have very low support or need not have high support. In these cases, traditional algorithms are not suitable because they rely on first satisfying high support. In this paper, we propose a new model, the CP(Confidence Pattern)-Tree, to identify high-confidence rules between two items without a support constraint. In addition, we discuss high-confidence association rules among more than two items without a support constraint.
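To make the problem concrete, the sketch below finds all two-item rules a -> b whose confidence meets a threshold with no support threshold at all. It is a brute-force pair count, not the CP-Tree structure proposed in the paper, and the transactions are hypothetical; its point is that a rule seen in a single transaction can still reach confidence 1.0, which a support-first algorithm would prune.

```python
from collections import Counter
from itertools import permutations
from typing import FrozenSet, List, Tuple


def high_confidence_pairs(transactions: List[FrozenSet[str]],
                          min_conf: float) -> List[Tuple[str, str, float]]:
    """All rules a -> b between single items with confidence >= min_conf, ignoring support."""
    item_count = Counter(i for t in transactions for i in t)
    # Ordered pairs, so a -> b and b -> a are counted separately.
    pair_count = Counter(p for t in transactions for p in permutations(sorted(t), 2))
    return sorted((a, b, c / item_count[a])
                  for (a, b), c in pair_count.items()
                  if c / item_count[a] >= min_conf)


# Hypothetical transactions; the last one is a rare but perfectly consistent pattern.
transactions = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "eggs"}, {"milk"},
    {"rare_gene", "disease_x"},
)]
rules = high_confidence_pairs(transactions, min_conf=0.9)
```

Because every co-occurring pair is counted, this approach grows quadratically with transaction width; avoiding that cost is the kind of problem a specialized structure such as the CP-Tree is meant to address.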

A Study on the Advanced Association Rules Algorithm of n-Items (개선된 n-항목 연관 규칙 알고리즘 연구)

  • 황현숙;어윤양
    • Journal of the Korean Operations Research and Management Science Society / v.27 no.4 / pp.29-39 / 2002
  • The transaction tables of existing association algorithms have two column attributes: a transaction identifier (Transaction_id) and an item identifier (item). In this structure, the performance of the applicable SQL queries decreases as the volume of data grows. Therefore, we propose an advanced n-item association rule algorithm whose transaction table can hold multiple items per row (Transaction_id, Item 1, Item 2…, Item n). In this structure, execution time can be reduced compared with the single-item structure, because counts can be computed by querying the input transaction table. Our experimental results indicate that the n-item structure performs up to 2 times better than the single-item structure. The proposed algorithm can be applied to Internet shopping, search engines, and similar applications.
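The difference between the two table layouts can be sketched with Python's built-in `sqlite3`, counting the support of the itemset {bread, milk} in each. The table and column names, the fixed n = 3, and the sample baskets are hypothetical; the paper's algorithm and its measured speedup are not reproduced. The sketch only shows that the single-item layout needs a self-join to see two items of the same transaction, while the n-item layout answers from one row per transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Single-item layout: one row per (transaction, item).
cur.execute("CREATE TABLE single (transaction_id INTEGER, item TEXT)")
# n-item layout (here n = 3): one row per transaction, unused slots NULL.
cur.execute("CREATE TABLE multi (transaction_id INTEGER, item1 TEXT, item2 TEXT, item3 TEXT)")

baskets = {1: ["bread", "milk"], 2: ["bread", "milk", "eggs"], 3: ["milk"]}
for tid, items in baskets.items():
    cur.executemany("INSERT INTO single VALUES (?, ?)", [(tid, i) for i in items])
    padded = items + [None] * (3 - len(items))
    cur.execute("INSERT INTO multi VALUES (?, ?, ?, ?)", [tid, *padded])

# Support count of {bread, milk} in the single-item layout requires a self-join.
single_count = cur.execute("""
    SELECT COUNT(*) FROM single a JOIN single b
      ON a.transaction_id = b.transaction_id
    WHERE a.item = 'bread' AND b.item = 'milk'
""").fetchone()[0]

# The n-item layout filters one row per transaction, with no join.
multi_count = cur.execute("""
    SELECT COUNT(*) FROM multi
    WHERE 'bread' IN (item1, item2, item3) AND 'milk' IN (item1, item2, item3)
""").fetchone()[0]
conn.close()
```

The trade-off is that the wide layout fixes a maximum basket size n in the schema, which is why real systems choosing it must decide how to handle longer transactions.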