• Title/Summary/Keyword: Association Mining

Anomaly Intrusion Detection based on Association Rule Mining in a Database System (데이터베이스 시스템에서 연관 규칙 탐사 기법을 이용한 비정상 행위 탐지)

  • Park, Jeong-Ho;Oh, Sang-Hyun;Lee, Won-Suk
    • The KIPS Transactions: Part C / v.9C no.6 / pp.831-840 / 2002
  • Advances in computer and communication technology have made a tremendous amount of information conveniently available to users, but intrusions and crimes committed with computers have also increased rapidly. In particular, to protect a database that stores important information, such as customers' private information or a company's confidential information, the basic security mechanisms of the database management system itself or conventional misuse detection methods have been used. However, problems caused by internal users abusing their authority, such as leaks of confidential information, are more serious than system breakdowns caused by external intruders. Therefore, anomaly detection is necessary to maintain database security effectively. This paper proposes a method that generates a normal behavior profile for each user from that user's database log using association rule mining. For this purpose, the information in a database log is structured as a semantically organized pattern tree. An online transaction of a user is then compared with the user's profile, so that anomalies can be detected effectively.
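The profile-and-compare idea can be illustrated with a minimal sketch. This is not the paper's method: it replaces the semantically organized pattern tree with a flat set of frequent itemsets, and the item encoding (`"OPERATION:table"`), the `min_support` value, and the scoring rule in `anomaly_score` are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def itemsets(txn, max_size):
    """All non-empty itemsets of up to max_size items from one transaction."""
    items = sorted(set(txn))
    return [s for k in range(1, max_size + 1) for s in combinations(items, k)]


def build_profile(log, min_support=0.3, max_size=2):
    """Hypothetical normal-behavior profile: itemsets with support >= min_support in the user's log."""
    counts = Counter(s for txn in log for s in itemsets(txn, max_size))
    return {s for s, c in counts.items() if c / len(log) >= min_support}


def anomaly_score(txn, profile, max_size=2):
    """Fraction of the transaction's itemsets missing from the profile (0.0 = fully covered)."""
    candidates = itemsets(txn, max_size)
    if not candidates:
        return 0.0
    return sum(s not in profile for s in candidates) / len(candidates)


# Hypothetical log of one user: each transaction is a set of "OPERATION:table" items.
log = [
    {"SELECT:orders", "SELECT:customers"},
    {"SELECT:orders", "UPDATE:orders"},
    {"SELECT:orders", "SELECT:customers"},
    {"SELECT:orders"},
]
profile = build_profile(log)
print(anomaly_score({"SELECT:orders", "SELECT:customers"}, profile))  # 0.0
print(anomaly_score({"SELECT:salaries", "DELETE:orders"}, profile))   # 1.0
```

A transaction that mixes familiar and unfamiliar operations gets an intermediate score, so in practice an alarm threshold on the score would still have to be chosen.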

Preventing the Musculoskeletal Disorders using Association Rule - Based on Result of Multiple Logistic Regression - (연관규칙을 이용한 근골격계 질환 예방 - 다변량 로지스틱 회귀분석의 결과를 기반으로 -)

  • Park, Seung-Hun;Lee, Seog-Hwan
    • Journal of the Korea Safety Management & Science / v.9 no.4 / pp.29-38 / 2007
  • We applied data-mining association rules to investigate the relationships among the factors of musculoskeletal disorders, and proposed a method of preventing musculoskeletal disorders that builds on the multiple logistic regression of a previous study. With multiple logistic regression alone, it was difficult to establish a prevention method when the factors cannot be managed by the workers themselves, e.g., age, gender, and marital status. To solve this problem, we derived association rules among the factors of musculoskeletal disorders and proposed an interactive prevention method that applies these association rules together with the results of the multiple logistic regression from the previous study. The results of a correlation analysis showed that a prevention method for one body part also prevents musculoskeletal disorders in other parts of the body.
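Rules of the form "factors → disorder" can be mined by fixing the consequent and filtering candidate antecedents by support and confidence. The sketch below is a generic illustration under that assumption; it does not reproduce the paper's link to the logistic regression results, and the factor names, `min_support`, and `min_confidence` are hypothetical.

```python
from itertools import combinations


def rules_for_target(records, target, min_support=0.2, min_confidence=0.6, max_antecedent=2):
    """Rules {factors} -> target meeting both thresholds, sorted by confidence (descending)."""
    n = len(records)
    factors = sorted({item for r in records for item in r if item != target})
    rules = []
    for k in range(1, max_antecedent + 1):
        for antecedent in combinations(factors, k):
            a = set(antecedent)
            n_a = sum(1 for r in records if a <= r)  # records containing the antecedent
            n_ab = sum(1 for r in records if a <= r and target in r)
            if n_a == 0:
                continue
            support, confidence = n_ab / n, n_ab / n_a
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return sorted(rules, key=lambda rule: -rule[2])


# Hypothetical worker records: each is the set of factors (and symptoms) observed.
records = [
    {"repetitive_motion", "long_hours", "neck_pain"},
    {"repetitive_motion", "neck_pain"},
    {"long_hours", "neck_pain"},
    {"repetitive_motion", "long_hours", "neck_pain"},
    {"repetitive_motion"},
    {"no_stretching"},
]
for antecedent, support, confidence in rules_for_target(records, "neck_pain"):
    print(antecedent, "-> neck_pain", round(support, 2), round(confidence, 2))
```

Unlike a regression coefficient, each rule names a concrete combination of factors, which is what makes an interactive, factor-by-factor prevention method possible.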

Discovery Temporal Association Rules in Distributed Database (분산데이터베이스 환경하의 시간연관규칙 적용)

  • Yan Zhao;Kim, Long;Sungbo Seo;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.115-117 / 2004
  • Recently, mining association rules in distributed database environments has become a central problem in knowledge discovery. The data are located on different shared-nothing machines, and each data site grows over time. Mining global frequent itemsets is difficult and inefficient across a large number of distributed servers. In many distributed databases, the time component (which is usually attached to transactions) contains meaningful time-related rules. In this paper, we design a new DTA (distributed temporal association) algorithm that incorporates temporal concepts into distributed association rule mining. The algorithm determines the time interval over which association rules apply in distributed databases. The experimental results show that DTA can generate interesting frequent itemsets correlated with specific time periods.
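To make "frequent within a time interval, across sites" concrete, here is a naive sketch: each site counts itemsets only among its transactions inside the interval, and a coordinator sums the counts. This is a simple exact-count baseline, not the DTA algorithm; the site data, integer-hour timestamps, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, start, end, size=2):
    """Count size-item itemsets in one site's transactions with start <= timestamp < end."""
    counts, n = Counter(), 0
    for timestamp, items in transactions:
        if start <= timestamp < end:
            n += 1
            counts.update(combinations(sorted(items), size))
    return counts, n


def global_frequent(sites, start, end, min_support, size=2):
    """Merge per-site counts for the interval and keep globally frequent itemsets."""
    total, n_total = Counter(), 0
    for transactions in sites:
        counts, n = local_counts(transactions, start, end, size)
        total.update(counts)
        n_total += n
    if n_total == 0:
        return {}
    return {s: c / n_total for s, c in total.items() if c / n_total >= min_support}


# Hypothetical shared-nothing sites; timestamps are hours of the day.
site_a = [(9, {"coffee", "bagel"}), (10, {"coffee", "bagel"}), (20, {"beer", "chips"})]
site_b = [(8, {"coffee", "bagel", "juice"}), (21, {"beer", "chips"}), (22, {"beer", "chips"})]
sites = [site_a, site_b]

print(global_frequent(sites, 6, 12, min_support=0.5))   # morning interval
print(global_frequent(sites, 18, 24, min_support=0.5))  # evening interval
print(global_frequent(sites, 0, 24, min_support=0.6))   # whole day
```

Each pair is frequent only inside its own interval and drops below the threshold over the whole day, which is the kind of time-dependent pattern a temporal algorithm aims to surface without shipping every count to a coordinator.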

Proposition of causally confirmed measures in association rule mining (인과적 확인 측도에 의한 연관성 규칙 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.4 / pp.857-868 / 2014
  • Data mining is the representative analysis methodology of the big data era: the process of analyzing a massive database and summarizing it into meaningful information. The association rule technique finds relationships among items in a huge database using interestingness measures such as support, confidence, and lift. However, these measures cannot establish a causal relationship between the antecedent and consequent itemsets, nor can they indicate the direction of an association. This paper proposes causally confirmed association thresholds to compensate for these problems and then checks them against the three conditions for interestingness measures. Simulation studies compare the basic association thresholds, the causal association thresholds, and the causally confirmed association thresholds. The results show that the causally confirmed association thresholds perform better than the basic and causal association thresholds.
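The direction problem with the basic measures can be seen directly from their definitions: support and lift are symmetric in A and B, so they give the same value for A → B and B → A. Confidence does differ by direction, but a high P(B | A) alone still says nothing about causation. The sketch below computes only these basic measures from hypothetical counts; the paper's causally confirmed thresholds are not implemented, since the abstract does not give their formulas.

```python
def basic_measures(n_ab, n_a, n_b, n):
    """Support, confidence in both directions, and lift from item counts."""
    support = n_ab / n              # P(A and B), symmetric in A and B
    conf_a_to_b = n_ab / n_a        # P(B | A)
    conf_b_to_a = n_ab / n_b        # P(A | B)
    lift = n_ab * n / (n_a * n_b)   # P(A and B) / (P(A) P(B)), symmetric in A and B
    return support, conf_a_to_b, conf_b_to_a, lift


# Hypothetical counts: 100 transactions, A in 20, B in 50, both in 16.
support, conf_ab, conf_ba, lift = basic_measures(16, 20, 50, 100)
print(support, conf_ab, conf_ba, lift)  # 0.16 0.8 0.32 1.6

# Swapping the roles of A and B leaves support and lift unchanged.
print(basic_measures(16, 50, 20, 100)[3] == lift)  # True
```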

Association rule ranking function by decreased lift influence (향상도 영향 감소화에 의한 연관성 순위결정함수)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.397-405 / 2010
  • Data mining is a method for finding useful information in large amounts of data in a database, and one of its important goals is to search for and determine associations among several variables. The task of association rule mining is to find certain association relationships among a set of data items in a database. There are three primary measures for association rules: support, confidence, and lift. In this paper, we developed an association rule ranking function that decreases the influence of lift, in order to generate association rules for items satisfying at least one of the three criteria. We compared our function with the functions suggested by Park (2010) and Wu et al. (2004) using several numerical examples. The results showed that our decision function was better than Park's and Wu's functions because it takes values between -1 and 1 regardless of the ranges of the three association thresholds. Our function had the value 1 if all three association measures were greater than their thresholds, and the value -1 if all three were smaller than their thresholds.
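Why bounding matters can be shown with a deliberately simple stand-in. The function below is hypothetical and is not the paper's ranking function; it only reproduces the boundary properties stated in the abstract (range [-1, 1], value 1 when all three measures exceed their thresholds, -1 when all fall below). Because each term is bounded, a very large lift cannot dominate the score.

```python
def sign(x):
    """-1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rank_score(measures, thresholds):
    """Hypothetical bounded ranking score: mean of sign(measure - threshold).

    measures and thresholds are (support, confidence, lift) triples. Each term is
    in {-1, 0, 1}, so the score stays in [-1, 1] however large lift becomes.
    """
    terms = [sign(m - t) for m, t in zip(measures, thresholds)]
    return sum(terms) / len(terms)


thresholds = (0.1, 0.5, 1.0)  # hypothetical minimum support, confidence, lift
candidates = {
    "r1": (0.30, 0.90, 25.0),  # all three above their thresholds
    "r2": (0.05, 0.20, 0.8),   # all three below
    "r3": (0.05, 0.90, 3.0),   # only support below
}
scores = {name: rank_score(m, thresholds) for name, m in candidates.items()}
kept = [name for name, s in scores.items() if s > -1]  # keep rules meeting at least one threshold
print(scores, kept)
```

This stand-in discards how far each measure is from its threshold; a practical ranking function would replace `sign` with a finer bounded transform so that rules with the same pattern of passes and failures can still be ordered.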

The application for predictive similarity measures of binary data in association rule mining (이분형 예측 유사성 측도의 연관성 평가 기준 적용 방안)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.495-503 / 2011
  • The most widely used data mining technique is association rule mining, which quantifies the relationships between sets of items in a very large database based on association thresholds. There are several basic association thresholds for exploring meaningful association rules: support, confidence, lift, etc. Among them, confidence is the most frequently used, but it has the drawback that it cannot determine the direction of the association. Net confidence and attributably pure confidence were developed to compensate for this drawback, but they have other drawbacks. In this paper, we consider some predictive similarity measures for binary data, used in cluster analysis and multidimensional analysis, as association thresholds to compensate for these drawbacks. Net confidence, attributably pure confidence, and several predictive similarity measures are compared using a numerical example.
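One well-known predictive association measure for binary data is the asymmetric Goodman-Kruskal lambda, the proportional reduction in errors when predicting one variable from the other in a 2x2 table. It is used here as an example of the measure family; whether it is among the specific measures the paper compares is an assumption. The counts in the example are hypothetical.

```python
def gk_lambda(a, b, c, d):
    """Asymmetric Goodman-Kruskal lambda: proportional reduction in errors when
    predicting the column variable from the row variable of the 2x2 table
    [[a, b], [c, d]] (rows: A present/absent; columns: B present/absent)."""
    n = a + b + c + d
    correct_without = max(a + c, b + d)   # always guess B's most common value
    correct_with = max(a, b) + max(c, d)  # guess B's most common value within each row of A
    if n == correct_without:              # no prediction errors to reduce
        return 0.0
    return (correct_with - correct_without) / (n - correct_without)


def lambda_b_given_a(a, b, c, d):
    return gk_lambda(a, b, c, d)


def lambda_a_given_b(a, b, c, d):
    # Transpose the table so that B becomes the row (predictor) variable.
    return gk_lambda(a, c, b, d)


# Hypothetical counts: a = n(A, B), b = n(A, not B), c = n(not A, B), d = n(not A, not B).
a, b, c, d = 30, 20, 0, 50
print(lambda_b_given_a(a, b, c, d), lambda_a_given_b(a, b, c, d))
```

Here knowing B improves the prediction of A (lambda 0.6) more than knowing A improves the prediction of B (lambda 1/3), so the two directions of the association can be told apart.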

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining (온라인 연관관계 분석의 장바구니 기준에 대한 연구)

  • Kim, Mi-Sung;Kim, Nam-Gyu
    • CRM Research (CRM연구) / v.4 no.2 / pp.19-29 / 2011
  • There is a large difference between purchasing patterns in an online shopping mall and in an offline market, caused mainly by the difference in accessibility. The interval between an initial purchasing decision and its realization tends to be relatively short in an online shopping mall, because a customer can place an order immediately. As a result, an online shopping mall transaction usually contains fewer items than an offline one. In an offline market, customers usually keep some items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. In contrast, more than 70% of online shopping mall transactions contain only one item. This implies that traditional data mining techniques cannot be applied directly to online market analysis, because hardly any association rule can reach an acceptable level of support when there are so many null transactions. Therefore, most market basket analyses of online shopping mall transactions have been performed by expanding the co-occurrence criterion of traditional association rule mining. Whereas the traditional criterion defines items purchased in one transaction as concurrently purchased items, the expanded criteria regard items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased. In studies using expanded co-occurrence criteria, however, the criterion has been defined arbitrarily by the researchers, without any theoretical grounds or agreement. The lack of clear grounds for adopting a particular co-occurrence criterion degrades the reliability of the analytical results. Moreover, it is hard to derive new meaningful findings by combining the outcomes of previous individual studies.
In this paper, we compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First, we compare the accuracy of association rules discovered under various co-occurrence criteria; we expect this experiment to yield a guideline for selecting the co-occurrence criterion that suits the purpose of the analysis. Additionally, we perform similar experiments on several groups of customers segmented by each customer's average interval between orders, in order to discover the relationship between the optimal co-occurrence criterion and that average interval. Finally, we expect this series of experiments to provide basic guidelines for developing customized recommendation systems.
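An expanded co-occurrence criterion amounts to a different rule for building baskets before mining. The sketch below assumes one possible definition, fixed calendar-aligned buckets of `window_days` days per customer; a sliding window anchored at each customer's first order would be another plausible choice. The order data and window sizes are hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_baskets(orders, window_days=1):
    """Group (customer_id, order_date, item) rows into baskets: items bought by the
    same customer within the same window_days-long calendar bucket co-occur."""
    baskets = defaultdict(set)
    for customer, day, item in orders:
        bucket = day.toordinal() // window_days  # hypothetical fixed, calendar-aligned buckets
        baskets[(customer, bucket)].add(item)
    return list(baskets.values())


# Hypothetical order lines from an online shop.
orders = [
    ("c1", date(2011, 3, 1), "shampoo"),
    ("c1", date(2011, 3, 1), "conditioner"),
    ("c1", date(2011, 3, 2), "towel"),
    ("c2", date(2011, 3, 5), "pen"),
]
print(sorted(map(sorted, build_baskets(orders, window_days=1))))
print(sorted(map(sorted, build_baskets(orders, window_days=7))))
```

Widening the window merges c1's purchases on consecutive days into one basket, which raises co-occurrence counts but also risks pairing items the customer never considered together; that trade-off is what the comparison of criteria is meant to settle.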

Association rule thresholds considering the number of possible rules of interest items (관심 항목의 발생 가능한 규칙의 수를 고려한 연관성 평가기준)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.717-725 / 2012
  • Data mining is a method for finding useful information in large amounts of data in a database. One of the well-studied problems in data mining is the exploration of association rules. Association rule mining searches for interesting relationships among items in a given database using support, confidence, and lift. Existing association rule thresholds do not consider the size of the occurrence frequency, so using them can lead to errors caused by information loss. In this paper, we propose new association rule thresholds that consider the number of possible rules of interest items, and compare them with existing association rule thresholds using an example and real data. The results show that the new thresholds are more useful than the existing ones.
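As background on why the number of possible rules matters: over d items, a rule X → Y needs non-empty, disjoint X and Y, and the count of such rules is the standard closed form 3^d − 2^(d+1) + 1, which grows far faster than d. The sketch checks that formula against brute-force enumeration. It does not implement the paper's proposed thresholds, which the abstract does not define.

```python
from itertools import combinations


def count_rules_closed_form(d):
    """Number of rules X -> Y with X, Y non-empty and disjoint, over d items."""
    return 3 ** d - 2 ** (d + 1) + 1


def count_rules_brute_force(d):
    items = range(d)
    total = 0
    for k in range(1, d + 1):
        for antecedent in combinations(items, k):
            rest = [i for i in items if i not in antecedent]
            # any non-empty subset of the remaining items can be the consequent
            total += 2 ** len(rest) - 1
    return total


for d in range(1, 7):
    print(d, count_rules_closed_form(d), count_rules_brute_force(d))
```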

The HCARD Model using an Agent for Knowledge Discovery

  • Gerardo Bobby D.;Lee Jae-Wan;Joo Su-Chong
    • The Journal of Information Systems / v.14 no.3 / pp.53-58 / 2005
  • In this study, we employ a multi-agent system to search for and extract data in a distributed environment. We use an Integrator Agent in the proposed Hierarchical Clustering and Association Rule Discovery (HCARD) model. HCARD addresses the inadequate processing performance and efficiency of other data mining tools when they are used for knowledge discovery. The Integrator Agent was developed on the CORBA architecture to search for and extract data from heterogeneous servers in the distributed environment. Our experiment shows that HCARD generated essential association rules that can be explained in practice for decision-making purposes. Computing the rules over clusters with HCARD took less processing time than computing them without HCARD.
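The cluster-then-mine idea behind HCARD can be sketched as follows. This is a hypothetical stand-in: single-linkage agglomerative clustering on Jaccard similarity replaces whatever clustering HCARD actually uses, the Integrator Agent and CORBA layer are not modeled, and the transactions and thresholds are made up.

```python
from collections import Counter
from itertools import combinations


def jaccard(x, y):
    return len(x & y) / len(x | y) if x | y else 1.0


def single_link_clusters(transactions, threshold):
    """Agglomerative clustering: repeatedly merge the two most similar clusters
    (single linkage) until the best similarity falls below threshold."""
    clusters = [[t] for t in transactions]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sim = max(jaccard(x, y) for x in clusters[i] for y in clusters[j])
            if sim > best:
                best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair  # i < j, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))
    return clusters


def frequent_pairs(cluster, min_support):
    """Item pairs whose support within one cluster is at least min_support."""
    counts = Counter(p for t in cluster for p in combinations(sorted(t), 2))
    n = len(cluster)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


# Hypothetical transactions pulled from the distributed sources.
txns = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
]
clusters = single_link_clusters(txns, threshold=0.5)
for cluster in clusters:
    print(frequent_pairs(cluster, min_support=1.0))
```

Mining each cluster separately both shrinks the per-run input and exposes rules, such as bread with milk at full support within its cluster, that fall to 50% support over the pooled data.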

A New Interestingness Measure in Association Rules Mining (연관규칙 탐색에서 새로운 흥미도 척도의 제안)

  • Ahn, Kwang-Il;Kim, Seong-Jip
    • Journal of Korean Institute of Industrial Engineers / v.29 no.1 / pp.41-48 / 2003
  • In this paper, we present a new measure for evaluating the interestingness of association rules. Ultimately, whether a rule is interesting is subjective. However, an interestingness measure is useful in that it provides a statistical or logical basis for pruning uninteresting rules. Several interestingness measures have been developed in association rule mining. We present an overview of these measures and propose a new one. A comparative study of several interestingness measures is performed on an example dataset and a real dataset. Our experiments show that the new measure can avoid the discovery of misleading rules.
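A classic misleading rule has high confidence even though its items are negatively correlated, because the consequent is common anyway. The sketch flags such a rule with lift and leverage (Piatetsky-Shapiro's P(A and B) − P(A)P(B)); it is not the new measure proposed in the paper, and the tea/coffee counts are hypothetical.

```python
def rule_stats(n_ab, n_a, n_b, n):
    """Confidence, lift, and leverage of the rule A -> B from item counts."""
    p_ab, p_a, p_b = n_ab / n, n_a / n, n_b / n
    confidence = n_ab / n_a
    lift = n_ab * n / (n_a * n_b)
    leverage = p_ab - p_a * p_b  # Piatetsky-Shapiro's rule-interest measure
    return confidence, lift, leverage


# Hypothetical counts: 100 customers, 25 buy tea, 90 buy coffee, 20 buy both.
confidence, lift, leverage = rule_stats(20, 25, 90, 100)
print(confidence, round(lift, 3), round(leverage, 3))
```

Confidence for tea → coffee is 0.8, yet lift is below 1 and leverage is negative: tea buyers are slightly less likely to buy coffee than customers overall (0.9), so the rule is misleading despite passing a typical confidence threshold.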