• Title/Summary/Keyword: attribute-based association rules


Prediction of Yeast Protein-Protein Interactions by Neural Feature Association Rule (Neural Feature Association Rule을 이용한 효모 단백질-단백질 상호작용의 예측)

  • Eom Jae-Hong;Zhang Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.277-279 / 2005
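  • Proteins are known to carry out biologically important functions by interacting with other proteins or forming complexes. Because protein interactions play an important role in most cellular processes, analyzing and predicting them has become another important issue in the post-genomic era, in which many research groups have produced abundant data. This paper presents an association-rule-based method that predicts potential protein interactions by learning association rules among features from publicly available yeast protein interaction data. The many feature dimensions considered for each protein are reduced efficiently with an information-theory-based feature selection algorithm. A neural network is trained on the feature sets of the interactions, and association rules among features are decoded from the trained network and used for association-rule-based interaction prediction. An association rule discovery algorithm served as the mining method for interaction prediction through associated-feature discovery, and rules decoded from the trained neural network prediction model were used in addition to raise prediction accuracy. The proposed method showed an average prediction accuracy of about 94.5% on the problem of predicting protein interactions from the discovered association rules.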

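
The rule-based prediction described above rests on the usual support and confidence measures of association rules. A minimal sketch of those two measures follows; the feature names and the four protein-pair records are hypothetical and only illustrate the arithmetic, not the paper's data or its neural rule decoding.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0


# Hypothetical feature sets for four protein pairs (illustrative only).
pairs = [
    frozenset({"loc=nucleus", "func=transcription", "interacts"}),
    frozenset({"loc=nucleus", "func=transcription", "interacts"}),
    frozenset({"loc=nucleus", "func=transport"}),
    frozenset({"loc=membrane", "func=transport", "interacts"}),
]

rule_support = support(
    pairs, {"loc=nucleus", "func=transcription", "interacts"})
rule_confidence = confidence(
    pairs, {"loc=nucleus", "func=transcription"}, {"interacts"})
print(rule_support, rule_confidence)  # 0.5 1.0
```

A rule such as {loc=nucleus, func=transcription} => {interacts} would be kept for prediction only if both values clear user-chosen thresholds.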

Mining Quantitative Association Rules using Commercial Data Mining Tools (상용 데이타 마이닝 도구를 사용한 정량적 연관규칙 마이닝)

  • Kang, Gong-Mi;Moon, Yang-Sae;Choi, Hun-Young;Kim, Jin-Ho
    • Journal of KIISE:Databases / v.35 no.2 / pp.97-111 / 2008
  • Commercial data mining tools basically support only binary attributes when mining association rules; that is, they can mine only binary association rules. In general, however, transaction databases contain not only binary attributes but also quantitative attributes. Thus, in this paper we propose a systematic approach to mining quantitative association rules (association rules that contain quantitative attributes) using commercial mining tools. To achieve this goal, we first propose an overall working framework that mines quantitative association rules based on commercial mining tools. The proposed framework consists of two steps: 1) a pre-processing step that converts quantitative attributes into binary attributes and 2) a post-processing step that reconverts binary association rules into quantitative association rules. For the pre-processing step, we present the concept of domain partition and, based on it, formally redefine the previous bipartition techniques (mean-based and median-based) and multi-partition techniques (equi-width and equi-depth). These previous partition techniques, however, do not consider the distribution characteristics of attribute values. To solve this problem, we propose an intuitive partition technique named standard deviation minimization, in which adjacent attribute values are placed in the same partition if the change in their standard deviation is small and are divided into different partitions if the change is large. We also propose a post-processing step that integrates binary association rules and reconverts them into the corresponding quantitative rules. Through extensive experiments, we argue that our framework works correctly, and we show that our standard deviation minimization is superior to other partition techniques. According to these results, we believe that our framework is practically applicable for naive users to mine quantitative association rules using commercial data mining tools.
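
The pre-processing step can be illustrated with the three baseline partition techniques the abstract names. The sketch below is an assumption-laden illustration: the function names, the `age` attribute, and its values are made up, and the paper's own standard deviation minimization technique is not reproduced because the abstract does not give its details.

```python
from statistics import mean


def mean_bipartition(values):
    """Mean-based bipartition: 0 below the mean, 1 otherwise."""
    m = mean(values)
    return [0 if v < m else 1 for v in values]


def equi_width(values, k):
    """Split the value range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would fall into bin k, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equi_depth(values, k):
    """Split the sorted values into k bins of (nearly) equal size.

    Ties are broken by position, so equal values may land in
    neighbouring bins in this simplified version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def to_binary_items(name, bins):
    """Turn bin indices into binary item names a binary miner accepts."""
    return [f"{name}_bin{b}" for b in bins]


ages = [18, 21, 22, 25, 40, 60]  # hypothetical quantitative attribute
print(mean_bipartition(ages))  # [0, 0, 0, 0, 1, 1]
print(equi_width(ages, 3))     # [0, 0, 0, 0, 1, 2]
print(equi_depth(ages, 3))     # [0, 0, 1, 1, 2, 2]
print(to_binary_items("age", equi_depth(ages, 3))[:2])
```

The skewed `ages` list shows the weakness the abstract points out: equi-width puts four of six values into one bin because it ignores how the values are distributed.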

Association rule Mining between Climate factors and Fruits yields (과실 생산량과 기상요소간의 연관분석 마이닝)

  • Woo, Jong-Seon;Batbaatar, Erdenbileg;Ryu, Keun-Ho
    • Proceedings of the Korean Society of Computer Information Conference / 2016.01a / pp.23-25 / 2016
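  • This paper applies the association rule technique of data mining to agricultural and meteorological data, which include climate conditions and agricultural output, to analyze the associations between output and the climate factors that underlie agricultural production. Climate data from the Korea Meteorological Administration, which contain the values of climate attributes, are integrated with data from Statistics Korea, which contain agricultural output; the values of the climate attributes are then discretized, and the association rule technique is applied. The experiments produced association rules between each climate factor and output. We expect these results to help establish indicators for guarding against the vulnerability of the agricultural production base to changing climate conditions and to contribute to higher agricultural productivity.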


Prediction of Implicit Protein - Protein Interaction Using Optimal Associative Feature Rule (최적 연관 속성 규칙을 이용한 비명시적 단백질 상호작용의 예측)

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications / v.33 no.4 / pp.365-377 / 2006
  • Proteins are known to perform biological functions by interacting with other proteins or compounds. Since protein interaction is intrinsic to most cellular processes, predicting protein interactions is an important issue in post-genomic biology, where many research groups have produced abundant interaction data. In this paper, we present an associative feature mining method to predict implicit protein-protein interactions of Saccharomyces cerevisiae from public protein interaction data. We discretized continuous-valued features with a maximal interdependence-based discretization approach. We also employed a feature dimension reduction filter (FDRF) method, which is based on information theory, to select optimal informative features, to boost prediction accuracy and overall mining speed, and to overcome the dimensionality problem of conventional data mining approaches. We used an association rule discovery algorithm for associative feature and rule mining to predict protein interactions. Using the discovered associative features, we predicted implicit protein interactions that had not been observed in the training data. According to the experimental results, the proposed method achieved about 96.5% prediction accuracy, with computation about 29.4% faster than the conventional method, which uses no feature filter in association rule mining.
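
Information-theoretic feature filtering of the kind mentioned above can be sketched with plain information gain. This is a generic stand-in chosen for illustration; it is not the FDRF method itself, whose exact criterion the abstract does not state, and the two toy features are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical discretized features for four protein pairs.
labels = [1, 1, 0, 0]                 # 1 = interacting pair
features = {
    "same_location": ["y", "y", "n", "n"],   # separates the classes
    "chromosome_parity": ["a", "b", "a", "b"],  # carries no information
}

# Rank features from most to least informative; a filter keeps the top ones.
ranked = sorted(features,
                key=lambda name: information_gain(features[name], labels),
                reverse=True)
print(ranked)  # ['same_location', 'chromosome_parity']
```

Dropping low-gain features before rule mining shrinks the itemset search space, which is consistent with the speed-up the abstract reports.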

Temporal Associative Classification based on Calendar Patterns (캘린더 패턴 기반의 시간 연관적 분류 기법)

  • Lee Heon Gyu;Noh Gi Young;Seo Sungbo;Ryu Keun Ho
    • Journal of KIISE:Databases / v.32 no.6 / pp.567-584 / 2005
  • Temporal data mining, the incorporation of temporal semantics into existing data mining techniques, refers to a set of techniques for discovering implicit and useful temporal knowledge from temporal data. Association rule mining and classification, both typical data mining problems, are applied in various applications. However, these approaches do not consider temporal attributes and have been pursued for discovering knowledge from static data, although a large proportion of data contains a temporal dimension. Moreover, data mining research on temporal data has treated the problem as discovering knowledge from time-stamped data and adding time constraints, and therefore does not consider the temporal semantics and temporal relationships contained in the data. This paper proposes a temporal associative classification technique based on temporal class association rules. The technique extends existing associative classification with a temporal dimension and applies the rules discovered as temporal class association rules to generate temporal classification rules. Therefore, it can discover more useful knowledge than typical classification techniques.
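
One way to picture a class association rule that holds only under a calendar pattern is sketched below. The encoding is an assumption made for illustration: a pattern is a (year, month, day) tuple in which `None` acts as a wildcard, and the records, item names, and class labels are invented. The paper's actual definitions may differ.

```python
from datetime import date


def matches(pattern, d):
    """True if date `d` fits the (year, month, day) pattern; None = any."""
    year, month, day = pattern
    return ((year is None or year == d.year)
            and (month is None or month == d.month)
            and (day is None or day == d.day))


def temporal_confidence(records, pattern, antecedent, label):
    """Confidence of `antecedent => label` among records in the pattern."""
    covered = [y for d, items, y in records
               if matches(pattern, d) and antecedent <= items]
    if not covered:
        return 0.0
    return sum(1 for y in covered if y == label) / len(covered)


# Hypothetical time-stamped records: (date, items, class label).
records = [
    (date(2004, 1, 5), {"cold"}, "high_load"),
    (date(2004, 1, 20), {"cold"}, "high_load"),
    (date(2005, 1, 9), {"cold", "holiday"}, "low_load"),
    (date(2004, 7, 5), {"cold"}, "low_load"),
]

every_january = (None, 1, None)
any_time = (None, None, None)
in_january = temporal_confidence(records, every_january, {"cold"}, "high_load")
overall = temporal_confidence(records, any_time, {"cold"}, "high_load")
print(round(in_january, 3), overall)  # 0.667 0.5
```

The rule looks stronger inside the January pattern than over the whole data set, which is the kind of knowledge a static classifier would miss.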

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationship between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insight derived from such analysis can be used in accident prevention. Traffic accident data mining is the activity of finding useful knowledge about these relationships that is not well known and that may interest the user. Many studies on mining accident data have been reported over the past two decades. Most of them focus on predicting the risk of an accident from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbor, and neural networks. However, the prediction models derived by these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups themselves are still not easy for humans to understand, so additional analysis is necessary. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features. Rules are fairly easy for humans to understand, so they can help provide insight and comprehensible results. Association rule learning methods and subgroup discovery methods are representative rule-based learning methods for descriptive tasks. These two kinds of algorithm have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns from a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, an unsupervised learning method, searches for frequent itemsets in the data and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts that satisfy a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features. Under a supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort to tune its parameters.
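
The trade-off between generality and unusualness in subgroup discovery is commonly scored with weighted relative accuracy (WRAcc). The sketch below uses that standard measure as an assumption; the abstract does not say which quality function the authors used, and the eight accident records are invented.

```python
def wracc(rows, condition, target):
    """Weighted relative accuracy of the subgroup selected by `condition`.

    WRAcc = P(cond) * (P(target | cond) - P(target)):
    coverage (generality) times the lift in target rate (unusualness).
    """
    n = len(rows)
    covered = [r for r in rows if condition(r)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = sum(1 for r in rows if target(r)) / n
    p_target_given_cond = sum(1 for r in covered if target(r)) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


# Hypothetical accident records (field names are illustrative).
rows = [
    {"weather": "rain", "fatal": True},
    {"weather": "rain", "fatal": True},
    {"weather": "rain", "fatal": False},
    {"weather": "clear", "fatal": False},
    {"weather": "clear", "fatal": False},
    {"weather": "clear", "fatal": False},
    {"weather": "clear", "fatal": True},
    {"weather": "clear", "fatal": False},
]

score = wracc(rows,
              condition=lambda r: r["weather"] == "rain",
              target=lambda r: r["fatal"])
print(score)  # 0.109375
```

A positive score means the subgroup has a higher target rate than the whole data set; a subgroup covering a single record scores low even when its target rate is 100%, which is how the measure rewards generality.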

Design of knowledge search algorithm for PHR based personalized health information system (PHR 기반 개인 맞춤형 건강정보 탐사 알고리즘 설계)

  • SHIN, Moon-Sun
    • Journal of Digital Convergence / v.15 no.4 / pp.191-198 / 2017
  • A PHR-based personal health care service platform needs to support intelligent, customized health information services for user convenience. In this paper, we specify an ontology-based health data model for such a platform. We also design a knowledge search algorithm that finds similar health records by applying machine learning and data mining techniques. The axis-based mining algorithm we propose operates on axis attributes to improve the relevance of knowledge exploration and to shorten search time by reducing the size of the candidate itemset. The K-Nearest Neighbor algorithm is used to group users according to the similarity of their profiles. These algorithms improve the efficiency of customized information exploration according to the user's disease and health condition. The proposed algorithm can be applied to the inference process of the platform, making it possible to recommend customized health information to the user, and it can help people manage smart health care in an aging society.

Association Rules Extraction from GML Data (GML 데이터에서 연관규칙 추출)

  • Kim, Eui-Chan;Hwang, Byung-Yeon
    • Korea Spatial Information System Society: Conference Proceedings / 2005.11a / pp.55-60 / 2005
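  • As interest in geospatial information grows, its application areas are also diversifying. The OGC (Open GIS Consortium) developed GML (Geography Markup Language), which brings XML (eXtensible Markup Language) into the GIS field, and GML is used and continually studied in many application areas. This study applies association rules, a data mining method previously studied on XML documents, to GML data to find meaningful rules. There are two possible ways to find rules: one extracts only the content of the GML data and finds rules from it, and the other finds rules based on the tags and attributes used. This study describes rule finding with both approaches. Based on this work, we expect that fields using GML documents will be able to obtain not only basic information but also implicit, meaningful information.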

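
The two approaches, mining the content versus mining the tags and attributes, differ mainly in how a GML feature is turned into a transaction. A rough sketch with Python's standard `xml.etree.ElementTree` follows. The `app` namespace, the `City` and `Road` elements, and the item naming scheme are hypothetical; only `gml:featureMember`, `gml:id`, and the GML namespace URI come from GML itself.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Hypothetical GML fragment used only to exercise the functions below.
DOC = """<app:City xmlns:app="http://example.org/app"
                   xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <app:Road gml:id="r1"><app:type>highway</app:type><app:lanes>4</app:lanes></app:Road>
  </gml:featureMember>
  <gml:featureMember>
    <app:Road gml:id="r2"><app:type>street</app:type></app:Road>
  </gml:featureMember>
</app:City>"""


def local_name(qualified):
    """Strip the '{namespace}' prefix ElementTree puts on names."""
    return qualified.rsplit("}", 1)[-1]


def tag_transaction(feature):
    """Approach 2: items are the tags and attribute names that occur."""
    items = set()
    for el in feature.iter():
        items.add("tag:" + local_name(el.tag))
        for attr in el.attrib:
            items.add("attr:" + local_name(attr))
    return items


def content_transaction(feature):
    """Approach 1: items are 'tag=text' pairs taken from the content."""
    return {f"{local_name(el.tag)}={el.text.strip()}"
            for el in feature.iter()
            if el.text and el.text.strip()}


root = ET.fromstring(DOC)
features = [member[0] for member in root.findall(f"{{{GML_NS}}}featureMember")]
tag_sets = [tag_transaction(f) for f in features]
content_sets = [content_transaction(f) for f in features]
print(sorted(tag_sets[0]))      # ['attr:id', 'tag:Road', 'tag:lanes', 'tag:type']
print(sorted(content_sets[0]))  # ['lanes=4', 'type=highway']
```

Either list of sets can then be passed to any frequent-itemset miner as its transaction database.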

A Study on Data Association-Rules Mining of Content-Based Multimedia (내용 기반의 멀티미디어 데이터 연관규칙 마이닝에 대한 연구)

  • Kim, Jin-Ok;Hwang, Dae-Jun
    • The KIPS Transactions:PartD / v.9D no.1 / pp.57-64 / 2002
  • Despite the overwhelming amount of multimedia data produced by advances in computing capacity, storage technology, and the Internet, few studies have systematically pursued multimedia data mining. Building on preliminary image processing and content-based image retrieval technology, this paper presents methods for discovering association rules from recurrent items with spatial relationships in huge data repositories. Furthermore, a multimedia mining algorithm is proposed to find implicit association rules among objects whose content-based descriptors, such as color, texture, and shape, are recurrent and have spatial relationships. The algorithm with recurrent items in images finds sets of frequent items more efficiently than the Apriori algorithm. The multimedia association rule algorithm is especially effective when the collection of images is homogeneous, and it can be applied to many multimedia-related application fields.
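
For reference, the Apriori baseline the abstract compares against can be sketched in a few lines. This is the plain level-wise algorithm over sets of descriptors, not the paper's recurrent-item algorithm (which also counts repeated items and spatial relationships); the image descriptors are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    n = len(transactions)

    def frequent(candidates):
        found = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                found[c] = s
        return found

    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical content descriptors extracted from four images.
images = [
    {"red", "round"},
    {"red", "round", "smooth"},
    {"red", "smooth"},
    {"blue", "round"},
]
itemsets = apriori(images, min_support=0.5)
print(sorted(sorted(s) for s in itemsets))
```

With a minimum support of 0.5, five itemsets survive: three single descriptors plus {red, round} and {red, smooth}. The triple is pruned without a database scan because {round, smooth} is infrequent.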

A Method for Mining Interval Event Association Rules from a Set of Events Having Time Property (시간 속성을 갖는 이벤트 집합에서 인터벌 연관 규칙 마이닝 기법)

  • Han, Dae-Young;Kim, Dae-In;Kim, Jae-In;Na, Chol-Su;Hwang, Bu-Hyun
    • The KIPS Transactions:PartD / v.16D no.2 / pp.185-190 / 2009
  • A sequence of events of the same type, drawn from a set of events that have a time property, can be summarized as a single event. If the event sequence has an interval, however, it is more reasonable to summarize it as more than one mutually independent sub-sequence. In this paper, we suggest a temporal data mining method that summarizes interval events based on Allen's interval algebra and finds interval event association rules from the interval events. By using the concept of an independent sub-sequence and finding interval event association rules, it provides better knowledge than other methods.
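
Allen's interval algebra distinguishes thirteen possible relations between two intervals. A compact classifier is sketched below, assuming each interval is a `(start, end)` pair with `start < end`; how the paper maps these relations onto rules is not stated in the abstract.

```python
def allen_relation(a, b):
    """Name the Allen relation that holds from interval a to interval b.

    Both arguments are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only a proper partial overlap remains at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"


print(allen_relation((1, 5), (3, 8)))  # overlaps
print(allen_relation((3, 5), (1, 8)))  # during
print(allen_relation((1, 4), (4, 6)))  # meets
```

An interval event association rule can then pair two event types with one of these relation names, for example "event A overlaps event B".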